The emerging AI agents war
Amazon kicks it off
Mark Palko sent me news that Amazon obtained an injunction against Perplexity's shopping bot (link).
Perplexity is best known as a pioneer of AI-assisted web search, a product that I'd confidently say will find its market. It will succeed not because it delivers better search results, but because it offers a far more natural, far simpler user experience.
The recent news concerns something else – Perplexity's shopping bot, which browses around and shops for things on behalf of users. This shopping bot is an example of an "AI agent," a term you must have heard if you follow any tech news.
First, here's an incidental demonstration of my point about AI search.
While researching this post, I typed keywords like "Perplexity shopping with Comet" into a traditional search engine. It returned pages upon pages of recent pieces about Amazon's lawsuit, even though I had deliberately left out the word "Amazon" and anything legal. Using an AI chatbot with a prompt like "I want to find links to articles that introduce the Comet browser shopping feature offered by Perplexity starting last year. I don't want recent links about Amazon's lawsuit", I got exactly what I wanted. Here's a link to an article about the Comet shopping feature. (It's an "ad" by a company in the AI agent space, which is a different issue altogether.)
Based on currently available products, an AI agent is an automated workflow. In the article linked above, Perplexity Comet's edge is said to be:
Instead of waiting for your next search query, it actively completes tasks, negotiates purchases, and automates shopping workflows that previously required dozens of manual steps.
In the course of online shopping, one might start with an idea of what to buy. Then, one might find articles written about the "best" items in that category, noting their pros, cons, and prices. One might then shortlist some options, and pick one. Then, one might select a retailer that sells the selected item, figure out its shipping and return policies, and if satisfactory, complete the transaction.
Perplexity's Comet browser does all these tasks:
The AI agent can auto-fill forms, conduct multi-site research, aggregate reviews, compare pricing, and — critically for commerce — initiate and complete purchase transactions.
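To make "an AI agent is an automated workflow" concrete, here is a toy sketch of the shopping steps listed above: shortlist by reviews, then pick a retailer based on price, shipping, and return policy. Everything in it is hypothetical: the product names, ratings, prices, and the 30-day return threshold are made up, and the fixed rules stand in for judgment calls that a real agent would hand to a language model. A real agent would also have to gather these inputs itself by browsing review sites and retailers' pages.

```python
from dataclasses import dataclass

@dataclass
class Review:
    product: str
    rating: float  # aggregated review score out of 5 (hypothetical)

@dataclass
class Offer:
    product: str
    retailer: str
    price: float
    shipping: float
    return_days: int  # length of the return window in days

def shortlist(reviews, top_n=2):
    """Keep the top_n products by aggregated rating."""
    ranked = sorted(reviews, key=lambda r: r.rating, reverse=True)
    return [r.product for r in ranked[:top_n]]

def choose_offer(offers, candidates, min_return_days=30):
    """Among offers for shortlisted products with an acceptable return policy,
    pick the lowest delivered price (item plus shipping)."""
    acceptable = [o for o in offers
                  if o.product in candidates and o.return_days >= min_return_days]
    if not acceptable:
        return None
    return min(acceptable, key=lambda o: o.price + o.shipping)

def shopping_agent(reviews, offers):
    candidates = shortlist(reviews)
    offer = choose_offer(offers, candidates)
    if offer is None:
        return "no acceptable offer; ask the user"
    # A real agent would go on to fill in the checkout forms; this sketch stops at the decision.
    return f"buy {offer.product} from {offer.retailer} for ${offer.price + offer.shipping:.2f}"

reviews = [Review("Widget A", 4.6), Review("Widget B", 4.4), Review("Widget C", 3.9)]
offers = [
    Offer("Widget A", "ShopX", 49.00, 5.00, 14),
    Offer("Widget A", "ShopY", 52.00, 0.00, 30),
    Offer("Widget B", "ShopX", 45.00, 5.00, 30),
    Offer("Widget C", "ShopZ", 30.00, 0.00, 60),
]
print(shopping_agent(reviews, offers))  # buy Widget B from ShopX for $50.00
```

The cheapest item (Widget C) never enters the comparison because it missed the shortlist, and ShopX's Widget A is ruled out by its 14-day return window. Each of these inputs has to come from somebody's website.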
It's time to introduce the naughty word: "scraping". This is the crux of Amazon's grievance.
In order for Comet (or any other AI agent) to fulfill those tasks, it must navigate around websites, extract data from webpages, analyze the data, and make decisions. Extracting data from webpages is what is commonly called "web scraping".
Web scraping is a strange beast. It has no reason to exist, and yet it's everywhere. When the data science field took shape some 15 years ago, web scraping was a common starting point in textbooks teaching Python: open up some webpage, and grab the data on the page.
Imagine you're the owner of a small online seller of widgets. An engineer at a competitor writes a web scraper to compile a database of the products you sell, and their prices. This scraper browses your website page by page, extracting the product and pricing information.
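For readers who haven't seen one, here is a minimal sketch of such a scraper, using only Python's standard-library html.parser. The page markup, the class names "name" and "price", and the two inline pages are invented for illustration; a real scraper would download each page over HTTP, and would have to be rewritten whenever the seller changes its page layout.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for two pages of the widget seller's catalog.
PAGES = [
    '<ul><li class="product"><span class="name">Blue Widget</span>'
    '<span class="price">$12.50</span></li>'
    '<li class="product"><span class="name">Red Widget</span>'
    '<span class="price">$9.99</span></li></ul>',
    '<ul><li class="product"><span class="name">Green Widget</span>'
    '<span class="price">$1,250.00</span></li></ul>',
]

class ProductParser(HTMLParser):
    """Collects (name, price-text) pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None    # which field the current text belongs to, if any
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one product: store it and start a new record.
            self.rows.append((self._current.get("name"), self._current.get("price")))
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

def parse_price(text):
    """Strip the formatting the website added: '$1,250.00' -> 1250.0."""
    return float(text.replace("$", "").replace(",", ""))

def scrape_catalog(pages):
    catalog = []
    for page in pages:  # one page at a time, like the competitor's scraper
        parser = ProductParser()
        parser.feed(page)
        parser.close()
        catalog.extend((name, parse_price(price)) for name, price in parser.rows)
    return catalog

print(scrape_catalog(PAGES))
# [('Blue Widget', 12.5), ('Red Widget', 9.99), ('Green Widget', 1250.0)]
```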
As the owner, you consider your product pricing catalog either confidential or public information.
Most retailers treat it as a trade secret – if you took a notepad and started jotting down every product and price in a Walmart or Target, you'd most likely be stopped. Likewise, most online retailers deploy technology to detect and block web scrapers, typically by refusing to serve them webpages (HTTP 403 "Forbidden" errors). These retailers act as if the information they present publicly is protected. This stance has led to an arms race, as developers work around the anti-scraping tech. Regardless of one's view on this, we can agree that if retailers treat their product catalogs as trade secrets, anyone trying to scrape the data is acting against the retailers' wishes and may face legal jeopardy. (I never understood why college professors taught web scraping as the first example of a Python script.)
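Real anti-bot systems are proprietary and combine many signals, so the sketch below shows only one simple, hypothetical rule, a per-client rate limit, to illustrate what "refusing to serve them webpages" means in practice. The limit of 5 requests per 10 seconds is arbitrary, and the client identifiers are made up.

```python
from collections import defaultdict, deque

# Hypothetical policy: at most MAX_REQUESTS page views per client within WINDOW_SECONDS.
MAX_REQUESTS = 5
WINDOW_SECONDS = 10.0

class ScraperBlocker:
    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.history = defaultdict(deque)  # client id -> timestamps of recent served requests

    def status_for(self, client_id, now):
        """Return the HTTP status code the server would send for this request."""
        recent = self.history[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return 403  # Forbidden: refuse to serve the page (blocked requests are not recorded)
        recent.append(now)
        return 200

blocker = ScraperBlocker()
bot = [blocker.status_for("bot", float(t)) for t in range(7)]             # one page per second
human = [blocker.status_for("human", float(t)) for t in range(0, 60, 5)]  # one page every 5 s
print(bot)                                 # [200, 200, 200, 200, 200, 403, 403]
print(all(code == 200 for code in human))  # True
```

A scraper can defeat this particular rule simply by slowing down or spreading its requests over many client identities, which is how the arms race plays out.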
Alternatively, some retailers might view their product pricing data as public information and be okay with third-party access. In this world, web scraping bears no legal risk, but it is still a poor technical solution. The proper approach is to create APIs, so that developers can register themselves and request the data they want in an open, orderly fashion.
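The API alternative can be sketched in a few lines. Everything here is hypothetical: the endpoint path in the comment, the key-registration scheme, the page size, and the field names are illustrative, not any retailer's actual API, and the "server" is a plain function so the example runs without a network. The point is that the developer receives the database rows directly, as structured JSON, and the retailer knows who is asking.

```python
import json

# Hypothetical retailer-side data and the keys of registered developers.
PRODUCTS = [
    {"sku": "W-001", "name": "Blue Widget", "price": 12.50},
    {"sku": "W-002", "name": "Red Widget", "price": 9.99},
    {"sku": "W-003", "name": "Green Widget", "price": 1250.00},
]
REGISTERED_KEYS = {"demo-key-123"}

def api_products(api_key, page=1, per_page=2):
    """Stand-in for a hypothetical endpoint like GET /api/products?page=N.
    Returns (HTTP status, JSON body)."""
    if api_key not in REGISTERED_KEYS:
        return 401, json.dumps({"error": "unregistered key"})
    start = (page - 1) * per_page
    items = PRODUCTS[start:start + per_page]
    has_more = start + per_page < len(PRODUCTS)
    return 200, json.dumps({"items": items, "next_page": page + 1 if has_more else None})

def fetch_all(api_key):
    """Client side: page through the API. No HTML to strip, no layout to guess."""
    rows, page = [], 1
    while page is not None:
        status, body = api_products(api_key, page)
        if status != 200:
            raise PermissionError(json.loads(body)["error"])
        payload = json.loads(body)
        rows.extend(payload["items"])
        page = payload["next_page"]
    return rows

print([row["sku"] for row in fetch_all("demo-key-123")])  # ['W-001', 'W-002', 'W-003']
print(api_products("not-registered")[0])                  # 401
```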
All retailers have databases that hold their product and pricing data. Their websites pull data from these databases and present it in nice formats to customers. Web scraping code grabs the data together with the layers of formatting, spread across hundreds or thousands of pages, then strips away the packaging and merges the page-level data, restoring the structure of the data. If successful, the output of the web scraper resembles what the retailers hold in their databases! In practice, it's an inexact copy of the retailers' databases, riddled with errors. If these retailers consent to sharing the data, there are better ways to organize the data exchange.
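The round trip from database to webpage and back can be demonstrated in miniature. In this sketch (the database rows, the page markup, and the scraper's regular expression are all invented for illustration), the website escapes special characters and shows a sale as a struck-through old price next to the new one. A naive scraper that takes the heading text and the first dollar amount in each item recovers a copy that no longer matches the database.

```python
import html
import re

# Hypothetical rows in the retailer's database.
DB = [
    {"name": "Blue Widget", "price": 12.50, "sale_price": None},
    {"name": "Nuts & Bolts Kit", "price": 20.00, "sale_price": 15.00},
]

def render(row):
    """Website layer: wrap a database row in presentation markup."""
    name = html.escape(row["name"])  # '&' becomes '&amp;' in the page source
    if row["sale_price"] is None:
        price = f'<span class="price">${row["price"]:.2f}</span>'
    else:
        # The sale is shown as a struck-through old price followed by the new one.
        price = (f'<span class="price"><s>${row["price"]:.2f}</s> '
                 f'${row["sale_price"]:.2f}</span>')
    return f'<div class="item"><h2>{name}</h2>{price}</div>'

# Naive scraper: heading text plus the first dollar amount after it.
ITEM_RE = re.compile(r'<h2>(.*?)</h2>.*?\$([\d,]+\.\d{2})')

def scrape(page):
    return [{"name": n, "price": float(p.replace(",", ""))} for n, p in ITEM_RE.findall(page)]

page = "".join(render(r) for r in DB)
for row in scrape(page):
    print(row)
# {'name': 'Blue Widget', 'price': 12.5}
# {'name': 'Nuts &amp; Bolts Kit', 'price': 20.0}
```

The second record comes back with an undecoded name and the old price instead of the sale price. Both bugs can be patched, but each patch is a guess about the layout that breaks when the layout changes, whereas the database already had the right answer.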
Whether the retailers condone or condemn web scraping, there is still no reason to use it.
The emergence of AI agents brings this touchy subject to the forefront. The only way shopping bots can function is if they are allowed to browse around websites, collecting data. If Amazon's lawsuit succeeds, it kills not only Perplexity's bot but all the others too.