Rise of AI crawlers and bots causing web traffic havoc
Friday, August 22, 2025, 04:50, by ComputerWorld
A new report from edge cloud platform provider Fastly reveals what it called “a striking shift in the nature of automated web traffic,” with recent traffic analysis indicating that AI crawlers account for close to 80% of observed AI bot traffic. Meta generated more than half, eclipsing both Google and OpenAI combined.
These results are derived from traffic analyzed between April 16 and July 15 across two of the company’s offerings, Next Gen WAF and Bot Management, and illustrate, the company said, how AI-driven automation is reshaping online traffic.

For the purposes of the report, its authors categorized AI bots into two types: crawlers and fetchers. Crawler bots, they wrote, “operate similarly to search engine crawlers — they systematically scan websites to collect content for building searchable indexes or training language models. This process is a precondition to the model’s ‘training’ phase.” Fetcher bots, on the other hand, they said, “access website content in response to user actions. For example, when a user requests up-to-date information on a specific topic, a fetcher bot retrieves the relevant page in real time. They are also used to help surface website links that match a user’s search query, directing them to the most relevant content. Crawler bots contribute nearly 80% of the total AI bot request volume, with fetcher bots making up the remaining 20%.”

‘Massive surge in AI bot traffic’

Real-time fetching by AI bots, the report states, poses a greater challenge than peak crawler rates. The analysis found that in one case a single crawler peaked at around 1,000 requests per minute, while real-time fetching was significantly more aggressive: in one instance, a fetcher bot made 39,000 requests per minute to a single website at peak load. “This traffic volume can overwhelm origin servers, strain server resources, consume bandwidth, and cause expensive DDoS-like effects even without malicious intent,” the report noted.

Other key findings from the report:
- Meta’s AI bots alone generate 52% of AI crawler traffic, more than double that of Google (23%) or OpenAI (20%).
- Sectors facing the highest levels of scraping for AI model training include high-tech, commerce, and media & entertainment.
- ChatGPT generates the most real-time traffic to websites, with 98% of fetcher bot requests attributable to OpenAI’s bots.

Asked what prompted the report, Matthew Mathur, senior security researcher at Fastly, said on Thursday, “We’ve seen a massive surge in AI bot traffic, raising a lot of concern across the industry. With our visibility into internet traffic, we’re in a crucial position to show how AI bots are affecting the internet — from excessive load on web infrastructure, unauthorized use of website content, to distorting analytics.”

In response to the findings, Reddy Doddipalli, technical counselor and product lead at Info-Tech Research Group, said, “with the rise of genAI models and embedded systems, the demand for large-scale data aggregation has led to a surge in crawler activity, with bots sifting through billions of web pages to feed ML algorithms.” While there are many advantages to AI crawlers, he said, the bottom line for organizations, website owners, and users is to understand the negative implications, such as data privacy, security, ethics, intellectual property, infrastructure, and bandwidth consumption.

Doddipalli recommends developing a framework and best practices to manage and mitigate crawler activity. “For example, many of these crawlers now mimic human behavior, bypassing traditional defenses and controls and requiring innovative detection techniques,” he noted, adding that thought is required to ensure AI bot behavior remains a constructive force, not a destructive element.

The findings from Fastly, said David Shipley, head of Canadian firm Beauceron Security, “[are indicative of] how the costs of AI are going to impact everyone.
As web hosting providers face higher traffic from these bots, [it is] inevitable those costs are going to be passed on to website owners.” This is happening at a time when human-to-website traffic is cratering, negatively impacting sites that depend on ad revenue and adding another pressure to those businesses, he said.

Website owners also face a tough choice, said Shipley. On one hand, “having their blogs and information in the corpus of information used by these tools is valuable when people use tools like ChatGPT in information gathering or shopping decisions. There’s also the future potential for people to be able to use AI agents to interact with the websites and complete orders.”

Matter of time before a DDoS attack

On the other hand, he observed, the relationship “can also be parasitic, with AI racking up costs to improve its training data set and, based on data so far, not acting as a funnel like search engines used to be in sending traffic to websites for conversion for e-commerce.” According to Shipley, “an additional risk is the danger of disintermediation — particularly if the AI hallucinates and gives people the wrong information about your website [and] business products. Or worse, people intentionally poison the AI models to harm your business (and that will happen sooner or later).”

From a security perspective, it’s also likely only a matter of time before someone figures out how to include this massive AI traffic in DDoS attacks, he said.

Bot traffic, said Mathur, especially AI bot traffic, “will continue to grow, so having controls in place to mitigate this is essential. Utilizing directives in robots.txt, implementing technical controls (for example, rate limiting), or investing in a full bot management solution will help protect organizations.
Having a strategy is also essential, so that when bot traffic spikes, organizations are already prepared.”

The Fastly report calls for smarter bot management strategies by organizations, but asked whether any of them already have sound strategies in place, Mathur said, “high-profile ones seeing a revenue decline resulting from LLMs’ use of their data, or public web resources experiencing performance problems from crawling, have been vocal about this problem and are working to address it.” However, he said, “there is likely a larger population of smaller sites and developers that don’t have sufficient visibility into bot traffic and don’t fully realize the costs of AI crawling, let alone have proactive strategies in place.”
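The rate-limiting control Mathur mentions can be sketched concretely. Below is a minimal, illustrative per-client rate limiter in Python; the 1,000 requests-per-minute budget, the `SlidingWindowLimiter` name, and the idea of keying on a bot's User-Agent string are assumptions for this example, not Fastly's implementation. Publishing robots.txt directives for a crawler's documented user agent is the declarative complement to an enforcement control like this.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` per key.

    Illustrative sketch only: real bot management products combine
    rate limits with behavioral and fingerprint-based detection.
    """

    def __init__(self, max_requests: int = 1000, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._hits[key]
        # Evict timestamps that have fallen out of the sliding window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over budget: reject (e.g., answer HTTP 429)
        q.append(now)
        return True

# Example: a fetcher bot bursting far beyond a 1,000 req/min budget
# (the article cites a 39,000 req/min peak) is throttled immediately;
# "ExampleFetcher/1.0" is a hypothetical user agent.
limiter = SlidingWindowLimiter(max_requests=1000, window_seconds=60.0)
allowed = sum(
    limiter.allow("ExampleFetcher/1.0", now=i * 0.001) for i in range(39000)
)
print(allowed)  # only the first 1,000 requests in the window pass
```

Keying on User-Agent alone is deliberately naive: as Doddipalli notes, many crawlers mimic human behavior, so production systems typically also key on IP ranges and behavioral signals.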
https://www.computerworld.com/article/4044127/rise-of-ai-crawlers-and-bots-causing-web-traffic-havoc...