While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data
Saturday, November 15, 2025, 22:22, by Slashdot
Early in March 2025, I noticed that a web crawler with the user agent string meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) was hitting my blog's machine at an unreasonable rate. I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.

I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had 'meta-externalagent' in its user agent string with the contents of a bork.php-generated file... This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...

After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving 'meta-externalagent' a 404 status code. I decided to see how long it would take one of the highest-valued companies in the world to decide to go away. The answer is 5 months.

Read more of this story at Slashdot.
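The technique described above (match a substring of the User-Agent header, then either feed the crawler endlessly self-linking generated pages or return a 404) can be sketched roughly as follows. This is not the author's bork.php, whose source isn't shown here; it is a minimal Python illustration under stated assumptions. The names `generate_page`, `choose_response`, `WORDS`, and the `tarpit` flag are all hypothetical. Each generated page is seeded from its URL path, so it stays stable on re-fetch, and it links to further random paths, which is what makes the site look infinite to a crawler.

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler

# Substring to match in the User-Agent header (taken from the crawler named in the story).
BLOCKED_AGENT = "meta-externalagent"

# Hypothetical filler vocabulary for the junk pages.
WORDS = ["condiments", "underwear", "celebrity", "mustard",
         "socks", "gossip", "ketchup", "scandal"]


def generate_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic junk page for `path` whose links lead to more junk pages."""
    # Seed the RNG from the path so the same URL always yields the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    # Each page links to n_links new random paths, so the crawl never runs out of URLs.
    links = "".join(
        f'<a href="/{rng.getrandbits(64):016x}">more</a>\n' for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"


def choose_response(user_agent: str, path: str, tarpit: bool = True) -> tuple[int, str]:
    """Return (status, body), singling out the crawler by its User-Agent substring."""
    if BLOCKED_AGENT in (user_agent or "").lower():
        if tarpit:
            return 200, generate_page(path)  # phase 1: endless generated pages
        return 404, "Not Found"              # phase 2: plain 404
    return 200, "regular content"            # placeholder for the real site


class Handler(BaseHTTPRequestHandler):
    """Minimal stdlib handler wiring choose_response into an HTTP server."""

    def do_GET(self):
        status, body = choose_response(self.headers.get("User-Agent", ""), self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` (from `http.server`) would serve it. Switching from the tarpit phase to the 404 phase is just a matter of passing `tarpit=False`. Serving 404s is far cheaper than generating pages, which matches the author's concern about cost.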
https://tech.slashdot.org/story/25/11/15/2023242/while-meta-crawls-the-web-for-ai-training-data-bruc...