Project Analyzing Human Language Usage Shuts Down Because 'Generative AI Has Polluted the Data'
Saturday, September 21, 2024, 01:01, by Slashdot
'Generative AI has polluted the data,' wrote Robyn Speer, the creator of Wordfreq. 'I don't think anyone has reliable information about post-2021 language usage by humans.' She said that open web scraping was an important part of the project's data sources and 'now the web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.' While there has always been spam on the internet and in the datasets that Wordfreq used, 'it was manageable and often identifiable. Large language models generate text that masquerades as real language with intention behind it, even though there is none, and their output crops up everywhere,' she wrote. Read more of this story at Slashdot.
https://tech.slashdot.org/story/24/09/20/1745236/project-analyzing-human-language-usage-shuts-down-b...