
The Ever-Expanding Job of Preserving the Internet's Backpages

Tuesday, October 4, 2022, 21:20, by Slashdot
A quarter of a century after it began collecting web pages, the Internet Archive is adapting to new challenges. From a report: Within the walls of a beautiful former church in San Francisco's Richmond district, racks of computer servers hum and blink with activity. They contain the internet. Well, a very large amount of it. The Internet Archive, a non-profit, has been collecting web pages since 1996 for its famed and beloved Wayback Machine. In 1997, the collection amounted to 2 terabytes of data. That was colossal back then; now it would fit on a $50 thumb drive.

Today, the archive's founder Brewster Kahle tells me, the project is on the brink of surpassing 100 petabytes -- approximately 50,000 times larger than in 1997. It contains more than 700bn web pages. The work isn't getting any easier. Websites today are highly dynamic, changing with every refresh. Walled gardens like Facebook are a source of great frustration to Kahle, who worries that much of the political activity that has taken place on the platform could be lost to history if not properly captured. In the name of privacy and security, Facebook (and others) make scraping difficult.

Read more of this story at Slashdot.
https://tech.slashdot.org/story/22/10/04/180229/the-ever-expanding-job-of-preserving-the-internets-b...