
Archivists Work To Identify and Save the Thousands of Datasets Disappearing From Data.gov

Friday, January 31, 2025, 00:45, by Slashdot
An anonymous reader quotes a report from 404 Media: Datasets aggregated on data.gov, the largest repository of U.S. government open data on the internet, are being deleted, according to the website's own information. Since Donald Trump was inaugurated as president, more than 2,000 datasets have disappeared from the database. As people in the Data Hoarding and archiving communities have pointed out, on January 21, there were 307,854 datasets on data.gov. As of Thursday, there are 305,564 datasets. Many of the deletions happened immediately after Trump was inaugurated, according to snapshots of the website saved on the Internet Archive's Wayback Machine. Harvard University researcher Jack Cushman has been taking snapshots of Data.gov's datasets both before and after the inauguration, and has worked to create a full archive of the data.

'Some of [the entries link to] actual data,' Cushman told 404 Media. 'And some of them link to a landing page [where the data is hosted]. And the question is -- when things are disappearing, is it the data it points to that is gone? Or is it just the index to it that's gone?' For example, 'National Coral Reef Monitoring Program: Water Temperature Data from Subsurface Temperature Recorders (STRs) deployed at coral reef sites in the Hawaiian Archipelago from 2005 to 2019,' a NOAA dataset, can no longer be found on data.gov but can be found on one of NOAA's websites by Googling the title. 'Stetson Flower Garden Banks Benthic_Covage Monitoring 1993-2018 -- OBIS Event,' another NOAA dataset, can no longer be found on data.gov and also appears to have been deleted from the internet. 'Three Dimensional Thermal Model of Newberry Volcano, Oregon,' a Department of Energy resource, is no longer available via the Department of Energy but can be found backed up on third-party websites.
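Cushman's question, whether only the index entry is gone or the data behind it as well, can be sketched as a comparison of two catalog snapshots. The following is a minimal illustration in Python, not the method used by Cushman or the End of Term team. The snapshot shape (a mapping from dataset title to the URL the entry pointed to), the function names, and the `is_reachable` callback are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical snapshot shape: dataset title -> URL the catalog entry pointed to.
Snapshot = Dict[str, str]

# Dataset counts reported in the article for January 21 and the following Thursday.
COUNT_JAN_21 = 307_854
COUNT_THURSDAY = 305_564
NET_DECREASE = COUNT_JAN_21 - COUNT_THURSDAY  # 2,290 fewer entries listed


def removed_entries(before: Snapshot, after: Snapshot) -> List[str]:
    """Return titles listed in the earlier snapshot but absent from the later one."""
    return sorted(title for title in before if title not in after)


def classify_removals(
    before: Snapshot,
    after: Snapshot,
    is_reachable: Callable[[str], bool],
) -> Dict[str, str]:
    """Label each removed entry by whether its underlying data still resolves.

    `is_reachable` is supplied by the caller (an HTTP check, a lookup in a
    local archive, and so on), so this function performs no network access.
    """
    labels: Dict[str, str] = {}
    for title in removed_entries(before, after):
        if is_reachable(before[title]):
            labels[title] = "index entry removed, data still hosted"
        else:
            labels[title] = "index entry removed, data unreachable"
    return labels
```

In practice the hard part is the one Phillips describes: building the snapshots and deciding what counts as reachable when the data may live on an agency site, a university server, or a cloud provider. A removed entry labeled unreachable here may still exist in a third-party backup.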

Data.gov serves as an aggregator of datasets and research across the entire government, meaning it isn't a single database. This makes it slightly harder to archive than any individual database, according to Mark Phillips, a University of North Texas researcher who works on the End of Term Web Archive, a project that archives as much as possible from government websites before a new administration takes over. 'Some of this falls into the "We don't know what we don't know,"' Phillips told 404 Media. 'It is very challenging to know exactly what, where, how often it changes, and what is new, gone, or going to move. Saving content from an aggregator like data.gov is a bit more challenging for the End of Term work because often the data is only identified and registered as a metadata record with data.gov but the actual data could live on another website, a state.gov, a university website, cloud provider like Amazon or Microsoft or any other location. This makes the crawling even more difficult.'

Phillips said that, for this round of archiving (which the team does every administration change), the project has been crawling government websites since January 2024, and that they have been doing 'large-scale crawls with help from our partners at the Internet Archive, Common Crawl, and the University of North Texas. We've worked to collect 100s of terabytes of web content, which includes datasets from domains like data.gov.' It is absolutely true that the Trump administration is deleting government data and research and is making it harder to access. But determining what is gone, where it went, whether it's been preserved somewhere, and why it was taken down is a process that is time intensive and going to take a while. 'One thing that is clear to me about datasets coming down from data.gov is that when we rely on one place for collecting, hosting, and making available these datasets, we will always have an issue with data disappearing,' Phillips said. 'Historically the federal government would distribute information to libraries across the country to provide greater access and also a safeguard against loss. That isn't done in the same way for this government data.'

Read more of this story at Slashdot.
https://hardware.slashdot.org/story/25/01/30/2215252/archivists-work-to-identify-and-save-the-thousa...

