37 - Archiving Data Journalism
Abstract
This chapter discusses the challenges of archiving data journalism projects and the steps that data teams can take to ensure their projects are preserved for the future.
Keywords: data journalism, archival practices, archives, digital archives, broken links, web archiving
In the first edition of The Data Journalism Handbook, published in 2012, data journalism pioneer Steve Doig wrote that one of his favourite data stories was the “Murder Mysteries” project by Tom Hargrove. In the project, which was published by the Scripps Howard News Service, Hargrove looked at demographically detailed data about 185,000 unsolved murders and built an algorithm to suggest which murders might be linked. Linked murders could indicate a serial killer at work. “This project has it all,” Doig wrote. “Hard work, a database better than the government's own, clever analysis using social science techniques, and interactive presentation of the data online so readers can explore it themselves.”
By the time of the second edition of The Data Journalism Handbook, six years later, the URL to the project was broken (projects.scrippsnews.com/magazine/murder-mysteries). The project was gone from the web because its publisher, Scripps Howard, was gone. The Scripps Howard News Service had gone through multiple mergers and restructurings, eventually merging with Gannett, publisher of the USA Today local news network.
We know that people change jobs and media companies come and go. However, this has had disastrous consequences for data journalism projects (for more on this issue see, e.g., Boss & Broussard, 2017; Broussard, 2014, 2015a, 2015b; Fisher & Klein, 2016).
Data projects are more fragile than “plain” text-and-images stories that are published in the print edition of a newspaper or magazine.
Ordinarily, link rot is not a big deal for archivists; it is easy to use LexisNexis or ProQuest or another database provider to find a copy of everything published by, say, The New York Times print edition on any day in the 21st century. But for data stories, link rot indicates a deeper problem. Data journalism stories are not being preserved in traditional archives. As such, they are disappearing from the web. Unless news organizations and libraries take action, future historians will not be able to read everything published by The Boston Globe on any given day in 2017.
- Type: Chapter
- Information: The Data Journalism Handbook: Towards a Critical Data Practice, pp. 274-278
- Publisher: Amsterdam University Press
- Print publication year: 2021