18 - Coding With Data in the Newsroom
Abstract
Newsrooms present unique challenges to coders and technically minded journalists.
Keywords: computational journalism, programming, data cleaning, databases, data visualization
Inevitably, there is a point where data and code become companions: perhaps when Google Sheets slows down because of the size of a data set, when Excel formulas become too arcane, or when it becomes impossible to make sense of data spanning hundreds of rows. Coding can make working with data simpler, more elegant, less repetitive and more repeatable. This does not mean that spreadsheets will be abandoned, but rather that they will become one of a number of different tools available. Data journalists often jump between techniques as needed: scraping data with Python notebooks, throwing the result into a spreadsheet, copying it into OpenRefine for cleaning, then pasting it back again.
Different people learn different programming languages and techniques; different newsrooms produce their work in different languages, too. This partly comes from an organization's choice of “stack,” the set of technologies used internally. For example, most of the data, visual and development work at The Times (of London) is done in R, JavaScript and React; across the pond, ProPublica uses Ruby for many of its web apps. While it is often individuals who choose their tools, the practices and cultures of news organizations can heavily influence these choices. The BBC, for instance, is progressively moving its data visualization workflow to R (BBC Data Journalism team, n.d.); The Economist shifted its world-famous Big Mac Index from Excel-based calculations to R and a React/d3.js dashboard (Gonzalez et al., 2018). There are many options and no single right answer. The good news for those getting started is that many core concepts apply to all programming languages. Once you understand how to store data points in a list (as you would in a spreadsheet row or column) and how to do various operations in Python, doing the same thing in JavaScript, R or Ruby is a matter of learning the syntax.
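As a minimal sketch of that idea, the Python snippet below stores a spreadsheet-style column as a list and runs a few basic operations on it. The figures and variable names are invented for illustration and do not come from any real data set.

```python
# Hypothetical values, standing in for one spreadsheet column.
prices = [4.2, 5.1, 3.8, 5.1, 6.0]

# The kind of operations a spreadsheet does with SUM, AVERAGE, MAX and a filter.
total = sum(prices)
average = total / len(prices)
highest = max(prices)

# Keep only the values above the average (a list comprehension).
above_average = [p for p in prices if p > average]

print(round(total, 2), round(average, 2), highest, above_average)
```

The same steps in JavaScript, R or Ruby would look different on the page (JavaScript, for instance, would use an array's `filter` method where Python uses a list comprehension), but the underlying idea of a list and operations over it stays the same.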
For the purpose of this chapter, we can think of data journalism's coding as being subdivided into three core areas: data work, which includes scraping, cleaning and statistics (work you could do in a spreadsheet); back-end work, the esoteric world of databases, servers and APIs; and front-end work, which covers most of what happens in a web browser, including interactive data visualizations.
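To make the first of these areas concrete, here is a rough sketch of data work in Python: cleaning a small table and computing a summary statistic, using only the standard library. The place names, figures and function names (`clean_rows`, `mean_by_council`) are all hypothetical, chosen only to illustrate the kind of task involved; back-end and front-end work are not covered by this example.

```python
import csv
import io
import statistics

# Invented raw data with typical problems: stray spaces,
# inconsistent capitalisation and a missing value.
RAW_CSV = """council,spending
 Northtown ,1200
northtown,800
SOUTHVILLE,950
Southville,
"""

def clean_rows(text):
    """Normalise council names and drop rows with no spending figure."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["council"].strip().title()
        value = row["spending"].strip()
        if not value:
            continue  # skip rows where the figure is missing
        rows.append({"council": name, "spending": float(value)})
    return rows

def mean_by_council(rows):
    """Group cleaned rows by council and average the spending figures."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["council"], []).append(row["spending"])
    return {name: statistics.mean(values) for name, values in grouped.items()}

cleaned = clean_rows(RAW_CSV)
summary = mean_by_council(cleaned)
print(summary)
```

Because the cleaning rules live in code rather than in a series of manual spreadsheet edits, the same steps can be rerun unchanged when an updated file arrives.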
- Type: Chapter
- Book: The Data Journalism Handbook: Towards a Critical Data Practice, pp. 124-127
- Publisher: Amsterdam University Press
- Print publication year: 2021