At a media conference in Dublin last weekend (@cleraunmedia) there was a great deal of talk about digital and data journalism, how to use it – with the odd nod to how to abuse it – and how it was in some ways helping refine the whole process of keeping the world better informed.
This week the Columbia Journalism Review gives us another look at the process and raises complex ethical questions about where we are being led by this development. In all this, moral issues may arise as to what might happen if we surrender ourselves too blithely to the law of algorithms. Indeed the shadow of HAL 9000 might be already hovering over us and taking control of our far from simple world.
In those two great cinematic epics from the late sixties and early seventies, 2001: A Space Odyssey and Solaris, the whole question of man and his machines – man as a moral being versus man as a scientific and technological being – was raised. These two masterpieces, by Stanley Kubrick and Andrei Tarkovsky respectively, may only now be becoming critically relevant to our brave new world. You may remember that HAL derived its acronym from “Heuristically programmed ALgorithmic computer”.
The CJR raised these questions in the context of a BuzzFeed News probe earlier this year into suspicions about players fixing tennis matches. They called it “The Tennis Racket.” The piece featured an innovative use of statistical analysis to identify professional players who may have thrown matches. By analyzing win-loss records and betting odds at both the beginning and end of a match, BuzzFeed identified cases where there was an unusually large swing (e.g. greater than a 10 percent difference). If there were enough of these matches, it cast suspicion on the player.
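The swing-detection idea described above can be sketched in a few lines of Python: convert a player's opening and closing betting odds to implied win probabilities, flag matches where the swing exceeds a threshold, and count flagged matches per player. The data, field names, and the 10 percent threshold here are illustrative assumptions, not BuzzFeed's actual dataset or code; BuzzFeed's published analysis went further, assessing how unlikely each player's pattern of flagged matches was.

```python
SWING_THRESHOLD = 0.10  # flag swings greater than 10 percentage points

def implied_prob(decimal_odds):
    """Convert decimal betting odds to an implied win probability."""
    return 1.0 / decimal_odds

def flag_suspicious(matches, threshold=SWING_THRESHOLD):
    """Count, per player, the matches whose opening-to-closing swing
    in implied win probability exceeds the threshold."""
    counts = {}
    for m in matches:
        swing = implied_prob(m["open_odds"]) - implied_prob(m["close_odds"])
        if abs(swing) > threshold:
            counts[m["player"]] = counts.get(m["player"], 0) + 1
    return counts

# Hypothetical matches: opening and closing decimal odds for one player.
matches = [
    {"player": "A", "open_odds": 1.50, "close_odds": 2.50},  # large swing
    {"player": "A", "open_odds": 1.80, "close_odds": 1.85},  # small swing
    {"player": "B", "open_odds": 1.40, "close_odds": 2.20},  # large swing
]
print(flag_suspicious(matches))  # → {'A': 1, 'B': 1}
```

A raw count like this is only a first pass; a real analysis would also ask how many large swings are expected by chance given how often the player competes.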
They anonymized the data and didn’t publish the names of suspicious players. But a group of undergraduate students from Stanford University were able to infer and make public the names of players BuzzFeed had kept hidden.
The Review author, Nicholas Diakopoulos, feels the incident raises interesting questions about where to draw the line in enabling reproducibility of journalistic investigations, especially those that generate statistical indictments of individuals. “As newsrooms adapt to statistical and algorithmic techniques, new questions of media accountability and ethics are emerging.”
He notes how the news industry is rapidly adopting algorithmic approaches to production: automatically monitoring, alerting, curating, disseminating, predicting, and even writing news. This year alone The Washington Post began experimenting with automation and artificial intelligence in producing its Olympics and elections coverage; The New York Times published an anxiety-provoking real-time prediction of the 2016 presidential election results; the Associated Press is designing machine learning that can translate print stories for broadcast; researchers in Sweden demonstrated that statistical techniques can be harnessed to draw journalists’ attention to potentially newsworthy patterns in data; and Reuters is developing techniques to automatically identify event witnesses from social media.
“While such technologies enable an ostensibly objective and factual approach to editorial decision-making, they also harbor biases that shape how they include, exclude, highlight, or make salient information to users.”
In “The Tennis Racket,” BuzzFeed decided to provide varying levels of transparency that would appeal to different levels of reader expertise. Each level of disclosure added additional nuance, so different stakeholders could access the “granularity” of information most relevant to their interests.
He then explains: “But the flip side of transparency is that, in the case of BuzzFeed, providing the source code and a detailed-enough methodology allowed students to de-anonymize the results relatively quickly and easily. The students re-scraped the data from the online source (though there was some uncertainty in identifying the exact sample used in the original story) with identities preserved, and then cross-referenced with the anonymized BuzzFeed data based on the other data fields available. This allowed them to associate a name with each of the 15 players identified in the original analysis.”
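The cross-referencing step the students used can be sketched as a simple join: index the re-scraped, name-bearing records by the non-identifying fields they share with the anonymized data, then look up each anonymized row. The field names and records below are hypothetical; the students' actual matching was against the full scraped betting data, with some uncertainty over the exact sample.

```python
def deanonymize(anon_rows, scraped_rows, keys):
    """Map each anonymized id to the names of scraped rows whose
    shared fields match it exactly."""
    index = {}
    for row in scraped_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    return {row["anon_id"]: index.get(tuple(row[k] for k in keys), [])
            for row in anon_rows}

# Hypothetical records: the anonymized data retains enough fields
# (here, match counts) to single out each player in the scraped data.
anon = [{"anon_id": 1, "n_matches": 412, "n_flagged": 15},
        {"anon_id": 2, "n_matches": 233, "n_flagged": 11}]
scraped = [{"name": "Player X", "n_matches": 412, "n_flagged": 15},
           {"name": "Player Y", "n_matches": 233, "n_flagged": 11}]

print(deanonymize(anon, scraped, ["n_matches", "n_flagged"]))
# → {1: ['Player X'], 2: ['Player Y']}
```

The point of the sketch is that anonymization fails whenever the published fields, taken together, are unique to one individual in a publicly reconstructible dataset.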
Transparency is now very high on the scale of values of the democratic world – though not always adhered to without a degree of hypocrisy. The algorithm industry is well harnessed to provide tools for that. But, as this case shows, its instruments can be blunt and have the potential to perpetrate injustice.
Diakopoulos points out that several prominent ethics codes employed by media organisations now emphasize transparency as a guiding norm. But transparency, he warns, is not a silver bullet for media ethics. It’s complicated. “With so much machinery now being used in the journalistic sausage making, transparency is a pragmatic approach that facilitates the evaluation of the interpretations (algorithmic or otherwise) that underlie newswork.”
For many in the industry building computational products, Diakopoulos says, there are still concerns over algorithmic media production. What is needed, he argues, is a more accountable media system in which what he calls “these black boxes” are rendered more explainable and trustworthy.
Nicholas Diakopoulos is an assistant professor at the University of Maryland and a fellow at the Tow Center for Digital Journalism.