Data journalism: ‘wreaking havoc with data is a much more creative sport’

Lam Thuy Vo is a crusading data journalist determined to show us how we can find and use hidden knowledge to cause trouble and set the truth loose.

Data journalist Lam Thuy Vo startled and depressed her audience at AIDC 2020 by admitting that she is a coder. Right, we all thought, what am I doing here?

But then she told her story and it all made sense. She started as a videographer, fascinated by people and cultures, and realised that we all leave a trail of data, which is like a record of an interview done continuously with a subject. 

She performed an experiment with a friend, using his Facebook data and adding his emails and phone calls from the period of a relationship which had collapsed. His own description of the whole period was vague and full of holes. Presented with the data as an exact moment-by-moment record of his interactions, he was stimulated to remember the story in detail. He is being documented by data which is usable by other people only with his consent.

She is about to make a story for BuzzFeed about the most multicultural place in the United States. There are kinds of hard data which can codify that comparison between all the cities of the US. What about flights in and out? Languages in emails or software? Religious institutions? Working translators? Educational records? And a sneaky one I hadn’t thought of: what kind of food is sold in the supermarkets? Those companies are deep diving into the communities they serve, looking for minute differences that make a financial difference.

The answer is unexpected. It is not New York or Los Angeles or even Hawaii, but Anchorage, Alaska. She can make her case for the story, and find the most multicultural meeting places like particular schools and supermarkets and sports grounds. 

The trouble is, she was ending up with humongous spreadsheets, particularly if she was given raw publicly available data. Shuffling through it is mind-numbing, but if she could code she could sort it in milliseconds. 

As she said in her conversation,

‘I studied languages and love cultural texts and stuff like that. Coding did not come naturally to me. It was a lot of figuring out being wrong. One of the things that I really like about it is it’s similar to my parents who are immigrants, and I like to think about them having instilled in me this like constant fear of imminent doom and you just have to make it work.’

‘… everything keeps changing all the time, you might as well get used to the fact that change is constant. And that not many, like, very few people will have the luxury of remaining reporters…’

So, she says, we all have to grow and challenge ourselves. For Lam Thuy Vo, the answer is Python.
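To give a feel for what sorting a humongous spreadsheet in milliseconds looks like, here is a minimal sketch using only Python's standard library. It is not Lam Thuy Vo's own code; the sample data, the column names and the `top_rows` helper are all invented for illustration.

```python
import csv
import io

# Made-up sample standing in for a large public spreadsheet.
# The cities and figures are placeholders, not real data.
SAMPLE_CSV = """city,languages_spoken
City A,12
City B,47
City C,30
"""


def top_rows(csv_text, column, limit=10):
    """Return up to `limit` rows with the largest numeric values in `column`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort from highest to lowest, treating the column as a number.
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for row in top_rows(SAMPLE_CSV, "languages_spoken", limit=2):
        print(row["city"], row["languages_spoken"])
```

With a real file, the same function would take the text read from disk rather than the sample string, and the sort would work the same way on a much larger table.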

My Australian experience of data is that so much of it is systematically hidden as if the whole world is ‘commercial in confidence’. But Lam Thuy Vo is more optimistic. As she said in a later interview with Screenhub, 

‘Even though people are closing off information so journalists will have a harder time doing work, I think for every closed door there’s also a lot of other ways in which there are other new tools in which that will allow us to maybe get around them. Or we can find a different side door into the same room.’

There is a lot of data on government websites of various kinds, which is hidden by the sheer crappiness of the design. Companies are taking huge feeds tracking (for instance) our credit card data, which they can sometimes be persuaded to share. There is a vast amount of information which can be viewed for reasons other than those intended.

The point is to look positively at what we can get. To learn, for instance, the top websites which contain archives or databases which are beyond the usual Google snuffle. Just because information is not set up for Google does not mean it is not on the net. Freedom of information demands can help. Finding the spaces where information is absent is also suggestive.

We can create alliances with other people who are entitled to ask for their own data until we have created the whole field. The Australian ASIO files are one example. We can read academic theses online about particular areas to find their sources, or to utilise their surveys and analyses. Many research companies are allowing summary data into the wild to advertise white papers, which can be a useful starting point for proper interview work.

Another useful trick is to perform your own experiments. As Lam Thuy Vo said,

‘One of my favorite examples of that is when Julia Angwin was working at ProPublica. She is an investigative journalist who has gone on to found her own organisation, The Markup. She wanted to prove that the algorithm in Facebook would allow for hate inducing ads and she was able to do an experiment to show that those categories existed and were allowing people to spread hate on Facebook. 

‘That is not something she could have done by investigating the data from a systematic point of view but she was able to poke the system and prove something. And it’s incremental and it’s small, but it’s helpful to show just how flawed that system is.’

‘It’s like surveying the landscape and understanding what you can and cannot get. And then from there on, based on what you can’t get just have a little bit of fun. Figure out creative ways to investigate the same stories.’

And data journalists are not just stats nerds, though scepticism and a sharp eye for the digital con job are fundamental. The biggest problem is the sheer collapse of the sector and the loss of money to pay for good journalists.

‘We just have to be more creative around it. And we have to find ways of looking where other people don’t have the resources of time to look, and you’d be surprised how much you can get by just putting in the time along with curiosity and creativity. That’s the thing that I find fascinating. I think a lot of people look at people like me who are data journalists and think that we’re these like very boring statisticians who sit there and like find different ways of calculating an average or a median. But actually, data journalism is very different. As I like to say, wreaking havoc with data is a much more creative sport.

‘There are also people who I’d like to call vigilante data archivists who have been archiving a lot of forums and data on that so there’s a lot of experts that we can lean on as journalists.’

Her website contains a large array of useful teaching resources for journalism, data journalism, mining the social web and Python, just to skim the list. There is plenty to download for free.

David Tiley was the Editor of Screenhub from 2005 until he became Content Lead for Film in 2021 with a special interest in policy. He is a writer in screen media with a long career in educational programs, documentary, and government funding, with a side order in script editing. He values curiosity, humour and objectivity in support of Australian visions and the art of storytelling.