How to crowdsource effectively

Nicolas Kayser-Bril

10/10/2012

crowdsourcing

data journalism

outsourcing

social networks

vox populi

Crowdsourcing, a portmanteau of crowd and outsourcing, means distributing tasks to a group of people. It can be a very cost-effective way to collect and process information. If not prepared carefully, it can also be an utter failure, resulting in thousands of wasted man-hours.

The first thing to think about is “What do I want?” When talking to a large base of users, it is easy to get carried away in the conversation and forget what you’re after. Below are three ways of using crowdsourcing in the newsroom, from the easiest to the most complicated.

I want to collect testimonies

This is the modern-day equivalent of vox pop interviews. To get the best overview of public opinion, it is often simplest to go to a bar, listen to what is said and ask questions.

Now that conversations are shifting online, listening to the crowd has become easier. When you want to collect testimonies and opinions in no special format, the simplest tools are social networks and the discussion under your articles. Of course, most comment sections are saturated with trolls and expletives. What will make the conversation useful is showing your audience that you are listening by replying to comments. (Contrary to popular belief, whether users show their real names or not makes absolutely no difference.)

Some newsrooms also use Facebook to spark and nurture conversations that will yield useful testimonies. ProPublica’s Patient Harm Community, for instance, is a Facebook group that brings together fewer than 1,000 people who have suffered harm in the US health system.

If you need more structured data, such as the time and position of each testimony, Ushahidi becomes the tool of choice. Al Jazeera’s Balkans bureau set up an instance of Ushahidi that enabled users to report in real time on the problems they faced during the 2012 snowstorms.

The tools should always be adapted to your situation. At Dauphiné Libéré, a journalist asked users about the problems they had with terrestrial television. She received thousands of contributions that made for a very interesting story. But she had not planned for such massive input, so she had to process all the data by hand. In this case, spending a few hours setting up an Ushahidi instance beforehand would have been more efficient.
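To illustrate why structure helps, here is a minimal Python sketch of what planning for the input could look like: each testimony is stored with its time, position and a category, so a tally takes one line instead of hours of reading. It does not use Ushahidi. The `Testimony` class, its field names, the `count_by_category` helper and the sample reports are all hypothetical and made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Testimony:
    """One structured contribution (hypothetical fields, not a real schema)."""
    received_at: datetime
    latitude: float
    longitude: float
    category: str  # chosen by the contributor from a fixed list
    text: str      # the free-form testimony itself


def count_by_category(testimonies):
    """Tally contributions per category so nobody has to count by hand."""
    return Counter(t.category for t in testimonies)


# Invented sample data, for illustration only.
reports = [
    Testimony(datetime(2012, 2, 3, 9, 15), 45.19, 5.72,
              "no signal", "The screen went black on Friday."),
    Testimony(datetime(2012, 2, 3, 10, 40), 45.76, 4.84,
              "pixelated image", "The picture breaks up every evening."),
    Testimony(datetime(2012, 2, 4, 8, 5), 45.19, 5.73,
              "no signal", "Nothing since last week."),
]

summary = count_by_category(reports)
for category, count in summary.most_common():
    print(f"{category}: {count}")
```

The same records could just as easily be grouped by day or plotted on a map, which is the point of asking for time and position up front.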

I want to build my own statistics

Testimonies are one thing, but they only yield information about the people who answered. What Al Jazeera did with Ushahidi, for instance, was a great piece of emergency journalism, but it did not tell you how bad the situation really was. They collected 258 reports via web and SMS in February 2012 alone. That’s a lot but it tells little. Was there so much snow in some areas that the cell phone towers collapsed, preventing people from contributing? Did only the most affected people contribute, hiding the fact that most people were not that bothered by the snow?

As long as these questions remain unanswered, operations such as Al Jazeera’s remain big vox pops. More credible outcomes require running well-designed surveys and processing the results with care.
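As a rough illustration of what “processing the results with care” can involve, the Python sketch below computes the margin of error of a proportion measured in a survey. It uses the standard normal approximation and assumes a simple random sample, which is exactly what self-selected reports are not, so it should not be applied to a vox pop. The function name `margin_of_error` and the survey figures are invented for the example.

```python
import math


def margin_of_error(successes, sample_size, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for a proportion. z=1.96 corresponds to roughly 95% confidence.
    Only meaningful if respondents were drawn at random."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= successes <= sample_size:
        raise ValueError("successes must be between 0 and sample_size")
    p = successes / sample_size
    return z * math.sqrt(p * (1 - p) / sample_size)


# Invented example: 120 of 400 randomly chosen households report a power cut.
share = 120 / 400
moe = margin_of_error(120, 400)
print(f"{share:.0%} of households, plus or minus {moe:.1%}")
```

With these made-up numbers the estimate is 30%, give or take about 4.5 percentage points. No such figure can be attached to reports from people who chose to contribute themselves.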

International organizations are already rethinking the way they collect data. Using cell phone reports, for instance, UNICEF monitored the food supply in Ethiopia (selected reporters would text in the state of the food warehouse at specific intervals). The WHO monitors causes of death in India and the World Bank plans on doing the same for public transport in the Philippines.

So far, no media organization has stepped in. We at Journalism++ are building, with others, an open source tool that enables journalists to collect such high-quality data easily. Called Feowl, it is currently being tested in Cameroon to produce statistics on power cuts and will be released in late 2012.

I want to analyze documents

Imagine you received a few thousand juicy documents but did not have the resources to sit down at your desk for two months just to read them. What would you do? Twenty years ago, you might have passed them on to a bigger newsroom. Today, you might simply offer them to your audience and go for collaborative analysis.

This kind of crowdsourcing is by far the trickiest. If the documents are complex, you might be asking too much from your audience. If your operation is badly organized, you might drive the few experts in your crowd away.

In 2010, we faced the situation described above. Wikileaks had just released the Afghan war diaries. In less than 36 hours, we built an application that let users browse through the documents, vote on them and comment. Throughout the process, we asked our community for input, and so managed to bring together a dozen experts: veterans from the Balkans, army enthusiasts and specialized journalists who dug through the documents and brought valuable insights.

Just a few months later, we were invited by Wikileaks to build a similar platform for its upcoming release of documents, the Iraq war logs. This time, as we benefited from the media firepower of being a Wikileaks partner, our app received several million hits in a few days. But almost none of these visitors had any specialist knowledge of NATO operations, so their contributions were fairly useless.

Bringing a few dozen experts to the table can yield better results than bringing millions of non-experts. The Guardian suffered from the same problem with its milestone crowdsourcing of the MPs’ expenses. For all her efforts, a regular user will not be as productive as a seasoned political journalist at finding stories in such dry documents.

This does not mean that large corpuses of documents can only be analyzed by experts. You simply need to turn lay users into experts in their own right. The best example of this approach remains National Geographic’s quest for the tomb of Genghis Khan. Users were asked to mark the ruins of burial sites using satellite shots. What made this experiment distinctive is that users had to pass a 15-minute interactive training course that taught them how to recognize interesting features on a satellite image.

Another way to leverage your community is to ask for non-expert, straightforward tasks. Once, in 2009, we needed to convert a few hundred badly photocopied pages containing addresses of public services. Using an interface developed on a shoestring, we asked the community to do the work with us. In a few days, all documents had been converted.

The Open Knowledge Foundation released a new open source tool for this kind of simple task, PyBossa. If you need to quickly classify or tag pictures, for instance, or if you need to transcribe an audio file, PyBossa will help you get help from your community.
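One common way to keep simple crowd tasks reliable is to give each task to several contributors and keep an answer only when enough of them agree. The Python sketch below shows that idea in its barest form. It does not use PyBossa’s API. The function `aggregate_answers`, its thresholds and the sample answers are assumptions made for the example.

```python
from collections import Counter, defaultdict


def aggregate_answers(answers, min_votes=3, min_agreement=0.6):
    """Combine redundant answers by majority vote.

    answers: iterable of (task_id, contributor, label) tuples.
    A task is accepted when it has at least `min_votes` answers and the
    most frequent label reaches the `min_agreement` share of them.
    Everything else is returned for review by a journalist.
    """
    labels_by_task = defaultdict(list)
    for task_id, _contributor, label in answers:
        labels_by_task[task_id].append(label)

    accepted = {}
    needs_review = []
    for task_id, labels in labels_by_task.items():
        top_label, votes = Counter(labels).most_common(1)[0]
        if len(labels) >= min_votes and votes / len(labels) >= min_agreement:
            accepted[task_id] = top_label
        else:
            needs_review.append(task_id)
    return accepted, needs_review


# Invented sample: contributors tag scanned pages by type of public service.
answers = [
    ("page-1", "ann", "school"),
    ("page-1", "bob", "school"),
    ("page-1", "cy", "hospital"),
    ("page-2", "ann", "library"),
    ("page-2", "bob", "town hall"),
]

accepted, needs_review = aggregate_answers(answers)
print("accepted:", accepted)
print("needs review:", needs_review)
```

Here page-1 is accepted as “school” because two of its three contributors agree, while page-2 goes back for review because it has too few answers. The thresholds are arbitrary and would need tuning to the difficulty of the task.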