By Jonathon Morgan
If a computer can understand a crisis by reading social media, do we still need journalists and aid workers to document the conflict?
Crisis-relevant social media is the holy grail of the humanitarian data community. Incident reports collected by aid workers on the ground, often in dangerous locations, are woefully inefficient when compared to the instantaneous information sharing enabled by global participation in websites like Facebook, Twitter and YouTube. The mere possibility that these online communities could be used to extract rapid, comprehensive intelligence about even the most violent conflicts is inspiring, and has fundamentally changed crisis response.
However, assessing a conflict like the Syrian civil war using social media is the equivalent of aimlessly wandering around Damascus, hoping to stumble upon an interesting discussion. There are tens of thousands of disparate, sometimes enlightening, but often frivolous conversations at any given moment, and the sheer volume of information can be overwhelming. Given these challenges, humanitarian data scientists and other technologists are asking the question: Is it even possible to use networks like Facebook to effectively monitor a crisis?
Enter Brown Moses
Eliot Higgins, known online as Brown Moses after his widely read blog of the same name, has risen to prominence as an authoritative source for information about chemical weapons and arms smuggling in Syria. The New Yorker, New York Times, Guardian and Telegraph have all collaborated with Eliot or covered his work.
Eliot and I met at a conference in Stockholm run by the Stockholm International Peace Research Institute (SIPRI). In his presentation Eliot outlined how, even though he doesn't speak Arabic and has never been to Syria, it’s possible to gather remarkably precise intelligence about militant groups operating in the region simply by looking at posts, photos and videos published on social media.
The mostly manual process Eliot uses to identify, monitor, and analyze roughly 1,700 Facebook pages and YouTube channels has proven to be extremely effective. For example, he was the first to discover that Croatian weapons had been smuggled into Syria, after finding evidence in a video he watched online. Unfortunately, Eliot’s methods are very time-consuming, and though he has developed ad hoc systems for collecting and sharing his findings, most of the information he consumes is inaccessible to other analysts. While Eliot had spent years developing domain expertise around Syrian militant groups and arms manufacturing, I had been designing and implementing software systems for ingesting and processing large volumes of information. Eliot and I subsequently discussed how some of his work could be automated. The idea was to take the same fundamental techniques companies like Twitter and Google use to find patterns in search history or hashtag popularity, and apply them to automatically discovering and organizing reports of armed combat and chemical warfare from Syrian social media.
This, of course, is easier said than done. Even a hand-picked list of users like the one Eliot had curated still produces tens of thousands of posts per day, containing millions of words that any software program would need to translate and interpret. Plus, while identifying the relevance, topic, important names and places in a status update or blog post is easy for people, it's still fairly new territory for computers. For this project to work, we needed to teach a machine to automatically recognize the same details that we’ve come to expect from trained human experts.
Turning on the Firehose
For the past year at Ushahidi we've been building CrisisNET, with the intention of consuming and interpreting massive amounts of conflict and disaster data. Chris and I had just finished building a prototype of this system when I met Eliot and we started planning our collaboration. His social media feeds were the perfect use case to put our prototype to the test.
When working with crisis information, context is essential. A report of a violent attack is meaningless without an understanding of where it happened, when it happened, and ideally which groups or individuals were involved. Our plan was to run crisis information through a digital pipeline that would add crucial metadata at every step. The system first translates the information into English, then scans the post for key phrases like "barrel bomb" that we know are associated with air combat or chemical weapons. Finally, we use natural language processing techniques to identify the names of cities or regions buried in the text, which in turn makes it possible to assign an estimated latitude and longitude to each status update, image or video.
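To make the three steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the CrisisNET code: every name in it (`Report`, `translate`, `tag_key_phrases`, `geolocate`, `process`) is hypothetical, the translation step is a stub, the only key phrase taken from this article is "barrel bomb", and a small hand-written table of approximate city coordinates stands in for real place-name recognition.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical phrase-to-category table; only "barrel bomb" comes from the article.
KEY_PHRASES = {
    "barrel bomb": "air combat",
    "chemical attack": "chemical weapons",
}

# Tiny illustrative gazetteer with approximate city-center coordinates.
# This lookup is a stand-in for natural language place-name recognition.
GAZETTEER = {
    "damascus": (33.51, 36.28),
    "aleppo": (36.20, 37.13),
    "homs": (34.73, 36.71),
}


@dataclass
class Report:
    text: str
    tags: list = field(default_factory=list)
    place: Optional[str] = None
    coords: Optional[Tuple[float, float]] = None


def translate(text: str) -> str:
    """Placeholder: a real pipeline would call a translation service here."""
    return text


def tag_key_phrases(report: Report) -> Report:
    """Attach a category for every known key phrase found in the text."""
    lowered = report.text.lower()
    found = {tag for phrase, tag in KEY_PHRASES.items() if phrase in lowered}
    report.tags = sorted(found)
    return report


def geolocate(report: Report) -> Report:
    """Assign the coordinates of the first gazetteer place named in the text."""
    lowered = report.text.lower()
    for name, coords in GAZETTEER.items():
        if name in lowered:
            report.place = name
            report.coords = coords
            break
    return report


def process(raw_text: str) -> Report:
    """Run one post through translate -> tag -> geolocate, adding metadata at each step."""
    report = Report(text=translate(raw_text))
    return geolocate(tag_key_phrases(report))


if __name__ == "__main__":
    print(process("Barrel bomb dropped on Aleppo this morning"))
```

Simple substring matching like this will miss misspellings, transliteration variants and ambiguous place names, which is why the real system leans on natural language processing for that last step.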
Within 48 hours we had information from each of Eliot's Facebook pages and YouTube channels flowing through the CrisisNET system, but the true test of the technology was to put the data we'd extracted, categorized and geolocated on a map and verify that the information made sense.
The visualization at the top of this article does just that. On the left is a map generated from streaming social media augmented by CrisisNET (you can see an interactive version of the map here), while the map on the right is taken from a BBC report about the Syrian conflict published in March of this year. The BBC map uses data manually collected by humanitarian organizations working in the region, while we focused on Facebook posts and YouTube descriptions that our system had categorized as crisis-relevant, and then geotagged to coordinates within a specific Syrian city. While the manually generated map is more detailed, the fact that machine-aggregated social media reports collected over two days clearly correlate with the documented situation on the ground has incredible implications. Namely: if we can understand a conflict via social media, do foreign journalists need to be there at all?
Social Media and the Future of Crisis Data
The answer is, of course, yes. Even though we’ve shown that Facebook, Twitter, YouTube and other social networks are a viable source for first-person accounts of regions in crisis, in covering the Syrian conflict we obviously relied heavily on Eliot Higgins' research to generate a meaningful stream of data for our analysis. Consequently, while it’s likely that our algorithms and computer-assisted insights will continue to improve, it's unlikely that data science and machine learning will ever replace the need for journalists, human rights workers and domain experts in documenting conflict.
With that in mind, however, it is an especially difficult time to be a journalist in regions torn by conflict. Reporters are increasingly targeted by violence, particularly in Syria, where seven journalists have been murdered, and almost 50 killed in crossfire in the last two years alone. That, coupled with the fact that journalists are consistently asked to produce more content in less time, is quickly creating a gap between the knowledge we need to understand regions impacted by conflict and disaster, and what’s possible given the realities of modern newsroom budgets and rising safety concerns.
So, while on-the-ground reporting will clearly always be essential, for this type of documentation to remain plausible, current and transparent, its practitioners need to incorporate modern data science techniques to augment their information-gathering arsenal. Social media and data-driven analysis are the journalist's new secret weapon in monitoring conflict.