We've posted before about using Python's pandas library to work with data from CrisisNET. While pandas is popular with the research community, its conventions can feel a little awkward if you have a web development background or plan to use a browser technology like D3 for your visualizations. Today we'll take a less formal approach using some basic functional programming techniques baked into Python, and get you from API request to sexy visualization in no time.

Our goal is to plot the location and size of Syrian refugee camps on a map of Iraq, like we did earlier this week. I'll cover how we made the visualization in a follow-up tutorial, but for today let's focus on retrieving and formatting our data.

Step 1: Get the Data

Let's grab all the data from the UNHCR source available in CrisisNET.

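import requests

# Ask for up to 500 items from the UNHCR source
params = {
    'sources': 'unhcr',
    'limit': 500,
    'apikey': 'YOUR-API-KEY'
}

r = requests.get('http://api.crisis.net/item', params=params)
r.raise_for_status()  # fail loudly on a bad API key or server error
json_resp = r.json()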

Wow. That was easy.

When querying the API from a real-time application you'll probably want to control the size of the response by setting a limit lower than 500. However, because an extra second or two in network latency won't hurt our analysis, we can live dangerously and grab everything at once.
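If you're curious what actually goes over the wire, requests URL-encodes the params dict into a query string for you. Here's a rough standard-library sketch of the same URL. It illustrates the encoding and is not a line-for-line copy of requests' internals. The API key is still a placeholder.

```python
from urllib.parse import urlencode

# Same query parameters as the request above (the API key is a placeholder)
params = {
    'sources': 'unhcr',
    'limit': 500,
    'apikey': 'YOUR-API-KEY',
}

# Dicts preserve insertion order (Python 3.7+), so the query string does too
url = 'http://api.crisis.net/item?' + urlencode(params)
print(url)
# http://api.crisis.net/item?sources=unhcr&limit=500&apikey=YOUR-API-KEY
```

Lowering the limit for a real-time app only changes the `limit=500` part of that string.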

Step 2: Filter the Data

There are a lot of refugees in the world, and consequently quite a lot of refugee data. Our visualization is about Iraq, so let's filter the API response down to only the data from that country. (Note: I could've done this in the initial API call simply by attaching a placeName=Iraq filter to the request, but this is a Python tutorial, so bear with me.)

The list of Item documents is available in the data property of the API response.

{
  total: 498,
  data: [
    // List of docs
  ]
}

As noted in the API documentation, each item object in the list keeps its location information in the geo property. For example:

{
  id: "abc123",
  ..., // Other stuff
  geo: {
    addressComponents: {
      adminArea1: 'Iraq',
      adminArea3: 'Anbar'
    }
  },
  ... // More stuff 
}
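Because the response is JSON, r.json() hands you plain Python dicts and lists, so that nested geo lookup is just chained indexing. Here's a minimal sketch using a hypothetical, trimmed-down response shaped like the examples above. The values are illustrative, not real API output.

```python
import json

# A hypothetical, trimmed response shaped like the examples above
raw = '''
{
  "total": 1,
  "data": [
    {
      "id": "abc123",
      "geo": {
        "addressComponents": {
          "adminArea1": "Iraq",
          "adminArea3": "Anbar"
        }
      }
    }
  ]
}
'''

json_resp = json.loads(raw)   # JSON objects become dicts, arrays become lists
doc = json_resp['data'][0]    # first item in the list of docs
country = doc['geo']['addressComponents']['adminArea1']
print(country)  # Iraq
```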

Aha, now that we know where to look in the item to find the country name, we can move along to the interesting part. Armed with the totally awesome power of Python, let's filter our list of items so we're left with the ones that have a geo.addressComponents.adminArea1 equal to "Iraq". There are a few ways to do this, and we'll take a look at each because learning is great.

The Lame Traditional Filter: for loops

I'm just joking. For loops aren't lame, but you won't find any self-respecting hipster coder using them. Most of you will be very familiar with the for loop, so I'm only including it for reference.

docs = json_resp['data']  
iraq_docs = []  
for doc in docs:  
    if doc['geo']['addressComponents']['adminArea1'] == 'Iraq':
        iraq_docs.append(doc)

There's nothing wrong with this approach, but I find that in more complex scenarios for loops make it difficult to see how data moves through an application, and consequently they can be hard to debug.
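Real-world data is rarely tidy. If any item is missing geo or addressComponents, every version of the filter in this post raises a KeyError. Here's one defensive variant. It assumes items with missing or null location data should simply be skipped. The helper name country_of and the sample items are hypothetical.

```python
def country_of(doc):
    """Return geo.addressComponents.adminArea1, or None if any level is missing or null."""
    geo = doc.get('geo') or {}
    components = geo.get('addressComponents') or {}
    return components.get('adminArea1')

# Hypothetical sample items: one complete, three with missing location data
docs = [
    {'id': 'a', 'geo': {'addressComponents': {'adminArea1': 'Iraq'}}},
    {'id': 'b', 'geo': {}},
    {'id': 'c', 'geo': None},
    {'id': 'd'},
]

iraq_docs = []
for doc in docs:
    if country_of(doc) == 'Iraq':
        iraq_docs.append(doc)

print([doc['id'] for doc in iraq_docs])  # ['a']
```

The same country_of helper drops straight into the list comprehension and filter versions below.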

The Pythonic Filter: List comprehensions

This is arguably the most Pythonic way to filter a list.

docs = json_resp['data']  
iraq_docs = [doc for doc in docs if doc['geo']['addressComponents']['adminArea1'] == 'Iraq']  

List comprehensions can be a little confusing at first, but in fact this is very similar to the for loop we started with. The statement above says "make me a list using every item in the list docs, as long as the ['geo']['addressComponents']['adminArea1'] property of that item is equal to 'Iraq'."
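To convince yourself that the comprehension and the for loop really do the same thing, run both over the same input and compare the results. Here's a small sketch with hypothetical sample items that carry only the fields we need.

```python
# Hypothetical sample items with only the fields we need
docs = [
    {'id': 1, 'geo': {'addressComponents': {'adminArea1': 'Iraq'}}},
    {'id': 2, 'geo': {'addressComponents': {'adminArea1': 'Jordan'}}},
    {'id': 3, 'geo': {'addressComponents': {'adminArea1': 'Iraq'}}},
]

# The for-loop version
loop_result = []
for doc in docs:
    if doc['geo']['addressComponents']['adminArea1'] == 'Iraq':
        loop_result.append(doc)

# The list comprehension version
comp_result = [doc for doc in docs if doc['geo']['addressComponents']['adminArea1'] == 'Iraq']

print(loop_result == comp_result)          # True
print([doc['id'] for doc in comp_result])  # [1, 3]
```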

The Uh, Filter...Filter: filter

Break out the moustache, it's time for the famous higher-order function found in just about every functional programming language. Added bonus: lambda functions! First the snippet, then I'll explain more between sips of PBR.

docs = json_resp['data']
# In Python 3, filter() returns a lazy iterator, so wrap it in list() to get a list
iraq_docs = list(filter(lambda doc: doc['geo']['addressComponents']['adminArea1'] == 'Iraq', docs))

Both filter and lambda functions are useful additions to your programming toolbelt, so hop off your fixie and let's take a look at what's happening. First, lambda functions are just anonymous functions. If you've done any web programming with JavaScript, it's like writing:

$('#my-thing').on('click', function() {
  console.log('I haz click');
});

The function you're passing to the on method in the snippet above is an anonymous function. We can do the same thing in Python with a lambda expression. You might also think of it as a function defined in place. Take the lambda from our filter call:

lambda doc: doc['geo']['addressComponents']['adminArea1'] == 'Iraq'  

This says: "Give me a function that accepts an argument that I'll call doc. That function returns True if doc's ['geo']['addressComponents']['adminArea1'] property is equal to 'Iraq'." It's the equivalent of:

def is_iraq(doc):
    # The comparison already evaluates to True or False, so return it directly
    return doc['geo']['addressComponents']['adminArea1'] == 'Iraq'

Combined with filter, these simple anonymous functions serve as a test for each item in a list. If an item passes the test, it's included in the new list; if not, it's discarded. (Bonus points: a function used as a test to filter items in a list is also called a "predicate.") You can pass a lambda expression, as in the original filter example, or the more verbose is_iraq function defined with def, like this:

iraq_docs = list(filter(is_iraq, docs))  # list() because filter() is lazy in Python 3

Either way, you're saying: "Make a list of every item in docs that passes my test," where the "test" is the function you pass to filter as the first parameter.
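One Python 3 gotcha: filter (like map) returns a lazy iterator, not a list. You can loop over it once. If you need len(), indexing, or a second pass, wrap it in list(). Here's a quick sketch with hypothetical sample items.

```python
def is_iraq(doc):
    return doc['geo']['addressComponents']['adminArea1'] == 'Iraq'

# Hypothetical sample items
docs = [
    {'id': 1, 'geo': {'addressComponents': {'adminArea1': 'Iraq'}}},
    {'id': 2, 'geo': {'addressComponents': {'adminArea1': 'Syria'}}},
]

lazy = filter(is_iraq, docs)
first_pass = [doc['id'] for doc in lazy]   # consumes the iterator
second_pass = [doc['id'] for doc in lazy]  # nothing left to consume
print(first_pass, second_pass)             # [1] []

iraq_docs = list(filter(is_iraq, docs))    # materialize once, reuse freely
print(len(iraq_docs))                      # 1
```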

Step 3: Format the Data

Now that we have a list of items about Iraq, let's pull out the information we need for our visualization. We'll be plotting the location of each refugee camp as a circle on a map, and we want to size the circle relative to the number of refugees living at the camp. The number of refugees can be found on the item's totalAffectedPersons property, and the camp's coordinates can be found on the geo.coords property. For example:

{
  totalAffectedPersons: 12345,
  geo: {
    coords: [43.001,36.001]
  }
}

As you might have guessed, we can use a list comprehension for this:

vals = [(doc['totalAffectedPersons'], doc['geo']['coords']) for doc in iraq_docs]  

Here we're making a list of tuples, where the first value in the tuple is the number of refugees in the camp and the second value is the camp's coordinates. Another approach uses the built-in function map.

# Like filter(), map() is lazy in Python 3, so wrap it in list()
vals = list(map(lambda doc: (doc['totalAffectedPersons'], doc['geo']['coords']), iraq_docs))

This is very similar to filter, and is another "higher-order" function that you'll find in most functional programming languages. Basically, "run every item in a list through the function I pass you, and give me the results in a new list."
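Since filter and map both take a function and an iterable, they chain naturally into a small pipeline: filter decides which items survive, and map reshapes the survivors. Here's a sketch that assumes items carry the totalAffectedPersons and geo fields shown above. The sample values are made up.

```python
# Hypothetical sample items with made-up values
docs = [
    {'totalAffectedPersons': 12345,
     'geo': {'coords': [43.001, 36.001], 'addressComponents': {'adminArea1': 'Iraq'}}},
    {'totalAffectedPersons': 500,
     'geo': {'coords': [36.0, 32.0], 'addressComponents': {'adminArea1': 'Jordan'}}},
]

# filter keeps the Iraq items; map turns each survivor into a (count, coords) tuple
vals = list(map(
    lambda doc: (doc['totalAffectedPersons'], doc['geo']['coords']),
    filter(lambda doc: doc['geo']['addressComponents']['adminArea1'] == 'Iraq', docs),
))
print(vals)  # [(12345, [43.001, 36.001])]

# The equivalent list comprehension gives the same result
comp_vals = [(doc['totalAffectedPersons'], doc['geo']['coords'])
             for doc in docs
             if doc['geo']['addressComponents']['adminArea1'] == 'Iraq']
print(vals == comp_vals)  # True
```

Which one you reach for is mostly taste. The comprehension reads top to bottom, while the map/filter version makes each stage a separate, swappable function.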

The Short Version

That was a relatively lengthy explanation, but in practice both the filter and transformation steps are very straightforward. Putting it all together:

import requests

# Get the data
params = {
    'sources': 'unhcr',
    'limit': 500,
    'apikey': 'YOUR-API-KEY'
}

r = requests.get('http://api.crisis.net/item', params=params)
r.raise_for_status()
json_resp = r.json()

# Filter and format
docs = json_resp['data']
iraq_docs = [doc for doc in docs if doc['geo']['addressComponents']['adminArea1'] == 'Iraq']
vals = [(doc['totalAffectedPersons'], doc['geo']['coords']) for doc in iraq_docs]
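D3 code typically works with an array of objects, so before handing vals to the browser you might turn each tuple into a dict and serialize the result as JSON. Here's a sketch of that step. The key names count and coords are hypothetical, so pick whatever your D3 code expects. The sample vals are made up.

```python
import json

# Hypothetical output of the list comprehension above
vals = [(12345, [43.001, 36.001]), (678, [44.0, 33.3])]

# Tuple unpacking gives each value a readable name
records = [{'count': count, 'coords': coords} for count, coords in vals]

# Serialize for the browser; write this string to a file your page can load
payload = json.dumps(records)
print(payload)
# [{"count": 12345, "coords": [43.001, 36.001]}, {"count": 678, "coords": [44.0, 33.3]}]
```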

Next time I'll walk you through how to take this data and plot it on a map using the D3 JavaScript library. Until then, enjoy your narrow escape from death by pandas.