By Chris Albon

The comma-separated values format (csv) is one of the simplest and most popular ways to store and share data. CrisisNET's data explorer provides a one-click tool for exporting CrisisNET data as a csv file. However, the explorer tool only has basic filtering functionality. For more fine-grained filtering of CrisisNET data, we will want to make a request to the CrisisNET API directly, then export the results to csv. In this tutorial, we will make an API pull from CrisisNET, convert the data's hierarchical JSON format into a "flat" format (i.e. a spreadsheet or dataframe), and then export it as a csv file using Python.

Preliminaries

The first step is loading the required Python modules. We will be using two modules in this tutorial, requests to make the API pull and pandas to conduct the data wrangling. Let's load the modules.

# Import the required modules
import requests as re  
import pandas as pd  

Make the CrisisNET API request

To pull down data from CrisisNET, we will need two things: an API key (which you can get here) and a request URL. The request URL tells CrisisNET which types of data we wish to receive. You can learn more about the request URL in our documentation. In the example below, we are requesting 200 crisis-relevant data points from Facebook.

# Create a variable with your CrisisNET API key
api_key = 'YOUR_API_KEY'

# Set up the request header
headers = {'Authorization': 'Bearer ' + api_key}

# Set up the request's URL
url = 'http://api.crisis.net/item?limit=200&sources=facebook'

# Make the API request
request_data = re.get(url, headers=headers)  
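
If you plan to try several filters, it can help to build the request URL from named parameters instead of editing the string by hand. The snippet below is only a minimal sketch: build_request_url is a hypothetical helper (not part of CrisisNET or requests), and it assumes the endpoint and the limit and sources parameters behave as in the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the request URL used in this tutorial
BASE_URL = 'http://api.crisis.net/item'

# Hypothetical helper: append query parameters to the item endpoint
def build_request_url(**params):
    # urlencode turns the keyword arguments into "key=value&key=value"
    return BASE_URL + '?' + urlencode(params)

# Rebuild the same URL as in the example above
url = build_request_url(limit=200, sources='facebook')
print(url)
```

The resulting string can be passed to the API request in place of the hand-written url variable.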

Create the JSON to CSV function

Okay, we have requested data from CrisisNET and stored the response in a variable called request_data. The body of that response is in JSON format. JSON is great, but in order to save the request as a csv, we need to "flatten" the data structure. In the handy Python function below, we do just that and then save the results to csv. I've added detailed comments into the code explaining each line in plain English.

# Define a function that,
def export_to_csv(r, filename):  
    # converts the json into a dataframe,
    request_df = pd.DataFrame(r.json())
    # expands the df.data cell,
    df = request_df['data'].apply(pd.Series)
    # converts df.updatedAt to a datetime object,
    df["updatedAt"] = pd.to_datetime(df["updatedAt"])
    # sets it as the dataframe's index,
    df.index = df['updatedAt']
    # expands the df.geo object,
    geo_df = df['geo'].apply(pd.Series)
    # expands the admin zones,
    geo_admin_df = geo_df['addressComponents'].apply(pd.Series)
    # merges the admin zones back into the dataframe,
    df = pd.concat([df[:], geo_admin_df[:], geo_df[:]], axis=1)
    # separates the lat and long objects (coords is stored as [longitude, latitude]),
    df['latitude'], df['longitude'] = df['coords'].str[1], df['coords'].str[0]
    # expands df.tags,
    tags_df = df['tags'].apply(pd.Series)
    # defines a function called tag_extractor,
    def tag_extractor(x):
        # that, if x is a float (i.e. a missing value, NaN),
        if type(x) is float:
            # just returns it untouched
            return x
        # but, if x is a non-empty tag, converts it to a dict() and returns its name
        elif x:
            x = dict(x)
            return x['name']
        # and returns None for everything else
        else:
            return
    # executes tag_extractor on the tags dataframe,
    tags_df = tags_df.applymap(tag_extractor)
    # renames all the tag columns,
    tags_df = tags_df.rename(columns = lambda x : 'tag_' + str(x))
    # merges everything back together,
    df = pd.concat([df[:], tags_df[:]], axis=1)
    # expands df.language,
    lang_df = df['language'].apply(pd.Series)
    # takes the language code,
    df['lang'] = lang_df['code']
    # and finally writes a csv in UTF-8 encoding (for non-English characters)
    return df.to_csv(filename, encoding='utf-8')
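
To see what the "expand" steps do, here is a small sketch of the same technique on two made-up records. The field names imitate the ones the function uses (geo, coords, addressComponents, tags), but the values and the adminArea1 key are invented for illustration and may not match what the API actually returns.

```python
import pandas as pd

# Made-up records that imitate the nested shape the function expects
items = [
    {'id': 'a1',
     'geo': {'coords': [36.82, -1.29],
             'addressComponents': {'adminArea1': 'Kenya'}},
     'tags': [{'name': 'conflict'}, {'name': 'protest'}]},
    {'id': 'b2',
     'geo': {'coords': [32.58, 0.35],
             'addressComponents': {'adminArea1': 'Uganda'}},
     'tags': [{'name': 'flood'}]},
]
df = pd.DataFrame(items)

# Each dict in the 'geo' column becomes one row of a new dataframe
geo_df = df['geo'].apply(pd.Series)
admin_df = geo_df['addressComponents'].apply(pd.Series)

# coords is [longitude, latitude], so position 0 is the longitude
geo_df['longitude'] = geo_df['coords'].str[0]
geo_df['latitude'] = geo_df['coords'].str[1]

# Each list of tag dicts becomes one column per list position
tags_df = df['tags'].apply(pd.Series)
# Keep only the tag name; missing positions (NaN) become None
tags_df = tags_df.apply(
    lambda col: col.map(lambda t: t['name'] if isinstance(t, dict) else None))
tags_df = tags_df.rename(columns=lambda i: 'tag_' + str(i))

# Stitch the flat pieces back together side by side
flat = pd.concat(
    [df[['id']], admin_df, geo_df[['longitude', 'latitude']], tags_df], axis=1)
print(flat)
```

Every nested field ends up as an ordinary column, which is the shape a csv file needs.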

Execute the function on the API request

With the function defined, now we just need to run it. Remember that request_data is the variable holding the CrisisNET response, and that you need to change the file path to wherever you want to save the csv.

# Run the function with two arguments: the input variable and the output csv path
export_to_csv(request_data, '/Users/chris/data/crisisnet_export.csv')  
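
A quick way to check an export is to load the csv back into pandas. The sketch below does this round trip on a made-up stand-in dataframe written to a temporary folder, so it runs without an API key; the column names and values are assumptions for illustration only.

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for the exported dataframe, including non-English text
df = pd.DataFrame(
    {'content': ['Inondation à Dakar', 'زلزال'], 'lang': ['fr', 'ar']},
    index=pd.to_datetime(['2014-07-01 12:00:00', '2014-07-02 08:30:00']),
)
df.index.name = 'updatedAt'

# Write the csv the same way the function does, but to a temporary folder
path = os.path.join(tempfile.mkdtemp(), 'crisisnet_export.csv')
df.to_csv(path, encoding='utf-8')

# Read it back, using the first column as a datetime index
check = pd.read_csv(path, encoding='utf-8', index_col=0, parse_dates=True)
print(check.shape)
print(check.head())
```

To inspect a real export, point path at the file you passed to export_to_csv instead.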

And that's it. Good data hunting.