The Miller Center's REST API for Downloading Data

This document describes the design and use of our data API, api.millercenter.org.

Endpoint: `https://api.millercenter.org`

Routes

`ANY /speeches`

Downloads all speeches currently in the collection. This route requires pagination, which is described below.

Query parameters:

LastEvaluatedKey: (optional) Used for pagination. Set this to the doc_name property of the last speech delivered in the previous response.

Examples:

$ curl https://api.millercenter.org/speeches: initiates a download session.
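A continuation request uses the same URL with the LastEvaluatedKey query parameter added. The sketch below builds both forms of the URL with Python's standard library. The helper name `speeches_url` is ours for illustration, not part of the API, and the doc_name value is only an example.

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.millercenter.org/speeches"

def speeches_url(last_evaluated_key=None):
    """Build the /speeches URL, optionally continuing after a given doc_name."""
    if last_evaluated_key is None:
        # No continuation argument: this starts a new download session.
        return ENDPOINT
    # urlencode escapes the value so it is safe to place in a query string.
    return ENDPOINT + "?" + urlencode({"LastEvaluatedKey": last_evaluated_key})

print(speeches_url())
print(speeches_url("december-2-1919-seventh-annual-message"))
```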

Returned Data

Returns a JSON object with the following structure:

Items: A JSON array of the speech data, as described below.
LastEvaluatedKey (optional): If present, this indicates the last speech contained in this listing of Items. This is used for pagination (described below). If absent, no more data for this query are available.

The Items object is a JSON array, each element of which is a speech object. Speech objects carry these properties:

title: The title of the speech
president: The name of the president who gave the speech
date: The date on which the speech was given
doc_name: The file name of the document on millercenter.org. That is, 'millercenter.org/the-presidency/presidential-speeches/' + doc_name gives the millercenter.org URL of this speech.
transcript: The full text of the speech.
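As a rough illustration of the structure above, the following sketch reads the speech objects out of a response and rebuilds each speech's millercenter.org URL from its doc_name. The function names are hypothetical, the `https://` scheme is an assumption, and every value in `sample` is a made-up placeholder rather than real API output.

```python
# Assumption: the site is served over https; the path comes from the doc_name description.
BASE_URL = "https://millercenter.org/the-presidency/presidential-speeches/"

def speech_url(speech):
    """Return the millercenter.org URL for one speech object."""
    return BASE_URL + speech["doc_name"]

def summarize(response):
    """Return (title, president, date, url) for each speech in a response."""
    return [
        (s["title"], s["president"], s["date"], speech_url(s))
        for s in response["Items"]
    ]

# Placeholder response with the documented shape; the values are not real data.
sample = {
    "Items": [
        {
            "title": "placeholder title",
            "president": "placeholder president",
            "date": "placeholder date",
            "doc_name": "placeholder-doc-name",
            "transcript": "placeholder transcript",
        }
    ],
    "LastEvaluatedKey": {"doc_name": "placeholder-doc-name"},
}

for row in summarize(sample):
    print(row)
```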

`ANY /speeches/speech`

[Coming Soon. Not Yet Implemented.] Downloads details for a given speech, including text embeddings per sentence.

Example:

$ curl https://api.millercenter.org/speeches/speech?doc_name=december-2-1919-seventh-annual-message: Downloads detailed data for the December 2, 1919 Seventh Annual Message.

Returned Data

Returns a single JSON object representing the speech.

Pagination

The Miller Center API uses pagination to handle downloading large amounts of data. For calls that generate large responses, clients must break their request into parts, downloading a small amount of data per request, each time telling the API where the last request left off.

Calling the /speeches route with no continuation arguments initiates a download of all speech objects. Each call to /speeches returns no more than 1 MB of data. If additional data remain to be downloaded, the API returns a LastEvaluatedKey element in the response, which in turn contains a doc_name property. To paginate, pass the value of this property as the LastEvaluatedKey argument to the next /speeches call.

Here is a code fragment detailing how to use pagination to download the full corpus. (N.B.: this code is taken from the simple Python script we offer for easy downloading. That script may meet your needs if you are just looking for the data.)


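        import requests

        endpoint = "https://api.millercenter.org/speeches"

        # The first call, with no LastEvaluatedKey, starts the download.
        r = requests.post(url=endpoint)
        data = r.json()
        items = data['Items']

        # Keep requesting pages until a response has no LastEvaluatedKey.
        while 'LastEvaluatedKey' in data:
            parameters = {"LastEvaluatedKey": data['LastEvaluatedKey']['doc_name']}
            r = requests.post(url=endpoint, params=parameters)
            data = r.json()
            # Collect the speeches from this page.
            items += data['Items']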

More generically, here is a flow chart of the process of downloading speeches:

Download the first chunk of data.
Does it contain a LastEvaluatedKey element?
If not, we are done. If so, repeat the following steps until the returned data no longer include a LastEvaluatedKey:

Get the LastEvaluatedKey element from the current data object.
From the LastEvaluatedKey, get the doc_name attribute.
Get the next data chunk by calling the /speeches route with LastEvaluatedKey=[doc_name].

The crucial part of this example is how we use the doc_name attribute from the LastEvaluatedKey field of the returned data as input to the LastEvaluatedKey parameter of our next API call. So long as the API call's response includes a LastEvaluatedKey object, we need to keep calling. When we finally get an object back that doesn't have a LastEvaluatedKey, we know we're done; we have downloaded the full corpus.
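The same loop can also be packaged as a generator, which lets callers process speeches one at a time and makes the paging logic easy to exercise without touching the network. This is only a sketch, not the code from our download script: the names `fetch_page` and `iter_speeches` are hypothetical, and it uses a plain GET through the standard library, on the assumption that GET behaves like the curl example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://api.millercenter.org/speeches"

def fetch_page(last_key=None):
    """Fetch one page from /speeches; last_key is the doc_name to continue from."""
    url = ENDPOINT
    if last_key is not None:
        url += "?" + urlencode({"LastEvaluatedKey": last_key})
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)

def iter_speeches(fetch=fetch_page):
    """Yield every speech object, following LastEvaluatedKey until it is absent."""
    last_key = None
    while True:
        data = fetch(last_key)
        yield from data["Items"]
        if "LastEvaluatedKey" not in data:
            # No continuation key: this was the final page.
            break
        last_key = data["LastEvaluatedKey"]["doc_name"]
```

Under these assumptions, `list(iter_speeches())` would collect the whole corpus, while passing a different `fetch` function substitutes canned pages for testing.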

Again, if all you want to do is download the speech corpus, our simple Python script can do that for you easily. Just run it like so:


        $ python download_mc_speeches.py

and you're done.

Rate Limiting

For now we offer this API without requiring authentication. To prevent abuse, rate limiting is in effect. Under normal conditions, this should not affect use of the API. If you see errors about rate limiting or insufficient quota, please use the contact information below to let us know.
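If a long download does run into a rate-limit error, one common client-side response is to pause and retry with an increasing delay. The sketch below shows that pattern under a loud assumption: that throttling is reported as HTTP status 429, which this document does not specify. The helper name `with_backoff` and its parameters are hypothetical.

```python
import time
from urllib.error import HTTPError

def with_backoff(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on HTTP 429, wait and retry, doubling the delay each time.

    attempts must be at least 1.
    """
    for attempt in range(attempts):
        try:
            return call()
        except HTTPError as err:
            # Assumption: the API signals rate limiting with status 429.
            # Any other error, or a 429 on the final attempt, is re-raised.
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller using urllib could wrap each page request in a zero-argument function or lambda and pass it as `call`. If errors persist after a few retries, contacting us is still the right step.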

Send questions to Miles Efron (mefron@virginia.edu).