The Miller Center's REST API for Downloading Data

This document describes the design and use of our data API, api.millercenter.org.

endpoint: https://api.millercenter.org


Routes

ANY /speeches

Downloads all speeches currently in the collection. This option requires pagination, which we describe below.

Query parameters:

Examples:

$ curl https://api.millercenter.org/speeches

Initiates a download session.

Returned Data

Returns a JSON object with the following structure:

The Items object is a JSON array, each element of which is a speech object. Speech objects carry these properties:


ANY /speeches/speech

[Coming Soon. Not Yet Implemented.] Downloads details for a given speech, including text embeddings per sentence.

Example:

$ curl 'https://api.millercenter.org/speeches/presidents?doc_name=december-2-1919-seventh-annual-message'

Downloads detailed data for the December 2, 1919 Annual Message speech.

Returned Data

Returns a single JSON object representing the speech.

Pagination

The Miller Center API uses pagination to handle downloading large amounts of data. For calls that generate large responses, clients must break their request into parts, downloading a small amount of data per request, each time telling the API where the last request left off.

Calling the /speeches route with no continuation arguments initiates a download of all speech objects. Each call to /speeches returns no more than 1 MB of data. If additional data remains to be downloaded, the API includes a LastEvaluatedKey element in the response, which in turn contains the property doc_name. To paginate, pass the value of this property as the LastEvaluatedKey argument to the next /speeches call.

Here is a code fragment detailing how to use pagination to download the full corpus. (N.B.: this code is taken from the simple Python script we offer for easy downloading. That script may meet your needs if you are just looking for the data.)
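A minimal sketch of such a loop follows. It is written for illustration and is not copied from the download script. It assumes the response is a JSON object with an Items array and an optional LastEvaluatedKey object, and that the continuation value is sent as a query parameter named LastEvaluatedKey. The function names fetch_page and download_all are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.millercenter.org"


def fetch_page(last_key=None):
    """Fetch one page from /speeches; last_key is the doc_name to resume from."""
    url = BASE_URL + "/speeches"
    if last_key is not None:
        # Assumption: the continuation value travels as a query parameter.
        url += "?" + urllib.parse.urlencode({"LastEvaluatedKey": last_key})
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def download_all(fetch=fetch_page):
    """Collect speech objects page by page until no LastEvaluatedKey is returned."""
    speeches = []
    last_key = None
    while True:
        page = fetch(last_key)
        speeches.extend(page.get("Items", []))
        key = page.get("LastEvaluatedKey")
        if not key:
            # No continuation key: the full corpus has been downloaded.
            break
        last_key = key["doc_name"]
    return speeches
```

Calling download_all() with no arguments walks the live API. Passing a different fetch function lets the loop be exercised offline.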

More generically, here is a flow chart of the process of downloading speeches:

The crucial part in this example is how we use the doc_name attribute from the LastEvaluatedKey field of the returned data as input to the LastEvaluatedKey parameter of our next API call. So long as the API call's response includes a LastEvaluatedKey object, we need to keep calling. When we finally get an object back that doesn't have a LastEvaluatedKey, we know we're done: we have downloaded the full corpus.

Again, if all you want to do is download the speech corpus, our simple Python script can do that for you easily. Just run it like so:

$ python download_mc_speeches.py

and you're done.

Rate Limiting

For now we offer this API without requiring authentication. To prevent abuse, rate limiting is in effect. Under normal conditions, this should not impact use of the API. If you see errors about rate limiting or insufficient quota, please use the contact information below to let us know.
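Clients that make many requests in a row may want to pause and retry when a rate-limit error appears. The sketch below shows one common approach, exponential backoff. It assumes rate limiting is signalled with HTTP status 429, which this document does not confirm. The function name get_with_backoff and its parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request


def get_with_backoff(url, max_attempts=5, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying with doubling delays on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Assumption: a rate-limit rejection arrives as status 429.
            # Any other error, or a 429 on the final attempt, is re-raised.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits between attempts are 1, 2, 4, and 8 seconds before the error is finally raised to the caller.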

Send questions to Miles Efron (mefron@virginia.edu).