api.millercenter.org.
https://api.millercenter.org
ANY /speeches
LastEvaluatedKey: (optional) Used for pagination. This lists the doc_name property from
the last delivered speech.
$ curl https://api.millercenter.org/speeches: initiates a download session.
Returns a JSON object with the following structure:
Items: A JSON array of the speech data, as described below.
LastEvaluatedKey (optional): If present, this indicates the last speech contained in this listing of
Items. This is used for pagination (described below). If absent, no more data for this query are available.
The Items object is a JSON array, each element of which is a speech object. Speech objects carry these
properties:
title: The title of the speech
president: The name of the president who gave the speech
date: The date on which the speech was given
doc_name: The file name of the document on millercenter.org.
That is, 'millercenter.org/the-presidency/presidential-speeches/' + doc_name is the millercenter.org URL of this speech.
transcript: The full text of the speech.
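As an illustration of the structure described above, here is a minimal sketch that turns a /speeches response into typed objects and rebuilds each speech's millercenter.org URL from doc_name. The Speech class, the parse_speeches helper, and the https:// scheme are assumptions made for this example, not part of the API, and the sample values are made-up placeholders (only the doc_name comes from this page).

```python
from dataclasses import dataclass
from typing import List

# Path taken from the doc_name description; the https:// scheme is an assumption.
SPEECH_PAGE_BASE = "https://millercenter.org/the-presidency/presidential-speeches/"


@dataclass
class Speech:
    """Hypothetical container mirroring the documented speech properties."""
    title: str
    president: str
    date: str  # kept as a string: the date format is not specified here
    doc_name: str
    transcript: str

    @property
    def url(self) -> str:
        # The documented rule: base path + doc_name.
        return SPEECH_PAGE_BASE + self.doc_name


def parse_speeches(response: dict) -> List[Speech]:
    """Convert the Items array of a /speeches response into Speech objects."""
    return [
        Speech(
            title=item.get("title", ""),
            president=item.get("president", ""),
            date=item.get("date", ""),
            doc_name=item.get("doc_name", ""),
            transcript=item.get("transcript", ""),
        )
        for item in response.get("Items", [])
    ]


# Placeholder response for illustration only; the field values are invented.
sample = {
    "Items": [
        {
            "title": "Example title",
            "president": "Example President",
            "date": "example-date",
            "doc_name": "december-2-1919-seventh-annual-message",
            "transcript": "Example transcript text.",
        }
    ]
}

speeches = parse_speeches(sample)
print(speeches[0].url)
```

Missing properties fall back to empty strings here so that one incomplete record does not abort a long download; stricter handling may suit other uses better.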
ANY /speeches/speech
$ curl https://api.millercenter.org/speeches/presidents?doc_name=december-2-1919-seventh-annual-message: Downloads detailed data for the December 2, 1919 Annual Message speech.
Returns a single JSON object representing the speech.
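For clients that build this request in code, the following sketch assembles the single-speech URL with the doc_name query parameter. The route is copied verbatim from the curl example above; the single_speech_url helper is a name invented for this illustration.

```python
from urllib.parse import urlencode

# Route copied from the curl example above.
SPEECH_ROUTE = "https://api.millercenter.org/speeches/presidents"


def single_speech_url(doc_name: str) -> str:
    """Build the request URL for one speech, URL-encoding the doc_name."""
    return SPEECH_ROUTE + "?" + urlencode({"doc_name": doc_name})


print(single_speech_url("december-2-1919-seventh-annual-message"))
```

The resulting string can be passed to any HTTP client; the response should then be the single JSON speech object described above.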
The Miller Center API uses pagination to handle downloading large amounts of data. For calls that generate large responses, clients must break their request into parts, downloading a small amount of data per request, each time telling the API where the last request left off.
Calling the /speeches route with no continuation arguments initiates a download of all speech objects. Each call to /speeches
returns no more than 1 MB of data. If additional data remains to download, the API returns a LastEvaluatedKey element in
the response, which in turn contains the property doc_name. To paginate, pass the value of this field as the
LastEvaluatedKey argument to the next /speeches call.
Here is a code fragment detailing how to use pagination to download the full corpus. (N.B.: this code is taken from the simple Python script we offer for easy downloading. This script may meet your needs if you are just looking for the data.)
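import requests

endpoint = "https://api.millercenter.org/speeches"

# The first call, with no LastEvaluatedKey, starts the download.
r = requests.post(url=endpoint)
data = r.json()
items = data['Items']

# Keep requesting pages until the response no longer carries a LastEvaluatedKey.
while 'LastEvaluatedKey' in data:
    parameters = {"LastEvaluatedKey": data['LastEvaluatedKey']['doc_name']}
    r = requests.post(url=endpoint, params=parameters)
    data = r.json()
    items += data['Items']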
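More generically, here is the flow of the process of downloading speeches:
1. Call the /speeches route with no arguments.
2. Does the response contain a LastEvaluatedKey element? If not, the download is complete.
3. If there is a LastEvaluatedKey: take the LastEvaluatedKey element from the current data object.
4. From the LastEvaluatedKey, get the doc_name attribute.
5. Call the /speeches route with:
LastEvaluatedKey=[doc_name]
then return to step 2.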
The crucial part in this example is how we use the doc_name attribute
from the LastEvaluatedKey field of the returned data as input to the LastEvaluatedKey
parameter of our next API call. So long as the API call's response includes
a LastEvaluatedKey object, we need to keep calling. When we finally get an object back that
doesn't have a LastEvaluatedKey, we know we're done: we have downloaded the full corpus.
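The same loop can also be written as a generator that is independent of the HTTP client, which makes the stopping rule easy to check without touching the network. This is only a sketch: iter_speeches, fetch_page, and fake_fetch are names invented for this example, and the two fake pages stand in for real API responses.

```python
from typing import Callable, Iterator, Optional


def iter_speeches(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every speech, following LastEvaluatedKey until it is absent.

    fetch_page(last_key) must return one parsed /speeches response;
    last_key is None for the first call.
    """
    last_key = None
    while True:
        data = fetch_page(last_key)
        yield from data.get("Items", [])
        if "LastEvaluatedKey" not in data:
            return  # no continuation key: the corpus is complete
        last_key = data["LastEvaluatedKey"]["doc_name"]


def fake_fetch(last_key: Optional[str]) -> dict:
    """Stand-in for the API: two invented pages keyed by continuation value."""
    pages = {
        None: {
            "Items": [{"doc_name": "a"}, {"doc_name": "b"}],
            "LastEvaluatedKey": {"doc_name": "b"},
        },
        "b": {"Items": [{"doc_name": "c"}]},
    }
    return pages[last_key]


names = [speech["doc_name"] for speech in iter_speeches(fake_fetch)]
print(names)  # ['a', 'b', 'c']
```

Against the live API, fetch_page would wrap a single HTTP call to /speeches, sending the LastEvaluatedKey parameter only when last_key is not None, and return the parsed JSON.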
Again, if all you want to do is download the speech corpus, our simple Python script can do that for you easily. Just run it like so:
$ python download_mc_speeches.py
and you're done.
For now, we offer this API without requiring authentication. To prevent abuse, rate limiting is in effect. Under normal conditions, this should not impact use of the API. If you see errors about rate limiting or insufficient quota, please use the contact information below to let us know.
Send questions to Miles Efron (mefron@virginia.edu).