The API is served from https://api.millercenter.org.
ANY /speeches
LastEvaluatedKey
: (optional) Used for pagination. Its value is the doc_name property from the last delivered speech.
$ curl https://api.millercenter.org/speeches
: Initiates a download session.
Returns a JSON object with the following structure:
Items
: A JSON array of the speech data, as described below.
LastEvaluatedKey
: (optional) If present, this indicates the last speech contained in this listing of Items. It is used for pagination (described below). If absent, no more data are available for this query.
The Items object is a JSON array, each element of which is a speech object. Speech objects carry these properties:
title
: The title of the speech
president
: The name of the president who gave the speech
date
: The date on which the speech was given
doc_name
: The file name of the document on millercenter.org. That is, 'millercenter.org/the-presidency/presidential-speeches/' + doc_name is the millercenter.org URL of this speech.
transcript
: The full text of the speech.
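As a rough illustration of the properties listed above, the following Python sketch models a speech object and rebuilds its millercenter.org URL from doc_name. The names Speech, SPEECH_URL_PREFIX and speech_url are hypothetical, and both the https:// scheme and the assumption that every property is a string are ours; only the property names and the URL pattern come from the description above.

```python
from typing import TypedDict

# Assumption: the https scheme. The path prefix is the one given in the
# doc_name description.
SPEECH_URL_PREFIX = "https://millercenter.org/the-presidency/presidential-speeches/"


class Speech(TypedDict):
    """One element of the Items array (all values assumed to be strings)."""
    title: str
    president: str
    date: str
    doc_name: str
    transcript: str


def speech_url(speech: Speech) -> str:
    """Return the millercenter.org page URL for a speech object."""
    return SPEECH_URL_PREFIX + speech["doc_name"]
```

Given a speech whose doc_name is december-2-1919-seventh-annual-message, speech_url returns the prefix followed by that doc_name.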
ANY /speeches/speech
$ curl https://api.millercenter.org/speeches/presidents?doc_name=december-2-1919-seventh-annual-message
: Downloads detailed data for the December 2, 1919 Annual Message speech.
Returns a single JSON object representing the speech.
The Miller Center API uses pagination to handle downloading large amounts of data. For calls that generate large responses, clients must break their request into parts, downloading a small amount of data per request, each time telling the API where the last request left off.
Calling the /speeches route with no continuation arguments initiates a download of all speech objects. Each call to /speeches returns no more than 1 MB of data. If additional data remain to be downloaded, the API returns a LastEvaluatedKey element in the response, which in turn contains the property doc_name. To paginate, pass the value of this field as the LastEvaluatedKey argument to the next /speeches call.
Here is a code fragment showing how to use pagination to download the full corpus. (N.B.: this code is taken from the simple Python script we offer for easy downloading. That script may meet your needs if you are just looking for the data.)
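import requests

endpoint = "https://api.millercenter.org/speeches"

# The first call, with no LastEvaluatedKey, starts the download.
r = requests.post(url=endpoint)
data = r.json()
items = data['Items']

# Keep requesting pages until the response no longer carries a LastEvaluatedKey.
while 'LastEvaluatedKey' in data:
    parameters = {"LastEvaluatedKey": data['LastEvaluatedKey']['doc_name']}
    r = requests.post(url=endpoint, params=parameters)
    data = r.json()
    items += data['Items']  # accumulate this page's speeches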
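More generally, the process of downloading speeches follows these steps:
1. Call the /speeches route.
2. Does the response contain a LastEvaluatedKey element?
3. If not, the download is complete. If so, get the LastEvaluatedKey element from the current data object.
4. From LastEvaluatedKey, get the doc_name attribute.
5. Call the /speeches route with LastEvaluatedKey=[doc_name], then return to step 2.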
The crucial part of this example is how we use the doc_name attribute from the LastEvaluatedKey field of the returned data as input to the LastEvaluatedKey parameter of our API call. So long as the API call's response includes a LastEvaluatedKey object, we need to keep calling. When we finally get an object back that doesn't have a LastEvaluatedKey, we know we're done; we have downloaded the full corpus.
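The same loop can also be written as a generator that is independent of the HTTP library, which makes it easy to try against canned responses. This is only a sketch: iter_speeches, fetch_page and fetch_with_requests are hypothetical names, not part of the API or of the download script, and the call to raise_for_status is our own addition.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_speeches(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every speech, following LastEvaluatedKey until it is absent.

    fetch_page(None) must return the first page; fetch_page(doc_name) must
    return the page that continues after that doc_name.
    """
    key: Optional[str] = None
    while True:
        page = fetch_page(key)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return  # no continuation key: the corpus is complete
        key = page["LastEvaluatedKey"]["doc_name"]


def fetch_with_requests(key: Optional[str]) -> Page:
    """One possible fetch_page, using the third-party requests package."""
    import requests  # imported lazily so the sketch loads without it

    params = {} if key is None else {"LastEvaluatedKey": key}
    r = requests.post("https://api.millercenter.org/speeches", params=params)
    r.raise_for_status()  # assumption: fail loudly on HTTP errors
    return r.json()
```

Calling list(iter_speeches(fetch_with_requests)) would then collect the corpus one page at a time; passing a fake fetch_page instead lets you exercise the loop without touching the network.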
Again, if all you want to do is download the speech corpus, our simple Python script can do that for you easily. Just run it like so:
$ python download_mc_speeches.py
and you're done.
For now we offer this API without requiring authentication. To prevent abuse, rate limiting is in effect. Under normal conditions, this should not affect use of the API. If you see errors about rate limiting or insufficient quota, please use the contact information below to let us know.
Send questions to Miles Efron (mefron@virginia.edu).