Downloading Data from millercenter.org

Introduction

The Miller Center of Public Affairs often receives requests for bulk downloads of our data. This web page is intended to help satisfy these requests. If you are interested in downloading data from millercenter.org, please read this document. If you have questions, contact Miles Efron (mefron@virginia.edu).

Our most commonly requested data set is the presidential speeches collection. This is a text corpus of speeches given by U.S. presidents, from George Washington to the present day. The collection is not exhaustive; inclusion is an editorial decision by Miller Center staff. Even so, it contains over 1,000 speeches, and many researchers in NLP, the computational humanities, and the social sciences find the data useful.

With this in mind, we have made the presidential speeches collection available for bulk download. See below for details.

Terms of service

These data are offered as-is, with no warranty or support, for the use of the research and academic communities. The speeches are in the public domain, but please cite them as follows:

Downloading the Miller Center Speech Archive

Visitors can download the Miller Center speech archive here: https://data.millercenter.org/miller_center_speeches.tgz.
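
For scripted retrieval, a minimal Python sketch along the following lines may be useful. It uses only the standard library and the URL given above. The function name download_archive and the default local filename are illustrative choices made for this example, not part of any Miller Center tooling.

```python
import shutil
import urllib.request
from pathlib import Path

# URL as given on this page.
ARCHIVE_URL = "https://data.millercenter.org/miller_center_speeches.tgz"


def download_archive(url: str = ARCHIVE_URL,
                     dest: str = "miller_center_speeches.tgz") -> Path:
    """Stream the file at `url` to `dest` and return the local path.

    The function name and the default `dest` are hypothetical,
    chosen only for this example.
    """
    dest_path = Path(dest)
    # Copy the response to disk in chunks rather than reading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Calling download_archive() with no arguments would save the archive in the current working directory under the default filename.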

After downloading, expand the tar archive to create a directory called speeches containing a large number of JSON files. Each file holds a single speech, along with metadata such as the title of the speech, its date of delivery, and the president who delivered it.
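
The layout described above could be read with a short Python sketch such as the one below. It assumes the archive is already on disk, that it unpacks into a speeches directory as described, and that the files carry a .json extension and are UTF-8 encoded (both assumptions). The exact metadata key names are not documented on this page, so the code does not rely on any particular keys and simply returns the parsed content of each file. The function names extract_archive and load_speeches are hypothetical.

```python
import json
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, target_dir: str = ".") -> Path:
    """Unpack the gzipped tar archive into `target_dir`.

    Returns the path of the `speeches` directory that the archive
    is described as creating.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
    return Path(target_dir) / "speeches"


def load_speeches(speeches_dir) -> list:
    """Parse every .json file in `speeches_dir`, one speech per file.

    Assumption: files end in .json and are UTF-8 encoded. No specific
    key names are assumed; each parsed object is returned as-is.
    """
    speeches = []
    # Sort by filename so the result order is deterministic.
    for json_file in sorted(Path(speeches_dir).glob("*.json")):
        with open(json_file, encoding="utf-8") as handle:
            speeches.append(json.load(handle))
    return speeches
```

A typical use would be load_speeches(extract_archive("miller_center_speeches.tgz")), followed by inspecting the keys of the first parsed speech to see which metadata fields are actually present.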

Again, please direct questions to Miles Efron (mefron@virginia.edu).