Singularization Corpus Data

This page collects the redistributable data used in our article “Is or Are: The ‘United States’ in Nineteenth-Century Print Culture” (Bryan Santin, Daniel Murphy, and Matthew Wilkens, 2015; under review).

For convenience, there is a separate, basic list of the 1,540 texts included in the article’s literary corpus; use it if you just want to see what the corpus contains. Below, we also provide all the derived data on which the article is based, plus links to our sources where available.

Derived data

A full listing of the literary corpus, including bibliographic metadata and author demographics where known (gender, ethnicity, geographic origin, and occupation). This might be valuable even outside the context of our article, since it summarizes quite a bit of historical research into nineteenth-century authors.
Hand-reviewed, per-volume counts of singular and plural occurrences of “United States” in Wright volume 1 (1789-1850) texts and in volume 2 (1851-1875) texts. The full texts of Wright 2 volumes are available from the Indiana University Digital Library Program.
Singular and plural uses in the literary corpus aggregated by year of publication and subsetted by author gender and by author’s slave-state/free-state affiliation.
Singular and plural instances by date in the Chronicling America newspaper corpus. A Bookworm interface to the Chronicling America data is also available from the Culturomics Project.
Instances by issue date in the Richmond Dispatch corpus. Details concerning the Dispatch project are available from Mining the Dispatch.
Interactive Google Ngram plots for the fraction of singular and plural occurrences of the phrases “the United States is/was/has” (and their plural counterparts, “are/were/have”) in American English volumes, 1800-2000.
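
For readers who want to work with the per-volume counts themselves, here is a minimal sketch of the kind of aggregation described above: summing singular and plural counts by year of publication, optionally splitting by an author attribute, and computing the singular fraction. The field names (year, gender, singular, plural) and the sample rows are hypothetical and made up for illustration; they are not drawn from the corpus, and the actual files may be laid out differently.

```python
from collections import defaultdict


def aggregate_by_year(rows, group_key=None):
    """Sum singular and plural counts per (year, group).

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    The column names used here are assumptions, not the real file headers.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        # Without a grouping column, everything falls into one "all" group.
        group = row[group_key] if group_key else "all"
        key = (int(row["year"]), group)
        totals[key][0] += int(row["singular"])
        totals[key][1] += int(row["plural"])
    return {key: tuple(counts) for key, counts in sorted(totals.items())}


def singular_fraction(singular, plural):
    """Share of singular uses, or None when there are no occurrences."""
    total = singular + plural
    return singular / total if total else None


# Made-up rows for illustration only; these are NOT real corpus counts.
sample_rows = [
    {"year": "1850", "gender": "F", "singular": "1", "plural": "3"},
    {"year": "1850", "gender": "M", "singular": "0", "plural": "4"},
    {"year": "1870", "gender": "F", "singular": "2", "plural": "2"},
]

by_year = aggregate_by_year(sample_rows)
by_year_and_gender = aggregate_by_year(sample_rows, group_key="gender")

for (year, group), (singular, plural) in by_year.items():
    print(year, group, singular, plural, singular_fraction(singular, plural))
```

Passing a different group_key (for example, a hypothetical slave-state/free-state column) would produce the other subset in the same way. Returning None for years with no occurrences keeps empty years from being read as "zero percent singular."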

Work Product

Research notes in quantitative humanities
