Computational Literary History, Fall 2015

Contact information

Course: English 90128: Computational Literary History
Instructor: Matthew Wilkens
Meetings: Monday 6:30-9:15ish
Location: 143 DeBartolo
Office hours: 320 Decio; W 10:00-12:00, F 2:00-3:00, and by appointment. Reserve office hour slots at



A graduate-level introduction to problems and methods in computationally assisted literary studies, with an emphasis on large-scale historical issues. Includes substantial instruction in programming techniques.


Contemporary criticism has a problem. We long ago gave up the idea that our task was to appreciate and explain a handful of great texts, replacing that goal with a much more important and ambitious one: to understand cultural production — both today and in centuries past — as a whole by way of the aesthetic objects it creates. But we have continued to practice our craft as if the methods developed in pursuit of the old project were the only ones suited to the new task. In consequence, we have remained less successful and less persuasive concerning the operations of large-scale cultural production than we would like to be.

This course is devoted to new methods and new objects in cultural and literary studies, specifically those enabled by digital media. It is not, however, a course in media studies. We’ll spend most of our time covering both what kinds of criticism are made possible by the availability of digital cultural objects (especially digitized texts), whether those objects are born digital or are post facto electronic surrogates, and how to perform the technical operations necessary to carry out such criticism. The course thus has a substantial technical component, one no more difficult than — but often quite different from — most of your existing experience in literary studies. That said, the idea is certainly not to replace the methods you’ve previously mastered, but to supplement them with new approaches, issues, and questions that will allow you to do better the kinds of cultural and literary criticism you’ve already begun to practice.

We’ll begin with some reflections on the origins of computational literary studies and the rationale for pursuing quantitative work in cultural domains. We’ll move quickly to engagements with representative work in the field and to learning how to perform computational analysis. Much of what we read will fall under the broad heading of data mining; we’ll study what that term means and how to do it. We’ll aim for a mix of, on the one hand, theoretical elaborations concerning what is and is not implied by quantitative methods and how those methods integrate with conventional humanities approaches to interpretation, and, on the other, specific examples of achieved work in the field, plus technical exercises that will help you carry out similar work on your own.

We’ll read as widely as possible in both the major works of computational literary studies and in the often-divisive debates about the value of that work. Much of the schedule is set out below, but be aware that this is a rapidly evolving field; we’ll adjust our readings in response to new developments that emerge over the course of the semester and to our collective interests or hangups.

Four notes:

  1. I’ll repeat that this is a course with significant technical components. There’s no assumption that you’ll enter with any particular computational or mathematical expertise and there will be plenty of help (along with prebuilt tools) available along the way, but you must be willing to work outside your presumptive comfort zone as literary critics to develop the skills necessary to conduct the kinds of research we’ll explore. This is really exciting stuff and it’s not tremendously difficult, but if you shut down early in the face of the command line or a list of numbers, it’ll be impossible to do well.
  2. Our week-to-week assignments and work will be somewhat different from what you’re probably accustomed to. Most notably, there will be — in addition to the usual books and articles to read — graded problem sets or other exercises to complete. We’ll talk more about the form these will take as the semester progresses.
  3. Your final project will also take an unconventional form. You will work in groups of three or four students to produce either a piece of quantitatively informed literary scholarship or a grant proposal to perform the same. All members of the group will receive the same grade for the project (though not necessarily for the course).
  4. When we perform our own analyses, we’ll work primarily with a set of prepared corpora. This limits in some ways the range of problems we might address, but it helps with the well-known issue that 90% of real-world data-related effort is cleaning up your sources. That said, we can explore other corpora as our needs, desires, and technical abilities dictate.

Required texts

In addition to the required books, essays from the scholarly literature will be assigned and made available via Sakai and/or the open Web.


NB. All dates and assignments subject to change. You’ll also note that there are two open weeks toward the end of the semester; this is by design, to allow for adjustments depending on projects, interests, and new developments in the field. All readings except those from the three required texts are linked here or will be available on Sakai.

Articles behind paywalls are linked via ND’s proxy server and indicated by ‘(Proxy).’ If you don’t have an ND account but do have access from another institution, try removing the proxy string from the linked URL.

Problem sets and related assignments will be posted here or on Sakai the week before they’re due.

Zotero users (and anyone who wants full citations for the works listed here) may be interested in the full bibliography for the course.

Week 1 (8/31): Welcome and introduction



To keep abreast of new developments in computational literary studies — interesting projects and publications, conferences, grant opportunities, job openings, etc. — there are a few sites and tools to check out. I’d seriously recommend getting on Twitter (if you aren’t already).

  • Digital Humanities Now. A curated DH news/aggregation site run by the Roy Rosenzweig Center for History and New Media at George Mason. Helps tame the firehose that is Twitter.
  • Firehose problems notwithstanding, Twitter. This is where much of the conversation in DH happens. No, it’s not all pictures of sandwiches. There are a bunch of lists of DH people to follow. I can’t recommend any one in particular, but I guess you could start with my own following list. Note, though, that (like the course) it’s skewed toward computational people and work at the expense of other corners of the field(s).
  • Humanist. A (very) longstanding mailing list featuring announcements and discussion of all things DH.


  • Set up accounts and tools after class (Twitter, Python, Java, etc.)
  • Anaconda is the preferred Python distribution for the course on Mac, Linux, and Windows. Make sure you install the Python 3.4 version (not 2.7), and be sure you’re getting the version appropriate to your operating system.

Week 2 (9/7): Moretti and quantitative methods



  • UVa’s “Command Line Boot Camp” (if necessary).
  • Guttag, chapters 1-4. Submit answers or code (via the Assignments section of Sakai) for all “finger exercises” in ch. 2-4 (not ch. 1). Assuming you’re working with an iPython (Jupyter) notebook, upload the .ipynb file containing your answers. Each exercise should be labeled more or less clearly. It’s also OK to work with raw Python, but a notebook is the preferred approach. (Answers)

    NB. The finger exercise on p. 23, which asks you to find the roots of an arbitrary integer by exhaustive enumeration, is poorly constructed in a way that produces a trivial answer (because x**1 = x). To avoid this issue, search for roots between 2 and 5 (that is, 1 < pwr < 6).

Note that the Online Python Tutor is a useful resource for visualizing the execution of Python programs. Also, Guttag’s supplementary materials for the textbook — including video lectures, problem sets, and more — are available from MIT’s OpenCourseWare site.
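For reference, the restricted search the note above describes can be sketched roughly as follows (the function name and structure here are my own, not Guttag’s):

```python
def find_root_power(n):
    """Search exhaustively for integers root, pwr with root ** pwr == n
    and 1 < pwr < 6, per the note's restriction. Returns None if no
    such pair exists."""
    for pwr in range(2, 6):
        root = 0
        # enumerate candidate roots until root ** pwr reaches |n|
        while root ** pwr < abs(n):
            root += 1
        if root ** pwr == abs(n):
            if n >= 0:
                return root, pwr
            if pwr % 2 == 1:      # a negative n needs an odd power
                return -root, pwr
    return None

print(find_root_power(27))   # (3, 3)
```

Starting `pwr` at 2 is exactly what sidesteps the trivial `x**1 = x` answer the note warns about.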

Week 3 (9/14): Introducing computational techniques


  • Jockers, Matthew. Macroanalysis.


  • Guttag, ch. 5-7. Submit code for the two finger exercises in ch. 7. (Answers)
  • Exercise: String manipulation (ipynb). Save this file and open it from iPython notebook to complete the problems. (Answers)
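If you’d like a warm-up before the string-manipulation exercise, the kinds of operations it covers look roughly like this (the sample text and steps are illustrative only, not the actual problems):

```python
# Illustrative string operations (not the exercise itself)
text = "Call me Ishmael. Some years ago - never mind how long precisely."

tokens = text.lower().split()                  # crude whitespace tokenization
stripped = [t.strip('.,;:-') for t in tokens]  # trim surrounding punctuation
print(stripped[:3])                            # ['call', 'me', 'ishmael']
print('ishmael' in stripped)                   # True
print(len(set(stripped)))                      # count of distinct tokens
```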

Week 4 (9/21): Packaged tools



Review and explore these tools. Write up your use of one of them (c. 500 words, via Sakai).


  • Guttag, ch. 8-10. Submit the answer to the one finger exercise in Chapter 10 (p. 130). (Answer) If you really want to (begin to) get your head around classes, inheritance, and object-oriented programming, you should take a crack at Guttag’s problem set #6 (see bottom of linked page). I would have said PS #5, but it depends on Google’s RSS service, which no longer exists. Boo.
    Also, a reminder: Guttag’s OpenCourseWare materials — including video lectures, problem sets, quizzes, etc., but not the answers to the various finger exercises in the book — are all freely available if you need some additional resources.
  • If you’re interested in corpus-linguistics approaches to language change, you might want to have a look at Mark Davies’ Corpus of Contemporary American English and his other corpora. Davies’ tools are a bit like Google Ngrams, but much more sophisticated (at the expense of some ease of use).

Week 5 (9/28): Literary geography



  • Guttag, ch. 11-12. Submit answer to the single finger exercise in ch. 12 (p. 159). (Answer)

Week 6 (10/5): Social network analysis



Review these projects. Write up c. 500 words on one of them, describing either what you learned from it or how you might use its approaches in your own work.


  • Guttag, ch. 13. Nothing to submit, though you may want to practice some of the visualizations Guttag discusses.
  • Exercise: Parsing XML and JSON. Note that the link brings you to a GitHub display of the exercise notebook. To work with it on your own system, download the notebook and load it via the iPython notebook browser. Note, too, that the second part of the exercise (on JSON and dictionaries) may be difficult; if you spend a couple of hours on it and can’t get the thing to work, just submit pseudocode describing the relevant steps to solve the problem. (Solution)
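As a taste of what the parsing exercise involves, here is a minimal, self-contained sketch of handling both formats with the standard library (the sample data below is made up, not drawn from the exercise):

```python
import json
import xml.etree.ElementTree as ET

# JSON parses directly into Python dicts and lists
raw_json = '{"title": "Moby-Dick", "year": 1851, "tags": ["novel", "whaling"]}'
record = json.loads(raw_json)
print(record["year"])             # 1851
print(record["tags"][0])          # novel

# XML requires walking an element tree instead
raw_xml = '<book><title>Moby-Dick</title><year>1851</year></book>'
root = ET.fromstring(raw_xml)
print(root.find('title').text)    # Moby-Dick
```

The dictionary half of the exercise builds on the first pattern: once `json.loads` has done its work, everything is ordinary Python data structures.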

Week 7 (10/12): No class meeting

Week 8 (10/19): Fall break

Week 9 (10/26): Content analysis and topic models



  • Review Robert Nelson’s “Mining the Dispatch.”
  • Review Signs @ 40, a special review of the feminist theory journal Signs that relies heavily on computational analysis.


  • Guttag, ch. 14-16. Submit answer to the single finger exercise in ch. 15 (p. 213). You’ll need the springData.txt file. Note that these chapters aren’t the most immediately on-point for the kind of work you’ll generally be doing in text analysis. But they introduce some important concepts (and caveats) in both statistics and programming that will be of use to us later, so we can’t skip them. (Answer)

Week 10 (11/2): Topic models II



Week 11 (11/9): Machine learning I: Unsupervised methods



  • Guttag, ch. 18-19. Nothing to submit, just read these chapters. Chapter 19, in particular, is highly relevant.

Week 12 (11/16): Machine learning II: Supervised methods


Week 13 (11/23): Literary markets



  • Use the code from Stéfan Sinclair’s notebook that we reviewed two weeks ago to classify any two texts of your choice. One of these texts should be philosophical (by whatever definition you choose), the other non-philosophical. Neither text should appear in the training set. (Project Gutenberg is a good source of public-domain texts.) Are the texts classified correctly?

    Submit your code as an iPython notebook via Sakai and be prepared to discuss your results in class. (Answer)
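For orientation, the general shape of such a classifier — not Sinclair’s actual code, just a generic scikit-learn sketch with toy placeholder texts standing in for a real training set — looks like this:

```python
# Generic supervised text classification sketch (toy data, not the
# course notebook): bag-of-words features plus Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["the nature of being and substance",   # philosophical
               "call me ishmael some years ago"]       # non-philosophical
train_labels = ["phil", "non-phil"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)       # word-count features
clf = MultinomialNB().fit(X_train, train_labels)

# classify two held-out snippets not in the training set
test_texts = ["substance and being", "ishmael went to sea"]
predictions = clf.predict(vectorizer.transform(test_texts))
print(list(predictions))   # ['phil', 'non-phil']
```

A real run would train on many full texts, but the pipeline — vectorize, fit, predict on held-out documents — is the same one the assignment asks you to apply.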

Week 14 (11/30): Statistics, Pedagogy, Group Work

We’ll spend about half the seminar period doing open-ended group work in preparation for next week’s presentations. In addition, there will be one piece of reading and one brief written assignment.


  • Bulmer, M.G. Principles of Statistics. Excerpts from chapters 9 and 10, on statistical significance testing and statistical inference (especially frequentist and Bayesian approaches), respectively. (Sakai)
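To make the frequentist side of the reading concrete, here is a toy example (with made-up counts) of the kind of significance test Bulmer discusses — a two-proportion z-test, computed by hand:

```python
import math

# Made-up counts: occurrences of "whale" out of 100,000 tokens
# in each of two hypothetical corpora
k1, n1 = 120, 100_000
k2, n2 = 45, 100_000

# Two-proportion z-test: is the difference in rates bigger than
# we'd expect from sampling noise alone?
p1, p2 = k1 / n1, k2 / n2
p_pool = (k1 + k2) / (n1 + n2)                     # pooled rate
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(abs(z) > 1.96)   # True -> significant at the 5% level
```

A Bayesian analysis would instead ask how the data shift our beliefs about the two underlying rates — one of the contrasts the Bulmer excerpts draw out.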


Submit (via Sakai) 150-200 words describing a quantitative exercise you might assign to an undergraduate literature class in lieu of a paper. This need not (obviously) be a fully elaborated assignment, nor must it require the students to write code. But give some meaningful thought to what students could do in pursuit of specific literary ends using data or tools supplied to them.

Week 15 (12/7): Presentations

Each group will present its ongoing work on the final project to the class. Plan on thirty minutes of presentation and thirty minutes of discussion for each.

Final project

Details about the final project, to be completed in groups of two to four people and submitted by 5:00 pm on December 18, 2015.