Course: English 90128: Computational Literary History
Instructor: Matthew Wilkens
Meetings: Monday 6:30-9:15ish
Location: 143 DeBartolo
Office hours: 320 Decio; W 10:00-12:00, F 2:00-3:00, and by appointment. Reserve office hour slots at bit.ly/mw_oh.
[Skip to the schedule of readings and work.]
A graduate-level introduction to problems and methods in computationally assisted literary studies, with an emphasis on large-scale historical issues. Includes substantial instruction in programming techniques.
Contemporary criticism has a problem. We long ago gave up the idea that our task was to appreciate and explain a handful of great texts, replacing that goal with a much more important and ambitious one: to understand cultural production — both today and in centuries past — as a whole by way of the aesthetic objects it creates. But we have continued to practice our craft as if the methods developed in pursuit of the old project were the only ones suited to the new task. In consequence, we have remained less successful and less persuasive concerning the operations of large-scale cultural production than we would like to be.
This course is devoted to new methods and new objects in cultural and literary studies, specifically those enabled by digital media. It is not, however, a course in media studies. We’ll spend most of our time covering both what kinds of criticism are made possible by the availability of digital cultural objects (especially digitized texts), whether those objects are born digital or are post facto electronic surrogates, and how to perform the technical operations necessary to carry out such criticism. The course thus has a substantial technical component, one no more difficult than — but often quite different from — most of your existing experience in literary studies. That said, the idea is certainly not to replace the methods you’ve previously mastered, but to supplement them with new approaches, issues, and questions that will allow you to do better the kinds of cultural and literary criticism you’ve already begun to practice.
We’ll begin with some reflections on the origins of computational literary studies and the rationale for pursuing quantitative work in cultural domains. We’ll move quickly to engagements with representative work in the field and to learning how to perform computational analysis. Much of what we read will fall under the broad heading of data mining; we’ll study what that term means and how to do it. We’ll aim for a mix of, on the one hand, theoretical elaborations concerning what is and is not implied by quantitative methods and how those methods integrate with conventional humanities approaches to interpretation, and, on the other, specific examples of achieved work in the field, plus technical exercises that will help you carry out similar work on your own.
We’ll read as widely as possible in both the major works of computational literary studies and in the often-divisive debates about the value of that work. Much of the schedule is set out below, but be aware that this is a rapidly evolving field; we’ll adjust our readings in response to new developments that emerge over the course of the semester and to our collective interests or hangups.
- I’ll repeat that this is a course with significant technical components. There’s no assumption that you’ll enter with any particular computational or mathematical expertise and there will be plenty of help (along with prebuilt tools) available along the way, but you must be willing to work outside your presumptive comfort zone as literary critics to develop the skills necessary to conduct the kinds of research we’ll explore. This is really exciting stuff and it’s not tremendously difficult, but if you shut down early in the face of the command line or a list of numbers, it’ll be impossible to do well.
- Our week-to-week assignments and work will be somewhat different from what you’re probably accustomed to. Most notably, there will be — in addition to the usual books and articles to read — graded problem sets or other exercises to complete. We’ll talk more about the form these will take as the semester progresses.
- Your final project will also take an unconventional form. You will work in groups of three or four students to produce either a piece of quantitatively informed literary scholarship or a grant proposal to perform the same. All members of the group will receive the same grade for the project (though not necessarily for the course).
- When we perform our own analyses, we’ll work primarily with a set of prepared corpora. This limits in some ways the range of problems we might address, but it helps with the well-known issue that 90% of real-world data-related effort is cleaning up your sources. That said, we can explore other corpora as our needs, desires, and technical abilities dictate.
- Guttag, John. Introduction to Computation and Programming Using Python (2013).
- Jockers, Matthew. Macroanalysis (2012).
- Moretti, Franco. Graphs, Maps, Trees (2005).
In addition, essays from the scholarly literature will be assigned and available via Sakai and/or the open Web.
NB. All dates and assignments subject to change. You’ll also note that there are two open weeks toward the end of the semester; this is by design, to allow for adjustments depending on projects, interests, and new developments in the field. All readings except those from the three required texts are linked here or will be available on Sakai.
Articles behind paywalls are linked via ND’s proxy server and indicated by ‘(Proxy).’ If you don’t have an ND account but do have access from another institution, try removing ‘.proxy.library.nd.edu’ from the linked URL.
Problem sets and related assignments will be posted here or on Sakai the week before they’re due.
Week 1 (8/31): Welcome and introduction
- Carver, Cecily. “Things I Wish Someone Had Told Me When I Was Learning How to Code.”
- Healy, Kieran. “Fuck Nuance.” (PDF)
To keep abreast of new developments in computational literary studies — interesting projects and publications, conferences, grant opportunities, job openings, etc. — there are a few sites and tools to check out. I’d seriously recommend getting on Twitter (if you aren’t already).
- Digital Humanities Now. A curated DH news/aggregation site run by the Roy Rosenzweig Center for History and New Media at George Mason. Helps tame the firehose that is Twitter.
- Firehose problems notwithstanding, Twitter. This is where much of the conversation in DH happens. No, it’s not all pictures of sandwiches. There are a bunch of lists of DH people to follow. I can’t recommend any one in particular, but I guess you could start with my own following list. Note, though, that (like the course) it’s skewed toward computational people/work at the expense of other corners of the field(s).
- Humanist. A (very) longstanding mailing list featuring announcements and discussion of all things DH.
- Set up accounts and tools after class (Twitter, Python, Java, etc.)
- Anaconda is the preferred Python distribution for the course on Mac, Linux, and Windows. Make sure you install the Python 3.4 version (not 2.7), and be sure you’re getting the version appropriate to your operating system.
Week 2 (9/7): Moretti and quantitative methods
- Moretti, Franco. Graphs, Maps, Trees.
- —. “Conjectures on World Literature.” New Left Review 1 (2000): 54–68. (Link is to the Web version, but get the PDF from there.)
- —. “‘Operationalizing’: or, the function of measurement in modern literary theory.”
- UVa’s “Command Line Boot Camp” (if necessary).
- Guttag, chapters 1-4. Submit answers or code (via the Assignments section of Sakai) for all “finger exercises” in ch. 2-4 (not ch. 1). Assuming you’re working with an iPython (Jupyter) notebook, upload the .ipynb file containing your answers. Each exercise should be labeled more or less clearly. It’s also OK to work with raw Python, but a notebook is the preferred approach. (Answers)
NB. The finger exercise on p. 23, which asks you to find the roots of an arbitrary integer by exhaustive enumeration, is poorly constructed in a way that produces a trivial answer (because x**1 = x). To avoid this issue, search for roots between 2 and 5 (that is, 1 < pwr < 6).
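For reference, the amended search can be sketched as follows (the function and variable names are mine, not Guttag’s):

```python
# Exhaustive enumeration: find an integer root of a target number.
# Searching only powers 2 through 5 avoids the trivial x**1 == x case.
def find_root(target):
    """Return (base, pwr) with base ** pwr == abs(target), or None."""
    n = abs(target)
    for base in range(n + 1):
        for pwr in range(2, 6):  # 1 < pwr < 6
            if base ** pwr == n:
                return base, pwr
    return None

print(find_root(8))   # (2, 3), since 2**3 == 8
print(find_root(7))   # None; 7 has no integer root in this range
```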
Note that the Online Python Tutor is a useful resource for visualizing the execution of Python programs. Also, Guttag’s supplementary materials for the textbook — including video lectures, problem sets, and more — are available from MIT’s Open Courseware site.
Week 3 (9/14): Introducing computational techniques
- Jockers, Matthew. Macroanalysis.
- Guttag, ch. 5-7. Submit code for the two finger exercises in ch. 7. (Answers)
- Exercise: String manipulation (ipynb). Save this file and open it from iPython notebook to complete the problems. (Answers)
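To give a flavor of the exercise, here’s a minimal sketch of the kinds of string operations that recur in text analysis (the sample sentence is my own; the actual notebook’s tasks may differ):

```python
# Common string-manipulation patterns: normalize, tokenize, count.
text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Lowercase and strip punctuation, then split on whitespace.
cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch.isspace())
tokens = cleaned.split()

# Count word frequencies with a plain dictionary.
counts = {}
for word in tokens:
    counts[word] = counts.get(word, 0) + 1

print(len(tokens))    # 12 tokens
print(counts["the"])  # "the" occurs 3 times
```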
Week 4 (9/21): Packaged tools
- Michel, Jean-Baptiste et al. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science. NB. You may need to create a (free) account with Science to access the full text.
- Hope, Jonathan, and Michael Witmore. “The Hundredth Psalm to the Tune of ‘Green Sleeves’: Digital Approaches to Shakespeare’s Language of Genre.” Shakespeare Quarterly. (Proxy)
- Clement, Tanya. “‘A Thing Not Beginning and Not Ending’: Using Digital Tools to Distant-Read Gertrude Stein’s The Making of Americans.” Literary and Linguistic Computing. (Proxy)
- Clement, Steger, Unsworth, and Uszkalo. “How Not to Read a Million Books.”
Review and explore these tools. Write up your use of one of them (c. 500 words, via Sakai).
- Guttag, ch. 8-10. Submit the answer to the one finger exercise in Chapter 10 (p. 130). (Answer) If you really want to (begin to) get your head around classes, inheritance, and object-oriented programming, you should take a crack at Guttag’s problem set #6 (see bottom of linked page). I would have said PS #5, but it depends on Google’s RSS service, which no longer exists. Boo.
Also, a reminder: Guttag’s OpenCourseWare materials — including video lectures, problem sets, quizzes, etc., but not the answers to the various finger exercises in the book — are all freely available if you need some additional resources.
- If you’re interested in corpus-linguistics approaches to language change, you might want to have a look at Mark Davies’ Corpus of Contemporary American English and his other corpora. Davies’ tools are a bit like Google Ngrams, but much more sophisticated (at the expense of some ease of use).
Week 5 (9/28): Literary geography
- Wilkens, Matthew. “The Geographic Imagination of Nineteenth-Century American Fiction.” ALH. (Proxy)
- Cordell, Ryan. “Reprinting, Circulation, and the Network Author in Antebellum Newspapers.” ALH (Proxy) and its technical supplement: David A. Smith, Ryan Cordell, and Abby Mullen. “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers.” ALH. (Proxy)
- Moretti, Franco. Atlas of the European Novel. (excerpt, Sakai, PDF)
- Guttag, ch. 11-12. Submit answer to the single finger exercise in ch. 12 (p. 159). (Answer)
Week 6 (10/5): Social network analysis
- Elson, David K., Nicholas Dames, and Kathleen R. McKeown. “Extracting Social Networks from Literary Fiction.” ACL 2010.
- So, Richard Jean, and Hoyt Long. “Network Analysis and the Sociology of Modernism.” boundary2 40.2 (2013): 147–182. (Proxy)
- Dewitt, Anne. “Advances in the Visualization of Data: The Network of Genre in the Victorian Periodical Press.” Victorian Periodicals Review 48.2 (2015): 161–182. (Proxy)
- Warren, Christopher. “An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?“
- Easley, David and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge UP. Read chapters 1-2.
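As a preview of the measures Easley and Kleinberg introduce, here’s a toy character network built with the networkx library (my choice of tool for illustration, not one assigned above; the characters and edges are invented stand-ins for scene co-occurrence data):

```python
# A toy social network of novel characters using networkx.
import networkx as nx

# Each edge might represent two characters appearing in the same scene.
G = nx.Graph()
G.add_edges_from([
    ("Elizabeth", "Darcy"),
    ("Elizabeth", "Jane"),
    ("Elizabeth", "Bingley"),
    ("Elizabeth", "Wickham"),
    ("Jane", "Bingley"),
    ("Darcy", "Bingley"),
])

# Degree centrality: each node's degree divided by (n - 1) possible ties.
centrality = nx.degree_centrality(G)
print(max(centrality, key=centrality.get))  # Elizabeth is most central
```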
Review these projects. Write up c. 500 words on one of them, describing either what you learned from it or how you might use its approaches in your own work.
- Six Degrees of Francis Bacon
- Kindred Britain
- Mapping the Republic of Letters
- Jonathan Goodwin’s Citational Network Graph of Literary Theory Journals and the associated blog post.
- Guttag, ch. 13. Nothing to submit, though you may want to practice some of the visualizations Guttag discusses.
- Exercise: Parsing XML and JSON. Note that the link brings you to a GitHub display of the exercise notebook. To work with it on your own system, download the notebook and load it via the iPython notebook browser. Note, too, that the second part of the exercise (on JSON and dictionaries) may be difficult; if you spend a couple of hours on it and can’t get the thing to work, just submit pseudocode describing the relevant steps to solve the problem. (Solution)
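If the exercise’s JSON-and-dictionaries section gives you trouble, the core pattern looks like this with the standard library alone (the sample records are invented; the notebook’s actual data will differ):

```python
# Parsing XML and JSON with the Python standard library.
import json
import xml.etree.ElementTree as ET

# XML: parse a string into an element tree and pull out a field.
xml_doc = "<book><title>Middlemarch</title><year>1871</year></book>"
root = ET.fromstring(xml_doc)
print(root.find("title").text)  # Middlemarch

# JSON: a JSON object becomes a Python dict, which you can then modify.
raw = '{"title": "Middlemarch", "author": "George Eliot", "year": 1871}'
record = json.loads(raw)
record["genre"] = "novel"       # dicts are mutable; add a key
print(record["author"])         # George Eliot
```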
Week 7 (10/12): No class meeting
Week 8 (10/19): Fall break
Week 9 (10/26): Content analysis and topic models
- Grimmer, Justin, and Brandon M. Stewart. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis (2013). (Proxy)
- Quinn, Kevin M. et al. “How to Analyze Political Attention with Minimal Assumptions and Costs.” AJPS. (Proxy)
- Jockers, Matthew, and David Mimno. “Significant Themes in 19th-Century Literature.” Poetics 41.6 (2013): 750–769. (Proxy)
- Review Robert Nelson’s “Mining the Dispatch.”
- Review Signs @ 40, a special review of the feminist theory journal Signs that relies heavily on computational analysis.
- Guttag, ch. 14-16. Submit answer to the single finger exercise in ch. 15 (p. 213). You’ll need the springData.txt file. Note that these chapters aren’t the most immediately on-point for the kind of work you’ll generally be doing in text analysis. But they introduce some important concepts (and caveats) in both statistics and programming that will be of use to us later, so we can’t skip them. (Answer)
Week 10 (11/2): Topic models II
- Goldstone, Andrew, and Ted Underwood. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” New Literary History 45.3 (2014): 359–384. (Proxy)
- Tangherlini, Timothy R., and Peter Leonard. “Trawling in the Sea of the Great Unread: Sub-Corpus Topic Modeling and Humanities Research.” Poetics 41.6 (2013): 725–749. (Proxy)
- Schmidt, Benjamin M. “Words Alone: Dismantling Topic Models in the Humanities.” Journal of Digital Humanities 2.1 (2012).
- Guttag, ch. 17. Submit the answer to the finger exercise at the end of the chapter (p. 251).
- Exercise: NLP and entity extraction. (Answer)
Week 11 (11/9): Machine learning I: Unsupervised methods
- Ardanuy, Mariona Coll, and Caroline Sporleder. “Structure-Based Clustering of Novels.” EACL 2014 (2014): 31–39.
- Bamman, David, Ted Underwood, and Noah A. Smith. “A Bayesian Mixed Effects Model of Literary Character.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore: Association for Computational Linguistics, 2014. 370–379.
- Mauch, Matthias et al. “The Evolution of Popular Music: USA 1960–2010.” Royal Society Open Science 2.5 (2015): 150081.
- Sinclair, Stéfan. “Classifying Philosophical Texts.” Read through this iPython notebook, which covers some basics of machine learning with Python and Scikit-Learn.
- Guttag, ch. 18-19. Nothing to submit, just read these chapters. Chapter 19, in particular, is highly relevant.
Week 12 (11/16): Machine learning II: Supervised methods
- Piper, Andrew. “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel.” New Literary History (preprint; forthcoming 2015). (Sakai.) Note that this paper presents a handful of unsupervised approaches, but offers a clear path to the implementation of future supervised work.
- Underwood, Ted. “Understanding Genre in a Collection of a Million Volumes.” University of Illinois, Urbana-Champaign, 2014.
- Ashok, Vikas Ganjigunte, Song Feng, and Yejin Choi. “Success with Style: Using Writing Style to Predict the Success of Novels.” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle: 2013.
- Hettinger, Lena et al. “Genre Classification on German Novels.” TIR: Workshop on Text-based Information Retrieval. Valencia, Spain: 2015.
Week 13 (11/23): Literary markets
- Sapiro, Gisèle. “Translation and Symbolic Capital in the Era of Globalization: French Literature in the United States.” Cultural Sociology 9.3 (2015): 320-46. (Proxy)
- So, Richard Jean. “White Mythologies.” (Sakai)
- Wilkens, Matthew. “The Perpetual Fifties of American Fiction.” (Sakai)
- Use the code from Stéfan Sinclair’s notebook that we reviewed two weeks ago to classify any two texts of your choice. One of these texts should be philosophical (by whatever definition you choose), the other non-philosophical. Neither text should appear in the training set. (Project Gutenberg is a good source of public-domain texts.) Are the texts classified correctly?
Submit your code as an iPython notebook via Sakai and be prepared to discuss your results in class. (Answer)
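The shape of the pipeline, in the spirit of Sinclair’s notebook, is sketched below. The training texts and labels here are tiny invented stand-ins for illustration only; in the assignment you’ll use full texts and the notebook’s own training set.

```python
# A bare-bones supervised text classifier with scikit-learn:
# vectorize word counts, fit a naive Bayes model, predict a label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "being essence truth reason knowledge metaphysics",
    "argument premise conclusion logic ethics virtue",
    "the ship sailed across the stormy sea at dawn",
    "she walked through the garden and picked roses",
]
train_labels = ["philosophy", "philosophy", "fiction", "fiction"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # texts -> word-count matrix

clf = MultinomialNB()
clf.fit(X_train, train_labels)

# Classify a new (invented) passage; reuse the fitted vectorizer.
X_test = vectorizer.transform(["a meditation on truth reason and knowledge"])
print(clf.predict(X_test)[0])  # philosophy
```

For the assignment itself, you’d replace the toy strings with the contents of full Project Gutenberg files (e.g., read in via `open(...).read()`).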
Week 14 (11/30): Statistics, Pedagogy, Group Work
We’ll spend about half the seminar period doing open-ended group work in preparation for next week’s presentations. In addition, there’s one piece of reading and one brief written assignment.
- Bulmer, M.G. Principles of Statistics. Excerpts from chapters 9 and 10 on statistical significance testing and statistical inference (especially frequentist and Bayesian approaches), respectively. (Sakai)
Submit (via Sakai) 150-200 words describing a quantitative exercise you might assign to an undergraduate literature class in lieu of a paper. This need not (obviously) be a fully elaborated assignment, nor must it require the students to write code. But give some meaningful thought to what students could do in pursuit of specific literary ends using data or tools supplied to them.
Week 15 (12/7): Presentations
Each group will present its ongoing work on the final project to the class. Plan on thirty minutes of presentation and thirty minutes of discussion for each.
Details about the final project, to be completed in groups of two to four people and submitted by 5:00 pm on December 18, 2015.