Course: English 90127: Digital Humanities, English 90172: Digital Humanities II
Instructor: Matthew Wilkens
Meetings: Thursday 5:00-7:30ish
Location: Hesburgh Library Center for Digital Scholarship
Office hours: 320 Decio; T 2:00-5:00 and by appointment. Reserve office hour slots at bit.ly/mw_oh.
A graduate-level introduction to problems and methods in digital humanities, with an emphasis on computational and quantitative literary studies, including programming techniques. For DH II students, a leadership role in formulating and carrying out projects in computationally assisted literary studies.
Contemporary criticism has a problem. We long ago gave up the idea that our task was to appreciate and explain a handful of great texts, replacing that goal with a much more important and ambitious one: to understand cultural production as a whole by way of the aesthetic objects it creates. But we have continued to practice our craft as if the methods developed in pursuit of the old project were the only ones suited to the new task. In consequence, we have remained less successful and less persuasive concerning the operations of large-scale cultural production than we would like to be.
This course is devoted to new methods and new objects in cultural and literary studies, specifically those enabled by digital media. It is not, however, a course in media studies. We’ll spend most of our time covering both what kinds of criticism are made possible by the availability of digital cultural objects (especially digitized texts), whether those objects are born digital or are post facto electronic surrogates, and how to perform the technical operations necessary to carry out such criticism. The course thus has a substantial technical component, one no more difficult than — but substantially different from — most of your existing experience in literary studies. That said, the idea is certainly not to replace the methods you’ve previously mastered, but to supplement them with new approaches, issues, and questions that will allow you to do better the kinds of cultural and literary criticism you’ve already begun to practice.
We’ll begin with some reflections on how to define digital humanities and the rationale for pursuing quantitative and/or computationally assisted literary studies. We’ll move quickly to engagements with representative work in the field and to learning how to perform computational work. Much of what we read will fall under the broad heading of data mining; we’ll study what that term means and how to do it. We’ll aim for a mix of three things: theoretical elaborations concerning what quantitative methods do and do not imply and how they integrate with conventional humanities approaches to interpretation; specific examples of achieved work in the field; and technical exercises that will help you carry out similar work on your own.
We’ll read as widely as possible both in the major works of digital humanities theory and practice and in the often-divisive debates about the value of that work. Much of the schedule is set out below, but be aware that DH is a rapidly evolving field; we’ll adjust our readings in response to new developments that emerge over the course of the semester and to our collective interests or hangups.
- I’ll repeat that this is a course with significant technical components. There’s no assumption that you’ll enter with any particular computational or mathematical expertise and there will be plenty of help (along with prebuilt tools) available along the way, but you must be willing to work outside your presumptive comfort zone as literary critics to develop the skills necessary to conduct the kinds of research we’ll explore. This is really exciting stuff and it’s not tremendously difficult, but if you shut down early in the face of the command line or a list of numbers, it’ll be impossible to do well.
- Our week-to-week assignments and work will be somewhat different from what you’re probably accustomed to. Most notably, there will be — in addition to the usual books and articles to read — graded problem sets or other exercises to complete. We’ll talk more about the form these will take as the semester progresses.
- Your final project will also take an unconventional form. You will work in groups of three or four students to produce either a piece of quantitatively informed literary scholarship or a grant proposal to perform the same. All members of the group will receive the same grade for the project (though not necessarily for the course).
- When we perform our own analyses, we’ll work primarily with a prepared corpus of nineteenth-century American fiction. There are legal cum technical reasons we can’t do much with the twentieth century, but we can explore other corpora as our needs, desires, and technical abilities dictate.
- Guttag, John. Introduction to Computation and Programming Using Python (2013).
- Jockers, Matthew. Macroanalysis (2012).
- Moretti, Franco. Graphs, Maps, Trees (2005).
- Tufte, Edward. The Visual Display of Quantitative Information (2001).
In addition, essays from the scholarly DH literature will be assigned and available via Sakai and/or the open Web.
NB. All dates and assignments subject to change. You’ll also note that there are two open weeks toward the end of the semester; this is by design, to allow for adjustments depending on projects, interests, and new developments in the field. All readings except those from the four required texts are linked here or will be available on Sakai.
Articles behind paywalls are linked via ND’s proxy server and indicated by ‘(proxy).’ If you don’t have an ND account but do have access from another institution, try removing ‘.proxy.library.nd.edu’ from the linked URL.
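If you'd rather not edit URLs by hand, the proxy removal described above can be scripted. This is a minimal Python sketch, assuming the proxy suffix is appended directly to the end of the hostname with no port number after it; the function name and the example URL are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# The proxy suffix named in the note above.
PROXY_SUFFIX = ".proxy.library.nd.edu"


def strip_proxy(url, suffix=PROXY_SUFFIX):
    """Return url with the proxy suffix removed from its hostname, if present."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.endswith(suffix):
        host = host[: -len(suffix)]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Hypothetical example URL, for illustration only.
print(strip_proxy("http://www.example.org.proxy.library.nd.edu/stable/12345"))
```

URLs that don't contain the suffix pass through unchanged, so it should be safe to run this over a whole list of links.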
Problem sets and related assignments will be posted here or on Sakai the week before they’re due.
Week 1 (1/16): Welcome and introduction
- Carver, Cecily. “Things I Wish Someone Had Told Me When I Was Learning How to Code.”
- Underwood, Ted. “Why Digital Humanities Isn’t Actually ‘The Next Thing in Literary Studies.’”
- Wilkens, Matthew. “Canons, Close Reading, and the Evolution of Method.”
To keep abreast of new developments in digital humanities — interesting projects and publications, conferences, grant opportunities, job openings, etc. — there are a few sites and tools to check out. I’d seriously recommend getting on Twitter (if you aren’t already).
- Digital Humanities Now. A curated DH news/aggregation site run by the Roy Rosenzweig Center for History and New Media at George Mason. Helps tame the firehose that is Twitter.
- Firehose problems notwithstanding, Twitter. This is where much of the conversation in DH happens. No, it’s not all pictures of sandwiches. There are a bunch of lists of DH people to follow. I can’t recommend any one in particular, but I guess you could start with my own following list. Note, though, that (like the course), it’s skewed toward computational people/work at the expense of other corners of the field.
- Humanist. A (very) longstanding mailing list featuring announcements and discussion of all things DH.
- Set up accounts and tools after class (Twitter, Python, Java, Spyder/IDLE, etc.)
Week 2 (1/23): Intro to quantitative methods
- Moretti, Franco. Graphs, Maps, Trees.
- —. “‘Operationalizing’: or, the function of measurement in modern literary theory.”
- UVa’s “Command Line Boot Camp” (if necessary).
- Guttag, chapters 1-4. Submit answers or code (via the Assignments section of Sakai) for all “finger exercises” in ch. 2-4 (not ch. 1). (Answers)
Note that the Online Python Tutor is a useful resource for visualizing the execution of Python programs.
Week 3 (1/30): Packaged tools
- Clement, Steger, Unsworth, and Uszkalo. “How Not to Read a Million Books.”
- Michel, Jean-Baptiste et al. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science. NB. You may need to create a (free) account with Science to access the full text.
- Hope, Jonathan and Michael Witmore. “The Hundredth Psalm to the Tune of ‘Green Sleeves’: Digital Approaches to Shakespeare’s Language of Genre.” Shakespeare Q. (proxy)
Review and explore these tools. Write up your use of one of them (c. 500 words, via Sakai).
- Guttag, ch. 5-7. Submit code for the two finger exercises in ch. 7. (Answers)
Week 4 (2/6): Text processing and entity extraction
- Pumfrey, Stephen, Paul Rayson, and John Mariani. “Experiments in 17th Century English: Manual versus Automatic Conceptual History.” LLC. (proxy)
- Van Dalen-Oskam, Karina. “Names in Novels: An Experiment in Computational Stylistics.” LLC. (proxy)
- Grimmer, Justin, and Brandon M. Stewart. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis.
- Guttag, ch. 8-10. Submit the answer to the one finger exercise in Chapter 10 (p. 130). (Answer)
If you really want to (begin to) get your head around classes, inheritance, and object-oriented programming, you should take a crack at Guttag’s problem set #6 (see bottom of linked page). I would have said PS #5, but it depends on Google’s RSS service, which no longer exists. Boo.
Also, a reminder: Guttag’s OpenCourseWare materials — including video lectures, problem sets, quizzes, etc., but not the answers to the various finger exercises in the book — are all freely available if you need some additional resources.
- Stanford CoreNLP tools. Read the package overview and linked discussions of the included tools (POS tagger, NER, parser, coreference resolver, and sentiment analyzer).
- If you’re interested in the corpus-linguistics approaches described by Pumfrey et al., you might want to have a look at Mark Davies’ Corpus of Contemporary American English and his other corpora. Davies’ tools are a bit like Google Ngrams, but much more sophisticated (at the expense of some ease of use).
- Programming Historian, “Intro to Beautiful Soup.” Submit the CSV you produce after working through the full Beautiful Soup tutorial (
Note: If you’re using Spyder as your IDE, you’ll need to add the system Python module directory to your PYTHONPATH in order for Spyder to see BeautifulSoup once you’ve installed it. On recent Macs, this directory is
- Add that directory via Spyder’s PYTHONPATH manager.
- Tools -> Update module names list.
- Restart Spyder.
For (slightly) more info, see this Stack Overflow post. It’s a good idea, for our purposes, to add the system directory in any case, since that’s where any new non-default packages will be installed. The fully Right Thing To Do would be to use virtual environments, but that’s more complicated than we need.
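To check whether the steps above worked, you can ask the interpreter (inside Spyder or anywhere else) which directories it searches and whether it can find a given module. This is a minimal sketch, assuming Python 3.4 or later and assuming you installed Beautiful Soup 4, whose import name is `bs4`; the helper function name is hypothetical.

```python
import importlib.util
import sys


def module_is_visible(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


# Directories this interpreter searches for modules, in order.
for directory in sys.path:
    print(directory)

# "bs4" is the import name of Beautiful Soup 4 (an assumption about
# which version you installed).
print("bs4 visible:", module_is_visible("bs4"))
```

If the last line prints `False` in Spyder but `True` in a terminal session, the two are probably searching different directories, which is the situation the PYTHONPATH steps are meant to fix.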
Week 5 (2/13): Visualization
- Tufte, Edward. The Visual Display of Quantitative Information.
- Manovich, Lev. “Media Visualization: Visual Techniques for Exploring Large Media Collections.” (Word doc; published in Media Studies Futures, ed. Kelly Gates, Blackwell, 2012.)
- Guttag, ch. 11. No exercises for this one, but I’d strongly encourage you to make sure pylab works on your machine and that you can reproduce at least the simplest plots in the chapter.
- Exercise: NLP and entity extraction. (Answers/code)
Week 6 (2/20): Social network analysis
- Warren, Christopher. “An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?”
- Easley, David and Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge UP. Read chapters 1-2.
- Elson, David K., Nicholas Dames, and Kathleen R. McKeown. “Extracting Social Networks from Literary Fiction.” ACL 2010.
- Liu, Alan. “From Reading to Social Computing.” MLA Commons.
Review these projects. Write up c. 500 words on one of them, describing either what you learned from it or how you might use its approaches in your own work.
- Six Degrees of Francis Bacon
- Kindred Britain
- Mapping the Republic of Letters
- Jonathan Goodwin’s Citational Network Graph of Literary Theory Journals
- Guttag, ch. 12-13. Submit answer to the single finger exercise in ch. 12 (p. 159). (Answer)
Week 7 (2/27): Mixed techniques
- Jockers, Matthew. Macroanalysis.
- Guttag, ch. 14-16. Submit answer to the single finger exercise in ch. 15 (p. 213). You’ll need the springData.txt file. (Answer)
Note that these chapters aren’t the most immediately on-point for the kind of work you’ll generally be doing in text analysis. But they introduce some important concepts (and caveats) in both statistics and programming that will be of use to us later, so we can’t skip them.
Week 8 (3/6): Databases and machine learning
- Underwood, Ted and Jordan Sellers. “The Emergence of Literary Diction.” DHQ.
- Underwood, Ted et al. “Mapping Mutable Genres in Structurally Complex Volumes.” IEEE Big Data 2013.
- Bamman, David, Jacob Eisenstein, and Tyler Schnoebelen. “Gender in Twitter: Styles, Stances, and Social Networks.”
- Ashok, Vikas Ganjigunte, Song Feng, and Yejin Choi. “Success with Style: Using Writing Style to Predict the Success of Novels.” ACL/EMNLP 2013.
- Guttag, ch. 17. Submit the answer to the finger exercise at the end of the chapter (p. 251). (Answer)
- Read all sections of the W3Schools introduction to SQL.
- Exercise: MySQL queries. Submit your answers via Sakai. (Answer)
Week 9 (3/13): Spring break
No class meeting.
Week 10 (3/20): GIS
- Cooper, David, and Ian N Gregory. “Mapping the English Lake District: A Literary GIS.” Trans. Inst. Brit. Geo. (proxy)
- Gregory, Ian N., and Andrew Hardie. “Visual GISting: Bringing together Corpus Linguistics and Geographical Information Systems.” LLC. (proxy)
- Thacker, Andrew. “The Idea of a Critical Literary Geography.” New Formations. (Sakai)
- Moretti, Franco. Atlas of the European Novel. (excerpt, Sakai)
- Take a close look at the ORBIS project, a.k.a. “Google Maps for the ancient world.”
- You may also want to review the Racial Dotmap and, regarding historical tilesets and shapefiles, have a look at the Ancient World Mapping Center’s historically corrected map tiles and associated applications.
- Guttag, ch. 18. No exercises, just read. If you want to try Guttag’s code without typing it up yourself, you can grab it from the MITP site or from our GitHub repo.
- UVa Scholars’ Lab Spatial Humanities Project. “Online GIS Using ArcGIS.com.”
- Exercise: After you’ve worked through the tutorial, create and share a map that’s of some interpretive interest to you. We’ll discuss these in class, so upload a link to your map via the assignments section of Sakai.
Note that you can search ArcGIS.com for predefined datasets/layers, some of which are pretty interesting. You can also see a badly flawed version of the literary locations data done in the ArcGIS.com tool; we’ll discuss that map’s significant problems in class. You can also fool around with the same data yourself, either by cloning the relevant layer or by importing the source data. FYI, you could generate the same data yourself by querying the geodatabase from last week’s exercise as follows:
SELECT lookup_result AS Location, `type` AS Category, lat AS Lat, lon AS Lon, count(*) AS Count FROM results GROUP BY Lat, Lon, Category HAVING count > 11 ORDER BY Count DESC;
We limit the result set to counts over 11 to keep the total number of rows below ArcGIS.com’s ceiling of 1,000.
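If you want to see what the GROUP BY and HAVING clauses are doing before you run the query against the full database, you can try the same pattern on a toy table from Python. This is a sketch under stated assumptions: it uses an in-memory SQLite database rather than MySQL, the table layout is guessed from the column names in the query above, and the rows and the lower threshold are invented for illustration.

```python
import sqlite3

# Same shape as the query above; the threshold is a parameter so that
# the toy data can use a small value instead of 11.
QUERY = """
SELECT lookup_result AS Location, type AS Category,
       lat AS Lat, lon AS Lon, COUNT(*) AS Count
FROM results
GROUP BY lat, lon, type
HAVING COUNT(*) > ?
ORDER BY COUNT(*) DESC;
"""


def frequent_locations(conn, min_count):
    """Return one row per (lat, lon, type) group with more than min_count hits."""
    return conn.execute(QUERY, (min_count,)).fetchall()


# Hypothetical schema and rows, not the course database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (lookup_result TEXT, type TEXT, lat REAL, lon REAL)"
)
rows = [("Boston, MA, USA", "locality", 42.36, -71.06)] * 3
rows.append(("Paris, France", "locality", 48.86, 2.35))
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)

print(frequent_locations(conn, 2))
```

With a threshold of 2, only the location that occurs three times survives the HAVING filter; the single Paris row is dropped, just as low-count locations are dropped by the cutoff of 11 above.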
Week 11 (3/27): GIS II, introduction to topic models
- Wilkens, Matthew. “The Geographic Imagination of Nineteenth-Century American Fiction.” ALH. (proxy)
- Pröll, Simon. “Detecting Structures in Linguistic Maps: Fuzzy Clustering for Pattern Recognition in Geostatistical Dialectometry.” LLC. (proxy)
- Blevins, Cameron. “Topic Modeling Martha Ballard’s Diary.”
- Underwood, Ted. “Topic modeling made just simple enough.”
- Underwood, Ted, and Andrew Goldstone. “What Can Topic Models of PMLA Teach Us about the History of Literary Scholarship?”
- Review Robert Nelson’s “Mining the Dispatch.”
- Mapping in Python. Strictly FYI: The maps in my article were built in R; I can share code if you’d like, but I haven’t yet gotten around to cleaning things up for general use. The rworldmap package is especially useful. If you’re looking for a Python-based solution, you might take a look at Vincent/Vega.
- Guttag, ch. 19. No exercises to complete. You can download working code and data to follow along.
- Exercise: “Getting Started with Topic Modeling and MALLET” from The Programming Historian. No need to submit any results, but do make sure you’ve taken a crack at making sense of the little model you’ve produced. You’ll be asked to do this on a larger scale next week.
Note that if you installed a Java Development Kit (JDK) at the beginning of the semester, you won’t need to do it again here (hence can skip that part of the tutorial).
Week 12 (4/3): Topic modeling II
- Blei, David M. “Probabilistic Topic Models.” Comm. ACM.
- Mimno, David. “Computational Historiography: Data Mining in a Century of Classics Journals.” JCCH. (proxy)
- Mimno, David. “The Details: Training and Validating Big Models on Big Data.” JDH. (video)
- Chaney, A. J. B., and D. M. Blei. “Visualizing Topic Models.” AAAI Conf. Web. Soc. Media.
- Schmidt, Benjamin M. “Words Alone: Dismantling Topic Models in the Humanities.” JDH.
- Exercise: Interpreting topic models. I’ve run a 100-topic model of the Wright American Fiction corpus (which includes 1,543 volumes published in the US between 1789 and 1875). Have a look at the resulting visualization (major thanks to Andrew Goldstone for the vis code) and try to make sense of what’s going on in it. How do the topics change in frequency over time? Which ones appear to be related to one another? Do the topics associated with individual texts that you’ve read seem to square with your knowledge of those texts? Do you see evidence of the limitations identified by Schmidt? What, if anything, are you prepared to say about C19 US fiction in light of the model?
No need to write up your thoughts, but come to class with at least a couple of concrete observations that you’re prepared to discuss at modest length.
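If you'd like to check one of your observations about change over time against numbers rather than the visualization alone, the underlying calculation is simple. This is a minimal sketch, assuming you already have each volume's publication year and its per-topic proportions as Python lists; the function name and the tiny three-topic dataset are hypothetical and are not drawn from the Wright model.

```python
from collections import defaultdict


def mean_topic_share_by_year(docs, topic):
    """Average one topic's proportion over all documents from each year.

    docs is an iterable of (year, proportions) pairs, where proportions
    is a list of per-topic shares for a single document.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, proportions in docs:
        totals[year] += proportions[topic]
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Hypothetical three-topic model over four volumes; not real data.
docs = [
    (1850, [0.5, 0.25, 0.25]),
    (1850, [0.25, 0.5, 0.25]),
    (1860, [0.75, 0.125, 0.125]),
    (1860, [0.25, 0.25, 0.5]),
]
print(mean_topic_share_by_year(docs, topic=0))
```

Averaging per year treats every volume equally regardless of length, which is one design choice among several; weighting by word count would be another reasonable option.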
Week 13 (4/10): Advanced machine learning
Finishing off articles that we didn’t have the chance to discuss in the last couple of weeks.
- Bamman, David, Brendan O’Connor, and Noah A. Smith. “Learning Latent Personas of Film Characters.” ACL 2013.
Post-semester update: See also some related work on literature from Bamman, Underwood, and Smith.
- Wang, X., and A. McCallum. “Topics over Time.” ACM KDDM. (proxy)
- Quinn, Kevin M. et al. “How to Analyze Political Attention with Minimal Assumptions and Costs.” AJPS. (proxy)
Week 14 (4/17): Conclusions and project work
Mostly unstructured time to work on group projects. Some reflections on the semester and thoughts for future work.
Week 15 (4/24): Presentations
Group presentations of project work. Each group will have about 30 minutes; be sure to reserve a significant portion of that time for questions and discussion.
Finals Week: Final projects due Friday, 5/8 by 5:00 pm
See the project assignment and guidelines for detailed information.