Video of My Talk on Geolocation at Illinois

I gave a talk on my recent work — titled “Where Was the American Renaissance: Computation, Space, and Literary History in the Civil War Era” — as part of the Uses of Scale planning meeting at Illinois earlier this month. Ted Underwood — convener of the meeting and driving force behind the Uses of Scale project — has posted a video of the event, which includes my talk as well as Ted’s extended intro and a follow-up round table discussion on future directions in literary studies.

The event was lovely; my thanks to Ted for the invitation, to the attendees for some very useful discussion, and to the Mellon Foundation and the University of Illinois for funding the Uses of Scale project, with which I’ve been involved as a co-PI over the past year.

Matthew Sag at Notre Dame, Friday 4/12/13

Matthew Sag, Associate Professor of Law at Loyola University Chicago, will be visiting Notre Dame this Friday (4/12) to give a lunchtime talk on copyright, text analysis, and the legal issues involved in digital humanities research. (Practical details below.)

Professor Sag has written widely on intellectual property law and was the lead author of an influential amicus brief in the recent HathiTrust case that cleared the way for “nonconsumptive” computational use of large digital archives. He’s an important thinker doing work in an area of law that touches more of us in the humanities every day.

All are welcome; hope you can join us. Light lunch will be served. Please do feel free to pass along word to anyone who might be interested!

Professor Sag’s visit is sponsored by the Notre Dame Working Group on Computational Methods in the Humanities and Sciences with generous support from the Office of the Provost.

Details …

  • Who: Matthew Sag (Loyola University Chicago School of Law)
  • What: A talk on — and discussion of — copyright and humanities research
  • Where: LaFortune Gold Room (3rd floor; campus map)
  • When: Friday, April 12, 11:45 am – 1:00 pm

For more information, contact Matt Wilkens (mwilkens@nd.edu).

Nathan Jensen on “Big” Data

An interesting post from Nathan Jensen, a political scientist at Wash U, on the practicalities of working with non-public datasets (via @Ted_Underwood). Worth a read; here are the two main takeaways:

… theory is even more important when using “big data”. You can only really harness the richness of complicated micro data if you have clear micro theories.

Barriers to entry can create rents for a researcher, but they also make it much more difficult to replicate your results. This means that journal reviewers and grant reviewers can hold this against you, and the ultimate impact of your work might be lower. This isn’t a suggestion. It is a warning.

That second point’s a big one.

Kuhn on the comparative difficulty of the disciplines

Noted for my own future use:

Unlike the engineer, and many doctors, and most theologians, the scientist need not choose problems because they urgently need solution and without regard for the tools available to solve them. In this respect, also, the contrast between natural scientists and many social scientists proves instructive. The latter often tend, as the former almost never do, to defend their choice of a research problem—e.g., the effects of racial discrimination or the causes of the business cycle—chiefly in terms of the social importance of achieving a solution. Which group would one then expect to solve problems at a more rapid rate? (Kuhn, Structure of Scientific Revolutions, 164).

How much has this changed as funding in the sciences has moved away from basic research?

DH Grad Course Reflections

This past semester I taught a grad seminar on digital humanities, one with more technical content than has been the case in my previous (undergrad) DH classes. On the whole, it went shockingly well; my students came in with very little background in programming or in statistical and quantitative analysis, and they left with enough of both to do genuinely interesting work on their final projects. More importantly, they now know enough to go much further in the future, which many of them are already promising/threatening to do. I’m very pleased.

A few thoughts on specific aspects of the class and the syllabus. (NB. I posted an initial syllabus back in September, but it had some holes toward the end of the semester; the final, complete version (PDF) is now available.)

I was especially pleased with the response to the weekly problem sets, which were difficult and time consuming. The students ended up mostly working together in study groups, often meeting on campus over the weekend to finish the exercises for Monday’s seminar. This was exactly what I was hoping would happen. There’s no getting around the fact that programming, like language learning, requires hours and days of hands-on practice. Problem sets are an odd form in the humanities and I know there were students who thought the exercises consumed too much time or required more groping for answers than they would have liked, but I think this part of the course worked exactly as designed. I’m glad everyone was willing and able to struggle productively with them. On a semi-related front, grad school can be an isolating experience; one of the very few things I missed about chemistry when I moved to an English PhD program was the camaraderie and feedback I got from group work in the sciences. The problem sets were an opportunity to bring a bit of that collaborative structure to graduate study in English.

The biggest problem we faced was lack of time. This is a course that wants to be an intro to programming, an intro to statistics, a survey of recent work in DH (broadly defined), a theoretical treatment of digital media, a chance to think about the future of the discipline, and a grad-level seminar on nineteenth-century American fiction (nineteenth century rather than twentieth due to corpus constraints). We spent two weeks at the beginning of the semester on media studies (McLuhan, Galloway); that was fun, but it was ultimately to the side of our main concerns. The time could have been more profitably spent on an extra week of intro programming concepts and another week later in the semester on advanced computational topics.

The exercises from The Programming Historian were invaluable and I recommend them to anyone teaching a similar course. That said, I’d add a week on introductory CS concepts proper (branching, looping, variables, return values, etc.) before beginning them. This is true even though the PH exercises really do start from “Hello, world”; they then ramp up by way of very concrete examples, which the students sometimes found difficult to separate from the concepts those examples were meant to convey.
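For concreteness, here is the kind of warm-up I have in mind for that added week. This is a minimal sketch in Python (the Programming Historian’s language of choice), not material from the actual syllabus; it just isolates variables, branching, looping, and return values before any concrete text-processing task.

```python
# A deliberately tiny warm-up: each function isolates one concept
# (variables, branching, looping, return values) before the
# Programming Historian's more concrete exercises.

def describe_length(word):
    """Branching: choose a label based on a condition."""
    length = len(word)          # a variable holding an intermediate value
    if length > 7:
        return "long"           # return values: the function hands back a result
    elif length > 3:
        return "medium"
    else:
        return "short"

def count_long_words(words):
    """Looping: visit each item and accumulate a running total."""
    total = 0
    for word in words:
        if describe_length(word) == "long":
            total += 1
    return total

if __name__ == "__main__":
    sample = ["whale", "ambiguity", "ship", "transcendentalism"]
    for w in sample:
        print(w, "->", describe_length(w))
    print("long words:", count_long_words(sample))
```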

I’d spend two weeks (rather than one) on mapping and GIS, which was popular and useful for several final projects.

I’d also spend more time on visualization in general, maybe assigning Tufte’s book and all of Yau’s (rather than just one chapter). An additional merit of Yau’s book is that it would provide some intro to R, which the students said they’d like.

I’d assign all of Jockers’ Macroanalysis once it’s available. Many thanks to Matt for sharing a handful of draft chapters with us.

Not sure it’s worth reviewing in depth the merits of individual articles and chapters (of which we read many); by the time I teach this course again in a year or two, most of those will probably drop off in favor of newer results. This is both the joy and the frustration of teaching DH at the moment.

Geolocation Correction at Uses of Scale

I’ve just posted a writeup and some data on hand-corrected geolocation extraction over at the Uses of Scale site (associated with the Mellon grant Ted Underwood and I are running). The idea is to share as much as possible of the tediously achieved process stuff that’s required for computational research but that isn’t itself “achieved” results. In addition to my post on geography, these’s also information (from Ted and others) on OCR correction and on removing running headers from scanned texts. Not always sexy, but we hope it’ll help others do related work without having to start entirely from scratch. And I suppose we’re also selfishly hoping for feedback and improvements from anyone else who might have experience dealing with related issues.
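The writeup itself has the details and the data. Purely as an illustration of the general shape of such a workflow, and not of the pipeline described there, here’s a minimal sketch of geocoding candidate place names and writing them out for hand correction; the geopy geocoder, the file names, and the candidate list are assumptions made for the example.

```python
# Illustrative only: geocode candidate place names and write them out
# for hand correction. Assumes geopy is installed; file names and the
# candidate list are placeholders, not those from the Uses of Scale writeup.
import csv
import time
from geopy.geocoders import Nominatim

candidates = ["Chicago", "New Orleans", "Springfield", "Cambridge"]  # e.g., from NER output

geolocator = Nominatim(user_agent="geo-correction-sketch")

with open("geocoded_candidates.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["place", "lat", "lon", "resolved_name", "keep"])
    for place in candidates:
        loc = geolocator.geocode(place)
        if loc is None:
            writer.writerow([place, "", "", "", ""])   # flag for hand lookup
        else:
            # 'keep' left blank: a human marks y/n after reviewing ambiguous hits
            writer.writerow([place, loc.latitude, loc.longitude, loc.address, ""])
        time.sleep(1)  # be polite to the geocoding service
```

The hand-correction step then amounts to a human pass over the resulting spreadsheet, with the corrected file read back in for mapping.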

Digital Humanities Grad Syllabus

I’m teaching a graduate seminar on digital humanities this semester, ENGL 90127 (yes, Notre Dame has insane course numbers). The class involves a small amount of media studies (McLuhan, Galloway) and a whole lot of computational and quantitative work (both lit reviews and extensive hands-on practice). I’m excited about this; I’ve taught some version of DH many times in the past, but never with this degree of technical expectation. My students have been great so far and I’m looking forward to the programming work.

A PDF of the initial syllabus is available for those who are interested. As you’ll see, I’ve left some of the details fuzzy toward the end in order to respond to student needs and interests. Will try to remember to post a final version at the end of the semester that reflects the specifics.

[Update: The final syllabus and some reflections on the course are now available.]

Population Growth and Literary Attention

I just posted an item about the literary uses of Chicago and New Orleans on the new Scalable Reading group blog (to which Martin Mueller, Ted Underwood, and Steve Ramsay are also contributors). A brief preview:

There’s a lot of jitter in the New Orleans numbers, but a couple of things seem clear:

  1. Through most of the period 1851–75, there’s much more literary attention paid to New Orleans than to Chicago.
  2. Interest in Chicago picks up meaningfully after about 1870.
  3. Interest in New Orleans wanes a bit around the same time, but only to the extent that the two cities occur at about equal rates in the last few years of the corpus.

[And in sum:] I’m sure there’s some novelty-driven interest in emerging cities and demographic changes, but at least in the case of Chicago and New Orleans, it doesn’t appear to be the dominant factor driving literary attention.
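For anyone curious about the mechanics behind numbers like those, here’s a minimal, hypothetical sketch of computing per-year mention rates for the two city names across a corpus; the metadata file, directory layout, and simple string matching are assumptions for illustration, not the setup used in the actual post.

```python
# Minimal sketch: per-year mention rates for two city names.
# Assumes a metadata.csv with 'filename' and 'year' columns and a
# directory of plain-text volumes; neither reflects the actual corpus.
import csv
import os
from collections import defaultdict

CITIES = ["Chicago", "New Orleans"]

mentions = defaultdict(lambda: defaultdict(int))  # year -> city -> raw count
words = defaultdict(int)                          # year -> total word count

with open("metadata.csv", newline="") as f:
    for row in csv.DictReader(f):
        year = int(row["year"])
        with open(os.path.join("texts", row["filename"]), encoding="utf-8") as vol:
            text = vol.read()
        words[year] += len(text.split())
        for city in CITIES:
            mentions[year][city] += text.count(city)

for year in sorted(words):
    # normalize by corpus size per year so attention is comparable across years
    rates = {city: mentions[year][city] / words[year] for city in CITIES}
    print(year, rates)
```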

This is also a chance to put in a plug for Scalable Reading, both the blog and the concept. Well worth a read, I think, my own contribution notwithstanding.

Fish’s Object

Stanley Fish has a piece in the New York Times today that makes some use of my contribution to Debates in the Digital Humanities. The DH Debates collection isn’t online yet, but similar work of mine can be found in Post45 and (with updates) in the proceedings of the Chicago Colloquium on Digital Humanities and Computer Science (PDF).

Jeremy Rosen anticipated most of what Fish says in his lengthy response to the Post45 essay. My reply to Rosen probably works equally well as a response to Fish.

Here I’ll only add that while I appreciate the attention, I have my doubts about Fish’s sincerity when he proposes to defend the pursuit of authorial intent (in Milton, no less!).

[My colleague Steve Fallon—the distinguished Miltonist—observes that Fish frequently uses a different, constructivist account of imputed authorial intent in his own criticism. But I’d maintain that this is sufficiently different from the naïve version offered in the column as to be an entirely distinct thing.]

Update: Ted Underwood has a smart reply on the relationship between theory and experiment or, more humanistically, where our ideas come from.

Update 2: Mark Liberman at Language Log runs some revealing numbers on the P’s and B’s in Areopagitica that were part of Fish’s set piece. (A quick counting sketch follows below.)

Update 3: Martin Mueller has a long and wide-ranging response to Fish’s series of articles, including a defense-cum-clarification of my own work. Worth a read and I thank him for it.
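Apropos of the second update: the basic tally is easy to approximate at home. Here’s a minimal sketch, assuming a local plain-text copy of Areopagitica (the file name is a placeholder) and counting word-initial letters; it’s an illustration only, not Liberman’s actual method.

```python
# Minimal sketch: tally words beginning with 'p' or 'b' in a plain-text
# Areopagitica. The file name is a placeholder; this is an illustration,
# not a reproduction of the Language Log analysis.
import re
from collections import Counter

with open("areopagitica.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-zA-Z']+", f.read().lower())

initials = Counter(w[0] for w in words)
total = len(words)
for letter in ("p", "b"):
    print(letter, initials[letter], f"{initials[letter] / total:.2%} of words")
```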