Beth Plale and Yiming Sun from the HathiTrust Research Center at Notre Dame
May 22, 2013
A regrettably post facto — but no less enthusiastic — note that Beth Plale and Yiming Sun from the HathiTrust Research Center were on campus earlier this month to discuss recent developments at the HTRC. My colleague Eric Morgan posted a write-up of the event.
Hoping to build on our conversation with more collaboration in the future!
Matthew Sag at Notre Dame, Friday 4/12/13
April 8, 2013

Matthew Sag, Associate Professor of Law at Loyola University Chicago, will be visiting Notre Dame this Friday (4/12) to give a lunchtime talk on copyright, text analysis, and the legal issues involved in digital humanities research. (Practical details below.)
Professor Sag has written widely on intellectual property law and was the lead author of an influential amicus brief in the recent HathiTrust case that cleared the way for “nonconsumptive” computational use of large digital archives. He’s an important thinker doing work in an area of law that touches more of us in the humanities every day.
All are welcome; hope you can join us. Light lunch will be served. Please do feel free to pass along word to anyone who might be interested!
Professor Sag’s visit is sponsored by the Notre Dame Working Group on Computational Methods in the Humanities and Sciences with generous support from the Office of the Provost.
Details …
- Who: Matthew Sag (Loyola University Chicago School of Law)
- What: A talk on — and discussion of — copyright and humanities research
- Where: LaFortune Gold Room (3rd floor; campus map)
- When: Friday, April 12, 11:45 am – 1:00 pm
For more information, contact Matt Wilkens (mwilkens@nd.edu).
Nathan Jensen on “Big” Data
March 23, 2013
An interesting post from Nathan Jensen, a political scientist at Wash U, on the practicalities of working with non-public datasets (via @Ted_Underwood). Worth a read; here are the two main takeaways:
… theory is even more important when using “big data”. You can only really harness the richness of complicated micro data if you have clear micro theories.
Barriers to entry can create rents for a researcher, but they also make it much more difficult to replicate your results. This means that journal reviewers and grant reviewers can hold this against you, and the ultimate impact of your work might be lower. This isn’t a suggestion. It is a warning.
That second point’s a big one.
Kuhn on the comparative difficulty of the disciplines
February 1, 2013
Noted for my own future use:
Unlike the engineer, and many doctors, and most theologians, the scientist need not choose problems because they urgently need solution and without regard for the tools available to solve them. In this respect, also, the contrast between natural scientists and many social scientists proves instructive. The latter often tend, as the former almost never do, to defend their choice of a research problem—e.g., the effects of racial discrimination or the causes of the business cycle—chiefly in terms of the social importance of achieving a solution. Which group would one then expect to solve problems at a more rapid rate? (Kuhn, Structure of Scientific Revolutions, 164).
How much has this changed as funding in the sciences has moved away from basic research?
DH Grad Course Reflections
December 31, 2012
This past semester I taught a grad seminar on digital humanities, one with more technical content than has been the case in my previous (undergrad) DH classes. On the whole, it went shockingly well; my students came in with very little background in either programming or statistical/quantitative analysis, and they left with enough of each to do genuinely interesting work on their final projects. More importantly, they now know enough to go much further in the future, which many of them are already promising/threatening to do. I'm very pleased.
A few thoughts on specific aspects of the class and the syllabus. (NB. I posted an initial syllabus back in September, but it had some holes toward the end of the semester; the final, complete version (PDF) is now available.)
I was especially pleased with the response to the weekly problem sets, which were difficult and time-consuming. The students ended up mostly working together in study groups, often meeting on campus over the weekend to finish the exercises for Monday's seminar. This was exactly what I was hoping would happen. There's no getting around the fact that programming, like language learning, requires hours and days of hands-on practice. Problem sets are an odd form in the humanities, and I know there were students who thought the exercises consumed too much time or required more groping for answers than they would have liked, but I think this part of the course worked exactly as designed. I'm glad everyone was willing and able to struggle productively with them. On a semi-related front, grad school can be an isolating experience; one of the very few things I missed about chemistry when I moved to an English PhD program was the camaraderie and feedback I got from group work in the sciences. The problem sets were an opportunity to bring more collaborative structure to an English PhD program.
The biggest problem we faced was lack of time. This is a course that wants to be an intro to programming, an intro to statistics, a survey of recent work in DH (broadly defined), a theoretical treatment of digital media, a chance to think about the future of the discipline, and a grad-level seminar on nineteenth-century American fiction (nineteenth century rather than twentieth due to corpus constraints). We spent two weeks at the beginning of the semester on media studies (McLuhan, Galloway); that was fun, but it was ultimately to the side of our main concerns. The time could have been more profitably spent on an extra week of intro programming concepts and another week later in the semester on advanced computational topics.
The exercises from The Programming Historian were invaluable, and I recommend them to anyone teaching a similar course. But/and I'd add a week of introductory CS concepts proper (branching, looping, variables, return values, etc.) before beginning them. This is true even though the PH exercises really do start from "Hello, world"; they then ramp up by way of very concrete examples, which the students sometimes found difficult to separate from the concepts those examples were meant to convey.
I’d spend two weeks (rather than one) on mapping and GIS, which was popular and useful for several final projects.
I’d also spend more time on visualization in general, maybe assigning Tufte’s book and all of Yau’s (rather than just one chapter). An additional merit of Yau’s book is that it would provide some intro to R, which the students said they’d like.
I’d assign all of Jockers’ Macroanalysis once it’s available. Many thanks to Matt for sharing a handful of draft chapters with us.
Not sure it’s worth reviewing in depth the merits of individual articles and chapters (of which we read many); by the time I teach this course again in a year or two, most of those will probably drop off in favor of newer results. This is both the joy and the frustration of teaching DH at the moment.
Geolocation Correction at Uses of Scale
December 30, 2012 § Leave a Comment
I've just posted a write-up and some data on hand-corrected geolocation extraction over at the Uses of Scale site (associated with the Mellon grant Ted Underwood, Robin Valenza, and I are running). The idea is to share as much as possible of the tediously achieved process stuff that's required for computational research but that isn't itself an "achieved" result. In addition to my post on geography, there's also information (from Ted and others) on OCR correction and on removing running headers from scanned texts. Not always sexy, but we hope it'll help others do related work without having to start entirely from scratch. And I suppose we're also selfishly hoping for feedback and improvements from anyone else who might have experience dealing with related issues.
