NEH Grant for Textual Geographies Project

I’m pleased to announce that the Textual Geographies Project has been awarded a $325,000 Digital Humanities Implementation Grant from the National Endowment for the Humanities. I’m hugely grateful for the NEH’s generous support and for previous startup funding from the ACLS and from the Notre Dame Office of Research.

I’m excited to work with project partners at Notre Dame, at the HathiTrust Research Center, and around the world. The grant will support further development of a Web-based front end for the enormous amount of textual-geographic data that the project has already generated, as well as ongoing improvements to the data collection process, new research using that data, and several events to engage scholars and members of the public who are interested in geography, history, literature, and the algorithmic study of culture. I’ll also be hiring a project postdoc for the 2017-19 academic years.

More information on all these fronts in the months ahead!

Postdoc in Computational Textual Geography

I’m seeking a postdoctoral fellow for a two-year appointment to work on aspects of the Textual Geographies project and to collaborate on research of mutual interest in my lab in the Department of English at Notre Dame.

The ideal candidate will have demonstrated expertise in literary or cultural studies, machine learning or natural language processing, and geographic or spatial analysis, as well as a willingness to work in new areas. The fellow will contribute to the ongoing work of the Textual Geographies project, an NEH-funded collaboration between literary scholars, historians, geographers, and computer scientists to map and analyze geographic references in more than ten million digitized volumes held by the HathiTrust Digital Library. Areas of current investigation include machine learning for toponym disambiguation, named entity recognition in book-length texts, visualization of uncertainty in geospatial data sets, and cultural and economic analysis of large-scale, multinational literary geography. We welcome applications from candidates whose research interests might expand the range of our existing projects, as well as from those whose expertise builds on our present strengths.

Interdisciplinary collaboration with other groups at Notre Dame is possible. The fellow will also have access to the Text Mining the Novel project, which has helped to underwrite the position.

Apply

Details and application via Interfolio (free). Letters not required for initial stage. Review begins immediately and continues until position is filled. Salary $50,000/year plus research stipend. Initial appointment for one year, renewable for a second year subject to satisfactory progress. Teaching possible but not required.

Computational Approaches to Genre in CA

New year, catch-up news. I have an article in CA, the journal of cultural analytics, on computational approaches to genre detection in twentieth-century fiction. The piece came out back in November, but, well, it’s been a busy year.

The big finding — beyond what I happen to think is a nifty way of considering genre — is that certain highly canonical, male-authored novels of the mid-late twentieth century (by the likes of Updike, Bellow, Vonnegut, DeLillo, etc.) resemble one another about as closely as do mid-century hard-boiled detective stories. That is, very closely indeed. There are a couple of conclusions one might draw from this; my preferred interpretation is that the functional definition of literary fiction in the postwar period (and probably everywhere else) remains much too narrow. But there are other possibilities as well …

CA, by the way, has had some really great work of late. Andrew Piper’s article on “fictionality” is especially worth a read; Piper shows that it’s not just possible but really pretty easy to separate fiction from nonfiction using a basic set of lexical features.

Masterclass and Lecture at Edinburgh

I’m giving a two-and-a-half-day masterclass on quantitative methods for humanities researchers at the University of Edinburgh, 19-21 September 2016. There’s a rough syllabus available now, with more materials to be added as the event draws nearer. If you’re in Scotland and want to attend, there may be (literally) a place or two left; details at the Digital Humanities Network Scotland.

There will also be a public lecture on the evening of Wednesday, September 21, featuring a response and discussion with the ever-excellent Jonathan Hope (Strathclyde).

I’m grateful to Maria Filippakopoulou for organizing the visit and to the Edinburgh Fund of the University of Edinburgh for providing financial support.

Come Work with Me!

Update (20 August 2016)

I wasn’t able to hire anyone for this post, but will rerun the search this fall. More information forthcoming soon. In the meantime, if you happen to know anyone suitable — especially with a strong background in NLP and an interest in humanities problems — please let me know so that I can get in touch. Thanks!


Original post

I’m hiring a postdoc for next year (2016-17) to work on literature, geography, and computational methods. Wide latitude in training and background; interest in working on a very large geographic dataset a big plus. Full details and application via Interfolio. Review begins next week. The position will remain open until filled.

Many thanks to the Text Mining the Novel Project for helping to underwrite the post.

Literature and Economics at Chicago

I’m giving a talk next Friday (5/22) on literature and economic geography as part of Richard Jean So and Hoyt Long’s Cultural Analytics conference at Chicago. (Talking econ at Chicago. That’s not terrifying at all!) The list of speakers is really impressive, present company excluded. If you’re in or near Chicago, hope to see you there.

My talk will be closely related to my recent lecture at Kansas, video of which is available on YouTube (and embedded below). There’s also some enlightening discussion on Facebook; you might need to be friends with Richard So to see it, but you should be friends with him anyway …

Looking forward to seeing folks in Chicago!


Literary Attention Lag

I gave a short talk on geography and memory at this year’s MLA in Vancouver (session info). I didn’t work from a script, but here’s the core material and a few key slides.

So the problem I was trying to address was this: How is geographic attention in literary fiction related to the distribution of population at the time the fiction is published? And what do the details of the relation between them tell us about literary memory? These are questions I just barely touched in my ALH article on the literary geography of the Civil-War period last year, and I thought they were worth a bit more consideration.

To review, we know that there’s a moderate correlation between the population of a geographic location and the amount of literary attention paid to it (measured by the number of times that place is mentioned in books). New York City is used in American literature more frequently than is Richmond, for instance. (This is all using a corpus of about a thousand volumes of U.S. fiction published between 1850 and 1875, but I strongly suspect the correlation holds elsewhere; I’ll be able to say more definitively and share results in a month or two.)

But there is, in at least some instances, a temporal component involved as well. After all, population isn’t a stable feature of cities. Witness the cases of New Orleans and Chicago:

[Figure: Populations of New Orleans and Chicago, 1820-1900]

[Figure: Literary mentions of New Orleans and Chicago, 1850-1875]

In short, those cities were about the same size in 1860, but New Orleans — the older of the two by far — was used much more often in fiction at the time. It appears to have taken a while for Chicago to catch on in the literary imagination.

I wondered, then, whether this was a generalizable trend and, if so, whether I could quantify and explain it. I considered four informal hypotheses about the temporal relationship between population and literary-geographic representation (if I were feeling a little grand, I’d refer to these as reduced models of literary-geographic memory).

  1. National or deep. Not all the way to deep time in Wai Chee Dimock’s sense, but maybe closer to Sacvan Bercovitch’s model of Puritan inheritance. Literature in the nineteenth century represents the nation as it was in the eighteenth.
  2. Formative-psychological. Authors (and readers?) represent the world as it existed during their formative years, for whatever value of “formative” we might choose. Presumably their childhood or school years.
  3. Presentist. We find in books largely the world as it is at the time they were written. We see evidence of this in the rapidly shifting topical content of many texts, especially the dross that we don’t tend to study in English departments but that dominates the quantitative output of any period.
  4. Predictive. Literature looks beyond the present to anticipate or shape cultural features not yet fully realized. I don’t think this is as crazy as it might sound. Critics pretty consistently emphasize the transformational power of books in terms that aren’t strictly personal or metaphorical, and we often bristle, rightly, at the notion that literature merely “reflects” the world. The Romantics among us might say that authors are charged with diagnosing or symptomatizing features of the world that will be obvious in the future, but are hidden now.

For what it’s worth, I’d say that (3) and (2) strike me as most likely or broadly relevant, in that order, followed by (1) and, somewhat distantly for literature en masse, (4).

To (begin to) address the problem of literary-cultural lag/memory/prediction, I collected population data from census records for 23 cities that were relatively well represented in the literary corpus and of comparatively significant size at some point before 1900. They ranged from New York and Philadelphia to Newport (RI), Salem (MA), San Francisco, Detroit, Vicksburg and so on. I did a bit of hand correction on the data to account for changing municipal boundaries and to agglomerate urban areas (metro St. Louis, or Albany and Saratoga Springs, or Buffalo and Niagara Falls; in the second and third cases, the latter place was smaller but more frequently used in fiction).

Anyway, with that data in hand, I plotted total literary mentions (1850-1875) against decennial census counts and ran a simple linear regression on each one. Individually, this produced plots like this (using 1850 census data):

[Figure: Literary mentions, 1850-1875, vs. 1850 census population, with linear fit]

The r² value in this case is 0.46, meaning that a city’s 1850 population appears to account for a little less than half the observed variation in literary attention to it over the next two decades. Repeat for every decade with census data to 1990 and you get this:

Literary attention vs. Population, 1790-1990
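For readers who want to try the per-decade fit themselves, here is a minimal sketch of one such regression. It is not the code from the notebook: the function name and the sample numbers are made up for illustration, and it assumes base-10 logarithms of both variables, as described in the data notes at the end of this post.

```python
import math

def log_linear_r2(populations, mentions):
    """Ordinary least squares on log10(mentions) vs. log10(population).

    Returns (slope, intercept, r2). All inputs must be positive.
    """
    xs = [math.log10(p) for p in populations]
    ys = [math.log10(m) for m in mentions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor fit, r2 is the squared correlation coefficient.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

# Illustrative values only; these are NOT the census or corpus figures.
sample_populations = [5000, 20000, 80000, 300000, 1000000]
sample_mentions = [12, 30, 150, 200, 900]
slope, intercept, r2 = log_linear_r2(sample_populations, sample_mentions)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.2f}")
```

Calling the same function once per census year, with the mention counts held fixed, yields one r² per decade.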

That’s pretty and all, but it’s a little hard to see the trends in the r² values, which are the thing that would help to quantify the degree of correlation between population and literary attention over time. So let’s pull out the r² values and plot them:

r-squared values over time with Gaussian fit

Now this is pretty interesting (he says, of his own work). Note again that the literary data is the same in every case; the only thing that’s changing is the census-year population. So the position of the largest r² tells us which decade’s population distribution most closely predicts the allocation of literary-geographic attention between 1850 and 1875. The maximum observed r² is in the 1830 data. The fit line here (which is a simple Gaussian, by the way, a fact that’s also kind of nifty and unexpected, since it’s a pretty good fit and symmetrical forward and backward in time) has its max in 1832.
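To make the idea of the fit line concrete, here is a rough sketch of fitting a Gaussian to a series of r² values and reading off the year of its peak. It is an illustration under stated assumptions, not the notebook's method: it swaps in a brute-force grid search where a real analysis would more likely use a curve-fitting library, and the r² series is synthetic, generated from a Gaussian centered on 1832 with an invented height (0.5) and width (40 years).

```python
import math

def gaussian(x, amp, mu, sigma):
    """Gaussian curve with height amp, center mu, and width sigma."""
    return amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fit_gaussian_grid(xs, ys, mus, sigmas):
    """Least-squares Gaussian fit by grid search over center and width.

    For each (mu, sigma) pair the best amplitude has a closed form,
    so only two parameters need to be searched.
    """
    best = None
    for mu in mus:
        for sigma in sigmas:
            shape = [gaussian(x, 1.0, mu, sigma) for x in xs]
            amp = sum(y * s for y, s in zip(ys, shape)) / sum(s * s for s in shape)
            sse = sum((y - amp * s) ** 2 for y, s in zip(ys, shape))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sigma)
    _, amp, mu, sigma = best
    return amp, mu, sigma

census_years = list(range(1790, 2000, 10))
# Synthetic r2 series: an assumption for illustration, not the real values.
synthetic_r2 = [gaussian(year, 0.5, 1832, 40) for year in census_years]

amp, peak_year, width = fit_gaussian_grid(
    census_years, synthetic_r2, mus=range(1790, 1991), sigmas=range(10, 101)
)
print(peak_year, width, round(amp, 3))
```

Because the curve is continuous, its peak can fall between census years, which is how a fit over decennial data can put the maximum at 1832 rather than 1830.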

The average book in the literary corpus was published in 1862 and the average age of the author at publication was 42. So it looks like lag peaks at around 30 years and corresponds to the author’s … “experience,” maybe we’d call it? … at age 12. I’d say this is a piece of evidence in favor of the formative-psychological hypothesis, and then I’d wave my hands vigorously indeed.
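The arithmetic behind those two numbers, spelled out with the figures given above (the variable names are just for illustration):

```python
mean_publication_year = 1862  # average publication year in the corpus
mean_author_age = 42          # average author age at publication
fit_peak_year = 1832          # year at which the Gaussian fit peaks

# Lag between the best-fitting population distribution and publication.
lag_years = mean_publication_year - fit_peak_year

# How old the average author was when that population distribution held.
author_age_at_peak = mean_author_age - lag_years

print(lag_years, author_age_at_peak)  # 30 12
```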

I expect to do some more exploration in the months ahead. Having literary data forward to 1990 will be a big help. A few things I’ll be looking into:

  • International comparison. How does lag change, if at all, in other national contexts? The U.S. was (and is) pretty young. Maybe longer-established nations have different dynamics. And how about changes in U.S. representation of foreign cities and vice versa? My guess is that lag is longer the less an author or culture knows about a foreign place.
  • Does lag change over time? Is it shorter today than it was 150 years ago? My guess: yes, but not radically.
  • Is the falloff in fit quality always symmetrical in time, and am I capturing all the relevant dynamics? The near-symmetry in the current data is surprising to me; I would have expected better backward fit than forward. Could be an artifact of the United States’ youth at the time; several of the cities in question didn’t exist for much more than a decade or two before the literature represented in the corpus was written. I wonder if part of this, too, is down to offsetting effects of memory (skewing fit better backward in time) and relative population stability (skewing things forward).
  • Other ways to get at the same question. A comparison of topical content against textual media presumed to be faster moving (newspapers, journals, etc.) would be instructive. How much more conservative is fiction than non-fiction?

Finally, three data notes:

  • Full data is available from the data page. And the code used for analysis and plotting can be had as an IPython notebook.
  • Careful readers will have noticed that the fits are log-linear, i.e., I’ve used the (base 10) logarithms of the values for mentions and population. This is what you’d expect to do for data like these that follow a power-law distribution.
  • I’ve dropped non-existent cities from the computed regressions (though not the visualizations) as appropriate before 1850 (by which time all the cities have population tallies). I think this is defensible, but you could argue for keeping them and using zero population instead. If I’d done that, the fit quality for 1840 and earlier would have been lower, pushing support toward the presentist hypothesis. But that would also be misleading, since it would amount to treating those cities as if they did exist, but were very small, which isn’t true. That’s one of the reasons to include cities like Salem and Nantucket and Newport, which really were existent but small(ish) from the earliest days of the republic. Anyway, an interpretive choice.
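The choice in the last note can be sketched as a simple filter applied before each regression. This is a hypothetical illustration: the record layout, the function name, the city names, and the numbers are all invented, and it says nothing about how the notebook actually stores the data. One practical point in favor of dropping: the base-10 logarithm of zero is undefined, so keeping a city at zero population would require substituting some other value first.

```python
def drop_nonexistent(cities, census_year):
    """Keep only cities with a recorded, positive population in census_year.

    Returns a list of (name, population, mentions) tuples.
    """
    kept = []
    for city in cities:
        population = city["population"].get(census_year)
        if population:  # skips both missing records (None) and zeros
            kept.append((city["name"], population, city["mentions"]))
    return kept

# Placeholder records with made-up numbers, not real census data.
cities = [
    {"name": "Older City", "population": {1820: 20000, 1860: 150000}, "mentions": 400},
    {"name": "Newer City", "population": {1860: 110000}, "mentions": 60},
]

print([name for name, _, _ in drop_nonexistent(cities, 1820)])  # ['Older City']
print([name for name, _, _ in drop_nonexistent(cities, 1860)])  # ['Older City', 'Newer City']
```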

A Bit of Position-Taking on Surface Reading

There’s a new piece by Jeffrey Williams in the Chronicle on surface reading and “the new modesty” in literary studies. Came to my attention via Ted Underwood, who had a kind of ambivalent response to it on Twitter.

I was going to reply there, but 140 characters weren’t quite enough, and I’m asked about this pretty often, so thought I’d set down my short thoughts in a more permanent way.

I like and respect Marcus and Best’s work, which I find subtle and illuminating, though most of it falls somewhat outside my own field. And I guess I understand why some people are fed up with ideologically committed, theoretically oriented, hermeneutically inflected literary scholarship. When that stuff is bad, it’s pretty bad. Then again, just about anything can be (and often is) bad. I don’t see any special monopoly on badness there.

I also understand how it’s possible to look at (some) digital humanities research and think that it shares some sort of imagined turn away from depth and detail in favor of “direct” observation of “obvious” features. People who have no experience with the sciences tend to imagine that such things exist and that they’re different from what literary people work with. They aren’t, though that’s an argument for another time. (I have a little on it in passing in my forthcoming Comparative Literature review, FWIW.) In any case, it’s true that you sometimes hear people talking about a desire for “empirical” or “descriptive” research in DH, though they’re in the minority and I’m not one of them.

It’s hopeless, of course, to try to tell other people how to frame their work or ultimately to control how people receive your own. But I’ll say that my own reasons for pursuing computational literary research have nothing to do with (naïve, illusory) empiricism or a desire for critical modesty or a disenchantment with symptomatic, culturally committed criticism. Quite the opposite. Computers help me marshal evidence for large-scale cultural claims. That’s why I’m interested in them: they help me do better the kind of big, not especially modest, fundamentally symptomatic and suspicious critical work that brought me to the field in the first place.

But then, I would say that. I was Fred Jameson’s student and I was his student for a reason.

New Minor in Computing and Digital Technologies at Notre Dame

I’m pleased to announce a new collaborative undergraduate minor in Computing and Digital Technologies at the University of Notre Dame. Beginning next fall, students will be able to pursue a combination of tailored, rigorous instruction in computer programming and closely related coursework in the humanities, arts, and social sciences. There are six tracks within the minor, from UI design to cognitive psychology to digital humanities and more.

It’s an interesting model, one that’s intended to allow our best and most ambitious students to undertake serious research before graduation and to gain the skills they need for success at the highest levels once they leave campus. I’ll be closely involved, serving on the advisory board for the minor, teaching CDT classes in the digital humanities track, and bringing strong students into my research group. We’re seeing more of these kinds of programs elsewhere, including Columbia’s “Computing in Context” courses and Stanford’s “CS+X” majors. There’s been talk here — though not yet any concrete plans — of eventually expanding CDT to a full major and of offering a BA in computer science through Arts and Letters. In the meantime, there may also be teaching opportunities in the program for qualified grad students.

That last point reminds me: if you have outstanding students looking to do grad work in DH, I hope you’ll consider pointing them toward ND!

In any case, exciting times. Looking forward to getting under way in August.

NovelTM Grant and Project

Finally for the day, another announcement that’s been slightly delayed: I’m really pleased to be part of the SSHRC-funded, McGill-led NovelTM: Text Mining the Novel project. It’s a long-term effort “to produce the first large-scale cross-cultural study of the novel according to quantitative methods” (quoth the About page). A super-impressive group of people are attached – just have a look at the list of team members!

Our first all-hands project meeting is coming up this week. Looking forward to getting started on things that will keep me busy for years to come. Updates and preliminary results here in the months ahead.