December 31, 2013
- Bennett, Ronan. The Catastrophist (2001). A possible text for the Congo class, though I’d probably go with something by Mabanckou instead.
- Coetzee, J. M. The Childhood of Jesus (2013). Very good, as always.
- Fountain, Ben. Billy Lynn’s Long Halftime Walk (2012). Wanted to like this one, as the consensus best novel of the recent wars, but it left me kind of cold.
- Klosterman, Chuck. The Visible Man (2011). Interesting premise: Try to think through all the implications of selective invisibility.
- Kushner, Rachel. The Flamethrowers (2013). Not sure it’s as good as everyone says, but it is good.
- Ledgard, J. M. Submergence (2013). Two-thirds of a really good book. The African sections are wonderful; the oceanography bits, not so much. Has the same problem Richard Powers does in writing about scientists — can’t get over a fawning love of science itself that finds expression as insufferably polymathic scientists.
- Mantel, Hilary. Bring Up the Bodies (2012). I’m a shameless fan. The last Wolf Hall book is coming soon, right? Please?
- Nutting, Alissa. Tampa (2013). The comparisons to Lolita are entirely unearned, though I suppose one could do worse than “not as good as Nabokov.”
- Pava, Sergio De La. A Naked Singularity (2012). Best book I’ve read in a long time.
- Pynchon, Thomas. Bleeding Edge (2013). Really enjoyed this; as with Coetzee, I’m a sucker for everything Pynchon writes.
- Saunders, George. Tenth of December (2013). Pretty much as good as everyone says, though I still never know what to do with short stories.
- Winterbach, Ingrid. The Book of Happenstance (2011). An interesting, patient novel, translated from Afrikaans.
A dozen books in all. Once again, not setting any records, but an enjoyable year. I’m on leave next fall, so may do a bit better in 2014. In the meantime, I’ve just started Tash Aw’s Five Star Billionaire …
December 2, 2013
My article, “The Geographic Imagination of Civil War-Era American Fiction,” is in the latest issue of American Literary History (which happens to be the 100th issue of the journal). The easiest way to get it is probably via Muse (direct link, paywall), though it’s also available from Oxford (publisher of ALH, temporarily free to all). If your institution doesn’t subscribe to either of those outlets, drop me a line and I’ll send you a PDF offprint. I’m really pleased to see the piece in print, especially in an issue with so many people whose work I admire.
The article presents some of my recent work on geolocation extraction in a form that’s more complete than has been possible in the talks I’ve given over the last year or so. There’s more coming on a number of fronts: geographic attention as a function of demographic and economic factors, a wider historical scope, a (much) larger corpus, some marginally related studies of language use in the nineteenth century (with my students Bryan Santin and Dan Murphy), and more. Looking forward to sharing these projects in the months ahead.
September 29, 2013
Last week I finished a fellowship proposal to fund work on geolocation extraction across the whole of the HathiTrust corpus. It’s a big project and I’m excited to start working on it in the coming months.
One thing that came up in the course of polishing the proposal—but that didn’t make it into the finished product—is how volumes in languages other than English might be handled. The short version is that the multilingual nature of the HathiTrust corpus opens up a lot of interesting ground for comparative analysis without posing any particular technical challenges.
In slightly more detail: There are a fair number of HathiTrust volumes in languages other than English; the majority of HT’s holdings are English-language texts, but even 10 or 20% of nearly 11 million books is a lot. Fortunately, this is less of an issue than it might appear. You won’t get good performance running a named entity recognizer trained on English data over non-English texts, but all you need to do is substitute a language-appropriate NER model, of which there are many, especially for the European languages that make up the large bulk of HT’s non-English holdings. And it’s not hard at all to identify the language in which a volume is written, whether from metadata records or by examining its content (stopword frequency is especially quick and easy). In fact, you can do that all the way down to the page level, so it’s possible to treat volumes with mixed-language content in a fine-grained way.
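To make the stopword idea concrete, here’s a toy sketch of frequency-based language identification. The word lists below are tiny illustrative samples, not the fuller stopword lists (e.g., NLTK’s) one would actually use, and a production version would need tokenization beyond whitespace splitting and some handling of ties:

```python
# Toy stopword-based language identification. The stopword sets here are
# small illustrative samples; real lists would be much longer.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it", "was", "for"},
    "fr": {"le", "la", "et", "les", "des", "un", "une", "dans", "est", "que"},
    "de": {"der", "die", "und", "das", "ein", "ist", "nicht", "mit", "von", "zu"},
}

def identify_language(text):
    """Return the language whose stopwords occur most often in `text`."""
    tokens = text.lower().split()
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)
```

Because the method just counts token hits, it works on any slice of text, which is what makes the page-level classification mentioned above cheap: run the same function over each page rather than the whole volume.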
About the only difference between English and other languages is that I won’t be able to supply as much of my own genre- and period-specific training data for non-English texts, so performance on non-English volumes published before about 1900 may be a bit lower than for volumes in those languages published in the twentieth century (since the available models are trained almost exclusively on contemporary sources). On the other hand, NER is easier in a lot of languages other than English because they’re more strongly inflected and/or rule bound, so this may not be much of a problem. And in any case, the bulk of the holdings in all languages are post-1900. When it comes time to match extracted locations with specific geographic data via Google’s geocoding API, handling non-English strings is just a matter of supplying the correct language setting with the API request.
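For the geocoding step, the language setting is literally just one more request parameter. A minimal sketch of building such a request URL (this only constructs the URL; an actual call requires a valid API key, and `YOUR_KEY` here is a placeholder):

```python
# Sketch of attaching a language setting to a Google Geocoding API
# request. Builds the request URL only; no network call is made.
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(place_name, language, api_key):
    """Return a geocoding request URL for `place_name` in `language`."""
    params = {"address": place_name, "language": language, "key": api_key}
    return GEOCODE_ENDPOINT + "?" + urlencode(params)

url = build_geocode_url("München", "de", "YOUR_KEY")
```

So a German place string extracted from a German-language volume gets passed through with `language="de"`, and so on for each language the classifier identifies.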
Anyway, fun stuff and a really exciting opportunity …
August 17, 2013
A few days back, I tweeted about the Racial Dotmap, a really cool GIS project by Dustin Cable of the Weldon Cooper Center for Public Service at UVa. The map shows the distribution (down to the block level) of US population by race according to the 2010 census. There’s a fuller explanation on the Cooper Center’s site.
The map is fascinating stuff — I lost most of a morning browsing around it. Really, you should check it out. To give you an idea of what you’ll find, here are a couple of screen grabs:
The eastern US (click for live version):
South Bend, Indiana (with Notre Dame). Not clickable, alas, but you can find it from the main map:
One of the things that’s especially appealing about the project is how open it is. The code is posted on GitHub and the underlying data comes from the National Historical Geographic Information System. That fact, along with a suggestion by Nathan Yau of FlowingData, made me wonder how much effort would be involved in creating a version of the map that would allow users to move between historical censuses. It would be really helpful to have an analogous picture for the nineteenth century as I work on the evolution of literary geography during that period.
If I were cooler than I am, this would be where I’d reveal that I had, in fact, created such a thing. I am not that cool. But I wanted to flag the possibility for future use by me or my students or anyone else who might be so inclined. I’m thinking of at least looking into this as a group project for the next iteration of my DH seminar.
I can imagine two big difficulties straight away:
- You’d need to have historical geo data, particularly block- or tract-level shapefiles. I have no idea how much the census blocks have changed over time nor whether such historical shapefiles exist. Seems like they should, but …
- You’d need the historical census info to be tabulated and available in a way that allows it to be dropped into the existing code or translated into an analogous form. I haven’t looked at that data, so I don’t know how much work would be involved.
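Assuming those two problems were solved, the core rendering step is conceptually simple: for each block, scatter one dot per person (or per n people) at random locations inside the block’s polygon. The actual Dotmap code on GitHub surely does this more cleverly and at far greater scale; here’s just a toy version of the idea using rejection sampling:

```python
# Toy dot-scattering: place `count` random points inside a census-block
# polygon. Real block shapefiles would supply the polygon coordinates.
import random

def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def scatter_dots(poly, count, rng=random):
    """Sample `count` points uniformly inside `poly` by rejection sampling."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    dots = []
    while len(dots) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, poly):
            dots.append((x, y))
    return dots
```

The hard part for a historical version isn’t this step at all; it’s getting the period shapefiles and tabulated counts into a form the pipeline can consume, which is exactly the two difficulties listed above.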
Anyway, the Racial Dotmap is a great project to which I hope to be able to return in the future. In the meantime, enjoy!