Books I Read in 2014

Here’s the new (to me) fiction I read this year. As always, I like seeing other people’s lists, so I figure I ought to contribute my own. Archived lists back to 2009 are also available.

  • Aw, Tash. Five Star Billionaire (2013). Did less for me than I’d hoped, but I think I just faulted it for failing to be the sweeping social drama I wanted it to be.
  • Braak, Chris. The Translated Man (2007). Pretty fun steampunk piece. Can’t remember how I found it – a blog somewhere, I think.
  • Catton, Eleanor. The Luminaries (2013). Really well done, but (just?) an entertainment.
  • Hustvedt, Siri. The Blazing World (2014). For stories of women and art, I preferred Messud.
  • Lepucki, Edan. California (2014). Read this right after The Bone Clocks for maximum depression value. Should have been 30 pages shorter or 100 pages longer — the ending doesn’t quite work.
  • Marcus, Ben. The Age of Wire and String (1995). The lone full-on experimental text on the list. Didn’t enjoy it as much as I expected to, because I’m a hypocrite.
  • Martin, Valerie. The Ghost of the Mary Celeste (2014).
  • Mengestu, Dinaw. All Our Names (2014). Disappointing. Guess I wanted more and tighter politics, less domestic drama.
  • Messud, Claire. The Woman Upstairs (2013). Enjoyed this a lot, probably more than anything else on the year.
  • Mitchell, David. The Bone Clocks (2014). I love Mitchell, who’s almost good enough to pull off the book’s bizarre mashup of Black Swan Green, the innermost novella of Cloud Atlas, and interdimensional Manichaean sci-fi. Almost.
  • Murakami, Haruki. 1Q84 (2011). I also like Murakami, but 1,000 pages of close to literally nothing happening is a lot to ask.
  • Offill, Jenny. Dept. of Speculation (2014). Good, narratively interesting, but ultimately underdrawn in substance.
  • Osborne, John. Look Back in Anger (1956). A quick glance at Osborne, whom I’d never read.
  • Tartt, Donna. The Goldfinch (2013). More disaster/suffering porn. Didn’t like it.
  • Waldman, Adelle. The Love Affairs of Nathaniel P. (2013).
  • Weir, Andy. The Martian (2011). Picked up in an airport book rack for a flight with a dead Kindle. Fun to read, sociologically and symptomatically interesting.
  • Wolitzer, Meg. The Interestings (2013). Not really. (Ooh, sick burn!)

Also picked up and put down … let’s see … Hotel World by Ali Smith, Ugly Girls by Lindsay Hunter, We Are All Completely Beside Ourselves by Karen Joy Fowler, and a couple of others.

Sixteen books and one play in sum, a little better than usual. Helps to be on leave. But not a year full of great reads. Was briefly enamored of Offill’s book, but its genuinely cool schtick got a little flat over just 100 pages. The Woman Upstairs was probably my favorite, and even that one wasn’t something I fell in love with. Nothing on the list that I’d especially want to teach or that struck me as something I should spend more time thinking about.

On the whole, it seemed as though I’d read a lot of these things before: well-executed, straight-ahead fiction. Which I suppose is mostly a defect in me, picking things from the pages of the New Yorker and the LRB and the Times and such. I know their deal; it’s not like those outlets went unexpectedly conservative this year. I read a lot of things out of vague professional obligation. The books I had the most fun with — Dept. of Speculation, The Translated Man, The Martian — were either experimental or genre fiction. Maybe there’s a lesson here. Maybe I should learn it.

So, here’s to a better 2015. Leading off (in the absence of the aforementioned lesson) with maybe Lily King’s Euphoria or Hilary Mantel’s Assassination of Margaret Thatcher or Marlon James’s Brief History of Seven Killings or Phil Klay’s Redeployment. Or Emily St. John Mandel’s Station Eleven, if I want to continue the apocalyptic theme from Mitchell and Lepucki …

New Minor in Computing and Digital Technologies at Notre Dame

I’m pleased to announce a new collaborative undergraduate minor in Computing and Digital Technologies at the University of Notre Dame. Beginning next fall, students will be able to pursue a combination of tailored, rigorous instruction in computer programming and closely related coursework in the humanities, arts, and social sciences. There are six tracks within the minor, from UI design to cognitive psychology to digital humanities and more.

It’s an interesting model, one that’s intended to allow our best and most ambitious students to undertake serious research before graduation and to gain the skills they need for success at the highest levels once they leave campus. I’ll be closely involved, serving on the advisory board for the minor, teaching CDT classes in the digital humanities track, and bringing strong students into my research group. We’re seeing more of these kinds of programs elsewhere, including Columbia’s “Computing in Context” courses and Stanford’s “CS+X” majors. There’s been talk here — though not yet any concrete plans — of eventually expanding CDT to a full major and of offering a BA in computer science through Arts and Letters. In the meantime, there may also be teaching opportunities in the program for qualified grad students.

That last point reminds me: if you have outstanding students looking to do grad work in DH, I hope you’ll consider pointing them toward ND!

In any case, exciting times. Looking forward to getting under way in August.

NovelTM Grant and Project

Finally for the day, another announcement that’s been slightly delayed: I’m really pleased to be part of the SSHRC-funded, McGill-led NovelTM: Text Mining the Novel project. It’s a long-term effort “to produce the first large-scale cross-cultural study of the novel according to quantitative methods” (quoth the About page). A super-impressive group of people are attached – just have a look at the list of team members!

Our first all-hands project meeting is coming up this week. Looking forward to getting started on things that will keep me busy for years to come. Updates and preliminary results here in the months ahead.

A Generalist Talk on Digital Humanities

Since I’m apparently in self-promotion mode … This past weekend, I gave a talk in the Notre Dame College of Arts and Letters’ Saturday Scholars series. These are public lectures aimed at curious folks who are in town for football games. It was a lot of fun and, apart from spilling water on my laptop because I’m a doofus and a klutz, I think it went well. Video is embedded below; there was also a write-up in the student newspaper.

ACLS Digital Innovation Fellowship

I somehow failed to post about this when it was announced last summer, but I’ve received an ACLS Digital Innovation Fellowship for the 2014-15 academic year to work on the project “Literary Geography at Scale.”

Things are going well so far; I’ll be updating the site here with reports as the research moves along. Eventually, there will be a full site to access and visualize the data (think Google Ngrams for geographic data). In the meantime, here’s the project abstract:

Literary Geography at Scale uses natural language processing algorithms and automated geocoding to extract geographic information from nearly eleven million digitized volumes held by the HathiTrust Digital Library. The project extends existing computationally assisted work on American and international literary geography to new regions, new historical periods – including the present day – and to a vastly larger collection of texts. It also provides scholars in the humanities and social sciences with an enormous yet accessible trove of geographic information. Because the HathiTrust corpus includes books published over many centuries in a variety of languages and across nearly all disciplines, the derived data is potentially useful to researchers in a range of humanities and computational fields. Literary Geography at Scale is one of the largest humanities text-mining projects to date and the first truly large-scale study of 20th and 21st century literature.
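The real pipeline relies on full-scale named-entity recognition and geocoding run over millions of volumes, but the basic idea is easy to sketch. Here’s a toy version in R, with a tiny hand-built gazetteer standing in for both steps; the place names and coordinates are real, everything else is purely illustrative:

# Toy illustration only: a tiny hand-built gazetteer stands in for
# real named-entity recognition and geocoding.
gazetteer = data.frame(
  place = c("Boston", "Chicago", "New Orleans"),
  lat   = c(42.36, 41.88, 29.95),
  lon   = c(-71.06, -87.63, -90.07),
  stringsAsFactors = FALSE
)

text = "The train left Boston at dawn; Chicago was still two days away."

# Count mentions of each gazetteer place in the text
gazetteer$mentions = sapply(gazetteer$place, function(p) {
  hits = gregexpr(p, text, fixed = TRUE)[[1]]
  sum(hits > 0)   # gregexpr returns -1 when there is no match
})
gazetteer  # each place now carries coordinates and a mention count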

Visualizing Uncertainty with Probability Clouds

I’ve come up with a visualization of data uncertainty that seems really obviously useful, but that I’ve never seen before. So I guess some combination of three things must be true:

  1. I am a genius. Deeply unlikely, given that I misspelled “genius” the first time I typed it here.
  2. There’s something wrong with the “new” method that makes it less useful than I think and/or total bunk.
  3. People do use this, and I just haven’t seen it before. Totally possible, given the number of statistical visualizations in most literary studies papers.

Anyway, the idea is to use probability clouds to show a density region around a given line of best fit through the data.[1] I think this avoids some visual-rhetorical pitfalls in the usual ways of showing trends and uncertainty in data, but/and I’d be grateful for thoughts on its value.

Here’s the context and an example: I’m working on a manuscript at the moment for which I need to visualize a bit of data. Nothing fancy; this is one of the basic figures:

Demo 0 data

Yeah, the axes aren’t labeled, etc. The point is, there are two series that are pretty noisy but seem to be doing different things over time (along the x axis).
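For anyone who wants to play along at home, a plot like this is easy to mock up. Here’s a sketch with made-up numbers standing in for the real series (the shapes of the toy trends are arbitrary, not my actual results):

library(ggplot2)

# Made-up stand-in for the real data: two noisy series over time
set.seed(1)
years = 1800:1900
data = data.frame(
  x      = rep(years, 2),
  y      = c(0.30 - 0.0015 * (years - 1800) + rnorm(length(years), sd = 0.04),
             0.13 + 0.0012 * (years - 1800) + rnorm(length(years), sd = 0.04)),
  series = rep(c("series 1", "series 2"), each = length(years))
)

qplot(x, y, data = data, shape = series)  # the basic scatter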

OK, so to get a handle on the trend, let’s insert a linear fit for each series:

Demo 1 line

Neat! But the fit lines are a little misleadingly precise. I don’t think we want to say that the “true” value of series 2 in 1820 is exactly 0.15, or that the true values cross in exactly 1872. So let’s add a confidence interval at the usual 95% level:

Demo 2 line se

Better, but this manages to be somehow both too precise and not precise enough. Beyond the line of best fit, which still suggests false precision at the center, the shaded 95% confidence region comes to an abrupt end (too precise) and doesn’t have any internal differentiation (not precise enough). The true value, if we want to think of it that way, isn’t equally likely to fall anywhere within the shaded region; it’s probably somewhere near the middle. But there’s also a smallish chance (5%, to be exact) that it falls outside the shaded region entirely.
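(For reference, both of the last two figures amount to a single geom_smooth layer on the toy plot from the sketch above. This is my reconstruction of the calls, not necessarily the exact code behind the figures.)

# Continuing with the toy data frame from the earlier sketch
p0 = qplot(x, y, data = data, shape = series)
p0 + geom_smooth(method = "lm", se = FALSE)    # bare fit lines, one per series
p0 + geom_smooth(method = "lm", level = 0.95)  # adds the default 95% confidence ribbon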

So why not indicate those facts visually, while getting rid of the fit line entirely? Here’s what this might look like:

Demo 3 cloud

This seems a lot better. It doesn’t draw your eye misleadingly to the fit line or to the edges of an arbitrarily bounded region, but it does suggest where the real fit might be. And it does that while making plain the fuzziness of the whole business. It would be even better in color, too. I like it. Am I missing something?

On the technical side, this is built up by brute force in R with ggplot. The relevant code is:

library(ggplot2)

se_limit     = 0.99  # Largest confidence level to show; valid range 0 to 1
se_regions   = 100   # Number of nested bands in the uncertainty cloud. 100 is a lot;
                     #   a little slow, but produces a very smooth cloud.
se_alpha_max = 0.5   # How dark to make the center of the uncertainty cloud.
                     #   0.5 = 50% grey.
line_type    = 0     # A ggplot2 linetype for the fit line; 0 = none, 1 = solid

p = qplot(x, y, data = data)  # Use real data, of course!
for (i in 1:se_regions) {  # Build the cloud from nested, translucent confidence bands
  p = p + geom_smooth(method = "lm", linetype = line_type, fill = "black",
                      level = i * se_limit / se_regions,
                      alpha = se_alpha_max / se_regions)
}
p  # Show the finished plot

That’s it. As you can see, it’s just brute force building up overlapping alpha layers at different confidence levels. I once looked at the denstrip package, but couldn’t make it do the same thing. But I’m dumb, so …

Update: I knew I couldn’t be the first to have thought of this! Doug Duhaime points me to visually-weighted regression, apparently first suggested by Solomon Hsiang in 2012. There’s R code (but I guess not yet a formal package) to do this at Felix Schönbrodt’s site.

Here’s a version using Felix Schönbrodt’s vwReg(). Not all cleaned up to match the above, but you get the idea:

Demo 4 vwreg
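If you want to try it yourself, the basic call is roughly as below. vwReg() isn’t a CRAN package (or wasn’t when I looked), so you source the script from Schönbrodt’s site; I’m going from memory that it takes a formula plus a data frame, so treat the argument details as assumptions rather than documentation:

# Assumes vwReg() has been loaded by sourcing the script from Schönbrodt's site;
# the formula-plus-data-frame interface is my recollection, not gospel.
source("vwReg.R")  # local copy of the downloaded script
vwReg(y ~ x, data[data$series == "series 1", ])  # visually weighted regression, one toy series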


[1] If you’ve learned any undergrad-level physical chemistry, you can probably see where this idea came from. Here’s a bog-standard textbook visualization of the electron probability density of a 2p atomic orbital:

(source)

Bamman, Underwood, and Smith, “A Bayesian Mixed Effects Model of Literary Character” (2014)

Too long for Twitter, a pointer to a new article:

  • Bamman, David, Ted Underwood, and Noah A. Smith. “A Bayesian Mixed Effects Model of Literary Character.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (2014): 370-79.
    NB. The link here is to a synopsis of the work and related info; you’ll want the authors’ PDF for details.

The new work is related to Bamman, O’Connor, and Smith’s “Learning Latent Personas of Film Characters” (ACL 2013; PDF), which modeled character types in Wikipedia film summaries. I mention the new piece here mostly because it’s cool, but also because it addresses the biggest issue that came up in my grad seminar when we discussed the film personas work, namely the confounding influence of plot summaries. Isn’t it the case, my students wanted to know, that what you might be finding in the Wikipedia data is a set of conventions about describing and summarizing films, rather than (or, much more likely, in addition to) something about film characterization proper? And, given that Wikipedia has pretty strong gender/race/class/age/nationality/etc./etc./etc. biases in its authorship, doesn’t that limit what you can infer about the underlying film narratives? Wouldn’t you, in short, really rather work with the films themselves (whether as scripts or, in some ideal world, as full media objects)?

The new paper is an important step in that direction. It’s based on a corpus of 15,000+ eighteenth- and nineteenth-century novels (via the HathiTrust corpus), from which the authors have inferred arbitrary numbers of character types (what they call “personas”). For details of the (very elegant and generalizable) method, see the paper. Note in particular that they’ve modeled author identity as an explicit parameter and that it would be relatively easy to do the same thing with date of publication, author nationality, gender, narrative point of view, and so on.

The new paper finds that the author-effects model — as expected — performs especially well in discriminating character types within a single author’s works, though less well than the older method (which doesn’t control for author effects) in discriminating characters between authors. Neither method does especially well on the most difficult cases, differentiating similar character types in historically divergent texts.

Anyway, nifty work with a lot of promise for future development.

Two Events at Stanford

I’m giving a couple of talks at Stanford next week. Announcements from the Lit Lab and CESTA:

On Monday, May 19th, 2014, at 10am, the Literary Lab will host Matt Wilkens, an Assistant Professor of English at the University of Notre Dame. His talk, entitled “Computational Methods, Literary Attention, and the Geographic Imagination,” will focus on his recent work that combines Digital and Spatial Humanities research as he investigates the literary representation of place in American literature.

For those interested in the role of Digital Humanities within humanities disciplines, Matt will also be leading a seminar/discussion on the institutional place of Digital Humanities, particularly focusing on its role in the classroom. This event, “Digital Humanities and New Institutional Structures,” will take place on Tuesday, May 20th, at 12pm in CESTA (the Fourth Floor of Wallenberg Hall, Building 160), Room 433A. Lunch will be provided.

Digital Americanists at ALA 2014

From the Digital Americanists site, which has full details:

Visualizing Non-Linearity: Faulkner and the Challenges of Narrative Mapping
Session 1-A. Thursday, May 22, 2014, 9:00 – 10:20 am

  1. Julie Napolin, The New School
  2. Worthy Martin, University of Virginia
  3. Johannes Burgers, Queensborough Community College

Digital Flânerie and Americans in Paris
Session 2-A. Thursday, May 22, 2014, 10:30-11:50 am

  1. “Mapping Movement, or, Walking with Hemingway,” Laura McGrath, Michigan State University
  2. “Parisian Remainder,” Steven Ambrose, Michigan State University
  3. “Sedentary City,” Anna Green, Michigan State University
  4. “Locating The Imaginary: Literary Mapping and Propositional Space,” Sarah Panuska, Michigan State University

Matthew Wilkens: Geospatial Cultural Analysis and Literary Production

An interview with the DH group at Chicago in advance of my talk there this Friday. Looking forward!

digital humanities blog @UChicago

The distribution of US city-level locations, revealing a preponderance of literary–geographic occurrences in what we would now call the Northeast corridor between Washington, DC, and Boston, but also sizable numbers throughout the South, Midwest, Texas, and California.

Matthew Wilkens, Assistant Professor of English at Notre Dame University, will be speaking at the Digital Humanities Forum on March 7 about Geospatial Cultural Analysis and its intersection with Literary Production. Specifically, Wilkens’ research asks: Using computational analysis, how can we define and assess the geographic imagination of American fiction around the Civil War, and how did the geographic investments of American literature change across that sociopolitical event?

We spoke to him about his choice to use a quantitative methodology, the challenges that were consequently faced, and the overall future for the Digital Humanities. This is what he had to say:

What brought you to Digital Humanities methodologies?

I guess it was…
