Literary Attention Lag

I gave a short talk on geography and memory at this year’s MLA in Vancouver (session info). I didn’t work from a script, but here’s the core material and a few key slides.

So the problem I was trying to address was this: How is geographic attention in literary fiction related to the distribution of population at the time the fiction is published? And what do the details of the relation between them tell us about literary memory? These are questions I just barely touched in my ALH article on the literary geography of the Civil-War period last year, and I thought they were worth a bit more consideration.

To review, we know that there’s a moderate correlation between the population of a geographic location and the amount of literary attention paid to it (measured by the number of times that place is mentioned in books). New York City is used in American literature more frequently than is Richmond, for instance. (This is all using a corpus of about a thousand volumes of U.S. fiction published between 1850 and 1875, but I strongly suspect the correlation holds elsewhere; I’ll be able to say more definitively and share results in a month or two.)

But there is, in at least some instances, a temporal component involved as well. After all, population isn’t a stable feature of cities. Witness the cases of New Orleans and Chicago:

Figure: Populations of New Orleans and Chicago, 1820-1900

Figure: Literary mentions of New Orleans and Chicago, 1850-1875

In short, those cities were about the same size in 1860, but New Orleans — the older of the two by far — was used much more often in fiction at the time. It appears to have taken a while for Chicago to catch on in the literary imagination.

I wondered, then, whether this was a generalizable trend and, if so, whether I could quantify and explain it. I considered four informal hypotheses about the temporal relationship between population and literary-geographic representation (if I were feeling a little grand, I’d refer to these as reduced models of literary-geographic memory).

  1. National or deep. Not all the way to deep time in Wai Chee Dimock’s sense, but maybe closer to Sacvan Bercovitch’s model of Puritan inheritance. Literature in the nineteenth century represents the nation as it was in the eighteenth.
  2. Formative-psychological. Authors (and readers?) represent the world as it existed during their formative years, for whatever value of “formative” we might choose. Presumably their childhood or school years.
  3. Presentist. We find in books largely the world as it is at the time they were written. We see evidence of this in the rapidly shifting topical content of many texts, especially the dross that we don’t tend to study in English departments but that dominates the quantitative output of any period.
  4. Predictive. Literature looks beyond the present to anticipate or shape cultural features not yet fully realized. I don’t think this is as crazy as it might sound. Critics pretty consistently emphasize the transformational power of books in terms that aren’t strictly personal or metaphorical, and we often bristle, rightly, at the notion that literature merely “reflects” the world. The Romantics among us might say that authors are charged with diagnosing or symptomatizing features of the world that will be obvious in the future, but are hidden now.

For what it’s worth, I’d say that (3) and (2) strike me as most likely or broadly relevant, in that order, followed by (1) and, somewhat distantly for literature en masse, (4).

To (begin to) address the problem of literary-cultural lag/memory/prediction, I collected population data from census records for 23 cities that were relatively well represented in the literary corpus and of comparatively significant size at some point before 1900. They ranged from New York and Philadelphia to Newport (RI), Salem (MA), San Francisco, Detroit, Vicksburg and so on. I did a bit of hand correction on the data to account for changing municipal boundaries and to agglomerate urban areas (metro St. Louis, or Albany and Saratoga Springs, or Buffalo and Niagara Falls; in the second and third cases, the latter place was smaller but more frequently used in fiction).

Anyway, with that data in hand, I plotted total literary mentions (1850-1875) against decennial census counts and ran a simple linear regression on each one. Individually, this produced plots like this (using 1850 census data):


The r² value in this case is 0.46, meaning that a city’s 1850 population appears to account for a little less than half the observed variation in literary attention to it over the next two decades. Repeat for every decade with census data to 1990 and you get this:

Literary attention vs. Population, 1790-1990

That’s pretty and all, but it’s a little hard to see the trends in the r² values, which are the thing that would help to quantify the degree of correlation between population and literary attention over time. So let’s pull out the r² values and plot them:

r-squared values over time with Gaussian fit

Now this is pretty interesting (he says, of his own work). Note again that the literary data is the same in every case; the only thing that’s changing is the census-year population. So the position of the largest r² tells us which decade’s population distribution most closely predicts the allocation of literary-geographic attention between 1850 and 1875. The maximum observed r² is in the 1830 data. The fit line here (which is a simple Gaussian, by the way, a fact that’s also kind of nifty and unexpected, since it’s a pretty good fit and symmetrical forward and backward in time) has its max in 1832.
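For anyone who wants to try locating the peak of a curve like this, here is one minimal way to do it. To be clear about the assumptions: this is not necessarily how the fit above was produced. It swaps in a simple trick (the logarithm of a Gaussian is a parabola, so an ordinary quadratic least-squares fit on the logged values recovers the center and width), the function name `fit_gaussian_peak` is made up for illustration, and the input values are synthetic, generated from a known Gaussian with arbitrary parameters rather than taken from the data.

```python
import math


def fit_gaussian_peak(years, values):
    """Fit y = A * exp(-(t - mu)^2 / (2 * sigma^2)) via a quadratic fit to ln(y).

    Writing u = t - mean(t), ln(y) = c0 + c1*u + c2*u^2, which is linear in
    (c0, c1, c2) and can be solved with the normal equations.
    Returns (mu, sigma, amplitude). All values must be positive.
    """
    t0 = sum(years) / len(years)
    u = [t - t0 for t in years]            # center years for numerical stability
    z = [math.log(v) for v in values]

    # Power sums needed for the 3x3 normal equations.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    b = [sum(zi * ui ** k for ui, zi in zip(u, z)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]

    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (m[i][3] - sum(m[i][j] * c[j] for j in range(i + 1, 3))) / m[i][i]
    c0, c1, c2 = c

    if c2 >= 0:
        raise ValueError("logged values are not concave; no Gaussian peak")
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = t0 - c1 / (2.0 * c2)
    amplitude = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return mu, sigma, amplitude


# Synthetic stand-in for a series of r-squared values (NOT the real data):
# a noise-free Gaussian with arbitrarily chosen center, width, and height.
years = list(range(1790, 2000, 10))
true_mu, true_sigma, true_amp = 1840.0, 35.0, 0.5
synthetic_r2 = [
    true_amp * math.exp(-((t - true_mu) ** 2) / (2 * true_sigma ** 2))
    for t in years
]

mu, sigma, amp = fit_gaussian_peak(years, synthetic_r2)
print(round(mu, 1), round(sigma, 1), round(amp, 2))
```

One caveat on the design: fitting in log space gives a lot of weight to the small values in the tails, so with noisy real data a nonlinear least-squares fit on the raw values would probably behave better. The log-parabola version is mostly useful as a quick check or as a starting guess.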

The average book in the literary corpus was published in 1862 and the average age of the author at publication was 42. So it looks like lag peaks at around 30 years and corresponds to the author’s … “experience,” maybe we’d call it? … at age 12. I’d say this is a piece of evidence in favor of the formative-psychological hypothesis, and then I’d wave my hands vigorously indeed.
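The arithmetic behind that estimate, spelled out using only the three figures quoted above (the variable names are just labels for this sketch):

```python
mean_publication_year = 1862  # average publication year in the corpus (from the text)
mean_author_age = 42          # average author age at publication (from the text)
peak_fit_year = 1832          # maximum of the Gaussian fit (from the text)

# Lag between the best-predicting census year and the average publication year.
lag_years = mean_publication_year - peak_fit_year

# How old the average author would have been in the peak year.
author_age_at_peak = mean_author_age - lag_years

print(lag_years, author_age_at_peak)
```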

I expect to do some more exploration in the months ahead. Having literary data forward to 1990 will be a big help. A few things I’ll be looking into:

  • International comparison. How does lag change, if at all, in other national contexts? The U.S. was (and is) pretty young. Maybe longer-established nations have different dynamics. And how about changes in U.S. representation of foreign cities and vice versa? My guess is that lag is longer the less an author or culture knows about a foreign place.
  • Does lag change over time? Is it shorter today than it was 150 years ago? My guess: yes, but not radically.
  • Is the falloff in fit quality always symmetrical in time, and am I capturing all the relevant dynamics? The near-symmetry in the current data is surprising to me; I would have expected better backward fit than forward. Could be an artifact of the United States’ youth at the time; several of the cities in question didn’t exist for much more than a decade or two before the literature represented in the corpus was written. I wonder if part of this, too, is down to offsetting effects of memory (skewing fit better backward in time) and relative population stability (skewing things forward).
  • Other ways to get at the same question. A comparison of topical content against textual media presumed to be faster moving (newspapers, journals, etc.) would be instructive. How much more conservative is fiction than non-fiction?

Finally, three data notes:

  • Full data is available from the data page. And the code used for analysis and plotting can be had as an IPython notebook.
  • Careful readers will have noticed that the fits are log-linear, i.e., I’ve used the (base 10) logarithms of the values for mentions and population. This is what you’d expect to do for data like these that follow a power-law distribution.
  • I’ve dropped non-existent cities from the computed regressions (though not the visualizations) as appropriate before 1850 (by which time all the cities have population tallies). I think this is defensible, but you could argue for keeping them and using zero population instead. If I’d done that, the fit quality for 1840 and earlier would have been lower, pushing support toward the presentist hypothesis. But that would also be misleading, since it would amount to treating those cities as if they did exist, but were very small, which isn’t true. That’s one of the reasons to include cities like Salem and Nantucket and Newport, which really were existent but small(ish) from the earliest days of the republic. Anyway, an interpretive choice.
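To make the last two notes concrete, here is a rough sketch of a log-linear fit that drops cities with no population for a given census year. It is not the notebook code: the function name `log_log_fit` is invented for illustration, and the numbers are toy values made up for the example, not figures from the dataset.

```python
import math


def log_log_fit(populations, mentions):
    """Least-squares fit of log10(mentions) on log10(population).

    Cities with a missing or zero value are dropped rather than treated as
    tiny, since log10(0) is undefined.
    Returns (slope, intercept, r_squared, n_cities_used).
    """
    points = [
        (math.log10(p), math.log10(m))
        for p, m in zip(populations, mentions)
        if p and m  # skip None and 0
    ]
    n = len(points)
    if n < 3:
        raise ValueError("need at least three cities for a meaningful fit")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the variables")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared, n


# Toy data, hypothetical and NOT from the dataset: four cities, with the
# same mention counts compared against two different census years.
toy_mentions = [1200, 400, 90, 30]
toy_census = {
    1830: [200000, 50000, None, 4000],   # third city does not exist yet
    1860: [800000, 110000, 110000, 9000],
}

for year, pops in sorted(toy_census.items()):
    slope, intercept, r2, n_used = log_log_fit(pops, toy_mentions)
    print(year, n_used, round(slope, 2), round(r2, 3))
```

The mention counts stay fixed while the census year changes, which mirrors the setup described above. Substituting a small positive population for the missing cities, instead of dropping them, would be the alternative interpretive choice discussed in the last note.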

3 thoughts on “Literary Attention Lag”

  1. This is fascinating! It would be helpful to see when each novel in the corpus was published, to get a sense of how to interpret the R^2 values. Given a reasonably normal distribution of novels over time, the argument gains much force!
