Post45
June 1st, 2011
I have a new piece, “Contemporary Fiction by the Numbers,” in the inaugural batch of essays at Post45 Contemporaries. My article is a primer on quantitative methods for literary studies, along with a brief for their significance. Not much that I haven’t said before, but it pulls together a few DH-and-lit ideas and a set of examples in one place.
More important, though, is the existence of Post45. Post45 is a bunch of things: an Americanist working group, a book series (with Stanford UP), a conference, an online journal (which will soon begin publishing regular peer-reviewed articles), and—through its Contemporaries section, edited by Andy Hoberek—a cross between the Partisan Review, NYRB, and an especially smart blog devoted to “actively intervening in current tastes.”
I’m really happy to have an essay in the launch edition of the site, but I’m even happier that the whole project exists.
Memoir and Autobiography after WWII
April 4th, 2011
Apropos my upcoming talk at the Narrative Conference, an interesting n-gram chart of the terms “memoir” and “autobiography” after 1945. Serious bonus points for a convincing explanation of what you’ll find if you widen the date range. (Click the image for Google’s live-data version.)
Update: Another, possibly relevant chart. Again, click for the live version:
Maps of American Fiction
March 28th, 2011
A quick post to show some recent research on named places in nineteenth-century American fiction. I’m interested in the range and distribution of places mentioned in these books as potential indicators of cultural investments in, for example, internationalism and regionalism. I’m also curious about the extent to which large-scale changes (both cultural and formal) are observable in the overall literary production of this (or any) period. The mapping work I’ve done so far doesn’t come close to answering those questions, but it’s part of the larger inquiry.
The Maps
The maps below were generated using a modest corpus of American novels (about 300 in total) drawn from the Wright American Fiction Project at Indiana by way of the MONK project. They show the named locations used in those books; points correspond to everything from small towns through regions, nations and continents. Methodological details and (significant) caveats follow.

1851. 37 volumes (~2.5M words), with data cleanup.

1852. 44 volumes (~3.0M words), minimal cleanup.

1874. 38 volumes (~3.1M words), minimal cleanup.
The Method
Texts were taken from MONK in XML (TEI-A) format with hand-curated metadata. Location names were identified and extracted using Pete Warden’s simple gazetteering script GeoDict, backed by MaxMind’s free world cities database. [Note that there's currently a bug in the database population script for Geodict. Pete tells me it'll be fixed in the next release of his general-purpose Data Science Toolkit, into which Geodict has now been folded. But for now, you probably don't want to use Geodict as-is for your own work.] I tweaked GeoDict to identify places more liberally than usual, which results (predictably) in fewer missed places but more false positives. The locations for 1851 were reviewed pretty carefully by hand; I haven’t done the same yet for the other years. Maps were generated in Flash using Modest Maps with code cribbed shamelessly from the awesome FlowingData Walmart project. This means that it should be relatively easy to turn the static maps above into a time-animated series, but I haven’t done that yet.
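The gazetteering step can be illustrated with a minimal sketch of dictionary-based place lookup in Python. This is not GeoDict's code or interface. The `GAZETTEER` table, its coordinates, and the function names are hypothetical stand-ins for a real cities database. The sketch lowercases the text, tries the longest multi-word name at each position, counts the hits, and attaches coordinates so each place can be drawn as a point sized by frequency. Because it matches any listed string with no context checks, it fails the way a liberally tuned gazetteer does: few misses, many false positives.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase name -> (latitude, longitude).
# A real run would load many thousands of entries from a cities database.
GAZETTEER = {
    "new york": (40.71, -74.01),
    "new orleans": (29.95, -90.07),
    "boston": (42.36, -71.06),
    "england": (52.36, -1.17),
    "china": (35.86, 104.20),
}

MAX_WORDS = max(len(name.split()) for name in GAZETTEER)
TOKEN_RE = re.compile(r"[A-Za-z]+")


def find_places(text):
    """Count gazetteer names in text, preferring the longest match at each position."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    found = Counter()
    i = 0
    while i < len(tokens):
        # Try multi-word names first so "new orleans" is matched as one place.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                found[phrase] += 1
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return found


def map_points(counts):
    """Return (lat, lon, count) tuples, most frequent first, ready for plotting."""
    return [(*GAZETTEER[name], n) for name, n in counts.most_common()]
```

For example, `find_places("She sailed from Boston to New Orleans, then on to China. Boston again.")` should count `boston` twice and `new orleans` and `china` once each. Note that lowercasing means "china" the porcelain would also match, which is exactly the false-positive problem discussed under Caveats.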
Discussion
As I pointed out in my talk on canons, the international scope and regional clustering of places in 1851 strike me as interesting. See the talk for (slightly) more discussion. Moving forward to 1874—and bearing in mind that we’re looking at dirty data best compared with the similarly dirty 1852—the density of named places in the American west increases after the Civil War, and it looks as though a distinct cluster of places in the south central U.S. is beginning to emerge.
The changes from 1852 to 1874 are (1) intriguing but (2) also mostly as expected, and (3) more limited in scope than one might have imagined, given that they sit roughly a decade on either side of the periodizing event of American history. I think an important question raised by a lot of work in corpus analysis (the present research included) concerns exactly what constitutes a “major” shift in form or content.
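One way to put a number on the size of a shift, sketched here as an assumption rather than as the method behind the maps, is to compare how place mentions are distributed across regions in two years. Since the corpora differ in size (about 3.0M words for 1852 and 3.1M for 1874), rates per million words are the fair way to compare density in a given region; regional shares, which do not depend on corpus size, answer whether the overall geographic mix changed. Total variation distance is a standard measure swapped in here for illustration. The region labels and counts below are invented and are not drawn from the corpus.

```python
def per_million(counts, total_words):
    """Convert raw mention counts per region into rates per million words."""
    return {region: 1_000_000 * n / total_words for region, n in counts.items()}


def shares(counts):
    """Each region's fraction of all place mentions; independent of corpus size."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two share distributions.

    0.0 means identical geographic mixes; 1.0 means no overlap at all.
    """
    regions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(r, 0.0) - q.get(r, 0.0)) for r in regions)


# Invented counts for illustration only; not taken from the corpus.
a = {"northeast": 500, "south": 200, "west": 100, "abroad": 200}
b = {"northeast": 450, "south": 220, "west": 180, "abroad": 150}
print(per_million(a, 3_000_000)["west"], per_million(b, 3_100_000)["west"])
print(total_variation(shares(a), shares(b)))
```

With these made-up numbers, western mentions rise from roughly 33 to 58 per million words, and the distance between the two regional mixes is about 0.10. Whether 0.10 counts as “major” is the interpretive question; the measure only makes the comparison explicit.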
I’m going to avoid saying anything more here because I don’t want to build too much argument on top of a dataset that I know is still full of errors, but I wanted to put the maps up for anyone to puzzle through. If you have thoughts about what’s going on here, I’d love to hear them.
Caveats
A couple of notes and caveats on errors:
- Errors in the data are of several kinds. There are missed locations, i.e., named places that occur in the underlying text but are not flagged as such. Some places that existed in the nineteenth century don’t exist now. Some colloquial names aren’t in the database. And of course a book can be set in, say, New York City and yet fail to use the city’s name often or at all, possibly preferring street addresses or localisms like “the Village.” Also, GeoDict as configured identifies all country and continent names with no restrictions, but requires cities and regions (e.g., U.S. states) either to be paired with a larger geographic region (“Brooklyn, New York,” not “Brooklyn”) or preceded by “in” or “at” as indicators of place. You pretty much have to do this to keep the false positive rate manageable.
- But there are still false positives. There’s a city somewhere in the world named for just about any common English name, adjective, military rank, etc. “George,” for instance, is a city in South Africa. “George, South Africa,” if it ever occurred in a text, would be identified correctly. But “In George she had found a true friend” produces a false positive. When I clean the data, I eliminate almost all proper names of this kind and investigate anything else that looks suspicious. Note that the cluster of places in southern Africa visible in the (uncleaned) 1852 and 1874 maps is almost certainly attributable to this kind of error. Travis Brown tells me he’s seen the same thing in his own geocoding experiments.
- Then there are ambiguous locations, usually clear in context but not obvious to GeoDict. “Cambridge” is the most frequent example. A bit of spot-checking suggests that most American novels in the corpus mean the city in Massachusetts, but that’s surely not true of every instance. Most other ambiguities are much more easily resolved, but they still require human attention.
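The filtering rules in these caveats can be sketched in Python roughly as follows. This is a hypothetical reconstruction, not GeoDict's actual logic: the tiny name tables, `STOPLIST`, `AMBIGUOUS`, and all function names are illustrative assumptions. Countries and continents are accepted anywhere; cities and regions need a preceding “in” or “at” or a following “, <larger place>”; names on a stoplist of common personal names are rejected outright; and names known to be ambiguous are routed to a human for review rather than counted.

```python
import re

# Hypothetical toy tables standing in for a real gazetteer database.
COUNTRIES_AND_CONTINENTS = {"England", "China", "Africa", "South Africa"}
CITIES_AND_REGIONS = {"Boston", "Brooklyn", "New York", "George", "Cambridge"}
STOPLIST = {"George"}      # place names that in fiction are usually personal names
AMBIGUOUS = {"Cambridge"}  # Massachusetts or England? Left for a human to decide.

# Longest alternatives first, so that where one name is a prefix of another
# the regex prefers the longer name.
_ORDERED = sorted(COUNTRIES_AND_CONTINENTS | CITIES_AND_REGIONS, key=len, reverse=True)
NAME_RE = re.compile(r"\b(" + "|".join(map(re.escape, _ORDERED)) + r")\b")


def paired_region(after):
    """If `after` starts with ", <known place>", return (place, characters used)."""
    for larger in _ORDERED:
        m = re.match(r",\s*" + re.escape(larger) + r"\b", after)
        if m:
            return larger, m.end()
    return None


def classify(name, before, paired):
    """Return "accept", "reject", or "review" for one candidate name."""
    if name in STOPLIST:
        return "reject"
    if name in COUNTRIES_AND_CONTINENTS:
        verdict = "accept"                 # countries/continents need no cue
    elif before in {"in", "at"} or paired:
        verdict = "accept"                 # city/region with a place cue
    else:
        verdict = "reject"                 # bare city/region name: too risky
    if verdict == "accept" and name in AMBIGUOUS:
        return "review"
    return verdict


def extract(text):
    """Return (place, verdict) tuples in order of appearance."""
    results, consumed = [], 0
    for m in NAME_RE.finditer(text):
        if m.start() < consumed:           # already used as the larger half of a pair
            continue
        name = m.group(1)
        words = text[:m.start()].split()
        before = words[-1].lower() if words else ""
        pair = paired_region(text[m.end():]) if name in CITIES_AND_REGIONS else None
        verdict = classify(name, before, pair is not None)
        if pair and verdict != "reject":
            consumed = m.end() + pair[1]
            name = f"{name}, {pair[0]}"
        results.append((name, verdict))
    return results
```

Run on “He was born in Brooklyn, New York, and sailed to China. In George she had found a true friend. They met at Cambridge. Boston was cold.”, the sketch should accept “Brooklyn, New York” and “China,” reject “George” (stoplisted, even though “In” precedes it) and “Boston” (no place cue, so a missed location), and flag “Cambridge” for review.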
Some Thoughts on DH and Canons
January 29th, 2011
Below is a draft of the talk I’m giving next week in Austin for the first of three DH symposia this semester sponsored by the Texas Institute for Literary and Textual Studies. The theme of this first meeting is “Access, Authority, and Identity”; my paper is an attempt to think through some of the implications of working beyond the canon (however construed) for straight literary and cultural scholarship and for DH alike. It’s also a nice excuse to show a little preview of the geolocation work I’ve been doing recently.
A prettier PDF version is also available.
Undermining Canons
I have a point from which to start: Canons exist, and we should do something about them.
I wouldn’t have thought this was a dicey claim until I was scolded recently by a senior colleague who told me that I was thirty years out of date for making it, the idea being that we’d had this fight a generation ago and the canon had lost. But I was right and he, I’m sorry to say, was wrong. Ask any grad student reading for her comps, or any English professor who might confess to having skipped Hamlet. As I say, canons exist. Not, perhaps, in the Arnoldian–Bloomian sense of the canon, a single list of great books, and in any case certainly not the same list of dead white male authors that once defined the field. But in the more pluralist sense? Of books one really needs to have read to take part in the discipline? And of books many of us teach in common to our own students? Certainly. These are canons. They exist.
So why, a few decades after the question of canonicity as such was in any way current, do we still have these things? If we all agree that canons are bad, why haven’t we done away with them? Why do we merely tinker around the edges, adding a Morrison here and subtracting a Dryden there? Is this a problem? If so, what are we going to do about it? And more to the immediate point, what does any of this have to do with digital humanities?
The answer to the first question—“Why do we still have canons?”—is as simple to articulate as it is apparently difficult to solve. We don’t read any faster than we ever did, even as the quantity of text produced grows larger by the year. If we need to read books in order to extract information from them and if we need to have read things in common in order to talk about them, we’re going to spend most of our time dealing with a relatively small set of texts. The composition of that set will change over time, but it will never get any bigger. This is a canon. [Footnote: How many canons are there? The answer depends on how many people need to have read a given set of materials in order to constitute a field of study. This was once more or less everyone, but then the field was also very small when that was true. My best guess is that the number is at least a hundred at the very lowest end—and an order of magnitude or two more than that at the high end—which would give us a few dozen subfields in English, give or take. That strikes me as roughly accurate.]
Another way of putting this would be to say that we need to decide what to ignore. And the answer with which we’ve contented ourselves for generations is: “Pretty much everything ever written.” We don’t read much. What little we do read is deeply nonrepresentative of the full field of literary and cultural production. Our canons are assembled haphazardly, with a deep set of ingrained cultural biases that are largely invisible to us, and in ignorance of their alternatives. We’re doing little better, frankly, than we were with the dead-white-male bunch fifty or a hundred years ago, and we’re just as smug in our false sense of intellectual scope.
So canons, even in their current, mildly multiculturalist form, are an enormous problem, one that follows from our single working method, that is, from the need to perform always and only close reading as a means of cultural analysis. It’s probably clear where I’m going with this, at least to a group of DH folks. We need to do less close reading and more of anything and everything else that might help us extract information from and about texts as indicators of larger cultural issues. That includes bibliometrics and book historical work, data-mining and quantitative text analysis, economic study of the book trade and of other cultural industries, geospatial analysis, and so on. Moretti is an obvious model here, as is the work of people like Michael Witmore on early modern drama and Nicholas Dames on social structures in nineteenth-century fiction.
To show you one quick example of what I have in mind, here’s a map of the locations mentioned in thirty-seven American literary texts published in 1851:
There are some squarely canonical works in this collection, including Moby-Dick and The House of the Seven Gables, but the large majority are obscure novels by the likes of T. S. Arthur and Sylvanus Cobb. I certainly haven’t read many of them, nor am I likely to spend months doing so. The corpus is drawn from the Wright American Fiction collection and represents about a third of all the American literary works published that year. [Footnote: Why only a third? Those are all the texts available in machine-readable format at the moment.] Place names were extracted using a tool called GeoDict, which looks for strings of text that match a large database of named locations. I had to do a bit of cleanup on the extracted places, mostly because many personal names and common adjectives are also the names of cities somewhere in the world. I erred on the conservative side, excluding any of those I found and requiring a leading preposition for cities and regions, so if anything, I’ve likely missed some valid places. But the results are fascinating. Two points of interest, just quickly:
- For one, there are a lot more international locations than one might have expected. True, many of them are in Britain and western Europe, but these are American novels, not British reprints, so even that fact might surprise us. And there are also multiple mentions of locations in South America, Africa, India, China, Russia, Australia, the Middle East, and so on. The imaginative landscape of American fiction in the mid-nineteenth century appears to be pretty diversely outward looking in a way that hasn’t received much attention.
- And then—point two—there’s the distinct cluster of named places in the American south. At some level this probably shouldn’t be surprising; we’re talking about books that appeared just a decade before the Civil War, and the South was certainly on people’s minds. But it doesn’t fit very well with the stories we currently tell about Romanticism and the American Renaissance, which are centered firmly in New England during the early 1850s and dominate our understanding of the period. Perhaps we need to at least consider the possibility that American regionalism took hold significantly earlier than we usually claim.
So as I say, I think this is a pretty interesting result, one that demonstrates a first step in the kind of analysis that remains literary and cultural but that neither depends on close reading alone nor suffers the material limits such reading imposes. I think we should do more of this—not necessarily more geolocation extraction in mid-nineteenth-century American fiction (though what I just showed obviously doesn’t exhaust that little project), but certainly more algorithmic and quantitative analysis of piles of text much too large to tackle “directly.” (“Directly” gets scare quotes because it’s a deeply misleading synonym for close reading in this context.)
If we do that—shift more of our critical capacity to such projects—there will be a couple of important consequences. For one thing, we’ll almost certainly become worse readers. Our time is finite; the less of it we devote to an activity, the less we’ll develop our skill in that area. Exactly how much our reading suffers—and how much we should care—are matters of reasonable debate; they depend on both the extent of the shift and the shape of the skill–experience curve for close reading. My sense is that we’ll come out all right and that it’s a trade well worth making. We gain a lot by having available to us the kinds of evidence text mining (for example) provides, enough that the outcome will almost certainly be a net positive for the field. But I’m willing to admit that the proof will be in the practice and that the practice is, while promising, as yet pretty limited. The important point, though, is that the decay of close reading as such is a negative in itself only if we mistakenly equate literary and cultural analysis with their current working method.
Second—and maybe more important for those of us already engaged in digital projects of one sort or another—we’ll need to see a related reallocation of resources within DH itself. Over the last couple of decades, many of our most visible projects have been organized around canonical texts, authors, and cultural artifacts. They have been motivated by a desire to understand those (quite limited) objects more robustly and completely, on a model plainly derived from conventional humanities scholarship. That wasn’t a mistake, nor are those projects without significant value. They’ve contributed to our understanding of, for example, Rossetti and Whitman, Stowe and Dickinson, Shakespeare and Spenser. And they’ve helped legitimate digital work in the eyes of suspicious colleagues by showing how far we can extend our traditional scholarship with new technologies. They’ve provided scholars around the world—including those outside the centers of university power—with better access to rare materials and improved pedagogy by the same means. But we shouldn’t ignore the fact that they’ve also often been large, expensive undertakings built on the assumption that we already know which authors and texts are the proper ones to which to devote our scarce resources. And to the extent that they’ve succeeded, they’ve also reinforced the canonicity of their subjects by increasing the amount of critical attention paid to them.
What’s required for computational and quantitative work—the kind of work that undermines rather than reinforces canons—is more material, less elaborately developed. The Wright collection, on which the 1851 map that I showed a few minutes ago was based (Figure 1), is a partial example of the kind of resource that’s best suited to this next development in digital humanities research. It covers every known American literary text published in the U.S. between 1851 and 1875 and makes them available in machine-readable form with basic metadata. Google Books and the Hathi Trust aim for the same thing on a much larger scale. None of these projects is cheap. But on a per-volume basis, they’re not bad. And of course we got Google and Hathi for very little of our own money, considering the magnitude of the projects.
It will still cost a good deal to make use of these “bare” repositories, as we might call them. The time, money, and attention they demand will have to come from somewhere. My point, though, is that if (as seems likely) we can’t pull those resources from entirely new pools outside the discipline—that is to say, if we can’t just expand the discipline so as to do everything we already do, plus a great many new things—then we should be willing to make sacrifices not only in traditional or analog humanities, but also in the types of first-wave digital projects that made the name and reputation of DH. This will hurt, but it will also result in categorically better, more broadly based, more inclusive, and finally more useful humanities scholarship. It will do so by giving us our first real chance to break the grip of small, arbitrarily assembled canons on our thinking about large-scale cultural production. It’s an opportunity not to be missed and a chance to put our money—real and figurative—where our mouths have been for two generations. We’ve complained about canons for a long time. Now that we might do without them, are we willing to try? And to accept the trade-offs involved? I think we should be.
My 2011 MLA Session
January 6th, 2011
For those attending MLA in Los Angeles this week, I’ll be taking part in a “digital roundtable” organized by the ACH. Details below. Lots of smart people and interesting projects. The session abstract:
The Association for Computers and the Humanities (ACH) is pleased to sponsor an electronic roundtable and demo session featuring new and renewed work in media and digital literary studies. Projects, groups, and initiatives highlighted in this session build on the editorial and archival roots of humanities scholarship to offer new, explicitly methodological and interpretive contributions to the digital literary scene, or to intervene in established patterns of scholarly communication and pedagogical practice. Each presenter will offer a very brief introduction to his or her work, setting it in the context of digital humanities research and praxis, before we open the floor for simultaneous demos and casual conversations with attendees at eight computer stations.
A complete session description, including a list of presenters and individual project abstracts, is available on the ACH site. MLA’s session description (less info but with up-to-date annotations) is available to MLA members.
Session details:
- 193. New (and Renewed) Work in Digital Literary Studies
- Friday, 7 January
- 8:30–9:45 a.m., Plaza I, J. W. Marriott
Books I Read in 2010
January 5th, 2011
As I did last year, here’s a list of the books I read for the first time in 2010. Just fiction; no criticism, theory, journals, etc.
- Atwood, Margaret. Oryx and Crake.
- Burgess, Anthony. A Clockwork Orange.
- Camus, Albert. The Plague.
- Capek, Karel. R.U.R.
- Davis, Kathryn. The Thin Place.
- Donoghue, Emma. Room.
- Fowles, John. The French Lieutenant’s Woman.
- Gilb, Dagoberto. The Last Known Residence of Mickey Acuña.
- Golding, William. Lord of the Flies. (OK, I read this in high school, but that doesn’t count. Ditto Animal Farm, which I also reread this year, though I’m reluctant to cop to it.)
- Johnson, B.S. Albert Angelo.
- Kerouac, Jack. On the Road. (An exception here; a serious reread for the book manuscript.)
- Lee, Andrea. Lost Hearts in Italy.
- Mantel, Hilary. Wolf Hall.
- Markson, David. Wittgenstein’s Mistress.
- Millet, Lydia. Everyone’s Pretty.
- Mitchell, David. The Thousand Autumns of Jacob de Zoet.
- Peace, David. Occupied City.
- Petterson, Per. I Curse the River of Time.
- Powell, Padgett. The Interrogative Mood.
- Russo, Richard. Straight Man.
- Saro-Wiwa, Ken. Sozaboy.
- Williams, Joy. The Quick and the Dead.
- Yu, Charles. How to Live Safely in a Science Fictional Universe.
Oh, and I’m in the middle of Adrian Johns’ Piracy, which isn’t fiction, but which I’m totally reading for the plot. Does that count?
Read bits of a few others (Parrot and Olivier in America, Super Sad True Love Story, The Pregnant Widow, Death of the Adversary) to which I hope to return.
Should post some thoughts on these eventually. Or maybe something more formal for the new Post45 journal. We shall see.
First up in 2011: Alexander Theroux or Péter Esterházy, I think.
Finally and unrelated: I have awesome maps of nineteenth-century American fiction. More to come.
What To Do With Too Much Text
October 10th, 2010
Below are the slides from my talk on text mining, “What To Do with Too Much Text, or, Data Mining for the Humanities and Social Sciences,” given at the Washington University Center for Political Economy a few days ago (8 Oct. 2010). For those who weren’t there, the talk was primarily a survey of approaches to (mostly) humanities-oriented text analysis with examples drawn from literary studies, history, psychology, and political science. For a fuller treatment of the opening “Motivations” section, see this post. You might also want to check out the theoretical underpinnings of my own allegory project, about which I said relatively little.
The original slides are in Keynote and include embedded videos that don’t translate well to PowerPoint (and confuse SlideShare); rather than make a hash of things, I’ve put up a QuickTime version for people who don’t have access to Keynote. The Keynote file includes my (hopefully non-embarrassing) presenter notes, which may give a fuller sense of what I said at some points.
- The original Keynote presentation (23 MB)
- The QuickTime version (42 MB). Just slides, not a video of the talk. Click to advance through the stack.
- Plain HTML; lacks animations and videos, but it’s a lot faster to load and doesn’t require any other software.
Below are links to the projects and tools I mentioned (roughly in order of appearance).
Projects and Works Cited
- R.R. Bowker, U.S. publishing industry statistics.
- Monroe, Colaresi, and Quinn. “Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict.” Political Analysis, 16.4 (2008).
- Dan Cohen, “Searching for the Victorians.”
- Matt Jockers’ work on the geography of Irish-American literature.
- Jockers’ clustering work with Shakespeare and novel genres.
- Michael Witmore’s similar clustering studies using Docuscope. See also this draft version of Witmore and Hope’s forthcoming piece in Shakespeare Quarterly.
- John Burrows’ work on clustering novels and plays. See also many of the works cited in Burrows’ chapter.
- Elson, D. K., N. Dames, and K. R. McKeown. “Extracting Social Networks from Literary Fiction.” Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, 2010. 138-147. (PDF). See also my brief comments on this paper.
- Holtzman et al. on semantic measures of media bias (PDF). See also casstools.org.
- Cameron Blevins’ work on topic modeling and the diary of Martha Ballard.
Tools
There are many, many text analysis and natural language processing tools available, many of them geared toward specific research domains. I mentioned only a comparative handful. This list is a long way from exhaustive.
All projects are free and open source unless otherwise noted.
Built Tools
Good places to start; little or no programming required.
- Wordle. Word clouds. Noncommercial use only, I believe.
- WordHoard. Statistics, analytics, and visualizations of classic literature.
- GeoDict. Extract named places from unstructured text.
- Docuscope. A semi-publicly-available tool for text analysis backed by an extensive, hand-curated dictionary.
- Casstools.org. Contrast Analysis of Semantic Similarity. Evaluate differential word associations in text corpora.
- Voyeur Tools. Simple, Web-based text analytics. BYO text/corpus.
- The MONK Project. Integrated, Web-based corpus analysis. Uses only texts from the (relatively large) included corpus.
- SEASR. Packaged text analytics and development environment aimed at scholars in the humanities. Includes Zotero integration. SEASR pushes toward a full toolkit.
- And one tool that I didn’t have a chance to mention: Mark Olsen’s ARTFL-associated PhiloLine/PAIR. Sequence alignment detection in textual corpora; the analogy is to similar work in genetics.
Toolkits and Development Environments
Most of these packages come with demos and tutorials that may be useful on their own, but they’re aimed at allowing you to create your own text-mining applications.
- GATE. An advanced development environment for text analysis with included analysis routines.
- LingPipe. Advanced, Java-based natural language processing (NLP) toolkit. Partially integrated with GATE, but also a stand-alone product. Open source, but free only if you make your output texts freely available.
- NLTK. Well-documented, Python-based NLP toolkit. Used widely in teaching NLP.
- MALLET. Java-based, command-line package for statistical NLP. Useful for topic modeling, among many other things.
Statistics Packages
These packages don’t necessarily have anything to do with natural language analysis, but they’re useful for general statistical work and visualization.
- R. A platform for statistical computing. Baayen’s book on corpus linguistics with R is a useful introduction with a natural language focus.
- SPSS. The long-serving standard for stats in the social sciences. Emphatically not free, but widely site-licensed.
Hope this is of some use. Drop me a line (see the “About” page) if you spot any errors or want to chat about this work.
Expanded List of Allegorical-Nonallegorical Pairs
September 30th, 2010
Background
In an earlier post, I offered a brief list of paired allegorical and nonallegorical texts by single authors. The idea was to use these pairs to look for the distinguishing textual features of allegory by controlling for as many variables (such as authorial style, genre, national origin, gender, period of composition, etc.) as possible. Or in other words, the attempt was to get as close as possible to the unattainable ideal of a corpus of texts that differ only by the presence or absence of allegory.
That short list was OK and was the basis of the second figure in my MLA paper on “Critical Text Mining.” But it was both (1) too short for corpus work and (2) dependent on my own assessment of allegoricalness, with attendant limitations of historical scope. I’ve always felt that the better option would be to build an expanded version of this pairwise list on the basis of settled scholarship in the field.
The table below represents the groundwork for such a corpus of well-established allegorical-nonallegorical pairs. It’s still under development—there are obvious holes and issues—but it’s an outline of where I’m headed. What I really need now is feedback on the composition of this list.
Issues and Notes
A few notes, followed by a request for kind assistance:
- All of the allegorical works are attested by one or more of the following major sources on allegory. Most are attested by several of them.
- Copeland, Rita, and Peter Struck, eds. The Cambridge Companion to Allegory. Cambridge: Cambridge UP, 2010.
- Fletcher, Angus. Allegory: The Theory of a Symbolic Mode. Ithaca: Cornell UP, 1964.
- Honig, Edwin. Dark Conceit: The Making of Allegory. Hanover, NH: UP of New England, 1959.
- Leeming, David Adams, and Kathleen Morgan Drowne. Encyclopedia of Allegorical Literature. Santa Barbara, CA: ABC-CLIO, 1996.
- Tambling, Jeremy. Allegory. New York: Routledge, 2010.
- From these sources, I’ve excluded works mentioned only in passing or discussed as ambiguous or difficult cases. So while there’s always room to argue about the allegoricalness of any entry, the texts presented here under the heading of “Allegory” are about as canonically allegorical as it’s possible to be.
- The nonallegorical texts are another matter; I’ve selected them myself as potential pairings for the allegorical entries. So far I’ve limited these to works by the same author, but I’m not necessarily averse to well-paired nonallegorical entries by other authors (and I’m aware that such pairings will sometimes be required).
There are two ways to use this list, and therefore two potentially conflicting goals when selecting pairs of texts:
- Pairwise comparisons. In this case, I’ll evaluate each allegorical text only against its paired nonallegorical counterpart. For this purpose, it’s not especially important where the two texts fall on the imagined spectrum of allegoricalness, only that they be well separated from one another on it. But it is important that the two members of the pair are otherwise as similar as possible.
- Corpus comparisons. On the other hand, I’ll also want to compare the features of the allegorical texts taken together against those of the collected nonallegorical texts. For this purpose what’s important is to avoid cases in which any of the allegorical or nonallegorical entries stray too far toward the opposite category, even if they’re significantly different from their pairmates. But it’s not so crucial that any one pair be especially well matched in content, style, etc.; the two corpora just need to be similar in overall composition.
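The difference between these two uses of the list can be sketched in Python. Here each text is a list of tokens and the “feature” is a single word’s rate per 1,000 tokens; both are placeholder assumptions, since the features that actually distinguish allegory are what the project is trying to find. The pairwise version yields one difference per author, so a badly separated pair shows up on its own. The corpus version pools all allegorical and all nonallegorical tokens, so individual pairings matter less, but any text that strays toward the opposite category contaminates its whole pool.

```python
def rel_freq(tokens, feature):
    """Occurrences of `feature` per 1,000 tokens."""
    return 1000 * tokens.count(feature) / len(tokens)


def pairwise_differences(pairs, feature):
    """One allegory-minus-nonallegory difference per same-author pair."""
    return [rel_freq(alle, feature) - rel_freq(non, feature) for alle, non in pairs]


def corpus_difference(pairs, feature):
    """Pool every allegorical text and every nonallegorical text, then compare the pools."""
    allegorical = [tok for alle, _ in pairs for tok in alle]
    nonallegorical = [tok for _, non in pairs for tok in non]
    return rel_freq(allegorical, feature) - rel_freq(nonallegorical, feature)


# Toy token lists standing in for (allegory, nonallegory) pairs by the same author.
pairs = [
    (["virtue", "and", "vice", "virtue"], ["the", "ship", "sailed", "and"]),
    (["virtue"] + ["the"] * 4, ["virtue"] + ["the"] * 9),
]
print(pairwise_differences(pairs, "virtue"))  # [500.0, 100.0]
print(corpus_difference(pairs, "virtue"))
```

With these toy lists, the pooled comparison comes to about 261.9 per 1,000 tokens: a single summary that hides how unevenly the two pairs separate, which is the trade-off between the two goals described above.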
Action Item
So what I’m looking for is feedback on the suitability of the nonallegorical items that are currently listed below, plus suggestions for appropriate texts where none is given.
The ideal case is to find a firmly nonallegorical text by the same author for each of the allegorical entries, but where that’s not possible, the next best solution is probably a text of similar origin, style, length, subject matter, form, and so forth. This will never be perfect, but the closer the match—while still maintaining good relative and absolute separation on the allegorical continuum—the better.
I’d also love to know about potential issues or complications concerning any of these texts and pairings.
Oh, and one other constraint: I need to be able to get my hands on electronic versions of whatever texts I’m going to use; this makes anything published after 1923 difficult (though not strictly impossible).
Massive thanks in advance to any and all who care to comment. The comments section below is probably the easiest way to leave feedback, or you can email me by clicking the “About” link (over on the lefthand side).
The Table: Allegorical and Nonallegorical Text Pairs Grouped by Era
| Author | Allegory | Nonallegory | Notes |
|---|---|---|---|
| Ancient and classical | |||
| Aeschylus | Prometheus Bound | Agamemnon | Disputed authorship of Prometheus Bound |
| Aesop | Fables | ??? | |
| Hesiod | Theogony | Works and Days | |
| Boethius | Consolation of Philosophy | De Musica | De Musica seems unsuitable |
| Capella, Martianus | Marriage of Mercury and Philology | ??? | |
| Ovid | Metamorphoses | Amores | |
| Prudentius | Psychomachia | Cathemerinon | |
| Virgil | Aeneid | Georgics | |
| Anon. | Bible (Genesis) | ??? | Very likely more interpretational trouble than it’s worth |
| Medieval and Renaissance | |||
| Alain de Lille | Complaint of Nature | Liber poenitentialis | |
| Lorris, Guillaume de | Romance of the Rose | ??? | Other medieval romance? |
| Silvestris, Bernard | Cosmographia | ??? | Maybe commentary on Aeneid, but disputed authorship and different form |
| Bale, John | King John | ??? | Another play from the era? |
| Chaucer, Geoffrey | House of Fame | Troilus and Criseyde | |
| Chaucer, Geoffrey | Parliament of Fowles | Troilus and Criseyde | |
| Fletcher, Phineas | Purple Island | ??? | "Brittain’s Ida" (erotic poem)? |
| Gower, John | Confessio Amantis | Vox Clamantis | |
| Hawes, Stephen | Passetyme of Pleasure | Comfort of Lovers | |
| Kempe, Margery | Book of Margery Kempe | ??? | |
| Langland, William | Piers Plowman | ??? | |
| Lydgate, John | Reson and Sensualitie | King Henry VI’s Triumphal … | |
| Shakespeare, William | Phoenix and the Turtle | ??? | Appropriate sonnets? |
| Spenser, Edmund | Faerie Queene | Shepheardes Calender | Or Complaints |
| Anon. | Castle of Perseverance | ??? | |
| Anon. | Everyman | ??? | |
| Anon. | Pearl | ??? | |
| Alighieri, Dante | Divine Comedy | Vita Nuova | |
| Tasso, Torquato | Jerusalem Conquered | Aminta | |
| Calderón | Great Theater of the World | ??? | "Life Is a Dream" too allegorical? |
| 17th & 18th centuries | |||
| La Fontaine, Jean de | Fables | Tales | |
| Bunyan, John | Holy War | Grace Abounding | |
| Bunyan, John | Life and Death of Mr Badman | ||
| Bunyan, John | Pilgrim’s Progress | ||
| Defoe, Daniel | Robinson Crusoe | Journal of the Plague Year | |
| Dryden, John | Absalom and Achitophel | Annus Mirabilis | |
| Milton, John | Comus | Samson Agonistes | Samson Agonistes too allegorical? |
| Milton, John | Paradise Lost | ??? | Areopagitica? Genre/form mismatch. |
| Pope, Alexander | Dunciad | Rape of the Lock | |
| Swift, Jonathan | Battle of the Books | Modest Proposal | |
| Swift, Jonathan | Gulliver’s Travels | Argument Against Abolishing Christianity | |
| Swift, Jonathan | Tale of a Tub | ||
| 19th century British | |||
| Verne, Jules | Journey to the Center of the Earth | Twenty Thousand Leagues | Or "Around the World in 80 Days" |
| Butler, Samuel | Erewhon | Way of All Flesh | |
| Conrad, Joseph | Heart of Darkness | Lord Jim | |
| Darwin, Erasmus | Temple of Nature | Botanic Garden | |
| Gissing, George | Nether World | New Grub Street | |
| Kipling, Rudyard | Below the Mill-Dam | Young Men at the Manor | Better pairing? |
| Shelley, Mary | Frankenstein | Mathilda | |
| 19th century American | |||
| Baum, L. Frank | Wonderful Wizard of Oz | Queen Zixi of Ix | |
| Hawthorne, Nathaniel | Antique Ring | ??? | Suitable stories? |
| Hawthorne, Nathaniel | Birthmark | ??? | |
| Hawthorne, Nathaniel | Rappaccini’s Daughter | ??? | |
| Hawthorne, Nathaniel | Scarlet Letter | House of the Seven Gables | |
| Melville, Herman | Confidence-Man | Israel Potter | |
| Melville, Herman | Mardi | Typee | |
| Melville, Herman | Moby-Dick | Omoo | |
| Modern | |||
| Čapek, Karel | R.U.R. | ??? | |
| Čapek, Karel | War with the Newts | ??? | |
| Kafka, Franz | Castle | ??? | |
| Kafka, Franz | Country Doctor | ??? | |
| Kafka, Franz | Metamorphosis | Description of a Struggle | |
| Kafka, Franz | Trial | Amerika | |
| Camus, Albert | Plague | First Man | |
| Huxley, Aldous | Brave New World | Point Counter Point | "Crome Yellow" (and maybe "Antic Hay") are public domain |
| Orwell, George | 1984 | Burmese Days | |
| Orwell, George | Animal Farm | Road to Wigan Pier | |
| Mann, Thomas | Mario and the Magician | Buddenbrooks | |
| Yeats, William Butler | Dialogue of Self and Soul | Second Coming | |
| Zamyatin, Yevgeny | We | Islanders | |
| Hurston, Zora Neale | Moses, Man of the Mountain | Their Eyes Were Watching God | |
| Contemporary | |||
| Golding, William | Lord of the Flies | The Scorpion God | The Inheritors |
| Lewis, C. S. | Lion, the Witch, and the Wardrobe | ??? | |
| Rushdie, Salman | Midnight’s Children | Fury | Or Ground Beneath Her Feet or Moor’s Last Sigh |
| Beckett, Samuel | Waiting for Godot | All That Fall | Suitable nonallegorical drama? |
| Nabokov, Vladimir | Lolita | Ada | |
| Coetzee, J.M. | Waiting for the Barbarians | Boyhood | Or Youth/Summertime |
| Barth, John | Giles Goat-Boy | Sot-Weed Factor | |
| Ellison, Ralph | Invisible Man | ??? | |
| Faulkner, William | Fable | The Hamlet | |
| Ginsberg, Allen | Howl | Kaddish | |
| Kesey, Ken | One Flew over the Cuckoo’s Nest | Sometimes a Great Notion | |
| O’Connor, Flannery | Violent Bear It Away | ??? | Wise Blood too allegorical |
Publishing Stats from the UK
September 29th, 2010
A quick follow-on to my previous post on the number of novels published annually in the U.S. I’ve now seen roughly comparable figures for the UK from 1994 through 2008 (via Dan Cohen, with thanks for the pointer).
The UK numbers come from Nielsen and aren’t broken down by category, but the overall picture is that in recent years there have been about half as many total English-language volumes published annually in the UK as in the U.S. I don’t know whether Brits are proportionately bigger readers of fiction than Americans, but I’d say the large-scale assumption that the two markets for fiction are of the same general magnitude (within about a factor of two) is reasonable.
What I’d still like to know is the portion of their annual output that’s in common. Are twenty percent of novels published in one country also published in the other? Fifty percent? Eighty? And are novels more (or less?) internationally “portable” than other kinds of books?
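One way to frame the overlap question is as set arithmetic on the two countries’ title lists. The Python sketch below assumes, hypothetically, that one has plain lists of novel titles for each market and that a crudely normalized title is a good enough matching key. In practice it is not (the first Harry Potter novel, for example, appeared under different titles in the UK and the U.S.), so edition- or work-level matching would be needed. The function names are illustrative, and no real publishing data is involved.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Crude matching key: strip accents and punctuation, lowercase, drop a leading article."""
    decomposed = unicodedata.normalize("NFKD", title)
    t = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    t = re.sub(r"[^\w\s]", "", t.lower()).strip()
    t = re.sub(r"^(the|a|an)\s+", "", t)
    return re.sub(r"\s+", " ", t)


def shared_fractions(us_titles, uk_titles):
    """Return (share of U.S. titles also in the UK list, share of UK titles also in the U.S. list)."""
    us = {normalize_title(t) for t in us_titles}
    uk = {normalize_title(t) for t in uk_titles}
    common = us & uk
    us_share = len(common) / len(us) if us else 0.0
    uk_share = len(common) / len(uk) if uk else 0.0
    return us_share, uk_share
```

The two fractions are reported separately because the lists differ in size: if the UK list were roughly half the size of the U.S. one and every UK title also appeared in the U.S., the shared portion would be 100 percent from the UK side but only about 50 percent from the U.S. side.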



