Some POS Frequency Factoids

I’ll be posting a couple of times in the next few days about DH ’09, THATCamp, and the state of my project. First, though, a handful of (mildly) interesting plots concerning part-of-speech frequency correlations from the MONK corpus.

MONK contains about 1,000 novels and novel-like works spread over the eighteenth, nineteenth, and twentieth centuries. (The full corpus is larger and covers a longer timespan; it includes drama, witchcraft narratives, some nonfiction, etc.) I’ve counted occurrences of the major POS types across just the narrative fiction, divided them up by year of publication, and then grouped together a few nearby years in which few or no books were included. In the end, there’s coverage from 1742 through 1905, with all years (or groups of years) containing at least 500,000 words by four or more authors and no group spanning more than five years. This is the same dataset from which I’ll construct some POS frequency vs. time graphs in a later post (where I’ll also link to the raw counts).

First, two cases that are easy to anticipate and serve as a kind of check that things aren’t too far off:

[Figure: Adjective frequency vs. noun frequency]

[Figure: Adverb frequency vs. verb frequency]

About what you’d expect: a decent positive correlation between the frequency of nouns or verbs and the frequency of words that modify them. Slightly weaker correlation in the adverb case, presumably because adverbs don’t always modify verbs.
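For anyone who wants a number to go with the scatterplots, here is a minimal sketch of how a correlation like this could be computed from per-period counts. It is not how the plots above were made (those came out of GGobi), and the counts in `toy_periods` are invented purely for illustration; they are not MONK data. The field names and the helper `pearson` are likewise assumptions of mine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented toy counts (NOT from MONK): one dict per year or group of years.
toy_periods = [
    {"words": 500_000, "noun": 100_000, "verb": 90_000},
    {"words": 520_000, "noun": 105_000, "verb": 88_000},
    {"words": 540_000, "noun": 110_000, "verb": 85_000},
]

# Convert raw counts to relative frequencies so that periods of
# different sizes are comparable, then correlate the two series.
noun_freq = [p["noun"] / p["words"] for p in toy_periods]
verb_freq = [p["verb"] / p["words"] for p in toy_periods]
r = pearson(noun_freq, verb_freq)
print(f"noun vs. verb frequency: r = {r:.3f}")
```

The one design point worth noting is the normalization by total words per period; correlating raw counts would mostly measure how big each period's sample happens to be.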

Then there’s an interesting case that I think I can explain, but wouldn’t have predicted:

[Figure: Noun frequency vs. verb frequency]

Noun and verb frequency are inversely correlated. This makes sense, I suppose, if you think of novels as tending toward portraiture or action (and for all I know it may be a well-known phenomenon). But I expected to see more nouns imply more verbs, since you’d need more things for those subjects and objects to do. In any case, I learned something here from my few minutes with GGobi.

Finally, one that leaves me at a loss:

[Figure: Adjective frequency vs. adverb frequency]

How can adjectives and adverbs be apparently uncorrelated? Shouldn’t there be flowery novels rich in both of them and plain ones rich in neither? I’ll investigate, but in the meantime I’d love to be told that this, too, is already accounted for.

Last note: GGobi is really nifty, even if it doesn’t produce beautiful figures out of the box (see above).

The Shakespeare Industry

Loosely apropos Ed Finn’s panel at DH on Pynchon, Matt Jockers and I were trying to guess the most-published-upon author in English. I figured Shakespeare; he suggested Joyce. This morning I ran a couple of quick queries on the MLA database and came up with the following:

Published      Shakespeare    Joyce
2008+                  716      151
2004+                 3826      937
1999+                 8159     2135
All (1923+)          35489     9315

There are some details to explain, but the take-away point is that Shakespeare seems to be the object of about four times as much scholarship as Joyce.

The details: These are raw result counts for the subject queries “Shakespeare William” and “Joyce James,” both of which are defined subject headings in MLA. The counts are total matching items of all types (journal articles, refereed journal articles, books, chapters, and other) published from the listed year to the present. I didn’t make any attempt to distinguish major from minor works (e.g., books from articles), nor single-subject studies from multi-subject ones. This is obviously pretty non-rigorous, but it was good enough to satisfy my passing curiosity.
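Because each row of the table counts everything from the listed year to the present, the rows overlap. A quick sketch of the arithmetic, using only the counts from the table: it computes the Shakespeare-to-Joyce ratio for each cumulative row, then subtracts adjacent rows to get ratios for non-overlapping windows. The variable names are mine, and treating the difference between adjacent rows as a clean window is an assumption about how the database dates its entries.

```python
# Cumulative MLA result counts from the table: (Shakespeare, Joyce),
# ordered from the most recent cutoff to the full range.
counts = {
    "2008+": (716, 151),
    "2004+": (3826, 937),
    "1999+": (8159, 2135),
    "All (1923+)": (35489, 9315),
}

# Ratio for each cumulative row as given.
cumulative_ratios = {
    label: shakespeare / joyce
    for label, (shakespeare, joyce) in counts.items()
}

# Subtract each row from the next (larger) one to isolate the items
# that fall in the older range but not in the newer one.
labels = list(counts)
window_ratios = {}
for newer, older in zip(labels, labels[1:]):
    s_new, j_new = counts[newer]
    s_old, j_old = counts[older]
    window = f"{older} excluding {newer}"
    window_ratios[window] = (s_old - s_new) / (j_old - j_new)

for label, ratio in cumulative_ratios.items():
    print(f"{label:<28} {ratio:.2f}")
for window, ratio in window_ratios.items():
    print(f"{window:<28} {ratio:.2f}")
```

The cumulative ratios run from about 4.7 for 2008+ down to about 3.8 for the full range, and the non-overlapping windows fall between roughly 3.6 and 4.0. On these counts, at least, the gap is not narrowing in recent years.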

This is interesting and at least a little unexpected to me. I figured Shakespeare would be in the lead, especially over the full history of criticism, but I thought things would be much closer, especially in recent years. I wonder if part of the gap might be explained by a higher likelihood of talking about Shakespeare in any given English renaissance context than about Joyce in any given modernist one?

The Formal Charge Against Lurie

A bit more on Disgrace. Talking things over with Liz Evans, she pointed out that the specific charge leveled against Lurie by Melanie Isaacs isn’t entirely clear; we’re never told anything beyond the fact that it involves an alleged breach of “article 3.1 of the university’s Code of Conduct,” which “addresses victimization or harassment of students by teachers” and is a subsection of article 3, concerning “victimization or harassment on grounds of race, ethnic group, religion, gender, sexual preference, or disability” (38-39). Nor do we see the content of Melanie’s statement to the committee (which statement Lurie claims not to have read, though it has been provided to him). [Footnote: There’s also the technical charge of irregularity in grading and recordkeeping, but that is obviously a subsidiary matter, probably best understood as a gesture toward bureaucratic verisimilitude and an attempt to raise the probability of conviction by including a lesser but more easily proven allegation.] The members of the committee refer to the charges alternately as involving “harassment,” “abuse,” and “exploitation.” But it seems unlikely—this was Liz’s point—that they involve rape; if they did, it’s hard to imagine the committee entertaining the possibility that Lurie would retain his position at the university (which does appear to be the suggestion, provided he is willing to make a sincere apology and undergo counseling, etc.).

Why is this important? Because it’s part of the analogy between Melanie’s mistreatment and Lucy’s, hence of the structural and allegorical parallel between colonial violence and retributive justice. I hadn’t noticed this fact concerning Melanie’s accusation, but it adds another important way in which she resembles Lucy; they both present a legal claim to the authorities, but withhold from their accounts any mention of rape. This strengthens the parallel between the two women, and thus reinforces our obligation to make sense of the similarities and differences in the way they’re treated, in the ways they respond to that treatment, and in their respective social and historical positions.

Klaaste on Disgrace

I just received a copy, via ILL on microfilm from Johannesburg, of an opinion piece on Disgrace from the Sowetan (by Aggrey Klaaste, 3 April 2000, p. 9). It’s one that I had seen referenced in a couple of places, but had never before been able to read. Nothing of interest as literary criticism, but it’s a potentially useful fragment of documentation concerning the novel’s initial political reception in South Africa. If you’re interested, I’ve put up a marginal-quality PDF copy (what can I say, it’s a scan of a printout from microfilm).

Also: The fact that it took two months to get a hold of this article—and that there were many more that were simply unobtainable at a major U.S. research university with a diligent ILL department—illustrates part of the problem with doing politically and culturally informed work on Coetzee outside South Africa. It’s certainly not impossible, and I don’t mean to overplay the difficulty, but the primary sources are much trickier to track down than I expected them to be.

Also also: Microfilm apparently still exists.

Debt and Punishment

A bit more on the function of debt in Disgrace, especially with respect to the TRC.

I’ve been trying to figure out how punishment works as a form of compensation for the victims of ethical and legal wrongs (which are not the same thing). In an earlier post, I moved away from the idea that legal (i.e., state-sanctioned, tribunal-mediated) punishment was intended to provide a compensatory satisfaction to those who have been wronged. I think this is generally true, at least as a theoretical principle of modern law; it’s one of the reasons, for instance, that “victims’ rights” remains a marginal concept. (The other being, of course, that the state is now understood to intervene between perpetrator and victim, so that the victim isn’t a proper party to the exercise of legal justice.)

But what about institutions like the TRC, which I claimed in my ACLA paper do serve a significantly compensatory function? How so? Well, it’s partially that they provide something of value to the victim in a context where more direct compensation in the form of reparations, significant socioeconomic restructuring, etc. is unlikely. But there’s more to it than that, I think. The desire of victims to receive a “sincere” apology—illustrated at some length in Coetzee’s novel through both the reactions of Melanie’s family and David’s souring relationship with Petrus—is a desire to see the perpetrator suffer the pangs of conscience. The victim in such cases enjoys, takes satisfaction in, the perpetrator’s self-punishment, which is more harsh than most of what can be otherwise imposed on the perpetrator (as evinced by the obvious inadequacy, to most of those involved, of Lurie merely losing his job absent any sincere expression of contrition). So there’s a sense in which legal punishment, even today, is intended to provide a kind of repayment to the victim, and this is especially true in the case of (pseudo) tribunals like the TRC that are intended to address large-scale historical wrongs.

Note, too, that all this is a theory of law and rights that’s presupposed by and enables the bad reading of Coetzee developed and critiqued in my paper, not something that I’m insisting is necessarily the basis of all contemporary law. In any case, though, Coetzee is in a way law-agnostic; his real analysis is of ethics or morality, which he is at pains to show work differently. That’s the point of Lurie’s encounter with the commission, which mixes law and ethics in a way that doesn’t work well for either one.

MorphAdorner Release

The first public release of MorphAdorner (version 0.9, released April 3, 2009) is now available. There’s full documentation, too. Congratulations and many thanks to Phil Burns; this is great news.

I discussed MorphAdorner as part of my series of posts on part-of-speech taggers a couple of months back, and will be using it for much of my upcoming work.

My understanding is that Phil intends to leave MorphAdorner mostly as-is for the time being, unless it’s taken up by another project; MONK has been funding current development, I think, and it (MONK) is winding down. Which reminds me: A public version of the MONK workbench, with a bevy of analytical tools and access to several thousand texts across four-plus centuries, should be available soon. Will post here when it’s up, though I’m not involved in making that happen.

Disgrace and Debt

A quick follow-up to this past weekend’s ACLA conference. My seminar was on literature and law; it was interesting and useful indeed, despite being a bit to the side of what I usually do. If you’re interested, I’ve posted a copy of my paper (PDF); a more formal treatment is in the works.

The talk argues that while a legal framework in which ethical wrongs are treated as analogous to economic debts is probably inevitable, Coetzee’s novel shows how this economic treatment is inadequate as a basis of moral action. Specifically, the economic analogy enables—maybe even requires—bad readings of Disgrace, ones in which Lurie becomes the true victim insofar as he pays out of all proportion for his offense against Melanie. If we drop the debt model, we can also avoid this problem.

Joey Slaughter, one of the co-conveners of the seminar, objected that contemporary understandings of the law are not in fact based on such an economic model, which struck him as “premodern” (of the eye-for-an-eye type). I see the point, which is similar to Foucault’s analysis in Discipline and Punish; we no longer think that the task of punishment is to provide a compensatory enjoyment for the victim, whether the victim in question is understood to be the directly harmed individual or the state/corporate body as a whole. All true, and I may have drifted perilously close to suggesting something along those lines. But my point had less to do with victims or with punishment as payment than it did with an “account balance” of sorts for the perpetrator. The idea—which I was suggesting underlies the principle of proportionality in sentencing and that Coetzee rejects as an adequate account of morality (but not necessarily of law)—is that punishments should deprive the perpetrator of any surplus or advantage accumulated through his offense (plus an additional deterrent amount, though I didn’t raise that point in the talk for lack of time).

The easier version is when the offense is straightforwardly economic, though even then it’s not dead simple. If I steal $10 from you, I’ll need to repay that amount, plus a deterrent amount, plus whatever we collectively deem appropriate for the inherent damage caused by a violation of the law (related, for instance, to the fact that we all feel less secure once we’ve experienced the fact of theft). It’s harder—and this is one of the novel’s points—when the violation in question is non- or supra-economic (as with rape, exploitation, etc.). But in either case, the idea isn’t to repay the victim by allowing her to enjoy the perpetrator’s suffering (which plainly doesn’t work, as the novel demonstrates at length), it’s to deprive the perpetrator of his illegitimately accrued advantage. What the proper balance should be is a tricky question, but it’s also what the law must do. Ethics, on the other hand (and this is my reading of Coetzee), doesn’t let you off even after you’ve paid a compensatory amount; there is nothing you can do to fix or to balance your sins, and no amount of your suffering offsets them. If you’ve been wronged in turn, you don’t break even at some point, ethically speaking—you just go on being wrong. True, you’ve now been wronged, too, but that’s of a different order; there’s no universal ethical as opposed to legal account to settle.

Depressing stuff, perhaps, but then Coetzee isn’t the author for joy. More on this to come at some point.

[Update: See also this follow-up post and the series on baseline questions about Disgrace.]

Contemporary U.S. Novel Syllabus

I’m finished with a draft of my Contemporary U.S. Novel syllabus for next semester. I’ve posted a copy of the full syllabus (PDF) and a little flier about the course (also PDF).

The primary texts, with dates of publication and page counts:

  • David Foster Wallace, Infinite Jest (1996, 1104 pp.)
  • Barbara Kingsolver, The Poisonwood Bible (1998, 576 pp.)
  • Colson Whitehead, John Henry Days (2001, 389 pp.)
  • Jonathan Safran Foer, Extremely Loud and Incredibly Close (2005, 368 pp.)
  • Junot Díaz, The Brief Wondrous Life of Oscar Wao (2007, 352 pp.)
  • Rivka Galchen, Atmospheric Disturbances (2008, 256 pp.)

There will be a handful of theoretical and critical readings as well (Jameson, Chow, Hayles, Hardt, Zadie Smith, others). The initial list of primary texts was about five times as long, but, well, semesters are short.

A post on Coetzee tomorrow, then back to DH/computational stuff for a while.

Contemporary Canon?

A question occasioned by the fact that only two people out of the dozen or so in my ACLA seminar today had read Disgrace: Can you think of any single work of fiction written in the last decade that one could reasonably expect nearly everyone in a room full of literature professors to have read? I would have said Disgrace, and I’d have been wrong.