Does anyone have an opinion on the relative merits of the various part-of-speech taggers? I’ve used (and had decent luck with) LingPipe, which seems pretty quick and very accurate in my limited tests. I also just read a post by Matthew Jockers about the Stanford Log-linear Part-Of-Speech Tagger (which is what got me thinking about this; I admit I was largely sucked in by the discussion of Xgrid, which I’d really like to try). And I thought the Cornell NLP folks had one, too, though I now can’t find any reference to it, so I may well be wrong. Plus there’s MONK/Northwestern’s MorphAdorner (code not yet generally available, though I don’t think it would be a problem to get it), and any number of commercial options (less attractive, for many reasons).
I surely just need to test a bunch of them in some semi-systematic way (something like the sketch below), but is there any existing consensus about what works best for literary material?
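By semi-systematic I mean something like the following: run each tagger over the same literary sample, dump its output as one token/tag pair per line, hand-correct one copy to use as a gold standard, and score everything against it. A minimal Python sketch of that scoring step; the file names are placeholders, and the tags would have to be mapped onto a common tagset first, since the taggers don't all use the same one:

```python
# Score several taggers' output against a hand-corrected gold standard.
# File names (gold.tsv, lingpipe.tsv, etc.) are hypothetical placeholders;
# each file is expected to hold one "token<TAB>tag" pair per line.

from collections import Counter

def read_tagged(path):
    """Read a file of token<TAB>tag lines into a list of (token, tag) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                token, tag = line.split("\t")
                pairs.append((token, tag))
    return pairs

def score(gold, predicted):
    """Per-token accuracy plus a tally of the most common confusions."""
    assert len(gold) == len(predicted), "tokenization must match across taggers"
    correct = 0
    confusions = Counter()
    for (_, gold_tag), (_, pred_tag) in zip(gold, predicted):
        if gold_tag == pred_tag:
            correct += 1
        else:
            confusions[(gold_tag, pred_tag)] += 1
    return correct / len(gold), confusions.most_common(10)

if __name__ == "__main__":
    gold = read_tagged("gold.tsv")
    for name in ("lingpipe.tsv", "stanford.tsv", "morphadorner.tsv"):
        accuracy, worst = score(gold, read_tagged(name))
        print(f"{name}: {accuracy:.3%}")
        for (gold_tag, pred_tag), n in worst:
            print(f"  gold {gold_tag} tagged as {pred_tag}: {n}x")
```

The confusion tally matters as much as the raw accuracy for literary text, since the errors tend to cluster on a few categories (archaic forms, dialogue, verse lineation) rather than spreading evenly.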
I think this area has been neglected because as soon as POS taggers hit 98% accuracy on *some* corpus, researchers moved on to statistical parsing…
Some pointers:
http://portal.acm.org/citation.cfm?id=520794.878799&coll=&dl=
atwell.pdf
http://nora.hd.uib.no/corpora/1997-3/0161.html