Michael Witmore has a new post up at Wine Dark Sea reporting further Docuscope clustering results for Shakespeare’s plays. I don’t have much to add, but comments are disabled on the site, and I do have a question. In his earlier work using principal components, he found that Othello clustered with the comedies. Using the new method reported today (based on “language action types”), that’s not the case. Or is it? When Witmore “standardizes” the texts, Othello returns to the comedies (it’s closest to Twelfth Night and Measure for Measure). So my question is: What is “standardization,” and why should it have so large a negative effect on clustering accuracy? (Othello isn’t the only play that changes places under standardization; as Witmore observes, the standardized results are much less eerily perfect than the nonstandardized ones.)
Great blog, Matthew. I have a sense of what the answer might be, now that I’ve thought about it for a while. Let me post your comment to my blog and then respond. (Sorry, I forgot to turn the comments on for the post.)
Mike
Thanks, Mike. I’d intended to email you about this yesterday, but didn’t quite get to it. Glad you saw the trackback, and very interested in the answer.
Also, embarrassed apologies for having found an ‘h’ in your name where none exists. A randomization procedure on my part, perhaps? Now fixed.