Shakespearean Clustering

Michael Witmore has a new post up at Wine Dark Sea on further clustering results using Docuscope on Shakespeare’s plays. I don’t have much to add, but comments are disabled on the site, and I do have a question: In his earlier work using principal components, he found that Othello clustered with the comedies. Using the new method reported today (based on “language action types”), that’s not the case. Or is it? When Witmore “standardizes” the texts, Othello returns to the comedies (it’s closest to Twelfth Night and Measure for Measure). So my question is: What is “standardization,” and why should it have so great a negative effect on clustering accuracy? (Othello isn’t the only play that changes places under standardization; as Witmore observes, the standardized results are much less eerily perfect than the nonstandardized ones.)

2 thoughts on “Shakespearean Clustering”

  1. Great blog, Matthew. I have a sense of what the answer might be, now that I’ve thought about it for a while. Let me post your comment to my blog and then respond. (Sorry, I forgot to turn the comments on for the post.)


    • Thanks, Mike. I’d intended to email you about this yesterday, but didn’t quite get to it. Glad you saw the trackback, and very interested in the answer.

      Also, embarrassed apologies for having found an ‘h’ in your name where none exists. A randomization procedure on my part, perhaps? Now fixed.
