
Literary and Linguistic Computing 2002 17(2):157-180; doi:10.1093/llc/17.2.157
© 2002 by Association for Literary & Linguistic Computing

Frequent Word Sequences and Statistical Stylistics

David L. Hoover

New York University, New York, NY, USA

This paper investigates the relative effectiveness and accuracy of multivariate analysis, specifically cluster analysis, of the frequencies of very frequent words and the frequencies of very frequent word sequences in distinguishing texts by different authors and grouping texts by a single author. Cluster analyses based on frequent words are fairly accurate for groups of texts by known authors, whether the texts are long sections of modern British and US novels or shorter sections of contemporary literary critical texts, but they are only rarely completely accurate. When frequent word sequences are used instead of frequent words or in addition to them, however, the accuracy of the analyses often improves, sometimes dramatically, especially when personal pronouns are eliminated. Analyses based on frequent sequences even provide completely correct results in some cases where analyses based on frequent words fail. They also produce superior results for small groups of problematic novels and critical texts extracted from the larger corpora. Such successes suggest that analyses based on frequent word sequences constitute improved tools for authorship and stylistic studies.
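The analysis described above (counting very frequent words or word sequences, then clustering texts on those frequencies) can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the paper. Texts are lowercased and split into words, and contiguous n-word sequences are counted. The `top_k` most frequent features across the corpus are kept and their relative frequencies are z-scored. Texts are then grouped by average-linkage hierarchical clustering on Euclidean distance. The pronoun list, the reading of "eliminating" pronouns as dropping any sequence that contains one, the linkage and distance choices, the function and parameter names, and the toy texts are all hypothetical and for illustration only.

```python
import math
import re
from collections import Counter

# Assumed list of English personal pronouns (illustrative, not the paper's list).
PERSONAL_PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours", "you", "your", "yours",
    "he", "him", "his", "she", "her", "hers", "it", "its",
    "they", "them", "their", "theirs",
}


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sequences(tokens, n):
    """Return every contiguous n-word sequence as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_counts(texts, n=1, drop_pronouns=False):
    """Count n-word sequences per text (n=1 gives single words).

    If drop_pronouns is True, any sequence containing a personal pronoun is
    skipped (an assumed reading of "eliminating" pronouns).
    """
    result = []
    for text in texts:
        seqs = sequences(tokenize(text), n)
        if drop_pronouns:
            seqs = [s for s in seqs if not PERSONAL_PRONOUNS & set(s.split())]
        result.append(Counter(seqs))
    return result


def zscore_matrix(counts, top_k):
    """Relative frequencies of the corpus-wide top_k features, z-scored per feature."""
    total = Counter()
    for c in counts:
        total.update(c)
    features = [f for f, _ in total.most_common(top_k)]
    rows = []
    for c in counts:
        size = sum(c.values()) or 1  # sequences counted in this text
        rows.append([c[f] / size for f in features])
    for j in range(len(features)):
        col = [row[j] for row in rows]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        for row in rows:
            row[j] = (row[j] - mean) / sd
    return rows, features


def average_linkage(rows, labels, k):
    """Agglomerative clustering: repeatedly merge the pair of clusters with the
    smallest mean Euclidean distance between members until k clusters remain."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        return sum(math.dist(rows[i], rows[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(labels[i] for i in c) for c in clusters]


if __name__ == "__main__":
    # Invented toy texts: two "authors" with distinct habitual word sequences.
    labels = ["A1", "A2", "B1", "B2"]
    texts = [
        "the cat sat on the mat and the cat sat on the rug",
        "the cat sat on the chair and the cat sat on the bed",
        "in the end of the day it was in the end of the night",
        "in the end of the week it was in the end of the year",
    ]
    rows, _ = zscore_matrix(feature_counts(texts, n=2), top_k=10)
    print(average_linkage(rows, labels, k=2))  # [['A1', 'A2'], ['B1', 'B2']]
```

Setting `n=1` gives a frequent-word analysis, and `n=2` or `n=3` gives frequent word sequences. The two feature sets could be concatenated to approximate an analysis that uses words and sequences together. Real studies use much longer texts and many more features. Library routines such as SciPy's `scipy.cluster.hierarchy.linkage` provide the same kind of clustering with other linkage methods.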