Literary and Linguistic Computing Advance Access published online on October 1, 2007
Literary and Linguistic Computing, doi:10.1093/llc/fqm023
Bigrams of Syntactic Labels for Authorship Discrimination of Short Texts
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Correspondence: Graeme Hirst, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, M5S 3G4. E-mail: gh{at}cs.toronto.edu
Abstract
We present a method for authorship discrimination based on the frequency of bigrams of syntactic labels produced by partial parsing of the text. We show that this method, alone or combined with other classification features, achieves high accuracy in discriminating the work of Anne Brontë from that of Charlotte Brontë, a task that is very difficult with traditional methods. Moreover, high accuracy is achieved even on fragments of text little more than 200 words long.
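
To make the feature in the abstract concrete, the following minimal sketch computes relative frequencies of adjacent pairs of syntactic labels. It is an illustration under stated assumptions, not the authors' implementation: the function name `label_bigram_frequencies`, the label inventory (`NP`, `VP`, `PP`), and the sample sequence are all hypothetical, and the partial parser that would produce such labels is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_bigram_frequencies(labels: List[str]) -> Dict[Tuple[str, str], float]:
    """Return the relative frequency of each adjacent pair of syntactic labels.

    `labels` is assumed to be the sequence of labels that a partial parser
    emits for one text fragment, in reading order.
    """
    # Pair each label with its successor: (l0, l1), (l1, l2), ...
    bigrams = list(zip(labels, labels[1:]))
    if not bigrams:
        return {}
    counts = Counter(bigrams)
    total = len(bigrams)
    # Normalise by the number of bigrams so fragments of different
    # lengths yield comparable feature values.
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical label sequence for a short fragment (illustrative only).
labels = ["NP", "VP", "NP", "PP", "NP", "VP", "NP"]
freqs = label_bigram_frequencies(labels)

for pair, freq in sorted(freqs.items()):
    print(pair, round(freq, 3))
```

The resulting dictionary could serve as a feature vector for a classifier, either on its own or concatenated with other features. Which classifier to use is left open here, since this section does not specify one.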