© 1996 by Association for Literary & Linguistic Computing
Feature-finding for text classification
Bristol Stylometry Research Unit, Department of Mathematical Sciences, University of the West of England, Bristol BS16 1QY, UK. Email: rs-forsyth@csm.uwe.ac.uk
Stylometrists have proposed and used a wide variety of textual features or markers, but until recently very little attention has been focused on the question: where do textual features come from? In many text-categorization tasks the choice of textual features is a crucial determinant of success, yet it is typically left to the intuition of the analyst. We argue that it would be desirable, at least in some cases, if this part of the process were less dependent on subjective judgement. Accordingly, this paper compares five different methods of textual feature finding that do not need background knowledge external to the texts being analysed (three proposed by previous stylometrists, two devised for this study). As these methods do not rely on parsing or semantic analysis, they are not tied to the English language. Results of a benchmark test on ten representative text-classification problems suggest that the technique here designated Monte-Carlo Feature Finding has certain advantages that deserve consideration by future workers in this area.
