© 2002 by Association for Literary & Linguistic Computing
Automatically Categorizing Written Texts by Author Gender
1 Department of Mathematics and Computer Science, Bar-Ilan University, Israel
2 Department of Computer Science, Jerusalem College of Technology, Israel
Automatically determining the gender of a document's author would appear to be a more subtle problem than categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy. The same techniques can be used to determine whether a document is fiction or non-fiction with approximately 98 per cent accuracy.
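As a rough illustration of what categorizing texts from simple lexical features can look like, the sketch below turns a document into relative frequencies of a few function words and trains a plain perceptron on such vectors. Everything in it is an assumption made for illustration: the word list `FUNCTION_WORDS`, the helper names, and the perceptron itself are hypothetical stand-ins, not the feature set or learning algorithm used in the article, and syntactic features (for example part-of-speech sequences) are left out entirely.

```python
import re
from collections import Counter

# Hypothetical feature list: a handful of common English function words.
# The abstract does not say which lexical features were actually used.
FUNCTION_WORDS = ["the", "a", "of", "and", "she", "he", "with", "for", "not", "to"]


def lexical_features(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def train_perceptron(vectors, labels, epochs=20):
    """Train a plain perceptron (a stand-in learner); labels must be +1 or -1."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the boundary toward y
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, vector):
    """Return +1 or -1 depending on which side of the boundary `vector` falls."""
    score = sum(w * xi for w, xi in zip(weights, vector)) + bias
    return 1 if score > 0 else -1
```

In use, each labelled training document would be passed through `lexical_features`, the resulting vectors and their +1/-1 class labels given to `train_perceptron`, and an unseen document classified with `predict`. Relative rather than raw counts are used so that documents of different lengths remain comparable.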