
Literary and Linguistic Computing 2004 19(2):221-242; doi:10.1093/llc/19.2.221
© 2004 by Association for Literary & Linguistic Computing

Discriminating the Registers and Styles in the Modern Greek Language-Part 2: Extending the Feature Vector to Optimize Author Discrimination

George Tambouratzis (1), Stella Markantonatou (1), Nikolaos Hairetakis (1), Marina Vassiliou (1), George Carayannis (1) and Dimitrios Tambouratzis (2)

(1) Institute for Language and Speech Processing, Greece
(2) Agricultural University of Athens, Greece

This article describes a method for discriminating among authors within a given register of Modern Greek. The focus here is to determine to what extent the stylistic differences among authors can be detected with a high degree of accuracy for a set of texts belonging to a well-defined register. To that end, the chosen register is characterized by a well-defined sub-language, from which a corpus of more than 1,000 documents has been created. To discriminate the texts according to author style, a series of experiments have been performed using statistical techniques. Each text has been represented by a vector covering several linguistic aspects, in an effort to determine the most effective style markers. The experimental results indicate that the proposed approach can successfully separate the author styles for a given register. An extensive study of the effectiveness of the different variable categories has been performed. For instance, diglossia information on its own is not sufficient for author discrimination. Instead, a systematic evaluation process indicates that part-of-speech, structural and algorithmically derived lemma-frequency variables are the most important style markers, their use leading to an author discrimination accuracy exceeding 90%.
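As a rough illustration of the idea of representing each text by a feature vector and then separating authors statistically, the following minimal sketch may help. It is not the authors' method: the abstract does not name the statistical techniques used, so a nearest-centroid classifier over standardized features is substituted here as an assumption. The structural features (words per sentence, mean word length) and the `MARKER_WORDS` list are hypothetical stand-ins for the article's structural and lemma-frequency variables; part-of-speech and diglossia variables are omitted because they would require a tagger and Greek-specific resources.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical marker words standing in for lemma-frequency variables.
# The article derives its lemmas algorithmically from a Greek corpus.
MARKER_WORDS = ["the", "of", "and", "we"]


def feature_vector(text):
    """Map a text to [words per sentence, mean word length, marker frequencies...]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    structural = [
        len(words) / len(sentences),               # average sentence length
        sum(len(w) for w in words) / len(words),   # average word length
    ]
    lexical = [counts[w] / len(words) for w in MARKER_WORDS]
    return structural + lexical


def train(labelled_texts):
    """Fit per-feature scaling and one centroid per author (assumed technique)."""
    vectors = [(author, feature_vector(t)) for author, t in labelled_texts]
    n = len(vectors)
    dims = len(vectors[0][1])
    means = [sum(v[i] for _, v in vectors) / n for i in range(dims)]
    stds = []
    for i in range(dims):
        var = sum((v[i] - means[i]) ** 2 for _, v in vectors) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
    by_author = defaultdict(list)
    for author, v in vectors:
        by_author[author].append([(x - m) / s for x, m, s in zip(v, means, stds)])
    centroids = {
        author: [sum(col) / len(rows) for col in zip(*rows)]
        for author, rows in by_author.items()
    }
    return means, stds, centroids


def predict(model, text):
    """Assign the text to the author whose centroid is nearest."""
    means, stds, centroids = model
    z = [(x - m) / s for x, m, s in zip(feature_vector(text), means, stds)]
    return min(centroids, key=lambda author: math.dist(z, centroids[author]))
```

A toy usage, with invented English sentences in place of the Greek corpus: train on a handful of `(author, text)` pairs via `train(...)`, then call `predict(model, new_text)`. Evaluating which variable categories matter most, as the article does, would amount to repeating such an experiment with different subsets of the feature vector.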


This article has been cited by other articles:


G. Tambouratzis and M. Vassiliou
Employing Thematic Variables for Enhancing Classification Accuracy Within Author Discrimination Experiments
Lit Linguist Computing, June 1, 2007; 22(2): 207 - 224.