


Literary and Linguistic Computing Advance Access published online on March 17, 2008

Literary and Linguistic Computing, doi:10.1093/llc/fqn004

© The Authors 2008. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Corpus Tools and Methods, Today and Tomorrow: Incorporating Linguists’ Manual Annotations

Nicholas Smith

School of English, Sociology, Politics & Contemporary History, University of Salford, Manchester, UK

Sebastian Hoffmann

Department of Linguistics & English Language, Lancaster University, Lancaster, UK

Paul Rayson

Department of Computing, Lancaster University, Lancaster, UK

Correspondence: Nicholas Smith, School of English, Sociology, Politics & Contemporary History, University of Salford, Manchester M5 4WT, UK. E-mail: n.smith{at}salford.ac.uk

Abstract

Today's corpus tools offer the user a wide range of features that greatly facilitate the linguistic analysis of large amounts of authentic language data (e.g. frequency distributions, collocations, and keywords). However, these tools typically fail to address the linguist's fundamental need to add interpretive information to a concordance or query result by coding individual concordance lines, in a flexible way, for structural, functional, discoursal, and other features. The ability to add such qualitative data is indispensable to a fuller understanding of the phenomenon under investigation, as it allows the linguist to produce more rigorous descriptions of, and theories about, language in use.

Our article has two aims: first, to assess the merits and drawbacks of existing solutions by surveying what can be achieved using state-of-the-art corpus tools and generic database software; second, to draw up a set of desiderata and recommendations for incorporating flexible encoding features into future corpus tools. We describe an initial step in this direction: a recent enhancement to the BNCweb corpus analysis software. More generally, we hope our suggestions will lead linguists and software developers to work together more closely, so that the available technology provides for the needs of linguists.





