


Literary and Linguistic Computing Advance Access published online on June 11, 2009

Literary and Linguistic Computing, doi:10.1093/llc/fqp024
All versions of this article: 24/3/363 (most recent), fqp024v1

© The Author 2009. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

SusTEInability of linguistic resources through feature structures

Andreas Witt

Institut für Deutsche Sprache, Mannheim, Germany

Georg Rehm

vionto GmbH, Berlin, Germany

Erhard Hinrichs

Tübingen University, General and Computational Linguistics, Germany

Timm Lehmberg

Hamburg University, SFB Multilingualism, Germany

Jens Stegmann

Bielefeld University, Faculty of Linguistics and Literary Studies, Germany

Correspondence: Andreas Witt, Institut für Deutsche Sprache, R 5, 6-13, D-68161 Mannheim, Germany. E-mail: witt@ids-mannheim.de

   Abstract

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora are annotated using markup languages that are based on the Annotation Graph framework or the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation involves separating the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which in turn can be represented by the TEI and ISO tag set for feature structures. The article discusses the mapping process and representational issues, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
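The idea of implicitly linked annotation layers can be illustrated with a minimal sketch. It is not the authors' implementation: the two sample layers, the element names inside them, and the helper functions `text_of`, `layers_aligned`, and `to_feature_structure` are all hypothetical and chosen only for illustration. The sketch checks that two layers share identical textual content and wraps one annotated span in a TEI-style `<fs>` element with an `<f>` feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layers over the same primary text
# (illustrative only, not taken from the article).
SYNTAX_LAYER = "<s><np>The dog</np> <vp>barks</vp></s>"
PHONOLOGY_LAYER = "<utt><syl>The</syl> <syl>dog</syl> <syl>barks</syl></utt>"


def text_of(xml_string):
    """Return the concatenated character data of an XML document."""
    return "".join(ET.fromstring(xml_string).itertext())


def layers_aligned(*layers):
    """Layers count as implicitly linked when their text is identical."""
    return len({text_of(layer) for layer in layers}) == 1


def to_feature_structure(element):
    """Encode one annotation element as a TEI-style <fs> with one <f>."""
    fs = ET.Element("fs", {"type": element.tag})
    feature = ET.SubElement(fs, "f", {"name": "text"})
    ET.SubElement(feature, "string").text = "".join(element.itertext())
    return fs


aligned = layers_aligned(SYNTAX_LAYER, PHONOLOGY_LAYER)
noun_phrase = ET.fromstring(SYNTAX_LAYER).find("np")
np_fs = ET.tostring(to_feature_structure(noun_phrase), encoding="unicode")
```

Here `aligned` reports whether both layers annotate the same character sequence, and `np_fs` holds the serialized feature structure for the noun phrase. A real mapping would carry many more features per node and would need to respect the full TEI schema.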





