© 1993 by Association for Literary & Linguistic Computing
Corpus Annotation Schemes
Lancaster University UK
This paper explains the nature of corpus annotation, as an automatic or machine-aided procedure for adding interpretative information to a text corpus. It proposes principles or standards to be applied to corpus annotation. It also describes and illustrates different levels of corpus annotation: prosodic, morphosyntactic, syntactic, semantic, and pragmatic/discoursal. Up to the present, the first three of these have been most fully developed, and grammatical tagging software, in particular, has become almost commonplace.
The paper suggests that certain criteria of success have to be met. From the annotator's point of view, these include speed of application, accuracy, and consistency. From the user's point of view, there is often a need for delicacy of analysis. This may conflict both with the need for speed of application and with the need to meet the requirements of a wide range of potential end-users, for whom a consensual analysis, not strongly wedded to any particular theoretical position, may be desirable.