Literary and Linguistic Computing Advance Access originally published online on February 22, 2005
Literary and Linguistic Computing 2005 20(1):91-102; doi:10.1093/llc/fqh045
Named Entity Recognition for the Mainland Scandinavian Languages
University of Oslo, Norway; Gothenburg University, Sweden; University of Bergen, Norway; University of Southern Denmark, Denmark; Centre for Language Technology, Denmark
Janne Bondi Johannessen, The Text Laboratory, University of Oslo, PO Box 1102 Blindern, 0317 Oslo, Norway. E-mail: jannebj{at}ilf.uio.no
In this paper we discuss the results of the Nomen Nescio Named Entity Recognition project, a joint effort for the mainland Scandinavian languages: Norwegian, Swedish, and Danish. Five research groups were involved, developing NE recognizers with both rule-based and statistical methods. We focus particularly on the choice of semantic categories and on the problems posed by metonymy and semantic polysemy. Furthermore, we discuss how different approaches to these problems affect the different types of systems, and examine two strategies, which we call Function over Form and Form over Function.
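To make the metonymy problem concrete, here is a minimal sketch of the simplest kind of rule-based recognizer: a gazetteer lookup. It is not one of the Nomen Nescio systems. The gazetteer contents, the function name `tag_tokens`, and the category labels `LOC`/`ORG`/`O` are hypothetical and chosen only for illustration. Because each surface form maps to exactly one category, the lookup assigns the same tag to a name whether it denotes a place or is used metonymically, for example for a sports team.

```python
# Hypothetical gazetteer-based NE tagger; NOT the Nomen Nescio recognizers.
from typing import Dict, List, Tuple

# Hypothetical gazetteer: each surface form maps to exactly one category.
GAZETTEER: Dict[str, str] = {
    "Oslo": "LOC",
    "Bergen": "LOC",
    "Telenor": "ORG",
}


def tag_tokens(tokens: List[str], gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each token by exact gazetteer lookup; unknown tokens get 'O'."""
    return [(tok, gazetteer.get(tok, "O")) for tok in tokens]


# "We lived in Oslo": Oslo is a place.
literal = tag_tokens("Vi bodde i Oslo".split(), GAZETTEER)
# "Oslo won the match": Oslo plausibly refers to a team, yet it is still tagged LOC.
metonymic = tag_tokens("Oslo vant kampen".split(), GAZETTEER)

print(literal)
print(metonymic)
```

A context-free lookup like this can only tag by form. Distinguishing the two uses of "Oslo" requires evidence from the surrounding context, which is where the choice of strategy discussed in the paper becomes relevant.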