Literary and Linguistic Computing Advance Access published online on April 20, 2006
Literary and Linguistic Computing, doi:10.1093/llc/fql020
1 Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany
* To whom correspondence should be addressed.

In this article, we describe our interdisciplinary project Rule-based search in text databases with nonstandard orthography (RSNSR), which supports the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. In addition to its present features, further functions are currently being developed, among them automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD), which aims to build, for the first time, a complete diachronic corpus of German with texts from the ninth century (Old High German) to the present (Modern German).
Original Papers
Rule-based Search in Text Databases with Nonstandard Orthography
Thomas Pilz 1 *, Wolfram Luther 1, Norbert Fuhr 1, and Ulrich Ammon 2
2 Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany
Thomas Pilz, E-mail: pilz{at}informatik.uni-duisburg.de
This article has been cited by other articles:
T. Pilz, A. Ernst-Gerlach, S. Kempken, P. Rayson, and D. Archer. The Identification of Spelling Variants in English and German Historical Texts: Manual or Automatic? Lit Linguist Computing, April 1, 2008; 23(1): 65-72.
