Literary and Linguistic Computing Advance Access originally published online on July 15, 2009
Literary and Linguistic Computing 2009 24(4):449-466; doi:10.1093/llc/fqp025
Dictionary generation for less-frequent language pairs using WordNet
Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan
National Institute of Information and Communications Technology, Kyoto, Japan
Correspondence: István Varga, PhD student, Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan. E-mail: dyn36150{at}dipfr.dip.yz.zyamagata-u.ac.jp
Abstract
Bilingual dictionaries are vital resources in many areas of natural language processing. Many machine translation methods require bilingual dictionaries with large coverage, but less-frequent language pairs rarely have digitized resources of this kind. Since the need for these resources is increasing while human resources are scarce for less-represented languages, efficient automated methods are imperative. This article presents a fully automated, robust, intermediate language-based method of bilingual dictionary generation that uses the WordNet of the intermediate language to build a new bilingual dictionary. We propose the use of WordNet to increase accuracy; we also introduce a bidirectional selection method with a flexible threshold to maximize recall. Our evaluation showed 79% accuracy and 51% weighted recall, outperforming representative pivot language-based methods. A dictionary generated with this method will still need manual post-editing, but the improved recall and precision reduce the work required of human correctors.
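To make the general idea of pivot language-based generation concrete, the following is a minimal Python sketch of a generic baseline. It is not the method of this article: it uses no WordNet information, and its fixed `threshold` only loosely stands in for the flexible threshold mentioned above. The function names (`invert`, `pivot_candidates`), the Jaccard overlap score, and the toy dictionary entries are all assumptions introduced for illustration.

```python
from collections import defaultdict


def invert(dictionary):
    """Reverse a {word: set_of_translations} mapping."""
    inverted = defaultdict(set)
    for word, translations in dictionary.items():
        for translation in translations:
            inverted[translation].add(word)
    return dict(inverted)


def pivot_candidates(src_to_pivot, pivot_to_tgt, threshold=0.5):
    """Pair source and target words through a pivot language.

    Hypothetical scoring: the Jaccard overlap between the pivot words of
    the source entry and the pivot words that lead to the target word.
    Looking from both sides is a crude bidirectional check; pairs scoring
    below `threshold` are dropped.
    """
    tgt_to_pivot = invert(pivot_to_tgt)
    result = {}
    for src, src_pivots in src_to_pivot.items():
        # Forward step: every target word reachable through any pivot word.
        candidates = set()
        for pivot in src_pivots:
            candidates |= pivot_to_tgt.get(pivot, set())

        scored = {}
        for tgt in candidates:
            # Backward step: pivot words that translate to this target word.
            tgt_pivots = tgt_to_pivot[tgt]
            shared = src_pivots & tgt_pivots
            score = len(shared) / len(src_pivots | tgt_pivots)
            if score >= threshold:
                scored[tgt] = score
        if scored:
            result[src] = scored
    return result


# Toy entries invented for illustration; they are not data from the article.
src_to_pivot = {"part": {"shore", "bank", "coast"}}
pivot_to_tgt = {
    "shore": {"kishi"},
    "coast": {"kishi", "kaigan"},
    "bank": {"kishi", "ginkou"},
}

strict = pivot_candidates(src_to_pivot, pivot_to_tgt, threshold=0.5)
loose = pivot_candidates(src_to_pivot, pivot_to_tgt, threshold=0.3)
```

In this toy case the stricter threshold keeps only the candidate that shares all three pivot words with the source entry, while the looser one also admits candidates reached through a single ambiguous pivot word. This is the kind of trade-off between precision and recall that the choice of threshold controls.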