Literary and Linguistic Computing Advance Access originally published online on May 2, 2007
Literary and Linguistic Computing 2007 22(3):271-290; doi:10.1093/llc/fqm009
Multivariate Analysis of Finnish Dialect Data—An Overview of Lexical Variation
Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, P.O. Box 68, FI–00014, Finland
Department of Computer Science, University of Helsinki, Research Institute for the Languages of Finland
Research Institute for the Languages of Finland
Correspondence: Saara Hyvönen, Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, P.O. Box 68, FI–00014, Finland. E-mail: saara.hyvonen{at}cs.helsinki.fi
Abstract
During the process of writing a comprehensive dictionary of Finnish dialects, a large set of maps describing the regional distribution of dialect words has been compiled in electronic form. In this article, we set out to analyse this corpus of data in order to gain new insight into the variation of Finnish dialects. We use a wide range of multivariate data analysis methods, including principal components analysis, independent components analysis, clustering, and multidimensional scaling. We explain how to preprocess the data to overcome the problem of uneven sampling caused by the way the data has been collected. We discuss the results obtained by these methods and compare them to the traditional view of Finnish dialect groups.
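To make the first of the methods named above concrete, the following is a minimal sketch of principal components analysis applied to a presence/absence matrix of the kind such dialect maps suggest. It is an illustration only, not the authors' code, data, or preprocessing: the matrix layout (rows as municipalities, columns as words), the toy values, and the function name `first_principal_component` are all assumptions made for this example, and the leading component is found by simple power iteration rather than by any method the article specifies.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (component, scores) for the leading principal component.

    rows: list of equal-length lists, one per observation
          (here, hypothetically, one per municipality).
    """
    n = len(rows)
    d = len(rows[0])

    # Centre each column (word) on its mean over all rows.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix of the columns (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiplying by the covariance matrix
    # converges to its leading eigenvector. The start vector is non-uniform
    # so that it is unlikely to be orthogonal to that eigenvector.
    v = [float(j + 1) for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]

    # Project every centred row onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Hypothetical toy data: 1 means the word is attested in the municipality.
# The last word occurs everywhere and therefore carries no variation.
toy_matrix = [
    [1, 1, 0, 0, 1],  # "western" municipality A
    [1, 1, 0, 0, 1],  # "western" municipality B
    [0, 0, 1, 1, 1],  # "eastern" municipality C
    [0, 0, 1, 1, 1],  # "eastern" municipality D
]

component, scores = first_principal_component(toy_matrix)
for label, score in zip("ABCD", scores):
    print(label, round(score, 3))
```

In this toy case the two hypothetical groups receive scores of opposite sign on the first component, which is the kind of regional contrast a principal components analysis is meant to expose. The overall sign of a component is arbitrary, and real dialect data would first need the treatment of uneven sampling that the article describes.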