Literary and Linguistic Computing Advance Access originally published online on September 6, 2006
Literary and Linguistic Computing 2006 21(4):477-492; doi:10.1093/llc/fql038
The Relative Contribution of Pronunciational, Lexical, and Prosodic Differences to the Perceived Distances between Norwegian Dialects
Scandinavian Languages and Cultures, University of Groningen, Groningen, The Netherlands
Humanities Computing, University of Groningen, Groningen, The Netherlands
Correspondence: Charlotte Gooskens, Scandinavian Languages and Cultures, University of Groningen, Postbus 716, NL-9700 AS Groningen, The Netherlands. E-mail: c.s.gooskens{at}rug.nl
In the period between 1999 and 2002, Jørn Almberg and Kristian Skarbø compiled a database which consists of recordings and phonetic transcriptions of translations of the fable The North Wind and the Sun in about fifty Norwegian dialects. On the basis of fifteen of these recordings, Charlotte Gooskens carried out a perception experiment (Gooskens and Heeringa, 2004). In this experiment she investigated the distances between the fifteen dialects as perceived by the speakers themselves.
On the basis of the phonetic transcriptions, Wilbert Heeringa (2004) measured computational linguistic distances between the fifteen Norwegian varieties (Gooskens and Heeringa, 2004). Distances were calculated by means of Levenshtein distance, which finds the minimum cost of changing one pronunciation into another by inserting, substituting or deleting phonetic segments. Gooskens and Heeringa (2004) correlated the perceptual distances with these computational distances and found a significant correlation of r = 0.67. In the computational distances, pronunciational, lexical, and morphological variation is processed, but these levels are not studied separately.
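The Levenshtein computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a uniform cost of 1 for every insertion, deletion, and substitution, whereas the measure used in the study may weight operations differently. The function name `levenshtein` and the example segment lists are hypothetical and are not taken from the Norwegian data.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Return the minimum total cost of changing `source` into `target`.

    `source` and `target` are sequences of phonetic segments (for example,
    lists of IPA or SAMPA symbols). Uniform unit costs are an assumption
    made for this sketch.
    """
    # prev[j] holds the cost of turning the first i-1 source segments
    # into the first j target segments; row 0 is insertions only.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        # Turning i source segments into an empty target takes i deletions.
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,                         # delete s
                curr[j - 1] + ins_cost,                     # insert t
                prev[j - 1] + (0 if s == t else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical segment sequences, used only to exercise the function.
word_a = ["a", "b", "c"]
word_b = ["a", "d", "c", "e"]
print(levenshtein(word_a, word_b))
```

Working on lists of segments rather than raw strings keeps multi-character transcription symbols intact as single units. In practice the raw distance is often normalized by word or alignment length so that long words do not dominate; whether and how that is done here is not stated in this abstract.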
The contribution of this article is that we measure pronunciational, lexical, and prosodic distances separately. Within pronunciational distances we distinguish between consonants and vowels on the one hand, and between substitutions and insertions/deletions on the other. Correlating the separate levels with the perceptual distances and applying multiple linear regression analyses, we found that pronunciation is the most important level in perception, with vowel substitutions playing an especially large role.
1 See http://hyde.park.uga.edu/lamsas.
2 The recordings and the transcriptions (in IPA as well as in SAMPA) were made by Jørn Almberg in cooperation with Kristian Skarbø at the Department of Linguistics, NTNU, Trondheim and made available at http://www.ling.hf.ntnu.no/nos/. We are grateful for their permission to use the material.
3 The example should not be interpreted as a historical reconstruction of the way in which one pronunciation changed into another. We just show that the distance between two arbitrary pronunciations is found on the basis of the least costly set of operations mapping one pronunciation into another.
4 See http://www.phon.ucl.ac.uk/home/wells/cassette.htm.
5 The program PRAAT is a free public-domain program developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam and is available at http://www.fon.hum.uva.nl/praat.
6 If there are fifteen dialects, there are (15 × (15 − 1))/2 = 105 dialect pairs. Per dialect pair, there are at most fifty-eight word pairs, so the reader may expect a total of 105 × 58 = 6090 Levenshtein distances. The higher number of 18801 results from the fact that some words appear more than once in the text; for example, nordavinden 'the North Wind' usually appears four times, which increases the number of Levenshtein calculations per word pair.
7 In seven cases we found missing transcriptions, namely for the dialects of Herøy (two cases), Lesja (one case), Stjørdal (two cases), Trondheim (one case), and Verdal (one case).
8 Although our example is hypothetical, the pronunciations used here are existing ones, which are found in our set of fifteen Norwegian dialects.