Literary and Linguistic Computing Advance Access originally published online on April 12, 2007
Literary and Linguistic Computing 2007 22(2):137-150; doi:10.1093/llc/fqm006
How to Handle Small Samples: Bootstrap and Bayesian Methods in the Analysis of Linguistic Change
Institute for Informatics, Martin-Luther University, Halle/Saale, Germany
HIIT Basic Research Unit, University of Helsinki and Helsinki University of Technology, Finland
Department of English, University of Helsinki, Finland
Correspondence: Prof. Heikki Mannila, Helsinki Institute of Information Technology, P.O. Box 68, 000140 University of Helsinki, Finland. E-mail: mannila{at}cs.helsinki.fi
Abstract
Estimating the relative frequencies of linguistic features is a fundamental task in linguistic computation. As the amount of text or speech that is available from a given user of the language typically varies greatly, and the sample sizes tend to be small, the most straightforward methods do not always give the most informative answers. Bootstrap and Bayesian methods provide techniques for handling the uncertainty in small samples. We describe these techniques for estimating frequencies from small samples, and show how they can be applied to the study of linguistic change. As a test case, we use the introduction of the pronoun you as subject in the data provided by the Corpus of Early English Correspondence (c. 1410–1681).
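To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It estimates the relative frequency of a feature from a small sample with a percentile bootstrap interval and with a Beta-Binomial posterior. The function names, the uniform Beta(1, 1) prior, and the counts in the usage note are illustrative assumptions and do not come from the article.

```python
import random


def bootstrap_interval(hits, total, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the relative frequency hits/total."""
    rng = random.Random(seed)
    # Represent the observed sample as 1 (feature present) or 0 (absent).
    sample = [1] * hits + [0] * (total - hits)
    # Resample with replacement and recompute the frequency each time.
    estimates = sorted(
        sum(rng.choices(sample, k=total)) / total for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


def beta_posterior_mean(hits, total, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the frequency under an assumed Beta(prior_a, prior_b) prior."""
    return (hits + prior_a) / (total + prior_a + prior_b)


def beta_credible_interval(hits, total, prior_a=1.0, prior_b=1.0,
                           n_draws=20000, level=0.95, seed=0):
    """Equal-tailed credible interval, approximated by sampling the Beta posterior."""
    rng = random.Random(seed)
    draws = sorted(
        rng.betavariate(hits + prior_a, total - hits + prior_b)
        for _ in range(n_draws)
    )
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = int((1 + level) / 2 * n_draws) - 1
    return draws[lo_idx], draws[hi_idx]


if __name__ == "__main__":
    # Hypothetical counts: 3 occurrences of the feature in 10 relevant contexts.
    print(bootstrap_interval(3, 10))
    print(beta_posterior_mean(3, 10), beta_credible_interval(3, 10))
```

With a hypothetical writer who shows 0 occurrences in 5 contexts, every bootstrap resample also contains 0 occurrences, so the bootstrap interval collapses to a single point at zero, while the posterior mean under the assumed uniform prior is 1/7. This is one way to see why the choice of method matters when samples are small.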
This article has been cited by other articles:
M. Hilpert and S. Th. Gries. Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Lit Linguist Computing, December 1, 2009; 24(4): 385-401.
