© 2000 by Association for Literary & Linguistic Computing
Measuring vocabulary diversity using dedicated software
The University of Reading, School of Education, Bulmershe Court, Reading RG6 1HY, UK. Corresponding author e-mail: b.j.richards@reading.ac.uk
This paper describes software (vocd) that implements a solution to problems encountered in quantifying vocabulary diversity. Researchers in various fields of linguistic enquiry have calculated vocabulary diversity using the ratio of different words (Types) to total words (Tokens) - the Type-Token Ratio (TTR) - or measures derived from it. Such measures are flawed, however, because the values obtained are related to the number of words in the sample. The paper shows how the relationship between TTR and sample size can be described by a new mathematical model, which in turn leads to an innovative method of measuring vocabulary diversity. The software automates measurement from transcripts prepared in a widely used computer-readable set of conventions: the CHAT format of the CHILDES project. Options in vocd are described to show how the user can determine which linguistic items will count as valid types and tokens in the analysis. The new measure is calculated by, first, randomly sampling words from the transcript to produce a curve of the TTR against Tokens for the empirical data. Then the software finds the best fit between this empirical curve and theoretical curves calculated from the model by adjusting the value of a parameter. The parameter, D, is shown to be a valid and reliable measure of vocabulary diversity without the problems of sample size found with previous methods.
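The sample-then-fit procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the vocd implementation. The abstract does not give the model equation, so the form used here, TTR = (D/N)[(1 + 2N/D)^(1/2) - 1], is taken from the way the vocd model is usually reported in the later literature. The sample sizes (35 to 50 tokens), the 100 trials per size, sampling without replacement, the search bounds, and all function names are likewise assumptions chosen for illustration.

```python
import math
import random


def model_ttr(n, d):
    """Theoretical TTR for n tokens given diversity parameter d.

    Assumed model form (not stated in the abstract):
    TTR = (D/N) * (sqrt(1 + 2N/D) - 1).
    """
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def empirical_curve(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Mean TTR of random samples (without replacement) at each sample size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        total = 0.0
        for _ in range(trials):
            sample = rng.sample(tokens, n)
            total += len(set(sample)) / n  # types / tokens for this sample
        curve.append((n, total / trials))
    return curve


def fit_d(curve, lo=0.01, hi=500.0, tol=1e-4):
    """Least-squares estimate of D by golden-section search on [lo, hi].

    Assumes the squared-error function has a single minimum in the interval.
    """
    def sse(d):
        return sum((ttr - model_ttr(n, d)) ** 2 for n, ttr in curve)

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        e = a + phi * (b - a)
        if sse(c) < sse(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0
```

A call such as `fit_d(empirical_curve(tokens))` takes a list of at least 50 word tokens and returns one estimate of D. Because the sampling is random, repeated runs with different seeds give slightly different values, so averaging several runs is a reasonable way to obtain a more stable figure.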