
Literary and Linguistic Computing 2000 15(3):323-338; doi:10.1093/llc/15.3.323
© 2000 by Association for Literary & Linguistic Computing

Measuring vocabulary diversity using dedicated software

G McKee, D Malvern and B Richards*

The University of Reading, School of Education, Bulmershe Court, Reading RG6 1HY, UK
* Corresponding author. E-mail: b.j.richards@reading.ac.uk

This paper describes software (vocd) that implements a solution to problems encountered in quantifying vocabulary diversity. Researchers in various fields of linguistic enquiry have calculated vocabulary diversity using the ratio of different words (Types) to total words (Tokens) - the Type-Token Ratio (TTR) - or measures derived from it. Such measures are flawed, however, because the values obtained are related to the number of words in the sample. The paper shows how the relationship between TTR and sample size can be described by a new mathematical model, which in turn leads to an innovative method of measuring vocabulary diversity. The software automates measurement from transcripts prepared in a widely used computer-readable set of conventions: the CHAT format of the CHILDES project. Options in vocd are described to show how the user can determine which linguistic items will count as valid types and tokens in the analysis. The new measure is calculated by, first, randomly sampling words from the transcript to produce a curve of the TTR against Tokens for the empirical data. Then the software finds the best fit between this empirical curve and theoretical curves calculated from the model by adjusting the value of a parameter. The parameter, D, is shown to be a valid and reliable measure of vocabulary diversity without the problems of sample size found with previous methods.
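
The procedure summarised in the abstract can be illustrated with a minimal sketch. This is not the vocd source code. The abstract does not state the model equation; the form used here, TTR = (D/N)(sqrt(1 + 2N/D) - 1), is the one generally attributed to the authors' model and should be treated as an assumption. The function names, the search bounds and the ternary-search fit are hypothetical choices made for illustration only.

```python
import math
import random


def ttr(tokens):
    """Type-Token Ratio: number of distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)


def model_ttr(n, d):
    """Assumed theoretical curve: TTR = (D/N) * (sqrt(1 + 2N/D) - 1)."""
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def empirical_curve(tokens, sizes, trials, rng):
    """For each sample size, average the TTR of `trials` random subsamples.

    Each subsample is drawn without replacement from the token list.
    Returns a list of (sample_size, mean_ttr) pairs.
    """
    return [
        (n, sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials)
        for n in sizes
    ]


def fit_d(curve, lo=0.1, hi=500.0, iters=100):
    """Estimate D by minimising squared error between model and curve.

    Uses a ternary search over [lo, hi]; the bounds are illustrative and
    the search assumes the error has a single minimum in that interval.
    """
    def error(d):
        return sum((model_ttr(n, d) - t) ** 2 for n, t in curve)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if error(m1) < error(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```

Given a list of word tokens `words` with at least 50 entries, a call such as `fit_d(empirical_curve(words, range(35, 51), 100, random.Random(0)))` would return an estimate of D. The sample sizes and number of trials in that call are illustrative values, not settings taken from the abstract. Because the fit uses the whole curve of TTR against Tokens rather than a single TTR value, the estimate is less tied to the length of the transcript.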




