Literary and Linguistic Computing Advance Access originally published online on January 12, 2009
Literary and Linguistic Computing 2009 24(4):435-447; doi:10.1093/llc/fqn044
Lexical Diversity in a Literary Genre: A Corpus Study of the Ṛgveda
St. Petersburg State University
Correspondence: Alexandre Sotov, St. Petersburg State University, St. Petersburg, Russia. E-mail: a.sotov{at}yahoo.co.uk
Abstract
This research¹ evaluates the extent to which lexical diversity, measured by frequent content words, hapax legomena, and type-token ratios (TTRs), depends on three features of the genre of oral Indo-Aryan cultic poetry represented by the literary corpus of the Ṛgveda (ca. 165,000 tokens): characteristic choice of subject matter, use of refrains, and the attribution of hymns to distinct poetic collectives. Analysis of 255 texts of 200 tokens each showed that hymns on popular topics, and hymns in which refrains are attested, have a significantly higher rate of high-frequency content words and a lower ratio of once-occurring types. A higher TTR is observed in the hymns of specific family origin. Complexity of genre can be interpreted as a result of different discourse strategies of the poets. Overall, conservative mythological texts are characterized by regularity in word usage. The occurrence of content words in the entire corpus, with lexemes denoting deities on the one hand and nature on the other, is accounted for by the factor of semantics, which deals with the structure of narrative.
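As an illustration of the measures named in the abstract, the following minimal sketch computes a type-token ratio and a ratio of once-occurring types for fixed-size samples of 200 tokens. It is not the procedure used in the study: the function names `chunk_tokens` and `lexical_diversity` are hypothetical, hapaxes are counted within each sample rather than across the corpus, the hapax ratio is taken over types rather than tokens, and the input is assumed to be an already tokenized list of word forms.

```python
from collections import Counter


def chunk_tokens(tokens, size=200):
    """Split a token list into consecutive, non-overlapping samples of
    exactly `size` tokens; a shorter trailing remainder is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def lexical_diversity(tokens):
    """Return the type-token ratio and the share of types occurring once.

    Assumptions (not taken from the source): hapaxes are counted within
    the sample itself, and the hapax ratio is hapax types / all types.
    """
    if not tokens:
        raise ValueError("cannot measure lexical diversity of an empty sample")
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "ttr": n_types / n_tokens,           # types per token
        "hapax_ratio": n_hapax / n_types,    # once-occurring types per type
    }
```

Fixing the sample size matters because TTR falls as a text grows longer, so ratios are only comparable between samples of equal length; under these assumptions, `[lexical_diversity(s) for s in chunk_tokens(tokens)]` would give one pair of measures per 200-token sample.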