Literary and Linguistic Computing Advance Access originally published online on May 2, 2007
Literary and Linguistic Computing 2007 22(2):167-186; doi:10.1093/llc/fqm008
Trees and After: The Concept of Text Topology. Some Applications to Verb-Form Distributions in Language Corpora
Université de Nice-Sophia Antipolis, France
Université de Liège, UMR 6039 Bases, Corpus et Langage, Belgium
Correspondence: Michel Juillard, Université de Nice, 98 boulevard E. Herriot, BP 3209, F 06204 NICE cedex 3. E-mail: juillard{at}unice.fr
Abstract
The model described here relies on two key concepts of topology: neighbourhood and equivalence of shape. A linguistic object L is studied in a text T by means of one or several local questions Q. The set of successive local answers is then processed to yield a global function characterizing the textual space under scrutiny. We begin with short sequences of tenses to illustrate an original way of exploring Emile Benveniste's concepts of history and discourse. We then supply full-scale examples of other objects selected for their heuristic value. We go on to demonstrate the model at work on the distribution of strings of finite (F) and non-finite (n) verbal forms in the LOB Corpus of English. A topological chart is produced as a synthetic image of the locations of the relevant linguistic entities throughout the text. All the individual strings concatenating any number of F and n are classified in a table; alternatively, individual strings can be extracted in full text. We then refine the notion of lexical distribution in rafales (bursts) in a lemmatized corpus of Latin texts, in order to test the stability of the distributions of selected verbs in individual texts and to assess whether a verb's behaviour is related to its semantic status. The final section is devoted to other Latin texts, where the use of segments of equal length makes it possible to draw up the narrative profile of each author as revealed by his handling of tenses in main clauses.
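As a rough illustration of the classification step mentioned above, the following Python sketch shows one way strings of F and n could be located in a text and tabulated. It is not the authors' procedure. It assumes, hypothetically, that each token has already been tagged as "F" (finite verb form), "n" (non-finite verb form) or "-" (any other token), and that a string is a maximal run of adjacent verb forms. The function names `verb_strings` and `classify` are invented for this example.

```python
from collections import Counter
from itertools import groupby


def verb_strings(tags):
    """Return (start_position, string) for each maximal run of verb tags.

    Assumption for this sketch: `tags` holds one symbol per token,
    "F" (finite), "n" (non-finite) or "-" (not a verb form).
    """
    runs = []
    position = 0
    for is_verb, group in groupby(tags, key=lambda t: t in ("F", "n")):
        symbols = list(group)
        if is_verb:
            # Record where the string starts and which forms it concatenates.
            runs.append((position, "".join(symbols)))
        position += len(symbols)
    return runs


def classify(runs):
    """Count how often each distinct F/n string occurs."""
    return Counter(string for _, string in runs)


# Toy tag sequence, made up for the example (not corpus data).
tags = list("--F-nn--Fn-F---nF")
runs = verb_strings(tags)
table = classify(runs)
```

The start positions in `runs` play the role of locations in the text, which is the kind of information a chart of the textual space would need, while `table` corresponds to a classification of the individual strings.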