© 2004 by Association for Literary & Linguistic Computing
Frequency and Function of Characters Used in the Bangla Text Corpus
Indian Statistical Institute, India
Empirical analysis of any natural language needs to be substantiated with statistical findings, because without adequate statistical knowledge any linguistic study can fall into the quicksand of mistaken data handling and false observation. Recently introduced sub-disciplines and application areas (computational linguistics, corpus linguistics, forensic linguistics, applied linguistics, lexicology, stylometrics, lexicography, language teaching, etc.) require various statistical results on language properties, both to understand the language and to design sophisticated tools and software for language technology. Keeping this in mind, we present here some simple frequency counts of characters found in the Bangla text corpus. We also empirically evaluate their functional behaviour in the language with close reference to the corpus. In doing so, we verify previously made observations and make some new observations required for various language technology applications in Bangla.
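As an illustration of what a simple character frequency count involves, the following is a minimal sketch in Python. It is not the procedure used in this study: the function name `bangla_char_frequencies` is hypothetical, and the sketch assumes that the corpus is available as Unicode text and that "character" means a single code point in the Unicode Bengali block (U+0980 to U+09FF), so conjuncts and vowel signs are counted as separate code points.

```python
from collections import Counter

# Assumption: characters of interest are code points in the Unicode
# Bengali block. Spaces, punctuation and Latin letters are ignored.
BANGLA_START, BANGLA_END = 0x0980, 0x09FF


def bangla_char_frequencies(text):
    """Return {char: (count, relative_frequency)}, most frequent first."""
    counts = Counter(
        ch for ch in text if BANGLA_START <= ord(ch) <= BANGLA_END
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: (n, n / total) for ch, n in counts.most_common()}


if __name__ == "__main__":
    # Toy input only; a real study would read the corpus files instead.
    sample = "\u09ac\u09be\u0982\u09b2\u09be \u09ad\u09be\u09b7\u09be"
    for ch, (n, rel) in bangla_char_frequencies(sample).items():
        print(f"U+{ord(ch):04X} {ch} {n} {rel:.3f}")
```

Counting code points rather than visual graphemes is a deliberate simplification here; a count over orthographic units such as conjunct clusters would need an additional segmentation step.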