Literary and Linguistic Computing 1996 11(3):141-146; doi:10.1093/llc/11.3.141
© 1996 by Association for Literary & Linguistic Computing

An upper bound estimate for the entropy of Korean texts

YS Han*, HR Park, JH Shin and K-S Choi

The University of Suwon, Korea; Korea Research and Development Information Centre. * Corresponding author at: Computer Science Department, Suwon University, Suwon PO Box NN-N8 440-600, Korea. Email: yshan@csking.kaist.ac.kr

The entropy of a printed language indicates how predictable its usage is and how efficiently printed text can be handled in text processing. This paper presents, for the first time, an upper bound estimate of the entropy of printed Korean: 6.01 bits per Korean syllable. The entropy is computed with a stochastic language model for Korean whose probabilistic parameters are estimated from a sample of 5.5 million word-phrases. The model was designed to exploit the structure of Korean. The entropy estimate is obtained by running the model on a sample of 1.45 million units, carefully arranged to represent a wide range of printed Korean styles.
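
The estimate is an upper bound because the cross-entropy of any proper probabilistic model, measured on a representative sample, cannot on average fall below the true entropy of the source. The abstract does not describe the authors' model, so the sketch below is only an illustration of that measurement, not their method: it assumes a plain symbol-bigram model with add-one smoothing as a stand-in, and the names train_bigram and cross_entropy_bits are hypothetical.

```python
import math
from collections import Counter


def train_bigram(train, alphabet):
    """Count symbol bigrams and their left contexts in the training text.

    Assumption: every symbol in the training and test texts belongs to
    `alphabet`, so the smoothed probabilities below sum to one.
    """
    pair_counts = Counter(zip(train, train[1:]))
    context_counts = Counter(train[:-1])
    return pair_counts, context_counts, len(set(alphabet))


def cross_entropy_bits(model, test):
    """Average of -log2 P(symbol | previous symbol) over the test text."""
    pair_counts, context_counts, vocab_size = model
    total = 0.0
    for prev, cur in zip(test, test[1:]):
        # Add-one smoothing (an illustrative choice) keeps every
        # probability non-zero, so the logarithm is always defined.
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total -= math.log2(p)
    return total / (len(test) - 1)


if __name__ == "__main__":
    # Toy data only; a real experiment would use syllable sequences
    # drawn from separate training and test corpora.
    model = train_bigram("abababababab", "ab")
    print(cross_entropy_bits(model, "ababab"))
```

With no training data the model falls back to a uniform distribution, so the result is log2 of the alphabet size; a model that captures more of the language's structure gives a lower, and therefore tighter, bound in bits per symbol.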

