An empirical study of smoothing techniques for language modeling - Joshua Goodman and Stanley Chen
Date: Saturday 17 January @ 15:50:01 GMT+1
Topic: Literature


An empirical study of smoothing techniques for language modeling by Joshua Goodman and Stanley Chen, from 1996 (In Proceedings of the 34th Annual Meeting of the ACL, pages 310-318, Santa Cruz, California, June 1996), is part of the study material for the course Probabilistic Models of Natural Language Processing, taught by Khalil Sima'an of the ILLC in Amsterdam as part of the LOT Winter School 2004.

In the authors' own words:
"We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods."
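
To make the two central notions in that abstract a bit more concrete, here is a small sketch of an interpolated bigram model and of the cross-entropy used to score it. It is not the authors' implementation: the fixed weight lam, the add-one unigram estimate, the "<unk>" slot and all function names are assumptions made for illustration. In Jelinek-Mercer smoothing the interpolation weights are normally estimated on held-out data rather than fixed by hand.

```python
import math
from collections import Counter


def train_model(tokens):
    """Collect the counts needed for an interpolated bigram model."""
    unigrams = Counter(tokens)
    return {
        "unigrams": unigrams,
        "bigrams": Counter(zip(tokens, tokens[1:])),
        # How often each word occurs as a history (left context).
        "histories": Counter(tokens[:-1]),
        "total": len(tokens),
        # Assumption: one extra slot for all unseen words ("<unk>").
        "vocab_size": len(unigrams) + 1,
    }


def interpolated_prob(prev, word, model, lam=0.7):
    """P(word | prev) as a mix of bigram and unigram estimates.

    lam is a fixed, hand-picked weight (an assumption of this sketch).
    """
    # Add-one unigram estimate, so unseen words keep some probability.
    p_uni = (model["unigrams"][word] + 1) / (model["total"] + model["vocab_size"])
    history_count = model["histories"][prev]
    if history_count == 0:
        # Unseen history: fall back to the unigram estimate entirely.
        return p_uni
    p_ml = model["bigrams"][(prev, word)] / history_count
    return lam * p_ml + (1 - lam) * p_uni


def cross_entropy(tokens, model, lam=0.7):
    """Average negative log2 probability per predicted token (bits)."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(interpolated_prob(p, w, model, lam)) for p, w in pairs)
    return -log_sum / len(pairs)
```

A lower cross-entropy on test data means the model assigns higher probability to text it has not seen, which is how the paper compares the smoothing methods. In this sketch, calling cross_entropy on a test sentence containing an unseen word still gives a finite value, because the unigram part never assigns zero probability.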





This article comes from marco@work
http://marco.info/pro

The URL for this story is:
http://marco.info/pro/modules.php?name=News&file=article&sid=35