13 citations found.
W. A. Gale and G. Sampson. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2:217--237, 1995.

This paper is cited in the following contexts:
Induction of random fields in NLP - Clark (1998)

....solutions. The first is to have some kind of smoothing of the empirical distribution so that p assigns some non-zero probability to all configurations in the domain of the null field. This is the solution that Sampson recommends, using a Good-Turing smoothing technique ([Sam96] p. 137-146, [GS96]). Otherwise we will be forced to assign a weight of −∞ or β = 0, which will assign a probability of zero to all trees with this configuration. This is undesirable because the configuration may be absent from the corpus because of sparse data problems, not because it is extremely low ....

.... p. The reason for this is just that with a fairly underspecified field, the range of possible configurations is quite large compared to the number of samples produced by the Monte Carlo algorithm. The obvious solution is to smooth the samples using, for example, the Simple Good-Turing algorithm [GS96]. It was not possible to implement this in the time available, so I merely aborted the re-estimation process when this occurred. Chapter 9: Evaluation. The major problems encountered are in the efficiency of the MCMC samplers used to estimate the field feature expectations. MCMC techniques ....

W Gale and G Sampson. Good-Turing frequency estimation without tears. Cognitive Science Research Paper CSRP 407, University of Sussex, 1996.


Logic Programming Tools for Probabilistic Part-of-Speech Tagging - Nivre (2000)

....a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing [Lidstone 1920, Gale and Church 1990], Good-Turing estimation [Good 1953, Gale and Sampson 1995] and various methods based on held-out data and cross-validation [Jelinek and Mercer 1985, Jelinek 1997]. In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several models. The most well-known methods in this category are probably ....

....the variable X. In practice, there is no way of precisely calculating expected frequencies of frequencies, and different versions of Good-Turing estimation differ mainly in the way they estimate these values from the observed frequencies of frequencies (see, e.g., [Good 1953, Church and Gale 1991, Gale and Sampson 1995]). Good-Turing estimation has been shown to give good results for the lexical model in part-of-speech tagging [Nivre 2000]. Back-off Smoothing: The basic idea in back-off smoothing is to use the basic MLE model for events which are frequent enough in the training data to have reliable estimates ....

[Article contains additional citation context not shown here]

Gale, W. A. and Sampson, G. (1995) Good-Turing Frequency Estimation Without Tears. Journal of Quantitative Linguistics 2, 217-237.


Sparse Data and Smoothing in Statistical Part-of-Speech Tagging - Nivre (2000)

....of a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing (Lidstone 1920, Gale and Church 1990), Good-Turing estimation (Good 1953, Gale and Sampson 1995), and various methods based on held-out data and cross-validation (Jelinek and Mercer 1985, Jelinek 1997). In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several models. The most well-known methods in this category are probably ....

....the variable X. In practice, there is no way of precisely calculating expected frequencies of frequencies, and different versions of Good-Turing estimation differ mainly in the way they estimate these values from the observed frequencies of frequencies (see, e.g., Good 1953, Church and Gale 1991, Gale and Sampson 1995). 2.3.3 Back-off Smoothing: The basic idea in back-off smoothing is to use the basic MLE model for events which are frequent enough in the training data to have reliable estimates and to back off to a more general model for rare events, i.e. back off to a model where distinct outcomes in the first ....

[Article contains additional citation context not shown here]

Gale, W. A. and Sampson, G. (1995). Good-Turing Frequency Estimation Without Tears. Journal of Quantitative Linguistics 2, 217-237.


Tagging a Corpus of Spoken Swedish - Nivre, Grönqvist (2001)

....of a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing (Lidstone 1920, Gale and Church 1990), Good-Turing estimation (Good 1953, Gale and Sampson 1995), and various methods based on held-out data and cross-validation (Jelinek and Mercer 1985, Jelinek 1997). In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several models. The most well-known methods in this category are probably ....

....the variable X. In practice, there is no way of precisely calculating expected frequencies of frequencies, and different versions of Good-Turing estimation differ mainly in the way they estimate these values from the observed frequencies of frequencies (see, e.g., Good 1953, Church and Gale 1991, Gale and Sampson 1995). In our case, the reestimated frequencies have been calculated using the simple Good-Turing method (Gale and Sampson 1995) in Dan Melamed's implementation. We refer to the probability estimates derived in this way as the SUC lexical probabilities. In this model, unknown words (i.e. word ....

[Article contains additional citation context not shown here]

Gale, W. A. and Sampson, G. 1995. "Good-Turing Frequency Estimation Without Tears." Journal of Quantitative Linguistics 2, 217-237.


Experiments on Sentence Boundary Detection - Stevenson, Gaizauskas (2000)   (5 citations)

....been used rather than the tagging in the corpus, as such reliable annotations may not be available for the output of an ASR system. Thus, the current experiments should be viewed as making an optimistic assumption. [Footnote 6] We attempted to smooth these probabilities using Good-Turing frequency estimation (Gale and Sampson, 1996) but found that it had no effect on the final results.

Position  Feature
1         Preceding word
2         Probability preceding word ends a sentence
3         Part of speech tag assigned to preceding word
4         Probability that part of speech tag (feature 3) is assigned to last word in a sentence
5         Flag indicating ....

W. Gale and G. Sampson. 1996. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3):217--37.


Experiments on Sentence Boundary Detection - Stevenson, Gaizauskas (2000)   (5 citations)

....Ratnaparkhi ([14], Section 2) argued that a context of one word either side is sufficient for the punctuation disambiguation problem. However, the results of our system suggest that this may .... [Footnote 5] We attempted to smooth these probabilities using Good-Turing frequency estimation (see Gale and Sampson [9]) but found that it had no effect on the final results.

Position  Feature
1         Preceding word
2         Probability preceding word ends a sentence
3         Part of speech tag assigned to preceding word
4         Probability that part of speech tag (feature 3) is assigned to last word in a sentence
5         Flag indicating ....

W. Gale and G. Sampson. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3):217--37, 1996.


A Maximum Likelihood Ratio Information Retrieval Model - Ng (1999)   (20 citations)

....r* in (15) can become problematic if N_r = 0 for some r. As a result, it is necessary to pre-smooth N_r so that it never equals zero. There are many different possible smoothing methods and each gives rise to a slightly different GT approach. We use the Simple Good-Turing (SGT) approach described in [5]. Basically, N_r is linearly smoothed (in the log domain) and a decision rule is used to decide when to switch from using the observed N_r values to the smoothed values. Unlike the estimate for p(t), the quantity p_ml(t | D_i) is likely to be poorly estimated regardless of the size of the document ....

W. A. Gale and G. Sampson, "Good-Turing frequency estimation without tears," Journal of Quantitative Linguistics, no. 2, pp. 217--237, 1995.
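
The log-domain smoothing mentioned in the Ng (1999) context above can be sketched roughly as follows. This is only an illustration, assuming an ordinary least-squares fit of log N_r against log r; the function names and the toy frequencies of frequencies are hypothetical, and the sketch leaves out the averaging transform and the switching rule that the Simple Good-Turing method also relies on.

```python
import math

def fit_log_linear(freq_of_freq):
    """Least-squares fit of log(N_r) = a + b * log(r); returns (a, b).

    freq_of_freq maps a count r to N_r, the number of types seen r times.
    """
    rs = sorted(freq_of_freq)
    xs = [math.log(r) for r in rs]
    ys = [math.log(freq_of_freq[r]) for r in rs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def smoothed_n(r, a, b):
    """Smoothed N_r read off the fitted line; positive for every r >= 1."""
    return math.exp(a + b * math.log(r))

# Hypothetical toy data: N_3, N_5, N_6 and N_7 are unobserved (zero).
toy = {1: 64, 2: 16, 4: 4, 8: 1}
a, b = fit_log_linear(toy)

# Adjusted count for r = 2 using smoothed values in place of the missing N_3.
r = 2
r_star = (r + 1) * smoothed_n(r + 1, a, b) / smoothed_n(r, a, b)
print(r_star)
```

Because the fitted line is defined for every r, the smoothed values never equal zero, which is the property the quoted passage asks for.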


Analysis, Visualization and Meta-analysis of Functional.. - Nielsen (1999)

....regression, and usually with a logistic sigmoid function on the output. In the notation of (Bishop 1995a): $y = g\left(\sum_i \phi_i(x_i) + w_0\right)$ (A.5)
Gibbs sampling:
Good-Turing frequency estimation: A type of regularized frequency estimation (Good 1953). Often used in word frequency analysis (Gale and Sampson 1995).
greedy clustering:
Guttman's point alienation: Rank order statistics (Guttman 1978).
Harris recurrence:
Hebbian learning:
heteroassociation:
identification: estimation)
Imax: Algorithms maximizing the mutual information (between outputs). Term used in Becker and Hinton (1992).
infomax: ....

Gale, W. A. and G. Sampson (1995). Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics 2, 217-237.


An Empirical Study of Smoothing Techniques for Language Modeling - Chen, Goodman (1998)   (118 citations)

....are used in the above equation. In other words, we use the empirical values of n_r to estimate what their expected values are. The Good-Turing estimate cannot be used when n_r = 0; it is generally necessary to smooth the n_r, e.g. to adjust the n_r so that they are all above zero. Recently, Gale and Sampson (1995) have proposed a simple and effective algorithm for smoothing these values. In practice, the Good-Turing estimate is not used by itself for n-gram smoothing, because it does not include the combination of higher-order models with lower-order models necessary for good performance, as discussed in ....

Gale, William A. and Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears.
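
The Chen and Goodman (1998) context above notes that the empirical n_r need smoothing before they can be used. A minimal sketch of why, assuming the standard Good-Turing adjusted count r* = (r + 1) n_{r+1} / n_r computed directly from raw counts; the function name and the toy data are hypothetical:

```python
from collections import Counter

def raw_good_turing(counts):
    """Unsmoothed Good-Turing estimates; returns (p_unseen, {r: r_star})."""
    n = Counter(counts.values())   # n[r] = number of types observed exactly r times
    total = sum(counts.values())   # N = total number of observed tokens
    p_unseen = n[1] / total        # probability mass reserved for unseen types
    # Counter returns 0 for missing keys, so gaps in n show up as r* = 0.
    r_star = {r: (r + 1) * n[r + 1] / n[r] for r in sorted(n)}
    return p_unseen, r_star

# Hypothetical toy corpus: type -> observed count.
toy_counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 4}
p_unseen, r_star = raw_good_turing(toy_counts)
for r, value in r_star.items():
    print(r, round(value, 3))
```

In this toy sample no type occurs exactly 3 or 5 times, so the raw estimate gives an adjusted count of zero to the types seen 2 or 4 times. Gaps of this kind are what smoothing the n_r is meant to remove.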


Log-Linear Interpolation of Language Models - Gutkin (2006)

No context found.

W. A. Gale and G. Sampson. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2:217--237, 1995.


Log-Linear Interpolation of Language Models - Gutkin (2000)

No context found.

W. A. Gale and G. Sampson. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2:217--237, 1995.


An Empirical Study of Smoothing Techniques for - Language Modeling And

No context found.

Gale, William A. and Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3). To appear.


Building Probabilistic Models for Natural Language - Chen (1996)   (18 citations)

No context found.

William A. Gale and Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3). To appear.

CiteSeer.IST - Copyright Penn State and NEC