| W. A. Gale and K. W. Church. Poor estimates of context are worse than none. In Proceedings of DARPA Speech and Natural Language Workshop, Hidden Valley, Pennsylvania, June 1990. |
....[Figure 2.2: Word-based distribution estimated using MLE (y-axis: Prob.; nouns: swallow, crow, eagle, bird, bug, bee, insect).] To overcome this problem, we can smooth the probabilities by resorting to statistical techniques (Jelinek and Mercer, 1980; Katz, 1987; Gale and Church, 1990; Ristad and Thomas, 1995). We can, for example, employ an extended version of Laplace's Law of Succession (cf. Jeffreys, 1961; Krichevskii and Trofimov, 1981) to estimate $P(n \mid v, r)$ as $\hat{P}(n \mid v, r) = \frac{f(n, v, r) + 0.5}{f(v, r) + 0.5N}$, where $N$ denotes the size of the set of nouns. The results may ....
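The extended Laplace estimate above can be sketched in Python as follows. This is a minimal illustration, not the citing authors' code. The function name `ele_noun_given_slot`, the tuple layout of the input triples and the toy counts are all hypothetical. The sketch also assumes that $f(v, r)$ is the total count of nouns from the noun set observed in slot $(v, r)$, which makes the estimates sum to 1.

```python
from collections import Counter

def ele_noun_given_slot(triples, nouns, verb, rel, add=0.5):
    """Estimate P(n | v, r) = (f(n, v, r) + add) / (f(v, r) + add * N).

    triples: iterable of (noun, verb, relation) tuples observed in a corpus.
    nouns:   the full noun set; N = len(nouns).
    Assumption: f(v, r) is the total count of nouns from `nouns` in slot
    (v, r), so the returned distribution sums to 1 over `nouns`.
    """
    counts = Counter(triples)
    slot_total = sum(counts[(n, verb, rel)] for n in nouns)  # f(v, r)
    denom = slot_total + add * len(nouns)
    return {n: (counts[(n, verb, rel)] + add) / denom for n in nouns}

# Hypothetical toy data: "bird" seen 3 times and "bee" once as subject of "fly".
nouns = ["swallow", "crow", "eagle", "bird", "bug", "bee", "insect"]
triples = [("bird", "fly", "subj")] * 3 + [("bee", "fly", "subj")]
probs = ele_noun_given_slot(triples, nouns, "fly", "subj")
print(probs["bird"], probs["swallow"])  # 3.5/7.5 and 0.5/7.5: unseen nouns no longer get zero
```

Setting `add=1` gives the ordinary add-one (Laplace) estimate. The value 0.5 corresponds to the extended version described above.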
Gale, William A. and Kenneth W. Church. 1990. Poor estimates of context are worse than none. Proceedings of the DARPA Speech and Natural Language Workshop, pages 283-287.
....two content nouns of the hypothesised sentence: ⟨buoys⟩ and ⟨sand⟩ are both things found near the seaside, but in this case, the semantic priming is unwarranted by the selectional restriction associated with the subject of the verb ⟨to eat⟩. 2.3 Sparse Data The so-called sparse data problem [44, 58, 67] can be thought of as a technical one: given the size and power of current computers, no corpus can be processed which satisfies the requirement that it be sufficiently representative of the underlying language of which it is supposed to be an example. There is also a theoretical dimension to ....
.... domain of trigram language modelling, use is made of a model which specifically concerns itself with the distribution of rare segments (hapax legomena) and words which do not appear in the training set, whilst a simpler sub-model is applied to segments whose frequency statistics are healthier [31, 58, 67, 73, 84, 86, 89, 114]. Generally, pure models exploit particular insights into the distribution of natural language utterances; these may sometimes come from theoretical discourses. If they are useful, they tend to model some aspect successfully. No theoretical insights exist today which lead to models which ....
[Article contains additional citation context not shown here]
William A. Gale and Kenneth W. Church. Poor estimates of context are worse than none. In Proceedings of the DARPA Speech and Natural Language Workshop, pages 283-287, 1990.
....smoothing proper, we find methods where the parameters of a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing (Lidstone 1920, Gale and Church 1990), Good-Turing estimation (Good 1953, Gale and Sampson 1995) and various methods based on held-out data and cross-validation (Jelinek and Mercer 1985, Jelinek 1997). In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several ....
Gale, W. A. and Church, K. W. (1990) Poor Estimates of Context Are Worse Than None. In Proceedings of the Speech and Natural Language Workshop, 283-287. Morgan Kaufmann.
....smoothing proper, we find methods where the parameters of a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing (Lidstone 1920, Gale and Church 1990), Good-Turing estimation (Good 1953, Gale and Sampson 1995) and various methods based on held-out data and cross-validation (Jelinek and Mercer 1985, Jelinek 1997). In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several models. ....
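Of the held-out methods mentioned above, the simplest is the basic held-out estimator. The sketch below assumes the form in which it is usually given in textbook treatments of Jelinek and Mercer's approach. The function name `held_out_probs` and the toy corpora are hypothetical. All words that share a training count r also share the held-out mass of that count class. As a result, unseen words (r = 0) receive nonzero probability whenever the held-out text contains some of them.

```python
from collections import Counter

def held_out_probs(train, held_out, vocab):
    """Basic held-out estimator (textbook form; an assumption, not a cited implementation).

    Every word whose training count is r gets T_r / (N_r * H), where
      N_r = number of vocabulary words with training count r (r = 0 included),
      T_r = total held-out count of those words,
      H   = held-out size (tokens from `vocab`).
    """
    train_c, held_c = Counter(train), Counter(held_out)
    n_r = Counter(train_c[w] for w in vocab)   # N_r for every count class
    t_r = Counter()
    for w in vocab:
        t_r[train_c[w]] += held_c[w]           # T_r
    h = sum(held_c[w] for w in vocab)
    return {w: t_r[train_c[w]] / (n_r[train_c[w]] * h) for w in vocab}

# Hypothetical toy corpora
vocab = ["a", "b", "c", "d"]
probs = held_out_probs("a a b c".split(), "a b d d".split(), vocab)
print(probs)  # {'a': 0.25, 'b': 0.125, 'c': 0.125, 'd': 0.5}
```

Because the probability mass of each count class is divided evenly among its N_r members, the estimates sum to 1 over the vocabulary.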
Gale, W. A. and Church, K. W. (1990). Poor Estimates of Context Are Worse Than None. In Proceedings of the Speech and Natural Language Workshop, 283-287. Morgan Kaufmann.
....Tagging This section reports statistical trigram tagging experiments performed with the English part of the CRATER corpus. We used a standard statistical approach [Rabiner, 1989] and applied the Viterbi algorithm [Viterbi, 1967]. Sparse data is handled by expected likelihood estimation (see e.g. [Gale and Church, 1990]): to avoid zero probabilities for transitions between tags, we add 0.5 to each frequency before calculating the maximum likelihood estimates of the transition probabilities. Unknown words are handled by analyzing their suffixes of length 3. If there are other words in the lexicon ....
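The add-0.5 transition estimate described above can be sketched as follows. The sketch rests on two assumptions. First, tag sequences are already padded with any boundary tags. Second, the context count f(t1, t2) is the total count of trigrams that begin with (t1, t2). The function name and the toy tags are hypothetical. The suffix-based handling of unknown words is not shown, because the excerpt breaks off before describing it.

```python
from collections import Counter
from itertools import product

def ele_trigram_transitions(tag_sequences, tagset, add=0.5):
    """P(t3 | t1, t2) = (f(t1, t2, t3) + add) / (f(t1, t2) + add * T), T = |tagset|.

    Adding 0.5 to every trigram frequency before normalising means that no
    tag transition receives probability zero.
    """
    tri = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - 2):
            tri[tuple(tags[i:i + 3])] += 1
    T = len(tagset)
    probs = {}
    for t1, t2 in product(tagset, repeat=2):
        # f(t1, t2) counted as the number of trigrams starting with (t1, t2)
        context = sum(tri[(t1, t2, t3)] for t3 in tagset)
        for t3 in tagset:
            probs[(t1, t2, t3)] = (tri[(t1, t2, t3)] + add) / (context + add * T)
    return probs

# Hypothetical three-tag example
tagset = ["DT", "NN", "VB"]
probs = ele_trigram_transitions([["DT", "NN", "VB"], ["DT", "NN", "NN"]], tagset)
print(probs[("DT", "NN", "VB")], probs[("DT", "NN", "DT")])  # 1.5/3.5 and 0.5/3.5
```

A context that never occurs in training, such as (VB, VB), falls back to the uniform distribution 1/T over the tagset.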
W. A. Gale and K. W. Church. Poor Estimates of Context are Worse than None. In Proc. of the Speech and Natural Language Workshop, pages 283-287, Hidden Valley, PA, 1990.
....smoothing proper, we find methods where the parameters of a single model are being adjusted to counter the effect of sparse data, usually by taking some probability mass from seen events and reserving it for unseen events. This category includes methods such as additive smoothing (Lidstone 1920, Gale and Church 1990), Good-Turing estimation (Good 1953, Gale and Sampson 1995) and various methods based on held-out data and cross-validation (Jelinek and Mercer 1985, Jelinek 1997). In the second category, which we may call combinatory smoothing, we find methods for combining the estimates from several models. ....
....word) we let Std(w) w u . 10 Many of these word forms do not occur in SUC at all, so some sort of exceptional treatment would have been necessary in any case. 11 This way of smoothing the maximum likelihood estimates is sometimes referred to as Expected Likelihood Estimation (ELE); cf. Gale and Church (1990). 12 One of the 23 categories, wh-possessives (hs), was not represented at all in the test corpus. 13 The ratio between the two error types is 1:6 on average. 14 This is standard practice in Swedish elementary grammars, probably due to the large number of word forms that can be used in both ....
Gale, W. A. and Church, K. W. 1990. "Poor Estimates of Context Are Worse Than None." In Proceedings of the Speech and Natural Language Workshop, 283-287. Morgan Kaufmann.
....Here, we assume P has a posterior distribution. • Minimax Estimator (MM): $r^* = r + 0.5\sqrt{N}$. This method minimizes the maximum quadratic loss. • Good-Turing Estimator (GT): $r^* = (r + 1) N_{r+1} / N_r$. Gale and Church have investigated the use of these estimators. In their brilliant paper (Gale and Church, 1990), they use a simple n-gram model of context for a spelling corrector. An enhanced version of the GT method is reported to have the best performance among those mentioned above. They conclude that poor estimates of contextual probabilities are worse than none. 4.3.3 Enhanced Good-Turing ....
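The two estimators listed above can be sketched as follows. The Good-Turing function implements only the plain formula for adjusted counts. It does not implement the enhanced variant, whose details are not given here. For the minimax estimator, the text above gives only the adjusted count $r + 0.5\sqrt{N}$. The normalizer $N + \sqrt{N}$ used below is an assumption, taken from the standard binomial form of that estimator. Function names and toy counts are hypothetical.

```python
import math
from collections import Counter

def good_turing_counts(counts):
    """Plain Good-Turing adjusted counts r* = (r + 1) * N_{r+1} / N_r.

    counts: dict mapping each observed item to its frequency r >= 1.
    Returns {r: r*}; r* is None where N_{r+1} == 0, the point at which the
    unsmoothed formula breaks down (one motivation for enhanced variants).
    """
    n = Counter(counts.values())  # frequency of frequencies N_r
    return {r: (r + 1) * n[r + 1] / n_r if n[r + 1] else None
            for r, n_r in sorted(n.items())}

def minimax_prob(r, total):
    """Minimax (quadratic-loss) estimate of a binomial proportion,
    assuming the standard form (r + 0.5 * sqrt(N)) / (N + sqrt(N))."""
    root = math.sqrt(total)
    return (r + 0.5 * root) / (total + root)

counts = {"a": 1, "b": 1, "c": 2, "d": 3}  # hypothetical toy counts
print(good_turing_counts(counts))  # {1: 1.0, 2: 3.0, 3: None}
print(minimax_prob(0, 4))          # (0 + 1) / (4 + 2) = 1/6
```

The `None` entry shows why plain GT needs smoothing of the N_r values themselves before it can be applied at higher counts.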
Gale, W. A. & Church, K. W. (1990). Poor Estimates of Context are Worse than None. In Proceedings of the 1990 DARPA Speech and Natural Language Workshop.
....there never will be enough data to train upon, and hence the grammars will be undertrained. This follows from the generally accepted fact that natural languages are infinite. Clearly, any training set will be finite, and hence the learner will need to deal with the resulting sparse statistics [19]. 2.2 Model-based learning The other approach to grammar learning is model-based (for example [4, 53]). Model-based (or deductive) learners are far less frequently used in NLP systems than data-driven approaches. However, they are used by language acquisition theorists (for example [53] ....
William A. Gale and Kenneth W. Church. Poor estimates of context are worse than none. In Proceedings of the DARPA Speech and Natural Language Workshop, pages 283-287, 1990.
....function useful in modeling future events. A number of techniques have been developed to refine the empirical estimation process by modeling the error associated with the probability estimates and adjusting the estimates accordingly. Some of these techniques have been applied to CL ([13], [6], [10], [11]). [Footnote 7: $\sum_{\omega \in \Omega} f(\omega)$ is the sum of the values of $f$ evaluated for all of the elements in $\Omega$.] 5 Properties of P The probability function isn't just any function that happens to have a σ-field as a domain. It also has to obey three rules: • $P(A) \ge 0$ for all elementary events $A$ ....
....a conditional or a joint probability is that these marginal distributions are not independent. And if we cannot assume independence, probability theory does not dictate any method for estimating joint probabilities. In their illuminating paper, "Poor Estimates of Context are Worse than None" [10], Church and Gale demonstrate how invalid assumptions in the probability estimation process can produce results that are worse than having a random model (i.e., guessing at the answer). Based on this result, if there is not enough data to estimate a joint probability function for two events which ....
Gale, W. A. and Church, K. 1990. Poor Estimates of Context are Worse than None. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.
.... For example, Church and Mercer show that obtaining reliable information about the adjective "strong" requires at least 46 million words [25]. Of course, this is an upper bound, and less data could be used, assuming that smoothing techniques are employed to deal with the resulting sparse statistics [39]. Smoothing approaches, however, can only estimate the underestimated parameters of an inductively constructed language model. Therefore, in practice, the size of the training sets required will pose such a formidable computational task that the resulting grammar, even with smoothing, will be ....
William A. Gale and Kenneth W. Church. Poor estimates of context are worse than none. In Proceedings of the DARPA Speech and Natural Language Workshop, pages 283-287, 1990.
....out to be relatively uncritical and therefore, to simplify parameter optimization, were both set to the same value of 0.00002. Ongoing research shows that formula (6) has a number of weaknesses, for example that it does not discriminate among words with zero co-occurrence frequency, as discussed by Gale & Church (1990) in a comparable context. However, since the results reported later are acceptable, it probably gets the major issues right. One is that subjects usually respond with common, i.e. frequent, words in the free association task. The other is that estimations of co-occurrence frequencies for ....
Gale, W. A., Church, K. W. (1990). Poor estimates of context are worse than none. DARPA Speech and Natural Language Workshop, Hidden Valley, PA, June 1990, 283-287.
....must be combined with linguistic probabilities in a theoretically sound manner. The second problem with the current formulation is that there is no principled way to incorporate sources of information about the best word sequence other than the tag information from the lexicon. Gale and Church [4] have shown that improper estimates of contextual influence can actually degrade performance. In our new formulation, which we discuss in Section 4, the recogniser's confidence values as well as the word-tag correlations are captured in independent terms, and this allows us to ....
W.A. Gale and K.W. Church. "Poor estimates of context are worse than none". Proceedings of the DARPA Speech and Natural Language Workshop, pp. 283-287, 1990.
....quadruples appearing in test data had not been seen in training data). Even if $f(v, n_1, p, n_2) > 0$, it may still be very low, and this may make the above MLE estimate inaccurate. Unsmoothed MLE estimates based on low counts are notoriously bad in similar problems such as n-gram language modeling [GC90]. However, later in this paper it is shown that estimates based on low counts are surprisingly useful in the PP attachment problem. 3.3 Previous Work Hindle and Rooth [HR93] describe one of the first statistical approaches to the prepositional phrase attachment problem. Over 200,000 $(v, n_1, p)$ ....
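Below is a minimal sketch of the unsmoothed MLE discussed above. It assumes that training examples are stored as labelled (v, n1, p, n2) quadruples. The function name and the data are hypothetical. The estimation scheme developed later in the cited paper is not reproduced here. With a count of 1, the estimate can only be 0 or 1, which is the kind of low-count instability the passage describes.

```python
from collections import Counter

def mle_noun_attachment(examples, quad):
    """Unsmoothed MLE of P(noun attachment | v, n1, p, n2).

    examples: iterable of ((v, n1, p, n2), is_noun_attachment) pairs.
    Returns (estimate, count); estimate is None for an unseen quadruple.
    """
    seen, noun = Counter(), Counter()
    for q, is_noun in examples:
        seen[q] += 1
        noun[q] += int(is_noun)
    count = seen[quad]
    return (noun[quad] / count if count else None), count

# Hypothetical data: one noun-attached example for this quadruple
data = [(("ate", "pizza", "with", "anchovies"), True)]
print(mle_noun_attachment(data, ("ate", "pizza", "with", "anchovies")))  # (1.0, 1)
print(mle_noun_attachment(data, ("ate", "pizza", "with", "fork")))       # (None, 0)
```

Returning the count alongside the estimate lets a caller decide how much to trust a low-count value.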
W. Gale and K. Church. Poor Estimates of Context are Worse than None. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop, Hidden Valley, Pennsylvania.
....senses from a machine-readable dictionary. Taxonomies of this form might be used to replace PP complement heads and postmodified heads in corpus data with a smaller number of superordinate concepts. This would make the statistical data concerning head-preposition-head trigrams less sparse (cf. Gale and Church, 1990) and easier to gather from a corpus. Nevertheless, it is only possible to gather such data from determinately syntactically analysed material. 7.3 Dealing with Undergeneration Although many missing rules can be identified during the interactive training phase, the probabilistic framework set out ....
Gale, W. & K. Church (1990) "Poor estimates of context are worse than none." In Proceedings of the DARPA Speech and Natural Language Workshop, Hidden Valley, PA, 283-287.
....accuracy. The number of parameters that need to be estimated for trigrams and higher n-grams is enormous and may greatly exceed the size of the available text corpora. This can result in an inadequate model due to sparse data. Various methods have been proposed to deal with this issue [42], [13], [22], [49], with some degree of success. However, n-grams model the language on a superficial level, one that captures very little of the linguistic constraints typical of any natural language. N-gram models have also been used for other natural language tasks. Church [13] developed a probabilistic program ....
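A worked example makes the scale concrete. The vocabulary size $V = 20{,}000$ and the corpus size of $10^8$ words are illustrative assumptions, not figures from the cited work. A trigram model has one conditional probability $P(w_3 \mid w_1, w_2)$ for every ordered triple of word types:

$$
V^3 = (2 \times 10^4)^3 = 8 \times 10^{12}, \qquad \frac{10^8}{8 \times 10^{12}} = \frac{1}{80{,}000}.
$$

A corpus of $10^8$ words contains fewer than $10^8$ trigram tokens. At most about one possible trigram in 80,000 can therefore be observed even once, and every other trigram must be estimated by some form of smoothing.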
W. A. Gale and K. W. Church. Poor estimates of context are worse than none. In Proceedings of DARPA Speech and Natural Language Workshop, Hidden Valley, Pennsylvania, June 1990.
No context found.
W. A. Gale and K. W. Church. Poor estimates of context are worse than none. In Proceedings of DARPA Speech and Natural Language Workshop, Hidden Valley, Pennsylvania, June 1990.
No context found.
Gale, W. A. and Church, K. 1990. Poor Estimates of Context are Worse than None. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.
No context found.
W.A. Gale and K.W. Church. "Poor estimates of context are worse than none". Proceedings of the DARPA Speech and Natural Language Workshop, pp. 283-287, 1990.
[Figure panels: (a) Word n-gram; (b) Word-Tag; (c) Word-Tag-Score; (d) Syntax-Semantics-Signal]
No context found.
Gale, W. & Church, K. (1990). Poor estimates of context are worse than none. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, PA.
No context found.
William A. Gale & Kenneth W. Church (1990). Poor estimates of context are worse than none, in Proceedings of the June 1990 DARPA Speech and Natural Language Workshop, San Mateo: Morgan Kaufmann, 283-287.
CiteSeer.IST - Copyright Penn State and NEC