| T. Pedersen, R. Bruce, and J. Wiebe, "Sequential model selection for word sense disambiguation", Proceedings of the 1997. |
....sense disambiguation that seem to be promising candidates for the detection of inadvertent semantic errors. The references are Schutze (1992, 1998); Yarowsky (1992, 1995); Liddy and Paik (1993); Weischedel, Meteer, Schwartz, Ramshaw, and Palmucci (1993); Bruce and Wiebe (1994, 1999); Lin (1997); Pedersen, Bruce, and Wiebe (1997); Karov and Edelman (1998); Leacock, Chodorow, and Miller (1998); Pedersen and Bruce (1998); Towell and Voorhees (1998); Pedersen (1999); and Whaley (1999). For a reasonable evaluation of the cited methods, we suggest that a given scheme not be required to determine the sense of a given word. ....
....xm of the above rules to be v_i, v_j, w_j, and w_i, respectively, and selects the applicable texts. The characteristic vectors needed in Algorithm REGULAR WORD TEST and Algorithm SPECIAL WORD TEST are constructed analogously. Some previous work (for example, see Bruce and Wiebe (1994, 1999); Pedersen, Bruce, and Wiebe (1997); Pedersen and Bruce (1998); Pedersen (1999)) uses similar encodings where words, parts of speech, and morphological features near a given word instance are recorded. Here, our list of parts of speech has 46 items that accommodate all morphological subcases. Ignoring that minor variation, the main ....
Pedersen, T., Bruce, R., and Wiebe, J., Sequential model selection for word sense disambiguation, Proceedings of the 1997 Conference on Applied Natural Language Processing (ANLP-97). Washington, D.C., 1997, 388--395.
....of features and correspondingly sparse data. It is only viable to extend the repertoire of features if one also introduces methods for determining which are salient for each word. Papers exploring this route in different ways are (Hearst, 1991; Leacock, Towell, and Voorhees, 1993; Yarowsky, 1995; Pedersen, Bruce, and Wiebe, 1997). Note that if one sees the lexicon generation phase of c WSD as a one-off, resource-development activity, it becomes viable to spend substantially longer on it than if it is seen as a regularly repeated compile-time activity. 7 Relation to theme of workshop, and way ahead If lexical resources ....
Pedersen, Ted, Rebecca Bruce, and Janyce Wiebe. 1997. Sequential model selection for word sense disambiguation. In Proc. Fifth Conference on Applied Natural Language Processing, pages 388--395, Washington DC, April. ACL.
....a. The weights in the maximum entropy model are somewhat finer grained than in the interpolation model, which associates weights with only the predicates, and not the outcomes. 3.6 Decomposable Models Decomposable models have been used for word sense disambiguation in [Bruce and Wiebe, 1994; Pedersen et al. 1997] and also for prepositional phrase attachment in [Kayaalp et al. 1997]. Such models can be expressed as a product of the marginal probabilities of the interdependent variables, scaled by the marginal probabilities of the variables that are common to two or more terms. In our notation, if we are ....
....marginal probabilities p_1, p_2, p_3 can be obtained directly from the counts in the training data. In order to compute the joint probability given by a decomposable model, the interdependencies of the contextual predicates must either be known a priori, or must be induced automatically, as in [Pedersen et al. 1997]. Furthermore, the contextual predicates may be interdependent in such a way that prohibits further decomposition of the joint probability. Maximum entropy models differ from decomposable models in how they handle interdependence among the features. In the maximum entropy framework, ....
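As an illustration of the factorization described in the two excerpts above (a sketch, not taken from the citing paper): for a decomposable model whose graph has maximal cliques C and separators S (the variable sets shared between adjacent cliques), the joint probability is the product of clique marginals divided by the product of separator marginals. The three-variable example with variables a, b, c is hypothetical.

```latex
% General form: clique marginals scaled by separator marginals.
P(x_1, \dots, x_n) = \frac{\prod_{C} P(x_C)}{\prod_{S} P(x_S)}

% Hypothetical example: a and c are conditionally independent given b,
% so the cliques are {a, b} and {b, c} and the separator is {b}.
P(a, b, c) = \frac{P(a, b)\, P(b, c)}{P(b)}
```

Each marginal on the right-hand side can be estimated directly from counts in the training data, which is the property the excerpt refers to.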
Pedersen, T., Bruce, R., and Wiebe, J. (1997). Sequential Model Selection for Word Sense Disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 388--395, Washington D.C.
....and model fit. We present a number of different approaches to locating such models. Sequential model selection finds a single parametric form that is judged to achieve the best balance between model complexity and fit for a given corpus of text. We extend this methodology with the Naive Mix (Pedersen Bruce 1997), an averaged probabilistic model based on the sequence of parametric forms generated during a sequential model search. This paper includes an experimental comparison of these approaches and discusses possible further extensions to these methodologies. Word Sense Disambiguation This paper ....
....We employ both backward sequential search (Wermuth 1976) and forward sequential search (Dempster 1972) as search strategies. Backward sequential search for probabilistic models of word sense disambiguation was introduced in (Bruce Wiebe 1994) while forward sequential search was introduced in (Pedersen, Bruce, Wiebe 1997). Forward searches evaluate models of increasing complexity based on how much candidate models improve upon the fit of the current model, while backward searches evaluate candidate models based on how much they degrade the fit of the current model. A forward sequential search begins by ....
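The forward search described in the excerpt above can be sketched as a greedy loop. This is a minimal illustration, not the cited authors' implementation: the function name, the representation of a model as a set of pairwise edges, and the toy scoring function are all assumptions, and the sketch does not check that candidate models are decomposable. Lower scores are treated as better.

```python
from itertools import combinations


def forward_sequential_search(variables, score):
    """Greedy forward search over sets of pairwise interactions (edges).

    `score` is a hypothetical callable mapping a frozenset of edges to a
    number, where lower is better (for example an information criterion).
    """
    current = frozenset()  # start from the model with no interactions
    current_score = score(current)
    candidates = {frozenset(pair) for pair in combinations(variables, 2)}
    while True:
        best_edge, best_score = None, current_score
        # Try every edge not yet in the model; keep the biggest improvement.
        for edge in sorted(candidates - current, key=sorted):
            candidate_score = score(current | {edge})
            if candidate_score < best_score:
                best_edge, best_score = edge, candidate_score
        if best_edge is None:  # no candidate improves the fit: stop
            return current, current_score
        current = current | {best_edge}
        current_score = best_score


def toy_score(edges):
    """Made-up score: two edges help the fit, every edge costs complexity."""
    useful = {frozenset(("sense", "pos")), frozenset(("sense", "colloc"))}
    return 10.0 - 3.0 * len(edges & useful) + 1.0 * len(edges)


model, model_score = forward_sequential_search(
    ["sense", "pos", "colloc", "morph"], toy_score
)
```

A backward search would mirror this loop: start from the saturated model and repeatedly remove the edge whose removal degrades the score least.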
Pedersen, T.; Bruce, R.; and Wiebe, J. 1997. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing, 388--395.
....criterion judges how well the model characterizes the data in the training sample. We use Akaike's Information Criterion (AIC) (Akaike 1974) as the evaluation criterion based on the results of an extensive comparison of search strategies and selection criteria for model selection reported in (Pedersen, Bruce, Wiebe 1997). Search Strategy BSS begins by designating the saturated model as the current model. A saturated model has complexity level c = n(n-1)/2, where n is the number of feature variables. At each stage in BSS we generate the set of decomposable models of complexity level c - 1 that can ....
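The two quantities named in the excerpt above can be written out as a small sketch. It assumes the complexity level counts pairwise interactions among the n feature variables, and it uses the general textbook form of AIC (minus twice the log-likelihood plus twice the number of free parameters); the citing paper may use an equivalent variant. The function names are hypothetical.

```python
def saturated_complexity(n):
    """Complexity level of the saturated model: every pair of the n
    feature variables interacts, giving n(n-1)/2 pairwise interactions."""
    return n * (n - 1) // 2


def aic(log_likelihood, num_params):
    """General form of Akaike's information criterion; lower is better."""
    return -2.0 * log_likelihood + 2.0 * num_params


# Backward search starts at the saturated level and moves to level c - 1.
start_level = saturated_complexity(4)
next_level = start_level - 1
```

Under this form, a model is preferred when its gain in log-likelihood outweighs the penalty for its additional parameters.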
Pedersen, T.; Bruce, R.; and Wiebe, J. 1997. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing.
....maximal cliques in the graph of the model. Model selection integrates a search strategy with an evaluation criterion. The search strategy determines which decomposable models are evaluated during the selection process. The evaluation criterion measures the fit of each model to the training sample. (Pedersen, Bruce, Wiebe 1997) report that the strategy of forward sequential search (FSS) and evaluation by Akaike's information criterion (AIC) selects models that serve as accurate classifiers for word sense disambiguation. Here, this combination is shown to result in Naive Mixes that improve the accuracy of disambiguation ....
....a variety of machine learning algorithms. The average accuracy over twelve words for the Naive Mix is 85% while a decision tree (C4.5) achieves 86%, rule induction (CN2) 84%, nearest neighbor classification (PEBLS) 84%, and Naive Bayes classification 85%. These results are discussed in detail in (Pedersen Bruce 1997). Acknowledgments This research was supported by the Office of Naval Research under grant number N00014-95-1-0776. ....
Pedersen, T.; Bruce, R.; and Wiebe, J. 1997. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing.
....a sentence. Model selection is a supervised method of inducing a probabilistic model from sense tagged text. We extend this approach in two ways. The search strategies and evaluation criteria used to select the parametric form of a probabilistic model for WSD are expanded and critically evaluated (Pedersen, Bruce, Wiebe 1997). The Naive Mix, a new supervised learning algorithm that builds an averaged probabilistic model, is introduced and shown to be competitive with well known machine learning algorithms (Pedersen Bruce 1997). In the absence of sense tagged text, the sense of an ambiguous word is treated as a ....
....form of a probabilistic model for WSD are expanded and critically evaluated (Pedersen, Bruce, Wiebe 1997). The Naive Mix, a new supervised learning algorithm that builds an averaged probabilistic model, is introduced and shown to be competitive with well known machine learning algorithms (Pedersen Bruce 1997). In the absence of sense tagged text, the sense of an ambiguous word is treated as a feature with a missing value. The observable features are those that can be automatically identified such as part of speech, morphology, and collocations. We perform WSD via two unsupervised learning ....
Pedersen, T.; Bruce, R.; and Wiebe, J. 1997. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing.
....text with sense information can improve document classification (e.g. [33], [32]) and smooth the path for Web mining applications such as those described in [8]. Word sense disambiguation has commonly been cast as a problem in supervised learning (e.g. [2], [35], [36], [16], [3], [21], [22], [23]). However, these methods require text where ambiguous words have been manually tagged with sense information to train the learning algorithm. Such data exists only in very small quantities and is very expensive to create. Rather than assuming the availability of sense tagged text, it seems more ....
....Each sentence containing the ambiguous word is reduced to a vector of features (i.e. an observation) and the corpus of text under study is reduced to a matrix of observations. 2.1.1 Feature Sets A and B Feature set A has been used in a variety of supervised learning experiments (e.g. [3], [22], [23]). A sentence with an ambiguous word is represented by a feature set composed of three types of contextual features: one morphological feature, four part of speech (POS) features, and three collocation features. The morphological feature indicates if the ambiguous noun is plural or not. The POS ....
T. Pedersen, R. Bruce, and J. Wiebe. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, 1997.
....not tractable. We must develop a search strategy to guide the learning algorithm through the space of possible networks and an evaluation criterion to measure the acceptability of a network, usually in terms of how closely the network characterizes or fits the training data. Our previous work (Pedersen, Bruce, Wiebe 1997; Pedersen Bruce 1997b) utilized sequential search strategies and information criteria to select probabilistic classifiers for word sense disambiguation. Here we extend those methodologies to learn belief networks that will support inference on any variable in the domain, not just a single ....
....search strategy to guide the learning algorithm through the space of possible networks and an evaluation criterion to measure the acceptability of a network, usually in terms of how closely the network characterizes or fits the training data. Our previous work (Pedersen, Bruce, Wiebe 1997; Pedersen Bruce 1997b) utilized sequential search strategies and information criteria to select probabilistic classifiers for word sense disambiguation. Here we extend those methodologies to learn belief networks that will support inference on any variable in the domain, not just a single classification variable. ....
Pedersen, T.; Bruce, R.; and Wiebe, J. 1997. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing, 388--395.
....value of the classification feature, i.e. the sense of the ambiguous word. This assumption is based on the success of the Naive Bayes model when applied to supervised word sense disambiguation (e.g. Gale, Church, and Yarowsky, 1992; Leacock, Towell, and Voorhees, 1993; Mooney, 1996; Pedersen, Bruce, and Wiebe, 1997; Pedersen and Bruce, 1997a). There are two potential problems when using the EM algorithm. First, it is computationally expensive and convergence can be slow for problems with large numbers of model parameters. Unfortunately there is little to be done in this case other than reducing the ....
....i.e. the sense of the ambiguous word. This assumption is based on the success of the Naive Bayes model when applied to supervised word sense disambiguation (e.g. Gale, Church, and Yarowsky, 1992; Leacock, Towell, and Voorhees, 1993; Mooney, 1996; Pedersen, Bruce, and Wiebe, 1997; Pedersen and Bruce, 1997a). There are two potential problems when using the EM algorithm. First, it is computationally expensive and convergence can be slow for problems with large numbers of model parameters. Unfortunately there is little to be done in this case other than reducing the dimensionality of the problem so ....
Pedersen, T., R. Bruce, and J. Wiebe. 1997. Sequential model selection for word sense disambiguation.