Dagan, Ido, Shaul Marcus, and Shaul Markovitch (1993). "Contextual Word Similarity and Estimation from Sparse Data", in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL93). Columbus, Ohio.
....Bayesian estimate with Jeffreys' Prior being the prior probability. other conditional probabilities under contexts of similar words using similarities as the weights. Note that the equation #(v, v # ) 1 must hold. The advantage of this approach is that it relies only on corpus data. cf. (Dagan, Marcus, and Markovitch, 1992; Dagan, Pereira, and Lee, 1994; Dagan, Pereira, and Lee, 1997) 2.2.3 Class-based approach A number of researchers have proposed to employ class-based models, which use classes of words rather than individual words. An example of a class-based approach is Resnik's method of learning case ....
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1992. Contextual word similarity and estimation from sparse data. Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 164--171.
....stimulus arguments against the possibility of learning language structure. The representations can also be used in cognitive models and in theories of learning. The techniques have already been successfully applied in language technology to achieve semantic disambiguation and document retrieval [3, 14]. It will be interesting to see whether the interdisciplinary study of exactly how these techniques work will reap benefits for both cognitive science and practical information technologies. The idea of using distributional statistics to examine aspects of lexical semantics comes from the ....
Dagan, I., Marcus, S. & Markovitch, S. (1993). Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the ACL, 164-171.
.... by hand, but this should be supported by some use of thesaurus information to try to form hypotheses about what semantic classes of nominals can occur in argument slots (and possibly form classes of verbs). It is possible that this thesaurus could be based on automatic classification (e.g. Dagan et al., 1993; Grishman and Sterling, 1992, 1994) but we think that for the quantity of information likely to be available use of an existing semantic taxonomy would be more effective. WordNet (Miller et al., 1990) is one possible public domain source of such a taxonomy which has been used to effect (e.g. ....
Dagan, I., Marcus, S. and Markovitch, S. 1993. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Assoc. for Computational Linguistics, 31--37. Columbus, OH.
....adds items dynamically, either as relevant terms are found in training corpora, or through a pair of auxiliary external tools that provide a link to WordNet, the pre-existing domain-independent hierarchy, and tables of domain-specific word similarities, based on co-occurrence statistics [12]. The parent class of the node Operation (upper right corner) is C Company and it includes many related terms. This class also includes unspecified types of businesses identified only by a proper name, a name which was determined to refer to a company by low-level name identification ....
....are more densely distributed among the relevant documents than overall. The correlation score is computed by the same scoring scheme as detailed below in section 7.5, using a fixed cutoff. Several other groups have explored induction of semantic classes through analysis of syntactic co-occurrence [46, 43, 12, 25], though in our case, the contexts are limited to selected syntactic constructs which are relevant to the scenario. 7.4 Indexing From the primary triples consisting of the main three clausal arguments, we build an inverted index back into the parsed documents. Each clause annotation produced by ....
Ido Dagan, Shaul Marcus, and Shaul Markovitch. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the Assn. for Computational Linguistics, pages 31--37, Columbus, OH, June 1993.
....the stimulus arguments against the possibility of learning language structure. The representations can also be used in cognitive models and in theories of learning. The techniques have already been successfully used in language technology to achieve semantic disambiguation and document retrieval [3, 14]. It is interesting to see whether the interdisciplinary study of exactly how these techniques work will reap benefits for both cognitive science and practical information technologies. The idea of using distributional statistics to examine aspects of lexical semantics comes from the intuition ....
Dagan, I., Marcus, S. & Markovitch, S. (1993). Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the ACL, 164-171.
....[n(C location) based] n(C company) n(C co descrip) The semantic hierarchy is scenario-specific. It is built up dynamically through tools that draw on pre-existing domain-independent hierarchies, such as WordNet, as well as domain-specific word similarity measures and co-occurrence statistics [4]. [Figure 7: Event LF corresponding to a clause. Slot / Value: class = Predicate Start Job; company entity = E.I.S.; person entity = Garrick; position entity = president.] By a similar process, we can now acquire a clausal pattern from the example in figure 3 at the beginning of this section. ....
Ido Dagan, Shaul Marcus, and Shaul Markovitch. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the Assn. for Computational Linguistics, pages 31-37, Columbus, OH, June 1993.
....English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as approximations of conceptual classes [Yarowsky 92]. Dagan uses the similarity between mutual information values and the relative entropy to find similar words [Dagan 93, Dagan 94]. Besides, there are some approaches using verb direct object distributional information to estimate the similarity between words [Pereira 93, Cho 95]. In the previous research mentioned so far, the similarity between words is used to classify words and to compensate for the lack of ....
Ido Dagan, Shaul Marcus and Shaul Markovitch. "Contextual Word Similarity and Estimation from Sparse Data", in Proceedings of the 31st Meeting of the ACL, pp. 164--171, 1993.
....is a sufficiently large corpus texts of the language. It can be used for both morphological disambiguation and for the discrimination between the meanings within the same part of speech. DEFINITIONS The Similarity Metric We use the pairwise similarity metric $\mathrm{sim}(w_1, w_2)$ introduced in (Dagan et al. 1995), which measures the tendency of the words $w_1$ and $w_2$ to occur in similar contexts. The metric is based on averaging the similarity of mutual information values of $w_1$ and $w_2$ with other words in the lexicon. The value of $\mathrm{sim}(w_1, w_2)$ varies between 0 (totally dissimilar) and 1 (absolutely similar ....
.... method is extremely expensive for a large scale lexicon (e.g. about 100,000 words) even if we intend to compute all similarities off line and then store the results in the lexicon (the complexity of doing so for all the words in the lexicon is $O(l^3)$). To handle this problem, Dagan et al. (Dagan et al. 1995) developed a technique that approximates the original similarity method, and requires a considerably smaller amount of computation. In presenting this technique, we will use the .... [Table, similar words and sim values: add 0.1085, ask 0.1031, forget 0.1010, observe 0.0983, notice 0.0935, complain 0.0934, remember ....]
[Article contains additional citation context not shown here]
Dagan, I., Marcus, S., and Markovitch, S. (1995). Contextual word similarity and estimation from sparse data. Computer Speech and Language, 9:123--152.
....requires more complex grammars, in which derivation steps are associated to possible relations of interest. Even in language modeling, distributional regularities associated to meaningful relationships may be an important source of additional predictive power (Hindle, 1990; Hindle & Rooth, 1991; Dagan, Marcus, et al. 1993; Lafferty, Sleator, et al. 1992). Grammatical representations of meaningful relationships may be usefully classified into three main classes: linguistic grammars, task-oriented grammars and data-oriented grammars. Linguistic grammars and task-oriented grammars have been in use since the ....
Dagan, I., Marcus, S., and Markovitch, S. (1993). Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 164--171, Ohio State University.
....et al., 1994) and is applicable to other problems, such as choosing between possible target representations in a machine translation system. Finally it would be interesting to combine the work on semantic collocation functions with that on similarity based clustering (Pereira, Tishby and Lee 1993; Dagan, Marcus and Markovitch 1993) with the aim of overcoming the problem of sparse training data. If this is successful, it might make these functions suitable for disambiguation in domains with larger vocabularies than ATIS. Acknowledgements We would like to thank Manny Rayner for many useful suggestions in carrying out this ....
Dagan, I., S. Marcus and S. Markovitch. 1993. "Contextual Word Similarity and Estimation from Sparse Data". Proceedings of the 31st meeting of the Association for Computational Linguistics, ACL, 164--171.
....the actual cooccurrences in the training set. This can be done by smoothing the observed frequencies (Church and Mercer, 1993) or by class based methods (Brown et al. 1991; Pereira and Tishby, 1992; Pereira, Tishby, and Lee, 1993; Hirschman, 1986; Resnik, July 1992; Brill et al. June 1990; Dagan, Marcus, and Markovitch, 1993). In comparison to these approaches, we use similarity information throughout training, and not merely for estimating cooccurrence statistics. This allows the system to learn successfully from very sparse data. 3.6. Summary We have described an approach to WSD that combines a corpus and a MRD to ....
Dagan, I., S. Marcus, and S. Markovitch. 1993. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the ACL, pages 164--171.
....see Cover and Thomas (1991) and Kullback (1959) for general information, Aczél and Daróczy (1975) for an axiomatic development, and Rényi (1970) for a description of information theory that uses the KL divergence as a starting point. Some authors (Brown et al. 1992; Church and Hanks, 1990; Dagan, Marcus, and Markovitch, 1995; Luk, 1995) use the mutual information, which is the KL divergence between the joint distribution of two random variables and their product distributions. Let A and B be two random variables with probability mass functions f(A) and g(B) respectively, and let h(A; B) be their joint distribution ....
....The data for this test was built from the training data for the previous one in the following way, based on an experiment by Dagan, Marcus, and Markovitch (1995). [Figure 3.7: Pairwise verb comparisons, 1988 Associated Press object-verb pairs. x-axis: number of clusters; y-axis: decision error; series: exceptional, all.] We randomly picked 104 object-verb pairs (x; y) such that verb y appeared fairly frequently (between 500 and 5000 occurrences) and deleted all occurrences of such pairs from the training set. The resulting training set was used to build a sequence of cluster models as before. To create the test ....
[Article contains additional citation context not shown here]
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1995. Contextual word similarity and estimation from sparse data. Computer Speech and Language, 9:123--152.
....propensity to validate most personal names as collocations. At least among West European languages, translations of the vast majority of personal names are perfectly compositional. Several authors have used mutual information and similar statistics as an objective function for word clustering (Dagan et al. 1993; Brown et al. 1992; Pereira et al. 1993; Wang et al. 1996), for automatic determination of phonemic baseforms (Lucassen & Mercer, 1984), and for language modeling for speech recognition (Ries et al. 1996). Although the applications considered in this paper are different, the strategy is ....
I. Dagan, S. Marcus & S. Markovitch. (1993) "Contextual Word Similarity and Estimation from Sparse Data," Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. Columbus, OH.
....recognition system (see Chapter 3) and choosing between possible target representations in a machine translation system (as discussed in Section 8.3). Finally it would be interesting to combine the work on semantic collocation metrics with that on similarity-based clustering [Pereira et al. 1993, Dagan et al. 1993] with the aim of overcoming the problem of sparse training data. If this is successful, it might make these metrics suitable for disambiguation in domains with larger vocabularies than ATIS. ....
Ido Dagan, Shaul Marcus, and Shaul Markovitch. "Contextual Word Similarity and Estimation from Sparse Data". In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pp. 164--171, Columbus, Ohio, June 1993.
....is to use the word similarity information directly, to infer information about the likelihood of a co-occurrence pattern from information about patterns involving similar words. This is the approach we have adopted for our current experiments [6] and which has also been employed by Dagan et al. [2]. We compute from the co-occurrence data a confusion matrix, which measures the interchangeability of words in particular contexts. We then use the confusion matrix directly to generalize the semantic patterns. 2 Acquiring Semantic Patterns Based on a series of experiments over the past two ....
....may yield an un-normalized confusion matrix (i.e. $\sum_{w_j} P_C(w_j \mid w_i) \neq 1$), we renormalize the matrix so that $\sum_{w_j} P_C(w_j \mid w_i) = 1$. A similar approach to pattern generalization, using a similarity measure derived from co-occurrence data, has been recently described by Dagan et al. [2]. Their approach differs from the one described here in two significant regards: their co-occurrence data is based on linear distance within the sentence, rather than on syntactic relations, and they use a different similarity measure, based on mutual information. The relative merits of the two ....
[Article contains additional citation context not shown here]
Ido Dagan, Shaul Marcus, and Shaul Markovitch. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the Assn. for Computational Linguistics, pages 31--37, Columbus, OH, June 1993.
No context found.
Ido Dagan, Shaul Marcus, and Shaul Markovitch. 1993. Contextual word similarity and estimation from sparse data. In Proc. of the Annual Meeting of the ACL, pages 164--171.
....corresponding classes. Pereira, Tishby, and Lee (1993) propose a soft clustering scheme for certain grammatical cooccurrences in which membership of a word in a class is probabilistic. Cooccurrence probabilities of words are then modeled by averaged cooccurrence probabilities of word clusters. Dagan, Marcus, and Markovitch (1993) argue that reduction to a relatively small number of predetermined word classes or clusters may cause a substantial loss of information. Their similarity-based model avoids clustering altogether. Instead, each word is modeled by its own specific class, a set of words which are most similar to it ....
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1993. Contextual word similarity and estimation from sparse data. In 31st Annual Meeting of the Association for Computational Linguistics, pages 164--171, Columbus, Ohio. Ohio State University, Association for Computational Linguistics, Morristown, New Jersey.
No context found.
Dagan, Ido, Shaul Marcus, and Shaul Markovitch (1993). "Contextual Word Similarity and Estimation from Sparse Data", in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL93). Columbus, Ohio.
No context found.
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1995. Contextual word similarity and estimation from sparse data. Computer Speech and Language, 9(2):123--152.
No context found.
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1995. Contextual word similarity and estimation from sparse data.
No context found.
Dagan, Ido, Shaul Marcus, and Shaul Markovitch. 1995. Contextual word similarity and estimation from sparse data. Computer Speech and Language, 9(2):123--152.
No context found.
Ido Dagan, Shaul Marcus, and Shaul Markovitch. 1993. Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the Assn. for Computational Linguistics, pages 31--37, Columbus, OH, June.
No context found.
Dagan, I., Marcus, S. and Markovitch, S. 1993. Contextual Word Similarity and Estimation from Sparse Data. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, OH.
No context found.
Ido Dagan, Shaul Marcus & Shaul Markovitch (1993). Contextual word similarity and estimation from sparse data, in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 164-171.
No context found.
I. Dagan, S. Marcus, and S. Markovitch. Contextual word similarity and estimation from sparse data. In 31st Annual Meeting of the Association for Computational Linguistics, 1993.