Semantic similarity based on corpus statistics and lexical taxonomy (1997)

by J. J. Jiang, D. W. Conrath

Results 1 - 10 of 873

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

by Evgeniy Gabrilovich, Shaul Markovitch - In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007
"... Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedi ..."
Abstract - Cited by 562 (9 self) - Add to MetaCart
Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r =0.56 to 0.75 for individual words and from r =0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users. 1
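
ESA's core operation is mapping a text to a weighted vector over Wikipedia concepts and comparing vectors by cosine. A minimal sketch of that pipeline, using a toy hand-built concept index rather than the authors' actual TF-IDF weights derived from Wikipedia articles:

```python
from collections import defaultdict
from math import sqrt

# Toy word -> {concept: weight} index. In ESA these weights come from
# TF-IDF scores of words across Wikipedia articles (one concept per
# article); the values below are illustrative stand-ins.
CONCEPT_INDEX = {
    "bank":  {"Bank_(finance)": 0.9, "River": 0.3},
    "money": {"Bank_(finance)": 0.8, "Currency": 0.7},
    "river": {"River": 0.9, "Water": 0.5},
}

def esa_vector(text):
    """Represent a text as a weighted sum of its words' concept vectors."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Texts can score high through shared concepts even without shared words.
print(cosine(esa_vector("money bank"), esa_vector("bank river")))
```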

WordNet::Similarity -- Measuring the Relatedness of Concepts

by Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi, 2004
"... WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical databa ..."
Abstract - Cited by 388 (8 self) - Add to MetaCart
WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.
(Show Context)
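
WordNet::Similarity itself is a set of Perl modules; for a quick look at the same family of measures from Python, NLTK ships WordNet-based similarity functions (an analogous toolkit, not the package described above):

```python
# Requires: pip install nltk, then nltk.download('wordnet') and
# nltk.download('wordnet_ic') for the information-content files.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # concept counts from the Brown corpus
dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')

print(dog.path_similarity(cat))           # shortest-path measure
print(dog.res_similarity(cat, brown_ic))  # Resnik: IC of the LCS
print(dog.lin_similarity(cat, brown_ic))  # Lin
print(dog.jcn_similarity(cat, brown_ic))  # Jiang-Conrath (inverse of the jcn distance)
```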

Citation Context

... three remaining similarity measures are based on information content, which is a corpus-based measure of the specificity of a concept. These measures include res (Resnik 1995), lin (Lin 1998), and jcn (Jiang & Conrath 1997). The lin and jcn measures augment the information content of the LCS of two concepts with the sum of the information content of the individual concepts. The lin measure scales the information content ...
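
For reference, the standard definitions behind the res, lin, and jcn measures named in this snippet, with IC the corpus-estimated information content and lcs the least common subsumer:

```latex
\mathrm{IC}(c) = -\log p(c)
\mathrm{sim}_{\mathrm{res}}(c_1, c_2) = \mathrm{IC}(\mathrm{lcs}(c_1, c_2))
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) = \frac{2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}
\mathrm{dist}_{\mathrm{jcn}}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))
```

So lin normalizes the shared information content by the concepts' combined information content, while jcn turns the same quantities into a distance (often inverted to yield a similarity).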

From frequency to meaning : Vector space models of semantics

by Peter D. Turney, Patrick Pantel - Journal of Artificial Intelligence Research, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract - Cited by 347 (3 self) - Add to MetaCart
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field. 1.
(Show Context)
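
The first of the survey's three matrix classes, term-document, amounts to counting terms per document and comparing the resulting vectors; a minimal sketch with toy documents (real systems would add TF-IDF or similar weighting):

```python
import math

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stocks fell sharply"]

# The shared vocabulary defines the vector dimensions.
vocab = sorted({w for d in docs for w in d.split()})

def term_vector(doc):
    """Raw term-frequency vector for one document."""
    words = doc.split()
    return [words.count(t) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [term_vector(d) for d in docs]
print(cosine(vectors[0], vectors[1]))  # high: many shared terms
print(cosine(vectors[0], vectors[2]))  # 0.0: no overlap
```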

Citation Context

...Section 6.2) share the task of measuring the semantic similarity of words. The main alternatives to VSMs for measuring word similarity are approaches that use lexicons, such as WordNet (Resnik, 1995; Jiang & Conrath, 1997; Hirst & St-Onge, 1998; Leacock & Chodorow, 1998; Budanitsky & Hirst, 2001). The idea is to view the lexicon as a graph, in which nodes correspond to word senses and edges represent relations between ...

Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures

by Alexander Budanitsky, Graeme Hirst - In Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001
"... Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath 's measure gave the best results overall. That of Hirst and St-Onge seriously ..."
Abstract - Cited by 338 (4 self) - Add to MetaCart
Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath 's measure gave the best results overall. That of Hirst and St-Onge seriously over-related, that of Resnik seriously under-related, and those of Lin and of Leacock and Chodorow fell in between.

Evaluating WordNet-based measures of lexical semantic relatedness

by Alexander Budanitsky, Graeme Hirst - Computational Linguistics, 2006
"... The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling error ..."
Abstract - Cited by 321 (0 self) - Add to MetaCart
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content–based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness. 1.

Extended gloss overlaps as a measure of semantic relatedness

by Satanjeev Banerjee - In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 2003
"... This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to whic ..."
Abstract - Cited by 264 (8 self) - Add to MetaCart
This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure reasonably correlates to human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data. 1
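
The measure's basic step is counting word overlaps between glosses that have been extended with the glosses of related concepts. A toy sketch of that step, with glosses and relations supplied as plain dictionaries (the real measure pulls both from WordNet and rewards longer phrasal overlaps rather than single shared words):

```python
def extended_gloss(concept, glosses, related):
    """Concatenate a concept's gloss with the glosses of its related concepts."""
    parts = [glosses[concept]] + [glosses[r] for r in related.get(concept, [])]
    return " ".join(parts)

def overlap_score(c1, c2, glosses, related):
    """Count distinct words shared by the two extended glosses."""
    w1 = set(extended_gloss(c1, glosses, related).split())
    w2 = set(extended_gloss(c2, glosses, related).split())
    return len(w1 & w2)

glosses = {
    "car":   "a motor vehicle with four wheels",
    "bike":  "a vehicle with two wheels propelled by pedals",
    "wheel": "a circular frame that rotates on an axle",
}
related = {"car": ["wheel"], "bike": ["wheel"]}  # e.g. meronymy links
print(overlap_score("car", "bike", glosses, related))
```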

Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL

by Peter D. Turney, 2001
"... This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of wo ..."
Abstract - Cited by 262 (13 self) - Add to MetaCart
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
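
One of PMI-IR's scoring rules ranks each candidate synonym by its hit count near the problem word, normalized by the candidate's own hit count. A sketch of that rule, with hit_count as a hypothetical stand-in for querying a search engine and made-up counts:

```python
def hit_count(query):
    """Hypothetical stand-in: a real implementation would query a
    web search engine and return the number of matching documents."""
    toy_counts = {
        "levied": 1000, "imposed": 8000, "believed": 9000,
        "levied NEAR imposed": 600, "levied NEAR believed": 90,
    }
    return toy_counts.get(query, 1)

def pmi_ir_score(problem, choice):
    # hits(problem NEAR choice) / hits(choice) is proportional to
    # p(problem | choice); since p(problem) is the same for every choice,
    # ranking by this ratio orders candidates the same way as PMI.
    return hit_count(f"{problem} NEAR {choice}") / hit_count(choice)

choices = ["imposed", "believed"]
print(max(choices, key=lambda c: pmi_ir_score("levied", c)))  # -> "imposed"
```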

Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

by P. W. Lord, R. D. Stevens, A. Brass, C. A. Goble - Bioinformatics, 2003
"... between sequence and annotation ..."
Abstract - Cited by 247 (5 self) - Add to MetaCart
between sequence and annotation
(Show Context)

Citation Context

...'orphan terms' would invalidate this, see Section 3.1). Once we have calculated these probabilities, there are a variety of different mechanisms for calculating the semantic similarity between terms (Jiang and Conrath, 1998; Lin, 1998). In this paper we have used the simplest of these measures (Resnik, 1999). This measure is based on the information content of shared parents of the two terms, as defined in Equation (1),...

Reading Tea Leaves: How Humans Interpret Topic Models

by Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, David M. Blei
"... Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Practitioners typically assume that the latent space is semantically meaningful. It is used to check models, summariz ..."
Abstract - Cited by 238 (26 self) - Add to MetaCart
Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Practitioners typically assume that the latent space is semantically meaningful. It is used to check models, summarize the corpus, and guide exploration of its contents. However, whether the latent space is interpretable is in need of quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood. Surprisingly, topic models which perform better on held-out likelihood may infer less semantically meaningful topics. 1
(Show Context)

Citation Context

...nd topics) with the number of senses in WordNet of the words displayed to the subjects [24] and a slight positive correlation (ρ = 0.109) with the average pairwise Jiang-Conrath similarity of words [25]. Topic intrusion In Section 3.2, we introduced the topic intrusion task to measure how well a topic model assigns topics to documents. We define the topic log odds as a quantitative measure of the de...

Dependency-based construction of semantic space models

by Sebastian Padó, Mirella Lapata - Computational Linguistics, 2007
"... Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of m ..."
Abstract - Cited by 236 (14 self) - Add to MetaCart
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art. 1.
(Show Context)
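
The framework's basic move is to accumulate co-occurrence counts along syntactic relations rather than within flat word windows. A minimal sketch of that idea, assuming dependency triples (head, relation, dependent) are already available from a parser (the triples below are hand-written toys):

```python
from collections import defaultdict

# Toy dependency triples; in practice these come from parsing a corpus.
triples = [
    ("eat",  "dobj",  "apple"),
    ("eat",  "nsubj", "child"),
    ("eat",  "dobj",  "pear"),
    ("peel", "dobj",  "apple"),
]

# Each word's vector is indexed by (relation, co-occurring word) pairs, so
# "apple" and "pear" become similar through the shared ("dobj", "eat") context.
vectors = defaultdict(lambda: defaultdict(int))
for head, rel, dep in triples:
    vectors[dep][(rel, head)] += 1          # dependent sees its head
    vectors[head][(rel + "-of", dep)] += 1  # head sees its dependent

print(dict(vectors["apple"]))  # {('dobj', 'eat'): 1, ('dobj', 'peel'): 1}
```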

Citation Context

...1998), relative depth (Leacock and Chodorow 1998), and density (Agirre and Rigau 1996). A number of hybrid approaches have also been proposed that combine WordNet with corpus statistics (Resnik 1995; Jiang and Conrath 1997). McCarthy et al. (2004) use their ranking model to automatically infer the first senses of all nouns attested in SemCor, a subset of the Brown corpus containing 23,346 lemmas annotated with senses a...
