A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge (1997)

by T. K. Landauer, S. T. Dumais
Venue: Psychological Review
Results 1 - 10 of 1,816

Abstraction in perceptual symbol systems

by L. W. Barsalou , 2003
"... ..."
Abstract - Cited by 1168 (32 self) - Add to MetaCart
Abstract not found

Citation Context

...ual character (Glaser, 1992; also see Seifert, 1997). More recently, researchers have suggested that amodal vectors derived from linguistic context underlie semantic processing (Burgess & Lund, 1997; Landauer & Dumais, 1997). However, Glenberg et al. (1998b) provide strong evidence against these views, suggesting instead that affordances derived from sensory-motor simulations are essential to semantic processing. Findin...

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

by Peter Turney , 2002
"... This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A ..."
Abstract - Cited by 784 (5 self) - Add to MetaCart
This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

Citation Context

...cal measure of word association, attains a score of 64% on the same 80 TOEFL questions (Landauer & Dumais, 1997). The Pointwise Mutual Information (PMI) between two words, word1 and word2, is defined as follows (Church & Hanks, 1989):

    PMI(word1, word2) = log2 [ p(word1 & word2) / (p(word1) p(word2)) ]    (1)

Here, p(word...
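
Equation (1) plus the averaging step from the abstract above is essentially the whole classifier. A minimal Python sketch, assuming caller-supplied probability estimators (p_joint and p_marginal are illustrative helpers, not from the paper):

    import math

    def pmi(x, y, p_joint, p_marginal):
        # Equation (1) above: PMI(x, y) = log2( p(x & y) / (p(x) p(y)) ).
        return math.log2(p_joint(x, y) / (p_marginal(x) * p_marginal(y)))

    def semantic_orientation(phrase, p_joint, p_marginal):
        # SO(phrase): PMI with "excellent" minus PMI with "poor".
        return (pmi(phrase, "excellent", p_joint, p_marginal)
                - pmi(phrase, "poor", p_joint, p_marginal))

    def classify_review(phrases, p_joint, p_marginal):
        # Thumbs up if the average SO of the adjective/adverb phrases is positive.
        avg = sum(semantic_orientation(ph, p_joint, p_marginal)
                  for ph in phrases) / len(phrases)
        return "recommended" if avg > 0 else "not recommended"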

Probabilistic Latent Semantic Analysis

by Thomas Hofmann - In Proc. of Uncertainty in Artificial Intelligence, UAI’99 , 1999
"... Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two--mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Sema ..."
Abstract - Cited by 771 (9 self) - Add to MetaCart
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.

Citation Context

...eyond the lexical level and reveals semantical relations between the entities of interest. Due to its generality, LSA has proven to be a valuable analysis tool with a wide range of applications (e.g. [3, 5, 8, 1]). Yet its theoretical foundation remains to a large extent unsatisfactory and incomplete. This paper presents a statistical view on LSA which leads to a new model called Probabilistic Latent Semantic...
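
The aspect model and tempered-EM fit outlined in the abstract above can be sketched compactly. A minimal NumPy sketch over a toy term-frequency matrix; the exact placement of the tempering exponent varies across presentations, so treat this as illustrative rather than as Hofmann's reference implementation:

    import numpy as np

    def plsa(counts, n_topics, n_iter=100, beta=1.0, seed=0):
        # Aspect model: P(d, w) = sum_z P(z) P(d|z) P(w|z),
        # fit by (tempered) EM on the count matrix n(d, w).
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape
        p_z = np.full(n_topics, 1.0 / n_topics)
        p_d_z = rng.random((n_topics, n_docs))
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        for _ in range(n_iter):
            # E-step: posteriors P(z|d,w), dampened by beta < 1 to curb overfitting.
            joint = (p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
            post = joint / joint.sum(axis=0, keepdims=True)
            # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
            expected = counts[None, :, :] * post
            p_w_z = expected.sum(axis=1)
            p_w_z /= p_w_z.sum(axis=1, keepdims=True)
            p_d_z = expected.sum(axis=2)
            p_d_z /= p_d_z.sum(axis=1, keepdims=True)
            p_z = expected.sum(axis=(1, 2))
            p_z /= p_z.sum()
        return p_z, p_d_z, p_w_z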

Unsupervised Learning by Probabilistic Latent Semantic Analysis

by Thomas Hofmann - Machine Learning , 2001
"... Abstract. This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of co-occurren ..."
Abstract - Cited by 618 (4 self) - Add to MetaCart
Abstract. This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.

Citation Context

...erality, LSA has proven to be a valuable analysis tool for many different problems in practice and thus has a wide range of possible applications (e.g., Deerwester et al., 1990; Foltz & Dumais, 1992; Landauer & Dumais, 1997; Wolfe et al., 1998; Bellegarda, 1998). Despite its success, there are a number of shortcomings of LSA. First of all, the methodological foundation remains to a large extent unsatisfactory and incomp...

From frequency to meaning : Vector space models of semantics

by Peter D. Turney, Patrick Pantel - Journal of Artificial Intelligence Research , 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract - Cited by 347 (3 self) - Add to MetaCart
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.

Citation Context

...ng item. Many techniques for vector analysis, such as factor analysis (Spearman, 1904), were pioneered in psychometrics. In cognitive science, Latent Semantic Analysis (LSA) (Deerwester et al., 1990; Landauer & Dumais, 1997), Hyperspace Analogue to Language (HAL) (Lund, Burgess, & Atchley, 1995; Lund & Burgess, 1996), and related research (Landauer, McNamara, Dennis, & Kintsch, 2007) is entirely within the scope of VSMs...
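
As a concrete instance of the first matrix class the survey describes, here is a small self-contained sketch of a term-document count matrix with cosine similarity between document columns (the toy corpus is illustrative):

    import math
    from collections import Counter

    def term_document_matrix(docs):
        # Rows are vocabulary terms, columns are documents; cells hold raw counts.
        vocab = sorted({w for doc in docs for w in doc.split()})
        index = {w: i for i, w in enumerate(vocab)}
        matrix = [[0] * len(docs) for _ in vocab]
        for j, doc in enumerate(docs):
            for w, c in Counter(doc.split()).items():
                matrix[index[w]][j] = c
        return vocab, matrix

    def cosine(u, v):
        # Standard cosine similarity between two document vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    docs = ["the cat sat on the mat",
            "the dog sat on the log",
            "stock prices fell sharply"]
    vocab, m = term_document_matrix(docs)
    column = lambda j: [row[j] for row in m]
    print(cosine(column(0), column(1)))  # similar sentences: high score
    print(cosine(column(0), column(2)))  # unrelated topics: 0.0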

The Google similarity distance

by Rudi Cilibrasi, Paul M. B. Vitányi , 2005
"... Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of ‘society ’ is ‘database, ’ and the equivalent of ‘use ’ is ‘way to search the database. ’ We present a new theory of similarity between ..."
Abstract - Cited by 320 (9 self) - Add to MetaCart
Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of ‘society’ is ‘database,’ and the equivalent of ‘use’ is ‘way to search the database.’ We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide
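
The abstract stops before the formula; the normalized Google distance (NGD) the paper constructs is usually stated as below. A minimal sketch, with hits() a hypothetical helper returning a search engine's page count for a query:

    import math

    def ngd(x, y, hits, total_pages):
        # Normalized Google distance: small when x and y co-occur on many pages.
        # hits(q) is an assumed helper returning the page count for query q;
        # total_pages approximates N, the number of pages the engine indexes.
        fx, fy = math.log(hits(x)), math.log(hits(y))
        fxy = math.log(hits(x + " " + y))
        return (max(fx, fy) - fxy) / (math.log(total_pages) - min(fx, fy))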

Measuring praise and criticism: Inference of semantic orientation from association

by Peter D. Turney, Michael L. Littman - ACM Transactions on Information Systems , 2003
"... The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., “honest”, “intrepid”) and negative semantic orientation indicates criticism (e.g., “disturbing”, “superfluous”). Semantic orientation varies in both direction (positive or neg ..."
Abstract - Cited by 311 (6 self) - Add to MetaCart
The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., “honest”, “intrepid”) and negative semantic orientation indicates criticism (e.g., “disturbing”, “superfluous”). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This article introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.
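
The recipe in the abstract is parameterized by an association measure and two paradigm sets. A minimal sketch, assuming a caller-supplied assoc(w1, w2) (PMI or an LSA cosine, matching the two instances evaluated); the paradigm lists below are illustrative stand-ins for the paper's fixed sets:

    POSITIVE = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
    NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

    def so_a(word, assoc):
        # SO-A(word): total association with the positive paradigm words minus
        # total association with the negative ones; the sign gives the direction
        # (praise vs. criticism), the magnitude gives the degree.
        return (sum(assoc(word, p) for p in POSITIVE)
                - sum(assoc(word, n) for n in NEGATIVE))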

Discovering Word Senses from Text.

by Patrick Pantel, Dekang Lin - In Proceedings of the 8th ACM Conference on Knowledge Discovery and Data Mining (KDD-02), 2002
"... ..."
Abstract - Cited by 293 (18 self) - Add to MetaCart
Abstract not found

Citation Context

...ame contexts tend to be similar. This is known as the Distributional Hypothesis [3]. There have been many approaches to compute the similarity between words based on their distribution in a corpus [4][8][12]. The output of these programs is a ranked list of similar words to each word. For example, [12] outputs the following similar words for wine and suit: wine: beer, white wine, red wine, Chardonnay...

Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL

by Peter D. Turney , 2001
"... This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of wo ..."
Abstract - Cited by 262 (13 self) - Add to MetaCart
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
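
A sketch of the simplest PMI-IR score for a synonym question, assuming a hypothetical hits(query) helper that returns a search engine's document count (the paper's stronger variants use NEAR queries and context words):

    def pmi_ir_score(problem, choice, hits):
        # Co-occurrence with the problem word, normalized by the choice's
        # overall frequency; the p(problem) term in PMI cancels when ranking
        # the choices, so only this ratio matters.
        return hits(problem + " " + choice) / max(hits(choice), 1)

    def answer_synonym_question(problem, choices, hits):
        # Pick the candidate most associated with the problem word.
        return max(choices, key=lambda c: pmi_ir_score(problem, c, hits))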

Automatic Identification of Word Translations from Unrelated English and German Corpora

by Reinhard Rapp , 1999
"... Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is ..."
Abstract - Cited by 244 (2 self) - Add to MetaCart
Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is