
Latent Semantic Indexing: A Probabilistic Analysis (1998)  (64 citations)
Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, S. Vempala




 
View or download:
mit.edu/~vempala/papers/lsi.ps
cmu.edu/academic/class/15...lsipods.ps
ibm.com/cs/people/pragh/prtv.ps

From:  mit.edu/~vempala/papers/papers

Abstract: Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging...
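
To make the abstract's "spectral analysis of the term-document matrix" concrete, here is a minimal sketch of rank-k LSI in plain Python. It is not the authors' implementation: the toy matrix, the term names in the comments, and all function names are assumptions made for illustration, and subspace iteration stands in for a library SVD routine only to keep the example dependency-free. Documents are mapped to k dimensions by projecting their term vectors onto the top-k left singular vectors.

```python
import math
import random

def transpose(X):
    """Transpose a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def orthonormalize(vectors):
    """Gram-Schmidt: turn a list of independent vectors into an orthonormal list."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis

def top_left_singular_vectors(A, k, iters=50, seed=0):
    """Approximate a basis of the top-k left singular subspace of A
    by subspace iteration on A A^T (illustrative, not optimized)."""
    rng = random.Random(seed)
    m = len(A)
    gram = matmul(A, transpose(A))  # m x m (terms x terms)
    Q = orthonormalize([[rng.gauss(0, 1) for _ in range(m)] for _ in range(k)])
    for _ in range(iters):
        # Apply A A^T to each basis vector, then re-orthonormalize.
        Q = orthonormalize(
            [[sum(g * x for g, x in zip(row, q)) for row in gram] for q in Q]
        )
    return Q  # k vectors, each of length m

def lsi_document_coordinates(A, k):
    """Return one k-dimensional vector per document (column of A)."""
    U = top_left_singular_vectors(A, k)  # k x m
    return transpose(matmul(U, A))       # n x k

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0 and 1 are about cars, documents 2 and 3 about astronomy.
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 0],  # galaxy
    [0, 0, 1, 1],  # star
    [0, 0, 0, 1],  # telescope
]

raw_docs = transpose(A)                      # documents as raw term vectors
lsi_docs = lsi_document_coordinates(A, k=2)  # documents in the 2-dim LSI space

print("raw cosine, same topic: ", cosine(raw_docs[0], raw_docs[1]))
print("LSI cosine, same topic: ", cosine(lsi_docs[0], lsi_docs[1]))
print("LSI cosine, cross topic:", cosine(lsi_docs[0], lsi_docs[2]))
```

In this toy corpus the two topics use disjoint vocabularies, so the rank-2 projection maps documents on the same topic to parallel vectors and documents on different topics to orthogonal ones, even though the two car documents share only one of their terms. The random-projection speed-up mentioned in the abstract is not shown here.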

Cited by:
Improving Random Projections Using Marginal Information - Ping Li Trevor
Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach
Machine Learning, 56, 9--33, 2004 - Clustering Large Graphs

Similar documents (at the sentence level):
70.2%:   Latent Semantic Indexing: A Probabilistic Analysis - Papadimitriou, Raghavan.. (1998)

Active bibliography (related documents):
0.2:   Thematic Indexing of Spoken Documents by Using Self-Organizing Maps - Kurimo (2000)
0.2:   Polynomial Time Approximation Schemes for Geometric k-Clustering - Ostrovsky, Rabani (2000)
0.2:   Finding Terminology Translations From Non-Parallel Corpora - Fung, McKeown (1997)

Similar documents based on text:
0.3:   Motion Planning on a Graph - Christos Papadimitriou Prabhakar
0.3:   A Mathematical View of Latent Semantic Indexing: - Kontostathis, Pottenger (2002)
0.3:   Efficient Algorithms for Universal Portfolios - Kalai, Vempala (2002)

Related documents from co-citation:
35:   Indexing by latent semantic analysis - Deerwester, Dumais et al. - 1990
12:   Using linear algebra for intelligent information retrieval - Berry, Dumais et al. - 1995
11:   Authoritative sources in a hyperlinked environment - Kleinberg - 1997

BibTeX entry:

C.H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent Semantic Indexing: A Probabilistic Analysis. In Proceedings of the ACM Conference on Principles of Database Systems (PODS), Seattle (to appear), 1998. http://citeseer.ist.psu.edu/article/papadimitriou98latent.html

@inproceedings{ papadimitriou98latent,
    author = "Christos H. Papadimitriou and Prabhakar Raghavan and Hisao Tamaki and Santosh Vempala",
    title = "Latent Semantic Indexing: {A} Probabilistic Analysis",
    pages = "159--168",
    year = "1998",
    url = "citeseer.ist.psu.edu/article/papadimitriou98latent.html" }
Citations (may not include all citations):
2441   Matrix Computations, Johns Hopkins University Press - Golub, Van Loan - 1989
1256   Introduction to modern information retrieval - Salton, McGill - 1983
624   The Algebraic Eigenvalue Problem - Wilkinson - 1965
568   Indexing by latent semantic analysis - Deerwester, Dumais et al. - 1990
375   Probability inequalities for sums of bounded random variable.. - Hoeffding - 1963
192   Using linear algebra for intelligent information retrieval - Berry, Dumais et al. - 1995
115   Approximating the permanent - Jerrum, Sinclair - 1989
76   Improving the retrieval of information from external sources - Dumais - 1991
43   Clustering in large graphs and matrices - Drineas, Frieze et al. - 1999
41   Probabilistic models of information retrieval - Fuhr - 1992
41   Spectra of Graphs - Cvetković, Doob et al. - 1979
36   A survey of information retrieval and filtering methods - Faloutsos, Oard
35   The Johnson-Lindenstrauss Lemma and the Sphericity of some g.. - Frankl, Maehara - 1988
29   Extensions of Lipschitz mappings into a Hilbert space - Johnson, Lindenstrauss - 1984
23   Information retrieval algorithms: a survey - Raghavan - 1997
22   Fast Monte-Carlo Algorithms for finding low-rank approximati.. - Frieze, Kannan et al. - 1998
10   Using latent semantic analysis to improve information retrie.. - Dumais, Furnas et al. - 1988
9   University of Tennessee - Berry, Do et al. - 1993
5   A comparison of text retrieval methods - Turtle, Croft - 1992
5   Handbook for matrix computation II - Golub, Reinsch - 1971
3   Combining fuzzy information from multiple sources - Fagin - 1996
3   Mining information networks through spectral methods - Chakrabarti, Dom et al. - 1997
3   Invited talk - Brewer - 1997
3   Using nonlinear dynamical systems to mine categorical data - Gibson, Kleinberg et al. - 1997





Documents on the same site (http://www-math.mit.edu/~vempala/papers/papers.html):
Fast Monte-Carlo Algorithms for finding low-rank approximations - Frieze, Kannan et al. (1998)
Semi-Definite Relaxations for Minimum Bandwidth and.. - Blum, Konjevod, Ravi, .. (1998)
The Colin de Verdière number and sphere.. - Kotlov.. (1996)
