Probabilistic Latent Semantic Indexing (1999)

by Thomas Hofmann
Citations: 1223 (10 self)

BibTeX

@MISC{Hofmann99probabilisticlatent,
    author = {Thomas Hofmann},
    title = {Probabilistic Latent Semantic Indexing},
    year = {1999}
}

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
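
The latent class model mentioned in the abstract can be illustrated with a small sketch. It assumes the factorization P(d, w) = sum over z of P(z) P(d|z) P(w|z) and fits it to a document-term count matrix with plain EM. The generalized EM variant that the abstract refers to is not reproduced here, and the function name `plsa_em`, its parameters, and the toy count matrix are hypothetical choices made for illustration only.

```python
import random


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-term count matrix with plain EM.

    counts[d][w] is the number of times word w occurs in document d.
    This is an illustrative sketch, not the paper's exact fitting procedure.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    if sum(sum(row) for row in counts) == 0:
        raise ValueError("count matrix must contain at least one occurrence")

    def random_distribution(n):
        # Strictly positive random values, normalized to sum to 1.
        values = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(values)
        return [v / total for v in values]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_distribution(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_distribution(n_words) for _ in range(n_topics)]  # p_w_z[z][w]

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) for this (document, word) pair.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    expected = n * joint[z] / total
                    acc_z[z] += expected
                    acc_d[z][d] += expected
                    acc_w[z][w] += expected

        # M-step: re-estimate each distribution from the expected counts.
        grand_total = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0:
                p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
                p_w_z[z] = [x / acc_z[z] for x in acc_w[z]]
            p_z[z] = acc_z[z] / grand_total

    return p_z, p_d_z, p_w_z


# Toy corpus: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z, p_d_z, p_w_z = plsa_em(toy_counts, n_topics=2)
for z, word_dist in enumerate(p_w_z):
    print(z, [round(p, 3) for p in word_dist])
```

Each row of `p_w_z` is a distribution over the vocabulary for one latent class, which is the quantity that lets such a model group related words under a shared class rather than matching terms directly.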

Keyphrases

probabilistic latent semantic indexing; singular value decomposition; factor analysis; expectation maximization algorithm; count data; retrieval experiment; different dimensionality; probabilistic variant; statistical latent class model; domain-specific synonymy; polysemous word; direct term; text document; training corpus; substantial performance gain; novel approach; latent semantic indexing; test collection; utilized model; solid statistical foundation; proper generative data model
