## Unsupervised Learning by Probabilistic Latent Semantic Analysis (2001)

Venue: Machine Learning

Citations: 605 (4 self)

### Citations

11694 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...ities, documents and words. 3.2. Model fitting with the EM algorithm The standard procedure for maximum likelihood estimation in latent variable models is the Expectation Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). [Figure 1: Graphical model representation of the aspect model in the asymmetric (a) and symmetric (b) parameterization.] EM alternates two steps: (i) an expectation (E) step where post...
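
The E/M alternation described in this excerpt can be sketched for the aspect model. This is an illustrative implementation, not Hofmann's original code; the function name `plsa_em` and the toy count matrix are assumptions.

```python
# Illustrative EM for the aspect model (PLSA), symmetric parameterization.
# Not the paper's implementation; names and initialization are assumptions.
import numpy as np

def plsa_em(n, K, iters=50, seed=0):
    """n: document-word count matrix (D x W); K: number of latent aspects."""
    rng = np.random.default_rng(seed)
    D, W = n.shape
    p_z = np.full(K, 1.0 / K)                          # P(z)
    p_d_z = rng.random((D, K)); p_d_z /= p_d_z.sum(0)  # P(d|z)
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(0)  # P(w|z)
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(z) P(d|z) P(w|z)
        post = p_z[None, None, :] * p_d_z[:, None, :] * p_w_z[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from expected counts
        nz = n[:, :, None] * post                      # expected counts per aspect
        p_d_z = nz.sum(axis=1); p_d_z /= p_d_z.sum(axis=0, keepdims=True)
        p_w_z = nz.sum(axis=0); p_w_z /= p_w_z.sum(axis=0, keepdims=True)
        p_z = nz.sum(axis=(0, 1)); p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z
```

Each iteration is guaranteed (by the general EM argument of Dempster et al.) not to decrease the data log-likelihood.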

3943 | Introduction to Modern Information Retrieval
- Salton, McGill
- 1985
Citation Context: ...ough there are also notable differences. The key idea in LSA is to map high-dimensional count vectors, such as term-frequency (tf) vectors arising in the vector space representation of text documents (Salton & McGill, 1983), to a lower dimensional representation in a so-called latent semantic space. In doing so, LSA aims at finding a data mapping which provides information beyond the lexical level of word occurrences. ...

3704 | Latent semantic analysis
- Dumais, T
- 2004
Citation Context: ...se this includes synonyms, i.e., words with identical or almost identical meaning. As the name PLSA indicates, our approach has been largely inspired and influenced by Latent Semantic Analysis (LSA) (Deerwester et al., 1990), although there are also notable differences. The key idea in LSA is to map high-dimensional count vectors, such as term-frequency (tf) vectors arising in the vector space representation of text docu...

1767 | A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge
- Landauer, Dumais
- 1997
Citation Context: ...erality, LSA has proven to be a valuable analysis tool for many different problems in practice and thus has a wide range of possible applications (e.g., Deerwester et al., 1990; Foltz & Dumais, 1992; Landauer & Dumais, 1997; Wolfe et al., 1998; Bellegarda, 1998). Despite its success, there are a number of shortcomings of LSA. First of all, the methodological foundation remains to a large extent unsatisfactory and incomp...

1643 | Learning the parts of objects by non-negative matrix factorization - Lee, Seung - 1999

980 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1999

670 | Using Linear Algebra for Intelligent Information Retrieval - Berry, Dumais, et al. - 1995

625 | Distributional clustering of English words - Pereira, Tishby, et al. - 1993

561 | Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1995
Citation Context: ...closely related to our approach is the distributional clustering model (Pereira, Tishby, & Lee, 1993; Baker & McCallum, 1998) and the multinomial (maximum likelihood) version of AutoClass clustering (Cheeseman & Stutz, 1996), an unsupervised version of a naive Bayes’ classifier. [Figure 5: The two aspects most likely to generate the word ‘flight’ (left) and ‘love’ (right), derived...]

297 | Distributional clustering of words for text classification
- Baker, McCallum
- 1998
Citation Context: ...ents, one typically associates a latent class variable with each document in the collection. Most closely related to our approach is the distributional clustering model (Pereira, Tishby, & Lee, 1993; Baker & McCallum, 1998) and the multinomial (maximum likelihood) version of AutoClass clustering (Cheeseman & Stutz, 1996), an unsupervised version of a naive Bayes’ classifier. ...

291 | The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation Context: ....g., for tasks like text retrieval based on keywords. The co-occurrence table representation immediately reveals the problem of data sparseness (Katz, 1987), also known as the zero-frequency problem (Witten & Bell, 1991). A typical term-document matrix derived from short articles, text summaries or abstracts may only have a small fraction of non-zero entries (typically well below 1%), which reflects the fact that on...
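
As a toy check of the sparsity figure quoted above (well below 1% non-zero entries), the density of a count matrix can be measured directly. The data here is synthetic, not the corpus from the paper.

```python
# Toy illustration (assumed synthetic data): measuring the fraction of
# non-zero entries in a sparse term-document count matrix.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(0.005, size=(1000, 500))   # very sparse word counts
density = np.count_nonzero(counts) / counts.size  # fraction of non-zeros
```

With a Poisson rate of 0.005 per cell, roughly half a percent of entries are non-zero, matching the order of magnitude the excerpt describes.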

121 | Unsupervised learning from dyadic data
- Hofmann, Puzicha
- 1998
Citation Context: ...3. Probabilistic latent semantic analysis 3.1. The aspect model The starting point for our novel Probabilistic Latent Semantic Analysis is a statistical model which has been called the aspect model (Hofmann, Puzicha, & Jordan, 1999). The aspect model has independently been proposed by Saul and Pereira (1997) in the context of language modeling, where it is referred to as aggregate Markov model. In the statistical literature sim...

116 | Deterministic annealing EM algorithm - Ueda, Nakano - 1998

112 | A deterministic annealing approach to clustering
- Rose, Gurewitz, Fox
- 1990
Citation Context: ...which is known as annealing and is based on an entropic regularization term. The resulting method is called Tempered Expectation Maximization (TEM) and is closely related to deterministic annealing (Rose, Gurewitz, & Fox, 1990). The combination of deterministic annealing with the EM algorithm has been investigated before in Ueda and Nakano (1998), Hofmann, Puzicha, and Jordan (1999). The starting point of TEM is a derivati...
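
The tempered E-step that distinguishes TEM from plain EM can be sketched as raising the class posterior to an inverse-temperature power beta and renormalizing. This is a hedged sketch under that standard formulation, not the paper's implementation; the function name is an assumption.

```python
# Sketch of a tempered E-step: exponentiate the unnormalized posterior by
# the inverse temperature beta and renormalize. beta = 1 recovers plain EM;
# beta -> 0 flattens the posterior toward uniform (heavy smoothing).
import numpy as np

def tempered_posterior(joint, beta):
    """joint: unnormalized P(z)P(d|z)P(w|z), aspects along the last axis."""
    t = joint ** beta
    return t / t.sum(axis=-1, keepdims=True)
```

In TEM, beta is decreased from 1 (or annealed upward from a small value) to control the effective smoothness of the posterior and avoid poor local maxima.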

93 | Aggregate and mixed-order Markov models for statistical language processing - Saul, Pereira - 1997

64 | Latent semantic indexing (LSI): TREC-3 report
- Dumais
- 1995
Citation Context: ...o a vector space of reduced dimensionality, the latent semantic space, which in a typical application in document indexing is chosen to be of the order of ≈100–300 dimensions (Deerwester et al., 1990; Dumais, 1995). The mapping of the given document/term vectors to its latent space representatives is restricted to be linear and is based on a decomposition of the co-occurrence matrix by SVD. One thus starts wit...
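
The linear, SVD-based mapping this excerpt describes can be sketched as follows. This is an illustrative implementation; the function name `lsa_embed` and the toy term-document matrix are assumptions, not from the paper.

```python
# Illustrative LSA embedding: project term-frequency document vectors into
# a k-dimensional latent semantic space via truncated SVD.
import numpy as np

def lsa_embed(N, k):
    """N: term-document count matrix; returns one k-vector per document."""
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    # Keep only the k largest singular values and right singular vectors.
    return (np.diag(s[:k]) @ Vt[:k]).T

docs = np.array([[2, 0, 1],
                 [1, 0, 0],
                 [0, 3, 2],
                 [0, 2, 3]], dtype=float)   # 4 terms x 3 documents
coords = lsa_embed(docs, k=2)               # 3 documents, 2 latent dims
```

Documents that share correlated term usage end up close in the latent space even when they share few exact terms, which is the effect LSA exploits.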

59 | Learning from text: Matching readers and texts by latent semantic analysis
- Wolfe, Schreiner, et al.
- 1998
Citation Context: ...o be a valuable analysis tool for many different problems in practice and thus has a wide range of possible applications (e.g., Deerwester et al., 1990; Foltz & Dumais, 1992; Landauer & Dumais, 1997; Wolfe et al., 1998; Bellegarda, 1998). Despite its success, there are a number of shortcomings of LSA. First of all, the methodological foundation remains to a large extent unsatisfactory and incomplete. The original m...

54 | Towards better integration of semantic predictors in statistical language modeling - Coccaro, Jurafsky - 1998

18 | Exploiting both local and global constraints for multi-span statistical language modeling
- Bellegarda
- 1998
Citation Context: ...ysis tool for many different problems in practice and thus has a wide range of possible applications (e.g., Deerwester et al., 1990; Foltz & Dumais, 1992; Landauer & Dumais, 1997; Wolfe et al., 1998; Bellegarda, 1998). Despite its success, there are a number of shortcomings of LSA. First of all, the methodological foundation remains to a large extent unsatisfactory and incomplete. The original motivation for LSA ...

10 | Canonical analysis of contingency tables by maximum likelihood
- Gilula, Haberman
- 1986
Citation Context: ...7) in the context of language modeling, where it is referred to as aggregate Markov model. In the statistical literature similar models have been discussed for the analysis of contingency tables (cf. Gilula & Haberman, 1986). Another closely related technique called non-negative matrix decomposition has been investigated in Lee and Seung (1999). The aspect model is a latent variable model for co-occurrence data which as...

10 | Estimation of probabilities for sparse data for the language model component of a speech recogniser
- Katz
- 1987
Citation Context: ...many cases preserve most of the relevant information, e.g., for tasks like text retrieval based on keywords. The co-occurrence table representation immediately reveals the problem of data sparseness (Katz, 1987), also known as the zero-frequency problem (Witten & Bell, 1991). A typical term-document matrix derived from short articles, text summaries or abstracts may only have a small fraction of non-zero en...

4 | An analysis of information filtering methods
- Foltz, Dumais
- 1992
Citation Context: ...space. Due to its generality, LSA has proven to be a valuable analysis tool for many different problems in practice and thus has a wide range of possible applications (e.g., Deerwester et al., 1990; Foltz & Dumais, 1992; Landauer & Dumais, 1997; Wolfe et al., 1998; Bellegarda, 1998). Despite its success, there are a number of shortcomings of LSA. First of all, the methodological foundation remains to a large extent ...

2 | Linguistic Data Consortium: TDT pilot study corpus documentation. http://www.ldc.upenn.edu/TDT - LDC - 1997