Results 1 - 6 of 6
Latent semantic indexing: A probabilistic analysis
, 1998
Cited by 210 (8 self)
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
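The spectral step this abstract refers to can be illustrated with a minimal sketch, assuming LSI is taken as a rank-k truncated SVD of a term-document matrix. The toy matrix, the helper names `lsi_embed` and `cosine`, and the choice k = 2 are illustrative assumptions, not taken from the paper; the random-projection speed-up is not shown.

```python
import numpy as np

def lsi_embed(term_doc, k):
    """Rank-k LSI sketch: truncated SVD of a (terms x documents) matrix.

    Returns (term_basis, doc_vectors, singular_values), where each row of
    doc_vectors is one document expressed in the k-dimensional latent space.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_vectors = (s_k[:, None] * Vt_k).T  # equals (U_k^T A)^T
    return U_k, doc_vectors, s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0 and 1 use one pair of terms, documents 2 and 3 another.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

_, docs, _ = lsi_embed(A, k=2)
raw_sim = cosine(A[:, 0], A[:, 1])       # similarity in the original term space
latent_sim = cosine(docs[0], docs[1])    # similarity after rank-2 LSI
cross_sim = cosine(docs[0], docs[2])     # documents from different topics
```

In this toy case the two documents of the same topic become more similar in the latent space than in the raw term space, while documents from different topics stay orthogonal.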
Terms representation with generalized latent semantic analysis
- In Proc. RANLP
, 2005
Cited by 4 (1 self)
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence based measures of semantic similarity obtained from very large corpora. Our experiments demonstrate that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test.
Thematic Indexing of Spoken Documents by Using Self-Organizing Maps
- RR 00-5, IDIAP
, 2000
Cited by 3 (1 self)
A method is presented to provide a useful searchable index for spoken audio documents. The task differs from the traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with the help of the other documents close to it using a semantic vector space. First, the audio stream is converted into a text stream by a speech recognizer. Then the text of each story is represented by a document vector which is the normalized sum of the word vectors in the story. A large collection of document vectors is used to train a self-organizing map to find the clusters and latent semantic structures in the collection. Because the news stories are quite short and include speech recognition errors, the idea of smoothing the document vectors using the thematic clusters determined by the self-...
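The document representation described in this abstract (a document vector that is the normalized sum of the word vectors in a story) can be sketched as follows. The toy word vectors, the function name `document_vector`, and the decision to skip words without a vector are assumptions made for illustration; the self-organizing map training and the smoothing step are not shown.

```python
import numpy as np

def document_vector(words, word_vectors):
    """Normalized sum of the word vectors of a story.

    Assumption: words with no known vector (e.g. recognizer errors that
    fall outside the vocabulary) are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    total = np.zeros(dim)
    for word in words:
        vec = word_vectors.get(word)
        if vec is not None:
            total += vec
    norm = np.linalg.norm(total)
    # Return the zero vector unchanged if no word was recognized.
    return total / norm if norm > 0 else total

# Hypothetical two-dimensional word vectors.
word_vectors = {
    "election": np.array([1.0, 0.0]),
    "vote": np.array([0.8, 0.2]),
    "goal": np.array([0.0, 1.0]),
}

story = ["election", "vote", "unknownword"]
doc = document_vector(story, word_vectors)
```

Normalizing to unit length keeps short and long stories comparable, so similarity between stories depends on direction in the vector space rather than on story length.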
Using singular value decomposition to improve a genetic algorithm’s performance
- In Proceedings of the 2003 Congress on Evolutionary Computation CEC2003
, 2003
Cited by 1 (1 self)
The focus of this work is to investigate the effects of applying the singular value decomposition (SVD), a linear algebra technique, to the domain of Genetic Algorithms. Empirical evidence, concerning document comparison, suggests that the SVD can be used to model information in such a way that provides both a saving in storage space and an improvement in information retrieval. It will be shown that these beneficial properties can be extended to many other different types of comparison as well. Briefly, vectors representing the genes of individuals are projected into a new low-dimensional space, obtained by the singular value decomposition of a gene-individual matrix. The information about what it means to be a good or bad individual serves as a basis for qualifying candidate individuals for reinsertion into the next generation. Positive results from different approaches of this application are presented and evaluated. In addition, several possible alternative techniques are proposed and considered.
Topic Segmentation with Hybrid Document Indexing
Cited by 1 (1 self)
We present a domain-independent unsupervised topic segmentation approach based on hybrid document indexing. Lexical chains have been successfully employed to evaluate lexical cohesion of text segments and to predict topic boundaries. Our approach is based on the notion of semantic cohesion. It uses spectral embedding to estimate semantic association between content nouns over a span of multiple text segments. Our method significantly outperforms the baseline on the topic segmentation task and achieves performance comparable to state-of-the-art methods that incorporate domain-specific information.
Subproblem Optimization by Gene Correlation with Singular Value Decomposition
Several ways of using singular value decomposition (SVD), a linear algebra technique typically used for information retrieval, to decompose problems into subproblems are investigated in the genetic algorithm setting. Empirical evidence, concerning document comparison, indicates that using SVD results in both savings in storage space and an improvement in information retrieval. Combining theoretical results and algorithms discovered by others, several problems are identified in which the SVD can be used to determine a substructure. Subproblems are discovered by projecting vectors representing the genes of highly fit individuals into a new low-dimensional space, obtained by truncating the SVD of a strategically chosen gene × individual matrix. Techniques are proposed and evaluated that use the subproblems identified by SVD to influence the evolution of the genetic algorithm. By restricting the locus of optimization to the substructure of highly fit individuals, the performance of the genetic algorithm was improved. Performance was also improved by using SVD to genetically engineer individuals out of the subproblems.

