Dumais, S. (1994). Latent semantic indexing (LSI) and TREC-2. Technical Report TM-ARH-023878, Bellcore.
....merely excess noise that can be eliminated without reducing performance. Deerwester et al. [19] compare LSI to the Vector Space Model on two small test collections and find that a reduced representation performs at least as well as, and sometimes better than, the full term model. Experiments by Dumais [20] on the TREC corpus, a very large heterogeneous document collection, obtained very good results using LSI for the routing problem, scoring slightly below the best system, which used massive query expansion and the Vector Space Model. The performance of LSI for the ad hoc search task was not as good, ....
....simple to compute the statistic at all integer values of the DCV in a range of interest and then use the average over this range as the final measure of performance. For example, rather than just measuring precision at 10 documents, one might compute precision averaged over all values in the range [1, 20]. This measure smooths out any irregular effects due to choosing exactly 10, rather than, say, 8 or 12, as the DCV of interest. The relationships shown above between precision, recall, DCV, and the number of relevant documents suggest that precision is a more accurate measure for low DCV and ....
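A minimal Python sketch of the averaged measure described above, assuming a ranked list of document IDs and a set of relevant IDs (both hypothetical inputs). It computes precision at every integer DCV in [low, high] and returns the mean. Positions beyond the end of a short ranking are counted as non-relevant, which is one common convention rather than necessarily the one the quoted paper uses.

```python
from typing import Sequence, Set


def precision_at(ranking: Sequence[str], relevant: Set[str], dcv: int) -> float:
    """Precision at a document cutoff value (DCV): relevant hits in the top `dcv`, divided by `dcv`.

    If the ranking is shorter than `dcv`, the missing positions count as non-relevant.
    """
    if dcv < 1:
        raise ValueError("dcv must be a positive integer")
    hits = sum(1 for doc in ranking[:dcv] if doc in relevant)
    return hits / dcv


def averaged_precision(ranking: Sequence[str], relevant: Set[str],
                       low: int = 1, high: int = 20) -> float:
    """Mean of precision_at over every integer DCV in the closed range [low, high]."""
    if not 1 <= low <= high:
        raise ValueError("require 1 <= low <= high")
    values = [precision_at(ranking, relevant, k) for k in range(low, high + 1)]
    return sum(values) / len(values)


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    # Precision at DCV 1..4 is 1, 1/2, 2/3, 1/2, so the average is 2/3.
    print(round(averaged_precision(ranking, relevant, 1, 4), 4))
```

Calling `averaged_precision(ranking, relevant)` with the default bounds gives the [1, 20] average used in the example in the text.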
Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In The Second Text REtrieval Conference (TREC-2), pages 105--115, 1993.
....term-document matrix A to separate the global and general structure, corresponding to the large singular vectors, from local or noisy information, which hides among the small ones. LSI has been reported to perform quite well on both rather large and small document collections; see, for example, Dumais [6]. It handles synonymy (when two words mean the same thing) and polysemy (when one word has several distinct meanings depending on context) quite well. However, LSI requires substantial computational work to obtain the SVD, and there is no simple way to determine how many singular vectors are ....
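A minimal NumPy sketch of the truncation described above, assuming a small dense term-by-document count matrix A (rows are terms, columns are documents; the data is made up). It keeps the k largest singular triplets, folds a query into the latent space as q^T U_k S_k^{-1} (the fold-in formula commonly attributed to Deerwester et al.), and scores documents by cosine against their rows of V_k. The value k = 2 is arbitrary, which reflects the point above: nothing in the method itself fixes it. Scaling conventions for the comparison step vary between implementations.

```python
import numpy as np


def lsi_fit(A: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x documents).

    Returns U_k (terms x k), s_k (k,) and Vt_k (k x documents), keeping only the
    k largest singular values and their singular vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted, largest first
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Map a term-space vector q into the k-dimensional latent space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k


def lsi_scores(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray, Vt_k: np.ndarray) -> np.ndarray:
    """Cosine similarity between the folded-in query and every document's latent vector."""
    q_hat = fold_in(q, U_k, s_k)
    docs = Vt_k.T  # one row per document, k columns
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    # Guard against zero vectors: their dot product is 0, so the score becomes 0.
    return (docs @ q_hat) / np.where(norms == 0, 1.0, norms)


if __name__ == "__main__":
    # Toy 5-term x 4-document count matrix (hypothetical data).
    A = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]], dtype=float)
    U_k, s_k, Vt_k = lsi_fit(A, k=2)
    query = np.array([1, 0, 0, 0, 1], dtype=float)  # mentions terms 0 and 4
    print(np.round(lsi_scores(query, U_k, s_k, Vt_k), 3))
```

Folding in a column of A returns exactly that document's row of V_k, so queries and documents end up in the same coordinates; the expensive part is the SVD itself, which a real collection would compute with a sparse solver rather than a dense `np.linalg.svd`.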
S. T. Dumais. Latent semantic indexing (LSI): TREC-3 report. In D. K. Harman, editor, The Third Text REtrieval Conference (TREC-3), NIST Special Publication 500-225, 1995, pp. 219--230.
....operators to more expressive and effective models such as the vector space model [1] (in which documents and queries are represented as vectors of weighted terms and similarity is measured as the cosine of the angle between the vectors), latent semantic indexing (another vector-based approach [5], which attempts to account for latent relationships between terms and has been shown to outperform the vector space model), and probabilistic models [3], which model the information retrieval process in a probabilistic framework. The application of intelligent information management in the domain of ....
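A minimal Python sketch of the cosine measure used in the vector space model, assuming documents and queries are already represented as sparse dictionaries from term to weight. How the weights are computed (for example tf-idf) is left open, and the sample vectors are made up.

```python
import math
from typing import Dict


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse weighted-term vectors.

    Each vector maps a term to its weight; absent terms have weight 0.
    """
    # Iterate over the shorter vector when computing the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    doc = {"latent": 0.5, "semantic": 0.5, "indexing": 0.7}
    query = {"semantic": 1.0, "indexing": 1.0}
    print(round(cosine(query, doc), 3))
```

Because only shared terms contribute to the dot product, a document that uses a synonym of a query word scores zero on that word, which is the gap LSI is meant to address.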
S. Dumais. Latent Semantic Indexing (LSI) Routing for TREC-3. TREC-3 Proceedings, November 1994.
....models from multiple databases. They are motivated by the fact that word occurrences follow a highly skewed distribution, with a few words occurring very often and most words occurring rarely. In light of evidence suggesting that the important vocabulary words occur frequently in a database [5, 9, 13], these words are likely to be acquired by sampling. Callan et al. show that if queries can be run and documents retrieved, then it is possible to sample the contents of each database in a way that produces an accurate language model for the database. We extend query-based ....
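A minimal Python sketch of the model-building step, assuming the documents have already been obtained by running queries against the database (the query-based sampling loop of Callan et al. is not shown) and that naive whitespace tokenization is acceptable. It records document frequency and collection term frequency for each term, two statistics commonly kept in such a database description.

```python
from collections import Counter
from typing import Iterable, Tuple


def build_language_model(sampled_docs: Iterable[str]) -> Tuple[Counter, Counter]:
    """Estimate a simple unigram description of a database from sampled documents.

    Returns (df, ctf): df[t] is the number of sampled documents containing term t,
    and ctf[t] is the total number of occurrences of t across the sample.
    """
    df: Counter = Counter()
    ctf: Counter = Counter()
    for doc in sampled_docs:
        terms = doc.lower().split()  # naive whitespace tokenization (an assumption)
        ctf.update(terms)
        df.update(set(terms))  # count each term at most once per document
    return df, ctf


if __name__ == "__main__":
    sample = ["the cat sat", "the dog sat down", "a cat"]
    df, ctf = build_language_model(sample)
    print(df.most_common(3))
```

Because frequent words dominate the skewed distribution, they tend to appear early in such a sample, which is why a modest number of sampled documents can already yield a usable description.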
S. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115. Gaithersburg, MD, 1994.
....at http://sift.stanford.edu sets. These include the system developed by Strzalkowski et al. [41], which uses natural language processing techniques; Okapi [31], which uses a probabilistic model; the WIN system [42], which utilises inference networks; and an LSI-based system developed by Dumais [16]. 6 Relevance Feedback. Relevance feedback has proved highly effective for improving information filtering and retrieval. Upon receiving the returned articles, the user may provide relevance judgments for them. These relevance judgments may subsequently be used to guide the matching ....
S. Dumais. Latent semantic indexing (LSI) routing for TREC-3. November 1994.
....present and describe the results of the system on four different data sets, comparing those results to other systems that incorporate unlabeled data. We conclude with a discussion of our current and ongoing work in this area. 2. OUR APPROACH. 2.1 Latent Semantic Indexing. Latent Semantic Indexing [8] is based upon the assumption that there is an underlying semantic structure in textual data, and that the relationship between terms and documents can be re-described in terms of this semantic structure. Textual documents are represented as vectors in a vector space. Each position in a vector ....
....matrix to obtain X_n. X_n is a model of the space that was unobtainable with the training examples alone. The larger matrix contains words that did not occur in the training examples at all; it also provides us with richer and more reliable patterns for data in the given domain. To classify a test example incorporating the background knowledge in the decision process, the test .... (Footnote 2: This is in contrast to other uses of LSI for classification [10, 8, 9], in which one centroid vector is formed for each class, and a new example is labeled by those classes whose vector is sufficiently close to it.)
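A minimal Python sketch of the centroid scheme the footnote contrasts with, assuming each training example has already been mapped into the reduced LSI space (the mapping is not shown) and using a hypothetical cosine threshold to decide what counts as "sufficiently close". The class names and vectors are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid_labels(x: Sequence[float],
                    training: Dict[str, List[Sequence[float]]],
                    threshold: float) -> List[str]:
    """Return every class whose centroid has cosine >= threshold with x (sorted by name)."""
    centroids = {label: centroid(vectors) for label, vectors in training.items()}
    return sorted(label for label, c in centroids.items() if cosine(x, c) >= threshold)


if __name__ == "__main__":
    # Hypothetical 2-dimensional (already reduced) training vectors per class.
    training = {
        "sports": [[1.0, 0.0], [0.9, 0.1]],
        "politics": [[0.0, 1.0], [0.1, 0.9]],
    }
    print(centroid_labels([1.0, 0.0], training, threshold=0.9))  # ['sports']
    print(centroid_labels([1.0, 1.0], training, threshold=0.5))  # ['politics', 'sports']
```

Since the rule is a threshold rather than an argmax, an example can receive several labels or none, which matches the footnote's "labeled by those classes" wording.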
S. Dumais. Latent semantic indexing (LSI): TREC-3 report. In D. Harman, editor, The Third Text REtrieval Conference, NIST Special Publication 500-225, pages 219--230, 1995.
....are indexed by the same search model, we may assume that scores attributed to documents are comparable across collections [12]. The document scores are then used to merge the documents from the collections into a single list. This strategy is called Raw Score Merging (RSM). However, Dumais [8] noted that various statistics may be collection-dependent (e.g., the idf value used to weight documents and/or queries) and that these values may vary widely across collections. Therefore, this phenomenon may invalidate the raw score merging hypothesis. One variant of the RSM strategy is to ....
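A minimal Python sketch of raw score merging as described above, assuming each collection returns a list of (document ID, score) pairs; the collection names and scores are hypothetical. The sketch simply trusts the scores, so it inherits the weakness Dumais points out: if idf or other statistics differ between collections, the merged order can be misleading.

```python
from typing import Dict, List, Tuple


def raw_score_merge(results: Dict[str, List[Tuple[str, float]]],
                    n: int) -> List[Tuple[str, str, float]]:
    """Merge per-collection result lists by sorting on the raw scores as returned.

    `results` maps a collection name to its (doc_id, score) pairs. The output is the
    top-n (collection, doc_id, score) triples. This is only meaningful if scores are
    comparable across collections.
    """
    merged = [(coll, doc, score) for coll, hits in results.items() for doc, score in hits]
    merged.sort(key=lambda item: item[2], reverse=True)  # stable: ties keep input order
    return merged[:n]


if __name__ == "__main__":
    results = {
        "collection_a": [("a1", 0.9), ("a2", 0.4)],
        "collection_b": [("b1", 0.7), ("b2", 0.2)],
    }
    print(raw_score_merge(results, 3))
```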
Dumais, S. T.: Latent Semantic Indexing (LSI) and TREC-2. Proceedings of TREC-2, 1994, pp. 105-115.
....by the same or a very similar search engine and that the similarity values are therefore directly comparable [Kwok 1995; Moffat 1995]. Such a strategy, called raw score merging, produces a final list sorted by the document score computed by each collection. However, as demonstrated by Dumais [1994], collection-dependent statistics in document or query weights may vary widely among collections, and therefore this phenomenon may invalidate the raw score merging hypothesis. To account for this fact, we might normalize the document scores within each collection by dividing them by the maximum ....
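A minimal Python sketch of per-collection normalization before merging, assuming (as the truncated sentence appears to suggest) that each score is divided by the maximum score in its own collection. Inputs are hypothetical (document ID, score) lists keyed by collection name.

```python
from typing import Dict, List, Tuple


def max_normalized_merge(results: Dict[str, List[Tuple[str, float]]],
                         n: int) -> List[Tuple[str, str, float]]:
    """Divide each collection's scores by that collection's maximum score, then merge.

    After normalization the best document of every collection scores 1.0, so a
    collection whose raw scores happen to be numerically larger no longer dominates.
    """
    merged: List[Tuple[str, str, float]] = []
    for coll, hits in results.items():
        if not hits:
            continue
        top = max(score for _, score in hits)
        for doc, score in hits:
            merged.append((coll, doc, score / top if top > 0 else 0.0))
    merged.sort(key=lambda item: item[2], reverse=True)  # stable: ties keep input order
    return merged[:n]


if __name__ == "__main__":
    results = {
        "collection_a": [("a1", 10.0), ("a2", 5.0)],
        "collection_b": [("b1", 0.5), ("b2", 0.375)],
    }
    print(max_normalized_merge(results, 3))
```

With raw scores, both documents from `collection_a` would outrank everything in `collection_b`; after normalization the top document of each collection ties at 1.0. This removes scale differences but not the deeper idf mismatch, since it still assumes scores within each collection are on a comparable footing.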
Dumais, S. T. (1994). Latent semantic indexing (LSI) and TREC-2. In Proceedings of TREC-2 (pp. 105-115). Gaithersburg: NIST Publication #500-215.
....this low-dimensional subspace (which are each linear combinations of the dimensions in the original vector space) then define the axes of the final vector space in which documents will be represented. While LSI was originally proposed for retrieval, its use in clustering [149] and classification [47, 148] has recently been explored as well. In classification tasks, such a representation has shown some utility when used in conjunction with linear classifiers. This seems to follow from the fact that LSI, by creating feature vectors which are linear combinations of the original term space, is helping ....
Dumais, S. T. Latent semantic indexing (LSI) and TREC-2. In Proceedings of the Second Text REtrieval Conference (TREC-2), D. K. Harman, Ed. National Institute of Standards and Technology, 1993, pp. 105--115.
....each information server applies the same (or a very similar) search strategy and that the document score values are directly comparable. Such a strategy, called raw score merging, produces a final list based on the retrieval status value computed by each subcollection. However, as demonstrated by Dumais (1993), collection-dependent statistics in document or query weights may vary widely among subcollections, and therefore this phenomenon may invalidate the raw score merging hypothesis. Finally, Callan et al. (1995) suggest a merging strategy based on the score achieved by both the subcollection and ....
Dumais, S. T. (1993, November). Latent semantic indexing (LSI) and TREC-2.
....corpus statistics were computed or maintained. The merging of document rankings produced from different databases is a well-known, difficult IR problem. Differences in corpus statistics (particularly inverse document frequency, or idf) make document scores from different databases incomparable [5, 15]. Common solutions are to maintain global corpus information, which is not always practical, or to recompute document scores at the search client, which is undesirable (although not impractical) excess computation. A third choice is to estimate normalized document scores heuristically, which has ....
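A small worked example in Python of why idf makes scores incomparable, assuming the common log(N/df) form and made-up statistics: the same term, in two databases of the same size, receives very different weights simply because it is rare in one and common in the other. The database names are hypothetical.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency in the common log(N / df) form (natural log).

    Real systems use many variants; this one is only for illustration.
    """
    if not 0 < doc_freq <= num_docs:
        raise ValueError("need 0 < doc_freq <= num_docs")
    return math.log(num_docs / doc_freq)


if __name__ == "__main__":
    # Hypothetical statistics for the same query term in two databases of equal size.
    stats = {"database_1": (1000, 10), "database_2": (1000, 500)}
    for name, (n_docs, df) in stats.items():
        print(name, round(idf(n_docs, df), 2))  # database_1 4.61, then database_2 0.69
    # A match on this term is weighted roughly 6.6 times more heavily in database_1,
    # even for an otherwise identical document.
```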
S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, 1994. National Institute of Standards and Technology, Special Publication 500-215.
....[15, 19] 1.2 Related Work. There are three basic approaches to textual document processing [14]: lexical, syntactic, and semantic analysis. A number of systems using syntactic and semantic analysis have been developed and are being used for research, such as DR-LINK [17], CLARIT [8], and TREC [7, 29]. However, they are typically not significantly better than the best lexical analyzers. We will discuss various lexical analyzers throughout the paper, in relation to our work. Very little has been done so far on hierarchical indexing. In general, it has been shown that hierarchical indexing ....
Dumais, Susan T. 1995. "Latent semantic indexing (LSI): TREC-3 report." In Overview of the 3rd Text Retrieval Conference (TREC-3). Donna K. Harman, ed. 1995. Washington, D.C.: NIST Special Publication.
....benefit to detect synonyms as well as words that refer to the same topic. In many applications this has proven to result in more robust word processing. Although LSA has been applied with remarkable success in different domains, including automatic indexing (Latent Semantic Indexing, LSI) [1, 3], it has a number of deficits, mainly due to its unsatisfactory statistical foundation. The primary goal of this paper is to present a novel approach to LSA and factor analysis called Probabilistic Latent Semantic Analysis (PLSA) that has a solid statistical foundation, since it is based on ....
....Depicted are curves for direct term matching, LSI, and the best-performing PLSI variant. our experiments, we have actually considered linear combinations of the original similarity score (11) (weight λ) and the one derived from the latent space representation (weight 1 − λ), as suggested in [3] (cf. [16] for a more detailed empirical investigation of linear combination schemes for information retrieval systems). 5.2 Variants of Probabilistic Latent Semantic Indexing. Two different schemes to exploit PLSA for indexing have been investigated: (i) as a context-dependent unigram model to ....
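A minimal Python sketch of the linear combination described above, where `lam` plays the role of the weight λ on the original (direct term-matching) similarity and 1 − λ weights the latent-space score. The per-document score dictionaries are hypothetical inputs, and no particular value of λ is implied, since the passage does not state one.

```python
from typing import Dict, List


def combined_score(direct: float, latent: float, lam: float) -> float:
    """Linear combination lam * direct + (1 - lam) * latent, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * direct + (1.0 - lam) * latent


def rerank(direct: Dict[str, float], latent: Dict[str, float], lam: float) -> List[str]:
    """Rank documents by the combined score; a document missing from one dict scores 0 there."""
    docs = set(direct) | set(latent)
    scored = {d: combined_score(direct.get(d, 0.0), latent.get(d, 0.0), lam) for d in docs}
    return sorted(scored, key=lambda d: (-scored[d], d))  # ties broken by document ID


if __name__ == "__main__":
    direct = {"d1": 0.9, "d2": 0.1}
    latent = {"d1": 0.0, "d2": 1.0}
    print(rerank(direct, latent, 0.5))
```

Setting `lam` to 1.0 recovers plain term matching and 0.0 uses only the latent score; intermediate values trade the two off.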
Dumais, S. T. Latent semantic indexing (LSI): TREC-3 report. In Proceedings of the Text REtrieval Conference (TREC-3) (1995), D. Harman, Ed., pp. 219--230.
No context found.
Dumais, S. (1994). Latent semantic indexing (LSI) and TREC-2. Technical Report TM-ARH-023878, Bellcore.
No context found.
Dumais, S. T. (1994) Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215 (pp. 105-116).
No context found.
S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In The Second Text Retrieval Conference (TREC-2), 1994.
No context found.
Dumais, S.T. (1994). Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116.
No context found.
Dumais, S. T. 1994. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116
No context found.
Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. Harman, editor, The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pages 105--116, 1994.
No context found.
Dumais, S.T. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116. 1994.
No context found.
Dumais, S. T. 1994. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116
No context found.
Dumais, Susan T. 1995. Latent Semantic Indexing (LSI): TREC-3 Report. In Harman, Donna K. (editor). Overview of the Third Text REtrieval Conference (TREC-3). NIST Special Publication 500-225, National Institute of Standards and Technology, Gaithersburg, MD, (http://trec.nist.gov/pubs.html), pp. 219-230.
CiteSeer.IST - Copyright Penn State and NEC