52 citations found.
Dumais, S. (1994). Latent semantic indexing (lsi) and trec-2. Technical Report TM-ARH-023878, Bellcore.


This paper is cited in the following contexts:


Information Retrieval Using Statistical Classification - Hull (1994)   (2 citations)  (Correct)

....merely excess noise that can be eliminated without reducing performance. Deerwester et al. [19] compare LSI to the Vector Space Model on two small test collections and find that a reduced representation performs at least as well and sometimes better than the full term model. Experiments by Dumais [20] on the TREC corpus, a very large heterogeneous document collection, obtained very good results using LSI for the routing problem, scoring slightly below the best system which used massive query expansion and the Vector Space Model. The performance of LSI for the adhoc search task was not as good, ....

....simple to compute the statistic at all integer values of the DCV in a range of interest and then use the average over this range as the final measure of performance. For example, rather than just measuring precision at 10 documents, one might compute precision averaged over all values in the range [1 20]. This measure will smooth out any irregular effects due to the choice of exactly 10 rather than say 8 or 12 as the DCV of interest. The relationships shown above between precision, recall, DCV, and the number of relevant documents suggest that precision is a more accurate measure for low DCV and ....

Susan T. Dumais. Latent semantic indexing (lsi) and trec-2. In The Second Text REtrieval Conference (TREC-2), pages 105--115, 1993.


Information Retrieval Using A Krylov Subspace Method. - Blom, Ruhe   (Correct)

....term document matrix A to separate the global and general structure, corresponding to the large singular vectors, from local or noisy information, which hides among the small. LSI has been reported to perform quite well on both rather large and small document collections. See for example Dumais [6]. It can handle synonymy 1 (when two words mean the same) and polysemy (when one word has several distinct meanings depending on context) quite well. However LSI needs a substantial computational work to get the SVD, and there is no simple way to determine how many singular vectors that are ....

S. T. Dumais, Latent semantic indexing (LSI): TREC-3 report., in D K Harman Editor, The third Text REtrieval Conference (TREC-3), NIST Special Publication 500-225, 1995, pp. 219--230.


Bridging the Gap between Desirable Web-based.. - Byrne, Flynn.. (2001)   (Correct)

.... operators to more expressive and effective model such as the vector space model [1] (in which documents and queries are represented as vectors of weighted terms and similarity is measured as the cosine of the angle between the vectors), latent semantic indexing (another vector based approach [5], which attempts to account for latent relationships between terms, has been shown to outperform the vector space model), probabilistic models [3] which model the information retrieval process in a probabilistic framework. The application of intelligent information management in the domain of ....

S. Dumais. Latent Semantic Indexing(LSI) Routing for TREC-3. TREC 3 Proceedings, November 1994.


Learning a Monolingual Language Model from a Multilingual Text .. - Ghani, Jones (2000)   (Correct)

....models from multiple databases. They are motivated by the fact that word occurrences follow a highly skewed distribution, with a few words occurring very often, and most words occurring rarely. In the light of evidence suggesting that the important vocabulary words occur frequently in a database [5, 9, 13], it is probable that these words might be acquired by sampling. Callan et al. show that if queries can be run and documents retrieved, then it is possible to sample the contents of each database in a way that will produce an accurate language model for the database. We extend query based ....

S. Dumais. Latent semantic indexing (lsi) and trec-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115. Gaithersburg, MD, 1994.


Information Filtering and Retrieval: An Overview - Riordan, Sorensen   (Correct)

....at http://sift.stanford.edu sets. These include the system developed by Strzalkowski et al. [41] which uses natural language processing techniques, Okapi [31] which uses a probabilistic model, the WIN system [42] which utilises inference networks, and a LSI based system developed by Dumais [16]. 6 Relevance Feedback Relevance feedback has proved to be highly effective for improving information filtering and retrieval. Upon receiving returned articles, the user may provide relevance judgments for these articles. These relevance judgments may subsequently be used to guide the matching ....

S. Dumais. Latent semantic indexing(lsi) routing for trec-3. November 1994.


Using LSI for Text Classification in the Presence of.. - Zelikovitz, Hirsh (2001)   (10 citations)  (Correct)

....present and describe the results of the system on four different data sets, comparing those results to other systems that incorporate unlabeled data. We conclude with a discussion of our current and ongoing work in this area. 2. OUR APPROACH 2.1 Latent Semantic Indexing Latent Semantic Indexing [8] is based upon the assumption that there is an underlying semantic structure in textual data, and that the relationship between terms and documents can be re-described in this semantic structure form. Textual documents are represented as vectors in a vector space. Each position in a vector ....

....matrix to obtain Xn . Xn is a model of the space that was unobtainable with the training examples alone. The larger matrix contains words that did not occur in the training examples at all; it also provides us with richer and 2 This is in contrast to other uses of LSI for classification [10, 8, 9], in which one centroid vector is formed for each class, and a new example is labeled by those classes whose vector is sufficiently close to it. more reliable patterns for data in the given domain. To classify a test example incorporating the background knowledge in the decision process, the test ....

S. Dumais. Latent semantic indexing (LSI): TREC-3 report. In D. Harman, editor, The Third Text REtrieval Conference, NIST special publication 500-225, pages 219--230, 1995.


Approaches to Collection Selection and Results Merging.. - Rasolofo, Abbaci, Savoy (2001)   (4 citations)  (Correct)

....are indexed by the same search model, we may assume that scores attributed to documents are comparable across collections [12]. The document scores are then used to merge the documents from collections into a single list. This strategy is called Raw Score Merging (RSM). However, Dumais [8] mentioned that various statistics may be collection dependent (e.g. the idf value used to weight documents and/or queries) and these values may vary widely across collections. Therefore, this phenomenon may invalidate the raw score merging hypothesis. One variant of the RSM strategy is to ....

Dumais S. T.: Latent Semantic Indexing (LSI) and TREC-2. Proceedings of TREC-2, 1994, pp. 105-115.


Report on CLEF-2001 Experiments - Savoy (2001)   (1 citation)  (Correct)

....by the same or a very similar search engine and that the similarity values are therefore directly comparable [Kwok 1995] [Moffat 1995]. Such a strategy, called raw score merging, produces a final list sorted by the document score computed by each collection. However, as demonstrated by Dumais [1994], collection-dependent statistics in document or query weights may vary widely among collections, and therefore this phenomenon may invalidate the raw score merging hypothesis. To account for this fact, we might normalize the document score within each collection by dividing them by the maximum ....

Dumais, S. T. (1994). Latent semantic indexing (LSI) and TREC-2. In Proceedings of TREC'2, (pp. 105-115). Gaithersburg: NIST Publication #500-215.


Using Machine Learning To Improve Information Access - Sahami (1999)   (15 citations)  (Correct)

....this low dimensional subspace (which are each linear combinations of the dimensions in the original vector space) then define the axes of the final vector space in which documents will be represented. While LSI was originally proposed for retrieval, its use in clustering [149] and classification [47, 148] have been recently explored as well. In classification tasks, such a representation has shown some utility when used in conjunction with linear classifiers. This seem to follow from the fact that LSI, by creating feature vectors which are linear combinations of the original term space, is helping ....

Dumais, S. T. Latent semantic indexing (LSI) and TREC-2. In Proceeding of the Second Text REtrieval Conference (TREC-2), D. K. Harman, Ed. National Institute of Standards and Technology, 1993, pp. 105--115.


Report on the TREC-5 Experiment: Data Fusion and.. - Savoy, Le Calv, Vrajitoru (1988)   (8 citations)  (Correct)

....each information server applies the same (or very similar) search strategy and that the document score values are directly comparable. Such a strategy, called raw score merging, produces a final list based on the retrieval status value computed by each sub collection. However, as demonstrated by Dumais (1993), collection-dependent statistics in document or query weights may vary widely among subcollections, and therefore, this phenomenon may invalidate the raw score merging hypothesis. Finally, Callan et al. (1995) suggest a merging strategy based on the score achieved by both sub collection and ....

Dumais, S. T. (1993, November). Latent semantic indexing (LSI) and TREC-2.


The Effects of Query-Based Sampling on Automatic.. - Callan, French.. (2000)   (4 citations)  (Correct)

....corpus statistics were computed or maintained. The merging of document rankings produced from different databases is a well known difficult IR problem. Differences in corpus statistics (particularly inverse document frequency, or idf) make document scores from different databases incomparable [5, 15]. Common solutions are to maintain global corpus information, which is not always practical, or to recompute document scores at the search client, which is undesirable excess computation (although not impractical). A third choice is to estimate normalized document scores heuristically, which has ....

S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, 1994. National Institute of Standards and Technology, Special Publication 500-215.


Hierarchical Indexing and Document Matching in BoW - Geffet, Feitelson (2001)   (1 citation)  (Correct)

....[15, 19] 1.2 Related Work There are three basic approaches for textual documents processing [14]: lexical, syntactic, and semantic analysis. A number of systems using syntactic and semantic analysis have been developed and are being used for research, such as DR LINK [17], CLARIT [8], and TREC [7, 29]. However, they are typically not significantly better than the best lexical analyzers. We will discuss various lexical analyzers throughout the paper, in relation to our work. Very little has been done so far on hierarchical indexing. In general, it has been shown that hierarchical indexing ....

Dumais, Susan, T. 1995. "Latent semantic indexing (LSI): TREC-3 report." In Overview of 3rd Text Retrieval Conference (TREC-3). Donna K. Harman, ed. 1995. Washington, D. C.: NIST Special Publication.


Probabilistic Latent Semantic Indexing - Hofmann (1999)   (115 citations)  (Correct)

....benefit to detect synonyms as well as words that refer to the same topic. In many applications this has proven to result in more robust word processing. Although LSA has been applied with remarkable success in different domains including automatic indexing (Latent Semantic Indexing, LSI) [1, 3], it has a number of deficits, mainly due to its unsatisfactory statistical foundation. The primary goal of this paper is to present a novel approach to LSA and factor analysis called Probabilistic Latent Semantic Analysis (PLSA) that has a solid statistical foundation, since it is based on ....

....Depicted are curves for direct term matching, LSI, and the best performing PLSI variant. our experiments, we have actually considered linear combinations of the original similarity score (11) (weight λ) and the one derived from the latent space representation (weight 1 − λ) as suggested in [3] (cf. [16] for a more detailed empirical investigation of linear combination schemes for information retrieval systems). 5.2 Variants of Probabilistic Latent Semantic Indexing Two different schemes to exploit PLSA for indexing have been investigated: i) as a context dependent unigram model to ....

Dumais, S. T. Latent semantic indexing (lsi): Trec-3 report. In Proceedings of the Text REtrieval Conference (TREC-3) (1995), D. Harman, Ed., pp. 219--30.


Distributed Information Retrieval - Callan (2000)   (15 citations)  (Correct)

....algorithms; they usually cannot be compared directly. Solutions include computing normalized scores (Kwok et al. 1995; Viles and French, 1995; Kirsch, 1997; Xu and Callan, 1998), estimating normalized scores (Callan et al. 1995b; Lu et al. 1996a), and merging based on unnormalized scores (Dumais, 1994). The most accurate solution is to normalize the scores of documents from different databases, either by using global corpus statistics (e.g. Kwok et al. 1995; Viles and French, 1995; Xu and Callan, 1998) or by recomputing document scores at the search client (Kirsch, 1997). However, this ....

Dumais, S. T. (1994). Latent semantic indexing (LSI) and TREC-2. In Harman, D. K., editor, The Second Text REtrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD. National Institute of Standards and Technology, Special Publication 500-215.


Learning a Monolingual Language Model from a Multilingual Text .. - Ghani, Jones (2000)   (Correct)

....databases. Query based sampling is motivated by the fact that word occurrences follow a highly skewed distribution, with a few words occurring very often, and most words occurring rarely. In the light of evidence suggesting that the important vocabulary words occur frequently in a database [5, 9, 13], it is probable that these words might be acquired by sampling. Callan et al. show that if queries can be run and documents retrieved, then it is possible to sample the contents of each database in a way that will produce an accurate language model for the database. We extend query based ....

S. Dumais. Latent semantic indexing (lsi) and trec-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115. Gaithersburg, MD, 1994.


Invading the Fortress: How to Besiege Reinforced.. - Jeroen.. (2000)   (Correct)

....occur. By keeping the n first singular values and zeroing out the others, a semantic space can be defined in which to compare documents, keywords or combinations of both. In the TREC proceedings, very good results have been reported by Dumais, using LSI in combination with tf:idf weighing [7]. 4.3 Application in Complete Databases As we already indicated, the models described above have never been very popular in library production environments [18] although Blair [1] seems to suggest that a certain conservativeness of the managing staff may have something to do with it. And of ....

S. Dumais. Latent semantic indexing (lsi): Trec-3 report. In Overview of the Third Text Retrieval Conference (TREC-3), pages 219--230, Gaithersburg, Maryland, November 1994.


Dynamic Categorization: A Method For Decreasing Information Overload - Pratt (1999)   (1 citation)  (Correct)

....such as Zipf's law (van Rijsbergen 1979), entropy, or information theory (Koller and Sahami 1996). Another approach is to transform the space of features into a reduced set of features by finding relationships among terms in the collection. Both latent semantic indexing (Deerwester, et al. 1990, Dumais 1993) and linear least squares fit (Yang and Chute 1994b) are techniques that have been used for such feature space transformations. 2.4.3 Use in Matching Documents to Queries Most systems that employ clustering techniques group all documents in a ....

Dumais ST (1993). Latent Semantic Indexing (LSI) and TREC-2. The Second Text REtrieval Conference (TREC-2), :105-115.


Detection of Heterogeneities in a Multiple Text Database Environment - Meng   (4 citations)  (Correct)

....even when the same term weighting scheme is used in two local search engines, the same document may still be represented differently. As an example, consider again the case where the idf information of a term is used to compute the weight of the term in each document. It has been observed [11, 19] that the use of local idf s has the tendency to reward the rare use of a term in one local system and penalize the common use of the term in another local system. For example, consider two local systems, D1 and D2, such that D1 contains research papers in computer science and D2 contains research ....

S. Dumais. Latent Semantic Indexing (LSI) and TREC-2. TREC-2 Conference, 1994, pp. 105-115.


A Parallel Computing Approach to Creating.. - Chen, Schatz, Ng, .. (1996)   (13 citations)  (Correct)

.... of relatedness between terms using statistical co-occurrence algorithms (e.g. cosine, Jaccard, Dice similarity functions) [10] [34] [32]. Some algorithms, however, perform cluster analysis to further group terms of similar meanings [32]. Other algorithms, such as latent semantic indexing [19], perform statistical analysis to identify important semantic descriptors. Stiles [38] was one of the early researchers who reported improved retrieval performance using a method based on term association (with collections of librarian applied subject tags). Doyle [18] further argued that the ....

S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In Text Retrieval Conference (TREC-2), pages 105--115, Bethesda, MD, November 4-6 1994.


Adaptive Filtering of Multilingual Document Streams - Oard (1997)   (Correct)

.... performance of that same technique [Duma96] This result is significant because an adaptive text filtering system based on Latent Semantic Indexing achieved a selection effectiveness nearly equal to that of the best participating systems at the third Text Retrieval Conference (TREC 3) [Duma95]. But the reported retrieval effectiveness results for CL LSI were achieved with an experiment design that matched the retrieval application to the characteristics of the parallel document collection that was used to develop the translation technique. No corpus based system that we know of has yet ....

.... a technique developed by Dumais for adaptive monolingual text filtering in which Latent Semantic Indexing (LSI) is used to develop relatively short feature vectors that describe the relevant training documents, and the mean of the relevant documents feature vectors is used as the profile [Duma95]. LSI feature vectors describing newly arrived documents are then used to rank order the newly arrived documents in order of decreasing similarity with the profile using the cosine similarity measure. LSI feature vectors are constructed by counting the frequency with which each term occurs in a ....

[Article contains additional citation context not shown here]

S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 report. In D. K. Harman, editor, Overview of the Third Text REtrieval Conference, pages 219--230. NIST, Department of Commerce, November 1995. http://www-nlpir.nist.gov/TREC/.


FastMap: A Fast Algorithm for Indexing, Data-Mining and.. - Faloutsos, Lin (1995)   (40 citations)  (Correct)

.... the K-L transform suffers from two drawbacks: • it can not be applied at all on the distance case • even in the features case, it may be slow for large databases (N ≫ 1) with many attributes (n ≫ 1). The latter situation appears, e.g. in information retrieval and filtering [FD92] [Dum94], where documents correspond to V dimensional vectors (V being the vocabulary size of the collection, typically in the tens of thousands). Sub section 3.3 provides such an example. 2.3 Retrieval and Clustering As mentioned before, the retrieval engine will be a Spatial Access Method (SAM) ....

Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, March 1994. NIST. Special Publication 500-215.


Alignment of Spanish and English TREC Topic Descriptions - Oard (1997)   (1 citation)  (Correct)

.... based our work on a technique developed by Dumais for monolingual routing in which Latent Semantic Indexing (LSI) is used to develop relatively short feature vectors that describe the relevant training documents, and the mean of the relevant documents feature vectors is used as the routing query [2]. LSI feature vectors describing newly arrived documents are then used to rank order the newly arrived documents in order of decreasing similarity with the routing query using the cosine similarity measure. LSI feature vectors are constructed by counting the frequency with which each term occurs ....

S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 report. In Donna Harman, editor, Overview of the Third Text REtrieval Conference, pages 219--230. NIST, November 1994.


Automatic Discovery of Language Models for Text Databases - Callan, Connell, Du (1999)   (46 citations)  (Correct)

.... very often, and most words occurring rarely [16]. Words in the middle of the frequency range are thought to be the most useful for distinguishing among documents within a single database [10]. There is also evidence that highly frequent words may be useful for distinguishing among databases [3]. These bits of evidence suggest that the important vocabulary occurs frequently in a database, and might therefore be acquired by sampling. The resource requirements, measured in queries run and documents examined, are likely to be reasonable. The algorithm for query based sampling is simple. 1. ....

S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text REtrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, 1994. National Institute of Standards and Technology, Special Publication 500-215.


FastMap: A Fast Algorithm for Indexing, Data-Mining and.. - Faloutsos, Lin (1995)   (40 citations)  (Correct)

.... the K-L transform suffers from two drawbacks: • it can not be applied at all on the distance case • even in the features case, it may be slow for large databases (N ≫ 1) with many attributes (n ≫ 1). The latter situation appears, e.g. in information retrieval and filtering [16] [13], where documents correspond to V dimensional vectors (V being the vocabulary size of the collection, typically in the tens of thousands). In section 4 we provide experimental results on such a dataset. 2.3 Retrieval and Clustering As mentioned before, the retrieval engine will be a Spatial ....

Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, March 1994. NIST. Special Publication 500-215.


Efficiently Supporting Ad Hoc Queries in Large Datasets.. - Korn, Jagadish.. (1997)   (28 citations)  (Correct)

....Value Decomposition (SVD) of the data matrix. SVD is a popular and powerful operation, and it has been used in numerous applications, such as statistical analysis (as the driving engine behind the Principal Component Analysis [11]), text retrieval under the name of Latent Semantic Indexing [4], pattern recognition and dimensionality reduction as the Karhunen-Loeve (KL) transform [3], and face recognition [25]. SVD is particularly useful in settings that involve least squares optimization such as in linear regression, dimensionality reduction, and matrix approximation. See [24] or [15] ....

Susan T. Dumais. Latent semantic indexing (lsi) and trec-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105--115, Gaithersburg, MD, March 1994. NIST. Special publication 500-215.


Quasi-Cubes: A space-efficient way to support approximate.. - Barbara, Sullivan (1998)   (4 citations)  (Correct)

....parameters for every plane in the cube, a process that can potentially slow down the Quasi Cube construction. 2.3 Singular Value Decomposition Singular Value Decomposition (SVD) [20, 25] is a technique that has been used to approximate matrices, perform statistical analysis [15], text retrieval [8], and dimensional reduction [7]. Recently, Korn et al. [17] published a technique to use SVD to compress large matrices into a format that supports approximate queries. The formal definition of SVD follows. Given an N × M matrix of reals X, we can express it using Equation 5, where U is a ....

S. Dumais. Latent semantic indexing (LSI) and trec-2. In Proceedings of the Second Text Retrieval Conference, Gaithersburg, Maryland, March 1994.


Experimental Investigation of High Performance Cognitive and.. - Douglas Oard   (Correct)

....was the first to apply LSI to the text filtering problem [4]. He tried three cognitive filtering techniques on a small USENET news collection: closest match, average match, and clustering. Dumais has evaluated Foltz's average match technique in the first three Text REtrieval Conferences (TREC) [3]. In the average match technique, the vector representation of the information need is built as the mean of the concept vectors for relevant documents in the training set. Dumais reports that the performance of the average match technique exceeds that of a query based LSI implementation, and the ....

S. T. Dumais. Latent semantic indexing (LSI): TREC-3 report. In D. Harman, editor, Overview of the Third Text REtrieval Conference, pages 219--230. NIST, Nov. 1994. http://potomac.ncsl.nist.gov/TREC/.


Projections for Efficient Document Clustering - Schütze, Silverstein (1997)   (2 citations)  (Correct)

....truncation methods TF c to avoid confusing these methods with the LSI methods. We test three instantiations of LSI d, LSI 150, LSI 50, and LSI 20. The constant 150 is typical for the range of truncation constants in which LSI is competitive with or superior to term based similarity search [9, 11, 12]. We test the two lower truncation constants, LSI 20 and LSI 50, to explore how a substantial reduction in the constant affects time efficiency and clustering effectiveness. 3 Experimental Design We would like to compare projection techniques both according to time efficiency and according to ....

Susan T. Dumais. Latent semantic indexing (lsi): Trec3 report. pages 219--230, 1995. In [15].


A Comparison of Classifiers and Document Representations.. - Schütze, Hull, Pedersen (1995)   (11 citations)  (Correct)

....modeled by learning algorithms. Furthermore, these factors are capturing the structure of the document collection as a whole and are not tuned for particular queries. Previous work has shown that LSI is more successful when applied to a local region on a query specific basis [19]. Dumais [9] also applies LSI to the routing task, but uses the judged documents for all the queries to generate her reduced representation, a method that corresponds roughly to taking the union of the local LSI regions for each query. We compute a separate LSI representation for each query using only the ....

Susan T. Dumais. Latent semantic indexing (lsi) and trec-2. In The Second Text REtrieval Conference (TREC-2), pages 105--115, 1993.


Image Retrieval Using Image Context Vectors - Gallant, Johnston (1994)   (1 citation)  (Correct)

....vectors weighted sum of pair ICVs Table 1: Analogies Between Document Processing Image Processing. Another motivating factor is the success of context vector approaches and Latent Semantic Indexing approaches in document retrieval [Gallant, 1991b; Gallant et al., 1993; Deerwester et al., 1990; Dumais, 1993], and our long standing interest in generalizing the context vector approach to image data. II. The Image Context Vector Approach The key to our approach is the conversion of images to easily searchable Image Context Vector representations, as illustrated in Figure 1. Pairs of Features ....

Dumais S (1993) Latent semantic indexing (LSI) and TREC-2. The Second Text REtrieval Conference: Bethesda, MD, sponsored by NIST (to appear).


How Well Can Passage Meaning be Derived without.. - Landauer, Laham.. (1997)   (13 citations)  (Correct)

.... Its first success was in improving bag of words IR by allowing queries to correctly match documents of similar meaning with which they shared no words and to reject documents of the wrong meaning that did contain some query words (see Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Dumais, 1991, 1994). More recent applications have addressed LSA's ability to represent word and passage meaning more directly. For example, Landauer and Dumais (1996, 1997) found that after training on a student encyclopedia (or, more recently, a corpus of newspaper text) LSA chose the same answers on a ....

Dumais, S. T. (1994). Latent semantic indexing (LSI) and TREC-2. In D. Harman (Ed.), National Institute of Standards and Technology Text Retrieval Conference, NIST special publication.


Using Relevance to Train a Linear Mixture of Experts - Vogt, Cottrell, Belew, Bartell (1997)   (6 citations)  (Correct)

....test was indexed using SMART and both bnn and ltc weightings. Furthermore, the adhoc queries were indexed twice, once using the all of the parsed text of the topics, and once using only the DESC field (the so called "short topics"). The LSI expert mimicked the approach used by Dumais in TREC 3 [Dumais, 1995]. Specifically, since doing a Singular Value Decomposition on the full 104,113 × 74,520 term by document matrix from the ltc expert would have taken much too long, it was subsampled. Only those terms occurring in 5 or more documents were used, and only a randomly selected subset of about ....

.... representation for all the WSJ documents, and the corresponding representations for the queries and FBIS documents were obtained by first removing any terms that were not in the reduced WSJ vocabulary, and then projecting the resulting 26,395 vector down to 300 dimensions as described in [Dumais, 1995]. Once document and query vectors were fixed, relevance scores were computed using the standard inner product rule. 2.3 Combining Scores Bartell showed that one very effective measure of how well an IR system performs is one which compares the rank ordering produced by the system to that ....

Dumais, S. T. (1995). Latent semantic indexing (LSI): TREC-3 report. In [Harman, 1995a].


Noise Reduction in a Statistical Approach to Text Categorization - Yang (1995)   (16 citations)  (Correct)

....over baseline text matching, since there would be no reason to use LSI otherwise, given that the additional cost of the SVD computation is far more expensive than baseline text matching. Evaluation results, however, have shown no reliable improvement of LSI over baseline text matching [20] [21] [22]. My hypothesis about truncated SVD is different from the hypothesis in LSI. I do not claim an improvement of synonym representation in a document matrix by using such an approach. Whether such a hypothesis is true or false is not crucial for LLSF because the document to categories mapping is based ....

Dumais S. (1994) Latent Semantic Indexing (LSI) and TREC-2. In: DK Harman, Ed. The Second Text REtrieval Conference (TREC-2):105--116.


Alignment of Spanish and English TREC Topic Descriptions - Douglas Oard (1997)   (1 citation)  (Correct)

.... our work on a technique developed by Dumais for monolingual routing in which Latent Semantic Indexing (LSI) is used to develop relatively short feature vectors that describe the relevant training documents, and the mean of the relevant documents' feature vectors is used as the routing query [2]. LSI feature vectors describing newly arrived documents are then used to rank the newly arrived documents in order of decreasing similarity with the routing query using the cosine similarity measure. LSI feature vectors are constructed by counting the frequency with which each term occurs ....

S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 report. In Donna Harman, editor, Overview of the Third Text REtrieval Conference, pages 219--230. NIST, November 1994. http://potomac.ncsl.nist.gov/TREC/.
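
The routing scheme summarized in the excerpt above (mean of the relevant documents' feature vectors as the routing query, then cosine ranking of new documents) can be sketched as follows. This is an illustrative sketch only: the function names and the two-dimensional toy vectors are assumptions, and the feature vectors are taken as already computed by LSI.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_routing_query(relevant, incoming):
    """Rank incoming documents by decreasing cosine with the routing query."""
    query = mean_vector(relevant)
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in incoming.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy feature vectors (hypothetical values).
relevant = [[1.0, 0.0], [1.0, 1.0]]
incoming = {"a": [0.0, 1.0], "b": [2.0, 1.0], "c": [1.0, -1.0]}
ranking = rank_by_routing_query(relevant, incoming)
```

Document "b" points in the same direction as the routing query [1.0, 0.5], so it ranks first regardless of its length, which is the property that distinguishes cosine ranking from a raw inner product.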


Cross-Language Text Retrieval Research in the USA - Oard (1997)   (4 citations)  (Correct)

.... performance of that same technique [9]. This result is particularly significant because a monolingual text retrieval system based on Latent Semantic Indexing has achieved effectiveness measures nearly equal to those of the best participating systems at the third Text Retrieval Conference [8]. In CL-LSI, a set of representative bilingual documents is first used to form a training collection by adjoining a translation of each document to the document itself. A rank revealing matrix decomposition (the singular value decomposition) is then used to compute a mapping from sparse term based ....

S. T. Dumais. Latent Semantic Indexing (LSI): TREC-3 report. In Donna Harman, editor, Overview of the Third Text REtrieval Conference, pages 219--230. NIST, November 1994. http://www-nlpir.nist.gov/TREC/.


The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)   (17 citations)  (Correct)

No context found.

Dumais, S. (1994). Latent semantic indexing (lsi) and trec-2. Technical Report TM-ARH-023878, Bellcore.


Latent Semantic Analysis for German Literature Investigation - Preslav Nakov (2001)   (Correct)

No context found.

Dumais, S. T. (1994) Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215 (pp. 105-116).


The Infocious Web Search Engine: Improving Web Searching.. - Ntoulas, Chao, Cho (2005)   (Correct)

No context found.

S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In The Second Text Retrieval Conference (TREC-2), 1994.


A Framework for Understanding LSI Performance - April Kontostathis And (2003)   (Correct)

No context found.

Dumais, S.T. (1994). Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116.


Analysis of the values in the LSI Term-Term Matrix - Mill, Kontostathis   (Correct)

No context found.

Dumais, S. T. 1994. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116


Assessing the Impact of Sparsification on LSI Performance - Kontostathis, Pottenger.. (2004)   (Correct)

No context found.

Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. Harman, editor, The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pages 105--116, 1994.


Identification of Critical Values in Latent Semantic.. - Kontostathis, Pottenger, ..   (Correct)

No context found.

Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. Harman, editor, The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pages 105--116, 1994.


Identification of Critical Values in Latent Semantic.. - Kontostathis, Pottenger, .. (2005)   (Correct)

No context found.

Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. Harman, editor, The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pages 105--116, 1994.


Detecting Patterns in the LSI Term-Term Matrix - Kontostathis, Pottenger (2002)   (1 citation)  (Correct)

No context found.

Dumais, S.T. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116. 1994.


Assessing the Impact of Sparsification on LSI Performance - Kontostathis, Pottenger.. (2004)   (Correct)

No context found.

Dumais, S. T. 1994. Latent Semantic Indexing (LSI) and TREC-2. In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116


Automatic Indexing: An Approach Using an Index Term Corpus and.. - Lahtinen (2000)   (3 citations)  (Correct)

No context found.

Dumais, Susan T. 1995. Latent Semantic Indexing (LSI): TREC-3 Report. In Harman, Donna K. (editor). Overview of the Third Text REtrieval Conference (TREC-3). NIST Special Publication 500-225, National Institute of Standards and Technology, Gaithersburg, MD, (http://trec.nist.gov/pubs.html), pp. 219-230.


Building Efficient and Effective Metasearch Engines - Meng, Yu, Liu (2002)   (11 citations)  (Correct)

No context found.

S. Dumais. Latent Semantic Indexing (LSI) and TREC-2. TREC-2 Conference, Gaithersburg, 1994, pp. 105-115.


Towards a Highly-Scalable Metasearch Engine - Meng, Yu, Wu   (Correct)

No context found.

S. Dumais. Latent Semantic Indexing (LSI) and TREC-2. TREC-2 Conference, 1994, pp. 105-115.


Challenges and Solutions for Building an Efficient and . . . - Meng, al. (1999)   (Correct)

No context found.

S. Dumais. Latent Semantic Indexing (LSI) and TREC-2. TREC-2 Conference, Gaithersburg, 1994, pp. 105-115.

CiteSeer.IST - Copyright Penn State and NEC