8 citations found.
Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in National Institute of Standards and Technology Text Retrieval Conference, D. Harman, Ed.: NIST, 1994.


This paper is cited in the following contexts:
Identification of High-Level Concept Clones in Source Code - Marcus, Maletic (2001) (2 citations)

....engineering, or program understanding. This paper is not concerned in detail with the removal of the identified clones. Ideally, these high-level concept clones would be combined into one or two modules or classes during reengineering. Existing research describes several methods for clone removal [4, 5, 12]. 6. Combining multiple detection methods. We are currently investigating the combination of methods through two different approaches. One approach is to apply two or more methods to the same code and then merge the results. Three methods that are based on structural information [5, 20, 31] ....
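The excerpt does not say how the results of several detection methods are merged. One simple possibility, assumed here purely for illustration, is to pool the clone pairs each method reports and count how many methods agree on each pair: `min_votes=1` gives the union of all reports, and `min_votes` equal to the number of methods keeps only the pairs every method found. The helper name `merge_clone_reports` and the code-unit identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable


def merge_clone_reports(reports: Iterable[Iterable[tuple[str, str]]],
                        min_votes: int = 1) -> dict[frozenset, int]:
    """Merge clone pairs reported by several detection methods (hypothetical helper).

    Each report is a collection of (unit_a, unit_b) pairs naming code units that one
    method judged to be clones. Pairs are unordered, and each method votes at most
    once per pair. Returns every pair with at least `min_votes` votes.
    """
    votes: Counter = Counter()
    for report in reports:
        # Normalise to unordered pairs (a set dedupes within one method) and drop self-pairs.
        votes.update({frozenset(pair) for pair in report if pair[0] != pair[1]})
    return {pair: n for pair, n in votes.items() if n >= min_votes}


# Hypothetical outputs of a structural detector and a semantic (e.g. LSI-based) detector.
structural = [("parser.c:scan", "lexer.c:scan"), ("io.c:read_block", "net.c:recv_block")]
semantic = [("lexer.c:scan", "parser.c:scan"), ("ui.c:draw", "gfx.c:paint")]

union = merge_clone_reports([structural, semantic])                # found by any method
agreed = merge_clone_reports([structural, semantic], min_votes=2)  # found by both methods
print(len(union), len(agreed))  # prints: 3 1
```

Voting is only one design choice; a real combination might instead weight methods by their measured precision.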

Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in Proceedings of The Second Text Retrieval Conference (TREC-2), March 1994, pp. 105-115.


Detecting Patterns in the LSI Term-Term Matrix - Kontostathis, Pottenger (2002) (1 citation)

....correlations between the values in the term-term matrix and the higher-order co-occurrence implicit in the data. This work also presents preliminary work toward detecting patterns in the data. LSI has been applied to a wide variety of learning tasks, such as classification [21] and filtering [8, 9]. LSI is a vector space approach for modeling documents, and many have claimed that the technique brings out the latent semantics in a collection of documents [5, 7]. LSI is based on a well-known mathematical technique called Singular Value Decomposition (SVD). The algebraic foundation for Latent ....
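Since LSI rests on SVD, a minimal NumPy sketch may help make the "latent semantics" claim concrete. It assumes a raw term-count matrix (terms as rows, documents as columns) with no term weighting, keeps the top k singular triplets, and folds a query into the reduced space as U_k^T q. The function names `lsi_fit` and `lsi_query` and the toy vocabulary are hypothetical; this is an illustration of rank-k LSI in general, not a reconstruction of the TREC-2 system.

```python
import numpy as np


def lsi_fit(term_doc: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Rank-k LSI sketch: truncated SVD of a terms x documents matrix."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    # Document j in the latent space is U_k^T a_j, i.e. column j of diag(s[:k]) @ Vt[:k].
    doc_vecs = (s[:k, None] * Vt[:k, :]).T  # shape: (n_docs, k)
    return U_k, doc_vecs


def lsi_query(U_k: np.ndarray, doc_vecs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between a term-count query vector and every document."""
    q_hat = U_k.T @ query  # fold the query into the same k-dimensional space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    # Avoid division by zero for zero vectors (e.g. a query with no known terms).
    return (doc_vecs @ q_hat) / np.where(norms == 0, 1.0, norms)


# Toy term-document matrix; rows: car, automobile, engine, flower, petal.
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)
U_k, docs = lsi_fit(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"
print(np.round(lsi_query(U_k, docs, query), 3))
```

In this toy matrix, document 1 ("automobile", "engine") scores about 1.0 against the query "car" even though its raw term overlap with the query is zero, while the "flower"/"petal" document scores about 0. The rank k is a free parameter here.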

....co-occurrence can be extended to third-, fourth-, or nth-order co-occurrence. This work provides the theoretical foundation for understanding the use of limited transitivity in LSI. Eventually, the patterns we are detecting will be used to approximate the SVD algorithm, which is resource-intensive [7, 8, 9], at much lower cost. In [13] we describe an unsupervised learning algorithm that develops clusters of terms by applying an equivalence relation to the LSI term-term matrix. The ultimate goal of the current line of work is a theoretically sound, effective and efficient unsupervised clustering ....
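Second-order co-occurrence can be illustrated with plain sets: two terms are linked at second order when they never appear in the same document but each appears with some common third term. For a binary term-document matrix A, these are exactly the pairs whose entry is zero in A A^T but nonzero in (A A^T)^2. The sketch below uses hypothetical function names and toy documents; it is a simplified illustration of the idea, not Kontostathis and Pottenger's pattern-detection method. Repeating the neighbour expansion would give third- and higher-order links.

```python
from itertools import combinations


def first_order(docs: list[set[str]]) -> dict[str, set[str]]:
    """Map each term to the terms it appears with in at least one document."""
    neighbors: dict[str, set[str]] = {}
    for doc in docs:
        for term in doc:
            neighbors.setdefault(term, set())
        for a, b in combinations(sorted(doc), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def second_order(neighbors: dict[str, set[str]]) -> set[frozenset]:
    """Pairs of terms that never co-occur directly but share a co-occurring term."""
    pairs: set[frozenset] = set()
    for hub, linked in neighbors.items():
        # Any two terms linked through the same hub are second-order neighbours,
        # unless they already co-occur directly.
        for a, b in combinations(sorted(linked), 2):
            if b not in neighbors[a]:
                pairs.add(frozenset((a, b)))
    return pairs


docs = [{"human", "interface"}, {"interface", "user"}, {"user", "system"}]
print(sorted(sorted(p) for p in second_order(first_order(docs))))
```

Here "human" and "user" never share a document, but both co-occur with "interface", so they are linked at second order.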

Dumais, S. T. (1994), "Latent Semantic Indexing (LSI) and TREC-2." In: D. Harman (Ed.), The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pp. 105-116.


Supporting Program Comprehension Using Semantic and.. - Maletic, Marcus (2001) (3 citations)

....the word-by-context matrix is very large and (quite often) sparse. SVD reduces the number of dimensions without great loss of descriptiveness. Singular value decomposition is the underlying operation in a number of applications, including statistical principal component analysis [22], text retrieval [6, 11], pattern recognition and dimensionality reduction [10], and natural language understanding [25]. For complete details of Latent Semantic Indexing, see [9]. In the resulting profile, each word is represented as a vector in a d-dimensional space. Performance depends strongly on the choice of ....
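The phrase "without great loss of descriptiveness" has a precise counterpart: by the Eckart-Young theorem, keeping the d largest singular values gives the best rank-d approximation in the Frobenius norm, and its error is the root-sum-square of the discarded singular values. The NumPy sketch below measures that loss on a hypothetical random sparse count matrix and returns one common choice of d-dimensional word vectors (rows of U_d scaled by the singular values). The function name `truncate` and the data are illustrative assumptions.

```python
import numpy as np


def truncate(matrix: np.ndarray, d: int) -> tuple[np.ndarray, float, np.ndarray]:
    """Best rank-d approximation, its relative Frobenius error, and d-dim word vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    approx = (U[:, :d] * s[:d]) @ Vt[:d, :]
    # Eckart-Young: ||M - M_d||_F = sqrt(sum of squared discarded singular values).
    rel_error = float(np.sqrt(np.sum(s[d:] ** 2) / np.sum(s ** 2)))
    # One row per word (row of the matrix), expressed in d dimensions.
    word_vecs = U[:, :d] * s[:d]
    return approx, rel_error, word_vecs


rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(50, 20)).astype(float)  # hypothetical word-by-context counts
for d in (2, 5, 10, 20):
    _, err, vecs = truncate(counts, d)
    print(d, vecs.shape, round(err, 3))
```

The relative error can only shrink as d grows and reaches zero at full rank; how small d can be before the loss matters depends on the data.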

Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in Proceedings of The Second Text Retrieval Conference (TREC-2), March 1994, pp. 105-115.


Distributed Processing of Similarity Queries - Papadopoulos, Manolopoulos (2001)

.... transformed (e.g., by means of the Fourier transform), and then some components (those with the highest energy) are used to index the underlying set [2, 10]; documents in a text database can be represented as vectors (e.g., using the Latent Semantic Indexing technique) in a high-dimensional space [8]; records in traditional alphanumeric databases can be viewed as points in a high-dimensional space, assuming one dimension for each record attribute [16]. These applications require databases that are huge in volume. Often, multiple computer systems are used in order to support efficient and ....
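The first indexing idea, transforming each sequence and keeping only its highest-energy Fourier components, can be sketched in plain Python. Assumptions: a naive O(n^2) DFT scaled by 1/sqrt(n), which makes it unitary so that Parseval's theorem preserves Euclidean distance, and one shared set of retained coefficient indices for all sequences being compared. The function names are hypothetical. Because the distance over a subset of coefficients can never exceed the full distance, it acts as a lower bound for filtering candidates in a similarity search.

```python
import cmath
import math


def dft(x: list[float]) -> list[complex]:
    """Naive unitary DFT: X_f = (1/sqrt(n)) * sum_t x_t * exp(-2*pi*i*f*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def top_energy_indices(x: list[float], m: int) -> list[int]:
    """Indices of the m DFT coefficients of x with the largest energy |X_f|^2."""
    coeffs = dft(x)
    ranked = sorted(range(len(coeffs)), key=lambda f: abs(coeffs[f]), reverse=True)
    return sorted(ranked[:m])


def feature_distance(x: list[float], y: list[float], indices: list[int]) -> float:
    """Euclidean distance restricted to the chosen coefficients (a lower bound)."""
    X, Y = dft(x), dft(y)
    return math.sqrt(sum(abs(X[f] - Y[f]) ** 2 for f in indices))


def euclidean(x: list[float], y: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


n = 16
x = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
y = [v + 0.5 * math.cos(2 * math.pi * 5 * t / n) for t, v in enumerate(x)]
idx = top_energy_indices(x, 2)  # for a real signal, energy comes in pairs f and n - f
print(idx, feature_distance(x, y, idx), euclidean(x, y))
```

For the sine with three cycles, all of the energy sits in coefficients 3 and 13, so those two numbers alone summarize the sequence.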

S. T. Dumais, "Latent semantic indexing (LSI) and TREC-2," in The 2nd Text Retrieval Conference, D. K. Harman (Ed.), MD, March 1994, pp. 105-115.


Fax: An Alternative to SGML - Church, Gale, Helfman, Lewis

....and the system would fax back some relevant documents. In this way, a user could call the home office from any public fax machine ....

Footnote 1: The OCR errors slow the indexing process considerably, since they make the vocabulary too large to fit in main memory. Our data has a huge vocabulary (3 million words), most of which are OCR errors. By comparison, the TREC text collection (Dumais, 1994) has a much smaller vocabulary (1 million words). The difference in vocabulary sizes is especially significant given that TREC is considerably larger (2 gigabytes) than our OCR output (1 gigabyte).

Dumais, S. (1994) "Latent Semantic Indexing (LSI) and TREC-2," in Harman, D. (ed.) The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology, Gaithersburg, MD, USA.


Support For Software Maintenance Using Latent Semantic Analysis - Maletic, Marcus (2000)

....as a vector in this space. The similarity of any two words, any two passages, or any word and any text passage is computed by measures on their vectors. Often the cosine of the angle between the vectors in the semantic space is used as the degree of qualitative similarity of meaning [3]. The length of vectors is also useful as a measure. One of the criticisms of this method, when applied to natural language texts, is that it does not make use of word order, syntactic relations, or morphology. However, very good representations and results are derived without this information [1] ....
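The two measures mentioned, the cosine of the angle between vectors and vector length, are short enough to write out directly. The sketch below also represents a passage as the sum of its word vectors, which is assumed here as a common LSA convention rather than taken from the excerpt. The three-dimensional vectors and the helper names are hypothetical.

```python
import math


def length(v: list[float]) -> float:
    """Euclidean length (norm) of a vector."""
    return math.sqrt(sum(a * a for a in v))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; insensitive to vector length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    nu, nv = length(u), length(v)
    if nu == 0 or nv == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def passage_vector(word_vectors: list[list[float]]) -> list[float]:
    """ASSUMPTION: a passage is represented as the component-wise sum of its word vectors."""
    return [sum(components) for components in zip(*word_vectors)]


# Hypothetical 3-dimensional semantic-space vectors.
space = {
    "file": [0.9, 0.1, 0.0],
    "open": [0.7, 0.3, 0.1],
    "pixel": [0.0, 0.2, 0.9],
}
passage = passage_vector([space["open"], space["file"]])
print(cosine(space["file"], space["open"]), cosine(space["file"], space["pixel"]))
print(cosine(passage, space["pixel"]), length(passage))
```

Because cosine ignores length, a long passage and a single word can still be compared directly; length is then available as a separate signal.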

Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in Proceedings of The Second Text Retrieval Conference (TREC-2), March 1994, pp. 105-115.


Using Latent Semantic Analysis to Identify Similarities in.. - Maletic, Marcus (2000)

....the word-by-context matrix is very large and (quite often) sparse. SVD reduces the number of dimensions without great loss of descriptiveness. Singular value decomposition is the underlying operation in a number of applications, including statistical principal component analysis [10], text retrieval [2, 6], pattern recognition and dimensionality reduction [5], and natural language understanding [11, 12]. Latent Semantic Analysis comprises four steps [4, 12]: 1. A large body of text is represented as an occurrence matrix (i × j) in which rows stand for individual word types, columns for ....
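Step 1, building the i × j occurrence matrix, can be sketched as follows. Assumptions: each context is a string, and a "word" is a maximal run of ASCII letters, lower-cased; for source code, splitting identifiers such as `readFile` would probably be needed as well. The function name is hypothetical.

```python
import re
from collections import Counter


def occurrence_matrix(passages: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts word i in passage j."""
    # ASSUMPTION: a "word" is a maximal run of ASCII letters, compared case-insensitively.
    counts = [Counter(re.findall(r"[a-z]+", p.lower())) for p in passages]
    vocab = sorted(set().union(*counts))  # rows: word types, sorted for a stable order
    matrix = [[c[w] for c in counts] for w in vocab]  # columns: passages (contexts)
    return vocab, matrix


vocab, matrix = occurrence_matrix(["open the file", "close file; close socket"])
for word, row in zip(vocab, matrix):
    print(f"{word:8} {row}")
```

Most entries of a realistic matrix are zero, which is why the excerpt calls it sparse; a sparse representation would be preferable at scale.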

....

$$
\frac{\log\left(\mathrm{freq}_{ij} + 1\right)}{-\sum_{j}\left(\frac{\mathrm{freq}_{ij}}{\sum_{j}\mathrm{freq}_{ij}}\right)\log\left(\frac{\mathrm{freq}_{ij}}{\sum_{j}\mathrm{freq}_{ij}}\right)},
$$

a measure of the first-order association of a word and its context. 3. The matrix is then subjected to Singular Value Decomposition (SVD) [6, 10, 17, 20]:

$$
[ij] = [ik]\,[kk]\,[jk]^{T},
$$

where $[ij]$ is the occurrence matrix, $[ik]$ and $[jk]$ have orthonormal columns, $[kk]$ is a diagonal matrix of singular values, and $k \le \max(i,j)$. In SVD, a rectangular matrix is decomposed into the product of three other matrices. One component matrix describes the original ....
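The cell transformation in step 2 is a log-entropy weighting: each count freq_ij becomes log(freq_ij + 1) divided by the entropy -Σ_j p_ij log p_ij of word i's distribution over contexts, where p_ij = freq_ij / Σ_j freq_ij. The Python sketch below applies that rule row by row. The formula is undefined when a word occurs in only one context (entropy 0); as a purely illustrative assumption, this sketch then leaves the local weight unscaled. The function name and counts are hypothetical.

```python
import math


def log_entropy(matrix: list[list[float]]) -> list[list[float]]:
    """Apply log(freq_ij + 1) / entropy_i to each row of a word-by-context count matrix."""
    weighted = []
    for row in matrix:
        total = sum(row)
        # entropy_i = -sum_j p_ij * log(p_ij), p_ij = freq_ij / total;
        # zero counts contribute nothing (the usual 0 * log 0 = 0 convention).
        entropy = -sum((f / total) * math.log(f / total) for f in row if f > 0) if total else 0.0
        # ASSUMPTION: entropy_i == 0 makes the formula undefined; fall back to dividing by 1.
        denom = entropy if entropy > 0 else 1.0
        weighted.append([math.log(f + 1) / denom for f in row])
    return weighted


counts = [[1, 1, 0],  # hypothetical word spread evenly over two contexts
          [4, 0, 0],  # word concentrated in a single context
          [2, 1, 1]]
for row in log_entropy(counts):
    print([round(w, 3) for w in row])
```

Because the logarithm appears in both numerator and denominator, the weights do not depend on the logarithm's base, except in the fallback case.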

Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in Proceedings of The 2nd Text Retrieval Conference (TREC-2), March 1994, pp. 105-115.


Using Latent Semantic Analysis To Aid Speech Recognition And .. - Lee McCauley

No context found.

Dumais, S. T., "Latent Semantic Indexing (LSI) and TREC-2," in National Institute of Standards and Technology Text Retrieval Conference, D. Harman, Ed.: NIST, 1994.


CiteSeer.IST - Copyright Penn State and NEC