Indexing by latent semantic analysis
- JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE
, 1990
Cited by 2168 (30 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
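The pipeline this abstract describes (truncated SVD of a term-by-document matrix, queries folded in as pseudo-documents, retrieval by cosine threshold) can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation: the toy vocabulary, the counts, the choice of k = 2, the 0.5 threshold, and the function names are all assumptions, and scaling both query and documents by the singular values before taking cosines is one common convention.

```python
import numpy as np

# Toy term-by-document count matrix (rows: terms, columns: documents).
# Vocabulary and counts are invented for illustration.
terms = ["star", "galaxy", "telescope", "query", "index"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest factors, A ~ U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T   # rows of V_k are document vectors

def fold_in(q, U_k, s_k):
    """Map a term-weight query vector to a pseudo-document: q^T U_k diag(1/s_k)."""
    return (q @ U_k) / s_k

def cosines(q_hat, V_k, s_k):
    """Cosine between the query and every document, both scaled by s_k."""
    docs = V_k * s_k
    query = q_hat * s_k
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    return (docs @ query) / np.where(denom == 0, 1.0, denom)

U_k, s_k, V_k = lsi_fit(A, k=2)               # the abstract uses ca. 100 factors
q = np.array([0, 1, 0, 0, 0], dtype=float)    # query containing only "galaxy"
scores = cosines(fold_in(q, U_k, s_k), V_k, s_k)
threshold = 0.5                               # assumed cut-off
retrieved = [int(j) for j in np.flatnonzero(scores > threshold)]
print(retrieved)
```

In this toy collection the first two documents share the astronomy terms, so a query on "galaxy" alone retrieves both of them and neither of the other two.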
Enhancing Performance in Latent Semantic Indexing (LSI) Retrieval
, 1992
Cited by 37 (0 self)
We have previously described an extension of the vector retrieval method called "Latent Semantic Indexing" (LSI) (Deerwester, et al., 1990; Dumais, et al., 1988; Furnas, et al., 1988). The LSI approach partially overcomes the problem of variability in human word choice by automatically organizing objects into a "semantic" structure more appropriate for information retrieval. This is done by modeling the implicit higher-order structure in the association of terms with objects. Initial tests find this completely automatic method to be a promising way to improve users' access to many kinds of textual materials or to objects for which textual descriptions are available. This paper describes some enhancements to the basic LSI method, including differential term weighting and relevance feedback. Appropriate term weighting improves performance by an average of 40%, and feedback based on 3 relevant documents improves performance by an average of 67%. (Draft, September 1, 1992.)
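The two enhancements named in this abstract can be illustrated with a short sketch. The abstract does not say which weighting scheme or feedback rule is used, so both choices here are assumptions: log-entropy weighting stands in for "differential term weighting", and replacing the query by the centroid of the relevant document vectors stands in for "relevance feedback". The function names are hypothetical.

```python
import math

def log_entropy_weights(counts):
    """Weight a term-by-document count table.

    counts: list of rows (one per term), each a list of per-document counts.
    Local weight: log(1 + tf). Global weight: 1 + sum_j p_j log p_j / log(n_docs),
    where p_j = tf_j / (total count of the term). Assumes n_docs > 1.
    """
    n_docs = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        entropy = 0.0
        for tf in row:
            if tf > 0:
                p = tf / total
                entropy += p * math.log(p)
        g = 1.0 + entropy / math.log(n_docs)   # 0 for evenly spread terms, 1 for concentrated ones
        weighted.append([g * math.log(1.0 + tf) for tf in row])
    return weighted

def feedback_query(doc_vectors, relevant_ids):
    """Replace the query by the centroid of the documents judged relevant."""
    dim = len(doc_vectors[0])
    return [
        sum(doc_vectors[j][d] for j in relevant_ids) / len(relevant_ids)
        for d in range(dim)
    ]

counts = [
    [1, 1, 1, 1],   # term spread evenly over all documents
    [3, 0, 0, 0],   # term concentrated in one document
]
weights = log_entropy_weights(counts)
new_query = feedback_query([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], relevant_ids=[0, 2])
```

Under this weighting a term that occurs equally often in every document contributes nothing, while a term confined to one document keeps its full local weight.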
Neural Networks and Information Extraction in Astronomical Information Retrieval
- Vistas in Astronomy
, 1996
Cited by 5 (0 self)
Introduction. We first examine a Kohonen self-organizing feature map (SOFM) interface to large document collections. Low-dimensional representations of documents or other related objects have been used for a long time. Factor space (Ossorio, 1966) and latent semantic indexing (Deerwester et al., 1990) are two examples. The Kohonen map is a method with similar objectives. It offers the advantage of presenting results in a display-friendly manner (a point made in Lin et al., 1991, and Murtagh and Hernández-Pajares, 1995). Secondly, we look at perspectives for information extraction in astronomy. Astronomy offers a relatively well-demarcated domain and set of themes which have already been comprehensively indexed and cross-linked, albeit manually. Compared to other domains (Lewis and Jones, 1995), astronomical texts offer particular challenges, but user needs are often more clear-cut. Thus astronomical object names, wavelength ranges, instruments and observing sites are ...
Information Retrieval on the Web: Selected Topics
- IBM research, Tokyo Research Laboratory, IBM
, 1999
Cited by 4 (0 self)
In this paper we review studies on the growth of the Internet and technologies which are useful for information search and retrieval on the Web. In the first section, we present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth during the coming decade. And Internet users are increasingly using search engines and search services to find specific information of interest. However, users are not satisfied with the performance of the current generation of search engines; the slow speed of retrieval, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. The main body of our paper focuses on linear algebraic models and techniques for solving these problems. Keywords: clustering, indexing, information retrieval, Internet, late...
Information Retrieval and Ranking on the Web: Benchmarking Studies II
- IBM TRL Research Report, RT0298
, 1999
Cited by 2 (1 self)
The exponential growth of information available on the World Wide Web has been documented in numerous studies. The studies also indicate that Internet users are turning to search engines and search services in increasing numbers to find the information they are seeking, but they are not necessarily satisfied with their performance. Specific problems which have been cited in user surveys include the speed of transmission and retrieval of information and the format for presenting the results from searches. In this report, we describe some of the components of a new Web-based search and retrieval system prototype, which is part of a larger information outlining and visualization system for Web documents. Our system is based on a modified version of latent semantic indexing, the output from which is used to rank the relevance of Web pages for an input query. We compare the speed of retrieval and ranking when two different methods (subspace iteration and Lanczos followed by Sturm sequ...
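Of the two methods this abstract compares, subspace iteration is simple enough to sketch. The code below is a generic textbook version for the k largest singular values of a term-by-document matrix, not the report's implementation: the function name, the iteration count, the random start, and the small test matrix are all assumptions. It forms A^T A explicitly, which is only reasonable for small dense examples.

```python
import numpy as np

def subspace_iteration(A, k, iters=200, seed=0):
    """Approximate the k largest singular values of A and an orthonormal
    basis Q of the matching right singular subspace."""
    rng = np.random.default_rng(seed)
    G = A.T @ A                                   # Gram matrix; eigenvalues are sigma^2
    Q, _ = np.linalg.qr(rng.standard_normal((G.shape[0], k)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(G @ Q)                # multiply, then re-orthonormalize
    # Rayleigh-Ritz step: eigenvalues of the small k-by-k projected matrix.
    evals = np.linalg.eigvalsh(Q.T @ G @ Q)[::-1]
    return np.sqrt(np.maximum(evals, 0.0)), Q

A = np.array([
    [4, 1, 0],
    [1, 3, 1],
    [0, 1, 2],
    [1, 0, 1],
], dtype=float)
sigma, Q = subspace_iteration(A, k=2)
reference = np.linalg.svd(A, compute_uv=False)[:2]
print(sigma, reference)
```

The comparison against `np.linalg.svd` is only a sanity check on the small example; the appeal of iterative methods lies in large sparse matrices, where a full decomposition is too expensive.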
Multivariate Data Analysis Applied to Bibliographical Information Retrieval: SIMBAD Quality-Control
- Vistas in Astronomy
, 1995
Cited by 1 (1 self)
... this paper, we describe the second method. (Preprint submitted to Elsevier Science, 5 July.) 2. Parametrization of the Bibliographical Information. 2.1 Method. The articles in major astronomical journals are an important source of useful information to check the coherence of the SIMBAD data. We wanted to show that the bibliographical reference information concerning one object is quantifiable and can be compared to other information in the database to detect anomalies. One way to analyse the content of a document is to use multivariate data analysis. Factor-space (Ossorio, 1965) is an n-dimensional relevancy space. This space is described by n axes representing a set of n subject matter headings. The words and phrases can be used to scale the axes, and each document is then a vector average of the terms within it. These relevancy scores may be obtained either directly, using human judgement (Ossorio, 1965), or via automated evaluation of classified collections of documents using statistical analysis (Kurtz, 1992). Presently, we simplify our Factor-space by using keywords instead of subject matter headings (Lesteven, 1994). Our research is presently based on SIMBAD and on the NASA-STI bibliographic database. In the NASA-STI database, references are described by the title, the authors, the journal, a list of keywords, the publication year, an abstract and other information. For each category of SIMBAD data (object type, spectral type, IRAS flux, ...) we can build a specialized "bibliographical space" where the variables are the keywords associated with the selected references and the individuals are the different values of the checked data. To extract the information, we used a Principal Component Analysis (PCA). PCA is the simplest of the multivariate methods...
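The Factor-space representation described in this abstract, where each document is the vector average of the terms within it, can be illustrated in a few lines. The keywords, the three axes, and the relevancy scores below are invented for the example, and `document_vector` is a hypothetical helper; how unknown words are handled (here they are skipped) is also an assumption.

```python
# Hypothetical relevancy scores of keywords on n = 3 subject axes.
term_scores = {
    "quasar":   [0.9, 0.1, 0.0],
    "redshift": [0.7, 0.3, 0.0],
    "ccd":      [0.0, 0.2, 0.8],
}

def document_vector(words, term_scores):
    """Place a document in the n-dimensional space as the average of the
    vectors of its known keywords; return None if no keyword is known."""
    known = [term_scores[w] for w in words if w in term_scores]
    if not known:
        return None
    n = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(n)]

doc = document_vector(["quasar", "ccd", "unlisted"], term_scores)
print(doc)
```

A document mentioning both "quasar" and "ccd" lands between the two keyword vectors, which is what makes documents and keywords directly comparable in the same space.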
Free Text Information Retrieval: An Assessment of Publicly Available Unix-Based Systems
, 1994
Introduction. To support information retrieval in astronomy, we believe that it is advantageous if the retrieval engine is astronomically knowledgeable. In other words, the user ought not to have to fight against the limitations of the indexing or querying system. Consider, for example, a retrieval system which rejects a partially numeric object name (e.g. 47 Tuc) on the grounds of supporting character-text only. Organisationally, we see this work as: (i) selecting important desiderata, and (ii) finding tools which support these requirements. As such desiderata, we have chosen the following issues: (i) Is there a comprehensive astronomical vocabulary available, which ought to be in large measure supported by a retrieval system? We use the IAU Thesaurus to this end. (ii) The next phase will involve support of astronomical object names: what tools are available to support free text querying based on object names? (iii) A third phase will seek to go beyond that: to support add
Indexing by Latent Semantic Analysis
- Journal of the American Society for Information Science
, 2001
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.

