An Association Thesaurus for Information Retrieval
- In RIAO 94 Conference Proceedings, 1994
Cited by 132 (10 self)
Although commonly used in both commercial and experimental information retrieval systems, thesauri have not demonstrated consistent benefits for retrieval performance, and it is difficult to construct a thesaurus automatically for large text databases. In this paper, an approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-text document collections. The association thesaurus can be accessed through natural language queries in INQUERY, an information retrieval system based on the probabilistic inference network. Experiments are conducted in INQUERY to evaluate different types of association thesauri, and thesauri constructed for a variety of collections.
1 Introduction
A thesaurus is a set of items (phrases or words) plus a set of relations between these items. Although thesauri are commonly used in both commercial and experimental IR systems, experiments have shown inconsistent effects on retrieval effectiven...
Probabilistic Models in Information Retrieval
- The Computer Journal, 1992
Cited by 87 (4 self)
In this paper, an introduction to and survey of probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR, along with the corresponding event space, clarifies the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
A probabilistic framework for vague queries and imprecise information in databases
- Proceedings of the 16th International Conference on Very Large Databases, 1990
Cited by 51 (11 self)
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved. For specifying different kinds of conditions in vague queries, the notion of vague predicates is introduced. Based on the underlying probabilistic model, imprecise or missing attribute values can also be treated easily. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems for coping with missing or imprecise data.
ACIRD: Intelligent Internet Documents Organization and Retrieval
- IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 10 (3 self)
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier and a two-phase search engine. The knowledge acquisition process of ACIRD automatically learns classification knowledge from classified Internet documents. The document classifier applies learned classification knowledge to classify newly collected Internet documents into one or more classes. Experimental results indicate that ACIRD performs as well as or better than human experts in both knowledge acquisition and document classification. By using the learned classification knowledge and the given class lattice, the ACIRD two-phase search engine responds to user queries with hierarchically structured navigable results (instead of a conventional flat ranked document list), which greatly aids users in locating information from numerous, diversified Internet documents.
Introduction and Overview
- Journal of the American Society for Information Science, 1993
Cited by 6 (0 self)
This paper provides a partial overview of practice, problems, proposals, and plans relating to the handling of 'composite documents' by extended information storage and retrieval systems. It aims to describe such documents, to explore various areas of application for them, to portray a number of representative test collections under development, to survey related studies as well as previous work of this author, and to examine plans for further investigation.
1.1. Background
Experimental information retrieval (IR) systems, some dating back to the sixties, have demonstrated the viability of fully automatic document storage and retrieval methodologies with small to medium size bibliographic collections [72]. Many of these experimental systems utilize the vector space model in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have given additional improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combi-
This work was supported by the NSF under grant IST-8418877 and by Virginia's Center for Innovative Technology under grant INF-85-016.
Generalized Vector Space Model in Information Retrieval
In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in the existing vector based information retrieval systems. The preliminary experimental results obtained from the new model are very encouraging.

