Machine Learning in Automated Text Categorization
- ACM Computing Surveys
, 2002
Cited by 839 (13 self)
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Information retrieval on the Web
- ACM Computing Surveys
, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited
Searching and browsing collections of structural information
- In IEEE Advances in Digital Libraries (ADL’2000)
, 1997
Cited by 19 (0 self)
This paper proposes a new approach to querying collections of structured textual information such as SGML/XML documents. Knowledge about the structure of documents is an additional resource that should be exploited during retrieval since the semantics of the different textual objects can be used to specify an information need much more precisely. However, the traditional probabilistic retrieval model lacks the ability to handle structural information. We define a new retrieval function based on the probabilistic model which overcomes this drawback. The presented query language allows the assignment of structural roles to individual terms. The efficient evaluation of queries in this framework requires appropriate index structures. We design text and structure indexes and show how their information is combined during evaluation. The implementation supports additional functionalities such as a table of contents for browsing. First evaluation results show the feasibility of the approach on collections of unstructured documents.
Supporting Component-Based Software Development with Active Component Repository Systems
, 2001
Machine Learning in Automated Text Categorisation
, 1999
Cited by 16 (1 self)
The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to the early ’60s. Until the late ’80s, the most effective approach to the problem seemed to be that of manually building automatic classifiers by means of knowledge-engineering techniques, i.e. manually defining a set of rules encoding expert knowledge on how to classify documents under a given set of categories. In the ’90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest, prompted by which the machine learning paradigm to automatic classifier construction has emerged and definitely superseded the knowledge-engineering approach. Within the machine learning paradigm, a general inductive process (called the learner) automatically builds a classifier (also called the rule, or the hypothesis) by “learning”, from a set of previously classified documents, the characteristics of one or more categories. The advantages of this approach are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this survey we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues pertaining to document indexing, classifier construction, and classifier evaluation will be discussed in detail. A final section will be devoted to the techniques that have specifically been devised for an emerging application such as the automatic classification of Web pages into “Yahoo!-like” hierarchically structured sets of categories.
Soft Information retrieval: applications of fuzzy set theory and neural networks
- Neuro-fuzzy Techniques for Intelligent Information Systems
, 1999
Cited by 13 (3 self)
This paper presents a short survey of fuzzy and neural approaches to Information Retrieval. The goal of such approaches is to define flexible Information Retrieval Systems able to deal with the inherent vagueness and uncertainty of the retrieval process. In this survey we address if and how some approaches met their goal.
The BNR model: foundations and performance of a Bayesian network-based retrieval model (L.M. de Campos et al.)
- Applied Soft Computing
, 2004
Information Retrieval and Situation Theory
- SIGIR Forum
, 1996
Cited by 9 (4 self)
This paper is an essay to convince the reader that Situation Theory presents many characteristics that are both adequate and appropriate for the study of IR. Here we concentrate on the ones we have experience with:
Information Retrieval Methods For Multimedia Objects
, 2000
Cited by 8 (0 self)
We describe five major concepts that are essential for multimedia retrieval: uncertain inference addresses vagueness of queries and imprecision of content representations. Predicate logic allows for dealing with spatial and temporal relationships. The document structure has to be considered in order to retrieve the most relevant part of a document in response to a query. Whereas fact retrieval employs a closed world assumption, content-based retrieval should be based on an open world assumption. In order to perform inferences based on the content of multimedia objects, inconsistencies have to be dealt with. Based on these concepts, we present DOLORES, a logic-based multimedia retrieval system with a multilayered architecture. Below the top-level presentation layer, the semantic layer uses a conceptual model for structured documents which is transformed into a probabilistic object-oriented logic (POOL) supporting aggregated objects, different kinds of propositions (terms, classifications and attributes) and even rules as being contained in objects. This four-valued logic is translated into probabilistic Datalog which is interpreted by the HySpirit inference engine. Multimedia objects are stored either in a relational database management system or an information retrieval engine.

