This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
331.9 An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)(Correct)
This paper is a comparative study of text categorization methods. Fourteen methods are investigated, based on
previously published results and newly obtained results from additional experiments. Corpu... / decision trees and that text classification has a number of
302.1 Hierarchically classifying documents using very few words - Koller, Sahami (1997)(Correct)
The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which igno... / are often inadequate in text classification where there is a large
271.4 Statistical Pattern Recognition: A Review - Jain, Duin, Mao (2000)(Correct)
this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 110296. Statistical Pattern Recognition: A Review
Anil K. Jain, Robert P.W. Duin, and Jianchang Mao,... / patterns document classification efficiently searching
257.1 Enhanced hypertext categorization using hyperlinks - Chakrabarti, Dom, Indyk (1998)(Correct)
A major challenge in indexing unstructured hypertext databases
is to automatically extract meta-data that enables
structured search using topic taxonomies, circumvents keyword
ambiguity, and improves ... / addressed in the extensive text classification literature. Links clearly br at c whose goal is to route documents into those subtrees that
239.9 Information Extraction from HTML: Application of a General Machine.. - Freitag (1998)(Correct)
Because the World Wide Web consists primarily of
text, information extraction is central to any effort that
would use the Web as a resource for knowledge discovery.
We show how information extraction ... / We regard IE as a kind of text classification which has strong br problem of document classification but also presents
208.6 Training Algorithms for Linear Text Classifiers - Lewis, Schapire, Callan, Papka (1996)(Correct)
Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used i... / routing and other text classification tasks operate similarly. br TREC Document Routing topics - varies
199.9 Learning to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)(Correct)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting
a small number of labeled training documents with a large pool of unlabeled documents. This is
significa... / because in many important text classification problems obtaining br Li and Kenji Yamanishi. Document classification using a finite mixture
185.1 A Sequential Algorithm for Training Text Classifiers - Lewis, Gale (1994)(Correct)
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully ... / Introduction Text classification is the automated grouping
178.7 Feature Subset Selection Using A Genetic Algorithm - Yang, Honavar (1997)(Correct)
Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is... / user interest profiles for text classification Yang et al. a and br problems as well as a document classification task. Section .
177.1 Boosting and Rocchio Applied to Text Filtering - Schapire, Singer, Singhal (1998)(Correct)
We discuss two learning algorithms for text filtering: modified
Rocchio and a boosting algorithm called AdaBoost. We show
how both algorithms can be adapted to maximize any general
utility matrix that... / filtering is just binary text classification into the categories br feedback and more recently for document routing as a comparison
157.1 Concept Indexing - A Fast Dimensionality Reduction Algorithm with.. - Karypis, Han (2000)(Correct)
In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased i... / clustering of words for text classification. In SIGIR- . br are primarily used for document classification and for improving the
148.9 An Adaptive Web Page Recommendation Service - Balabanovic (1997)(Correct)
An adaptive recommendation service seeks to adapt
to its users, providing increasingly personalized recommendations
over time. In this paper we introduce
the "Fab" adaptive web page recommendation ser... / relevance work on document classification which considers br . Document Classification Document classification lies at
148.5 Distributional Clustering of Words for Text Classification - Baker, McCallum (1998)(Correct)
This paper describes the application of Distributional Clustering [20] to document classification. This approach clusters words into groups based on the distribution of class labels associated with ea... / Clustering of Words for Text Classification L. Douglas Baker yz br Clustering to document classification. This approach clusters
148.5 Making Large-Scale Support Vector Machine Learning Practical - Joachims (1998)(Correct)
Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understo... / on two benchmark tasks a text classification task and an image
120.0 A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)(Correct)
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related web sites all provide ... / new research challenges for text classification. Hyperlinks HTML tags
109.0 Error-Correcting Output Coding for Text Classification - Berger (1999)(Correct)
This paper applies error-correcting output coding (ECOC) to the task of document categorization. ECOC, of recent vintage in the AI literature, is a method for decomposing a multiway classification pro... / Output Coding for Text Classification Adam Berger School of
99.9 Using Machine Learning To Improve Information Access - Sahami (1999)(Correct)
The explosion of on-line information has given rise to many query-based search engines (such as Alta Vista) and manually constructed topic hierarchies (such as Yahoo! ). But with the current growth ra... / methods can be inadequate for text classification where there is a large br . . Document Classification .
99.9 Using Maximum Entropy for Text Classification - Nigam, Lafferty, McCallum (1999)(Correct)
This paper proposes the use of maximum entropy
techniques for text classification. Maximum
entropy is a probability distribution estimation
technique widely used for a variety of
natural language task... / Using Maximum Entropy for Text Classification Kamal Nigam y
97.1 A Bayesian Approach to Filtering Junk E-Mail - Sahami, Dumais, Heckerman, Horvitz (1998)(Correct)
In addressing the growing problem of junk E-mail on
the Internet, we examine methods for the automated
construction of filters to eliminate such unwanted messages
from a user's mail stream. By casting... / to be a straight-forward text classification problem we show that by br not only employ traditional document classification techniques based on the
96.2 Automated Learning of Decision Rules for Text Categorization - Apte, Damerau, Weiss (1994)(Correct)
We describe the results of extensive experiments on large document collections using optimized
rule-based induction methods. The goal of these methods is to automatically discover
classification pat... / article. Most applications of text classification involve classes that are br retrieval of knowledge. Document classifications are typically assigned
89.8 TREC and TIPSTER Experiments With INQUERY - James Callan (1995)(Correct)
INQUERY is a probabilistic information retrieval system based upon a Bayesian
inference network model. This paper describes recent improvements to the system as
a result of participation in the TIPSTER... / feedback and simulated document routing. Experiments with one and
85.7 Applying Co-Training methods to Statistical Parsing - Sarkar (2001)(Correct)
We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures ... / been used successfully in text classification in combination of labeled br Yarowsky document classification Blum and Mitchell
85.7 Enhancing Supervised Learning with Unlabeled Data - Goldman, Zhou (2000)(Correct)
In many practical learning scenarios, there is
a small amount of labeled data along with
a large pool of unlabeled data. Many supervised
learning algorithms have been developed
and extensively stu... / applications to the area of text classification. For example Riloff and
85.7 Document Clustering using Word Clusters via the Information.. - Slonim, Tishby (2000)(Correct)
We present a novel implementation of the recently introduced information
bottleneck method for unsupervised document clustering.
Given a joint empirical distribution of words and documents,
p(x; y), w... / to evaluate supervised text classification algorithms. In this way we br is shown to provide good document classification accuracy for the
85.7 Multistrategy Learning for Information Extraction - Freitag (1998)(Correct)
Information extraction (IE) is the problem
of filling out pre-defined structured summaries
from text documents. We are interested
in performing IE in non-traditional
domains, where much of the text is... / memorization term-space text classification and relational rule br to adapt ideas from document classification to the IE setting. A
80.8 Information extraction for semi-structured documents - Smith, Lopez (1997)(Correct)
this paper constitutes a suitable basis for building an effective solution to extracting information from semi-structured documents for two principal reasons. First, it provides an extensible architec... / A Basis For High-Precision Text Classification Acm Tois br closely related to work on document classification which characterises a
71.4 Centroid-Based Document Classification: Analysis & Experimental.. - Han, Karypis (2000)(Correct)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet,
digital libraries, news sources, and company-wide intranets. Automatic text categorization,... / Naive Bayesian algorithm for text classification Rainbow has options br Centroid-Based Document Classification Analysis Experimental
71.4 Feature Selection and Dualities in Maximum Entropy Discrimination - Jebara, Jaakkola (2000)(Correct)
Incorporating feature selection into a classification
or regression method often carries
a number of advantages. In this paper we
formalize feature selection specifically from a
discriminative per... / Transductive Inference for Text Classification using Support Vector br ranging from image and document classification to problems in
71.4 Centroid-Based Document Classification: Analysis Experimental Results - Han (2000)(Correct)
In this paper we present a simple linear-time centroid-based document
classification algorithm, that despite its simplicity and robust performance,
has not been extensively studied and analyzed. O... / to be very effective in text classification We were not able to br Centroid-Based Document Classification Analysis Experimental
63.8 Workflow Applications to Research Agenda: Scalable and Dynamic Work.. - Sheth (1997)(Correct)
this paper, we focus
on two issues. The first issue relates to the challenges that could be addressed
by evolving the current workflow technology. Two of the challenges to which
we focus our attention... / activities and facilitating document routing imaging and reporting. In
63.6 Text Categorization Using Weight Adjusted k-Nearest Neighbor.. - Han, Karypis, Kumar (1999)(Correct)
Text categorization is the task of deciding whether a document belongs to a set of prespecified classes of documents.
Automatic classification schemes can greatly facilitate the process of categorizat... / has also been used in text classification Yan CH The key br and VSM. Keywords Classification text categorization nearest
63.6 Machine Learning in Automated Text Categorization - Sebastiani (1999)(Correct)
this paper concentrates on... Machine Learning in Automated Text Categorization. Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Italy.
The automated categorization (or classification) o... / text categorization text classification . INTRODUCTION In the br PENOT N. . Automatic document classification natural langage
61.7 Towards Language Independent Automated Learning of Text.. - Apte, Damerau, Weiss (1994)(Correct)
We describe the results of extensive machine learning experiments on large collections of
Reuters' English and German newswires. The goal of these experiments was to automatically
discover classifica... / Most applications of text classification involve classes that are br and portable technique for document classification. Automated Learning
60.8 Incremental Relevance Feedback for Information Filtering - Allan (1996)(Correct)
We use data from the TREC routing experiments to explore how relevance feedback can be applied incrementally --- using a few judged documents each time --- to achieve results that are as good as if th... / used.BSA In the area of text classification efforts have been made to
59.5 Learning Routing Queries in a Query Zone - Singhal (1997)(Correct)
Word usage is domain dependent. A common word in one
domain can be quite infrequent in another. In this study we
exploit this property of word usage to improve document
routing. We show that routing q... / of word usage to improve document routing. We show that routing
57.9 Little Words Can Make a Big Difference for Text Classification - Riloff (1995)(Correct)
Most information retrieval systems use stopword lists
and stemming algorithms. However, we have found
that recognizing singular and plural nouns, verb forms,
negation, and prepositions can produce dra... / Can Make a Big Difference for Text Classification Ellen Riloff
57.7 Concept Based Query Expansion - Qiu (1993)(Correct)
Query expansion methods have been studied for a long
time - with debatable success in many instances. In this
paper we present a probabilistic query expansion model
based on a similarity thesaurus whi... / Pea Spa Use of document classification. Documents are first br Use of document classification. Documents are first classified
57.1 Kernel expansions with unlabeled examples - Szummer, Jaakkola (2001)(Correct)
Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for ... / with naive Bayes models for text classification the co-training
57.1 Maximum Likelihood Estimation for Filtering Thresholds - Yi Zhang Jamie (2001)(Correct)
Information filtering systems based on statistical retrieval models
usually compute a numeric score indicating how well each
document matches each profile. Documents with scores above
profile-specific... / learning algorithms for text classification have also been used for br A. Singhal. . Boosting for document routing. In Proceedings of the
57.1 Athena: Mining-based Interactive Management of Text Databases - Agrawal, Bayardo, Srikant (2000)(Correct)
We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and ... / with other techniques for text classification CDAR LR Lan br for hierarchy reorganization document routing and identification of
57.1 Less is More: Active Learning with Support Vector Machines - Schohn, Cohn (2000)(Correct)
We describe a simple active learning heuristic
which greatly enhances the generalization behavior
of support vector machines (SVMs) on several
practical document classification tasks. We
observe a... / particularly those involving text classification Joachims b Dumais et br on several practical document classification tasks. We observe a
57.1 Using Error-Correcting Codes For Text Classification - Ghani (2000)(Correct)
This paper explores in detail the use of Error
Correcting Output Coding (ECOC) for learning
text classifiers. We show that the accuracy of a
Naive Bayes Classifier over text classification
tasks c... / Error-Correcting Codes For Text Classification Rayid Ghani
55.0 Evaluating and Optimizing Autonomous Text Classification Systems - Lewis (1995)(Correct)
Text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. In contrast, programs that filter text streams, software that categorizes documen... / and Optimizing Autonomous Text Classification Systems David D. Lewis
54.5 Record-Boundary Discovery in Web Documents - Embley, Jiang, Ng (1999)(Correct)
Extraction of information from unstructured or semistructured Web documents
often requires a recognition and delimitation of records. (By "record" we mean a
group of information relevant to some entit... / or to solve similar document classification problems such as to
51.4 Employing EM and Pool-Based Active Learning for Text Classification - McCallum (1998)(Correct)
This paper shows how a text classifier's need
for labeled training documents can be reduced
by taking advantage of a large pool
of unlabeled documents. We modify the
Query-by-Committee (QBC) method of... / Active Learning for Text Classification Andrew Kachites McCallum br does not perform well on document classification. In our experience vote
51.4 Joins that Generalize: Text Classification Using WHIRL - Cohen (1998)(Correct)
WHIRL is an extension of relational databases that can perform "soft joins" based on the similarity of textual identifiers; these soft joins extend the traditional operation of joining tables based on... / Joins that Generalize Text Classification Using WHIRL William W.
46.8 The TREC-5 Filtering Track - Lewis (1997)(Correct)
The TREC-5 filtering track, an evaluation of binary text classification systems, was a repeat of the filtering evaluation run in a trial version for TREC-4, with only the data set and participants cha... / an evaluation of binary text classification systems was a repeat of br learning for knowledge-based document routing a report on the TREC-
46.8 The TREC-4 Filtering Track - Lewis (1997)(Correct)
The TREC-4 filtering track was an experiment in the evaluation of binary text classification systems. In contrast to ranking systems, binary text classification systems may need to produce result sets... / in the evaluation of binary text classification systems. In contrast to br learning for knowledge-based document routing a report on the TREC-
45.7 Employing EM in Pool-Based Active Learning for Text Classification - McCallum, Nigam (1998)(Correct)
This paper shows how a text classifier's need for labeled training data
can be reduced by a combination of active learning and Expectation
Maximization (EM) on a pool of unlabeled data. Query-by-Commi... / Active Learning for Text Classification Andrew McCallum zy br in pool-based sampling for document classification tends to select
45.7 Role of Verbs in Document Analysis - Klavans, Kan (1998)(Correct)
We present results of two methods for assessing
the event profile of news articles as a function
of verb type. The unique contribution of this
research is the focus on the role of verbs, rather
than n... / as a discriminant in document classification. Motivation
45.7 Feature subset selection in text-learning - Mladenic (1998)(Correct)
This paper describes several known and some new methods
for feature subset selection on large text data. Experimental comparison
given on real-world data collected from Web users shows that characte... / reported to work well in text classification domains that scores
45.4 Text-Learning and Related Intelligent Agents: A Survey - Mladenic (1999)(Correct)
This article surveys a part of text learning where supervised learning methods are used for text... INTELLIGENT INFORMATION RETRIEVAL. Text-Learning and Related Intelligent Agents: A Survey.
Dunja... / In this approach to text classification the system searches for br worth the effort. Work on document classification that extends the
42.8 Restricted Bayes Optimal Classifiers - Tong, Koller (2000)(Correct)
We introduce the notion of restricted Bayes optimal classifiers. These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated wit... / Naive Bayes classifier in text classification Mitchell We
42.8 Hierarchical Classification of Web Content - Dumais, Chen (2000)(Correct)
This paper explores the use of hierarchical structure for
classifying a large, heterogeneous collection of web
content. The hierarchical structure is initially used to train
different second-level cla... / structures. KEYWORDS Text classification text categorization br KEYWORDS Text classification text categorization
39.9 First-Order Learning for Web Mining - Craven (1998)(Correct)
We present compelling evidence that the World Wide Web is a domain in which applications can benefit from using first-order learning methods, since the graph structure inherent in hypertext naturall... / used first-order methods for text classification but the focus was on
37.6 Generality versus Size in Genetic Programming - Rosca (1996)(Correct)
Genetic Programming (GP) uses variable size
representations as programs. Size becomes an important
and interesting emergent property of the
structures evolved by GP. The size of programs
can be both a... / fitting to chaotic data text classification feature detection image
36.3 Lossless Document Image Compression - Inglis (1999)(Correct)
Document image compression reduces the storage requirements for digitised books or documents
by using characters as the fundamental unit of compression. Compression gains can
be achieved by identifyin... / and non-text zones. The text classification is extended to include br determining the classification document zones without resorting
36.3 Vector-Based Natural Language Call Routing - Chu-Carroll, Carpenter (1999)(Correct)
This paper describes a domain independent, automatically trained natural language call
router for directing incoming calls in a call center. Our call router directs customer calls based
on their respo... / McDonough et al. and document routing Schütze Hull and br Natural Language Call Routing Document Construction Morphological
34.2 Efficient text categorization - Grobelnik, Mladenic (1998)(Correct)
We present an approach to text categorization using machine learning techniques. The approach is developed and tested on a large text hierarchy named Yahoo that is available on the Web. We handle the la... / in for hierarchical document classification. For a new document the
31.8 Learning Probabilistic User Models - Billsus, Pazzani (1996)(Correct)
We describe two applications that use rated text documents to induce a model of the user's interests.
Based on our experiments with these applications we propose the use of a probabilistic learning
al... / neighbor approaches on text classification tasks. Using rated text
31.8 Knowledge-Based Approaches to Query Expansion in Information Retrieval - Bodner, Song (1996)(Correct)
This paper... Bodner, R. and Song, F. (1996) Knowledge-based approaches to query expansion in
information retrieval. In McCalla, G. (Ed.), Advances in Artificial Intelligence (pp.
146-158). New... / term classification document classification syntactic context and br and Frei term classification document classification syntactic
29.7 Exploration of Text Collections with Hierarchical Feature Maps - Merkl (1997)(Correct)
Document classification is one of the central issues in information
retrieval research. The aim is to uncover similarities
between text documents. In other words, classification techniques
are used to... / maps to perform the text classification task. Each of the br Abstract Document classification is one of the central
28.9 Recent Experiments with INQUERY - Allan (1995)(Correct)
this paper focuses on relevant differences to the previously
published algorithms.
1 Description of Ad-Hoc Experiments... Recent Experiments with INQUERY
James Allan, Lisa Ballesteros, James P. ... / broader research interests in document routing document filtering br interests in document routing document filtering distributed IR
28.5 Exploiting Structure for Intelligent Web Search - Kruschwitz (2001)(Correct)
Together with the rapidly growing amount of online data we register an immense need for intelligent search engines that access a restricted amount of data as found in intranets or other limited domain... / Structural Information for Text Classification on the WWW. In Proceedings br to be built in advance. Document classification is closely related to
28.5 Toward Optimal Active Learning through Sampling Estimation of Error.. - Roy, McCallum, al. (2001)(Correct)
This paper presents an active learning method that directly
optimizes expected future error. This is in contrast
to many other popular techniques that instead
aim to reduce version space size. Thes... / work. . Naive Bayes Text Classification Text classification is br . Naive Bayes Text Classification Text classification is not only
28.5 A Statistical Learning Model of Text Classification for Support.. - Joachims (2001)(Correct)
This paper develops a theoretical learning model of text classification for Support Vector Machines (SVMs). It connects the statistical properties of text-classification tasks with the generalization ... / Statistical Learning Model of Text Classification for Support Vector
28.5 Incremental Document Clustering for Web Page Classification - Wai-Chiu Wong And (2001)(Correct)
Introduction
We consider document clustering for Web pages. Traditionally, the document
classification task is carried out manually. In order to assign a document to
an appropriate class, people woul... / work conducted on automatic text classification. One approach is to learn br pages. Traditionally the document classification task is carried out
28.5 SOM-based Methodology for Building Large Text Archives - Arnulfo Azcarraga And (2001)(Correct)
Self-Organizing Maps (SOMs) have recently been used to archive over 7 million documents. Not
only have SOMs been shown to scale up to very large document collections, these maps also
allow for a novel... / various general methods for text classification. Benchmark cases are now br to feature extraction and document classification. These techniques span
28.5 On the Automated Classification of Web Sites - Pierre (2001)(Correct)
In this paper we discuss several issues related to automated text
classification of web sites. We analyze the nature of web content
and metadata in relation to requirements for text features. We
find ... /
28.5 Weight adjustment schemes for a centroid based classifier - Shankar, Karypis (2000)(Correct)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital
libraries, news sources, and company-wide intra-nets. Automatic text categorization... / clustering of words for text classification. In SIGIR- . br has been widely used for document classification and has been shown to
28.5 Analyzing the Effectiveness and Applicability of Co-training - Nigam, Ghani (2000)(Correct)
Recently there has been significant interest in supervised
learning algorithms that combine labeled and unlabeled data
for text learning tasks. The co-training setting [1] applies to
datasets that hav... / labeled and unlabeled data text classification . INTRODUCTION There
28.5 Transforming Paper Documents into XML Format with WISDOM++ - Altamura, Esposito, Malerba (2000)(Correct)
The transformation of scanned paper documents to a form suitable for an
Internet browser is a complex process that requires solutions to several problems. The
application of an OCR to some parts of th... / document analysis document classification document br analysis document classification document understanding text
28.5 Assessing the Calibration of Naive Bayes' Posterior Estimates - Bennett (2000)(Correct)
In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of informati... / reliability posterior text classification Reuters Introduction
28.5 Data mining models as services on the internet - Sarawagi, Nagaralu (2000)(Correct)
The goal of this article is to raise a debate on the usefulness
of providing data mining models as services on the internet.
These services can be provided by anyone with adequate
data and expertise a... / servers. . . Document classification services An imminently
28.5 The Class Imbalance Problem: Significance and Strategies - Japkowicz (2000)(Correct)
Although the majority of concept-learning
systems previously designed usually assume
that their training sets are well-balanced, this
assumption is not necessarily correct. Indeed, there
exist many dom... / domains. In particular text classification domains could be good test
28.5 Text Classification Using WordNet Hypernyms - Scott, Matwin (1998)(Correct)
This paper describes experiments in Machine Learning for text classification using a new representation of text based on WordNet hypernyms. Six binary classification tasks of varying difficulty are de... / Text Classification Using WordNet Hypernyms
28.5 ifile: An Application of Machine Learning to E-Mail Filtering - Rennie (1998)(Correct)
With the proliferation of electronic mail in the modern era, it becomes ever more
important to devise methods for the organization, categorization and searching of such
mail. Mail filtering is such a ... / filter which makes use of the text classification algorithm naive Bayes. br Results . Document Classification Accuracies Experiment
27.2 A Tutorial on Automated Text Categorisation - Sebastiani (1999)(Correct)
The automated categorisation (or classification) of texts into topical categories has a long history,
dating back at least to 1960. Until the late '80s, the dominant approach to the problem
involved... / first introduced to the text classification literature in Batch br retrieval techniques and document classification The machine learning
27.2 Automatic Labeling of Self-Organizing Maps for Information Retrieval - Merkl, Rauber (1999)(Correct)
The self-organizing map is a very popular unsupervised neural network model for the analysis of high dimensional input data as in information retrieval applications. However, the interpretation of the... / approach on an example from text classification using a real-world br experimental results from document classification with self-organizing
27.2 Building Hierarchical Classifiers Using Class Proximity - Wang, Zhou, Liew (1999)(Correct)
In this paper, we address the need to automatically classify text documents into topic hierarchies like those in ACM Digital Library and Yahoo!. The existing local approach constructs a classifier at ... / The central issue in document classification is separating feature
27.2 Workflow Support for Electronic Commerce Applications - Kumar, Zhao (1999)(Correct)
Internet-based electronic commerce is becoming the next frontier of new business opportunities. However, commerce on the Internet is seriously hindered by the lack of a common language for collaborati... / does not provide support for document routing. In this paper we describe / Language provides support for routing documents and managing workflows
27.2 Clustering Transactions Using Large Items - Ke Wang Chu (1999)(Correct)
In traditional data clustering, similarity of a cluster of
objects is measured by pairwise similarity of objects
in that cluster. We argue that such measures are not
appropriate for transactions that ... / term widely used in text document classification and clustering is
27.2 Machine Learning in Automated Text Categorisation - Sebastiani (1999)(Correct)
this paper. Aside from (i) the automatic assignment
of documents to a predefined set of categories, which is the main topic of
this paper, the term has also been used to mean (ii) the automatic defini... / text categorisation text classification . INTRODUCTION In the / E. and Guthrie J. A. . Document classification by machine theory and
27.2 Machine Learning in Automated Text Categorisation: a Bibliography - Sebastiani (1999)(Correct)
m, NL.
Yu, E. S. and Liddy, E. D. 1999. Feature selection in text categorization using the Baldwin
effect. In Proceedings of IJCNN-99, International Joint Conference on Neural Networks
(Washington, DC... / H. . Improving short text classification using unlabeled background / A probabilistic approach to document classification. In E. A. Fox P.
26.0 Method Combination For Document Filtering - Hull (1996)(Correct)
There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of docume... / and optimizing autonomous text classification systems. In Proceedings / is distinguished from document routing because it is assumed that
25.5 A Multilevel Approach to Intelligent Information Filtering: Model.. - Mostafa (1997)(Correct)
this article, a filtering model is proposed that decomposes the overall task into subsystem functionalities and highlights the need for multiple adaptation techniques to cope with uncertainties. A fil... / by a vector-space model document classification by unsupervised learning
23.1 A Comparison of Classifiers and Document Representations for the.. - Schütze, Hull, Pedersen (1995)(Correct)
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techni... / as a problem of statistical text classification. Documents are to be / of statistical text classification. Documents are to be assigned to
22.4 Feature Selection and Feature Extraction for Text Categorization - Lewis (1992)(Correct)
The effect of selecting varying numbers and kinds of features
for use in predicting category membership was investigated
on the Reuters and MUC-3 text categorization data
sets. Good categorization per... / in probabilistic models for text classification tasks the formula in / topic categories to support document routing and retrieval. Particular
22.2 Document Image Understanding: Geometric and Logical Layout - Haralick (1994)(Correct)
Introduction
Document Image Understanding encompasses the
technology required to make paper documents equivalent
to other computer exchange media like floppies,
tapes, and cdroms. The physical reader... / for Layout Analysis in Document Classification nd ICDAR Tsukuba
21.2 Using Grammatical Inference to Improve Precision in Information.. - Freitag (1997)(Correct)
The field of information extraction (IE) is concerned
with applying natural language processing (NLP) and
information retrieval (IR) techniques to the automatic
extraction of essential details from te... / shown to work for document classification also work with suitable
20.2 Cluster-Based Text Categorization: A Comparison of Category Search.. - Makoto, Takenobu (1995)(Correct)
Text categorization can be viewed as a process of category search, in which one or more categories for a test document are searched for by using given training documents with known categories. In this... / clustering for automatic text classification. In Proceedings of the / methods for automatic document classification. Journal of
18.1 Creating Customized Authority Lists - Chang, Cohn, McCallum (1999)(Correct)
The proliferation of hypertext and the popularity of Kleinberg
's HITS algorithm have brought about an increased interest
in link analysis. While HITS and its older relatives from
the Bibliometrics li... / and uses statistical text classification to categorize the papers
18.1 Feature Reduction for Neural Network Based Text Categorization - Savio Lam (1999)(Correct)
In a text categorization model using an artificial neural
network as the text classifier, scalability is poor if the neural
network is trained using the raw feature space since textual
data has a ver... / neural network model used for text classification. In Section the
18.1 Information Filtering in Changing Domains - Lanquillon (1999)(Correct)
The task of information filtering is to classify
documents from a stream into either relevant
or irrelevant according to a particular user interest
with the objective to reduce information
load. When ... / seminal work on autonomous text classification system see Lewis
18.1 Feature selection for unbalanced class distribution and Naive Bayes - Mladenic, Grobelnik (1999)(Correct)
This paper describes an approach to feature
subset selection that takes into account problem
specifics and learning algorithm characteristics.
It is developed for the Naive
Bayesian classifier applied... / cross entropy used in text-classification experiments Koller and / process since for each document classification not all but only
18.1 Practical Evaluation of IR within Automated Classification Systems - Dolin, Pierre, Butler, Avedon (1999)(Correct)
This paper describes some of the work we have done
to evaluate and compare the use of three IR systems
(Verity, LSI, and SMART) as black boxes within an automated
classification environment. We use au... / classification schemes and document classification. There are many commonly / Retrieval System for document routing. feel that an analysis of
18.1 PowerBookmarks: A System for Personalizable Web Information.. - Quoc (1999)(Correct)
We extend the notion of bookmark management by introducing the functionalities
of hypermedia databases. PowerBookmarks is a Web information organization,
sharing, and management tool, which parses met... / Personalization Classification Document Integration Internet
17.3 What do Advanced Transaction Models Have to Offer for Workflows? - Worah (1996)(Correct)
Workflow management systems are finding wide applicability in small and large organizational
settings. In this paper, we briefly review four large-scale applications to gauge their modeling
and run-ti... / activities and facilitating document routing imaging and reporting.
17.2 Multilevel Security in the UNIX Tradition - McIlroy, Reeds (1992)(Correct)
The original UNIX® system was designed to be small and intelligible, achieving
power by generality rather than by a profusion of features. In this spirit we have designed
and implemented IX, a multile... / system. IX supports document classification with mandatory access