Information Retrieval: Classification

Ordered by the expected number of citations based on the year of publication

This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.

954.2   Text Categorization with Support Vector Machines: Learning with Many.. - Joachims (1998)
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are ... / in contrast to conventional text classification methods SVMs will prove to

377.1   Learning to Extract Symbolic Knowledge from the World Wide Web - Craven, DiPasquo, Freitag, McCallum, .. (1998)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understanda... / . . Statistical Text Classification br . Related Work . . Document Classification

371.4   Learning to Construct Knowledge Bases from the World Wide Web - Craven, Freitag, McCallum, Mitchell, .. (2000)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understanda... / . Statistical Text Classification In this section we br of related work. . Document Classification Our work is related to

347.8   WebWatcher: A Tour Guide for the World Wide Web - Joachims, Freitag, Mitchell (1996)
We explore the notion of a tour guide software agent for assisting users browsing the world wide web. A web tour guide agent provides assistance similar to that provided by a human tour guide in a mus... / Learning Intelligent Agents Text Classification World Wide Web

331.9   An Evaluation of Statistical Approaches to Text Categorization - Yang (1997)
This paper is a comparative study of text categorization methods. Fourteen methods are investigated, based on previously published results and newly obtained results from additional experiments. Corpu... / decision trees and that text classification has a number of

302.1   Hierarchically classifying documents using very few words - Koller, Sahami (1997)
The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which igno... / are often inadequate in text classification where there is a large

285.7   A Comparison of Event Models for Naive Bayes Text Classification - McCallum, Nigam (1998)
Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. ... /

271.4   Statistical Pattern Recognition: A Review - Jain, Duin, Mao (2000)
this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 110296. Statistical Pattern Recognition: A Review. Anil K. Jain, Robert P.W. Duin, and Jianchang Mao,... / patterns document classification efficiently searching

257.1   Enhanced hypertext categorization using hyperlinks - Chakrabarti, Dom, Indyk (1998)
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves ... / addressed in the extensive text classification literature. Links clearly br at c whose goal is to route documents into those subtrees that

254.5   Text Classification from Labeled and Unlabeled Documents using EM - Nigam, McCallum, Thrun, Mitchell (1999)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important ... / in The Netherlands. Text Classification from Labeled and Unlabeled br H.Yamanishi K. Document classification using a finite mixture

239.9   Information Extraction from HTML: Application of a General Machine.. - Freitag (1998)
Because the World Wide Web consists primarily of text, information extraction is central to any effort that would use the Web as a resource for knowledge discovery. We show how information extraction ... / We regard IE as a kind of text classification which has strong br problem of document classification but also presents

223.1   A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text .. - Joachims (1996)
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learning methods from information retrieval, is presented in a text categorization framework. The analysis... / selection. Applied to text classification this means that we want to

218.1   Transductive Inference for Text Classification using Support Vector.. - Joachims (1999)
This paper introduces Transductive Support Vector Machines (TSVMs) for text classification. While regular Support Vector Machines (SVMs) try to induce a general decision function for a learning task, ... / Transductive Inference for Text Classification using Support Vector

208.6   Training Algorithms for Linear Text Classifiers - Lewis, Schapire, Callan, Papka (1996)
Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used i... / routing and other text classification tasks operate similarly. br TREC Document Routing topics - varies

199.9   Learning to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is significa... / because in many important text classification problems obtaining br Li and Kenji Yamanishi. Document classification using a finite mixture

195.7   Selection of Relevant Features and Examples in Machine Learning - Blum, Langley (1997)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant f... / it is not uncommon in a text classification task to represent examples

185.1   A Sequential Algorithm for Training Text Classifiers - Lewis, Gale (1994)
The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully ... / Introduction Text classification is the automated grouping

181.8   Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)
Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper ... / reinforcement learning text classification World Wide Web

178.7   Feature Subset Selection Using A Genetic Algorithm - Yang, Honavar (1997)
Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is... / user interest profiles for text classification Yang et al. a and br problems as well as a document classification task. Section .

177.1   Boosting and Rocchio Applied to Text Filtering - Schapire, Singer, Singhal (1998)
We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that... / filtering is just binary text classification into the categories br feedback and more recently for document routing as a comparison

157.1   Concept Indexing - A Fast Dimensionality Reduction Algorithm with.. - Karypis, Han (2000)
In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased i... / clustering of words for text classification. In SIGIR- . br are primarily used for document classification and for improving the

157.1   On the Learnability and Design of Output Codes for Multiclass Problems - Crammer, Singer (2000)
Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In t... / character recognition text classification phoneme classification

148.9   An Adaptive Web Page Recommendation Service - Balabanovic (1997)
An adaptive recommendation service seeks to adapt to its users, providing increasingly personalized recommendations over time. In this paper we introduce the "Fab" adaptive web page recommendation ser... / relevance work on document classification which considers br . Document Classification Document classification lies at

148.5   Distributional Clustering of Words for Text Classification - Baker, McCallum (1998)
This paper describes the application of Distributional Clustering [20] to document classification. This approach clusters words into groups based on the distribution of class labels associated with ea... / Clustering of Words for Text Classification L. Douglas Baker yz br Clustering to document classification. This approach clusters

148.5   Making Large-Scale Support Vector Machine Learning Practical - Joachims (1998)
Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understo... / on two benchmark tasks a text classification task and an image

145.4   Learning approaches for Detecting and Tracking News Events - Yang, Carbonell, Brown, Pierce.. (1999)
This paper studies the effective use of information retrieval and machine learning techniques in a new task, event detection and tracking. The objective is to automatically detect novel events from ch... / algorithms to allow document classification based on both information

142.8   Improving Category Specific Web Search by Learning Query Modifications - Glover, Flake, Lawrence, Birmingham, .. (2001)
Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query mod... / compared to other methods for text classification A brief

128.5   Estimating the Generalization Performance of an SVM Efficiently - Joachims (2000)
This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation intensive resampling, the n... /

127.2   A Winnow-Based Approach to Context-Sensitive Spelling Correction - Golding, Roth (1999)
A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of v... / tasks such as text classification where the context is

120.0   A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related web sites all provide ... / new research challenges for text classification. Hyperlinks HTML tags

120.0   Text Classification using String Kernels - Lodhi, Saunders, Shawe-Taylor.. (2002)
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subse... / Published Text Classification using String Kernels Huma

118.1   Constructing Biological Knowledge Bases by Extracting Information.. - Craven, Kumlien (1999)
Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underuti... / this task a statistical text classification method and a relational

114.2   Improving Text Classification by Shrinkage in a Hierarchy of Classes - McCallum, Rosenfeld, Mitchell, Ng (1998)
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. This paper shows that the acc... / Improving Text Classification by Shrinkage in a Hierarchy br approach to hierarchical document classification the Pachinko Machine

111.1   Information Extraction as a Basis for High-Precision Text.. - Riloff, Lehnert (1994)
this article. For the purpose of text classification, the answer keys serve only as a set of correct classifications for each text. If a text has instantiated key templates associated with it in the ... /

109.0   Building Domain-Specific Search Engines with Machine Learning.. - McCallum, Nigam, Rennie, Seymore (1999)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with the general, Web-wide search engines. For example, www.camps... / in reinforcement learning text classification and information extraction

109.0   Error-Correcting Output Coding for Text Classification - Berger (1999)
This paper applies error-correcting output coding (ECOC) to the task of document categorization. ECOC, of recent vintage in the AI literature, is a method for decomposing a multiway classification pro... / Output Coding for Text Classification Adam Berger School of

100.0   Data Mining on Symbolic Knowledge Extracted from the Web - Ghani, Jones, Mladenic, Nigam.. (2000)
Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler ... / information extractors text classification and relational learning.

100.0   An annotation tool for Web browsers and its applications to.. - Denoue, Vignollet (2000)
With bookmark programs, current Web browsers provide a limited support to personalize the Web. We present a new Web annotation tool which uses the Document Object Model Level 2 and Dynamic HTML to del... / the future works including document classification and summarization and

99.9   Using Machine Learning To Improve Information Access - Sahami (1999)
The explosion of on-line information has given rise to many query-based search engines (such as Alta Vista) and manually constructed topic hierarchies (such as Yahoo! ). But with the current growth ra... / methods can be inadequate for text classification where there is a large br . . Document Classification .

99.9   Using Maximum Entropy for Text Classification - Nigam, Lafferty, McCallum (1999)
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language task... / Using Maximum Entropy for Text Classification Kamal Nigam y

99.9   A Machine Learning Approach to Building Domain-Specific Search Engines - McCallum, Nigam, Rennie, Seymore (1999)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are al... / in reinforcement learning text classification and information extraction

97.1   A Bayesian Approach to Filtering Junk E-Mail - Sahami, Dumais, Heckerman, Horvitz (1998)
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting... / to be a straight-forward text classification problem we show that by br not only employ traditional document classification techniques based on the

96.2   Automated Learning of Decision Rules for Text Categorization - Apte, Damerau, Weiss (1994)
We describe the results of extensive experiments on large document collections using optimized rule-based induction methods. The goal of these methods is to automatically discover classification pat... / article. Most applications of text classification involve classes that are br retrieval of knowledge. Document classifications are typically assigned

89.8   TREC and TIPSTER Experiments With INQUERY - James Callan (1995)
INQUERY is a probabilistic information retrieval system based upon a Bayesian inference network model. This paper describes recent improvements to the system as a result of participation in the TIPSTER... / feedback and simulated document routing. Experiments with one and

85.7   Applying Co-Training methods to Statistical Parsing - Sarkar (2001)
We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures ... / been used successfully in text classification in combination of labeled br Yarowsky document classification Blum and Mitchell

85.7   Improving Multi-class Text Classification with Naive Bayes - Rennie (2001)
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seek... /

85.7   Support Vector Machine Active Learning with Applications to Text.. - Tong, Koller (2001)
Support vector machines have met with significant success in numerous real-world learning tasks. However, like most machine learning algorithms, they are generally applied using a randomly selected tra... / Learning with Applications to Text Classification a b Figure a A

85.7   Learning from Labeled and Unlabeled Data using Graph Mincuts - Blum, Chawla (2001)
Many application domains suffer from not having enough labeled training data for learning. However, large amounts of unlabeled examples can often be gathered cheaply. As a result, there has been a... /

85.7   Enhancing Supervised Learning with Unlabeled Data - Goldman, Zhou (2000)
In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively stu... / applications to the area of text classification. For example Riloff and

85.7   Document Clustering using Word Clusters via the Information.. - Slonim, Tishby (2000)
We present a novel implementation of the recently introduced information bottleneck method for unsupervised document clustering. Given a joint empirical distribution of words and documents, p(x; y), w... / to evaluate supervised text classification algorithms. In this way we br is shown to provide good document classification accuracy for the

85.7   Multistrategy Learning for Information Extraction - Freitag (1998)
Information extraction (IE) is the problem of filling out pre-defined structured summaries from text documents. We are interested in performing IE in non-traditional domains, where much of the text is... / memorization term-space text classification and relational rule br to adapt ideas from document classification to the IE setting. A

81.8   SPIRIT: Sequential Pattern Mining with Regular Expression Constraints - Garofalakis, Rastogi, Shim (1999)
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems pr... / inherently fuzzy nature of document classification there are numerous

80.8   Information extraction for semi-structured documents - Smith, Lopez (1997)
this paper constitutes a suitable basis for building an effective solution to extracting information from semi-structured documents for two principal reasons. First, it provides an extensible architec... / A Basis For High-Precision Text Classification Acm Tois br closely related to work on document classification which characterises a

71.4   Centroid-Based Document Classification: Analysis & Experimental.. - Han, Karypis (2000)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization,... / Naive Bayesian algorithm for text classification Rainbow has options br Centroid-Based Document Classification Analysis Experimental

71.4   Improving text categorization methods for event tracking - Yang, Ault, Pierce, Lattimer (2000)
Automated tracking of events from chronologically ordered document streams is a new challenge for statistical text classification. Existing learning techniques must be adapted or improved in order to ... / challenge for statistical text classification. Existing learning

71.4   Feature Selection and Dualities in Maximum Entropy Discrimination - Jebara, Jaakkola (2000)
Incorporating feature selection into a classification or regression method often carries a number of advantages. In this paper we formalize feature selection specifically from a discriminative per... / Transductive Inference for Text Classification using Support Vector br ranging from image and document classification to problems in

71.4   Centroid-Based Document Classification: Analysis & Experimental Results - Han (2000)
In this paper we present a simple linear-time centroid-based document classification algorithm, that despite its simplicity and robust performance, has not been extensively studied and analyzed. O... / to be very effective in text classification We were not able to br Centroid-Based Document Classification Analysis Experimental

68.0   Using and combining predictors that specialize - Freund, Schapire, Singer, Warmuth (1997)
We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multipli... / in this paper for such a text-classification task. Thus our results

63.8   Workflow Applications to Research Agenda: Scalable and Dynamic Work.. - Sheth (1997)
this paper, we focus on two issues. The first issue relates to the challenges that could be addressed by evolving the current workflow technology. Two of the challenges to which we focus our attention... / activities and facilitating document routing imaging and reporting. In

63.6   Text Categorization Using Weight Adjusted k-Nearest Neighbor.. - Han, Karypis, Kumar (1999)
Text categorization is the task of deciding whether a document belongs to a set of prespecified classes of documents. Automatic classification schemes can greatly facilitate the process of categorizat... / has also been used in text classification Yan CH The key br and VSM. Keywords Classification text categorization nearest

63.6   Machine Learning in Automated Text Categorization - Sebastiani (1999)
this paper concentrates on ... Machine Learning in Automated Text Categorization. Fabrizio Sebastiani, Consiglio Nazionale delle Ricerche, Italy. The automated categorization (or classification) o... / text categorization text classification . INTRODUCTION In the br PENOT N. . Automatic document classification natural langage

61.7   Towards Language Independent Automated Learning of Text.. - Apte, Damerau, Weiss (1994)
We describe the results of extensive machine learning experiments on large collections of Reuters' English and German newswires. The goal of these experiments was to automatically discover classifica... / Most applications of text classification involve classes that are br and portable technique for document classification. Automated Learning

60.8   Incremental Relevance Feedback for Information Filtering - Allan (1996)
We use data from the TREC routing experiments to explore how relevance feedback can be applied incrementally --- using a few judged documents each time --- to achieve results that are as good as if th... / used.BSA In the area of text classification efforts have been made to

59.5   Learning Routing Queries in a Query Zone - Singhal (1997)
Word usage is domain dependent. A common word in one domain can be quite infrequent in another. In this study we exploit this property of word usage to improve document routing. We show that routing q... / of word usage to improve document routing. We show that routing

57.9   Little Words Can Make a Big Difference for Text Classification - Riloff (1995)
Most information retrieval systems use stopword lists and stemming algorithms. However, we have found that recognizing singular and plural nouns, verb forms, negation, and prepositions can produce dra... / Can Make a Big Difference for Text Classification Ellen Riloff

57.7   Concept Based Query Expansion - Qiu (1993)
Query expansion methods have been studied for a long time - with debatable success in many instances. In this paper we present a probabilistic query expansion model based on a similarity thesaurus whi... / Pea Spa Use of document classification. Documents are first br Use of document classification. Documents are first classified

57.1   Kernel expansions with unlabeled examples - Szummer, Jaakkola (2001)
Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for ... / with naive Bayes models for text classification the co-training

57.1   Improving Multiclass Text Classification with the Support Vector.. - Rennie, Rifkin (2001)
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substant... / Improving Multiclass Text Classification with the Support Vector

57.1   Maximum Likelihood Estimation for Filtering Thresholds - Zhang, Callan (2001)
Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific... / learning algorithms for text classification have also been used for br A. Singhal. . Boosting for document routing. In Proceedings of the

57.1   Athena: Mining-based Interactive Management of Text Databases - Agrawal, Bayardo, Srikant (2000)
We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and ... / with other techniques for text classification CDAR LR Lan br for hierarchy reorganization document routing and identification of

57.1   Less is More: Active Learning with Support Vector Machines - Schohn, Cohn (2000)
We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification tasks. We observe a... / particularly those involving text classification Joachims b Dumais et br on several practical document classification tasks. We observe a

57.1   Using Error-Correcting Codes For Text Classification - Ghani (2000)
This paper explores in detail the use of Error Correcting Output Coding (ECOC) for learning text classifiers. We show that the accuracy of a Naive Bayes Classifier over text classification tasks c... / Error-Correcting Codes For Text Classification Rayid Ghani

57.1   Combining Statistical and Relational Methods for Learning in.. - Slattery, Craven (1998)
We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext dom... / which is commonly used for text classification and then we describe an

57.1   Pink Panther: A Complete Environment for Ground-Truthing and.. - Yanikoglu et al. (1998)
We describe a new approach for the automatic evaluation of document page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: segmentation quality is asses... /

55.0   Evaluating and Optimizing Autonomous Text Classification Systems - Lewis (1995)
Text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. In contrast, programs that filter text streams, software that categorizes documen... / and Optimizing Autonomous Text Classification Systems David D. Lewis

54.5   Record-Boundary Discovery in Web Documents - Embley, Jiang, Ng (1999)
Extraction of information from unstructured or semistructured Web documents often requires a recognition and delimitation of records. (By "record" we mean a group of information relevant to some entit... / or to solve similar document classification problems such as to

54.5   On The Use Of Support Vector Machines For Phonetic Classification - Clarkson, Moreno (1999)
Support Vector Machines (SVMs) represent a new approach to pattern classification which has recently attracted a great deal of interest in the machine learning community. Their appeal lies in their st... / from vision problems to text classification. However their application

51.4   Employing EM and Pool-Based Active Learning for Text Classification - McCallum (1998)
This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of... / Active Learning for Text Classification Andrew Kachites McCallum br does not perform well on document classification. In our experience vote

51.4   Joins that Generalize: Text Classification Using WHIRL - Cohen (1998)
WHIRL is an extension of relational databases that can perform "soft joins" based on the similarity of textual identifiers; these soft joins extend the traditional operation of joining tables based on... / Joins that Generalize Text Classification Using WHIRL William W.

46.8   The TREC-5 Filtering Track - Lewis (1997)   (Correct)
The TREC-5 filtering track, an evaluation of binary text classification systems, was a repeat of the filtering evaluation run in a trial version for TREC-4, with only the data set and participants cha... / an evaluation of binary text classification systems was a repeat of br learning for knowledge-based document routing a report on the TREC-

46.8   The TREC-4 Filtering Track - Lewis (1997)   (Correct)
The TREC-4 filtering track was an experiment in the evaluation of binary text classification systems. In contrast to ranking systems, binary text classification systems may need to produce result sets... / in the evaluation of binary text classification systems. In contrast to br learning for knowledge-based document routing a report on the TREC-

45.7   Employing EM in Pool-Based Active Learning for Text Classification - McCallum, Nigam (1998)   (Correct)
This paper shows how a text classifier's need for labeled training data can be reduced by a combination of active learning and Expectation Maximization (EM) on a pool of unlabeled data. Query-by-Commi... / Active Learning for Text Classification Andrew McCallum zy br in pool-based sampling for document classification tends to select

45.7   Role of Verbs in Document Analysis - Klavans, Kan (1998)   (Correct)
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than n... / as a discriminant in document classification. Motivation

45.7   Feature subset selection in text-learning - Mladenic (1998)   (Correct)
This paper describes several known and some new methods for feature subset selection on large text data. Experimental comparison given on real-world data collected from Web users shows that characte... / reported to work well in text classification domains that scores

45.4   Text-Learning and Related Intelligent Agents: A Survey - Mladenic (1999)   (Correct)
This article surveys a part of text learning where supervised learning methods are used for text unknown INTELLIGENT INFORMATION RETRIEVAL Text-Learning and Related Intelligent Agents: A Survey Dunja... / In this approach to text classification the system searches for br worth the effort. Work on document classification that extends the

44.4   On the Naive Bayes Model for Text Categorization - Eyheramendy, Lewis, Madigan (2003)   (Correct)
This paper empirically compares the performance of four probabilistic models for text classification: Poisson, Bernoulli, Multinomial and Negative Binomial. We examine the "naive Bayes" assumption in ... /

42.8   Two Decades Of Statistical Language Modeling: Where Do We Go From.. - Rosenfeld (2000)   (Correct)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was... / machine translation document classification and routing optical

42.8   Restricted Bayes Optimal Classifiers - Tong, Koller (2000)   (Correct)
We introduce the notion of restricted Bayes optimal classifiers. These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated wit... / Naive Bayes classifier in text classification Mitchell We

42.8   Hierarchical Classification of Web Content - Dumais, Chen (2000)   (Correct)
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level cla... / structures. KEYWORDS Text classification text categorization br KEYWORDS Text classification text categorization

42.8   Representation of Electronic Mail Filtering Profiles: A User Study - Pazzani (2000)   (Correct)
Electronic mail offers the promise of rapid communication of essential information. However, electronic mail is also used to send unwanted messages. A variety of approaches can learn a profile of a us... / Some of the earliest text classification methods e.g. Rocchio

42.8   Discriminant-EM Algorithm with Application to Image Retrieval - Wu, Tian, Huang (2000)   (Correct)
In many vision applications, the practice of supervised learning faces several difficulties, one of which is that insufficient labeled training data result in poor generalization. In image retrieval, ... / have some applications in text classification. Although EM offers a

42.8   XML Based Schema Definition for Support of Inter-organizational.. - van der Aalst, Kumar (2000)   (Correct)
The full potential of the web as a medium for electronic commerce can be realized only when multiple partners in a supply chain can route information among themselves in a seamless way. Commerce on th... /

42.8   Tracking Conversational Context for Machine Mediation of Human.. - Jebara, Ivanov, al. (2000)   (Correct)
We describe a system that tracks conversational context using speech recognition and topic modeling. Topics are described by computing the frequency of words for each class. We thus reliably detect... / topic-spotting arise in the text classification community which uses typed

40.5   ORBWork: A Reliable Distributed CORBA-based Workflow Enactment System .. - Das, Kochut, Miller, Sheth, Worah (1996)   (Correct)
Key limitations of the state-of-art workflow products and research prototypes include the lack of adequate support for functioning in heterogeneous environments that involve humans and automated tas... / activities and facilitating document routing imaging and reporting. In

40.0   Bootstrapping an Ontology-based Information Extraction System - Maedche, Neumann, Staab (2002)   (Correct)
Automatic intelligent web exploration will benefit from shallow information extraction techniques if the latter can be brought to work within many different domains. The major bottleneck for this, how... / e-mail routing fine-grained text classification automatic metadata

39.9   First-Order Learning for Web Mining - Craven (1998)   (Correct)
We present compelling evidence that the World Wide Web is a domain in which applications can benefit from using first-order learning methods, since the graph structure inherent in hypertext naturall... / used first-order methods for text classification but the focus was on

37.6   Generality versus Size in Genetic Programming - Rosca (1996)   (Correct)
Genetic Programming (GP) uses variable size representations as programs. Size becomes an important and interesting emergent property of the structures evolved by GP. The size of programs can be both a... / fitting to chaotic data text classification feature detection image

36.3   Lossless Document Image Compression - Inglis (1999)   (Correct)
Document image compression reduces the storage requirements for digitised books or documents by using characters as the fundamental unit of compression. Compression gains can be achieved by identifyin... / and non-text zones. The text classification is extended to include br determining the classification document zones without resorting

36.3   Vector-Based Natural Language Call Routing - Chu-Carroll, Carpenter (1999)   (Correct)
This paper describes a domain independent, automatically trained natural language call router for directing incoming calls in a call center. Our call router directs customer calls based on their respo... / McDonough et al. and document routing Schütze Hull and br Natural Language Call Routing Document Construction Morphological

34.2   Using HTML Formatting to Aid in Natural Language Processing on the.. - DiPasquo (1998)   (Correct)
Because of its magnitude and the fact that it is not computer understandable, the World Wide Web has become a prime candidate for automatic natural language tasks. This thesis argues that there is inf... / work has been reported on document classification. Craven et al.

34.2   An Environment for Morphosyntactic Processing of Unrestricted Spanish .. - Carmona, Cervell, Màrquez, Martí.. (1998)   (Correct)
We present in this paper a fast, broad-coverage, accurate morphological analyzer for Spanish words, MACO+, which is an extended and improved version of that described in (Acebo et al., 1994). The earl... /

34.2   Efficient text categorization - Grobelnik, Mladenic (1998)   (Correct)
We present an approach to text categorization using machine learning techniques. The approach is developed and tested on large text hierarchy named Yahoo that is available on the Web. We handle the la... / in for hierarchical document classification. For a new document the

32.9   Building and Maintaining Analysis-Level Class Hierarchies Using.. - Godin, Mili (1993)   (Correct)
Software reuse is one of the most advertised advantages of object-orientation. Inheritance, in all its forms, plays an important part in achieving greater reuse, at all stages of development. Class hi... / measurements and document classification and browsing

31.8   Learning Probabilistic User Models - Billsus, Pazzani (1996)   (Correct)
We describe two applications that use rated text documents to induce a model of the user's interests. Based on our experiments with these applications we propose the use of a probabilistic learning al... / neighbor approaches on text classification tasks. Using rated text

31.8   Knowledge-Based Approaches to Query Expansion in Information Retrieval - Bodner, Song (1996)   (Correct)
This paper unknown Bodner, R. and Song, F. (1996) Knowledge-based approaches to query expansion in information retrieval. In McCalla, G. (Ed.), Advances in Artificial Intelligence (pp. 146-158). New... / term classification document classification syntactic context and br and Frei term classification document classification syntactic

31.8   Automatically Acquiring Conceptual Patterns Without an Annotated.. - Riloff, Shoen (1995)   (Correct)
Previous work on automated dictionary construction for information extraction has relied on annotated text corpora. However, annotating a corpus is time-consuming and difficult. We propose that conc... / on an untagged text corpus. Text classification experiments in the MUC-

29.7   Exploration of Text Collections with Hierarchical Feature Maps - Merkl (1997)   (Correct)
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to... / maps to perform the text classification task. Each of the br Abstract Document classification is one of the central

28.9   Recent Experiments with INQUERY - Allan (1995)   (Correct)
this paper focuses on relevant differences to the previously published algorithms. 1 Description of Ad-Hoc Experiments unknown Recent Experiments with INQUERY James Allan, Lisa Ballesteros, James P. ... / broader research interests in document routing document filtering br interests in document routing document filtering distributed IR

28.5   Exploiting Structure for Intelligent Web Search - Kruschwitz (2001)   (Correct)
Together with the rapidly growing amount of online data we register an immense need for intelligent search engines that access a restricted amount of data as found in intranets or other limited domain... / Structural Information for Text Classification on the WWW. In Proceedings br to be built in advance. Document classification is closely related to

28.5   Micro-Workflow: A Workflow Architecture Supporting Compositional.. - Manolescu (2001)   (Correct)
This dissertation proposes micro-workflow, a new workflow architecture that bridges the gap between the type of functionality provided by current workflow systems and the type of workflow functionalit... / . From Document Routing to Middleware Services

28.5   Optimizing the parSOM Neural Network Implementation for Data Mining.. - Tomsich, Rauber, Merkl (2001)   (Correct)
The self-organizing map is a prominent unsupervised neural network model which lends itself to the analysis of high-dimensional input data and data mining applications. However, the high execution tim... / application arena is text classification where documents in a

28.5   Toward Optimal Active Learning through Sampling Estimation of Error.. - Roy, McCallum, al. (2001)   (Correct)
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. Thes... / work. . Naive Bayes Text Classification Text classification is br . Naive Bayes Text Classification Text classification is not only

28.5   The Power of Word Clusters for Text Classification - Slonim, Tishby (2001)   (Correct)
The recently introduced Information Bottleneck method [21] provides an information theoretic framework, for extracting features of one variable, that are relevant for the values of another variable. S... /

28.5   A Statistical Learning Model of Text Classification for Support.. - Joachims (2001)   (Correct)
This paper develops a theoretical learning model of text classification for Support Vector Machines (SVMs). It connects the statistical properties of text-classification tasks with the generalization ... / Statistical Learning Model of Text Classification for Support Vector

28.5   Incremental Document Clustering for Web Page Classification - Wai-Chiu Wong And (2001)   (Correct)
Introduction We consider document clustering for Web pages. Traditionally, the document classification task is carried out manually. In order to assign a document to an appropriate class, people woul... / work conducted on automatic text classification. One approach is to learn br pages. Traditionally the document classification task is carried out

28.5   SOM-based Methodology for Building Large Text Archives - Arnulfo Azcarraga And (2001)   (Correct)
Self-Organizing Maps (SOMs) have recently been used to archive over 7 million documents. Not only have SOMs been shown to scale up to very large document collections, these maps also allow for a novel... / various general methods for text classification. Benchmark cases are now br to feature extraction and document classification. These techniques span

28.5   On the Automated Classification of Web Sites - Pierre (2001)   (Correct)
In this paper we discuss several issues related to automated text classification of web sites. We analyze the nature of web content and metadata in relation to requirements for text features. We find ... /

28.5   Weight adjustment schemes for a centroid based classifier - Shankar, Karypis (2000)   (Correct)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intra-nets. Automatic text categorization... / clustering of words for text classification. In SIGIR- . br has been widely used for document classification and has been shown to

28.5   Analyzing the Effectiveness and Applicability of Co-training - Nigam, Ghani (2000)   (Correct)
Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applies to datasets that hav... / labeled and unlabeled data text classification . INTRODUCTION There

28.5   Transforming Paper Documents into XML Format with WISDOM++ - Altamura, Esposito, Malerba (2000)   (Correct)
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of th... / document analysis document classification document br analysis document classification document understanding text

28.5   Assessing the Calibration of Naive Bayes' Posterior Estimates - Bennett (2000)   (Correct)
In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of informati... / reliability posterior text classification Reuters Introduction

28.5   Data mining models as services on the internet - Sarawagi, Nagaralu (2000)   (Correct)
The goal of this article is to raise a debate on the usefulness of providing data mining models as services on the internet. These services can be provided by anyone with adequate data and expertise a... / servers. . . Document classification services An imminently

28.5   The Class Imbalance Problem: Significance and Strategies - Japkowicz (2000)   (Correct)
Although the majority of concept-learning systems previously designed usually assume that their training sets are well-balanced, this assumption is not necessarily correct. Indeed, there exist many dom... / domains. In particular text classification domains could be good test

28.5   Text Classification Using WordNet Hypernyms - Scott, Matwin (1998)   (Correct)
This paper describes experiments in Machine Learning for text classification using a new representation of text based on WordNet hypernyms. Six binary classification tasks of varying difficulty are de... / Text Classification Using WordNet Hypernyms

28.5   ifile: An Application of Machine Learning to E-Mail Filtering - Rennie (1998)   (Correct)
With the proliferation of electronic mail in the modern era, it becomes ever more important to devise methods for the organization, categorization and searching of such mail. Mail filtering is such a ... / filter which makes use of the text classification algorithm naive Bayes. br Results . Document Classification Accuracies Experiment

27.2   A Tutorial on Automated Text Categorisation - Sebastiani (1999)   (Correct)
The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved... / first introduced to the text classification literature in Batch br retrieval techniques and document classification The machine learning

27.2   Automatic Labeling of Self-Organizing Maps for Information Retrieval - Merkl, Rauber (1999)   (Correct)
The self-organizing map is a very popular unsupervised neural network model for the analysis of high dimensional input data as in information retrieval applications. However, the interpretation of the... / approach on an example from text classification using a real-world br experimental results from document classification with self-organizing

27.2   Building Hierarchical Classifiers Using Class Proximity - Wang, Zhou, Liew (1999)   (Correct)
In this paper, we address the need to automatically classify text documents into topic hierarchies like those in ACM Digital Library and Yahoo!. The existing local approach constructs a classifier at ... / The central issue in document classification is separating feature

27.2   Automatic Verb Classification Using Distributions of Grammatical.. - Stevenson, Merlo (1999)   (Correct)
We apply machine learning techniques to classify automatically a set of verbs into lexical semantic classes, based on distributional approximations of diatheses, extracted from a very large annota... / Dorr and document classification Klavans and Kan

27.2   Workflow Support for Electronic Commerce Applications - Kumar, Zhao (1999)   (Correct)
Internet-based electronic commerce is becoming the next frontier of new business opportunities. However, commerce on the Internet is seriously hindered by the lack of a common language for collaborati... / does not provide support for document routing. In this paper we describe br Language provides support for routing documents and managing workflows

27.2   Clustering Transactions Using Large Items - Ke Wang Chu (1999)   (Correct)
In traditional data clustering, similarity of a cluster of objects is measured by pairwise similarity of objects in that cluster. We argue that such measures are not appropriate for transactions that ... / term widely used in text document classification and clustering is

27.2   FACILE: Classifying Texts Integrating Pattern Matching and.. - Ciravegna, Lavelli, Mana, Matiasek.. (1999)   (Correct)
Successfully managing information means being able to find relevant new information and to correctly integrate it with pre-existing knowledge. Much information is nowadays stored as multilingual textu... / More linguistically oriented text classification can be achieved by the use

27.2   Machine Learning in Automated Text Categorisation - Sebastiani (1999)   (Correct)
this paper. Aside from (i) the automatic assignment of documents to a predefined set of categories, which is the main topic of this paper, the term has also been used to mean (ii) the automatic defini... / text categorisation text classification . INTRODUCTION In the br E. and Guthrie J. A. . Document classification by machine theory and

27.2   Machine Learning in Automated Text Categorisation: a Bibliography - Sebastiani (1999)   (Correct)
m, NL. Yu, E. S. and Liddy, E. D. 1999. Feature selection in text categorization using the Baldwin effect. In Proceedings of IJCNN-99, International Joint Conference on Neural Networks (Washington, DC... / H. . Improving short text classification using unlabeled background br A probabilistic approach to document classification. In E. A. Fox P.

26.0   Method Combination For Document Filtering - Hull (1996)   (Correct)
There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of docume... / and optimizing autonomous text classification systems. In Proceedings br is distinguished from document routing because it is assumed that

25.5   A Multilevel Approach to Intelligent Information Filtering: Model.. - Mostafa (1997)   (Correct)
this article, a filtering model is proposed that decomposes the overall task into subsystem functionalities and highlights the need for multiple adaptation techniques to cope with uncertainties. A fil... / by a vector-space model document classification by unsupervised learning

23.1   A Comparison of Classifiers and Document Representations for the.. - Schütze, Hull, Pedersen (1995)   (Correct)
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techni... / as a problem of statistical text classification. Documents are to be br of statistical text classification. Documents are to be assigned to

22.8   Automated Learning and Discovery: State-Of-The-Art and Research.. - Thrun, Faloutsos, Mitchell, Wasserman (1998)   (Correct)
This report summarizes the CONALD meeting, which took place June 11-13, 1998, at Carnegie Mellon University. CONALD brought together an interdisciplinary group of scientists, concerned with decision m... / information retrieval and text classification. In information retrieval br in text streams document routing and classification

22.4   Feature Selection and Feature Extraction for Text Categorization - Lewis (1992)   (Correct)
The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization per... / in probabilistic models for text classification tasks the formula in br topic categories to support document routing and retrieval. Particular

22.2   Document Image Understanding: Geometric and Logical Layout - Haralick (1994)   (Correct)
Introduction Document Image Understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and cdroms. The physical reader... / for Layout Analysis in Document Classification nd ICDAR Tsukuba

21.2   Using Grammatical Inference to Improve Precision in Information.. - Freitag (1997)   (Correct)
The field of information extraction (IE) is concerned with applying natural language processing (NLP) and information retrieval (IR) techniques to the automatic extraction of essential details from te... / shown to work for document classification also work with suitable

20.2   Cluster-Based Text Categorization: A Comparison of Category Search.. - Makoto, Takenobu (1995)   (Correct)
Text categorization can be viewed as a process of category search, in which one or more categories for a test document are searched for by using given training documents with known categories. In this... / clustering for automatic text classification. In Proceedings of the br methods for automatic document classification. Journal of

18.1   Creating Customized Authority Lists - Chang, Cohn, McCallum (1999)   (Correct)
The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about an increased interest in link analysis. While HITS and its older relatives from the Bibliometrics li... / and uses statistical text classification to categorize the papers

18.1   Feature Reduction for Neural Network Based Text Categorization - Savio Lam (1999)   (Correct)
In a text categorization model using an artificial neural network as the text classifier, scalability is poor if the neural network is trained using the raw feature space since textual data has a ver... / neural network model used for text classification. In Section the

18.1   Text Classification by Bootstrapping with Keywords, EM and Shrinkage - McCallum, Nigam (1999)   (Correct)
When applying text classification to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance. This paper presents an alternative appro... /

18.1   Multi-Label Text Classification with a Mixture Model Trained by EM - McCallum (1999)   (Correct)
In many important document classification tasks, documents may each be associated with multiple class labels. This paper describes a Bayesian classification approach in which the multiple classes that c... /

18.1   Information Filtering in Changing Domains - Lanquillon (1999)   (Correct)
The task of information filtering is to classify documents from a stream into either relevant or irrelevant according to a particular user interest with the objective to reduce information load. When ... / seminal work on autonomous text classification system see Lewis

18.1   Feature selection for unbalanced class distribution and Naive Bayes - Mladenic, Grobelnik (1999)   (Correct)
This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied... / cross entropy used in text-classification experiments Koller and br process since for each document classification not all but only

18.1   Exploiting Structural Information for Text Classification on the WWW - Fürnkranz (1999)   (Correct)
In this paper, we report on a set of experiments that explore the utility of making use of the structural information of WWW documents. Our working hypothesis is that it is often easier to classify ... / Structural Information for Text Classification on the WWW Johannes

18.1   Practical Evaluation of IR within Automated Classification Systems - Dolin, Pierre, Butler, Avedon (1999)   (Correct)
This paper describes some of the work we have done to evaluate and compare the use of three IR systems (Verity, LSI, and SMART) as black boxes within an automated classification environment. We use au... / classification schemes and document classification. There are many commonly br Retrieval System for document routing. feel that an analysis of

18.1   Semantic Indexing for a Complete Subject Discipline - Chung, He, Powell, Schatz (1999)   (Correct)
As part of the Illinois Digital Library Initiative (DLI) project we developed "scalable semantics" technologies. These statistical techniques enabled us to index large collections for deeper search th... / the hierarchical subject classification. Documents from INSPEC were used

18.1   PowerBookmarks: A System for Personalizable Web Information.. - Quoc (1999)   (Correct)
We extend the notion of bookmark management by introducing the functionalities of hypermedia databases. PowerBookmarks is a Web information organization, sharing, and management tool, which parses met... / Personalization Classification Document Integration Internet

17.3   Applying an Existing Machine Learning Algorithm to Text Categorization - Moulinier (1996)   (Correct)
The information retrieval community is becoming increasingly interested in machine learning techniques, of which text categorization is an application. This paper describes how we have applied an ex... / components include document routing to topic-specific processing

17.3   Text-Based Information Retrieval Using Exponentiated Gradient Descent - Papka, Callan, Barto (1996)   (Correct)
The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that... / document ranking and document classification. A ranking of documents

17.3   What do Advanced Transaction Models Have to Offer for Workflows ? - Worah (1996)   (Correct)
Workflow management systems are finding wide applicability in small and large organizational settings. In this paper, we briefly review four large-scale applications to gauge their modeling and run-ti... / activities and facilitating document routing imaging and reporting.

17.2   Multilevel Security in the UNIX Tradition - McIlroy, Reeds (1992)   (Correct)
The original UNIX® system was designed to be small and intelligible, achieving power by generality rather than by a profusion of features. In this spirit we have designed and implemented IX, a multile... / system. IX supports document classification with mandatory access
