This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
ACIRD: Intelligent Internet Documents Organization and Retrieval - Lin, Chen, Ho, Huang (2002)
In this paper, we present an intelligent Internet information system ACIRD using machine learning
techniques to organize and retrieve Internet Web documents. ACIRD consists of three parts:
knowledge a... / A New Probabilistic Model of Text Classification and Retrieval UMass ... knowledge extraction and document classification. Based on the learned
A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related web sites all provide ... / new research challenges for text classification. Hyperlinks HTML tags
Fast and Accurate Text Classification Via Multiple Linear.. - Chakrabarti, Roy, Soundalgekar (2002)
Support vector machines (SVMs) have shown superb performance for text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their trai... / Fast and accurate text classification via multiple linear ... is impractical in the document classification domain because it is
Web Genre Visualization - Dimitrova, Finn, Kushmerick, Smyth (2002)
Web users vary widely in terms of their expertise on the
topics for which they search, the amount of detail they seek,
etc. Unfortunately, today's one-size-fits-all Web search
services do not cater to... / We describe how shallow text classification techniques can be used to
Bornholm Text analysis - Rup Nielsen Informatics (2002)
this document
also viewed"
VECTOR SPACE MODEL
gyrus fmri pet cortex .. unknown Bornholm Text analysis
rup Nielsen
Informatics and Mathematical Modelling
Technical University of Denmark
DK-28... / April Example On Text Classification
Using Unlabeled Data to Improve Text Classification - Nigam (2001)
One key difficulty with text classification learning algorithms is that
they require many hand-labeled examples to learn accurately. This dissertation
demonstrates that supervised learning algorithm... / Unlabeled Data to Improve Text Classification Kamal Paul Nigam May ... entity. Keywords text classification text categorization unlabeled
Exploiting Structure for Intelligent Web Search - Kruschwitz (2001)
Together with the rapidly growing amount of online data we register an immense need for intelligent search engines that access a restricted amount of data as found in intranets or other limited domain... / Structural Information for Text Classification on the WWW. In Proceedings ... to be built in advance. Document classification is closely related to
Automatic Hierarchical E-Mail Classification Using Association Rules - Itskevitch (2001)
The explosive growth of on-line communication, in particular e-mail communication, makes
it necessary to organize the information for faster and easier processing and searching.
Storing e-mail message... / message. It was shown that text classification methods that deviate from ... Definition . Text Classification Text Classification is the
Predictive Self-Organizing Networks for Text Categorization - Tan (2001)
This paper introduces a class of predictive self-organizing neural networks known as Adaptive Resonance Associative Map (ARAM) for classification of free-text documents. Whereas most statistical app... / Map ARAM for text classification based on a popular public
Using Compression For Source Based Classification Of Text - Thaper (2001)
This thesis addresses the problem of source based text classification. In a nutshell, this problem involves classifying documents according to "where they came from" instead of the usual "what they co... / the problem of source based text classification. In a nutshell this ... . Compression for Classification Text compression techniques
Applying Co-Training methods to Statistical Parsing - Sarkar (2001)
We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures ... / been used successfully in text classification in combination of labeled ... Yarowsky document classification Blum and Mitchell
Facilitating the exchange of explicit knowledge through ontology.. - Lacher, Groh (2001)
In this paper, we give an overview of a system
(CAIMAN) that can facilitate the exchange of relevant
documents between geographically dispersed people in
Communities of Interest. The nature of Comm... / learning techniques for text classification a concept in a personal ... Wolff W. . Automatic document classification A thorough evaluation of
Hierarchical Classification of Real Life Documents - Ke Wang Senqiang (2001)
and Current class. A
class has the form of branch/sub-branch. For example, 451/430 denotes the class
corresponding to branch 451 and sub-branch 430. Most documents are associated
with one class, and t... / topics. However most document classification techniques assume that
Intelligent Information Triage - Macskassy, Hirsh, Provost, Ramesh (2001)
In many applications, large volumes of time-sensitive textual information
require triage: rapid, approximate prioritization for
subsequent action. In this paper, we explore the use of prospective
ind... / corpora can be used to train text classification procedures that will
Document Filtering Boosted By Unlabeled Data - Park, Zhang (2001)
This paper describes three learning methods for document
filtering that use unlabeled data. The proposed methods
are based on a committee of the classifiers which are
trained on a small set of labeled... / of unlabeled examples in text classification provides information about
Toward Optimal Active Learning through Sampling Estimation of Error.. - Roy, McCallum, al. (2001)
This paper presents an active learning method that directly
optimizes expected future error. This is in contrast
to many other popular techniques that instead
aim to reduce version space size. Thes... / work. . Naive Bayes Text Classification Text classification is ... . Naive Bayes Text Classification Text classification is not only
Improving Multi-class Text Classification with Naive Bayes - Rennie (2001)
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seek... / Improving Multi-class Text Classification with Naive Bayes by Jason ... an essential part of text classification. Document collections have
Hierarchical Classification of Documents with Error Control - Cheng, Tang, Fu, King (2001)
Classification is a function that matches a new object with
one of the predefined classes. Document classification is characterized by
the large number of attributes involved in the objects (docum... / of the predefined classes. Document classification is characterized by the ... classes. A special kind of classification document classification has
The Decor Toolbox For Workflow-Embedded Organizational Memory Access - Abecker, Bernardi, al. (2001)
We shortly motivate the idea of business-process oriented knowledge management (BPOKM) and sketch
the basic approaches to achieve this goal. Then we describe the DECOR (Delivery of context-sensitiv... / tool to an automatic text classification software. Currently we ... documentation automated document routing planning support etc. To
Multivariate Information Bottleneck - Friedman, Mosenzon, Slonim, Tishby (2001)
The Information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or... / been employed for evaluating text classification techniques e.g. ... already been applied to document classification gene expression neural
Finding Semantically Related Words in Large Corpora - Smrz, Rychl (2001)
The paper deals with the linguistic problem of fully automatic
grouping of semantically related words. We discuss the measures of semantic
relatedness of basic word forms and describe the treatmen... / in machine translation document classification information retrieval
Active Models for Dynamic Networked Organisations - Jørgensen, al. (2001)
This paper points to active models as a general technique
for increasing the flexibility of computerised information
systems. Active models are available for manipulation by
the users at runtime, and ... / Models Workflow document classification and retrieval
Kernel expansions with unlabeled examples - Szummer, Jaakkola (2001)
Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for ... / with naive Bayes models for text classification the co-training
Learning to Match and Cluster Entity Names - Cohen, Richman (2001)
Introduction
Information retrieval is, in large part, the study of methods for assessing the similarity of pairs of
documents. Document similarity metrics have been used for many tasks including ad h... / ad hoc document retrieval text classification YC and summarization
Maximum Likelihood Estimation for Filtering Thresholds - Yi Zhang Jamie (2001)
Information filtering systems based on statistical retrieval models
usually compute a numeric score indicating how well each
document matches each profile. Documents with scores above
profile-specific... / learning algorithms for text classification have also been used for ... A. Singhal. . Boosting for document routing. In Proceedings of the
On the Automated Classification of Web Sites - Pierre (2001)
In this paper we discuss several issues related to automated text
classification of web sites. We analyze the nature of web content
and metadata in relation to requirements for text features. We
find ... / issues related to automated text classification of web sites. We analyze
The Overview of Web Search Engines - Lam (2001)
The World Wide Web allows people to share information globally. The amount
of information grows without bound. In order to extract information that we are
interested in, we need a tool to search the W... / IR includes modeling document classification and categorization
AdaBoost for Query-by-Example in Text - Ultis (2001)
This paper describes an implementation of query-by-example, or relevance
feedback, for text. The implementation uses Google's search engine
to perform a keyword query as requested by the user. If th... / to text filtering and text classification. Schapire et. al.
Optimizing Search by Showing Results In Context - Dumais, Cutrell, Chen (2001)
We developed and evaluated seven interfaces for integrating
semantic category information with Web search results. List
interfaces were based on the familiar ranked-listing of search
results, sometime... / In addition automatic text classification techniques are used to
Statistical Classification Methods for Arabic News Articles - Sawaf, Zaplo, Ney (2001)
In this paper, we present experimental
results on document clustering and
classification achieved on the Arabic
NEWSWIRE corpus using statistical
methods. Arabic is a highly inflecting
language. ... / analysis. Introduction Text classification is a fundamental task in ... system. . Text Classification Text classification as
Incremental Document Clustering for Web Page Classification - Wai-Chiu Wong And (2001)
Introduction
We consider document clustering for Web pages. Traditionally, the document
classification task is carried out manually. In order to assign a document to
an appropriate class, people woul... / work conducted on automatic text classification. One approach is to learn ... pages. Traditionally the document classification task is carried out
SOM-based Methodology for Building Large Text Archives - Arnulfo Azcarraga And (2001)
Self-Organizing Maps (SOMs) have recently been used to archive over 7 million documents. Not
only have SOMs been shown to scale up to very large document collections, these maps also
allow for a novel... / various general methods for text classification. Benchmark cases are now ... to feature extraction and document classification. These techniques span
Business-Process Oriented Delivery of Knowledge through Domain.. - Abecker, Mentzas (2001)
We shortly motivate the idea of possible IT support
business-process oriented knowledge management
(BPOKM) and sketch some basic approaches to
achieve this goal. Then we describe the DECOR
(Delivery o... / tool to an automatic text classification software. Currently we ... documentation automated document routing planning support etc. To
Words with Attitude - Jaap Kamps Maarten (2001)
The traditional notion of word meaning
used in natural language processing is
literal or lexical meaning as used in dictionaries
and lexicons. This relatively
objective notion of lexical meaning i...
Concept Indexing - A Fast Dimensionality Reduction Algorithm with.. - Karypis, Han (2000)
In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased i... / clustering of words for text classification. In SIGIR- . ... are primarily used for document classification and for improving the
Centroid-Based Document Classification: Analysis & Experimental.. - Han, Karypis (2000)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet,
digital libraries, news sources, and company-wide intranets. Automatic text categorization,... / Naive Bayesian algorithm for text classification Rainbow has options ... Centroid-Based Document Classification Analysis Experimental
Weight adjustment schemes for a centroid based classifier - Shankar, Karypis (2000)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital
libraries, news sources, and company-wide intra-nets. Automatic text categorization... / clustering of words for text classification. In SIGIR- . ... has been widely used for document classification and has been shown to
Relevance and Reinforcement in Interactive Browsing - Leuski (2000)
We consider the problem of browsing the top ranked portion
of the documents returned by an information retrieval
system. We describe an interactive relevance feedback agent
that analyzes the inter-doc... / about the individual terms. Text classification and categorization is where
Optimization Approaches to Semi-Supervised Learning - Demiriz, Bennett (2000)
We examine mathematical models for semi-supervised support vector machines
(S3VM). Given a training set of labeled data and a working set of unlabeled data,
S3VM constructs a support vector ma... / methods on web-based text classification problems for example using
Athena: Mining-based Interactive Management of Text Databases - Agrawal, Bayardo, Srikant (2000)
We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and ... / with other techniques for text classification CDAR LR Lan ... for hierarchy reorganization document routing and identification of
The Challenge of Discovering Meta-Data - Morik, Haustein (2000)
Introduction
Machine learning research has always been driven by
scenarios. The scenarios were put together from ideas
of anthropological disciplines (e.g., linguistics, cognitive
and social science,... / learning tasks to be solved. Text classification that recognizes a document
Restricted Bayes Optimal Classifiers - Tong, Koller (2000)
We introduce the notion of restricted Bayes optimal classifiers. These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated wit... / Naive Bayes classifier in text classification Mitchell We
Boosting for Document Routing - Iyer, Lewis, Schapire, Singer.. (2000)
RankBoost is a recently proposed algorithm for learning ranking
functions. It is simple to implement and has strong justifications
from computational learning theory. We describe the algorithm
and pre... / in a study of boosting for text classification and used the same ... Boosting for Document Routing Raj D. Iyer David
A Comparative Study on Chinese Text Categorization Methods - He, Tan, Tan (2000)
This paper reports our comparative evaluation of three machine learning methods
on Chinese text categorization. Whereas a wide range of methods have been applied
to English text categorization, relati... / performance for Chinese text classification. In our experiments we ... has not been used for document classification. To evaluate the three
Language Model Adaptation - Gotoh (2000)
attempt to exploit longer distance dependencies.
-- infer some notion of `topic' from text.
-- compute topic dependent probability.
8th ELSNET summer school 2
Language Model Adaptation 26 Jul... / clustering of words for text classification. In Proceedings of ... application to document classification. Brown et al.
Analyzing the Effectiveness and Applicability of Co-training - Nigam, Ghani (2000)
Recently there has been significant interest in supervised
learning algorithms that combine labeled and unlabeled data
for text learning tasks. The co-training setting [1] applies to
datasets that hav... / labeled and unlabeled data text classification . INTRODUCTION There
Feature Selection and Dualities in Maximum Entropy Discrimination - Jebara, Jaakkola (2000)
Incorporating feature selection into a classification
or regression method often carries
a number of advantages. In this paper we
formalize feature selection specifically from a
discriminative per... / Transductive Inference for Text Classification using Support Vector ... ranging from image and document classification to problems in
Thesis Proposal - Thomas (2000)
AI has long been applied to the problem of predicting financial markets.
Recently, developments in both AI and financial economics have
opened up the possibility for close collaboration between the ... / Second I plan to adapt text classification and related techniques for
Recognizing End-User Transactions in Performance Management - Hellerstein, Jayram, Rish (2000)
Providing good quality of service (e.g., low response
times) in distributed computer systems requires measuring
end-user perceptions of performance. Unfortunately,
in practice such measures are often ... / to metrics typically used in text classification. The second approach is to ... is akin to work done in document classification. The second problem is
Centroid-Based Document Classification: Analysis Experimental Results - Han (2000)
In this paper we present a simple linear-time centroid-based document
classification algorithm, that despite its simplicity and robust performance,
has not been extensively studied and analyzed. O... / to be very effective in text classification We were not able to ... Centroid-Based Document Classification Analysis Experimental
Clustering by means of Unsupervised Decision Trees or Hierarchical.. - Bellot, El-Bèze (2000)
A classical information retrieval system returns a list of documents to a user query. The answer list is often so
long that users cannot explore all the documents retrieved. A classification of the re... / we present and compare two text classification algorithms. The first one ... commonly used to evaluate document classification for information retrieval
Hierarchical Classification of Web Content - Dumais, Chen (2000)
This paper explores the use of hierarchical structure for
classifying a large, heterogeneous collection of web
content. The hierarchical structure is initially used to train
different second-level cla... / structures. KEYWORDS Text classification text categorization
Machine Learning for Intelligent Processing of Printed Documents - Esposito, Malerba, Lisi (2000)
A paper document processing system is an information system component which
transforms information on printed or handwritten documents into a computer-revisable form. In
intelligent systems for pape... / Authors Document analysis Document classification Document understanding ... Document analysis Document classification Document understanding Figure .
Less is More: Active Learning with Support Vector Machines - Schohn, Cohn (2000)
We describe a simple active learning heuristic
which greatly enhances the generalization behavior
of support vector machines (SVMs) on several
practical document classification tasks. We
observe a... / particularly those involving text classification Joachims b Dumais et ... on several practical document classification tasks. We observe a
Language Sensitive Text Classification - Basili, Moschitti, Pazienza (2000)
It is a traditional belief that in order to scale-up to more effective retrieval and access methods modern Information
Retrieval has to consider more the text content. The modalities and techniques to... / Language Sensitive Text Classification Roberto Basili
Enhancing Supervised Learning with Unlabeled Data - Goldman, Zhou (2000)
In many practical learning scenarios, there is
a small amount of labeled data along with
a large pool of unlabeled data. Many supervised
learning algorithms have been developed
and extensively stu... / applications to the area of text classification. For example Riloff and
Theme-based Retrieval of Web News - Maria, Silva (2000)
We introduce an information system for organization and retrieval
of news articles from Web publications, incorporating a
classification framework based on Support Vector Machines. We
present the data... / information retrieval and text classification tools are necessary. The
Transforming Paper Documents into XML Format with WISDOM++ - Altamura, Esposito, Malerba (2000)
The transformation of scanned paper documents to a form suitable for an
Internet browser is a complex process that requires solutions to several problems. The
application of an OCR to some parts of th... / document analysis document classification document ... analysis document classification document understanding text
Topic-Based Mixture Language Modelling - Gotoh (2000)
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The appr... / clustering of words for text classification. In Proceedings of ... recently been applied to document classification by Baker and McCallum
An Adaptive and Distributed Framework for Advanced IR - Basili, Pazienza, al. (2000)
It has been often noticed that modern IR ((Gregory, 1991), (Alan, 1991)) should exhibit capabilities that are sensitive
to the document content, integrate interactivity, multimodality and multilingual... / processors for content-driven text classification. A full toolkit system was
Learning to Create Customized Authority Lists - Chang, Cohn (2000)
The proliferation of hypertext and the popularity
of Kleinberg's HITS algorithm have
brought about an increased interest in link
analysis. While HITS and its older relatives
from the Bibliometrics... / into plain text. Statistical text classification is used to categorize the
On Behavior Classification in Adversarial Environments - Riley, Veloso (2000)
In order for robotic systems to be successful in domains with other agents possibly interfering with the accomplishing of goals, the agents must be able to adapt to the opponents' behavior. The more q... / other complex domains such as text classification with the bag-of-words
Supporting Distributed Cooperative Work in CAGIS - Ramampiaro (2000)
This paper describes how the CAGIS environment can be used to manage work-processes, cooperative processes, and how to share and control information in a distributed, heterogeneous environment. We hav... / Domain Model Construction Document Classification and Browsing ... concepts. Document Classification Documents are classified by
Toward Using Text Summarization for Essay-Based Feedback - Burstein, Marcu (2000)
We empirically study the impact of using automatically generated summaries in the context of
electronic essay rating. Our results indicate that 40% and 60% discourse-based essay
summaries improve the ... / Consider for a moment the document classification task used by DARPA during
Mining E-mail Authorship - de Vel (2000)
In this paper we report an investigation into the learning
of authorship identification or categorisation for the case of
e-mail documents. We use various e-mail document features
such as structural c... / document. Work in e-mail text classification has also been undertaken
Fast Supervised Dimensionality Reduction Algorithm with Applications.. - Karypis, Han (2000)
Retrieval techniques based on dimensionality reduction, such
as Latent Semantic Indexing (LSI), have been shown to improve
the quality of the information being retrieved by capturing
the latent meanin... / clustering of words for text classification. In SIGIR- . ... are primarily used for document classification and for improving the
Assessing the Calibration of Naive Bayes' Posterior Estimates - Bennett (2000)
In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of informati... / reliability posterior text classification Reuters Introduction