Information Retrieval: Classification

This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher level categories.

Diagnosis in Real Time for Evolutionary Processes in Using Pattern.. - Mouchaweh (2004)   (Correct)
In this paper, we propose to use the evidence classification method Fuzzy Pattern Matching (FPM) to realize the diagnosis in real time. Then we show how the integration of the incremental learning in ... / mining bioinformatics document classification image analysis remote

On the Naive Bayes Model for Text Categorization - Eyheramendy, Lewis, Madigan (2003)   (Correct)
This paper empirically compares the performance of four probabilistic models for text classification: Poisson, Bernoulli, Multinomial and Negative Binomial. We examine the "naive Bayes" assumption in ... /

Abductive Theorem Proving for Analyzing Student Explanations - Jordan, Makatchev, VanLehn (2003)   (Correct)
The Why2-Atlas tutoring system presents students with qualitative physics questions and encourages them to explain their answers via natural language. Although there are inexpensive techniques for ana... /

Automatic Discrimination Of Text Images - Alessi, Battiato, Gallo, Mancuso.. (2003)   (Correct)
This paper introduces a method for the automatic discrimination of digital images based on their semantic content. The proposed system allows to detect if a digital image contains or not a text. This ... / Since text non-text classification could be applied in

On Machine Learning Methods for Chinese Document Categorization - He, Tan (2003)   (Correct)
This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM) for Chines... /

ACIRD: Intelligent Internet Documents Organization and Retrieval - Lin, Chen, Ho, Huang (2002)   (Correct)
In this paper, we present an intelligent Internet information system ACIRD using machine learning techniques to organize and retrieve Internet Web documents. ACIRD consists of three parts: knowledge a... / A New Probabilistic Model of Text Classification and Retrieval UMass / knowledge extraction and document classification. Based on the learned

A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)   (Correct)
Hypertext poses new research challenges for text classification. Hyperlinks, HTML tags, category labels distributed over linked documents, and meta data extracted from related web sites all provide ... / new research challenges for text classification. Hyperlinks HTML tags

Bootstrapping an Ontology-based Information Extraction System - Maedche, Neumann, Staab (2002)   (Correct)
Automatic intelligent web exploration will benefit from shallow information extraction techniques if the latter can be brought to work within many different domains. The major bottleneck for this, how... / e-mail routing fine-grained text classification automatic metadata

Extracting Query Modifications from Nonlinear SVMs - Flake, Glover, Lawrence, Giles (2002)   (Correct)
When searching the WWW, users often desire results restricted to a particular document category. Ideally, a user would be able to filter results with a text classifier to minimize false positive resul... /

Fast and Accurate Text Classification Via Multiple Linear.. - Chakrabarti, Roy, Soundalgekar (2002)   (Correct)
Support vector machines (SVMs) have shown superb performance for text classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their trai... / Fast and accurate text classification via multiple linear / is impractical in the document classification domain because it is

Using Web Structure for Classifying and Describing Web Pages - Glover, Tsioutsiouliklis, Lawrence.. (2002)   (Correct)
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link ... /

Exploiting Confusion Matrices for Automatic Generation of Topic.. - Godbole (2002)   (Correct)
A common way to evaluate a multi-way classifier is a confusion matrix that plots, for each of the learned concepts, the true class of test instances against the predicted classes. Aggregate accuracy... /

Predicting The Sub-Cellular Location Of Proteins From Text Using.. - Stapley, Kelley, Sternberg (2002)   (Correct)
this paper is to treat the protein as a vector of terms from relevant Medline documents. This approach derives from the vector-based model common in information retrieval 8. The term weights of a vect... /

Text Classification and Multilinguism: Getting at Words via N-grams.. - Ismal Biskri Sylvain (2002)   (Correct)
Genuine numerical multilingual text classification is almost impossible if only words are treated as the privileged unit of information. Although text tokenization (in which words are considered as to... /

Web Page Classification Using Spatial Information - Kovacevic (2002)   (Correct)
Extracting and processing information from web pages is an important task in many areas like constructing search engines, information retrieval, and data mining from the Web. Common approach in the ... /

Towards Structural Logistic Regression: Combining Relational and.. - Popescul, Ungar, Lawrence, Pennock (2002)   (Correct)
Inductive logic programming (ILP) techniques are useful for analyzing data in multi-table relational databases. Learned rules can potentially discover relationships that are not obvious in "flattened... /

Comparing Keyword Extraction Techniques For Websom Text Archives - Arnulfo (2002)   (Correct)
This paper is organized as follows. Section 2 describes the process of deducing the most important keywords. The keyword deduction method is illustrated in section 3 using a WEBSOM-based archive of th... /

Genre Classification and Domain Transfer for Information Filtering - Finn, Kushmerick, Smyth (2002)   (Correct)
The World Wide Web is a vast repository of information, but the sheer volume makes it difficult to identify useful documents. We identify document genre as an important factor in retrieving useful doc... / Koppel and Galit Avneri. Routing documents according to style. In

Partially labeled classification with Markov random walks - Szummer, Jaakkola (2002)   (Correct)
To classify a large number of unlabeled examples we combine a limited number of labeled examples with a Markov random walk representation over the unlabeled examples. The random walk representation ex... / on synthetic examples and on text classification problems. Introduction

Web Genre Visualization - Dimitrova, Finn, Kushmerick, Smyth (2002)   (Correct)
Web users vary widely in terms of their expertise on the topics for which they search, the amount of detail they seek, etc. Unfortunately, today's one-size-fits-all Web search services do not cater to... / We describe how shallow text classification techniques can be used to

Using Structured Self-Organizing Maps in News Integration Websites - Perelomov, Azcarraga, Tan, Chua (2002)   (Correct)
The Bveritas system integrates and organizes news articles from English news websites based in Singapore, Malaysia, Philippines, and Thailand, plus news stories from CNN and Reuters International. The... /

Reversing and Smoothing the Multinomial Naive Bayes Text Classifier - Juan, Ney (2002)   (Correct)
The naive Bayes text classifier has long been a core technique in information retrieval and, more recently, it has emerged as a focus of research itself in machine learning. This paper is concerned ... / smoothing method for text classification and two alternative

Unlabeled Data Can Degrade Classification Performance of Generative.. - Cozman, Cohen (2002)   (Correct)
This paper analyzes the effect of unlabeled training data in generative classifiers. We are interested in classification performance when unlabeled data are added to an existing pool of labeled data. ... / such as web search text classification genetic research and

Text Classification using String Kernels - Lodhi, Saunders, Shawe-Taylor.. (2002)   (Correct)
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subse... / Published Text Classification using String Kernels Huma

Providing Cross-lingual Information Access in Multilingual Text.. - Steinberger (2002)   (Correct)
Introduction to the linguistic tool set currently available in the AIM sector l IPSC Exploratory Research Project Cross-lingual Indexing (keyword assignment) Cross-lingual Indexing and Document Simi... /

Not Too Hot, Not Too Cold: The Bundled-SVM is Just Right! - Shih, Chang, Rennie, Karger (2002)   (Correct)
The Support Vector Machine (SVM) typically outperforms other algorithms on text classification problems, but requires training time roughly quadratic in the number of training documents. In contra... /

A Comparison Of Efficacy And Assumptions Of Bootstrapping Algorithms.. - Ghani, Jones (2002)   (Correct)
Information Extraction systems offer a way of automating the discovery of information from text documents. Research and commercial systems use considerable training data to learn dictionaries and patt... /

Partially Supervised Classification of Text Documents - Liu, Lee, Yu, Li (2002)   (Correct)
We investigate the following problem: Given a set of documents of a particular topic or class P, and a large set M of mixed documents that contains documents from class P and other types of docume... /

SMOTE: Synthetic Minority Over-sampling Technique - Chawla, Bowyer, Hall, Kegelmeyer (2002)   (Correct)
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-wor... /

Bornholm Text analysis - Nielsen (2002)   (Correct)
Informatics and Mathematical Modelling, Technical University of Denmark DK-28... / April Example On Text Classification

Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: .. - Turney (2002)   (Correct)
Copyright 2002 by National Research Council of Canada. Permission is granted to quote short excerpts and to reproduce figures and tables from ... /

Active Learning for Statistical Natural Language Parsing - Tang, Luo, Roukos (2002)   (Correct)
It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming. This paper presents a method to reduce this demand using ac... / et al. text classification McCallum and Nigam

Automated Essay Scoring Using Bayes' Theorem - Rudner, Liang (2002)   (Correct)
Two Bayesian models for text classification from the information science field were extended and applied to student produced essays. Both models were calibrated using 462 essays with two score points.... /

An Empirical Study of Active Learning with Support Vector Machines.. - Sassano (2002)   (Correct)
We explore how active learning with Support Vector Machines works well for a non-trivial task in natural language processing. ... / language tasks including text classification Joachims Dumais et

Language and Gender Author Cohort Analysis of E-mail for Computer.. - de Vel, Corney, Anderson, Mohay (2002)   (Correct)
We describe an investigation of authorship gender and language background cohort attribution mining from e-mail text documents. ... / text document. Work in e-mail text classification has also been undertaken

Applying Co-Training to Reference Resolution - Mueller, Rapp, Strube (2002)   (Correct)
In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. ... /

Dealing with Security within DEEPSIA Project - Milagres, Moreira, Pimentao, Sousa.. (2002)   (Correct)
DEEPSIA (Dynamic on-linE IntErnet Purchasing System based on Intelligent Agents) aims to develop a system to support companies as purchasers in electronic commerce e-procurement processes. To pursue t... /

Using inductive reasoning and reasoning about dynamic domains for.. - Galitsky, Vinogradov (2002)   (Correct)
We report on the novel approach to modeling a dynamic domain with limited knowledge. A domain may include participating agents so that we are uncertain about motivations and decision-making principles ... /

Spidering and Filtering Web Pages for Vertical Search Engines - Chau (2002)   (Correct)
Introduction The size of the Web is growing exponentially. The number of indexable pages on the web has exceeded 2 billion (Lyman & Varian, 2000). It is more difficult for search engines to keep an u... /

A Visual Formatting Purpose Representation Language to Enhance.. - Berkowitz, Mastenbrook (2002)   (Correct)
Electronic document collections containing documents in machine-readable form lend themselves to attempts at automated indexing and classification. In fact, in many cases the size of these collections... /

Measuring corpus homogeneity using a range of measures for.. - Cavagliá (2002)   (Correct)
With the ever more widespread use of corpora in language research, it is becoming increasingly important to be able to describe and compare corpora. The analysis of corpus homogeneity is preliminary t... /

Feature Selection for Web Page Classification - Riboni (2002)   (Correct)
Web page classification is significantly different from traditional text classification because of the presence of some additional information, provided by the HTML structure and by the presence of hy... /

Learning Grammars for Noun Phrase Extraction by Partition Search - Belz (2002)   (Correct)
This paper describes an application of Grammar Learning by Partition Search to noun phrase extraction, an essential task in information extraction and many other NLP applications. Grammar Learning by ... /

Literature mining in molecular biology - de Bruijn, Martin (2002)   (Correct)
Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many studies have resulted in computer programs to extract various molecular biology f... /

Multiclass Learning by Probabilistic Embeddings - Dekel, Singer (2002)   (Correct)
We describe a new algorithmic framework for learning multiclass categorization problems. In this framework a multiclass predictor is composed of a pair of embeddings that map both instances and labels... /

Word Sense Disambiguation in Document Space - Linden, Lagus (2002)   (Correct)
We introduce a method for word sense disambiguation that uses an existing topical document map created with an unsupervised method (WEBSOM [1]) on a very large document collection. Results on the SENS... /

Document Analysis: Table Structure Understanding and Zone Content.. - Wang (2002)   (Correct)
by Yalin Wang Professor Robert M. Haralick Electrical Engineering For the last three decades, the document image analysis researchers have successfully developed many methods for character recog... /

Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (Correct)
One key difficulty with text classification learning algorithms is that they require many hand-labeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithm... / Unlabeled Data to Improve Text Classification Kamal Paul Nigam May / entity. Keywords text classification text categorization unlabeled

Exploiting Structure for Intelligent Web Search - Kruschwitz (2001)   (Correct)
Together with the rapidly growing amount of online data we register an immense need for intelligent search engines that access a restricted amount of data as found in intranets or other limited domain... / Structural Information for Text Classification on the WWW. In Proceedings / to be built in advance. Document classification is closely related to

Improving Category Specific Web Search by Learning Query Modifications - Glover, Flake, Lawrence, Birmingham, .. (2001)   (Correct)
Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query mod... / compared to other methods for text classification A brief

Data-Driven Generation of Decision Trees for Motif-Based Assignment.. - Wang, Wang, Honavar, Dobbs (2001)   (Correct)
This paper describes an approach to data-driven discovery of sequence motif-based models in ... / problem is encountered in document classification Salton where it

Evaluating dependence among output errors in ECOC learning machines - Masulli, Valentini (2001)   (Correct)
One of the main factors affecting the effectiveness of ECOC methods for classification is the dependence among the errors of the computed codeword bits. We present an extended experimental work for ev... /

Automatic Hierarchical E-Mail Classification Using Association Rules - Itskevitch (2001)   (Correct)
The explosive growth of on-line communication, in particular e-mail communication, makes it necessary to organize the information for faster and easier processing and searching. Storing e-mail message... / message. It was shown that text classification methods that deviate from / Definition . Text Classification Text Classification is the

Predictive Self-Organizing Networks for Text Categorization - Tan (2001)   (Correct)
This paper introduces a class of predictive self-organizing neural networks known as Adaptive Resonance Associative Map (ARAM) for classification of free-text documents. Whereas most statistical app... / Map ARAM for text classification based on a popular public

Using Compression For Source Based Classification Of Text - Thaper (2001)   (Correct)
This thesis addresses the problem of source based text classification. In a nutshell, this problem involves classifying documents according to "where they came from" instead of the usual "what they co... / the problem of source based text classification. In a nutshell this / . Compression for Classification Text compression techniques

Micro-Workflow: A Workflow Architecture Supporting Compositional.. - Manolescu (2001)   (Correct)
This dissertation proposes micro-workflow, a new workflow architecture that bridges the gap between the type of functionality provided by current workflow systems and the type of workflow functionalit... / . From Document Routing to Middleware Services

Applying Co-Training methods to Statistical Parsing - Sarkar (2001)   (Correct)
We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures ... / been used successfully in text classification in combination of labeled / Yarowsky document classification Blum and Mitchell

Facilitating the exchange of explicit knowledge through ontology.. - Lacher, Groh (2001)   (Correct)
In this paper, we give an overview of a system (CAIMAN) that can facilitate the exchange of relevant documents between geographically dispersed people in Communities of Interest. The nature of Comm... / learning techniques for text classification a concept in a personal / Wolff W. . Automatic document classification A thorough evaluation of

Hierarchical Classification of Real Life Documents - Ke Wang Senqiang (2001)   (Correct)
and Current class. A class has the form of branch/sub-branch. For example, 451/430 denotes the class corresponding to branch 451 and sub-branch 430. Most documents are associated with one class, and t... / topics. However most document classification techniques assume that

Optimizing the parSOM Neural Network Implementation for Data Mining.. - Tomsich, Rauber, Merkl (2001)   (Correct)
The self-organizing map is a prominent unsupervised neural network model which lends itself to the analysis of high-dimensional input data and data mining applications. However, the high execution tim... / application arena is text classification where documents in a

Intelligent Information Triage - Macskassy, Hirsh, Provost, Ramesh (2001)   (Correct)
In many applications, large volumes of time-sensitive textual information require triage: rapid, approximate prioritization for subsequent action. In this paper, we explore the use of prospective ind... / corpora can be used to train text classification procedures that will

Using Text Classifiers for Numerical Classification - Macskassy, Hirsh, Banerjee, Dayanik (2001)   (Correct)
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or abs... / However the use of a text-classification system on this is a bit

Document Filtering Boosted By Unlabeled Data - Park, Zhang (2001)   (Correct)
This paper describes three learning methods for document filtering that use unlabeled data. The proposed methods are based on a committee of the classifiers which are trained on a small set of labeled... / of unlabeled examples in text classification provides information about

Toward Optimal Active Learning through Sampling Estimation of Error.. - Roy, McCallum, al. (2001)   (Correct)
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. Thes... / work. . Naive Bayes Text Classification Text classification is / . Naive Bayes Text Classification Text classification is not only

Information Triage using Prospective Criteria - Macskassy, Hirsh, Provost.. (2001)   (Correct)
In many applications, large volumes of time-sensitive textual information require triage: rapid, approximate prioritization for subsequent action. In this paper, we explore the use of prospective ... / corpora can be used to train text classification procedures that will

Improving Multi-class Text Classification with Naive Bayes - Rennie (2001)   (Correct)
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seek... / Improving Multi-class Text Classification with Naive Bayes by Jason / an essential part of text classification. Document collections have

Hierarchical Classification of Documents with Error Control - Cheng, Tang, Fu, King (2001)   (Correct)
Classification is a function that matches a new object with one of the predefined classes. Document classification is characterized by the large number of attributes involved in the objects (docum... / of the predefined classes. Document classification is characterized by the / classes. A special kind of classification document classification has

The Decor Toolbox For Workflow-Embedded Organizational Memory Access - Abecker, Bernardi, al. (2001)   (Correct)
We shortly motivate the idea of business-process oriented knowledge management (BPOKM) and sketch the basic approaches to achieve this goal. Then we describe the DECOR (Delivery of context-sensitiv... / tool to an automatic text classification software. Currently we / documentation automated document routing planning support etc. To

Clipping and Analyzing News Using Machine Learning Techniques - Gründel, Naphtali, Wiech, Gluba.. (2001)   (Correct)
Generating press clippings for companies manually requires a considerable amount of resources. We describe a system that monitors online newspapers and discussion boards automatically. The system extr... / search Emotional analysis Text classification News story database

Multivariate Information Bottleneck - Friedman, Mosenzon, Slonim, Tishby (2001)   (Correct)
The Information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or... / been employed for evaluating text classification techniques e.g. / already been applied to document classification gene expression neural

Finding Semantically Related Words in Large Corpora - Smrž, Rychlý (2001)   (Correct)
The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss the measures of semantic relatedness of basic word forms and describe the treatmen... / in machine translation document classification information retrieval

Active Models for Dynamic Networked Organisations - Jørgensen, al. (2001)   (Correct)
This paper points to active models as a general technique for increasing the flexibility of computerised information systems. Active models are available for manipulation by the users at runtime, and ... / Models Workflow document classification and retrieval

Hypertext Categorization using Hyperlink Patterns and Meta Data - Ghani, Slattery, Yang (2001)   (Correct)
Hypertext poses new text classification research challenges as hyperlinks, content of linked documents, and meta data about related web sites all provide richer sources of information for hypertex... / Hypertext poses new text classification research challenges as

Improving Text Classification with LSI Using Background Knowledge - Zelikovitz, Hirsh (2001)   (Correct)
We present work in progress that uses Latent Semantic Indexing (LSI) in conjunction with background knowledge and unlabeled examples to improve text classification accuracy. The singular value dec... /

The Power of Word Clusters for Text Classification - Slonim, Tishby (2001)   (Correct)
The recently introduced Information Bottleneck method [21] provides an information theoretic framework, for extracting features of one variable, that are relevant for the values of another variable. S... /

Kernel expansions with unlabeled examples - Szummer, Jaakkola (2001)   (Correct)
Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for ... / with naive Bayes models for text classification the co-training

Improving Multiclass Text Classification with the Support Vector.. - Rennie, Rifkin (2001)   (Correct)
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substant... / Improving Multiclass Text Classification with the Support Vector

Keyword Spices: A New Method for Building Domain-Specific Web Search.. - Oyama, Kokubo, Ishida, Yamada.. (2001)   (Correct)
This paper presents a new method for building ... / In fact most studies on text classification have been applied to

Learning to Match and Cluster Entity Names - Cohen, Richman (2001)   (Correct)
Introduction Information retrieval is, in large part, the study of methods for assessing the similarity of pairs of documents. Document similarity metrics have been used for many tasks including ad h... / ad hoc document retrieval text classification YC and summarization

Signal Detection Using ICA: Application to Chat Room Topic Spotting - Kolenda, Hansen, Larsen (2001)   (Correct)
Signal detection and pattern recognition for online grouping huge amounts of data and retrospective analysis is becoming increasingly important as knowledge based standards, such as XML and advanced M... / in and used for static text classification in We here extend this

Superimposing Codes Representing Hierarchical Information in Web.. - Cacheda (2001)   (Correct)
In this article we describe how superimposed coding can be used to represent hierarchical information, which is especially useful in categorized information retrieval systems (for example, Web direc... / Hierarchical information Document classification have been used in

A Statistical Learning Model of Text Classification for Support.. - Joachims (2001)   (Correct)
This paper develops a theoretical learning model of text classification for Support Vector Machines (SVMs). It connects the statistical properties of text-classification tasks with the generalization ... / Statistical Learning Model of Text Classification for Support Vector

Maximum Likelihood Estimation for Filtering Thresholds - Yi Zhang Jamie (2001)   (Correct)
Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific... / learning algorithms for text classification have also been used for / A. Singhal. . Boosting for document routing. In Proceedings of the

On the Automated Classification of Web Sites - Pierre (2001)   (Correct)
In this paper we discuss several issues related to automated text classification of web sites. We analyze the nature of web content and metadata in relation to requirements for text features. We find ... / issues related to automated text classification of web sites. We analyze

The Overview of Web Search Engines - Lam (2001)   (Correct)
The World Wide Web allows people to share information globally. The amount of information grows without bound. In order to extract information that we are interested in, we need a tool to search the W... / IR includes modeling document classification and categorization

Combining labeled and unlabeled data for text classification with a.. - Ghani (2001)   (Correct)
A major concern with supervised learning techniques for text classification is that they often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled... / and unlabeled data for text classification with a large number of

Least Squares and Estimation Measures via Error Correcting Output Code - Ghaderi, Windeatt (2001)   (Correct)
It is known that the Error Correcting Output Code (ECOC) technique can improve generalisation for problems involving more than two classes. ECOC uses a strategy based on calculating distance to a c... /

Support Vector Machines for Thai Phoneme Recognition - Thubthong, Kijsirikul (2001)   (Correct)
This paper describes an application of SVMs to two phoneme recognition problems: 5 Thai tones, and 12 Thai vowels spoken in isolation. International Journal of Uncertainty, Fuzziness and Knowl... /

AdaBoost for Query-by-Example in Text - Ultis (2001)   (Correct)
This paper describes an implementation of query-by-example, or relevance feedback, for text. The implementation uses Google's search engine to perform a keyword query as requested by the user. If th... / to text filtering and text classification. Schapire et. al.

Tone Recognition Of Continuous Thai Speech Under Tonal Assimilation.. - Thubthong, Kijsirikul (2001)   (Correct)
This paper presents a method for continuous Thai tone recognition. One of the main problems in tone recognition is that several interacting factors affect F0 realization of tones. In this paper, we ... /

Information Extraction By Text Classification - Kushmerick, Johnston, McGuinness (2001)   (Correct)
Information extraction and text classification are usually seen as complementary forms of shallow text processing, in that they are aimed at very different tasks. In this paper, we describe two simpl... / Information extraction by text classification Nicholas Kushmerick

Measuring the Structural Similarity among XML Documents and DTDs - Bertino, Guerrini, Mesiti, Rivara, C. (2001)   (Correct)
Sources of XML documents are proliferating on the Web and documents are more and more frequently exchanged among sources. At the same time, there is an increasing need of exploiting database tools to ... / metric can be employed for document classification and document clustering. br be employed for document classification and document clustering. In the

Using Information Extraction Rules for Extending Domain Ontologies - Sintek, Junker, van Elst, Abecker (2001)   (Correct)
nt, we lay special emphasis on considerations and methods which are necessary to realize such a scenario in industrial practice. In each industrial environment, besides the questions of smooth introdu... / to the task of pattern-based text classification which can be solved with

Emotional Expression Recognition using Support Vector Machines - Dumas (2001)   (Correct)
The objective of this paper is to apply Support Vector Machines to the problem of classifying emotion on images of human faces. This well-defined problem is complicated by the natural variation in peop... / tasks such as text classification Joachims a

Optimizing Search by Showing Results In Context - Dumais, Cutrell, Chen (2001)   (Correct)
We developed and evaluated seven interfaces for integrating semantic category information with Web search results. List interfaces were based on the familiar ranked-listing of search results, sometime... / In addition automatic text classification techniques are used to

Statistical Classification Methods for Arabic News Articles - Sawaf, Zaplo, Ney (2001)   (Correct)
In this paper, we present experimental results on document clustering and classification achieved on the Arabic NEWSWIRE corpus using statistical methods. Arabic is a highly inflecting language. ... / analysis. Introduction Text classification is a fundamental task in br system. . Text Classification Text classification as

Incremental Document Clustering for Web Page Classification - Wai-Chiu Wong And (2001)   (Correct)
Introduction We consider document clustering for Web pages. Traditionally, the document classification task is carried out manually. In order to assign a document to an appropriate class, people woul... / work conducted on automatic text classification. One approach is to learn br pages. Traditionally the document classification task is carried out

Document Classification as an Internet service: Choosing the best.. - Godbole (2001)   (Correct)
This project investigates some of the issues involved in a new proposal for expanding the scope of the field of Data Mining by providing mining models as services on the Internet. This idea can widely... /

Extracting Meaningful Labels for WEBSOM Text Archives - Arnulfo Azcarraga Pris (2001)   (Correct)
Self-Organizing Maps, being used mainly with data that are not pre-labeled, need automatic procedures for extracting keywords as labels for each of the map units. The WEBSOM methodology for building v... /

Comparing Keyword Extraction Techniques for WEBSOM Text Archives - Arnulfo Azcarraga Pris (2001)   (Correct)
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes for the relative frequencies of all the w... /

Stable Mixing of Complete and Incomplete Information (2001)   (Correct)
An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (o... /

Learning probabilistic Datalog rules for information classification.. - Nottelmann, Fuhr (2001)   (Correct)
Probabilistic Datalog is a combination of classical Datalog (i.e., function-free Horn clause predicate logic) with probability theory. Therefore, probabilistic weights may be attached to both facts an... / dimensions like e.g. in text classification. . INTRODUCTION The

Support Vector Machine Active Learning with Applications to Text.. - Tong, Koller (2001)   (Correct)
Support vector machines have met with significant success in numerous real-world learning tasks. However, like most machine learning algorithms, they are generally applied using a randomly selected tra... / Learning with Applications to Text Classification a b Figure a A

Learning from Labeled and Unlabeled Data using Graph Mincuts - Blum, Chawla (2001)   (Correct)
Many application domains suffer from not having enough labeled training data for learning. However, large amounts of unlabeled examples can often be gathered cheaply. As a result, there has been a... /

SOM-based Methodology for Building Large Text Archives - Arnulfo Azcarraga And (2001)   (Correct)
Self-Organizing Maps (SOMs) have recently been used to archive over 7 million documents. Not only have SOMs been shown to scale up to very large document collections, these maps also allow for a novel... / various general methods for text classification. Benchmark cases are now br to feature extraction and document classification. These techniques span

Business-Process Oriented Delivery of Knowledge through Domain.. - Abecker, Mentzas (2001)   (Correct)
We briefly motivate the idea of IT support for business-process oriented knowledge management (BPOKM) and sketch some basic approaches to achieving this goal. Then we describe the DECOR (Delivery o... / tool to an automatic text classification software. Currently we br documentation automated document routing planning support etc. To

Text Classification in a Hierarchical Mixture Model for Small.. - Kristina Toutanova Francine (2001)   (Correct)

Words with Attitude - Jaap Kamps Maarten (2001)   (Correct)
The traditional notion of word meaning used in natural language processing is literal or lexical meaning as used in dictionaries and lexicons. This relatively objective notion of lexical meaning i... /

Using Error-Correcting Codes for Efficient Text Classification with a .. - Ghani (2001)   (Correct)
We investigate the use of Error-Correcting Output Codes (ECOC) for efficient text classification with a large number of categories and propose several extensions which improve the performance of ECOC.... /

Enhancing Text Classification to Improve Information Filtering - Lanquillon (2001)   (Correct)
Enhancing Text Classification to Improve Information Filtering. Dissertation for the academic degree of Doktoringenieur (Dr.-Ing.), accepted by the Faculty of Inf... /

Multi-Topic E-mail Authorship Attribution Forensics - de Vel, Anderson, Corney, Mohay (2001)   (Correct)
In this paper we describe an investigation of forensic authorship identification or categorisation undertaken on multitopic e-mail documents. We use an extended set of e-mail document features such as... / classifiers. Work in e-mail text classification has also been undertaken

Mining E-mail Content for Author Identification Forensics - de Vel, Anderson, Corney, Mohay (2001)   (Correct)
We describe an investigation into e-mail content mining for author identification, or authorship attribution, for the purpose of forensic investigation. We focus our discussion on the ability to discr... / classifiers. Work in e-mail text classification has also been undertaken by

A Simple Algorithm for Identifying Negated Findings and Diseases in.. - Chapman, Bridewell, Hanbury, Cooper, .. (2001)   (Correct)
Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are eviden... /

Concept Indexing - A Fast Dimensionality Reduction Algorithm with.. - Karypis, Han (2000)   (Correct)
In recent years, we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. This has led to an increased i... / clustering of words for text classification. In SIGIR- . br are primarily used for document classification and for improving the

Centroid-Based Document Classification: Analysis & Experimental.. - Han, Karypis (2000)   (Correct)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization,... / Naive Bayesian algorithm for text classification Rainbow has options br Centroid-Based Document Classification Analysis Experimental

Learning to Construct Knowledge Bases from the World Wide Web - Craven, Freitag, McCallum, Mitchell, .. (2000)   (Correct)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understanda... / . Statistical Text Classification In this section we br of related work. . Document Classification Our work is related to

Weight adjustment schemes for a centroid based classifier - Shankar, Karypis (2000)   (Correct)
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intra-nets. Automatic text categorization... / clustering of words for text classification. In SIGIR- . br has been widely used for document classification and has been shown to

Two Decades Of Statistical Language Modeling: Where Do We Go From.. - Rosenfeld (2000)   (Correct)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was... / machine translation document classification and routing optical

Relevance and Reinforcement in Interactive Browsing - Leuski (2000)   (Correct)
We consider the problem of browsing the top ranked portion of the documents returned by an information retrieval system. We describe an interactive relevance feedback agent that analyzes the inter-doc... / about the individual terms. Text classification and categorization is where

On the Learnability and Design of Output Codes for Multiclass Problems - Crammer, Singer (2000)   (Correct)
Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In... / character recognition text classification phoneme classification

Optimization Approaches to Semi-Supervised Learning - Demiriz, Bennett (2000)   (Correct)
We examine mathematical models for semi-supervised support vector machines (S3VM). Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector ma... / methods on web-based text classification problems for example using

Athena: Mining-based Interactive Management of Text Databases - Agrawal, Bayardo, Srikant (2000)   (Correct)
We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and ... / with other techniques for text classification CDAR LR Lan br for hierarchy reorganization document routing and identification of

The Challenge of Discovering Meta-Data - Morik, Haustein (2000)   (Correct)
Introduction Machine learning research has always been driven by scenarios. The scenarios were put together from ideas of anthropological disciplines (e.g., linguistics, cognitive and social science,... / learning tasks to be solved. Text classification that recognizes a document

Improving text categorization methods for event tracking - Yang, Ault, Pierce, Lattimer (2000)   (Correct)
Automated tracking of events from chronologically ordered document streams is a new challenge for statistical text classification. Existing learning techniques must be adapted or improved in order to ... / challenge for statistical text classification. Existing learning

Incorporating Linguistic Structure into Statistical Language Models - Rosenfeld (2000)   (Correct)
this paper. References unknown Incorporating Linguistic Structure into Statistical Language Models By Ronald Rosenfeld School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ... / such as speech recognition document classification optical character

Restricted Bayes Optimal Classifiers - Tong, Koller (2000)   (Correct)
We introduce the notion of restricted Bayes optimal classifiers. These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated wit... / Naive Bayes classifier in text classification Mitchell We

Boosting for Document Routing - Iyer, Lewis, Schapire, Singer.. (2000)   (Correct)
RankBoost is a recently proposed algorithm for learning ranking functions. It is simple to implement and has strong justifications from computational learning theory. We describe the algorithm and pre... / in a study of boosting for text classification and used the same br Boosting for Document Routing Raj D. Iyer David

An information theoretic approach to finding word groups for text.. - Verbeek (2000)   (Correct)
This thesis concerns finding the `optimal' number of (non-overlapping) word groups for text classification. We present a method to select which words to cluster in word groups and how many such word g... /

A Comparative Study on Chinese Text Categorization Methods - He, Tan, Tan (2000)   (Correct)
This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relati... / performance for Chinese text classification. In our experiments we br has not been used for document classification. To evaluate the three

Language Model Adaptation - Gotoh (2000)   (Correct)
15> attempt to exploit longer distance dependencies. -- infer some notion of `topic' from text. -- compute topic dependent probability. 8th ELSNET summer school 2 Language Model Adaptation 26 Jul... / clustering of words for text classification. In Proceedings of br application to document classification. ffl Brown et al.

Data Mining on Symbolic Knowledge Extracted from the Web - Ghani, Jones, Mladenic, Nigam.. (2000)   (Correct)
Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler ... / information extractors text classification and relational learning.

Text classification and segmentation using minimum cross-entropy - Teahan (2000)   (Correct)
Several methods for classifying and segmenting text are described. These are based on ranking text sequences by their cross-entropy calculated using a fixed order character-based Markov model adapted ... /

Analyzing the Effectiveness and Applicability of Co-training - Nigam, Ghani (2000)   (Correct)
Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applies to datasets that hav... / labeled and unlabeled data text classification . INTRODUCTION There

Automating the Measurement of Linguistic Features to Help Classify.. - Copeck, Barker, Delisle, Szpakowicz (2000)   (Correct)
Text classification plays a central role in software systems which perform automatic information classification and retrieval. Occurrences of linguistic feature values must be counted by any mechanism... / Abstract Text classification plays a central role in br R. J. ALLEN Document Classification Using Multiword Features.

Feature Selection and Dualities in Maximum Entropy Discrimination - Jebara, Jaakkola (2000)   (Correct)
Incorporating feature selection into a classification or regression method often carries a number of advantages. In this paper we formalize feature selection specifically from a discriminative per... / Transductive Inference for Text Classification using Support Vector br ranging from image and document classification to problems in

Combining multiple learning strategies for effective cross validation - Yang, Ault, Pierce (2000)   (Correct)
Parameter tuning through cross-validation becomes very difficult when the validation set contains no or only a few examples of the classes in the evaluation set. We address this open challenge by ... / Buckley Applied to text classification categorization Cohen

Thesis Proposal - Thomas (2000)   (Correct)
AI has long been applied to the problem of predicting financial markets. Recently, developments in both AI and financial economics have opened up the possibility for close collaboration between the ... / Second I plan to adapt text classification and related techniques for

XML Based Schema Definition for Support of Inter-organizational.. - van der Aalst, Kumar (2000)   (Correct)
Commerce on the Internet is still seriously hindered by the lack of a common language for collaborative commercial activities. Although XML (Extendible Markup Language) allows trading partners to exch... / does not provide support for document routing. In this paper we propose

Scalable Association-based Text Classification - Meretakis, Fragoudis, Lu.. (2000)   (Correct)
Naive Bayes (NB) classifier has long been considered a core methodology in text classification mainly due to its simplicity and computational efficiency. There is an increasing need however for methods... / Scalable Association-based Text Classification Dimitris Meretakis

Recognizing End-User Transactions in Performance Management - Hellerstein, Jayram, Rish (2000)   (Correct)
Providing good quality of service (e.g., low response times) in distributed computer systems requires measuring end-user perceptions of performance. Unfortunately, in practice such measures are often ... / to metrics typically used in text classification. The second approach is to br is akin to work done in document classification. The second problem is

Centroid-Based Document Classification: Analysis & Experimental Results - Han (2000)   (Correct)
In this paper we present a simple linear-time centroid-based document classification algorithm, that despite its simplicity and robust performance, has not been extensively studied and analyzed. O... / to be very effective in text classification We were not able to br Centroid-Based Document Classification Analysis Experimental

Clustering by means of Unsupervised Decision Trees or Hierarchical.. - Bellot, El-Bèze (2000)   (Correct)
A classical information retrieval system returns a list of documents to a user query. The answer list is often so long that users cannot explore all the documents retrieved. A classification of the re... / we present and compare two text classification algorithms. The first one br commonly used to evaluate document classification for information retrieval

Hierarchical Classification of Web Content - Dumais, Chen (2000)   (Correct)
This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level cla... / structures. KEYWORDS Text classification text categorization br KEYWORDS Text classification text categorization

Machine Learning for Intelligent Processing of Printed Documents - Esposito, Malerba, Lisi (2000)   (Correct)
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for pape... / Authors Document analysis Document classification Document understanding br Document analysis Document classification Document understanding Figure .

A Multi-class Linear Learning Algorithm Related to Winnow with Proof - Mesterharm (2000)   (Correct)
In this paper, we present Committee, a new multi-class learning algorithm related to the Winnow family of algorithms. Committee is an algorithm for combining the predictions of a set of sub-experts in... / use Winnow algorithms on text classification problems. This multiclass

Less is More: Active Learning with Support Vector Machines - Schohn, Cohn (2000)   (Correct)
We describe a simple active learning heuristic which greatly enhances the generalization behavior of support vector machines (SVMs) on several practical document classification tasks. We observe a... / particularly those involving text classification Joachims b Dumais et br on several practical document classification tasks. We observe a

Improving Short-Text Classification Using Unlabeled Background.. - Zelikovitz, Hirsh (2000)   (Correct)
We describe a method for improving the classification of short text strings using a combination of labeled training data plus a secondary corpus of unlabeled but related longer documents. We show ... / Improving Short-Text Classification Using Unlabeled Background

Language Sensitive Text Classification - Basili, Moschitti, Pazienza (2000)   (Correct)
It is a traditional belief that in order to scale-up to more effective retrieval and access methods modern Information Retrieval has to consider more the text content. The modalities and techniques to... / Language Sensitive Text Classification Roberto Basili

Enhancing Supervised Learning with Unlabeled Data - Goldman, Zhou (2000)   (Correct)
In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively stu... / applications to the area of text classification. For example Riloff and

Theme-based Retrieval of Web News - Maria, Silva (2000)   (Correct)
We introduce an information system for organization and retrieval of news articles from Web publications, incorporating a classification framework based on Support Vector Machines. We present the data... / information retrieval and text classification tools are necessary. The

Classification of document page images based on visual similarity of.. - Shin, Doermann (2000)   (Correct)
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be us... / predefined class. Each document classification engine is based on the

Estimating the Generalization Performance of an SVM Efficiently - Joachims (2000)   (Correct)
This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation intensive resampling, the n... /

Representation of Electronic Mail Filtering Profiles: A User Study - Pazzani (2000)   (Correct)
Electronic mail offers the promise of rapid communication of essential information. However, electronic mail is also used to send unwanted messages. A variety of approaches can learn a profile of a us... / Some of the earliest text classification methods e.g. Rocchio

Transforming Paper Documents into XML Format with WISDOM++ - Altamura, Esposito, Malerba (2000)   (Correct)
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of th... / document analysis document classification document br analysis document classification document understanding text

Learning a Monolingual Language Model from a Multilingual Text.. - Ghani, Jones (2000)   (Correct)
Language models are of importance in speech recognition, document classification, and database selection algorithms. Traditionally language models are learned from corpora specifically acquired for t... / in speech recognition document classification and database selection

Topic-Based Mixture Language Modelling - Gotoh (2000)   (Correct)
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The appr... / clustering of words for text classification. In Proceedings of br recently been applied to document classification by Baker and McCallum

Discriminant-EM Algorithm with Application to Image Retrieval - Wu, Tian, Huang (2000)   (Correct)
In many vision applications, the practice of supervised learning faces several difficulties, one of which is that insufficient labeled training data result in poor generalization. In image retrieval, ... / have some applications in text classification. Although EM offers a

Using Information Extraction and Natural Language Generation to.. - Kosseim, Beauregard, Lapalme (2000)   (Correct)
This paper discusses the use of information extraction and natural language generation in the design of an automated e-mail answering system. We analyse short free-form texts and generate a customis... / A technique based on text classification or case-based reasoning may

Self-Supervised Learning for Visual Tracking and Recognition of Human .. - Wu, Huang (2000)   (Correct)
In Proc. of AAAI'2000 pp.243-248, Austin, Texas, July, 2000 Due to the large variation and richness of visual inputs, statistical learning gets more and more concerned in the practice of visual pro... / and have some applications in text classification. Although EM offers a

An Adaptive and Distributed Framework for Advanced IR - Basili, Pazienza, al. (2000)   (Correct)
It has been often noticed that modern IR ((Gregory, 1991), (Alan, 1991)) should exhibit capabilities that are sensitive to the document content, integrate interactivity, multimodality and multilingual... / processors for content-driven text classification. A full toolkit system was

Web Document Classification based on Hyperlinks and Document Semantics - Kuo, Wong (2000)   (Correct)
Besides the basic content, a web document also contains a set of hyperlinks pointing to other related documents. Hyperlinks in a document provide much information about its relation with other web... / Web Document Classification based on Hyperlinks and

Learning to Create Customized Authority Lists - Chang, Cohn (2000)   (Correct)
The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about an increased interest in link analysis. While HITS and its older relatives from the Bibliometrics... / into plain text. Statistical text classification is used to categorize the

An annotation tool for Web browsers and its applications to.. - Denoue, Vignollet (2000)   (Correct)
With bookmark programs, current Web browsers provide a limited support to personalize the Web. We present a new Web annotation tool which uses the Document Object Model Level 2 and Dynamic HTML to del... / the future works including document classification and summarization and

Document Classification with Unsupervised Artificial Neural Networks - Merkl, Rauber (2000)   (Correct)
Text collections may be regarded as an almost perfect application arena for unsupervised neural networks. This is because many operations computers have to perform on text documents are classificatio... /

Workshop on Intelligent Information Integration (III99) - Fensel, Knoblock, Kushmerick (2000)   (Correct)
cit management of uncertainty, learning and adaptivity, planning, and so forth. Over the past several years, the "information integration" community in Artificial Intelligence has been exploring these... / their use for automatic document classification. The paper van Harmelen

On Behavior Classification in Adversarial Environments - Riley, Veloso (2000)   (Correct)
In order for robotic systems to be successful in domains with other agents possibly interfering with the accomplishing of goals, the agents must be able to adapt to the opponents' behavior. The more q... / other complex domains such as text classification with the bag-of-words

Using Growing Hierarchical Self-Organizing Maps for Document.. - Dittenbach, Merkl, Rauber (2000)   (Correct)
The self-organizing map has shown to be a stable neural network model for high-dimensional data analysis. However, its applicability is limited by the fact that some knowledge about the data is requir... /

Supporting Distributed Cooperative Work in CAGIS - Ramampiaro (2000)   (Correct)
This paper describes how the CAGIS environment can be used to manage work-processes, cooperative processes, and how to share and control information in a distributed, heterogeneous environment. We hav... / Domain Model Construction Document Classification and Browsing br concepts. Document Classification Documents are classified by

Information Extraction by Text Classification: Corpus Mining for.. - Zavrel, Berck, Lavrijssen (2000)   (Correct)
This paper describes a method for building an Information Extraction (IE) system using standard text classification machine learning techniques, and datamining for complex features on a large corpus o... / Information Extraction by Text Classification Corpus Mining for Features

Toward Using Text Summarization for Essay-Based Feedback - Burstein, Marcu (2000)   (Correct)
We empirically study the impact of using automatically generated summaries in the context of electronic essay rating. Our results indicate that 40% and 60% discourse-based essay summaries improve the ... / Consider for a moment the document classification task used by DARPA during

Improving Text Classification by Using Conceptual and Contextual.. - Jensen, Martinez (2000)   (Correct)
The exponential growth of text available on the Internet has created a critical need for accurate, fast, and general purpose text classification algorithms. This paper examines the improvement of broa... / Improving Text Classification by Using Conceptual and

Mining E-mail Authorship - de Vel (2000)   (Correct)
In this paper we report an investigation into the learning of authorship identification or categorisation for the case of e-mail documents. We use various e-mail document features such as structural c... / document. Work in e-mail text classification has also been undertaken

Subspace Information Criterion for Non-Quadratic Regularizers - Model .. - Tsuda, Sugiyama, Müller (2000)   (Correct)
Non-quadratic regularizers, in particular the ℓ1 norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC) that a... / samples. For example in text classification many additional

Fast Supervised Dimensionality Reduction Algorithm with Applications.. - Karypis, Han (2000)   (Correct)
Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meanin... / clustering of words for text classification. In SIGIR- . br are primarily used for document classification and for improving the

Knowledge Representation, Learning, and Reasoning in WebDoc - A Web.. - Bo Tang And (2000)   (Correct)
This paper describes a novel approach to knowledge representation, learning, and reasoning in WebDoc, a system that classifies Web documents according to the Library of Congress classification syste... / in WebDoc -A Web Document Classification System Bo Tang and

Using software agents to support evolution of distributed workflow.. - Wang (2000)   (Correct)
This paper outlines a high-level design of how software agents can be used combined with an existing CAGIS Process Centred Environment to deal with evolution of distributed, fragmented workflow models... /

Assessing the Calibration of Naive Bayes' Posterior Estimates - Bennett (2000)   (Correct)
In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of informati... / reliability posterior text classification Reuters Introduction

Using Information Extraction to Classify Newspapers Advertisements - Peleato, Chappelier, Rajman (2000)   (Correct)
This paper presents a text classification procedure that has been developed in the context of an information extraction project. In the prototype that has been developed for this project, newspaper ad... / This paper presents a text classification procedure that has been

TypTex: Inductive typological text classification by multivariate.. - Folch, Heiden, Habert, Fleury.. (2000)   (Correct)
The increasing use of methods in natural language processing (NLP) which are based on huge corpora require that the lexical, morphosyntactic and syntactic homogeneity of texts be mastered. We have dev... / Inductive typological text classification by multivariate statistical

CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute