Knowledge acquisition via incremental conceptual clustering (1987)

by D H Fisher
Venue: Machine Learning

Results 1 - 10 of 765

Fast Algorithms for Mining Association Rules

by Rakesh Agrawal, Ramakrishnan Srikant, 1994
"... We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known a ..."
Abstract - Cited by 3612 (15 self) - Add to MetaCart
We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
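
The level-wise idea behind Apriori is easy to sketch even though the paper's own algorithms (Apriori, AprioriTid, and the AprioriHybrid combination) are not reproduced here. Below is a minimal, illustrative Python version of the frequent-itemset loop, assuming transactions are given as sets of item identifiers and support is counted in absolute terms; it is a sketch of the general technique, not the authors' implementation.

    from itertools import combinations

    def apriori(transactions, min_support):
        """Level-wise frequent-itemset mining in the spirit of Apriori.

        transactions : list of sets of items
        min_support  : minimum absolute support count
        Returns a dict mapping frozenset(itemset) -> support count.
        """
        # Pass 1: frequent single items.
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result = dict(frequent)

        k = 2
        while frequent:
            # Candidate generation: join frequent (k-1)-itemsets whose union has size k.
            prev = list(frequent)
            candidates = {a | b for a in prev for b in prev if len(a | b) == k}
            # Prune any candidate with an infrequent (k-1)-subset.
            candidates = {c for c in candidates
                          if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
            # Count supports in one pass over the data.
            counts = {c: 0 for c in candidates}
            for t in transactions:
                for c in counts:
                    if c <= t:
                        counts[c] += 1
            frequent = {c: n for c, n in counts.items() if n >= min_support}
            result.update(frequent)
            k += 1
        return result

For example, with transactions [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}] and min_support=2, the result contains the three single items plus {bread, milk} and {bread, beer}, while {milk, beer} is dropped at support 1. Association rules are then generated from the frequent itemsets.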

Data clustering: A review

by A. K. Jain, et al.
"... ..."
Abstract - Cited by 1940 (14 self) - Add to MetaCart
Abstract not found

Hierarchically Classifying Documents Using Very Few Words

by Daphne Koller, Mehran Sahami, 1997
"... The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which ignore the hierarchical structure and treat the topics as separate classes are often inadequate in text ..."
Abstract - Cited by 521 (8 self) - Add to MetaCart
The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which ignore the hierarchical structure and treat the topics as separate classes are often inadequate in text classification where there is a large number of classes and a huge number of relevant features needed to distinguish between them. We propose an approach that utilizes the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. As we show, each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each classifier only examines a small subset. The use of reduced feature sets allows us to util...
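
The decomposition described above is straightforward to express in code: each internal node of the topic hierarchy gets its own classifier that routes a document to one child, looking only at that node's small feature subset. The sketch below is illustrative only; TopicNode, its fields, and the use of an estimator with a scikit-learn-style predict method are assumptions, not the authors' implementation.

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class TopicNode:
        name: str
        children: Dict[str, "TopicNode"] = field(default_factory=dict)
        feature_subset: List[int] = field(default_factory=list)  # feature indices selected for this node
        router: Optional[object] = None  # any model with .predict(); returns a child name

    def classify(doc_vector, node: TopicNode) -> str:
        """Walk the hierarchy from the root, letting each node pick one child."""
        while node.children:
            # Each decision consults only the node's own small feature subset.
            x = [doc_vector[i] for i in node.feature_subset]
            child_name = node.router.predict([x])[0]
            node = node.children[child_name]
        return node.name  # leaf topic label

Training mirrors this structure: at each node a feature-selection step picks the few relevant features, and a classifier is fitted only on the documents belonging to that subtree.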

Citation Context

...t work in classification has ignored the problem of supervised learning in the presence of hierarchically structured classes. (There has been some work on unsupervised hierarchical clustering, e.g., (Fisher 1987).) Of course, standard classification techniques can be applied to this problem almost directly. We simply construct a "flattened" class space, with one class for every leaf in the hierarchy. We use ...

Survey of clustering algorithms

by Rui Xu, Donald Wunsch II - IEEE TRANSACTIONS ON NEURAL NETWORKS, 2005
"... Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the ..."
Abstract - Cited by 499 (4 self) - Add to MetaCart
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

Constrained K-means Clustering with Background Knowledge

by Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schroedl - In ICML, 2001
"... Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular k-means clustering algorithm can be pro tably modi- ed ..."
Abstract - Cited by 488 (9 self) - Add to MetaCart
Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this paper, we demonstrate how the popular k-means clustering algorithm can be profitably modified to make use of this information. In experiments with artificial constraints on six data sets, we observe improvements in clustering accuracy. We also apply this method to the real-world problem of automatically detecting road lanes from GPS data and observe dramatic increases in performance.
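
The modification the abstract describes changes only the assignment step of k-means: a point may not join a cluster if doing so would violate a must-link or cannot-link constraint. A small illustrative Python sketch of that assignment pass follows; the function names, data layout, and failure behaviour are assumptions made for this illustration, not the authors' code.

    import math

    def violates(i, cluster_id, assignments, must_link, cannot_link):
        """True if assigning point i to cluster_id breaks a constraint, given
        the assignments made so far (None means a point is not yet assigned)."""
        for a, b in must_link:
            other = b if a == i else a if b == i else None
            if other is not None and assignments[other] is not None \
                    and assignments[other] != cluster_id:
                return True
        for a, b in cannot_link:
            other = b if a == i else a if b == i else None
            if other is not None and assignments[other] == cluster_id:
                return True
        return False

    def assign_with_constraints(points, centers, must_link, cannot_link):
        """One constrained assignment pass: each point takes the nearest center
        that keeps all constraints satisfied; returns None if some point has
        no admissible cluster."""
        assignments = [None] * len(points)
        for i, p in enumerate(points):
            for c in sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c])):
                if not violates(i, c, assignments, must_link, cannot_link):
                    assignments[i] = c
                    break
            else:
                return None
        return assignments

As in plain k-means, the centers are then re-estimated as the means of their clusters and the pass repeats until the assignments stop changing.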

Knowledge Discovery in Databases: an Overview

by William J. Frawley, Gregory Piatetsky-Shapiro, Christopher J. Matheus, 1992
"... this article. 0738-4602/92/$4.00 1992 AAAI 58 AI MAGAZINE for the 1990s (Silberschatz, Stonebraker, and Ullman 1990) ..."
Abstract - Cited by 473 (3 self) - Add to MetaCart
this article. 0738-4602/92/$4.00 1992 AAAI 58 AI MAGAZINE for the 1990s (Silberschatz, Stonebraker, and Ullman 1990)

An analysis of Bayesian classifiers

by Pat Langley, Wayne Iba, Kevin Thompson - IN PROCEEDINGS OF THE TENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, 1992
"... In this paper we present anaverage-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noise-free Boolean attributes. We calculate the probability that t ..."
Abstract - Cited by 440 (17 self) - Add to MetaCart
In this paper we present an average-case analysis of the Bayesian classifier, a simple induction algorithm that fares remarkably well on many learning tasks. Our analysis assumes a monotone conjunctive target concept, and independent, noise-free Boolean attributes. We calculate the probability that the algorithm will induce an arbitrary pair of concept descriptions and then use this to compute the probability of correct classification over the instance space. The analysis takes into account the number of training instances, the number of attributes, the distribution of these attributes, and the level of class noise. We also explore the behavioral implications of the analysis by presenting
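
The "Bayesian classifier" analyzed here is the simple independence-based model now commonly called naive Bayes. For readers who have not seen it, a compact Python sketch for Boolean attributes follows; the Laplace smoothing and the data layout are choices made for this illustration and are not part of the paper's average-case analysis.

    from collections import defaultdict

    def train_naive_bayes(examples):
        """examples: list of (attrs, label) pairs, attrs a tuple of 0/1 values.
        Returns class priors and Laplace-smoothed per-class attribute probabilities."""
        class_counts = defaultdict(int)
        ones = defaultdict(lambda: defaultdict(int))  # label -> attribute index -> count of 1s
        for attrs, label in examples:
            class_counts[label] += 1
            for i, v in enumerate(attrs):
                ones[label][i] += v
        n_attrs = len(examples[0][0])
        priors = {c: class_counts[c] / len(examples) for c in class_counts}
        likelihoods = {c: {i: (ones[c][i] + 1) / (class_counts[c] + 2) for i in range(n_attrs)}
                       for c in class_counts}
        return priors, likelihoods

    def predict(attrs, priors, likelihoods):
        """Choose the class with the largest posterior under attribute independence."""
        def posterior(c):
            p = priors[c]
            for i, v in enumerate(attrs):
                p_i = likelihoods[c][i]
                p *= p_i if v else 1 - p_i
            return p
        return max(priors, key=posterior)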

Survey of clustering data mining techniques

by Pavel Berkhin, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract - Cited by 408 (0 self) - Add to MetaCart
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique

Citation Context

... K-Means Methods). The merger decision is based on minimization of its effect on the objective function. The popular hierarchical clustering algorithm for categorical data COBWEB, developed by Fisher [Fis87], has two very important qualities. First, it is an example of incremental learning. Rather than following divisive or agglomerative approaches, it dynamically builds a dendrogram by processing one in...
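
COBWEB's operators (placing a new instance in an existing class, creating a new class, merging two classes, or splitting one) are scored by the category utility of the resulting partition. A minimal Python sketch of the category utility computation for nominal attributes is given below; the data layout (instances as attribute-to-value dicts) is an assumption, and the incremental operator search itself is omitted.

    from collections import Counter

    def category_utility(partition):
        """Category utility of a partition: a list of clusters, each a list of
        instances, where an instance is a dict mapping attribute -> nominal value.

        CU = (1/K) * sum_k P(C_k) * ( sum_{i,j} P(A_i = v_ij | C_k)^2
                                      - sum_{i,j} P(A_i = v_ij)^2 )
        """
        instances = [x for cluster in partition for x in cluster]
        n = len(instances)

        def expected_correct_guesses(group):
            # sum over attributes and values of P(A_i = v)^2 within `group`
            total = 0.0
            attrs = set().union(*(x.keys() for x in group))
            for a in attrs:
                counts = Counter(x[a] for x in group if a in x)
                m = sum(counts.values())
                total += sum((c / m) ** 2 for c in counts.values())
            return total

        baseline = expected_correct_guesses(instances)
        score = sum(len(cluster) / n * (expected_correct_guesses(cluster) - baseline)
                    for cluster in partition)
        return score / len(partition)

When a new instance arrives, COBWEB tentatively applies each operator at the current node, keeps whichever resulting partition maximizes this score, and recurses downward, which is what makes the dendrogram construction incremental.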

The adaptive nature of human categorization

by John R. Anderson - Psychological Review, 1991
"... A rational model of human categorization behavior is presented that assumes that categorization reflects the derivation of optimal estimates of the probability of unseen features of objects. A Bayesian analysis is performed of what optimal estimations would be if categories formed a disjoint partiti ..."
Abstract - Cited by 344 (2 self) - Add to MetaCart
A rational model of human categorization behavior is presented that assumes that categorization reflects the derivation of optimal estimates of the probability of unseen features of objects. A Bayesian analysis is performed of what optimal estimations would be if categories formed a disjoint partitioning of the object space and if features were independently displayed within a category. This Bayesian analysis is placed within an incremental categorization algorithm. The resulting rational model accounts for effects of central tendency of categories, effects of specific instances, learning of linearly nonseparable categories, effects of category labels, extraction of basic level categories, base-rate effects, probability matching in categorization, and trial-by-trial learning functions. Although the rational model considers just 1 level of categorization, it is shown how predictions can be enhanced by considering higher and lower levels. Considering prediction at the lower, individual level allows integration of this rational analysis of categorization with the earlier rational analysis of memory (Anderson & Milson, 1989). Anderson (1990) presented a rational analysis of human cognition. The term rational derives from similar "rational-man" analyses in economics. Rational analyses in other fields are sometimes called adaptationist analyses. Basically, they are efforts to explain the behavior in some domain on the assumption that the behavior is optimized with respect to some criteria of adaptive importance. This article begins with a general characterization of how one develops a rational theory of a particular cognitive phenomenon. Then I present the basic theory of categorization developed in Anderson (1990) and review the applications from that book. Since the writing of the book, the theory has been greatly extended and applied to many new phenomena. Most of this article describes these new developments and applications. A Rational Analysis Several theorists have promoted the idea that psychologists might understand human behavior by assuming it is adapted to the environment (e.g., Brunswik, 1956; Campbell, 1974; Gib-
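
Under the assumptions the abstract states (a disjoint partitioning into categories, with features displayed independently within a category), the model's central prediction is P(unseen feature j | observed features F) = sum over categories k of P(k | F) * P(j | k). A small illustrative Python sketch of that mixture prediction follows; the data layout is an assumption, and Anderson's prior over partitions is not reproduced here.

    def predict_unseen_feature(observed, feature, categories):
        """categories: list of dicts with a 'prior' weight and a 'p' table
        (feature name -> probability that the feature is present in that category).
        observed: dict of feature name -> bool for features already seen.
        Returns P(feature present | observed), mixing over category posteriors."""
        weights = []
        for cat in categories:
            w = cat["prior"]
            for f, present in observed.items():
                w *= cat["p"][f] if present else 1 - cat["p"][f]
            weights.append(w)
        z = sum(weights)
        return sum(w / z * cat["p"][feature] for w, cat in zip(weights, categories))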

Citation Context

...h object comes in, and one needs to do so with a substantially bounded amount of computation. There is a type of iterative algorithm that has appeared in the artificial intelligence literature (e.g., Fisher, 1987; Lebowitz, 1987) that satisfies these constraints. I have adapted this algorithm to fit the framework I have set forth. Although I have no formal proof, I strongly suspect that this is the optimal alg...

A Comparison of Two Learning Algorithms for Text Categorization

by David D. Lewis, Marc Ringuette - In Third Annual Symposium on Document Analysis and Information Retrieval, 1994
"... This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has m ..."
Abstract - Cited by 336 (1 self) - Add to MetaCart
This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it difficult to draw conclusions about the performance of particular methods. In this paper we present empirical results on the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets. We find that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives. The stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization. However, even this algorithm is aided by an initial prefiltering of features, confirming the results...

Citation Context

...features corresponding to time of publication, day, date of the week, month, and so on. More generally, we plan to investigate incremental learning algorithms that are designed to track concept drift [Fis87] and to see how the idea of cyclical changes in concept definition might be used. Raw performance is not the only characteristic of interest in a learning algorithm for text categorization. For applic...
