Abstract. Clustering is an important data mining task that helps find useful patterns for summarizing data. In the KDD context, data mining is often used for description rather than prediction. However, it is difficult to find clustering systems, in either the statistics or the machine learning field, that ease the interpretation task for the user. In this paper we present Isaac, a hierarchical clustering system that combines traditional clustering ideas with a feature selection mechanism and heuristics in order to provide comprehensible results. At the same time, a preprocessing step allows it to deal efficiently with large datasets. Results suggest that these aims are achieved and encourage further research.