An enormous proliferation of databases in almost every area of human endeavor has created a great demand for new, powerful tools for turning data into useful, task-oriented knowledge. In efforts to satisfy this need, researchers have been exploring ideas and methods developed in machine learning, pattern recognition, statistical data analysis, data visualization, neural networks, and related fields. These efforts have led to the emergence of a new research area, frequently called data mining and knowledge discovery. The first part of this chapter is a compendium of ideas on the applicability of symbolic machine learning methods to this area. The second part describes a multistrategy methodology for conceptual data exploration, by which we mean the derivation of high-level concepts and descriptions from data through symbolic reasoning involving both data and background knowledge. The methodology, which has been implemented in the INLEN system, combines machine learning, database, and knowledge-based technologies. To illustrate the system's capabilities, we present results from its application to the problem of discovering economic and demographic patterns in a database containing facts and statistics about the countries of the world. The results demonstrate the methodology's high potential utility for assisting in practical data mining and knowledge discovery tasks.