Citations
6600 | C4.5: Programs for machine learning
- Quinlan
- 1993
Citation Context ...ds and lower throughputs. Memory constraints are equally important. For the same problem, a single ID3 decision tree ([27]) may require more than 650KBytes of main memory, while a C4.5 decision tree ([28]) may need 75KBytes. Retaining a large number of base classifiers and meta-classifiers may not be practical or feasible. Meta-classifiers are defined recursively as collections of classifiers struc...
5960 |
Classification and regression trees
- Breiman
- 1998
Citation Context ...el tailored to the credit card fraud detection problem. 4 Experiments and Results Learning algorithms Five inductive learning algorithms are used in our experiments. ID3, its successor C4.5, and Cart [3] are decision tree based algorithms, Bayes, described in [11], is a naive Bayesian classifier and Ripper [9] is a rule induction algorithm. Learning tasks Two data sets of real credit card transaction...
4371 | Induction of decision trees
- Quinlan
- 1986
Citation Context ...es are required to classify new instances and that translates to increased overheads and lower throughputs. Memory constraints are equally important. For the same problem, a single ID3 decision tree ([27]) may require more than 650KBytes of main memory, while a C4.5 decision tree ([28]) may need 75KBytes. Retaining a large number of base classifiers and meta-classifiers may not be practical or feas...
2211 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context ...he most related work, Margineantu and Dietterich [18] studied the problem of pruning the ensemble of classifiers (i.e. the set of hypotheses (classifiers)) obtained by the boosting algorithm ADABOOST [13]. According to their findings, by examining the diversity and accuracy of the available classifiers, it is possible for a subset of classifiers to achieve similar levels of performance as the entire s...
1273 | Fast effective rule induction
- Cohen
- 1995
Citation Context ...nductive learning algorithms are used in our experiments. ID3, its successor C4.5, and Cart [3] are decision tree based algorithms, Bayes, described in [11], is a naive Bayesian classifier and Ripper [9] is a rule induction algorithm. Learning tasks Two data sets of real credit card transactions were used in our experiments provided by the Chase and First Union Banks, members of the FSTC (Financial S...
773 | A theory and methodology of inductive learning
- Michalski
- 1983
Citation Context ...gence operations utilize similar methodologies on vast information sources to predict a wide range of conditions in various contexts. Machine learning or Inductive learning (or learning from examples [20]) aims to identify regularities in a given set of training examples with little or no knowledge about the domain from which the examples are drawn. Given a set of training examples, i.e. {(x1, y1), ...
731 | Stacked generalization.
- Wolpert
- 1992
Citation Context ...l. We call the problem of learning useful new information from large and inherently distributed databases, the scaling problem for machine learning. Meta-learning [7], a technique similar to stacking [32], was developed recently to deal with the scaling problem. The basic idea is to execute a number of machine learning processes on a number of data subsets in parallel, and then to combine their coll...
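The combining step this context describes can be illustrated with a short sketch. This is only a minimal illustration of the stacking/meta-learning idea, assuming scikit-learn-style estimators; the train_meta and predict_meta helpers are our own names, not part of the cited paper or of JAM:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

def train_meta(base_learners, meta_learner, X_train, y_train, X_val, y_val):
    """Train each base classifier on its own data subset (these runs could
    execute in parallel), then train a meta-classifier on the base
    classifiers' predictions over a separate validation set."""
    subsets = np.array_split(np.arange(len(X_train)), len(base_learners))
    for learner, idx in zip(base_learners, subsets):
        learner.fit(X_train[idx], y_train[idx])
    # Meta-level attributes: one column of predictions per base classifier.
    meta_features = np.column_stack([c.predict(X_val) for c in base_learners])
    meta_learner.fit(meta_features, y_val)

def predict_meta(base_learners, meta_learner, X):
    """Classify new instances by combining the base predictions."""
    meta_features = np.column_stack([c.predict(X) for c in base_learners])
    return meta_learner.predict(meta_features)

# Example usage (X_tr, y_tr, X_val, y_val, X_test are numpy arrays):
# base = [DecisionTreeClassifier(), GaussianNB()]
# meta = LogisticRegression()
# train_meta(base, meta, X_tr, y_tr, X_val, y_val)
# y_pred = predict_meta(base, meta, X_test)
```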
685 | Neural network ensembles
- Hansen, Salamon
- 1990
Citation Context ...ods that are suitable for multi-class problems and on metrics that provide information about the interdependencies among the base classifiers and their potential when forming ensembles of classifiers [10, 14]. In the most related work, Margineantu and Dietterich [18] studied the problem of pruning the ensemble of classifiers (i.e. the set of hypotheses (classifiers)) obtained by the boosting algorithm ADA...
653 | Generalization as Search
- Mitchell
- 1982
Citation Context ...meta-learning in a similar manner. Furthermore, it improves accuracy by combining different learning systems each having a different inductive bias (e.g., representation, search heuristics, search space) [21]. By combining separately learned classifiers, meta-learning is expected to derive a higher level learned model that explains a large database more accurately than any of the individual learners. The ...
432 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...veral different measures and methods. Before we present the metrics employed in this study, we summarize the previous and current research within the Machine Learning and KDD communities. Leo Breiman [2] and LeBlanc and Tibshirani [17] acknowledge the value of using multiple predictive models to increase accuracy, but they view the problem from a different perspective. They rely on cross-validation d...
313 | Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions
- Provost, Fawcett
- 1997
Citation Context ...ta sets. On the other hand, the pruning methods presented in this paper precede the meta-learning phase and, as such, can be used in conjunction with SCANN or any other algorithm. Provost and Fawcett [26] introduced the ROC convex hull method for its intuitiveness and flexibility. The method evaluates models for binary classification problems, by mapping them onto a True Positive/False Positive plane ...
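As a rough illustration of the ROC convex hull method the context refers to: each classifier becomes a point in ROC space, and only points on the upper convex hull can be optimal under some class or cost distribution. The sketch below is a standard monotone-chain upper hull over (false positive rate, true positive rate) points, not Provost and Fawcett's implementation:

```python
def roc_convex_hull(points):
    """Keep only the classifiers on the upper convex hull of ROC space.

    points: list of (fpr, tpr) pairs, one per classifier.
    Returns the hull as a list of (fpr, tpr) pairs, left to right.
    """
    # Anchor the hull at the trivial "always negative"/"always positive" models.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop the last point while it lies on or below the chord from
        # hull[-2] to the new point: such a point cannot be on the upper hull.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Example: the (0.3, 0.5) classifier falls below the hull and is dropped.
# roc_convex_hull([(0.1, 0.6), (0.3, 0.5)]) ->
#     [(0.0, 0.0), (0.1, 0.6), (1.0, 1.0)]
```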
287 | Machine Learning Research: Four Current Directions
- Dietterich
- 1997
Citation Context ...ods that are suitable for multi-class problems and on metrics that provide information about the interdependencies among the base classifiers and their potential when forming ensembles of classifiers [10, 14]. In the most related work, Margineantu and Dietterich [18] studied the problem of pruning the ensemble of classifiers (i.e. the set of hypotheses (classifiers)) obtained by the boosting algorithm ADA...
278 | Neural network perception for mobile robot guidance
- Pomerleau
- 1992
Citation Context ... disease diagnosis [29], in predicting glucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting stock option pricing [23], in customizing electronic newspapers [15], etc. Many large business institutions and market analysis firms attempt to distinguish the low-risk (high ...
156 | JAM: Java agents for meta-learning over distributed databases
- Stolfo, Tselepis, et al.
- 1997
Citation Context ...rs, meta-learning is expected to derive a higher level learned model that explains a large database more accurately than any of the individual learners. The JAM system (Java Agents for Meta-learning) [31] is a distributed agent-based data mining system that implements meta-learning. JAM takes full advantage of the inherent parallelism and distributed nature of meta-learning by providing a set of learn...
149 | Error reduction through learning multiple descriptions
- Ali, Pazzani
- 1996
Citation Context ... and specialty metrics. Apart from these metrics and accuracy, correlation error and coverage have also been used to analyze and explain the properties and performance of classifiers. Ali and Pazzani [1] define correlation error as the fraction of instances for which a pair of base classifiers make the same incorrect predictions and Brodley and Lane [4] measured coverage by computing the fraction of ...
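These two definitions translate directly into code. A minimal sketch, assuming each classifier's predictions and the true labels are given as equal-length arrays (the function names are ours):

```python
import numpy as np

def correlation_error(pred_a, pred_b, y_true):
    """Fraction of instances where two classifiers make the SAME wrong
    prediction (Ali and Pazzani's correlation error)."""
    pred_a, pred_b, y_true = map(np.asarray, (pred_a, pred_b, y_true))
    same_and_wrong = (pred_a == pred_b) & (pred_a != y_true)
    return same_and_wrong.mean()

def coverage(predictions, y_true):
    """Fraction of instances where AT LEAST ONE classifier is correct
    (Brodley and Lane's coverage). predictions: list of prediction arrays."""
    y_true = np.asarray(y_true)
    correct_somewhere = np.zeros(len(y_true), dtype=bool)
    for pred in predictions:
        correct_somewhere |= (np.asarray(pred) == y_true)
    return correct_somewhere.mean()
```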
130 | Pruning adaptive boosting
- Margineantu, Dietterich
- 1997
Citation Context ...hat provide information about the interdependencies among the base classifiers and their potential when forming ensembles of classifiers [10, 14]. In the most related work, Margineantu and Dietterich [18] studied the problem of pruning the ensemble of classifiers (i.e. the set of hypotheses (classifiers)) obtained by the boosting algorithm ADABOOST [13]. According to their findings, by examining the d...
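To make the pruning idea concrete, here is a heavily simplified greedy sketch that uses accuracy to seed a subset and diversity to grow it. It only illustrates the flavor of diversity-based ensemble pruning; it is not Margineantu and Dietterich's actual algorithm:

```python
import numpy as np

def prune_by_diversity(predictions, y_true, k):
    """Select k of the available classifiers. Start from the most accurate
    one, then repeatedly add the candidate that disagrees most with the
    classifiers already chosen.

    predictions: list of per-classifier prediction arrays; y_true: labels.
    Returns the indices of the k selected classifiers.
    """
    preds = [np.asarray(p) for p in predictions]
    y_true = np.asarray(y_true)
    # Seed with the single most accurate classifier.
    accuracies = [float((p == y_true).mean()) for p in preds]
    chosen = [int(np.argmax(accuracies))]
    while len(chosen) < k:
        best, best_score = None, -1.0
        for i in range(len(preds)):
            if i in chosen:
                continue
            # Mean disagreement rate with the already-selected classifiers.
            score = np.mean([(preds[i] != preds[j]).mean() for j in chosen])
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```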
106 | Combining estimates in Regression and Classification
- LeBlanc, Tibshirani
- 1992
Citation Context ...thods. Before we present the metrics employed in this study, we summarize the previous and current research within the Machine Learning and KDD communities. Leo Breiman [2] and LeBlanc and Tibshirani [17] acknowledge the value of using multiple predictive models to increase accuracy, but they view the problem from a different perspective. They rely on cross-validation data and analytical methods (e.g...
74 | Addressing the selective superiority problem: Automatic Algorithm/Model class selection.
- Brodley
- 1993
Citation Context ...ect predictions and Brodley and Lane [4] measured coverage by computing the fraction of instances for which at least one of the base classifiers produces the correct prediction. 2.1 Diversity Brodley [5] defines diversity by measuring the classification overlap of a pair of classifiers, i.e. the percentage of the instances classified the same way by two classifiers while Chan [6] associates it with t...
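Brodley's overlap-based definition, read literally, could be sketched as follows (taking diversity as the complement of the overlap is our interpretation of the definition above):

```python
import numpy as np

def pairwise_diversity(pred_a, pred_b):
    """Diversity of two classifiers as 1 minus their classification overlap,
    i.e. the fraction of instances the two classifiers label differently."""
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    overlap = (pred_a == pred_b).mean()  # fraction classified the same way
    return 1.0 - overlap
```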
71 | Using correspondence analysis to combine classifiers
- Merz
- 1999
Citation Context ...has the advantage of searching for more complex and non-linear relations among the classifiers, at the expense of generating less intuitive representations. In a related study, Merz’s SCANN algorithm [19] employs correspondence analysis to map the predictions of the available classifiers onto a new scaled space that clusters similar prediction behaviors and then uses the nearest neighbor algorithm to...
67 | Credit Card Fraud Detection Using Meta Learning: Issues and Initial Results
- Stolfo, Fan
- 1997
Citation Context ...nificant commercial value [22]. Machine-learning algorithms have been deployed in heart disease diagnosis [29], in predicting glucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting stock option pricing [23], in customizing electronic newspapers [15], etc. Many large bus...
66 | Meta-learning for multistrategy and parallel learning
- Chan, Stolfo
- 1993
Citation Context ... one primary “global” concept or model. We call the problem of learning useful new information from large and inherently distributed databases, the scaling problem for machine learning. Meta-learning [7], a technique similar to stacking [32], was developed recently to deal with the scaling problem. The basic idea is to execute a number of machine learning processes on a number of data subsets in para...
64 | Multiple decision trees
- Kwok, Carter
- 1990
Citation Context ...s of the base classifiers. (When the predictions of the classifiers are distributed evenly across the possible classes, the entropy is higher and the set of classifiers more diverse.) Kwok and Carter [16] correlate the error rates of a set of decision trees to their syntactical diversity, while Ali and Pazzani [1] studied the impact of the number of gain ties on the accuracy of an ensemble of classi...
62 | International application of a new probability algorithm for the diagnosis of coronary artery disease
- Detrano, Janosi, et al.
- 1989
Citation Context ...cade, machine learning has evolved from a field of laboratory demonstrations to a field of significant commercial value [22]. Machine-learning algorithms have been deployed in heart disease diagnosis [29], in predicting glucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting sto...
53 | An extensive meta-learning approach for scalable and accurate inductive learning
- Chan
- 1996
Citation Context ....1 Diversity Brodley [5] defines diversity by measuring the classification overlap of a pair of classifiers, i.e. the percentage of the instances classified the same way by two classifiers while Chan [6] associates it with the entropy in the predictions of the base classifiers. (When the predictions of the classifiers are distributed evenly across the possible classes, the entropy is higher and the s...
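Chan's entropy-based notion of diversity might be computed per instance as below (a sketch; the exact formulation and normalization used in [6] may differ):

```python
import numpy as np

def prediction_entropy(predictions):
    """Average per-instance entropy of the base classifiers' predictions.
    Higher entropy = predictions spread more evenly over the classes = a
    more diverse set of classifiers.

    predictions: list of per-classifier prediction arrays (equal lengths).
    """
    preds = np.asarray(predictions)  # shape: (n_classifiers, n_instances)
    n_classifiers, n_instances = preds.shape
    total = 0.0
    for i in range(n_instances):
        _, counts = np.unique(preds[:, i], return_counts=True)
        p = counts / n_classifiers        # empirical class distribution
        total += -(p * np.log2(p)).sum()  # Shannon entropy for this instance
    return total / n_instances
```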
48 | NewsWeeder: Learning to filter netnews
- Lang
- 1995
Citation Context ...g credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting stock option pricing [23], in customizing electronic newspapers [15], etc. Many large business institutions and market analysis firms attempt to distinguish the low-risk (high profit) potential customers by learning simple categorical classifications of their potential cu...
32 | Sharing learned models among remote database partitions by local meta-learning
- Chan, Stolfo
- 1996
Citation Context ...ing a large number of base classifiers and meta-classifiers may not be practical or feasible. Meta-classifiers are defined recursively as collections of classifiers structured in multi-level trees [8], hence determining the optimal set of classifiers is a combinatorial problem. Pre-training pruning refers to the filtering of the classifiers before they are used in the training of a meta-classifie...
27 | Creating and exploiting coverage and diversity
- Brodley, Lane
- 1996
Citation Context ...and performance of classifiers. Ali and Pazzani [1] define correlation error as the fraction of instances for which a pair of base classifiers make the same incorrect predictions and Brodley and Lane [4] measured coverage by computing the fraction of instances for which at least one of the base classifiers produces the correct prediction. 2.1 Diversity Brodley [5] defines diversity by measuring the c...
25 | Does machine learning really work?
- Mitchell
- 1997
Citation Context ...escriptive representations (also called classifiers or models). Over the past decade, machine learning has evolved from a field of laboratory demonstrations to a field of significant commercial value [22]. Machine-learning algorithms have been deployed in heart disease diagnosis [29], in predicting glucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles dri...
19 | A neural network model for estimating option prices
- Malliaris, Salchenberger
- 1993
Citation Context ...ucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting stock option pricing [23], in customizing electronic newspapers [15], etc. Many large business institutions and market analysis firms attempt to distinguish the low-risk (high profit) potential customers by learning simp...
4 | Models and computers in diabetes research and diabetes care
- Carson, Fischer
- 1990
Citation Context ...tory demonstrations to a field of significant commercial value [22]. Machine-learning algorithms have been deployed in heart disease diagnosis [29], in predicting glucose levels for diabetic patients [12], in detecting credit card fraud [30], in steering vehicles driving autonomously on public highways at 70 miles an hour [24], in predicting stock option pricing [23], in customizing electron...
4 | On the management of distributed learning agents
- Prodromidis
- 1997
Citation Context ...r its intuitiveness and flexibility. The method evaluates models for binary classification problems, by mapping them onto a True Positive/False Positive plane and [footnote 1:] As opposed to post-training pruning [25], which denotes the evaluation and revision/pruning of the meta-classifier after it is computed. [footnote 2:] Very similar to Principal Component Analysis. [footnote 3:] Although the transformed predictions may be mapped on a n...
3 | Boosting and naive Bayesian learning [http://www-cse.ucsd.edu/~elkan/papers/bnb.ps]
- Elkan
- 1997
Citation Context ...periments and Results Learning algorithms Five inductive learning algorithms are used in our experiments. ID3, its successor C4.5, and Cart [3] are decision tree based algorithms, Bayes, described in [11], is a naive Bayesian classifier and Ripper [9] is a rule induction algorithm. Learning tasks Two data sets of real credit card transactions were used in our experiments provided by the Chase and Firs...