Abstract: Machine learning programs need to scale up to very large data sets for several reasons,
including increasing accuracy and discovering infrequent special cases. Current inductive learners
perform well with hundreds or thousands of training examples, but in some cases, up to a million
or more examples may be necessary to learn important special cases with confidence. These tasks
are infeasible for current learning programs running on sequential machines. We discuss the need
for very large data...
Context of citations to this paper:
... rules tend to be more interesting than large disjunct rules, since the latter are more likely to be previously known by the user [11]. In this paper we propose a hybrid decision tree/genetic algorithm method for rule discovery that copes with the problem of small...
... because small disjuncts often capture special cases that were unknown previously; the analysts often know the common cases (Provost & Aronis 1996). As with classifier learning, in order not to be swamped with spurious small disjuncts it is essential for a data set to be...
Cited by:
PKDD'98 Tutorial on Scalable, High-Performance Data Mining with.. - Freitas (1998)
Increasing the Efficiency of Data Mining Algorithms with.. - Aronis, Provost (1997)
On the Effect of Data Set Size on Bias and Variance in.. - Brain, Webb
Active bibliography (related documents):
1.0: A Survey of Methods for Scaling Up Inductive Algorithms - Provost, Kolluri (1999)
0.8: A Survey of Methods for Scaling Up Inductive Learning Algorithms - Provost, Kolluri (1997)
0.7: Scaling Up: Distributed Machine Learning with Cooperation - Provost, Hennessy (1996)
Similar documents based on text:
0.3: Learning from Bad Data - Provost, Danyluk (1995)
0.3: A Study in Causal Discovery from Population-Based Infant Birth .. - Mani, Cooper (1999)
0.2: Scaling up Inductive Logic Programming: An Evolutionary.. - Reiser, Riddle
Related documents from co-citation:
12: Programs for machine learning - Quinlan - 1993
9: Small disjuncts in action: learning to diagnose errors in the local loop of the .. - Danyluk, Provost - 1993
7: Exploiting Background Knowledge in Automated Discovery - Aronis, Provost et al. - 1996
BibTeX entry:
F.J. Provost and J.M. Aronis. Scaling up inductive learning with massive parallelism. Machine Learning 23(1), Apr. 1996, 33-46. http://citeseer.ist.psu.edu/391399.html
@article{ provost96scaling,
author = "Foster J. Provost and John M. Aronis",
title = "Scaling Up Inductive Learning with Massive Parallelism",
journal = "Machine Learning",
volume = "23",
number = "1",
pages = "33-46",
year = "1996",
url = "http://citeseer.ist.psu.edu/391399.html" }
Citations (may not include all citations):
1491: Learning internal representations by error propagation - Rumelhart, Hinton et al. - 1986
1359: Induction of Decision Trees - Quinlan - 1986
216: Very simple classification rules perform well on most common.. - Holte - 1993
99: Concept learning and the problem of small disjuncts - Holte, Acker et al. - 1989
83: Generating production rules from decision trees - Quinlan - 1987
83: Incremental induction of decision trees - Utgoff - 1989
62: Megainduction: machine learning on very large databases - Catlett
59: Toward parallel and distributed learning by meta-learning - Chan, Stolfo
54: Meta-learning for multistrategy and parallel learning - Chan, Stolfo
51: Parallel depth-first search - Rao, Kumar - 1987
50: A case study of incremental concept induction - Schlimmer, Fisher - 1986
47: Megainduction: A test flight - Catlett
47: Parallel depth-first search - Kumar, Rao - 1987
32: Small disjuncts in action: Learning to diagnose errors in th.. - Danyluk, Provost - 1993
28: Inductive policy: The pragmatics of bias selection - Provost, Buchanan - 1995
24: Learning decision lists using homogeneous rules - Segal, Etzioni - 1994
22: An ounce of knowledge is worth a ton of data: Quantitative s.. - Gaines - 1989
20: RL4: A tool for knowledge-based induction - Clearwater, Provost - 1990
20: An efficient implementation of the backpropagation algorithm.. - Zhang, Mckenna et al. - 1989
18: Distributed machine learning: Scaling up with coarsegrained .. - Provost, Hennessy - 1994
15: Incremental batch learning - Clearwater, Cheng et al. - 1989
13: Efficiently constructing relational features from background.. - Aronis, Provost - 1994
10: A SIMD approach to parallel heuristic search - Mahanti, Daniels - 1993
9: A distributed problem-solving approach to inductive learning - Shaw, Sikora - 1990
8: The memory-based reasoning paradigm - Stanfill, Waltz - 1988
8: Accelerated learning on the connection machine - Cook, Holder - 1990
8: DADO: a tree-structured machine architecture for production .. - Stolfo, Shaw - 1982
6: Special issue on bias evaluation and selection - Gordon, desJardins - 1995
5: Massively parallel IDA* search - Cook, Lyons - 1993
4: ARIEL: A massively parallel symbolic learning assistant for .. - Lathrop, Webster et al. - 1990
3: Machine learning in the service of exploratory science and e.. - Provost, Buchanan et al. - 1993
2: Initial performance of the DADO2 prototype - Stolfo - 1987
1: Massachusetts Institute of Technology - Lathrop - 1995
1: Learning with Small Disjuncts - Weiss - 1995
1: Editorial introduction - Bobrow - 1993
1: An unexpected relationship between the timing of entry into .. - Sharma, Provost et al. - 1995
Documents on the same site (http://www.stern.nyu.edu/~fprovost):
On Applied Research in Machine Learning - Provost (1998)
Well-Trained PETs: Improving Probability Estimation Trees - Provost, Domingos (2000)
Machine Learning from Imbalanced Data Sets 101 (Extended Abstract) - Provost
CiteSeer.IST - Copyright Penn State and NEC