Abstract:
This paper reviews research on combining artificial neural nets, and provides an overview of, and an introduction to, the papers contained in this Special Issue and its companion (Connection Science, 9, 1). Two main approaches, ensemble-based and modular, are identified and considered. An ensemble, or committee, is made up of a set of nets, each of which is a general function approximator. The members of the ensemble are combined in order to obtain better generalisation performance than would be achieved by any of the individual nets. The main issues considered here under the heading of ensemble-based approaches are (a) how to combine the outputs of the ensemble members, (b) how to create candidate ensemble members, and (c) which methods lead to the most effective ensembles. Under the heading of modular approaches we begin by considering a divide-and-conquer approach by which a function is automatically decomposed into a number of subfunctions which are treated by specialist modules. Other modular approaches are also identified and considered, for whilst the divide-and-conquer approach is designed to improve performance, the term modularity can be given a wider interpretation. The broadly defined topic of modularity includes the explicit decomposition of a task based on the designer's understanding, and the exploitation of specialist modules in order to accomplish tasks which could not be performed by a monolithic net.
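As a rough illustration of issue (a), combining the outputs of ensemble members, the sketch below averages the predictions of several imperfect estimators. It is not taken from the paper: the target function, the `make_member` helper (a hypothetical stand-in for a trained net) and the error model are all assumptions chosen to keep the example self-contained. It shows only the simplest combiner, an unweighted mean, whose squared error can never exceed the average squared error of its members.

```python
import random


def target(x):
    """Assumed target function that every member tries to approximate."""
    return x * x


def make_member(seed):
    """Hypothetical stand-in for a trained net: the target plus a fixed,
    member-specific error (a slope term and an offset term)."""
    rng = random.Random(seed)
    slope_err = rng.gauss(0.0, 0.3)
    offset_err = rng.gauss(0.0, 0.3)
    return lambda x: target(x) + slope_err * x + offset_err


def average_combiner(members, x):
    """Combine member outputs with an unweighted mean."""
    return sum(m(x) for m in members) / len(members)


def mse(predict, xs):
    """Mean squared error of a predictor against the target on points xs."""
    return sum((predict(x) - target(x)) ** 2 for x in xs) / len(xs)


xs = [i / 10 for i in range(-10, 11)]
members = [make_member(seed) for seed in range(5)]

member_errors = [mse(m, xs) for m in members]
mean_member_error = sum(member_errors) / len(member_errors)
ensemble_error = mse(lambda x: average_combiner(members, x), xs)

print("individual MSEs:", [round(e, 4) for e in member_errors])
print("mean member MSE:", round(mean_member_error, 4))
print("ensemble MSE:   ", round(ensemble_error, 4))
```

The averaged predictor is guaranteed to do no worse than the average member, but not necessarily better than the best one; how much it gains depends on how differently the members err, which is why the creation of candidate members, issue (b), matters.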