The experts considered in this paper are neural networks whose forecasts are combined by another neural network, a gate. For regression problems, such an architecture was shown to partly remedy the two main problems in forecasting real-world time series: nonstationarity and overfitting. The goal of this paper is to compare the forecasting ability of gated experts (GE) with that of a single neural network expert on a time series classification task, where the classes correspond to the decisions of taking a long position in a stock, taking a short position, or doing nothing. A new error function and a weight update rule are derived for this problem. The architecture was tested on actual stock market data, and its errors on both the training and the testing data were smaller than those of the best single expert. This suggests that the performance of any single stock market forecasting system can be improved by making several copies of it and training them under the GE framework. In addition, an algorithm is presented for the GE architecture that allows the model to modify the data so that the data fit the model better. Such a modification is made only if the decrease in the model cost associated with the output error exceeds the increase in the input cost associated with moving the data away from their initial values. This idea corresponds to a bi-directional search for the true model, which was shown in AI to halve the exponent of the search time in comparison with the standard unidirectional search used by most connectionist architectures. The implementation of this algorithm was shown to further decrease overfitting on the testing data.
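To make the gating idea concrete, the following is a minimal sketch of how a gate might combine the class probabilities of several experts for the three decisions named above. It is an illustration only, not the paper's implementation: the function names (`softmax`, `gated_forecast`), the raw score values, and the use of a softmax for both gate and experts are assumptions, and the paper's error function and weight update rule are not reproduced here.

```python
import math

# Hypothetical class labels for the three trading decisions.
CLASSES = ("long", "short", "hold")

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_forecast(expert_scores, gate_scores):
    """Mix expert class probabilities using gate weights.

    expert_scores: one list of raw class scores per expert (assumed outputs
                   of the expert networks for a single input pattern).
    gate_scores:   one raw score per expert (assumed output of the gate
                   network for the same input pattern).
    """
    gate_weights = softmax(gate_scores)                 # one weight per expert
    expert_probs = [softmax(s) for s in expert_scores]  # one distribution per expert
    n_classes = len(expert_probs[0])
    return [
        sum(g * p[k] for g, p in zip(gate_weights, expert_probs))
        for k in range(n_classes)
    ]

# Made-up scores: expert 0 favours "long", expert 1 favours "short",
# and the gate trusts expert 0 far more for this input.
expert_scores = [[2.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0]]
gate_scores = [3.0, 0.0]

mixed = gated_forecast(expert_scores, gate_scores)
decision = CLASSES[mixed.index(max(mixed))]
```

Because the gate weights and each expert's outputs are both normalized, the mixed output is itself a probability distribution over the three decisions, so the decision can be read off as the most probable class.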