We consider the problem of one-step ahead prediction of a real-valued, stationary, strongly mixing random process $\{X_i\}_{i=-\infty}^{\infty}$. The best mean-square predictor of $X_0$ is its conditional mean given the entire infinite past $\{X_i\}_{i=-\infty}^{-1}$. Given a sequence of observations $X_1, X_2, \ldots, X_N$, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity regularized least-squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
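As a rough illustration of how memory and dimension can be selected jointly, the following Python sketch fits Legendre-polynomial predictors over a grid of memories and degrees and keeps the pair with the smallest penalized least-squares score. It is a minimal sketch under stated assumptions, not the criterion analyzed in the paper: the penalty (number of coefficients times log n / n), the additive per-lag basis, the rescaling of the data to [-1, 1], and the names `lagged_design`, `legendre_features`, `select_model`, and `penalty_weight` are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre


def lagged_design(x, memory):
    """Return (past, target): row t of past is (x[t-1], ..., x[t-memory])."""
    n = len(x)
    past = np.column_stack([x[memory - k: n - k] for k in range(1, memory + 1)])
    return past, x[memory:]


def legendre_features(past, degree):
    """Additive Legendre basis (an assumption): a constant column plus
    P_1..P_degree evaluated on each lag separately."""
    cols = [np.ones(len(past))]
    for j in range(past.shape[1]):
        vander = legendre.legvander(past[:, j], degree)  # columns P_0..P_degree
        cols.append(vander[:, 1:])                       # drop the repeated P_0
    return np.column_stack(cols)


def select_model(x, max_memory=4, max_degree=4, penalty_weight=1.0):
    """Pick (memory, degree) minimizing mean squared residual plus a
    BIC-style complexity penalty (assumed form, not the paper's)."""
    x = np.asarray(x, dtype=float)
    x = x / np.max(np.abs(x))  # Legendre polynomials live on [-1, 1]
    # Use the same targets for every candidate so the scores are comparable.
    past, target = lagged_design(x, max_memory)
    n = len(target)
    best = None
    for memory in range(1, max_memory + 1):
        for degree in range(1, max_degree + 1):
            feats = legendre_features(past[:, :memory], degree)
            coef, *_ = np.linalg.lstsq(feats, target, rcond=None)
            mse = np.mean((target - feats @ coef) ** 2)
            score = mse + penalty_weight * feats.shape[1] * np.log(n) / n
            if best is None or score < best[0]:
                best = (score, memory, degree)
    return best[1], best[2]
```

Calling `select_model(x)` on a one-dimensional array of observations returns the selected `(memory, degree)` pair. The search never uses the true memory of the process, which is the sense in which the selection is data-driven.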