Universal Prediction
IEEE Transactions on Information Theory, 1998
Cited by 99 (6 self)
Abstract
This paper presents an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression.
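To illustrate the link between probability assignment under self-information (log) loss and compression, here is a minimal sketch, assuming a binary alphabet and the Krichevsky-Trofimov (KT) estimator as the sequential probability assignment; the function names are hypothetical. The cumulative log loss in bits is the ideal code length, and its excess over the best fixed Bernoulli model (the regret) is known to be at most ½ log₂ n + 1 bits for the KT estimator.

```python
import math

def kt_code_length(bits):
    """Cumulative self-information loss, in bits, of the Krichevsky-Trofimov
    (KT) sequential probability assignment on a 0/1 sequence."""
    counts = [0, 0]  # occurrences of 0 and 1 seen so far
    total = 0.0
    for b in bits:
        # KT predictive probability of symbol b: (count_b + 1/2) / (t + 1)
        p = (counts[b] + 0.5) / (counts[0] + counts[1] + 1.0)
        total -= math.log2(p)  # self-information loss for this step
        counts[b] += 1
    return total

def best_bernoulli_code_length(bits):
    """Code length, in bits, under the best i.i.d. Bernoulli model chosen in hindsight."""
    n, k = len(bits), sum(bits)
    return -sum(c * math.log2(c / n) for c in (k, n - k) if c > 0)

bits = [1, 0, 1, 1, 0, 1, 1, 1] * 50
n = len(bits)
regret = kt_code_length(bits) - best_bernoulli_code_length(bits)
print(f"regret = {regret:.2f} bits, bound = {0.5 * math.log2(n) + 1:.2f} bits")
```

The predictor never needs to know the source parameter in advance; it pays only a logarithmic number of extra bits for learning it on the fly.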
Nonparametric time series prediction through adaptive model selection
Machine Learning, 2000
Cited by 28 (0 self)
Abstract
We consider the problem of one-step-ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity (β-mixing), which describes the mixing nature of the process. We make use of recent results from the theory of empirical processes and adapt the uniform convergence framework of Vapnik and Chervonenkis to the problem of time series prediction, obtaining finite-sample bounds. Furthermore, by allowing both the model complexity and the memory size to be determined adaptively from the data, we derive nonparametric rates of convergence through an extension of the method of structural risk minimization suggested by Vapnik. All our results are derived for general $L_p$ error measures and apply to both exponentially and algebraically mixing processes.
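The idea of letting the data choose the memory size by trading empirical loss against complexity can be sketched as follows. This is only a toy illustration under stated assumptions: the model class (predict by the mean of the last k values), the penalty c·sqrt(k·log n / n), and all function names are hypothetical placeholders, not the bounds or estimators derived in the paper.

```python
import math

def window_mean_loss(xs, k, start, p=2):
    """Average L_p one-step-ahead loss, over t = start..n-1, of the predictor
    that forecasts xs[t] by the mean of the previous k values (memory size k)."""
    errs = [abs(xs[t] - sum(xs[t - k:t]) / k) ** p for t in range(start, len(xs))]
    return sum(errs) / len(errs)

def select_memory(xs, max_k, p=2, c=1.0):
    """Choose k in 1..max_k minimising empirical loss plus a complexity penalty.
    The penalty c * sqrt(k * log(n) / n) is a hypothetical placeholder, not the
    bound derived in the paper."""
    n = len(xs)
    def score(k):
        # every k is evaluated on the same time range t = max_k..n-1
        return window_mean_loss(xs, k, max_k, p) + c * math.sqrt(k * math.log(n) / n)
    return min(range(1, max_k + 1), key=score)

# An alternating series: even memory sizes predict 0 and incur loss 1 per step,
# odd memory sizes are biased; the penalty then favours the smallest even k.
alternating = [(-1.0) ** t for t in range(200)]
print(select_memory(alternating, max_k=6))  # prints 2
```

Evaluating every candidate on the same time range keeps the empirical losses comparable; the penalty is what stops the selection from always preferring the largest memory.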
Minimum Complexity Regression Estimation with Weakly Dependent Observations
IEEE Trans. Inform. Theory, 1996
Cited by 14 (1 self)
Abstract
Parameter Spaces and Abstract Complexities. For each integer $n \ge 1$, fix a model dimension (see, for example, (2)), and let $S_n$ denote a compact subset of the Euclidean space of that dimension. The set $S_n$ will serve as a collection of parameters associated with this model dimension (see, for example, (5)). For every $v \in S_n$, let $f(\cdot, v)$ denote a real-valued function on the input space parameterized by $(n, v)$ (see, for example, (3)). The following condition is required to invoke the exponential inequalities in Theorems 4.2 and 4.3.
Extension of the PAC Framework to Finite and Countable Markov Chains
In Proceedings of the 12th Annual Conference on Computational Learning Theory, 2000
Cited by 14 (0 self)
Abstract
We consider a model of learning in which the successive observations follow a certain Markov chain. The observations are labeled according to their membership in some unknown target set. For a Markov chain with finitely many states, we show that if the target set belongs to a family of sets with finite VC dimension, then probably approximately correct (PAC) learning of this set is possible with polynomially large samples. Specifically, for observations following a random walk with state space $X$ and uniform stationary distribution, the required sample size is no more than $\Omega\left(\frac{t_0}{1-\lambda_2}\log\frac{t_0|X|}{\delta}\right)$, where $\delta$ is the confidence level, $\lambda_2$ is the second largest eigenvalue of the transition matrix, and $t_0$ is the sample size sufficient for learning from i.i.d. observations. We extend these results to Markov chains with countably many states using the Lyapunov function technique and recent results on mixing properties of infinite-state Markov chains.
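To see how the spectral gap 1 − λ₂ drives the sample-size expression above, the sketch below evaluates t₀/(1 − λ₂) · log(t₀|X|/δ) for a lazy random walk on a cycle, a walk whose stationary distribution is uniform. The choice of walk, the function names, and the parameter values are assumptions for illustration, and the unspecified absolute constant in the bound is omitted.

```python
import math

def lazy_cycle_lambda2(num_states):
    """Second largest eigenvalue of the lazy random walk on a cycle of
    num_states >= 3 states (stay with prob. 1/2, otherwise step to one of the
    two neighbours uniformly). Its stationary distribution is uniform."""
    return (1 + math.cos(2 * math.pi / num_states)) / 2

def sample_size_scale(t0, num_states, delta, lambda2):
    """t0 / (1 - lambda2) * log(t0 * |X| / delta): the expression inside the
    bound, with its unspecified absolute constant omitted."""
    return t0 / (1 - lambda2) * math.log(t0 * num_states / delta)

for m in (4, 16, 64):
    lam = lazy_cycle_lambda2(m)
    print(m, round(lam, 4), round(sample_size_scale(t0=100, num_states=m, delta=0.05, lambda2=lam)))
```

As the cycle grows, λ₂ approaches 1, so the factor 1/(1 − λ₂) grows roughly like the square of the number of states and dominates the logarithmic term.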
Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations
In NIPS, 2006
Cited by 9 (0 self)
Abstract
We study the statistical convergence and consistency of regularized boosting methods, where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers' weights. Compared with the i.i.d. case, the nature of sampling manifests itself in the consistency result only through a generalization of the original condition on the growth of the regularization parameter.
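A common form of the independent-blocks construction splits the sample into consecutive blocks of length a and separates the odd-numbered from the even-numbered blocks. Within each group, neighbouring blocks are a indices apart, so for a β-mixing sequence their joint law is close to that of independent copies when the mixing coefficient at lag a is small. The sketch below only computes the index sets; the name `alternating_blocks` and the block length are assumptions, not the paper's notation.

```python
def alternating_blocks(n, a):
    """Split indices 0..n-1 into consecutive blocks of length a and separate
    them into odd-numbered blocks (1st, 3rd, ...) and even-numbered blocks
    (2nd, 4th, ...). Returns (odd_blocks, even_blocks, remainder), where each
    list holds mu = n // (2a) blocks and remainder collects leftover indices."""
    mu = n // (2 * a)
    odd = [list(range(2 * j * a, (2 * j + 1) * a)) for j in range(mu)]
    even = [list(range((2 * j + 1) * a, (2 * j + 2) * a)) for j in range(mu)]
    remainder = list(range(2 * mu * a, n))
    return odd, even, remainder

odd, even, rest = alternating_blocks(10, 2)
print(odd)   # [[0, 1], [4, 5]]
print(even)  # [[2, 3], [6, 7]]
print(rest)  # [8, 9]
```

Larger blocks make the gaps wider (better approximate independence) but leave fewer blocks, which is the trade-off such proofs balance through the growth of the block length with n.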
On Optimal Sequential Prediction for General Processes
IEEE Transactions on Information Theory, 2001
Cited by 6 (1 self)
Abstract
In the stochastic sequential prediction problem, the elements of a random process $X_1, X_2, \ldots \in \mathbb{R}$ are successively revealed to a forecaster. At each time $t$ the forecaster makes a prediction $F_t$ of $X_t$ based only on $X_1, \ldots, X_{t-1}$; when $X_t$ is revealed, the forecaster incurs a loss $\ell(F_t, X_t)$. This paper considers several aspects of the sequential prediction problem for unbounded, non-stationary processes under $p$-th power loss, $1 < p < \infty$. In the first part of the paper it is shown that Bayes prediction schemes are Cesàro optimal under general conditions, that Cesàro optimal prediction schemes are unique in a natural sense, and that Cesàro optimality is equivalent to a form of weak calibration. Extensions of the existence and uniqueness results to generalized prediction, and to prediction from observations with additive noise, are also established.
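The protocol above can be sketched directly: at each step the forecaster sees only the past, then pays ℓ(F_t, X_t). The sketch assumes the usual form of the p-th power loss, ℓ(f, x) = |f − x|^p, and computes the running (Cesàro) average of the losses, the quantity that Cesàro-type optimality compares across prediction schemes. The forecaster `running_mean` is a hypothetical example, not a scheme from the paper.

```python
def cesaro_average_loss(xs, forecaster, p):
    """(1/n) * sum_t |F_t - X_t|**p, where F_t = forecaster(xs[:t]) sees only
    the values revealed before time t."""
    total = 0.0
    for t, x in enumerate(xs):
        f = forecaster(xs[:t])  # prediction made before x is revealed
        total += abs(f - x) ** p
    return total / len(xs)

def running_mean(past):
    """Hypothetical example forecaster: mean of past values (0.0 before any data)."""
    return sum(past) / len(past) if past else 0.0

print(cesaro_average_loss([1.0, 1.0, 1.0], running_mean, p=2))  # 1/3
```

Passing only `xs[:t]` to the forecaster enforces the requirement that $F_t$ depend on $X_1, \ldots, X_{t-1}$ alone.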
Learning from dependent observations
2006
Cited by 5 (3 self)
Abstract
In most papers establishing consistency for learning algorithms, it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) essentially only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for α-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise.
Keywords: Support vector machine, Consistency, Non-stationary mixing process, Classification, Regression
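The law of large numbers that the abstract relies on can be illustrated with a simple mixing process. The sketch assumes a stationary Gaussian AR(1) process with |φ| < 1, a standard example of an α-mixing process, and checks that the time average of f(x) = x² approaches its stationary mean σ²/(1 − φ²). The process, the parameter values, and the function name are illustrative assumptions; nothing here involves an SVM.

```python
import math
import random

def ar1_path(n, phi, sigma=1.0, seed=0):
    """Stationary Gaussian AR(1) path x_{t+1} = phi * x_t + e_t with |phi| < 1,
    started from its stationary law N(0, sigma^2 / (1 - phi^2))."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    path = []
    for _ in range(n):
        path.append(x)
        x = phi * x + rng.gauss(0.0, sigma)
    return path

phi = 0.5
xs = ar1_path(200_000, phi)
empirical = sum(x * x for x in xs) / len(xs)  # time average of f(x) = x^2
stationary = 1.0 / (1.0 - phi * phi)           # E[x^2] under the stationary law
print(round(empirical, 3), round(stationary, 3))
```

Even though consecutive observations are strongly correlated, the dependence decays fast enough for empirical averages to concentrate, which is the kind of property the consistency argument needs in place of independence.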
Consistency of Support Vector Machines for Forecasting the Evolution of an Unknown Ergodic Dynamical System from Observations with Unknown Noise
2007
Abstract
We consider the problem of forecasting the next (observable) state of an unknown ergodic dynamical system from a noisy observation of the present state. Our main result shows, for example, that support vector machines (SVMs) using Gaussian RBF kernels can learn the best forecaster from a sequence of noisy observations if (a) the unknown observational noise process is bounded and has a summable α-mixing rate and (b) the unknown ergodic dynamical system is defined by a Lipschitz continuous function on some compact subset of $\mathbb{R}^d$ and has a summable decay of correlations for Lipschitz continuous functions. To prove this result, we first establish a general consistency result for SVMs and all stochastic processes that satisfy a mixing notion that is substantially weaker than α-mixing.
Let us assume that we have an ergodic dynamical system described by the sequence $(F^n)_{n \ge 0}$ of iterates of an (essentially) unknown map $F: M \to M$, where $M \subset \mathbb{R}^d$ is compact and the corresponding ergodic measure $\mu$ is assumed to be unique. Furthermore, assume that all observations $\tilde{x}$ of this dynamical system are corrupted by some stationary, $\mathbb{R}^d$-valued, additive noise process $\mathcal{E} = (\varepsilon_n)_{n \ge 0}$ whose distribution $\nu$ we assume to be independent of the state, but otherwise unknown, too. In other words, all possible observations of the system at time $n \ge 0$ are of the form
$$\tilde{x}_n = F^n(x_0) + \varepsilon_n, \qquad (1)$$
where $x_0$ is a true but unknown state at time 0. Now, given an observation of the system at some arbitrary time, our goal is to forecast the next observable
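The observation model (1) and the forecasting task can be sketched on a toy system. Assumptions, all hypothetical: d = 1, F is the logistic map x ↦ 4x(1 − x) on M = [0, 1], the noise is i.i.d. uniform on [−0.05, 0.05], and the forecaster is Nadaraya-Watson regression with a Gaussian RBF kernel. That forecaster is a deliberately simple stand-in, not the SVM analysed in the paper, and whether this map meets every assumption stated above is not checked here.

```python
import math
import random

def F(x):
    # Hypothetical choice of the unknown map: the logistic map x -> 4x(1 - x),
    # a Lipschitz map of M = [0, 1] into itself.
    return 4.0 * x * (1.0 - x)

def noisy_orbit(n, x0, noise, seed=0):
    """Observations x~_k = F^k(x0) + eps_k for k = 0..n-1, following (1),
    with eps_k i.i.d. uniform on [-noise, noise] (bounded noise)."""
    rng = random.Random(seed)
    obs, x = [], x0
    for _ in range(n):
        obs.append(x + rng.uniform(-noise, noise))
        x = F(x)
    return obs

def kernel_forecaster(pairs, bandwidth):
    """Nadaraya-Watson regression with a Gaussian RBF kernel; a simple stand-in,
    NOT the SVM analysed in the paper."""
    fallback = sum(w for _, w in pairs) / len(pairs)
    def predict(u):
        weights = [math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2)) for v, _ in pairs]
        total = sum(weights)
        if total == 0.0:
            return fallback  # no training point nearby: predict the overall mean
        return sum(k * w for k, (_, w) in zip(weights, pairs)) / total
    return predict

obs = noisy_orbit(1301, x0=0.1234567, noise=0.05)
pairs = list(zip(obs[:-1], obs[1:]))  # (present observation, next observation)
train, test = pairs[:1000], pairs[1000:]
forecast = kernel_forecaster(train, bandwidth=0.02)
mse = sum((forecast(u) - w) ** 2 for u, w in test) / len(test)
persistence_mse = sum((u - w) ** 2 for u, w in test) / len(test)  # predict "no change"
print(round(mse, 4), round(persistence_mse, 4))
```

Training pairs are built from consecutive noisy observations only, so the forecaster never sees the true states $F^n(x_0)$, matching the setting in which both $F$ and the noise distribution are unknown.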

