Results 1–10 of 19
Rademacher Complexity Bounds for Non-I.I.D. Processes
Abstract

Cited by 19 (2 self)
This paper presents the first Rademacher complexity-based error bounds for non-i.i.d. settings, a generalization of similar existing bounds derived for the i.i.d. case. Our bounds hold in the scenario of dependent samples generated by a stationary β-mixing process, which is commonly adopted in many previous studies of non-i.i.d. settings. They benefit from the crucial advantages of Rademacher complexity over other measures of the complexity of hypothesis classes. In particular, they are data-dependent and measure the complexity of a class of hypotheses based on the training sample. The empirical Rademacher complexity can be estimated from such finite samples and leads to tighter generalization bounds. We also present the first margin bounds for kernel-based classification in this non-i.i.d. setting and briefly study their convergence.
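The data-dependent quantity these bounds are built on, the empirical Rademacher complexity R̂(H) = E_σ[sup_{h∈H} (1/m) Σ_i σ_i h(x_i)], can be estimated by Monte Carlo over the random signs σ. A minimal sketch for a finite hypothesis class (the 1-D threshold class here is an illustrative assumption, and the estimate itself ignores the β-mixing structure the paper handles):

```python
import random

def empirical_rademacher(sample, hypotheses, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    R_hat(H) = E_sigma[ sup_h (1/m) sum_i sigma_i * h(x_i) ]
    for a finite hypothesis class evaluated on a fixed sample."""
    rng = random.Random(seed)
    m = len(sample)
    # Pre-evaluate every hypothesis on the sample once.
    outputs = [[h(x) for x in sample] for h in hypotheses]
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        # sup over the class for this draw of random signs
        best = max(sum(s * o for s, o in zip(sigma, out)) / m
                   for out in outputs)
        total += best
    return total / n_draws

# Illustrative class: 1-D threshold classifiers h_t(x) = sign(x - t).
sample = [i / 10 for i in range(-10, 11)]
thresholds = [t / 5 for t in range(-5, 6)]
hypotheses = [lambda x, t=t: 1.0 if x >= t else -1.0 for t in thresholds]
print(round(empirical_rademacher(sample, hypotheses), 3))
```

For a finite class, Massart's lemma bounds this quantity by sqrt(2 ln|H| / m), so the estimate should come out well below that ceiling for this small class.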
Fast learning from non-i.i.d. observations
In NIPS, 2009
Abstract

Cited by 11 (0 self)
We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case.
Security Analysis of Online Centroid Anomaly Detection
, 2012
Abstract

Cited by 5 (2 self)
Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method, online centroid anomaly detection, in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: the attacker's full or limited control over the traffic, and a bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.
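As an illustration of the detector under analysis, here is a minimal online centroid scheme (the incremental-mean update and fixed acceptance radius are simplifying assumptions of this sketch, not the paper's exact protocol):

```python
import math

class OnlineCentroidDetector:
    """Sketch of online centroid anomaly detection: points within
    radius r of the current centroid are accepted and update it;
    points outside are flagged as anomalies and rejected."""

    def __init__(self, dim, radius):
        self.centroid = [0.0] * dim
        self.radius = radius
        self.n = 0

    def _dist(self, x):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(x, self.centroid)))

    def observe(self, x):
        """Return True if x is flagged as an anomaly."""
        if self.n > 0 and self._dist(x) > self.radius:
            return True  # rejected: does not move the centroid
        self.n += 1
        # incremental mean update over accepted points
        self.centroid = [c + (xi - c) / self.n
                         for c, xi in zip(self.centroid, x)]
        return False

det = OnlineCentroidDetector(dim=2, radius=1.0)
for p in [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1)]:
    det.observe(p)              # normal traffic, accepted
print(det.observe((5.0, 5.0)))  # far outlier is flagged
```

The poisoning attack studied in the paper works by feeding points just inside the acceptance radius, dragging the centroid toward an attack target; the paper's bounds quantify how hard that is under traffic-control and false-positive constraints.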
CONSISTENCY OF SUPPORT VECTOR MACHINES FOR FORECASTING THE EVOLUTION OF AN UNKNOWN ERGODIC DYNAMICAL SYSTEM FROM OBSERVATIONS WITH UNKNOWN NOISE
, 2007
Abstract

Cited by 3 (0 self)
We consider the problem of forecasting the next (observable) state of an unknown ergodic dynamical system from a noisy observation of the present state. Our main result shows, for example, that support vector machines (SVMs) using Gaussian RBF kernels can learn the best forecaster from a sequence of noisy observations if (a) the unknown observational noise process is bounded and has a summable α-mixing rate and (b) the unknown ergodic dynamical system is defined by a Lipschitz continuous function on some compact subset of R^d and has a summable decay of correlations for Lipschitz continuous functions. In order to prove this result we first establish a general consistency result for SVMs and all stochastic processes that satisfy a mixing notion that is substantially weaker than α-mixing. Let us assume that we have an ergodic dynamical system described by the sequence (F^n)_{n≥0} of iterates of an (essentially) unknown map F : M → M, where M ⊂ R^d is compact and the corresponding ergodic measure μ is assumed to be unique. Furthermore, assume that all observations x̃ of this dynamical system are corrupted by some stationary, R^d-valued, additive noise process E = (ε_n)_{n≥0} whose distribution ν we assume to be independent of the state, but otherwise unknown, too. In other words, all possible observations of the system at time n ≥ 0 are of the form

    x̃_n = F^n(x_0) + ε_n,    (1)

where x_0 is the true but unknown state at time 0. Now, given an observation of the system at some arbitrary time, our goal is to forecast the next observable …
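A small numerical sketch of the forecasting setup around equation (1): noisy observations of the logistic map are fed to kernel ridge regression with a Gaussian RBF kernel, used here as a stand-in for the paper's SVM (the map, noise level, and hyperparameters are illustrative assumptions):

```python
import math
import random

def rbf(x, y, gamma=25.0):
    """Gaussian RBF kernel on the real line."""
    return math.exp(-gamma * (x - y) ** 2)

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting (sketch-quality)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k]
                              for k in range(r + 1, n))) / M[r][r]
    return x

# Noisy observations x~_n = F^n(x_0) + eps_n of the logistic map
# F(x) = 3.8 x (1 - x), as in equation (1).
rng = random.Random(1)
state, obs = 0.3, []
for _ in range(60):
    obs.append(state + rng.gauss(0.0, 0.01))
    state = 3.8 * state * (1.0 - state)

# Learn the forecaster x~_{t+1} ~ f(x~_t) by kernel ridge regression
# (a close cousin of the least squares SVM): solve (K + lam*I) alpha = y.
X, Y = obs[:-1], obs[1:]
lam = 1e-3
K = [[rbf(xi, xj) + (lam if i == j else 0.0) for j, xj in enumerate(X)]
     for i, xi in enumerate(X)]
alpha = solve(K, Y)

def predict(q):
    return sum(a * rbf(q, xi) for a, xi in zip(alpha, X))

print(round(predict(0.5), 3))  # should land near F(0.5) = 0.95
```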
Concentration in unbounded metric spaces and algorithmic stability. arXiv:1309.1007
, 2013
Abstract

Cited by 1 (0 self)
We prove an extension of McDiarmid's inequality for metric spaces with unbounded diameter. To this end, we introduce the notion of the subgaussian diameter, which is a distribution-dependent refinement of the metric diameter. Our technique provides an alternative approach to Kutin and Niyogi's method of weakly difference-bounded functions, and yields nontrivial, dimension-free results in some interesting cases where the former does not. As an application, we give apparently the first generalization bound in the algorithmic stability setting that holds for unbounded loss functions. This yields a novel risk bound for some regularized metric regression algorithms. We give two extensions of the basic concentration result. The first enables one to replace the independence assumption by appropriate strong mixing. The second generalizes the subgaussian technique to other Orlicz norms.
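A quick simulation of the regime the paper targets: a function of unbounded (Gaussian) coordinates, where the classical bounded-differences assumption of McDiarmid's inequality fails but a subgaussian tail bound of the same exponential shape still holds. The chosen function (the empirical mean) and the constants are illustrative assumptions:

```python
import math
import random

def deviation_tail(f, dim, n_trials, t, rng):
    """Empirical estimate of P(|f(X) - mean f| >= t) for X with i.i.d.
    standard normal coordinates (unbounded support, so classical
    McDiarmid does not apply)."""
    vals = [f([rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(n_trials)]
    mean = sum(vals) / len(vals)
    return sum(1 for v in vals if abs(v - mean) >= t) / len(vals)

rng = random.Random(0)
n, t = 100, 0.2
# f = empirical mean of n coordinates; each coordinate change perturbs
# f by an unbounded (Gaussian) amount scaled by 1/n.
emp_tail = deviation_tail(lambda x: sum(x) / len(x), n, 5000, t, rng)
# Subgaussian tail for the mean of n standard normals: 2 exp(-n t^2 / 2).
bound = 2.0 * math.exp(-n * t * t / 2.0)
print(emp_tail, "<=", round(bound, 3))
```

The empirical tail frequency stays under the exponential bound, which is the qualitative behavior the subgaussian-diameter result recovers without bounded differences.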
Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data
Near-Optimal Approximation Rates for Distribution-Free Learning with Exponentially Strongly Mixing Observations
Abstract
This paper derives the rate of convergence for the distribution-free learning problem when the observation process is an exponentially strongly mixing (α-mixing with an exponential rate) Markov chain. For an exponentially strongly mixing Markov chain with stationary measure ρ, it is shown that the empirical estimate f_z that minimizes the discrete quadratic risk satisfies a bound in which E_{z∈Z^m}(·) is the expectation over the first m steps of the chain, f_ρ is the regressor function in L²(ρ_X) associated with ρ, r is related to the abstract smoothness of the regressor, ρ_X is the marginal measure associated with ρ, and a is the rate of concentration of the Markov chain.
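A toy instance of this setting (the AR(1) chain and the one-parameter linear class are illustrative assumptions): an AR(1) chain is geometrically ergodic, hence exponentially strongly mixing, and its regressor function is f_ρ(x) = φx, which the empirical minimizer of the discrete quadratic risk recovers from a single trajectory:

```python
import random

def ar1_chain(m, phi=0.5, seed=0):
    """AR(1) chain x_{t+1} = phi * x_t + N(0,1): geometrically ergodic,
    hence exponentially strongly (alpha-) mixing."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(m):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

xs = ar1_chain(20001, phi=0.5)
pairs = list(zip(xs[:-1], xs[1:]))   # samples (x_t, y_t) with y_t = x_{t+1}

# Empirical minimizer of the discrete quadratic risk over the linear
# class {x -> a*x}; closed form: a_hat = sum(x*y) / sum(x*x).
a_hat = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
print(round(a_hat, 2))  # converges to the regressor slope phi = 0.5
```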
Learnability of Non-I.I.D.
Abstract
Learnability has always been one of the most central problems in learning theory. Most previous studies on this issue were based on the assumption that the samples are drawn independently and identically according to an underlying (unknown) distribution. The i.i.d. assumption, however, does not hold in many real applications. In this paper, we study the learnability of problems where the samples are drawn from the empirical process of a stationary β-mixing sequence, a widely used assumption implying a dependence that weakens over time in the training samples. By utilizing the independent blocks technique, we provide a necessary and sufficient condition for learnability: average stability is equivalent to learnability with AERM (Asymptotic Empirical Risk Minimization) in the non-i.i.d. learning setting. In addition, we also discuss the generalization error when the test variable is dependent on the training sample.
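The independent blocks technique used in the proof can be pictured concretely: the dependent sequence is cut into alternating blocks, and only every other block is kept, so that for a β-mixing process the kept blocks behave like independent draws up to a mixing-coefficient correction (the block length here is an arbitrary choice for illustration):

```python
def independent_blocks(seq, block_len):
    """Split a sequence into consecutive blocks of length block_len and
    keep every other block; the dropped blocks act as gaps that make
    the kept blocks approximately independent under beta-mixing."""
    blocks = [seq[i:i + block_len] for i in range(0, len(seq), block_len)]
    return blocks[0::2], blocks[1::2]

kept, dropped = independent_blocks(list(range(12)), 3)
print(kept)     # [[0, 1, 2], [6, 7, 8]]
print(dropped)  # [[3, 4, 5], [9, 10, 11]]
```

The analysis then applies i.i.d.-style arguments to the kept blocks and pays a penalty proportional to the β-mixing coefficient at the gap length.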