## Algorithmic stability and meta-learning

### Download Links

- [andreas-maurer.eu]
- [jmlr.csail.mit.edu]
- [www.eecs.berkeley.edu]
- [jmlr.org]
- [www.cs.berkeley.edu]
- DBLP

### Other Repositories/Bibliography

Venue: J. Machine Learning Research

Citations: 7 (1 self)

### Citations

13219 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...ges of regularisation and we stick to ERM for definiteness and motivation. The traditional method to give generalization error bounds for such algorithms is described in (Anthony, Bartlett, 1999) or (Vapnik, 1995) and involves the study of the complexity of the function space F_H = {z ↦ l(c,z) : c ∈ H} in terms of covering numbers or related quantities, and proceeds to prove a uniform bound on the estimati... |

2210 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

1324 | A Probabilistic Theory of Pattern Recognition - Devroye, Györfi, Lugosi - 1996 |

731 | Stacked generalization. - Wolpert - 1992 |

265 | Stability and generalization
- Bousquet, Elisseeff
Citation Context ...(Z), ∀δ > 0: D^m {S : R(A(S); D) ≤ B(δ; S)} ≥ 1 − δ. Estimators and Algorithmic Stability. The leave-one-out estimator l_loo and the empirical estimator l_emp are the functions (the notation is from [4]) l_loo, l_emp : A(C,Z) × Z^m → [0,M], defined for A ∈ A(C,Z) and S = (z1, ..., zm) ∈ Z^m by l_loo(A, S) = (1/m) Σ_{i=1}^m l(A(S\i), zi), where S\i generally denotes the sample S with the i-th element ... |
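The two estimators defined in this context translate directly into code. A minimal illustration, assuming squared loss and a toy "predict the sample mean" algorithm (both choices are ours, standing in for an arbitrary algorithm A):

```python
import numpy as np

def mean_predictor(S):
    """Toy learning algorithm A: ignore x, predict the mean label of the sample."""
    ys = np.array([y for _, y in S])
    return lambda x: ys.mean()

def sq_loss(h, z):
    x, y = z
    return (h(x) - y) ** 2

def l_emp(A, S):
    """Empirical (resubstitution) estimator: average loss of A(S) on S itself."""
    h = A(S)
    return float(np.mean([sq_loss(h, z) for z in S]))

def l_loo(A, S):
    """Leave-one-out estimator: train on S without z_i, test on z_i, average over i."""
    m = len(S)
    return float(np.mean([sq_loss(A(S[:i] + S[i + 1:]), S[i]) for i in range(m)]))

S = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
print(l_emp(mean_predictor, S))  # 2/3: mean label 2, losses (1, 0, 1)
print(l_loo(mean_predictor, S))  # 1.5: held-out losses (2.25, 0, 2.25)
```

As usual, l_emp is optimistically biased (the held-out losses in l_loo are strictly larger here), which is why stability arguments are needed to relate either estimator to the true risk.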

193 | A model of inductive bias learning.
- Baxter
- 2000
Citation Context ...earner can possibly find an algorithm to outperform other algorithms, but, of course, only on average over the distribution E. This mechanism of meta-learning has been analysed by Jonathan Baxter ([2], [3]) and there have been several successful experiments in practical machine-learning contexts (see [5], [14], [15] and Section 6). In this paper we extend the results in [3] and offer a general method to co... |

140 | Some PAC-Bayesian theorems. - McAllester - 1999 |

58 | Almost-everywhere algorithmic stability and generalization error - Kutin, Niyogi - 2002 |

56 | Representation, similarity, and the chorus of prototypes.
- Edelman
- 1995
Citation Context ...the theorem becomes non-trivial. We apply these results to a practical meta-algorithm for least squares regression. This meta-algorithm is related to the Chorus of Prototypes introduced by Edelman in [8], so we call it CP-Regression. CP-Regression takes the meta-sample S = (S1, ..., Sn) and uses a primitive algorithm A0 to compute a set of corresponding regression functions h1, ..., hn. For any new inp... |
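The CP-Regression recipe described here (per-task regressors h1, ..., hn from a primitive algorithm A0, then learning on top of their outputs) can be sketched as follows. Everything concrete below — ridge regression as A0, reusing A0 for the top layer, all function names — is our assumption for illustration; the paper's exact construction may differ:

```python
import numpy as np

def ridge(X, y, lam=1e-2):
    """Primitive algorithm A0: regularized least-squares (ridge) regression."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return lambda Xn: Xn @ w

def cp_regression(meta_sample, new_task, lam=1e-2):
    """Sketch of CP-Regression: fit one regressor h_i per past task S_i with A0,
    represent new-task inputs by the 'chorus' vector (h_1(x), ..., h_n(x)),
    and run A0 again in that representation."""
    prototypes = [ridge(Xi, yi, lam) for Xi, yi in meta_sample]   # h_1, ..., h_n
    feats = lambda X: np.column_stack([h(X) for h in prototypes])
    Xn, yn = new_task
    top = ridge(feats(Xn), yn, lam)
    return lambda X: top(feats(X))
```

On toy linear tasks y = a·x with slopes 2 and 3, the learned chorus representation lets the top-level regressor recover an intermediate new task (slope 2.5) from the same inputs.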

39 |
An introduction to support vector machines.
- Cristianini, Shawe-Taylor
- 2002
Citation Context ...lgorithm for Regression. In this section we present a meta-learning algorithm for function estimation. The algorithm is based on regularized least-squares regression, or ridge regression (as in [4] or [6]), and preliminary experiments appear promising. To implicitly also define a kernelised version of the algorithm, we describe it in a setting where the input space is a subset X of the unit ball ... |
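A kernelised regularized least-squares regressor of the kind referenced here can be sketched as below. The (K + λmI)⁻¹ convention and the linear kernel are our assumptions — one standard formulation, not necessarily the paper's:

```python
import numpy as np

def kernel_ridge(S, k, lam):
    """Kernelised regularized least squares on a sample S = [(x1,y1), ..., (xm,ym)]:
    h(x) = sum_i alpha_i k(x_i, x) with alpha = (K + lam*m*I)^{-1} y.
    (The lam*m scaling is one common convention.)"""
    xs = [x for x, _ in S]
    y = np.array([y for _, y in S])
    m = len(S)
    K = np.array([[k(a, b) for b in xs] for a in xs])
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)
    return lambda x: float(sum(a * k(xi, x) for a, xi in zip(alpha, xs)))

linear = lambda u, v: float(np.dot(u, v))  # linear kernel; inputs kept in the unit ball
```

Replacing `linear` with any positive-definite kernel (e.g. Gaussian) gives the nonlinear version without changing the solver.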

17 |
Lifelong learning algorithms. Learning to learn,
- Thrun
- 1997
Citation Context ...E[Dm(S)]. The accumulation of experience is then modelled by n independent draws of samples Si ∼ D_E, resulting in the sample-sequence or meta-sample S = (S1, ..., Sn) (also called 'support sets' by S. Thrun, 1998, or (n,m)-samples by J. Baxter, 2000). The probability for S to arise in this manner is (D_E)^n(S) and depends completely on the environment E. We generally use m to denote the size of the ordinary s... |
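The (n,m)-sampling process described in this context is easy to mirror in code. A toy sketch, with an invented environment E (slopes of noisy linear tasks) standing in for the abstract distribution over task distributions:

```python
import random

def draw_task(rng):
    """One draw D_i from a toy environment E: fix a slope a ~ Uniform[0, 2];
    the task distribution then generates points (x, a*x + noise).
    The environment is entirely our invention, for illustration."""
    a = rng.uniform(0.0, 2.0)
    def sample_point():
        x = rng.uniform(-1.0, 1.0)
        return (x, a * x + rng.gauss(0.0, 0.1))
    return sample_point

def meta_sample(n, m, seed=0):
    """An (n, m)-sample S = (S_1, ..., S_n): n independent tasks D_i ~ E,
    then m i.i.d. points S_i ~ (D_i)^m from each."""
    rng = random.Random(seed)
    S = []
    for _ in range(n):
        sample_point = draw_task(rng)                  # D_i ~ E
        S.append([sample_point() for _ in range(m)])   # S_i ~ (D_i)^m
    return S
```

Note that the slope is drawn once per task, not once per point — that two-stage structure is exactly what makes S an (n,m)-sample rather than n·m i.i.d. points.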

16 |
Learning in Neural Networks: Theoretical Foundations.
- Anthony, Bartlett
- 1999
Citation Context ...not mapped is the transfer risk R(A; E). Correspondingly, an estimator prediction bound is not a generalization error bound for the transfer risk. Covering Numbers (these definitions are taken from [1]). Let X be a set, X0 ⊆ X. For ε > 0 and a metric d on X the covering numbers N(ε, X0, d) are defined by N(ε, X0, d) = min{N ∈ ℕ : ∃(x1, ..., xN) ∈ X^N, ∀x ∈ X0, ∃i, d(x, xi) ≤ ε}. For a class F of ... |
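For a finite X0, the covering number N(ε, X0, d) can be bounded constructively by greedily taking uncovered points as centers. A small sketch (the greedy cover is a valid ε-cover but generally not a minimal one, so its size is an upper bound on N):

```python
def covering_number(X0, d, eps):
    """Greedy eps-cover of a finite set X0 under metric d: repeatedly take an
    uncovered point as a center and discard everything within eps of it.
    Returns the cover's size, an upper bound on N(eps, X0, d)."""
    centers = []
    uncovered = list(X0)
    while uncovered:
        c = uncovered[0]
        centers.append(c)
        uncovered = [x for x in uncovered if d(x, c) > eps]
    return len(centers)

d = lambda a, b: abs(a - b)
print(covering_number([i / 10 for i in range(11)], d, 0.25))  # 4 centers for {0, 0.1, ..., 1.0}
```

A useful side effect: the greedy centers are pairwise more than ε apart, so the same routine also witnesses an ε-packing of X0.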

14 |
Almost-everywhere Algorithmic Stability and Generalization Error
- Kutin, Niyogi
- 2002
Citation Context ...ithms have simple bounds on their estimation error. Corresponding theorems can be found in [4]. The requirement of stability has been weakened and the results have been extended by Niyogi and Kutin in [10]. If for some β and all S the algorithm A(S) is uniformly β-stable, then the estimation term in (6) can be bounded in a particularly simple way, namely by 2β, as stated in Theorem 6. Results. Algorithmi... |

7 | Theoretical Models of Learning to Learn, in Learning to Learn, S.Thrun, L.Pratt Eds - Baxter - 1998 |

4 |
Multitask Learning,” Learning to
- Caruana
- 1998
Citation Context ...er the distribution E. This mechanism of meta-learning has been analysed by Jonathan Baxter (1998, 2000) and there have been several successful experiments in practical machine-learning contexts (see Caruana, 1998; Thrun, 1996, 1998, and Section 6). In this paper we extend the results in Baxter (2000) and offer a general method to control the generalization error of meta-learning. We begin by reviewing some no... |

1 |
Transfer in Cognition, in Learning to Learn
- Robins
- 1998
Citation Context ...mally study the phenomenon of transfer, where novel tasks and concepts are learned more quickly and reliably through the application of past experience. Transfer is fundamental to human learning (see Robins, 1998, for an overview of the psychological literature) and offers a way to partially escape the implications of the No Free Lunch Theorem (NFLT). The NFLT states that no algorithm is superior to another w... |

1 |
Explanation-Based Neural Network
- Thrun
- 1996
Citation Context ...tion E. This mechanism of meta-learning has been analysed by Jonathan Baxter (1998, 2000) and there have been several successful experiments in practical machine-learning contexts (see Caruana, 1998; Thrun, 1996, 1998, and Section 6). In this paper we extend the results in Baxter (2000) and offer a general method to control the generalization error of meta-learning. We begin by reviewing some notions of lear... |