  A model of inductive bias learning (2000) [49 citations — 0 self]

by Jonathan Baxter
Journal of Artificial Intelligence Research
http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume12/baxter00a.pdf

Abstract:

A major problem in machine learning is that of inductive bias: how to choose a learner’s hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.
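The idea of searching for a hypothesis space that works well across many sampled tasks can be illustrated with a toy sketch. This is not the paper's construction: the family of hypothesis spaces (one space of threshold classifiers per input feature), the synthetic task generator, and all names (`make_task`, `best_error_in_space`, `learn_bias`) are assumptions made up for illustration. The sketch picks the space whose best-in-space empirical error, averaged over the training tasks, is smallest.

```python
import random


def make_task(theta, m, d, rng):
    """Hypothetical task: m examples with d uniform features.

    The label depends only on feature 0, so all tasks share the same
    relevant feature and differ only in their threshold theta.
    """
    xs = [[rng.random() for _ in range(d)] for _ in range(m)]
    ys = [1 if x[0] >= theta else 0 for x in xs]
    return xs, ys


def best_error_in_space(xs, ys, k):
    """Smallest empirical error achievable in H_k = {x -> [x[k] >= t]}."""
    # Trying every observed value plus +inf covers all distinct labelings
    # that a threshold on feature k can produce on this sample.
    candidates = sorted(x[k] for x in xs) + [float("inf")]
    best = 1.0
    for t in candidates:
        mistakes = sum((1 if x[k] >= t else 0) != y for x, y in zip(xs, ys))
        best = min(best, mistakes / len(xs))
    return best


def learn_bias(tasks, d):
    """Choose the hypothesis space with the lowest average best-in-space error."""
    avg_errors = []
    for k in range(d):
        errs = [best_error_in_space(xs, ys, k) for xs, ys in tasks]
        avg_errors.append(sum(errs) / len(errs))
    chosen = min(range(d), key=lambda k: avg_errors[k])
    return chosen, avg_errors


rng = random.Random(0)
thetas = [rng.uniform(0.2, 0.8) for _ in range(5)]      # 5 training tasks
tasks = [make_task(theta, 30, 4, rng) for theta in thetas]
chosen, avg_errors = learn_bias(tasks, 4)
print("chosen feature:", chosen)
print("average errors per space:", avg_errors)
```

Because every task's labels are a noise-free threshold on feature 0, the space built on feature 0 reaches zero empirical error on each task and is the one selected. A learner facing a new task from the same toy environment would then only need to fit a single threshold within that space.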

Citations

1376 A theory of the learnable – Valiant - 1984
641 Estimation of Dependences Based on Empirical Data – Vapnik - 1982
636 Statistical Decision Theory and Bayesian Analysis – Berger - 1985
578 A Probabilistic Theory of Pattern Recognition – Devroye, Gyorfi, et al. - 1996
541 Learnability and the Vapnik-Chervonenkis Dimension – Blumer, Ehrenfeucht, et al. - 1989
505 Bayesian Data Analysis – Gelman, Carlin, et al. - 2003
335 Convergence of Stochastic Processes – Pollard - 1984
325 Decision theoretic generalizations of the PAC model for neural net and other learning applications – Haussler - 1992
197 Neural network learning: theoretical foundations – Anthony, Bartlett - 1999
178 On the density of families of sets – Sauer - 1972
148 Real Analysis and Probability – Dudley - 2002
145 Transfer of learning by composing solutions of elemental tasks – Singh - 1992
141 Multitask Learning – Caruana - 1997
131 The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network – Bartlett - 1998
127 Classical Descriptive Set Theory – Kechris - 1994
113 Shift of bias for inductive concept-learning – Utgoff - 1986
94 Probability Measures on Metric Spaces – Parthasarathy - 1967
82 Finding structure in reinforcement learning – Thrun, Schwartz - 1995
66 Learning Internal Representations – Baxter - 1995
65 A course on empirical processes – Dudley - 1984
52 Adapting bias by gradient descent: an incremental version of delta-bar-delta – Sutton - 1992
51 Is learning the n-th thing any easier than learning the first – Thrun - 1996
49 Discovering Structure in Multiple Learning Tasks: The TC Algorithm – O’Sullivan, Thrun - 1996
48 Learning one more thing – Thrun, Mitchell - 1994
45 Learning to learn – Thrun, Pratt - 1997
39 The Use of Knowledge in Analogy and Induction – Russell - 1989
38 Discriminability-based transfer between neural networks – Pratt - 1993
30 A method for learning from hints – Abu-Mostafa - 1993
28 A bayesian/information theoretic model of learning to learn via multiple task sampling – Baxter - 1997
26 Symbolic-neural Systems and the Use of Hints for Developing Complex Systems – Suddarth, Holden - 1991
25 Rule-injection Hints as a Means of Improving Network Performance and Learning Time – Suddarth, Kergosien - 1990
19 Layered concept-learning and dynamically-variable bias management – Rendell, Seshu, et al. - 1987
18 Some history of the hierarchical Bayesian methodology – Good - 1980
16 The canonical distortion measure for vector quantization and function approximation – Baxter - 1997
15 The parallel transfer of task knowledge using dynamic learning rates based on a measure of relatedness – Silver, Mercer - 1996
14 Adaptive Generalisation and the Transfer of Knowledge. University of Exeter: R257 – Sharkey, Sharkey - 1992
13 Solving a huge number of similar tasks: a combination of multi-task learning and a hierarchical bayesian approach – Heskes - 1998
10 How to make a low-dimensional representation suitable for diverse tasks – Intrator, Edelman - 1996
9 The need for biases in learning generalisations – Mitchell - 1980
8 Distribution inequalities for the binomial law – Slud - 1977
5 Repeat learning using predicate invention – Khan, Muggleton, et al. - 1998
4 The canonical distortion measure in feature space and 1-NN classification – Baxter, Bartlett - 1998
1 Lower bounds on the VC-dimension of multi-layer threshold networks – Bartlett - 1993
1 Staged learning – Langford - 1999
1 Continual Learning in Reinforcement Environments. R. Oldenbourg Verlag – Ring - 1995
1 On a double inequality of the normal distribution – Tate - 1953