A major problem in machine learning is that of inductive bias: how to choose a learner’s hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.