Abstract. A neural tree is a feedforward neural network with at most one edge outgoing from each node. We investigate the number of examples that a learning algorithm needs when using neural trees as the hypothesis class. We give bounds for this sample complexity in terms of the VC dimension. We consider trees consisting of threshold, sigmoidal, and linear gates. In particular, we show that the class of threshold trees and the class of sigmoidal trees on n inputs both have VC dimension Ω(n log n). This bound is asymptotically tight for the class of threshold trees. We also present an upper bound for this class where the constants involved are considerably smaller than in a previous calculation. Finally, we argue that the VC dimension of threshold or sigmoidal trees cannot become larger by allowing the nodes to compute linear functions. This sheds some light on a recent result that exhibited neural networks with quadratic VC dimension.
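
To make the definition concrete, the following is a minimal illustrative sketch of a threshold tree, not code from the paper. The names `Gate`, `evaluate`, `nodes`, and `is_tree` are hypothetical, and the convention that a gate outputs 1 when its weighted sum reaches the threshold is an assumption. The sketch evaluates a small tree and checks the defining property that every node (input or gate) has at most one outgoing edge.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Gate:
    """Hypothetical threshold gate: outputs 1 iff sum(w_i * child_i) >= threshold."""
    weights: List[float]
    threshold: float
    children: List[Union[int, "Gate"]]  # an int is the index of an input variable


def evaluate(node, x):
    """Evaluate the network rooted at `node` on the input vector `x`."""
    if isinstance(node, int):
        return x[node]
    total = sum(w * evaluate(c, x) for w, c in zip(node.weights, node.children))
    return 1 if total >= node.threshold else 0


def nodes(node):
    """List one entry per incoming use of each input or gate below `node`."""
    if isinstance(node, int):
        return [("input", node)]
    found = [("gate", id(node))]
    for child in node.children:
        found += nodes(child)
    return found


def is_tree(root):
    """True iff no input and no gate feeds more than one edge."""
    seen = nodes(root)
    return len(seen) == len(set(seen))


# (x0 AND x1) OR x2, built from two threshold gates; each input is used once.
and_gate = Gate(weights=[1.0, 1.0], threshold=2.0, children=[0, 1])
tree = Gate(weights=[1.0, 1.0], threshold=1.0, children=[and_gate, 2])

# A network that reads x0 twice is not a neural tree under this definition.
not_tree = Gate(weights=[1.0, 1.0], threshold=1.0, children=[0, 0])
```

Here `is_tree(tree)` holds because each of the three inputs and the inner gate has exactly one outgoing edge, whereas `not_tree` violates the condition by using input 0 twice.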