Results 11 - 20 of 27
On the Complexity of Learning for Spiking Neurons with Temporal Coding
, 1999
Cited by 16 (4 self)
Spiking neurons are models for the computational units in biological neural systems where information is considered to be encoded mainly in the temporal patterns of their activity. In a network of spiking neurons a new set of parameters becomes relevant which has no counterpart in traditional neural network models: the time that a pulse needs to travel through a connection between two neurons (also known as the delay of a connection). It is known that these delays are tuned in biological neural systems through a variety of mechanisms. In this ...
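To make the role of connection delays concrete, here is a toy threshold neuron in Python. It assumes, as a simplification not taken from the abstract, that each input pulse adds its weight to the neuron's potential from the moment it arrives (a step-shaped response) and that the neuron fires the first time the potential reaches a threshold; `firing_time` and its parameters are hypothetical names. Changing only the delays moves the firing time, which is the sense in which delays act as additional tunable parameters.

```python
from typing import Optional, Sequence


def firing_time(
    input_times: Sequence[float],
    delays: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Return the first time the neuron's potential reaches `threshold`, or None.

    Simplified model (an assumption): a pulse emitted at time t_i on a
    connection with delay d_i arrives at t_i + d_i and from then on adds w_i
    to the potential.
    """
    arrivals = sorted((t + d, w) for t, d, w in zip(input_times, delays, weights))
    potential = 0.0
    i = 0
    while i < len(arrivals):
        now = arrivals[i][0]
        # add every pulse arriving at exactly this time before testing the threshold
        while i < len(arrivals) and arrivals[i][0] == now:
            potential += arrivals[i][1]
            i += 1
        if potential >= threshold:
            return now
    return None
```

For example, two pulses emitted at time 0 with unit weights and threshold 2 make the neuron fire at time 3 when the delays are 1 and 3, but at time 1 when both delays are 1.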
On Efficient Agnostic Learning of Linear Combinations of Basis Functions
- In Proceedings of the Eighth Annual Conference on Computational Learning Theory
, 1995
Cited by 12 (3 self)
We consider efficient agnostic learning of linear combinations of basis functions when the sum of absolute values of the weights of the linear combinations is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of $\{0,1\}$-valued function classes by showing that if a class of $\{0,1\}$-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss fun...
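The hypothesis class in this abstract can be written down directly. A minimal Python sketch, assuming real-valued basis functions and a hypothetical bound `bound` on the sum of absolute weights: it builds $f(x) = \sum_i w_i h_i(x)$, checks the constraint $\sum_i |w_i| \le B$, and measures the empirical quadratic loss. It says nothing about how an efficient agnostic learner would choose the weights; all names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

BasisFn = Callable[[float], float]


def combine(weights: Sequence[float], basis: Sequence[BasisFn]) -> BasisFn:
    """Return the linear combination x -> sum_i w_i * h_i(x)."""
    return lambda x: sum(w * h(x) for w, h in zip(weights, basis))


def in_bounded_class(weights: Sequence[float], bound: float) -> bool:
    """Check the constraint sum_i |w_i| <= bound on the absolute weights."""
    return sum(abs(w) for w in weights) <= bound


def empirical_quadratic_loss(f: BasisFn, sample: List[Tuple[float, float]]) -> float:
    """Mean of (f(x) - y)^2 over the labelled sample."""
    return sum((f(x) - y) ** 2 for x, y in sample) / len(sample)
```

With basis functions $h_1(x) = x$, $h_2(x) = 1$ and weights $(0.5, -0.25)$, the combination has $\sum_i |w_i| = 0.75$ and evaluates to $0.75$ at $x = 2$.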
On the complexity of learning on feedforward neural nets
- in Proc. EATCS Advanced School on Computational Learning and Cryptography, Vietri sul Mare
, 1993
Cited by 8 (0 self)
This paper discusses, within the framework of computational learning theory, the current state of knowledge and some open problems in three areas of research about learning on feedforward neural nets: neural nets that learn from mistakes; bounds for the Vapnik-Chervonenkis dimension of neural nets; and agnostic PAC-learning of functions on neural nets. All relevant definitions are given in this paper, and no previous knowledge about computational learning theory or neural nets is required. We refer to [RSO] for further introductory material and survey papers about the complexity of learning on neural nets. Throughout this paper we consider the following rather general notion of a (feedforward) neural net. Definition 1.1 A network architecture (or "neural net") $N$ is a labeled acyclic directed graph. Its nodes of fan-in 0 ("input nodes"), as well as its nodes of fan-out 0 ("output nodes"), are labeled by natural numbers. A node $g$ in $N$ with fan-in $r > 0$ is called a computation node (or gate), and it is labeled by some activation function $\gamma_g : \mathbb{R} \to \mathbb{R}$, some polynomial $Q_g(y_1, \ldots, y_r)$, and a subset $P_g$ of the coefficients of this polynomial (if $P_g$ is not separately specified we assume that $P_g$ consists of all coefficients of $Q_g$). One says that $N$ is of order $v$ if all polynomials $Q_g$ in $N$ are of degree $v$. The coefficients in the sets $P_g$ for the gates $g$ in $N$ are called the programmable parameters of $N$. Assume that $N$ has $w$ programmable parameters, that some numbering of these has been fixed, and that values for all non-programmable parameters have been assigned. Furthermore, assume that $N$ has $d$ input nodes and $l$ output nodes. Then each assignment $\alpha \in \mathbb{R}^w$ of reals to the programmable parameters in $N$ defines an analog circuit $N^{\alpha}$, which computes a function $x \mapsto N^{\alpha}$...
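The definition can be turned into a small evaluator. The Python sketch below assumes, beyond what the excerpt states, that a gate $g$ outputs $\gamma_g(Q_g(y_1, \ldots, y_r))$ applied to the values $y_1, \ldots, y_r$ of its predecessors. The polynomials are plain callables with their coefficients baked in, so the programmable parameters are not exposed separately. `Gate` and `evaluate` are hypothetical names; `graphlib` requires Python 3.9 or later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Sequence


@dataclass
class Gate:
    inputs: List[str]                         # predecessor nodes; fan-in r = len(inputs) > 0
    poly: Callable[[Sequence[float]], float]  # Q_g(y_1, ..., y_r), coefficients fixed inside
    activation: Callable[[float], float]      # gamma_g: R -> R


def evaluate(
    input_nodes: List[str],
    gates: Dict[str, Gate],
    output_nodes: List[str],
    x: Sequence[float],
) -> List[float]:
    """Compute the output values of the circuit on input x (one value per input node)."""
    values: Dict[str, float] = dict(zip(input_nodes, x))
    # map each gate to its predecessors; input nodes appear only as predecessors
    graph = {name: gate.inputs for name, gate in gates.items()}
    # static_order() yields predecessors first and raises CycleError on a cycle
    for node in TopologicalSorter(graph).static_order():
        gate = gates.get(node)
        if gate is not None:
            values[node] = gate.activation(gate.poly([values[u] for u in gate.inputs]))
    return [values[o] for o in output_nodes]
```

For instance, a threshold gate `h` computing $2x_1 - x_2 - 1$ feeding an output gate with the degree-2 polynomial $h \cdot x_1 + 0.5$ and identity activation maps the input $(1, 0.5)$ to $1.5$.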
Minimizing Disagreement for Geometric Regions Using Dynamic Programming, with Applications to Machine Learning and Computer Graphics
, 1996
Cited by 7 (2 self)
We demonstrate that the dynamic programming paradigm is an effective tool in the design of efficient algorithms for solving minimum-disagreement problems for convex polygons, star-shaped polygons, unions of axis-parallel boxes and various other classes of geometric regions. In particular, we show that the minimizing disagreement problem for convex $k$-gons on a sample of size $n$ can be solved in $O(n^6 k)$ time. Together with earlier known results, we obtain algorithms for learning these geometric regions in the agnostic PAC learning model and the PAC model with random classification noise. Furthermore, these algorithms also allow us to track slowly drifting concepts from these geometric regions. Most of these algorithms can be naturally adapted to solve related discrepancy problems that have applications in image compression, geometrical clustering and numerical integration. 1 Introduction 1.1 The Minimum Disagreements Problem For a collection $S$ of $n$ points in $\mathbb{R}^d$, each point being l...
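The algorithm for convex $k$-gons is beyond a short example, but the one-dimensional special case shows both the problem and the dynamic-programming flavour. Assuming distinct coordinates and labels in $\{+1, -1\}$, labelling an interval $+1$ and everything outside it $-1$ gives (negatives inside) + (positives outside) disagreements, which equals (total positives) minus (positives inside minus negatives inside). Minimizing disagreement is therefore a maximum-subarray problem, solved here with Kadane's recurrence. This Python sketch is our illustration, not the paper's algorithm; `min_disagreement_interval` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def min_disagreement_interval(
    points: List[Tuple[float, int]],
) -> Tuple[int, Optional[Tuple[float, float]]]:
    """points: (coordinate, label) pairs with distinct coordinates and label +1 or -1.

    Returns (disagreements, (lo, hi)) for the best closed interval labelled +1;
    the interval is None when labelling everything -1 is optimal.
    """
    pts = sorted(points)
    total_pos = sum(1 for _, y in pts if y == 1)
    best_gain, best_span = 0, None  # gain 0 corresponds to the empty interval
    cur_gain, cur_start = 0, 0
    for i, (_, y) in enumerate(pts):
        # Kadane: restart the run at i if the run so far does not help
        if cur_gain <= 0:
            cur_gain, cur_start = 0, i
        cur_gain += y  # +1 fixes an error on a positive, -1 creates one on a negative
        if cur_gain > best_gain:
            best_gain, best_span = cur_gain, (cur_start, i)
    if best_span is None:
        return total_pos, None
    lo, hi = best_span
    return total_pos - best_gain, (pts[lo][0], pts[hi][0])
```

The run time is $O(n \log n)$ for the sort plus $O(n)$ for the scan.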
On the Generalisation of Soft Margin Algorithms
- IEEE Transactions on Information Theory
, 2000
Cited by 6 (2 self)
Generalisation bounds depending on the margin of a classifier are a relatively recent development. They provide an explanation of the performance of state-of-the-art learning systems such as Support Vector Machines (SVM) [12] and Adaboost [24]. The difficulty with these bounds has been either their lack of robustness or their looseness. The question of whether the generalisation of a classifier can be more tightly bounded in terms of a robust measure of the distribution of margin values has remained open for some time. The paper answers this open question in the affirmative; furthermore, the analysis leads to bounds that motivate the previously heuristic soft margin SVM algorithms and justify the use of the quadratic loss in neural network training algorithms. The results are extended to give bounds for the probability of failing to achieve a target accuracy in regression prediction, with a statistical analysis of Ridge Regression and Gaussian Processes as a special case. The analysis presented in the paper has also led to new boosting algorithms described elsewhere [7].
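The margin-distribution quantity behind soft-margin bounds can be computed directly. The sketch assumes labels $y_i \in \{-1, +1\}$, real-valued outputs $f(x_i)$, a target margin $\gamma$, and the slack $\xi_i = \max(0, \gamma - y_i f(x_i))$ measuring how far example $i$ falls short of that margin. Taking the 2-norm of the slack vector, which ties in with the quadratic loss mentioned above, is our reading of the paper's robust margin measure and should be checked against it; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def margin_slacks(scores: Sequence[float], labels: Sequence[int], gamma: float) -> List[float]:
    """xi_i = max(0, gamma - y_i * f(x_i)): shortfall of each example from margin gamma."""
    return [max(0.0, gamma - y * s) for s, y in zip(scores, labels)]


def slack_2norm(scores: Sequence[float], labels: Sequence[int], gamma: float) -> float:
    """Euclidean norm of the slack vector (assumed robust margin measure)."""
    return math.sqrt(sum(xi * xi for xi in margin_slacks(scores, labels, gamma)))
```

Examples classified with margin at least $\gamma$ contribute zero slack, so a few badly placed points raise the measure gradually instead of collapsing the minimum margin, which is the robustness issue the abstract refers to.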
C-Net: A Method for Generating Non-deterministic and Dynamic Multi-variate Decision Trees
, 2001
Cited by 3 (3 self)
Despite the fact that artificial neural networks (ANNs) are universal function approximators, their black-box nature (that is, their lack of direct interpretability or expressive power) limits their utility. In contrast, univariate decision trees (UDTs) have expressive power, though they are usually not as accurate as ANNs. We propose C-Net, a method that improves both the expressiveness of ANNs and the accuracy of UDTs by consolidating both technologies to generate multivariate decision trees (MDTs). In addition, we introduce a new concept, recurrent decision trees, where C-Net uses recurrent neural networks to generate an MDT with a recurrent feature: a memory is associated with each node in the tree, with a recursive condition replacing the conventional linear one. Furthermore, we show empirically that, in our test cases, our proposed method achieves a balance of comprehensibility and accuracy intermediate between ANNs and UDTs. MDTs are found to be intermediate since they are more expressive than ANNs and more accurate than UDTs. Moreover, in all cases MDTs are more compact (i.e. smaller tree size) than UDTs.
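The difference between univariate and multivariate trees comes down to the test at each node. In this Python sketch (hypothetical names; it is not C-Net itself and omits the recurrent nodes), a node compares a linear combination $w \cdot x$ with a threshold, and a univariate node is the special case with a single nonzero weight. One multivariate node already captures a diagonal boundary such as $x_0 + x_1 \ge 1$, which axis-parallel univariate splits can only approximate with a staircase.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    weights: Sequence[float]  # univariate split: exactly one nonzero weight
    threshold: float
    left: "Tree"              # taken when w . x < threshold
    right: "Tree"             # taken when w . x >= threshold


Tree = Union[Leaf, Node]


def predict_tree(tree: Tree, x: Sequence[float]) -> str:
    """Follow the linear tests from the root down to a leaf."""
    while isinstance(tree, Node):
        s = sum(w * xi for w, xi in zip(tree.weights, x))
        tree = tree.right if s >= tree.threshold else tree.left
    return tree.label
```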
Relevance Determination in Learning Vector Quantization
- Proc. of European Symposium on Artificial Neural Networks
, 2001
Cited by 3 (2 self)
We propose a method to automatically determine the relevance of the input dimensions of a learning vector quantization (LVQ) architecture during training. The method is based on Hebbian learning and introduces weighting factors for the input dimensions, which are automatically adapted to the specific problem. The benefits are twofold. On the one hand, incorporating relevance factors in the LVQ architecture increases the overall classification performance and adapts the metric to the specific data used for training. On the other hand, the method induces a pruning algorithm, i.e. an automatic detection of the input dimensions that do not contribute to the overall classifier. Hence we obtain a possibly more efficient classification and gain insight into the role of the data dimensions.
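A minimal Python sketch of relevance-weighted LVQ, to show how per-dimension weighting factors and pruning fit together. The abstract only says the rule is Hebbian and adapts one factor per input dimension; the specific update below is an assumption in the spirit of relevance LVQ, not quoted from the paper. It uses a relevance-weighted squared distance, an LVQ1 prototype step, a relevance step that shrinks $\lambda_j$ in proportion to $|x_j - w_j|$ when the winning prototype is correct (and grows it when wrong), then clipping and renormalization. `rlvq_step`, `prune_dims` and the learning rates are hypothetical.

```python
from typing import List, Sequence


def weighted_dist(x: Sequence[float], w: Sequence[float], rel: Sequence[float]) -> float:
    """Relevance-weighted squared distance sum_j rel_j * (x_j - w_j)^2."""
    return sum(r * (xi - wi) ** 2 for r, xi, wi in zip(rel, x, w))


def rlvq_step(x, y, prototypes: List[List[float]], proto_labels, rel: List[float],
              lr_w: float = 0.05, lr_rel: float = 0.01) -> List[float]:
    """One training step; updates `prototypes` in place and returns new relevances."""
    k = min(range(len(prototypes)), key=lambda i: weighted_dist(x, prototypes[i], rel))
    sign = 1.0 if proto_labels[k] == y else -1.0
    w = prototypes[k]
    diff = [abs(xi - wi) for xi, wi in zip(x, w)]  # per-dimension distance to the winner
    # LVQ1: move the winner towards x if its label is correct, away otherwise
    prototypes[k] = [wi + sign * lr_w * (xi - wi) for xi, wi in zip(x, w)]
    # assumed Hebbian-style relevance update, clipped at 0 and renormalized to sum 1
    new_rel = [max(0.0, r - sign * lr_rel * d) for r, d in zip(rel, diff)]
    total = sum(new_rel)
    return [r / total for r in new_rel] if total > 0 else list(rel)


def prune_dims(rel: Sequence[float], tol: float = 1e-3) -> List[int]:
    """Indices of input dimensions whose relevance has (nearly) vanished."""
    return [j for j, r in enumerate(rel) if r < tol]
```

On toy data where the class depends only on the first coordinate, repeated steps drive the relevance of the second coordinate to zero, so `prune_dims` reports it as removable.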
A robust boosting algorithm
- Proc. 17th European Conf. on Machine Learning
, 2002
Cited by 2 (1 self)
We describe a new Boosting algorithm which combines the base hypotheses with symmetric functions. Among its properties of practical relevance, the algorithm has significant resistance against noise and is efficient even in an agnostic learning setting. This last property is ruled out for voting-based Boosting algorithms like AdaBoost. Experiments carried out on thirty domains, most of which are readily available, tend to display the reliability of the classifiers built.
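A symmetric function of binary base hypotheses depends only on how many of them vote positive. The Python sketch below illustrates that kind of combination under our own assumptions: hypotheses output 0 or 1, and the combiner is a lookup table indexed by the vote count, fitted by taking the majority label among training examples with each count. It is not the paper's Boosting algorithm, and the names are hypothetical. A weighted majority vote with positive weights can only rise as more hypotheses vote positive; a table can also express rules such as "exactly one hypothesis votes positive".

```python
from collections import Counter
from typing import List, Sequence, Tuple


def fit_symmetric_combiner(votes_per_example: List[Tuple[int, ...]], labels: Sequence[int]) -> List[int]:
    """Return table[c] = predicted label when exactly c base hypotheses output 1."""
    n_hyp = len(votes_per_example[0])
    counts = [Counter() for _ in range(n_hyp + 1)]
    for votes, y in zip(votes_per_example, labels):
        counts[sum(votes)][y] += 1
    # vote counts never seen in training fall back to the overall majority label
    default = Counter(labels).most_common(1)[0][0]
    return [c.most_common(1)[0][0] if c else default for c in counts]


def predict_symmetric(table: Sequence[int], votes: Sequence[int]) -> int:
    """The output depends only on the number of positive votes."""
    return table[sum(votes)]
```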
Function-Free Horn Clauses Are Hard to Approximate
- Proc. of the 8th International Conference on Inductive Logic Programming, in: Lecture Notes in Artificial Intelligence
, 1998
Cited by 2 (2 self)
In this paper, we show two hardness results for approximating the best function-free Horn clause by an element of the same class. Our first result shows that for some constant $k > 0$, the error rate of the best $k$-Horn clause cannot be approximated in polynomial time to within any constant factor by an element of the same class. Our second result is much stronger. Under a frequently encountered complexity hypothesis, we show that if we replace the constant number of Horn clauses by a small, poly-logarithmic number, the constant factor blows up exponentially to a quasi-polynomial factor $n^{\log^k n}$, where $n$ is the number of predicates of the problem, a measure of its complexity. Our main result links the difficulty of error approximation with the number of clauses allowed. Finally, we outline the implications of our result for systems that learn using the ILP (Inductive Logic Programming) formalism. 1 Introduction and
The VC-Dimension of Subclasses of Pattern Languages
- 10th International Conference, ALT ’99
, 1999
Cited by 2 (0 self)
This paper derives the Vapnik-Chervonenkis dimension of several natural subclasses of pattern languages. For classes with unbounded VC-dimension, an attempt is made to quantify the "rate of growth" of VC-dimension for these classes. This is achieved by computing, for each n, the size of the "smallest" witness set of n elements that is shattered by the class. The paper considers both erasing (empty substitutions allowed) and nonerasing (empty substitutions not allowed) pattern languages. For erasing pattern languages, optimal bounds for this size (within polynomial order) are derived for the case of 1 variable occurrence and unary alphabet, for the case where the number of variable occurrences is bounded by a constant, and for the general case of all pattern languages. The extent to which these results hold for nonerasing pattern languages is also investigated. Some results that shed light on efficient learning of subclasses of pattern languages are also given. 1 Introduct...
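The notions of shattering and of erasing versus nonerasing substitution can be checked by brute force on a tiny class. Over the unary alphabet $\{a\}$, a pattern with $c$ constant letters and $k$ occurrences of a single variable $x$ generates exactly the words $a^{c + kt}$, with $t \ge 0$ when erasing and $t \ge 1$ when nonerasing (our own observation, used only for illustration). The Python sketch below uses it together with a generic shattering test; the caps `max_c` and `max_k`, which keep the class finite, and all function names are hypothetical, and this is not the paper's witness-set construction.

```python
from typing import Callable, List, Sequence

Hypothesis = Callable[[str], bool]


def member(word: str, c: int, k: int, erasing: bool = True) -> bool:
    """Is `word` in the language of a pattern with c letters 'a' and k occurrences of x?"""
    if set(word) - {"a"}:
        return False
    n = len(word)
    if k == 0:
        return n == c  # constant pattern a^c
    t_min = 0 if erasing else 1  # erasing allows the empty substitution for x
    return n >= c + k * t_min and (n - c) % k == 0


def hypotheses(max_c: int, max_k: int, erasing: bool = True) -> List[Hypothesis]:
    """All patterns with c <= max_c and k <= max_k, as membership predicates."""
    return [lambda w, c=c, k=k: member(w, c, k, erasing)
            for c in range(max_c + 1) for k in range(max_k + 1)]


def is_shattered(points: Sequence[str], hs: Sequence[Hypothesis]) -> bool:
    """True iff every one of the 2^n labellings of `points` is realized by some hypothesis."""
    realized = {tuple(h(p) for p in points) for h in hs}
    return len(realized) == 2 ** len(points)
```

In this class $\{a, aa\}$ is shattered, while $\{a, aa, aaa\}$ is not: accepting $a$ and $aa$ forces $k = 1$, and then $aaa$ is accepted as well. The empty word separates the two semantics: it belongs to the erasing language of the pattern $x$ but not to its nonerasing language.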

