L.: The value of agreement, a new boosting algorithm
- J. Comput. Syst. Sci
- Cited by 10 (0 self)
We present a new generalization bound in which the use of unlabeled examples yields a better ratio between training-set size and the resulting classifier's quality, and thus reduces the number of labeled examples needed to achieve it. This is achieved by requiring the algorithms that generate the classifiers to agree on the unlabeled examples. The extent of this improvement depends on the diversity of the learners: a more diverse group of learners will result in a larger improvement, whereas using two copies of a single algorithm gives no advantage at all. As a proof of concept, we apply the algorithm, named AgreementBoost, to a web classification problem, where a reduction of up to 40% in the number of labeled examples is obtained.
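The quantity at the heart of this abstract, how often two classifiers disagree on unlabeled examples, can be illustrated with a minimal sketch. This is not AgreementBoost itself, whose details the abstract does not give. The function name `disagreement_rate` and the toy threshold classifiers are assumptions made purely for illustration.

```python
from typing import Callable, Sequence

# A classifier is assumed here to be any function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def disagreement_rate(f: Classifier, g: Classifier,
                      unlabeled: Sequence[Sequence[float]]) -> float:
    """Fraction of unlabeled examples on which f and g predict different labels."""
    if not unlabeled:
        raise ValueError("need at least one unlabeled example")
    return sum(1 for x in unlabeled if f(x) != g(x)) / len(unlabeled)


# Hypothetical toy classifiers: two different thresholds on the first feature.
def f(x: Sequence[float]) -> int:
    return 1 if x[0] > 0 else 0


def g(x: Sequence[float]) -> int:
    return 1 if x[0] > 1 else 0


unlabeled = [(-1.0,), (0.5,), (2.0,), (3.0,)]
rate_fg = disagreement_rate(f, g, unlabeled)  # f and g differ only on (0.5,)
rate_ff = disagreement_rate(f, f, unlabeled)  # a classifier never disagrees with itself
```

Note that no labels are needed to compute this rate, and that two copies of the same classifier always agree, which is consistent with the abstract's remark that identical learners give no advantage.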
On Concept Space and Hypothesis Space in Case-Based Learning Algorithms
- In N Lavrac and S Wrobel, eds, ECML-95: Proc. 8th European Conf. on Machine Learning, 1995, LNAI Volume 914
, 1995
- Cited by 4 (2 self)
In order to learn more about the behaviour of case-based reasoners as learning systems, we formalise a simple case-based learner as a PAC learning algorithm. We show that the case-based representation ⟨CB, σ⟩ is rich enough to express any boolean function. We define a family of simple case-based learning algorithms which use a single, fixed similarity measure and we give necessary and sufficient conditions for the consistency of these learning algorithms in terms of the chosen similarity measure. Finally, we consider the way in which these simple algorithms, when trained on target concepts from a restricted concept space, often output hypotheses which are outside the chosen concept space. A case study investigates this relationship between concept space and hypothesis space and concludes that the case-based algorithm studied is a less-than-optimal learning algorithm for the chosen, small, concept space. 1 Introduction The performance of a case-based reasoning system [13] will chan...
Polynomial Bounds for the VC-Dimension of Sigmoidal, Radial Basis Function, and Sigma-pi Networks
- Proc. World Congress on Neural Networks
, 1995
- Cited by 3 (1 self)
W^2 h^2 is an asymptotic upper bound for the VC-dimension of a large class of neural networks including sigmoidal, radial basis function, and sigma-pi networks, where h is the number of hidden units and W is the number of adjustable parameters, which extends Karpinski and Macintyre's recent results. The class is characterized by polynomial input functions and activation functions that are solutions of first order differential equations with rational function coefficients and that can be represented in an implicit function form of a composition of the natural logarithm and polynomials. O(W log h) is a lower bound for the VC-dimension of sigmoidal, radial basis function, and sigma-pi networks. 1. Introduction When Vapnik and Chervonenkis developed in [13] the concept that later came to be called Vapnik-Chervonenkis dimension, or VC-dimension, they dared not dream that it would become as common as it is now. After Blumer et al. showed in [2] that the number of samples needed to accompl...
Analysis of data with threshold decision lists
- In preparation
- Cited by 2 (2 self)
We apply techniques from probabilistic learning theory to analyse theoretically the accuracy of data classification techniques that are based on the use of threshold decision lists.
Inductive Bias in Case-Based Reasoning Systems
- Department of Computer Science, University of York, York
, 1995
- Cited by 2 (1 self)
In order to learn more about the behaviour of case-based reasoners as learning systems, we formalise a simple case-based learner as a PAC learning algorithm, using the case-based representation ⟨CB, σ⟩. We first consider a 'naive' case-based learning algorithm CB1(σ_H) which learns by collecting all available cases into the case-base and which calculates similarity by counting the number of features on which two problem descriptions agree. We present results concerning the consistency of this learning algorithm and give some partial results regarding its sample complexity. We are able to characterise CB1(σ_H) as a 'weak but general' learning algorithm. We then consider how the sample complexity of case-based learning can be reduced for specific classes of target concept by the application of inductive bias, or prior knowledge of the class of target concepts. Following recent work demonstrating how case-based learning can be improved by choosing a similarity measure appropriate to t...
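The 'naive' learner described in this abstract (store every case, then compare by counting agreeing features) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' definition. The class name `CaseBase`, the method names, and the tie-breaking rule (the earliest stored case wins) are all assumptions.

```python
from typing import List, Sequence, Tuple

# A case pairs a problem description (a feature vector) with a class label.
Case = Tuple[Sequence[int], int]


def agreement(a: Sequence[int], b: Sequence[int]) -> int:
    """Similarity as the number of features on which two descriptions agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


class CaseBase:
    """Hypothetical sketch of a learner that simply stores every case it sees."""

    def __init__(self) -> None:
        self.cases: List[Case] = []

    def learn(self, examples: Sequence[Case]) -> None:
        # Collect all available cases into the case-base, with no selection.
        self.cases.extend(examples)

    def classify(self, query: Sequence[int]) -> int:
        if not self.cases:
            raise ValueError("the case-base is empty")
        # Assumption: ties go to the earliest stored case (max keeps the first maximum).
        _, label = max(self.cases, key=lambda case: agreement(case[0], query))
        return label


cb = CaseBase()
cb.learn([((0, 0, 0), 0), ((1, 1, 1), 1)])
label_a = cb.classify((1, 1, 0))  # agrees with (1, 1, 1) on two features
label_b = cb.classify((0, 0, 1))  # agrees with (0, 0, 0) on two features
```

Because the similarity measure is fixed in advance, the only thing such a learner adapts is the contents of the case-base, which is why the choice of measure acts as the inductive bias the abstract discusses.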
On the Complexity of Training a Single Perceptron with Programmable Synaptic Delays
- Cited by 1 (0 self)
We consider a single perceptron N with synaptic delays which generalizes a simplified model for a spiking neuron in which not only is the time that a pulse needs to travel through a synapse taken into account, but the input firing rates may also have more different levels. A synchronization technique is introduced so that the results concerning the learnability of spiking neurons with binary delays also apply to N with arbitrary delays. In particular, the consistency problem for N with programmable delays and its approximation version prove to be NP-hard. It follows that perceptrons with programmable synaptic delays are not properly PAC-learnable and that spiking neurons with arbitrary delays do not allow robust learning unless RP = NP. In addition, we show that the representation problem for N, that is, the question of whether an n-variable Boolean function given in DNF (or as a disjunction of O(n) threshold gates) can be computed by a spiking neuron, is co-NP-hard.
Uniform Glivenko-Cantelli Theorems and Concentration of Measure in the . . .
, 2002
- Cited by 1 (0 self)
This paper surveys certain developments in the use of probabilistic techniques for the modelling of generalization in machine learning. Building on 'uniform convergence' results in probability theory, a number of approaches to the problem of quantifying generalization have been developed in recent years. Initially these models addressed binary classification, and as such were applicable, for example, to binary-output neural networks. More recently, analysis has been extended to apply to regression problems, and to classification problems in which the classification is achieved by using real-valued functions (in which the concept of a large margin has proven useful). In order to obtain more useful and realistic bounds, and to analyse model selection, another development has been the derivation of data-dependent bounds. Here, we discuss some of the main probabilistic techniques and key results, particularly the use (and derivation) of uniform Glivenko-Cantelli theorems, and the use of concentration of measure results. Many details are omitted, the aim being to give a high-level overview of the types of approaches taken and methods used.
Pattern Recognition for Conditionally Independent Data
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
- Cited by 1 (0 self)
In this work we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is guessing a discrete label of some object based on a set of given examples (pairs of objects and labels). We consider the case of deterministically defined labels. Traditionally, this task is studied under the assumption that examples are independent and identically distributed. However, it turns out that many results of pattern recognition theory carry over to a weaker assumption. Namely, under
Minimizing the Quadratic Training Error of a Sigmoid Neuron Is Hard
We first present a brief survey of hardness results for training feedforward neural networks. These results are then completed by the proof that the simplest architecture containing only a single neuron that applies the standard (logistic) activation function to the weighted sum of n inputs is hard to train. In particular, the problem of finding the weights of such a unit that minimize the relative quadratic training error within 1 or its average (over a training set) within 13/(31n) of its infimum proves to be NP-hard. Hence, the well-known back-propagation learning algorithm appears not to be efficient even for one neuron, which has negative consequences in constructive learning. 1 The Complexity of Neural Network Loading Neural networks establish an important class of learning models that are widely applied in practical applications to solving artificial intelligence tasks [13]. The most prominent position among successful neural learning heuristics is occupied by the back-prop...

