Results 11  20
of
1,963
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 589 (54 self)
 Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Improving generalization with active learning
 Machine Learning
, 1994
"... Abstract. Active learning differs from "learning from examples " in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples ..."
Abstract

Cited by 539 (1 self)
 Add to MetaCart
(Show Context)
Abstract. Active learning differs from &quot;learning from examples &quot; in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers &quot;useful. &quot; We test our implementation, called an SGnetwork, on three domains and observe significant improvement in generalization.
Unsupervised Models for Named Entity Classification
 In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, 1999
"... This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabe ..."
Abstract

Cited by 534 (4 self)
 Add to MetaCart
This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabeled data can reduce the requirements for supervision to just 7 simple “seed ” rules. The approach gains leverage from natural redundancy in the data: for many namedentity instances both the spelling of the name and the context in which it appears are sufficient to determine its type. We present two algorithms. The first method uses a similar algorithm to that of (Yarowsky 95), with modifications motivated by (Blum and Mitchell 98). The second algorithm extends ideas from boosting algorithms, designed for supervised learning tasks, to the framework suggested by (Blum and Mitchell 98). 1
Property Testing and its connection to Learning and Approximation
"... We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the fun ..."
Abstract

Cited by 498 (68 self)
 Add to MetaCart
We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, we establish some connections between property testing and problems in learning theory. Next, we focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being kcolorable or having a aeclique (clique of density ae w.r.t the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1=ffl) edgequeries into the graph, where ffl is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph which corre...
Knowledge Discovery in Databases: an Overview
, 1992
"... this article. 07384602/92/$4.00 1992 AAAI 58 AI MAGAZINE for the 1990s (Silberschatz, Stonebraker, and Ullman 1990) ..."
Abstract

Cited by 470 (3 self)
 Add to MetaCart
this article. 07384602/92/$4.00 1992 AAAI 58 AI MAGAZINE for the 1990s (Silberschatz, Stonebraker, and Ullman 1990)
Selective sampling using the Query by Committee algorithm
 Machine Learning
, 1997
"... We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the twomember committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the numbe ..."
Abstract

Cited by 429 (7 self)
 Add to MetaCart
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the twomember committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
Connectionist Learning Procedures
 ARTIFICIAL INTELLIGENCE
, 1989
"... A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way ..."
Abstract

Cited by 408 (8 self)
 Add to MetaCart
A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradientdescent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradientdescent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
How to Use Expert Advice
 JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY
, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Abstract

Cited by 376 (72 self)
 Add to MetaCart
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show howthis leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 366 (38 self)
 Add to MetaCart
(Show Context)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Efficient noisetolerant learning from statistical queries
 JOURNAL OF THE ACM
, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 357 (5 self)
 Add to MetaCart
(Show Context)
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the informationtheoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noisetolerant algorithm is known.