Results 21–30 of 353
Online algorithms in machine learning
 In Fiat and Woeginger, eds., Online Algorithms: The State of the Art
, 1998
Abstract

Cited by 75 (2 self)
The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there is a collection of results in Computational Learning Theory that fits nicely into the "online algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of online algorithms. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.
A Polynomial-time Algorithm for Learning Noisy Linear Threshold Functions
, 1996
Abstract

Cited by 73 (11 self)
In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, the problem can be formulated as a Linear Program and solved in polynomial time with the Ellipsoid Algorithm or Interior Point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noise-tolerance properties; but their running time depends on a separation parameter, which quantifies the amount of "wiggle room" available for a solution, and can be exponential in the description length of the input. In this paper, we show how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter. Suitably combining these hypotheses results in a polynomial-time algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise. (Also, a polynomial-time algorithm for learning linear threshold functions in the Statistical Query model of Kearns.) Our algorithm is based on a new method for removing outliers in data. Specifically, for any set S of points in R^n, each given to b bits of precision, we show that one can remove only a small fraction of S so that in the remaining set T, for every vector v, max_{x in T} (v · x)^2 <= poly(n, b) · E_{x in T}[(v · x)^2]; i.e., for any hyperplane through the origin, the maximum distance (squared) from a point in T to the plane is at most polynomially larger than the average. After removing these outliers, we are able to show that a modified v...
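The Perceptron Algorithm mentioned in this abstract can be sketched in a few lines. The code below is an illustrative sketch of the classical mistake-driven update, whose iteration count depends on the separation margin exactly as the abstract notes; it is not the paper's outlier-removal algorithm:

```python
import numpy as np

def perceptron(X, y, max_iters=1000):
    """Classical Perceptron sketch: labels y in {-1, +1}, mistake-driven updates.

    Convergence speed depends on the separation margin of the data; on
    non-separable or small-margin data the loop may exhaust max_iters.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        mistakes = 0
        for x, label in zip(X, y):
            if label * np.dot(w, x) <= 0:  # misclassified (or on the boundary)
                w += label * x             # Perceptron update
                mistakes += 1
        if mistakes == 0:                  # a full pass with no mistakes: done
            break
    return w
```

On linearly separable data the returned vector classifies every example correctly; the number of updates grows as the inverse squared margin.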
A reliable effective terascale linear learning system
, 2011
Abstract

Cited by 72 (6 self)
We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples, and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, to the best of our knowledge, the most scalable and efficient linear learning system reported in the literature. We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.
Residual splash for optimally parallelizing belief propagation
 In Artificial Intelligence and Statistics (AISTATS)
, 2009
Abstract

Cited by 68 (8 self)
As computer architectures move towards multicore we must build a theoretical understanding of parallelism in machine learning. In this paper we focus on parallel inference in graphical models. We demonstrate that the natural, fully synchronous parallelization of belief propagation is highly inefficient. By bounding the achievable parallel performance in chain graphical models we develop a theoretical understanding of the parallel limitations of belief propagation. We then provide a new parallel belief propagation algorithm which achieves optimal performance. Using two challenging real-world tasks, we empirically evaluate the performance of our algorithm on large cyclic graphical models, where we achieve near-linear parallel scaling and outperform alternative algorithms.
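The core idea behind residual-based belief propagation schedulers like this one is to recompute whichever message currently has the largest residual (the amount it would change if updated), instead of updating all messages synchronously. The sketch below shows only that generic scheduling loop over an abstract `update_fn`; the function and parameter names are ours, and it does not implement the paper's Splash operation itself:

```python
import heapq
import numpy as np

def residual_schedule(messages, update_fn, neighbors, tol=1e-6, max_updates=10000):
    """Residual-prioritized message scheduling (illustrative sketch).

    messages:  dict mapping edge -> current message vector (numpy array).
    update_fn: update_fn(edge, messages) recomputes one message.
    neighbors: neighbors(edge) lists edges whose residuals may change
               after `edge` is updated.
    Uses a lazy max-heap: stale entries are harmless because the popped
    entry always carries the largest recorded residual.
    """
    heap = []
    for edge in messages:  # seed priorities with each message's current residual
        r = np.max(np.abs(update_fn(edge, messages) - messages[edge]))
        heapq.heappush(heap, (-r, edge))
    for _ in range(max_updates):
        if not heap:
            break
        neg_r, edge = heapq.heappop(heap)
        if -neg_r < tol:
            break  # largest recorded residual below tolerance: converged
        messages[edge] = update_fn(edge, messages)
        for dep in neighbors(edge):  # dependents' residuals may have grown
            r = np.max(np.abs(update_fn(dep, messages) - messages[dep]))
            heapq.heappush(heap, (-r, dep))
    return messages
```

On a contracting fixed-point system this prioritization focuses work where convergence is slowest, which is the intuition the paper formalizes for chains.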
A Simple and Practical Algorithm for Differentially Private Data Release
Abstract

Cited by 62 (2 self)
We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
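A minimal sketch of the MWEM loop described above, assuming linear counting queries over a histogram: each round privately selects a poorly-answered query with the Exponential Mechanism, measures it with Laplace noise, and applies a Multiplicative Weights update. The per-round budget split and parameter names are our own illustrative choices, not the paper's:

```python
import numpy as np

def mwem(true_hist, queries, T, eps, rng=None):
    """MWEM sketch: Exponential Mechanism + Multiplicative Weights.

    true_hist: histogram of counts over the data domain.
    queries:   matrix Q whose rows are 0/1 linear queries over domain cells.
    The per-round budget eps/T is split between selection and measurement.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = true_hist.sum()
    synth = np.full_like(true_hist, n / len(true_hist), dtype=float)  # uniform start
    true_answers = queries @ true_hist
    for t in range(T):
        # 1. Exponential Mechanism: sample a query, favoring large current error.
        errors = np.abs(queries @ synth - true_answers)
        scores = (eps / (2 * T)) * errors / 2.0   # utility sensitivity is 1
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        q = rng.choice(len(queries), p=probs)
        # 2. Laplace Mechanism: noisy answer to the selected query.
        noisy = true_answers[q] + rng.laplace(scale=2 * T / eps)
        # 3. Multiplicative Weights: nudge the synthetic histogram toward it.
        err = noisy - queries[q] @ synth
        synth *= np.exp(queries[q] * err / (2 * n))
        synth *= n / synth.sum()                  # renormalize to total count n
    return synth
```

The returned synthetic histogram answers the query set nearly as well as the true data, while each round touches the data only through two differentially private mechanisms.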
On the Fourier Spectrum of Monotone Functions
, 1996
Abstract

Cited by 60 (0 self)
In this paper, monotone Boolean functions are studied using harmonic analysis on the cube.
Vox populi: Collecting high-quality labels from a crowd
 In Proceedings of the 22nd Annual Conference on Learning Theory
, 2009
Abstract

Cited by 58 (1 self)
With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher's quality. In this paper, we study the problem of pruning low-quality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence.
Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression
 In Proceedings of the Twentieth International Conference on Machine Learning (ICML)
, 2003
Abstract

Cited by 57 (8 self)
The problem of learning with positive and unlabeled examples arises frequently in retrieval applications.
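One common recipe in this positive/unlabeled setting fits a logistic regression in which unlabeled examples are treated as negatives with reduced weight. The sketch below illustrates weighted logistic regression in general (plain gradient descent, weight values chosen for illustration), not necessarily the paper's exact weighting scheme:

```python
import numpy as np

def weighted_logreg(X, y, w, lr=0.1, iters=2000):
    """Weighted logistic regression by gradient descent (illustrative sketch).

    X: design matrix (include a bias column if desired);
    y: labels in {0, 1}; w: per-example weights. In the PU recipe,
    positives get weight 1 and unlabeled points, labeled 0, get a
    smaller weight reflecting the chance they are truly negative.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted P(y=1 | x)
        grad = X.T @ (w * (p - y)) / len(y)    # gradient of weighted log-loss
        theta -= lr * grad
    return theta
```

Down-weighting the unlabeled class keeps the handful of true positives hidden among the unlabeled points from dragging the decision boundary too far.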
PAC Learning from Positive Statistical Queries
 In Proc. 9th International Conference on Algorithmic Learning Theory (ALT ’98)
, 1998
Abstract

Cited by 55 (3 self)
Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe this kind of learning. We show that in order to make learning from positive data possible, extra information about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with the PAC model ([Val84]), the statistical query model ([Kea93]), and the constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e. with far less information than is assumed in previously used algorithms.
Privately releasing conjunctions and the statistical query barrier
 In STOC
, 2011