Efficient noise-tolerant learning from statistical queries (1998)

by M. J. Kearns
Venue: J. ACM

Results 21 - 30 of 353, sorted by number of citations

On-line algorithms in machine learning

by Avrim Blum - in Fiat and Woeginger, eds., Online Algorithms: The State of the Art, 1998
"... The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there are a collection of results in Computation ..."
Abstract - Cited by 75 (2 self) - Add to MetaCart
The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there is a collection of results in Computational Learning Theory that fits nicely into the "on-line algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.

Citation Context

...nomial p after seeing polynomially many examples. Does this imply that there must exist a polynomial-time algorithm B that succeeds in the same sense for all constant noise rates η < 1/2? (See Kearns [21] for related issues.) 7. What Competitive Ratio can be achieved for learning with respect to the best Disjunction? Is there a polynomial-time algorithm that, given any sequence of examples over {0,1} ...

A Polynomial-time Algorithm for Learning Noisy Linear Threshold Functions

by Avrim Blum, Alan Frieze, Ravi Kannan, Santosh Vempala, 1996
"... In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, this problem can be formulated as a Linear Program and s ..."
Abstract - Cited by 73 (11 self) - Add to MetaCart
In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, the problem can be formulated as a Linear Program and solved in polynomial time with the Ellipsoid Algorithm or Interior Point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noise-tolerance properties; but their running time depends on a separation parameter, which quantifies the amount of "wiggle room" available for a solution and can be exponential in the description length of the input. In this paper, we show how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter. Suitably combining these hypotheses results in a polynomial-time algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise (and also a polynomial-time algorithm for learning linear threshold functions in the Statistical Query model of Kearns). Our algorithm is based on a new method for removing outliers in data. Specifically, for any set S of points in R^n, each given to b bits of precision, we show that one can remove only a small fraction of S so that in the remaining set T, for every vector v, max_{x ∈ T} (v · x)² ≤ poly(n, b) · E_{x ∈ T}[(v · x)²]; i.e., for any hyperplane through the origin, the maximum squared distance from a point in T to the plane is at most polynomially larger than the average. After removing these outliers, we are able to show that a modified v...
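
The textbook Perceptron Algorithm that the abstract contrasts with LP-based methods is short enough to state in full. Below is a minimal numpy sketch (the function name, the update cap, and the mistake-selection rule are our choices; this is the classic algorithm, not the modified noise-tolerant variant the paper develops):

```python
import numpy as np

def perceptron(X, y, max_updates=10_000):
    """Classic Perceptron. X: (m, n) examples; y: labels in {-1, +1}.

    Repeatedly adds a misclassified example to the weight vector. For
    linearly separable data the number of updates is at most 1/margin^2,
    which is why the running time depends on the separation ("wiggle
    room") parameter rather than the input's description length.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_updates):
        mistakes = np.flatnonzero(y * (X @ w) <= 0)
        if mistakes.size == 0:
            return w                               # consistent halfspace found
        w += y[mistakes[0]] * X[mistakes[0]]       # Perceptron update rule
    return w                                       # possibly only a weak hypothesis
```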

A reliable effective terascale linear learning system

by Alekh Agarwal, Olivier Chapelle, John Langford, Corinna Cortes, 2011
"... We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features,1 billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are n ..."
Abstract - Cited by 72 (6 self) - Add to MetaCart
We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples, and millions of parameters, in an hour, using a cluster of 1000 machines. Individually, none of the component techniques is new, but the careful synthesis required to obtain an efficient implementation is. The result is, to the best of our knowledge, the most scalable and efficient linear learning system reported in the literature. We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.

Citation Context

...ient-based optimization algorithms such as gradient descent or L-BFGS: gradients are accumulated locally, and the global gradient is obtained by AllReduce. In general, any statistical query algorithm (Kearns, 1993) can be parallelized with AllReduce with only a handful of additional lines of code. This approach also easily implements averaging parameters of online learning algorithms. An implementation of AllR...
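
The parallelization pattern described in this snippet is easy to sketch. The toy, single-process simulation below uses a squared-loss gradient and a Python sum as a stand-in for the cluster-wide AllReduce collective (all names and the objective are our assumptions, not the system's API):

```python
import numpy as np

def local_gradient(w, X, y):
    # per-worker statistic: gradient of mean squared loss on one shard
    return X.T @ (X @ w - y) / len(y)

def allreduce_mean(values):
    # stand-in for AllReduce: every node ends up with the global average;
    # on the real system this is one collective across ~1000 machines
    return sum(values) / len(values)

def distributed_gd(shards, n_features, steps=100, lr=0.1):
    """shards: list of (X, y) pairs, one per simulated worker."""
    w = np.zeros(n_features)
    for _ in range(steps):
        grads = [local_gradient(w, X, y) for X, y in shards]
        # aggregating a per-shard expectation is exactly the access
        # pattern of a statistical query algorithm
        w -= lr * allreduce_mean(grads)
    return w
```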

Residual splash for optimally parallelizing belief propagation

by Joseph E. Gonzalez, Yucheng Low, Carlos Guestrin - In Artificial Intelligence and Statistics (AISTATS), 2009
"... As computer architectures move towards multicore we must build a theoretical understanding of parallelism in machine learning. In this paper we focus on parallel inference in graphical models. We demonstrate that the natural, fully synchronous parallelization of belief propagation is highly ineffici ..."
Abstract - Cited by 68 (8 self) - Add to MetaCart
As computer architectures move towards multicore, we must build a theoretical understanding of parallelism in machine learning. In this paper we focus on parallel inference in graphical models. We demonstrate that the natural, fully synchronous parallelization of belief propagation is highly inefficient. By bounding the achievable parallel performance in chain graphical models, we develop a theoretical understanding of the parallel limitations of belief propagation. We then provide a new parallel belief propagation algorithm which achieves optimal performance. Using two challenging real-world tasks, we empirically evaluate the performance of our algorithm on large cyclic graphical models, where we achieve near-linear parallel scaling and outperform alternative algorithms.

Citation Context

...erence, which is typically necessary for large complex models. Finally, work by [Chu et al., 2006] provides more general insight into the parallelism afforded by the Statistical Query Model (SQM) of [Kearns, 1998]. However, the SQM is already embarrassingly parallel, i.e. it has completely independent computational components, and does not efficiently represent many challenging machine learning tasks. We will...
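
To make the scheduling idea concrete, here is a toy, single-threaded sketch of residual-prioritized message passing on a chain MRF. It shows the sequential residual heuristic that Residual Splash builds on, not the paper's parallel splash algorithm, and all names and defaults are ours:

```python
import heapq
import numpy as np

def residual_bp(unary, pairwise, max_updates=100_000, tol=1e-6):
    """unary: (n, k) node potentials; pairwise: (k, k) edge potential
    shared by all edges. Rather than recomputing every message each
    round (the synchronous schedule the paper shows is inefficient),
    always update the message with the largest pending change."""
    n, k = unary.shape
    edges = [(i, i + 1) for i in range(n - 1)] + [(i + 1, i) for i in range(n - 1)]
    msg = {e: np.full(k, 1.0 / k) for e in edges}

    def new_message(s, t):
        inc = unary[s].astype(float)      # unary potential times incoming msgs
        for (u, v) in edges:
            if v == s and u != t:         # exclude the recipient's own message
                inc = inc * msg[(u, v)]
        m = pairwise.T @ inc
        return m / m.sum()

    def residual(e):
        # residual = how much the stored message would change if recomputed
        return float(np.abs(new_message(*e) - msg[e]).max())

    heap = [(-residual(e), e) for e in edges]
    heapq.heapify(heap)
    for _ in range(max_updates):
        if not heap:
            break
        neg_res, (s, t) = heapq.heappop(heap)
        if -neg_res < tol:
            break                         # largest pending change is tiny
        msg[(s, t)] = new_message(s, t)
        for (u, v) in edges:              # this update perturbs the residuals
            if u == t and v != s:         # of t's other outgoing messages
                heapq.heappush(heap, (-residual((u, v)), (u, v)))

    beliefs = unary.astype(float)
    for (u, v) in edges:
        beliefs[v] = beliefs[v] * msg[(u, v)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```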

A Simple and Practical Algorithm for Differentially Private Data Release

by Moritz Hardt, Katrina Ligett, Frank Mcsherry
"... We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simp ..."
Abstract - Cited by 62 (2 self) - Add to MetaCart
We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.

Citation Context

...and easy-to-implement algorithm, capable of substantially improving the performance of linear queries on many realistic datasets. Linear queries are equivalent to statistical queries (in the sense of [6]) and can serve as the basis of a wide range of data analysis and learning algorithms (see [7] for some examples). Our algorithm is a combination of the Multiplicative Weights approach of [8, 9], main...
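
Since MWEM is advertised as simple to implement, a compact sketch helps. The version below works over a small explicit domain and deliberately simplifies the privacy calibration (real MWEM scales the selection scores and the Laplace noise by the dataset size); names and defaults are ours:

```python
import numpy as np

def mwem(x_hist, queries, T=10, eps=1.0, seed=0):
    """x_hist: true normalized histogram over a domain of size D.
    queries: (Q, D) 0/1 matrix of linear (statistical) queries.
    Returns a synthetic distribution that answers them well."""
    rng = np.random.default_rng(seed)
    Q, D = queries.shape
    A = np.full(D, 1.0 / D)          # synthetic distribution, uniform start
    eps_round = eps / (2 * T)        # split budget between select and measure
    true_ans = queries @ x_hist
    for _ in range(T):
        # Exponential Mechanism: privately pick a badly-answered query
        err = np.abs(true_ans - queries @ A)
        scores = np.exp(eps_round * err / 2)
        q = rng.choice(Q, p=scores / scores.sum())
        # Laplace Mechanism: noisy measurement of the chosen query
        m = true_ans[q] + rng.laplace(scale=1.0 / eps_round)
        # Multiplicative Weights: reweight A toward the measurement
        A = A * np.exp(queries[q] * (m - queries[q] @ A) / 2)
        A /= A.sum()
    return A
```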

On the Fourier Spectrum of Monotone Functions

by Nader H. Bshouty, Christino Tamon, 1996
"... In this paper, monotone Boolean functions are studied using harmonic analysis on the cube. ..."
Abstract - Cited by 60 (0 self) - Add to MetaCart
In this paper, monotone Boolean functions are studied using harmonic analysis on the cube.
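
For readers who have not seen the toolkit the abstract names, the standard harmonic (Fourier) expansion over the cube, in the usual notation (ours, not necessarily the paper's), is:

```latex
% Every f : \{-1,1\}^n \to \mathbf{R} has a unique expansion
f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\,\chi_S(x),
\qquad \chi_S(x) = \prod_{i \in S} x_i,
\qquad \hat{f}(S) = \mathbf{E}_{x}\big[f(x)\,\chi_S(x)\big],
% with x uniform over \{-1,1\}^n; the paper studies how monotonicity
% constrains the coefficients \hat{f}(S).
```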

Vox populi: Collecting high-quality labels from a crowd

by Ofer Dekel, Ohad Shamir - In Proceedings of the 22nd Annual Conference on Learning Theory, 2009
"... With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a ..."
Abstract - Cited by 58 (1 self) - Add to MetaCart
With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher’s quality. In this paper, we study the problem of pruning low-quality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence.

Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression

by Wee Sun Lee, Bing Liu - Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003
"... The problem of learning with positive and unlabeled examples arises frequently in retrieval applications. ..."
Abstract - Cited by 57 (8 self) - Add to MetaCart
The problem of learning with positive and unlabeled examples arises frequently in retrieval applications.

Citation Context

...ples was done in (Denis, 1998). Using the model where a positive example is left unlabeled with constant probability, it was shown that function classes learnable under the statistical query model (Kearns, 1998) are also learnable from positive and unlabeled examples. Learning from positive examples was also studied theoretically in (Muggleton, 2001) within a Bayesian framework where the distribution of funct...
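
As an illustration of the weighted logistic regression idea, here is a generic sketch that treats unlabeled examples as down-weighted negatives. The weighting scheme and constants are ours for illustration and are not claimed to be the exact scheme of Lee & Liu:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_weighted_logreg(X_pos, X_unl, w_pos=10.0, w_unl=1.0):
    """Fit logistic regression on positive + unlabeled data by treating
    every unlabeled example as a weighted negative; the weights encode
    how much each pseudo-label is trusted."""
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    sample_weight = np.concatenate([np.full(len(X_pos), w_pos),
                                    np.full(len(X_unl), w_unl)])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y, sample_weight=sample_weight)
    return clf
```

In practice the two weights would be tuned against a validation criterion computable from positive and unlabeled data alone.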

PAC Learning from Positive Statistical Queries

by François Denis (Université de Lille I) - Proc. 9th International Conference on Algorithmic Learning Theory (ALT '98), 1998
"... . Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe such kind of learning. We show that in order to make the learning from positive data possible, ..."
Abstract - Cited by 55 (3 self) - Add to MetaCart
Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe this kind of learning. We show that in order to make learning from positive data possible, extra information about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with the PAC model ([Val84]), the statistical query model ([Kea93]), and the constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e. with far less information than is assumed in previously used algorithms.

Citation Context

... learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with PAC model ([Val84]), statistical query model ([Kea93]) and constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e. with far less information than is assumed in pre...
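
Because every snippet on this page leans on Kearns' statistical query model, it is worth recalling the oracle that model substitutes for labeled examples (standard formulation; notation ours):

```latex
% The SQ oracle STAT(f, D): the learner submits a predicate
% \chi : X \times \{0,1\} \to \{0,1\} and a tolerance \tau > 0,
% and receives any value v with
\bigl|\, v - \Pr_{x \sim D}\bigl[\chi(x, f(x)) = 1\bigr] \,\bigr| \le \tau .
% A class learnable from such estimates alone is automatically
% learnable under random classification noise, which is the link
% the papers above exploit.
```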

Privately releasing conjunctions and the statistical query barrier

by Anupam Gupta, Moritz Hardt, Aaron Roth, Jonathan Ullman - In STOC, 2011
"... ar ..."
Abstract - Cited by 50 (16 self) - Add to MetaCart
Abstract not found

Citation Context

...stical queries necessary and sufficient for this task is, up to a factor of O(d), equal to the agnostic learning complexity of C (over arbitrary distributions) in Kearns’ statistical query (SQ) model [Kea98]. Using an SQ lower bound for agnostically learning monotone conjunctions shown by Feldman [Fel10], this connection implies that no polynomial-time algorithm operating in the SQ model can release even...
