
## Efficient noise-tolerant learning from statistical queries (1998)


Venue: Journal of the ACM

Citations: 350 (5 self)

### Citations

1960 | A theory of the learnable - Valiant - 1984
Citation context: ...ficient Noise-Tolerant Learning From Statistical Queries. Michael Kearns, AT&T Bell Laboratories, Murray Hill, New Jersey. 1 Introduction. In this paper, we study the extension of Valiant's learning model [25] in which the positive or negative classification label provided with each random example may be corrupted by random noise. This extension was first examined in the learning theory literature by Anglu...
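The noise model described in this context is simple to simulate: each correct label is flipped independently with some probability η < 1/2. A minimal sketch of such an example oracle (the conjunction target and the choice η = 0.2 are illustrative, not from the paper):

```python
import random

def noisy_example_oracle(target, dist_sampler, eta):
    """Draw one example from the classification noise model: x is drawn
    from the distribution, and the correct label target(x) is flipped
    independently with probability eta (0 <= eta < 1/2)."""
    x = dist_sampler()
    label = target(x)
    if random.random() < eta:
        label = 1 - label  # corrupt the label
    return x, label

# Hypothetical target: the conjunction x1 AND x3 over {0,1}^5.
def f(x):
    return int(x[0] == 1 and x[2] == 1)

def uniform_sampler(n=5):
    return [random.randint(0, 1) for _ in range(n)]

random.seed(0)
sample = [noisy_example_oracle(f, uniform_sampler, eta=0.2) for _ in range(10000)]
# Empirically, about 20% of the labels disagree with f.
flip_rate = sum(lbl != f(x) for x, lbl in sample) / len(sample)
```

Note that only the label is corrupted; the inputs remain distributed according to the original distribution, exactly as the context above states.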

1132 | On the uniform convergence of relative frequencies of events to their probabilities - Vapnik, Chervonenkis - 1971
Citation context: ...2^d possible binary labelings of the points in S, there is a function in F that agrees with that labeling. The Vapnik-Chervonenkis dimension of F is the cardinality of the largest set shattered by F [27]. 5 Simulating Statistical Queries Using Noisy Examples. Our first theorem formalizes the intuition given above that learning from statistical queries implies learning in the noise-free Valiant model. T...
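The shattering condition quoted here can be verified mechanically for small finite classes; a brute-force sketch of shattering and VC dimension (the threshold-function class is an illustrative choice, not from the paper):

```python
from itertools import combinations

def shatters(F, S):
    """True iff the class F (a list of {0,1}-valued functions) realizes
    all 2^|S| binary labelings of the point set S."""
    realized = {tuple(f(x) for x in S) for f in F}
    return len(realized) == 2 ** len(S)

def vc_dimension(F, domain):
    """Brute-force VC dimension of a finite class over a finite domain:
    the cardinality of the largest subset of the domain shattered by F."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(F, list(S)) for S in combinations(domain, k)):
            d = k
    return d

# Hypothetical example: the threshold functions x -> [x >= t] on {0,1,2,3}.
domain = [0, 1, 2, 3]
F = [lambda x, t=t: int(x >= t) for t in range(5)]
# Thresholds shatter any single point but no pair (the labeling (1, 0) on an
# ordered pair is never realized), so the VC dimension here is 1.
```

This exhaustive check is exponential in the domain size and is meant only to make the definition concrete, not to be an efficient procedure.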

713 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989
Citation context: ...re centers on the tradeoff between the number of statistical queries that must be made and the required accuracy of these queries. For instance, translation of Valiant model sample size lower bounds [3, 4] into the statistical query model leaves open the possibility that some classes might be learned with just a single statistical query of sufficiently small allowed approximation error. Here we dismiss...

684 | An introduction to computational learning theory - Kearns, Vazirani - 1994

665 | Perceptrons: An Introduction to Computational Geometry - Minsky, Papert - 1969
Citation context: ...is the uniform distribution on the unit sphere (or any other radially symmetric distribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and ef...

424 | Learning decision lists - Rivest - 1987
Citation context: ...ill see a somewhat detailed example of this approach momentarily. A partial list of the efficient algorithms employing some version of this approach is: Rivest's algorithm for learning decision lists [19]; Haussler's algorithm for learning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the...

324 | Constant depth circuits, Fourier transform and learnability - Linial, Mansour, et al. - 1993
Citation context: ...ing n-dimensional axis-aligned rectangles with noise; learning AC^0 with noise with respect to the uniform distribution in time O(n^poly(log n)) (for which the algorithm of Linial, Mansour and Nisan [16] can be shown to fall into the statistical query model without modification); and many others. The fact that practically every concept class known to be efficiently learnable in the Valiant model can...

254 | Learning from noisy examples - Angluin, Laird - 1988
Citation context: ...positive or negative classification label provided with each random example may be corrupted by random noise. This extension was first examined in the learning theory literature by Angluin and Laird [1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1...

252 | Quantifying inductive bias: AI learning algorithms and Valiant's learning framework - Haussler - 1988
Citation context: ...fficient algorithms employing some version of this approach is: Rivest's algorithm for learning decision lists [19]; Haussler's algorithm for learning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the algorithm of Kearns and Pitt [13] for learning pattern languages with respect to produc...

212 | A general lower bound on the number of examples needed for learning - Ehrenfeucht, Haussler, et al. - 1989
Citation context: ...re centers on the tradeoff between the number of statistical queries that must be made and the required accuracy of these queries. For instance, translation of Valiant model sample size lower bounds [3, 4] into the statistical query model leaves open the possibility that some classes might be learned with just a single statistical query of sufficiently small allowed approximation error. Here we dismiss...

212 | Computational limitations on learning from examples - Pitt, Valiant - 1988
Citation context: ...s be quite significant, as previous results have demonstrated concept classes F for which the choice of hypothesis representation can mean the difference between intractability and efficient learning [18, 12]. ...by p(1/ε, 1/δ, n, size(f)) and output a representation in H of a function h that with probability at least 1 − δ satisfies error(h) ≤ ε. This probability is taken over the random draws fr...
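The guarantee quoted here is the standard PAC criterion (accuracy ε, confidence 1 − δ). In the statistical query model the learner instead asks for estimates of probabilities of the form Pr[χ(x, f(x)) = 1], accurate to within an additive tolerance τ; by Hoeffding's inequality, roughly ln(2/δ)/(2τ²) examples suffice to answer one such query from a noise-free oracle. A minimal sketch (the majority target and the parameter values are illustrative, not from the paper):

```python
import math
import random

def estimate_statistical_query(chi, example_oracle, tau, delta):
    """Estimate P = Pr[chi(x, f(x)) = 1] to within additive tolerance tau,
    with failure probability at most delta, using labeled examples.
    Hoeffding's bound: m >= ln(2/delta) / (2 tau^2) samples suffice."""
    m = math.ceil(math.log(2 / delta) / (2 * tau ** 2))
    hits = sum(chi(*example_oracle()) for _ in range(m))
    return hits / m

# Hypothetical target: majority of 3 uniform bits, so by symmetry the
# query "is the label 1?" has true value P = 1/2.
random.seed(0)
def oracle():
    x = [random.randint(0, 1) for _ in range(3)]
    return x, int(sum(x) >= 2)

p_hat = estimate_statistical_query(lambda x, y: y, oracle, tau=0.05, delta=0.01)
```

Simulating such queries from *noisy* examples requires an additional correction for the noise rate, which is the technical heart of the paper and is not attempted in this sketch.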

211 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994
Citation context: ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some isolated instances of efficient noise-tolerant algorithms [14, 20, 22], but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr...

181 | Learning in the presence of malicious errors - Kearns, Li - 1993
Citation context: ...1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1, 15, 24, 11], the classification noise model has become a common paradigm for experimental machine learning research. Angluin and Laird provided an algorithm for learning boolean conjunctions that tolerates a noi...

181 | Learning boolean formulae - Kearns, Li, et al. - 1995
Citation context: ...s be quite significant, as previous results have demonstrated concept classes F for which the choice of hypothesis representation can mean the difference between intractability and efficient learning [18, 12]. ...by p(1/ε, 1/δ, n, size(f)) and output a representation in H of a function h that with probability at least 1 − δ satisfies error(h) ≤ ε. This probability is taken over the random draws fr...

128 | Weakly Learning DNF and Characterizing Statistical Query Learning using Fourier Analysis - Blum, Furst, et al. - 1994

117 | Learning disjunctions of conjunctions - Valiant - 1985
Citation context: ...iven to the learner; the inputs x given to the learner remain independently distributed according to D. Other models allowing corruption of the input as well as the label have been studied previously [26, 11], with considerably less success in finding efficient error-tolerant algorithms. Here we will concentrate primarily on the classification noise model, although in Section 9 we will examine a more real...

72 | A polynomial-time algorithm for learning noisy linear threshold functions, Algorithmica 22 - Blum, Frieze, et al. - 1998

71 | The Computational Complexity of Machine Learning - Kearns - 1990
Citation context: ...aliant model also efficiently learnable with noise? Note that any counterexamples to such equivalences should not depend on syntactic hypothesis restrictions, but should be representation independent [10]. Acknowledgements. Thanks to Umesh Vazirani for the early conversations from which this research grew, to Rob Schapire for many insightful comments and his help with the proof of Theorem 5, and to Jay...

63 | Learning integer lattices - Helmbold, Sloan, et al. - 1992
Citation context: ...the parity of some unknown subset of the boolean variables x1, …, xn), which is known to be efficiently learnable in the Valiant model via the solution of a system of linear equations modulo 2 [9], is not efficiently learnable from statistical queries. The fact that the separation of the two models comes via this class is of particular interest, since the parity class has no known efficient no...
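The remark that parity functions are learnable "via the solution of a system of linear equations modulo 2" corresponds to ordinary Gaussian elimination over GF(2): each noise-free example (x, y) is one linear constraint on the unknown index vector. A minimal sketch under that reading (the 4-bit target is illustrative, not from the paper):

```python
import random

def solve_gf2(equations, n):
    """Gaussian elimination over GF(2). Each equation is a pair (row, b)
    encoding  sum_i row[i] * w[i] = b (mod 2). Returns a solution w
    (free variables set to 0), or None if the system is inconsistent."""
    pivots = {}  # pivot column -> (reduced row, rhs bit)
    for row, b in equations:
        r = list(row)
        for col, (pr, pb) in pivots.items():
            if r[col]:  # eliminate this row's bit at an existing pivot
                r = [a ^ c for a, c in zip(r, pr)]
                b ^= pb
        lead = next((i for i, a in enumerate(r) if a), None)
        if lead is None:
            if b:
                return None  # reduced to 0 = 1: inconsistent
            continue
        pivots[lead] = (r, b)
    w = [0] * n
    for col in sorted(pivots, reverse=True):  # back substitution
        r, b = pivots[col]
        w[col] = b ^ (sum(r[i] & w[i] for i in range(col + 1, n)) % 2)
    return w

# Hypothetical parity target over {0,1}^4: the parity of bits 0 and 2.
random.seed(1)
n, target = 4, [1, 0, 1, 0]
examples = []
for _ in range(20):
    x = [random.randint(0, 1) for _ in range(n)]
    examples.append((x, sum(a & t for a, t in zip(x, target)) % 2))
w = solve_gf2(examples, n)  # a parity consistent with every example
```

The sketch assumes noise-free labels; with classification noise a corrupted constraint can derail the elimination, which is exactly why the noise tolerance of parity is the delicate open case discussed above.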

59 | The Design and Analysis of Efficient Learning Algorithms - Schapire - 1992
Citation context: ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some isolated instances of efficient noise-tolerant algorithms [14, 20, 22], but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr...

55 | Learning from good and bad data - Laird - 1988

50 | Statistical mechanics of learning from examples - Seung, Sompolinsky, et al. - 1992
Citation context: ...ribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and efficient algorithm for learning from statistical queries (and thus an algorithm tolerating n...

48 | Types of noise in data for concept learning - Sloan - 1988
Citation context: ...1], who formalized the simplest type of white label noise and then sought algorithms tolerating the highest possible rate of noise. In addition to being the subject of a number of theoretical studies [1, 15, 24, 11], the classification noise model has become a common paradigm for experimental machine learning research. Angluin and Laird provided an algorithm for learning boolean conjunctions that tolerates a noi...

46 | General bounds on statistical query learning and PAC learning with noise via hypothesis boosting - Aslam, Decatur - 1993

39 | Learning noisy perceptrons by a perceptron in polynomial time - Cohen - 1997

33 | Specification and simulation of statistical query algorithms for efficiency and noise tolerance - Aslam, Decatur - 1998

30 | Improved learning of AC0 functions - Furst, Jackson, et al. - 1991

25 | On learning ring-sum expansions - Fischer, Simon - 1992

24 | A polynomial-time algorithm for learning k-variable pattern languages from examples - Kearns, Pitt - 1989
Citation context: ...rning boolean conjunctions with few relevant variables [8]; the algorithm of Blumer et al. for learning a union of axis-aligned rectangles in the Euclidean plane; and the algorithm of Kearns and Pitt [13] for learning pattern languages with respect to product distributions. In its original form, the covering method is not noise-tolerant, and indeed with the exception of decision lists [14, 20], until n...

18 | Learning from Good and Bad Data. Kluwer international series in engineering and computer science - Laird - 1988

17 | Improved learning of AC^0 functions - Furst, Jackson, et al. - 1991
Citation context: ...g AC^0 in time O(n^poly(log n)) with respect to the uniform distribution in the Valiant model (and its subsequent generalization with respect to product distributions due to Furst, Jackson and Smith [5]); several efficient algorithms for learning restricted forms of DNF with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once ...

13 | Three unfinished works on the optimal storage capacity of networks - Gardner, Derrida - 1994
Citation context: ...ribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and efficient algorithm for learning from statistical queries (and thus an algorithm tolerating n...

13 | Learning monotone k-DNF formulas on product distributions - Hancock, Mansour - 1991
Citation context: ...with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once circuits with respect to product distributions in the Valiant model [21, 7]. For all of these classes we can obtain efficient algorithms for learning with noise by Theorem 3; in this list, only for conjunctions [1] and Schapire's work on read-once circuits [21] were there pr...

13 | Learning probabilistic read-once formulas on product distributions - Schapire - 1991
Citation context: ...with respect to the uniform distribution in the Valiant model [12]; and efficient algorithms for learning unbounded-depth read-once circuits with respect to product distributions in the Valiant model [21, 7]. For all of these classes we can obtain efficient algorithms for learning with noise by Theorem 3; in this list, only for conjunctions [1] and Schapire's work on read-once circuits [21] were there pr...

8 | The transition to perfect generalization in perceptrons - Baum, Lyuu - 1991

7 | Algorithmic Learning of Formal Languages and Decision Trees - Sakakibara - 1991
Citation context: ...ning boolean conjunctions that tolerates a noise rate approaching the information-theoretic barrier of 1/2. Subsequently, there have been some isolated instances of efficient noise-tolerant algorithms [14, 20, 22], but little work on characterizing which classes can be efficiently learned in the presence of noise, and no general transformations of Valiant model algorithms into noise-tolerant algorithms. The pr...

2 | The transition to perfect generalization in perceptrons - Baum, Lyuu - 1991
Citation context: ...ribution). Despite the voluminous literature on learning perceptrons in general (see the work of Minsky and Papert [17] for a partial bibliography) and with respect to this distribution in particular [23, 2, 6], no efficient noise-tolerant learning algorithm has been given previously. Here we give a very simple and efficient algorithm for learning from statistical queries (and thus an algorithm tolerating n...