Results 1–10 of 13
Learning halfspaces with the zero-one loss: Time-accuracy tradeoffs. In NIPS, 2012.
Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α)L*_γ + ϵ, where L*_γ is the optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity are achievable using the hinge loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in time exp(…).
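The α = 1/γ hinge-loss route is easy to make concrete. Below is a minimal sketch (our own illustration, not the paper's algorithm; all names are ours) that learns a halfspace by full-batch subgradient descent on the averaged hinge loss:

```python
import numpy as np

def learn_halfspace_hinge(X, y, epochs=200, lr=0.1):
    """Minimize the average hinge loss max(0, 1 - y * <w, x>) by
    full-batch subgradient descent; the hinge loss is the convex
    surrogate behind the alpha = 1/gamma guarantee."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1  # points contributing a nonzero subgradient
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

# usage: recover a separable halfspace from labeled Gaussian samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))
w = learn_halfspace_hinge(X, y)
train_err = np.mean(np.sign(X @ w) != y)
```

On separable data the minimizer drives the training error to (near) zero; the paper's point is that the same convex surrogate still gives the (1 + 1/γ)-competitive guarantee agnostically.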
On the computational efficiency of training neural networks
Abstract: It is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern-day neural networks are trained efficiently using SGD and a variety of tricks, including different activation functions (e.g. ReLU), over-specification (i.e., training networks that are larger than needed), and regularization. In this paper we revisit the computational complexity of training neural networks from a modern perspective. We provide both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
Deterministic approximate counting for degree-2 polynomial threshold functions. Manuscript, 2013.
Deterministic approximate counting for juntas of degree-2 polynomial threshold functions. Manuscript, 2013.
ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time
Abstract: We study the improper learning of multilayer neural networks. Suppose that the neural network to be learned has k hidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method such that, with probability at least 1 − δ, it learns a predictor whose generalization error is at most ϵ worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in 1/ϵ, with a dependence on (k, L) and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness, 2014.
Abstract: Polynomial approximations to Boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underlie algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. First, the "polynomial regression" algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on R^n in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to exp(−|x|^0.99). This impossibility result extends to multivariate distributions, and thus gives a strong limitation on the power of the polynomial …
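The "polynomial regression" idea itself is simple to sketch: expand the input into all monomials up to degree d, fit the ±1 labels, and classify by the sign of the fitted polynomial. The sketch below is our own illustration under assumed names (and uses an ordinary least-squares fit, where Kalai et al. analyze L1 regression):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree):
    """All monomials of the coordinates up to `degree`, plus a constant term."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(X[:, list(idx)].prod(axis=1))
    return np.column_stack(cols)

def poly_regression_classifier(X, y, degree=3):
    """Least-squares fit of a low-degree polynomial to the +/-1 labels;
    classify new points by the sign of the fitted polynomial."""
    coef, *_ = np.linalg.lstsq(poly_features(X, degree), y, rcond=None)
    return lambda Z: np.sign(poly_features(Z, degree) @ coef)

# usage: under Gaussian (log-concave) data, a degree-3 fit already
# tracks a halfspace closely
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.sign(X @ np.array([2.0, -1.0]))
clf = poly_regression_classifier(X, y, degree=3)
train_err = np.mean(clf(X) != y)
```

The abstract's negative result says exactly this recipe must fail for heavier-tailed distributions: no fixed degree can drive the approximation error of sign arbitrarily low there.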
Efficient Deterministic Approximate Counting for Low-Degree Polynomial Threshold Functions
Abstract: We give a deterministic algorithm for approximately counting satisfying assignments of a degree-d polynomial threshold function (PTF). Given a degree-d input polynomial p(x) over R^n and a parameter ϵ > 0, our algorithm approximates Pr_{x∼{−1,1}^n}[p(x) ≥ 0] to within an additive ±ϵ. (Since it is NP-hard to determine whether the above probability is nonzero, any sort of efficient multiplicative approximation is almost certainly impossible even for randomized algorithms.) Note that the running time of our algorithm, as a function of n^d (the number of coefficients of a degree-d PTF), is a fixed polynomial. The fastest previous algorithm for this problem [Kan12b], based on constructions of unconditional pseudorandom generators for degree-d PTFs, runs in time n^{…}. Our approach uses two new ingredients:
• A new central limit theorem (CLT) for Gaussian polynomials. This new CLT shows that any collection of Gaussian polynomials with small eigenvalues must have a joint distribution which is very close to a multidimensional Gaussian distribution.
• A new decomposition of low-degree multilinear polynomials over Gaussian inputs. Roughly speaking, we show that (up to some small error) any such polynomial can be decomposed into a bounded number of multilinear polynomials, all of which have extremely small eigenvalues.
We use these new ingredients to give a deterministic algorithm for a Gaussian-space version of the approximate counting problem, and then employ standard techniques for working with low-degree PTFs (invariance principles and regularity lemmas) to reduce the original approximate counting problem over the Boolean hypercube to the Gaussian version. As an application of our result, we give the first deterministic fixed-parameter tractable algorithm for the following moment approximation problem: given a degree-d polynomial p(x_1, …, x_n) over {−1, 1}^n, a positive integer k, and an error parameter ϵ, output a (1 ± ϵ)-multiplicatively accurate estimate of E_{x∼{−1,1}^n}[p(x)^k]. Our algorithm runs in time …
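For intuition about the moment problem, here is the naive randomized baseline (our own sketch, with assumed names): sample uniform hypercube points and average p(x)^k. The paper's contribution is obtaining such an estimate deterministically, with a multiplicative guarantee.

```python
import numpy as np

def mc_moment(p, n, k, samples=20000, seed=0):
    """Monte Carlo estimate of E_{x ~ Uniform({-1,1}^n)}[p(x)^k]."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(samples, n))  # uniform hypercube points
    vals = np.array([p(x) for x in X])
    return (vals ** k).mean()

# usage: p(x) = x1*x2 + x3 has E[p(x)^2] = 1 + 0 + 1 = 2 exactly
est = mc_moment(lambda x: x[0] * x[1] + x[2], n=3, k=2)
```

The Monte Carlo error here is additive and random; turning it into a deterministic (1 ± ϵ)-multiplicative guarantee is what requires the CLT and decomposition machinery above.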
Improper Deep Kernels
Abstract: Neural networks have recently re-emerged as a powerful hypothesis class, yielding impressive classification accuracy in multiple domains. However, their training is a non-convex optimization problem, which poses theoretical and practical challenges. Here we address this difficulty by turning to "improper" learning of neural nets. In other words, we learn a classifier that is not a neural net but is competitive with the best neural net model, given a sufficient number of training examples. Our approach relies on a novel kernel construction scheme in which the kernel is the result of integration over the set of all possible instantiations of neural models. It turns out that the corresponding integral can be evaluated in closed form via a simple recursion. Thus we translate the non-convex learning problem of a neural net into an SVM with an appropriate kernel. We also provide sample complexity results that depend on the stability of the optimal neural net.
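The flavor of such a closed-form recursion can be illustrated with the closely related arc-cosine kernel of Cho and Saul (2009), which likewise arises from integrating a ReLU unit over Gaussian weights. This is an analogous construction for illustration, not the paper's exact kernel:

```python
import numpy as np

def relu_layer_kernel(kxx, kyy, kxy):
    """One degree-1 arc-cosine step: up to normalization, the closed form of
    E_w[relu(<w,x>) relu(<w,y>)] over Gaussian w, written in terms of the
    current kernel values."""
    cos_t = np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return np.sqrt(kxx * kyy) * (np.sin(theta) + (np.pi - theta) * cos_t) / np.pi

def deep_kernel(x, y, depth=2):
    """Compose the closed-form layer map `depth` times, starting from the
    linear kernel -- a recursion in the spirit of the paper's construction."""
    kxx, kyy, kxy = x @ x, y @ y, x @ y
    for _ in range(depth):
        kxy = relu_layer_kernel(kxx, kyy, kxy)            # cross term, layer l
        kxx = relu_layer_kernel(kxx, kxx, kxx)            # diagonal terms
        kyy = relu_layer_kernel(kyy, kyy, kyy)
    return kxy

# usage: on unit vectors the diagonal is preserved layer to layer
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
k_same = deep_kernel(x, x, depth=2)
k_orth = deep_kernel(x, y, depth=1)
```

Once such a kernel is in hand, training reduces to a convex SVM solve, which is the translation the abstract describes.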
Efficient Learning of Linear Separators under Bounded Noise
Abstract: We study the learnability of linear separators in R^d in the presence of bounded (a.k.a. Massart) noise. This is a realistic generalization of the random classification noise model, in which the adversary can flip each example x with probability η(x) ≤ η. We provide the first polynomial-time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit sphere in R^d, for some constant value of η. While widely studied in the statistical learning theory community in the context of obtaining faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot achieve arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work instead makes use of a margin-based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm, with label complexity that is only logarithmic in the desired excess error.
A PTAS for Agnostically Learning Halfspaces, 2015.
Abstract: We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the d-dimensional sphere. Namely, we show that for every µ > 0 there is an algorithm that runs in time poly(d, 1/ϵ) and is guaranteed to return a classifier with error at most (1 + µ)opt + ϵ, where opt is the error of the best halfspace classifier. This improves on Awasthi, Balcan and Long (Awasthi et al.) …