Results 1–10 of 15
Hardness of learning halfspaces with noise
In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006
Abstract

Cited by 40 (3 self)
Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction (1 − ε) of the examples exists (for some small constant ε > 0), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples. Nor was a hardness result known that ruled out getting agreement on more than 99.9% of the examples. In this work, we close this gap in our understanding, and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary ε, δ > 0, we prove that given a set of example-label pairs from the hypercube, a fraction (1 − ε) of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction (1/2 + δ) of the examples. The hardness result is tight, since it is trivial to get agreement on 1/2 of the examples. In learning-theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question that was raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise [10], and in some more recent works as well. Along the way, we also obtain a strong hardness result for another basic computational problem: solving a linear system over the rationals.
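The contrast drawn in this abstract can be illustrated on the easy side: in the noise-free (realizable) case, the classic perceptron update finds a consistent halfspace on separable data. This is a toy sketch, not code from the paper; the paper's linear-programming formulation would work equally well, and the data below is made up.

```python
# Perceptron on linearly separable data: finds a halfspace consistent with
# every example, illustrating the easy noise-free case described above.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron(examples, max_epochs=1000):
    """examples: list of (x, y) with x a tuple of features and y in {-1, +1}."""
    d = len(examples[0][0])
    w = [0.0] * d
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if y * dot(w, x) <= 0:          # misclassified: additive update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                   # consistent halfspace found
            return w
    return w

# Hypothetical separable data over the hypercube (label = sign of the first
# coordinate; a constant 1 is appended so the separator can be affine).
data = [((1, 1, 1), 1), ((1, -1, 1), 1), ((-1, 1, 1), -1), ((-1, -1, 1), -1)]
w = perceptron(data)
assert all(y * dot(w, x) > 0 for x, y in data)
```

With even a small adversarial fraction of flipped labels, the abstract's result says no efficient procedure can do noticeably better than the trivial 1/2 agreement.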
The sign-rank of AC^0
In Proc. of the 49th Symposium on Foundations of Computer Science (FOCS), 2008
Abstract

Cited by 26 (8 self)
The sign-rank of a matrix A = [A_{ij}] with ±1 entries is the least rank of a real matrix B = [B_{ij}] with A_{ij} B_{ij} > 0 for all i, j. We obtain the first exponential lower bound on the sign-rank of a function in AC^0. Namely, let f(x, y) = ⋀_{i=1}^{m} ⋁_{j=1}^{m²} (x_{ij} ∧ y_{ij}). We show that the matrix [f(x, y)]_{x,y} has sign-rank 2^{Ω(m)}. This in particular implies that Σ_2^{cc} ⊄ UPP^{cc}, which solves a longstanding open problem posed by Babai, Frankl, and Simon (1986). Our result additionally implies a lower bound in learning theory. Specifically, let φ_1, ..., φ_r : {0, 1}^n → R be functions such that every DNF formula f : {0, 1}^n → {−1, +1} of polynomial size has the representation f ≡ sign(a_1 φ_1 + · · · + a_r φ_r) for some reals a_1, ..., a_r. We prove that then r ≥ 2^{Ω(n^{1/3})}, which essentially matches an upper bound of 2^{Õ(n^{1/3})} due to Klivans and Servedio (2001). Finally, our work yields the first exponential lower bound on the size of threshold-of-majority circuits computing a function in AC^0. This substantially generalizes and strengthens the results of Krause and Pudlák (1997).
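The definition in the abstract's first sentence can be checked mechanically on a small example: exhibit a low-rank real matrix B that agrees in sign, entry by entry, with a ±1 matrix A. The matrices below are hypothetical, chosen only to illustrate the definition.

```python
import numpy as np

# Sign-rank: A has sign-rank <= r if some rank-r real matrix B satisfies
# A_ij * B_ij > 0 for all i, j. Toy illustration, not from the paper.
A = np.array([[ 1,  1, -1],
              [ 1, -1, -1],
              [ 1,  1,  1]])

# B is built as a sum of two outer products, so rank(B) <= 2.
u1, v1 = np.array([1.0, 1.0, 1.0]), np.array([1.0, 0.2, -0.5])
u2, v2 = np.array([0.0, -1.0, 1.0]), np.array([0.0, 1.0, 1.0])
B = np.outer(u1, v1) + np.outer(u2, v2)

assert np.linalg.matrix_rank(B) <= 2
assert np.all(A * B > 0)   # B sign-represents A, so sign-rank(A) <= 2
```

The paper's hard part is the reverse direction: showing that for its AC^0 matrix, no low-rank B can agree in sign everywhere.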
Metric and Mixing Sufficient Conditions for Concentration of Measure
2008
Abstract

Cited by 5 (4 self)
We derive sufficient conditions for a family (S^n, ρ_n, P_n) of metric probability spaces to have the measure concentration property. Specifically, if the sequence {P_n} of probability measures satisfies a strong mixing condition (which we call η-mixing) and the sequence of metrics {ρ_n} is what we call Ψ-dominated, we show that (S^n, ρ_n, P_n) is a normal Lévy family. We establish these properties for some metric probability spaces, including the possibly novel S = [0, 1], ρ_n = ‖·‖_1 case.
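For context, the standard definition of a normal Lévy family (stated here in the usual textbook form, e.g. following Milman and Schechtman; it is not quoted from the paper): the family (S^n, ρ_n, P_n) is a normal Lévy family if there are constants C_1, C_2 > 0 such that for every n, every ε > 0, and every measurable A ⊆ S^n with P_n(A) ≥ 1/2,

```latex
P_n\bigl(\{x \in S^n : \rho_n(x, A) \ge \varepsilon\}\bigr)
  \;\le\; C_1\, e^{-C_2\, \varepsilon^2 n}.
```

That is, the measure outside any ε-enlargement of a majority set decays exponentially in n.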
Learning Noisy Characters, Multiplication Codes, and Cryptographic Hardcore Predicates
2008
Abstract

Cited by 5 (3 self)
We present results in cryptography, coding theory, and sublinear algorithms. In cryptography, we introduce a unifying framework for proving that a Boolean predicate is hardcore for a one-way function, and apply it to a broad family of functions and predicates, showing new hardcore predicates for well-known one-way function candidates such as RSA and discrete log, as well as reproving old results in an entirely different way. Our proof framework extends the list-decoding method of Goldreich and Levin [38] for showing hardcore predicates, by introducing a new class of error-correcting codes and a new list-decoding algorithm that we develop for these codes. In coding theory, we introduce a novel class of error-correcting codes that we name Multiplication Codes (MPC). We develop decoding algorithms for MPC codes, showing they achieve desirable combinatorial and algorithmic properties, including: (1) binary MPC of constant distance and exponential encoding length, for which we provide efficient local list-decoding and local self-correcting algorithms; (2) binary MPC of constant distance and polynomial encoding length, for which we provide efficient ...
Candidate Weak Pseudorandom Functions in AC0 ∘ MOD2
Abstract

Cited by 4 (1 self)
Pseudorandom functions (PRFs) play a fundamental role in symmetric-key cryptography. However, they are inherently complex and cannot be implemented in the class AC0(MOD2). Weak pseudorandom functions (weak PRFs) do not suffer from this complexity limitation, yet they suffice for many cryptographic applications. We study the minimal complexity requirements for constructing weak PRFs. To this end:
• We conjecture that the function family F_A(x) = g(Ax), where A is a random square GF(2) matrix and g is a carefully chosen function of constant depth, is a weak PRF. In support of our conjecture, we show that functions in this family are inapproximable by GF(2) polynomials of low degree and do not correlate with any fixed Boolean function family of subexponential size.
• We study the class AC0 ∘ MOD2 that captures the complexity of our construction. We conjecture that all functions in this class have a Fourier coefficient of magnitude exp(−poly log n), and prove this conjecture in the case when the MOD2 function is typical.
• We investigate the relation between the hardness of learning noisy parities and the existence of weak PRFs in AC0 ∘ MOD2.
We argue that such a complexity-driven approach can play a role in bridging the gap between the theory and practice of cryptography.
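The conjectured family F_A(x) = g(Ax) can be sketched structurally as follows. The outer function g below is a hypothetical stand-in (the paper chooses g carefully); this only illustrates the AC0 ∘ MOD2 shape, with a linear GF(2) layer followed by a shallow formula, not the actual candidate.

```python
import secrets

def gf2_matvec(A, x):
    """Multiply an n x n GF(2) matrix A by a bit-vector x (lists of 0/1 bits).
    Each output bit is a parity (MOD2) of a subset of input bits."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def g(bits):
    # Stand-in outer function: an OR of ANDs over consecutive bit pairs,
    # i.e. a depth-2 AC0 formula. Illustrative only; not the paper's g.
    pairs = [bits[i] & bits[i + 1] for i in range(0, len(bits) - 1, 2)]
    return max(pairs) if pairs else 0

n = 8
A = [[secrets.randbelow(2) for _ in range(n)] for _ in range(n)]  # secret key

def F(x):
    """Candidate-style weak PRF shape: shallow formula over parities."""
    return g(gf2_matvec(A, x))

x = [secrets.randbelow(2) for _ in range(n)]  # weak-PRF queries are random x
out = F(x)
```

Weak pseudorandomness only requires indistinguishability on random inputs x, which is why the adversarially queried standard-PRF lower bound for AC0(MOD2) does not apply.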
Cryptography by Cellular Automata or How Fast Can Complexity Emerge in Nature?
Abstract

Cited by 3 (0 self)
Computation in the physical world is restricted by the following spatial locality constraint: in a single unit of time, information can only travel a bounded distance in space. A simple computational model which captures this constraint is a cellular automaton: a discrete dynamical system in which cells are placed on a grid and the state of each cell is updated via a local deterministic rule that depends only on the few cells within its close neighborhood. Cellular automata are commonly used to model real-world systems in nature and society, and have been shown to be capable of highly complex behavior. However, it is not clear how fast this complexity can evolve and how common it is with respect to all possible initial configurations. We examine this question from a computational perspective, identifying “complexity” with computational intractability. More concretely, we consider an n-cell automaton with a random initial configuration, and study the minimal number of computation steps t = t(n) after which the following problems can become computationally hard:
• The inversion problem: given the configuration y at time t, find an initial configuration x which leads to y in t steps.
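The model described above (n cells on a grid, each updated by a rule depending only on its close neighborhood) can be sketched with a one-dimensional cyclic automaton. The choice of elementary rule 30 is an arbitrary illustration, not the paper's construction.

```python
import random

def step(config, rule=30):
    """One synchronous update of a cyclic 1-D cellular automaton: each cell's
    next state depends only on its 3-cell neighborhood (the locality constraint).
    `rule` encodes the local update table in the standard Wolfram numbering."""
    n = len(config)
    out = []
    for i in range(n):
        neigh = (config[(i - 1) % n] << 2) | (config[i] << 1) | config[(i + 1) % n]
        out.append((rule >> neigh) & 1)
    return out

def run(config, t):
    """Evolve the configuration for t steps."""
    for _ in range(t):
        config = step(config)
    return config

n = 16
x = [random.randint(0, 1) for _ in range(n)]   # random initial configuration
y = run(x, t=5)                                # configuration at time t
# The inversion problem from the abstract: given y and t, recover some
# initial configuration that reaches y in t steps.
```

The abstract's question is how small t(n) can be before such inversion becomes computationally intractable for a random x.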
Learning with Annotation Noise
Abstract

Cited by 1 (0 self)
It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0–1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect prediction on the uncontroversial cases at test time.
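For reference, the voted perceptron mentioned above keeps every intermediate weight vector together with a survival count and predicts by a count-weighted vote. This is a minimal sketch in the style of Freund and Schapire's algorithm; the data is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def voted_perceptron_train(examples, epochs=10):
    """Return a list of (weight_vector, survival_count) pairs."""
    d = len(examples[0][0])
    w, c, history = [0.0] * d, 1, []
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:
                history.append((w, c))            # retire the current vector
                w = [wi + y * xi for wi, xi in zip(w, x)]
                c = 1
            else:
                c += 1                            # vector survives this example
    history.append((w, c))
    return history

def voted_predict(history, x):
    # Each retired vector votes with weight equal to its survival count;
    # a few unreliable "hard" training cases force extra updates and can
    # thus shift many votes, which is the hazard the abstract points to.
    s = sum(c * (1 if dot(w, x) > 0 else -1) for w, c in history)
    return 1 if s > 0 else -1

data = [((1, 1), 1), ((2, 1), 1), ((-1, -1), -1), ((-2, -1), -1)]
model = voted_perceptron_train(data)
assert all(voted_predict(model, x) == y for x, y in data)
```

Under plain 0–1 loss only the final decision boundary matters, whereas here every mistake-triggered update leaves a permanent trace in the vote.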
Algorithmic Signaling of Features in Auction Design
Abstract
In many markets, products are highly complex, with an extremely large set of features. In advertising auctions, for example, an impression, i.e., a viewer on a web page, has numerous features describing the viewer's demographics, browsing history, temporal aspects, etc. In these markets, an auctioneer must select a few key features to signal to bidders. These features should be selected such that the bidder with the highest value for the product can construct a bid so as to win the auction. We present an efficient algorithmic solution for this problem in a setting where the product's features are drawn independently from a known distribution, the bidders' values for a product are additive over their known values for the features of the product, and the number of features is exponentially larger than the number of bidders and the number of signals. Our approach involves solving a novel optimization problem regarding the expectation of a sum of independent random vectors that may be of independent interest. We complement our positive result with a hardness result for the problem when features are arbitrarily correlated. This result is based on the conjectured hardness of learning k-juntas, a central open problem in learning theory.
Under the supervision of
2014
Abstract
We survey the problem of learning linear models, in the binary and multiclass settings. In both cases, our goal is to find a linear model with the least probability of mistake. This problem is known to be NP-hard, and even NP-hard to learn improperly (under relevant assumptions). Nonetheless, under certain assumptions about the input, the problem has an algorithm with worst-case polynomial time complexity. At first glance these assumptions seem to vary greatly, ranging from the realizable assumption, which entails that the labeling is deterministic and can be realized by a linear function, to simply the existence of a predictor with margin and a low error rate. However, all of these methods can be seen as generalized linear models: namely, the optimal classifier can be used to estimate the distribution of the labels given any example. On a different note, all of these methods are based on convex optimization.