Results 1-10 of 199
Greedy layer-wise training of deep networks
, 2006
Abstract

Cited by 394 (48 self)
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
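A minimal sketch of the greedy layer-wise idea: each layer is trained unsupervised on the codes produced by the already-trained, frozen layers below it. For brevity this swaps in small tied-weight sigmoid autoencoders trained by plain SGD in place of the RBMs of a Deep Belief Network, and it omits the supervised fine-tuning stage. All function names and hyperparameters here are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    # Hidden code h = sigmoid(W x + b)
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def train_autoencoder_layer(data, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one tied-weight sigmoid autoencoder by SGD on squared
    reconstruction error; return the encoder parameters (W, b_hidden)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    bh = [0.0] * n_hidden
    bv = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode(W, bh, x)
            # Decoder reuses W transposed (tied weights)
            r = [sigmoid(sum(W[j][i] * h[j] for j in range(n_hidden)) + bv[i]) for i in range(n_in)]
            dv = [(r[i] - x[i]) * r[i] * (1 - r[i]) for i in range(n_in)]
            dh = [sum(W[j][i] * dv[i] for i in range(n_in)) * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_in):
                    # Tied-weight gradient: decoder term + encoder term
                    W[j][i] -= lr * (dv[i] * h[j] + dh[j] * x[i])
                bh[j] -= lr * dh[j]
            for i in range(n_in):
                bv[i] -= lr * dv[i]
    return W, bh


def greedy_layerwise_pretrain(data, layer_sizes):
    """Train layers bottom-up; each new layer sees only the frozen codes below it."""
    layers, reps = [], data
    for k, size in enumerate(layer_sizes):
        W, b = train_autoencoder_layer(reps, size, seed=k)
        layers.append((W, b))
        reps = [encode(W, b, x) for x in reps]
    return layers, reps


# Toy usage: 8 one-hot inputs, pretrained through layers of width 6 and 4
data = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]
layers, codes = greedy_layerwise_pretrain(data, [6, 4])
```

The resulting weights would then serve as the initialization for a supervised network, which is the optimization role the abstract attributes to the unsupervised stage.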
Hardness vs. randomness
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1994
Abstract

Cited by 298 (27 self)
We present a simple new construction of a pseudorandom bit generator, based on the constant depth generators of [N]. It stretches a short string of truly random bits into a long string that looks random to any algorithm from a complexity class C (e.g., P, NC, PSPACE, ...) using an arbitrary function that is hard for C. This construction reveals an equivalence between the problem of proving lower bounds and the problem of generating good pseudorandom sequences. Our construction has many consequences. The most direct one is that efficient deterministic simulation of randomized algorithms is possible under much weaker assumptions than previously known. The efficiency of the simulations depends on the strength of the assumptions, and may achieve P = BPP. We believe that our results are very strong evidence that the gap between randomized and deterministic complexity is not large. Using the known lower bounds for constant depth circuits, our construction yields an unconditionally proven pseudorandom generator for constant depth circuits. As an application of this generator we characterize the power of NP with a random oracle.
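A toy sketch of the generator's shape: a short seed is read through many heavily overlapping windows, and each output bit applies a hard function to one window. The window family here is a combinatorial design built from low-degree polynomials over GF(p) (a standard construction, assumed here rather than taken from the paper), and parity stands in for the hard function, in the spirit of the constant-depth case the abstract mentions. Parameters are tiny and illustrative; function names are hypothetical.

```python
import random
from itertools import product


def polynomial_design(p, degree, count):
    """Combinatorial design over GF(p), p prime: set i is the graph
    {(x, q_i(x)) : x in GF(p)} of the i-th polynomial of degree <= `degree`,
    encoded as seed positions x*p + q_i(x). Distinct such polynomials agree on
    at most `degree` points, so two sets share at most `degree` positions."""
    if count > p ** (degree + 1):
        raise ValueError("not enough distinct polynomials for this many sets")
    sets = []
    for coeffs in product(range(p), repeat=degree + 1):
        if len(sets) == count:
            break
        sets.append([x * p + sum(c * x ** k for k, c in enumerate(coeffs)) % p
                     for x in range(p)])
    return sets


def parity(bits):
    return sum(bits) % 2


def nw_style_generator(seed, p, degree, out_len, hard_fn=parity):
    """Stretch a p*p-bit seed to out_len bits: output bit i is hard_fn
    applied to the seed restricted to the i-th design set."""
    if len(seed) != p * p:
        raise ValueError("seed must have p*p bits")
    return [hard_fn([seed[j] for j in s]) for s in polynomial_design(p, degree, out_len)]


rng = random.Random(1)
seed = [rng.randint(0, 1) for _ in range(25)]
bits = nw_style_generator(seed, p=5, degree=2, out_len=125)  # 25 bits -> 125 bits
```

The small pairwise overlap of the design sets is what lets a distinguisher for the output be turned into a small circuit for the hard function, which is the lower-bound/pseudorandomness link the abstract describes.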
Almost Optimal Lower Bounds for Small Depth Circuits
, 1986
Abstract

Cited by 277 (8 self)
We give improved lower bounds for the size of small-depth circuits computing several functions. In particular, we prove almost optimal lower bounds for the size of parity circuits. Further, we show that there are functions computable in polynomial size and depth k that require exponential size when the depth is restricted to k-1. Our main lemma, which is of independent interest, states that by using a random restriction we can convert an AND of small ORs to an OR of small ANDs and conversely.
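A random restriction fixes most variables to random bits and leaves a few free. The sketch below shows how a restriction simplifies an AND of ORs, and how parity under any restriction is simply parity of the surviving free variables (possibly negated). It illustrates the setting of the main lemma, not its proof; the free-variable probability q and all helper names are assumptions for illustration.

```python
import random


def random_restriction(n, q, rng):
    """rho[i] is None (variable left free) with probability q, else a random bit."""
    return [None if rng.random() < q else rng.randint(0, 1) for _ in range(n)]


def restrict_cnf(cnf, rho):
    """Apply rho to an AND of ORs. A clause is a list of (var, positive) literals.
    Returns True or False if the formula collapses to a constant,
    otherwise the simplified list of clauses."""
    simplified = []
    for clause in cnf:
        kept = []
        for var, positive in clause:
            val = rho[var]
            if val is None:
                kept.append((var, positive))
            elif (val == 1) == positive:
                break  # literal fixed true: the whole clause is satisfied
        else:
            if not kept:
                return False  # every literal fixed false: the AND is 0
            simplified.append(kept)
    return simplified if simplified else True


def restrict_parity(variables, rho):
    """Parity under rho is parity of the free variables XOR a fixed bit."""
    free = [v for v in variables if rho[v] is None]
    offset = sum(rho[v] for v in variables if rho[v] is not None) % 2
    return free, offset


rng = random.Random(0)
rho = random_restriction(6, q=0.2, rng=rng)
cnf = [[(0, True), (1, False)], [(2, True), (3, True)], [(4, False), (5, True)]]
print(restrict_cnf(cnf, rho), restrict_parity(range(6), rho))
```

With q small, most clauses of a narrow AND of ORs get settled, so the formula tends to shrink or collapse, while parity never collapses as long as one variable stays free; this asymmetry is the intuition behind the parity lower bounds.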
Learning Decision Trees using the Fourier Spectrum
, 1991
Abstract

Cited by 207 (10 self)
This work gives a polynomial-time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional Boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF(2)). This paper shows how to learn in polynomial time any function that can be approximated (in L_2 norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L_1 norm (i.e., the sum of the absolute values of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that Boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that the functions with polynomial L_1 norm can be learned deterministically. The algorithm can a...
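To make "Fourier coefficients" and "L_1 norm" concrete, the sketch below computes the full spectrum of a small Boolean function by brute force, using the common convention F(x) = (-1)^f(x) and characters chi_S(x) = (-1)^(sum of x_i for i in S). This is exponential in n and is only an illustration of the quantities involved, not the paper's polynomial-time learning algorithm; the example tree uses plain variable tests (no linear operations), and all names are hypothetical.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {0,1}, with F(x) = (-1)^f(x)
    and f_hat(S) = average over x of F(x) * chi_S(x)."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for mask in range(1 << n):
        S = frozenset(i for i in range(n) if mask >> i & 1)
        total = 0
        for x in points:
            chi = -1 if sum(x[i] for i in S) % 2 else 1
            F = -1 if f(x) else 1
            total += F * chi
        coeffs[S] = total / len(points)
    return coeffs


def l1_norm(coeffs):
    # Sum of absolute values of all Fourier coefficients
    return sum(abs(c) for c in coeffs.values())


def tree(x):
    # Depth-2 decision tree: if x0 then x1 else x2
    return x[1] if x[0] else x[2]


coeffs = fourier_coefficients(tree, 3)
sparse = {S: c for S, c in coeffs.items() if c != 0}
```

For this tree only four of the eight coefficients are nonzero and the L_1 norm is 2, consistent with the abstract's point that decision trees fall in the small-L_1-norm, approximately sparse class.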
Scaling learning algorithms towards AI
TO APPEAR IN “LARGE-SCALE KERNEL MACHINES”
, 2007
Abstract

Cited by 122 (21 self)
One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to non-parametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
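The shallow architecture described above (one layer of template matchers, then one layer of trainable coefficients) can be sketched as a Gaussian-kernel machine. The Gaussian kernel, the bandwidth sigma, and the fixed rather than learned coefficients are assumptions for illustration. Because the kernel is local, a query far from every template receives essentially no contribution and falls back to the bias, which is one simple way to see the locality issue the abstract raises.

```python
import math


def gaussian_kernel(x, t, sigma):
    # Local template matcher: near 1 when x is close to template t, near 0 far away
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * sigma ** 2))


def kernel_machine(x, templates, alphas, bias, sigma):
    """f(x) = bias + sum_i alpha_i * K(x, x_i): one layer of template matchers
    followed by a single layer of (trainable) coefficients alpha_i."""
    return bias + sum(a * gaussian_kernel(x, t, sigma) for a, t in zip(alphas, templates))


templates = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, -2.0]  # would normally be fit to training data
near = kernel_machine([0.0, 0.0], templates, alphas, bias=0.3, sigma=1.0)
far = kernel_machine([100.0, 100.0], templates, alphas, bias=0.3, sigma=1.0)
```

In high dimensions, covering the input space with enough templates to avoid this fallback requires many of them, which is the link to the curse of dimensionality.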
On the Power of Small-Depth Threshold Circuits
, 1990
Abstract

Cited by 120 (2 self)
We investigate the power of threshold circuits of small depth. In particular, we give functions which require exponential-size unweighted threshold circuits of depth 3 when we restrict the bottom fan-in. We also prove that there are monotone functions f_k which can be computed by linear-size ∧,∨-circuits of depth k but require exponential size to compute by a depth-(k-1) monotone weighted threshold circuit.
Complexity Classes Defined By Counting Quantifiers
, 1991
Abstract

Cited by 57 (0 self)
We study the polynomial-time counting hierarchy, a hierarchy of complexity classes related to the notion of counting. We investigate some of their structural properties, settling many open questions dealing with oracle characterizations, closure under Boolean operations, and relations with other complexity classes. We develop a new combinatorial technique to obtain relativized separations for some of the studied classes, which imply absolute separations for some logarithmic-time-bounded complexity classes.
Unprovability of Lower Bounds on the Circuit Size in Certain Fragments of Bounded Arithmetic
 IN IZVESTIYA OF THE RUSSIAN ACADEMY OF SCIENCE, MATHEMATICS
, 1995
Abstract

Cited by 57 (6 self)
We show that if strong pseudorandom generators exist then the statement “α encodes a circuit of size n^(log* n) for SATISFIABILITY” is not refutable in S^2_2(α). For refutation in S^1_2(α), this is proven under the weaker assumption of the existence of generators secure against the attack by small-depth circuits, and for another system which is strong enough to prove exponential lower bounds for constant-depth circuits, this is shown without using any unproven hardness assumptions. These results can also be viewed as direct corollaries of interpolation-like theorems for certain “split versions” of classical systems of Bounded Arithmetic introduced in this paper.
Proving lower bounds via pseudorandom generators
FSTTCS 2005: Foundations of Software Technology and Theoretical Computer Science, 25th International Conference, Hyderabad, India, December 15-18, 2005, Proceedings, volume 3821 of Lecture
, 2005
Abstract

Cited by 55 (4 self)
In this paper, we formalize two step-wise approaches, based on pseudorandom generators, for proving P ≠ NP and its arithmetic analog: Permanent requires superpolynomial-sized arithmetic circuits.
Cryptographic hardness for learning intersections of halfspaces.
 In 47th Annual IEEE Symposium on Foundations of Computer Science,
, 2006