Results 1 - 10 of 120
Multiparty Communication Complexity, 1989
"... A given Boolean function has its input distributed among many parties. The aim is to determine which parties to tMk to and what information to exchange with each of them in order to evaluate the function while minimizing the total communication. This paper shows that it is possible to obtain the Boo ..."
Abstract - Cited by 760 (22 self)
A given Boolean function has its input distributed among many parties. The aim is to determine which parties to talk to and what information to exchange with each of them in order to evaluate the function while minimizing the total communication. This paper shows that it is possible to obtain the Boolean answer deterministically with only a polynomial increase in communication with respect to the information lower bound given by the nondeterministic communication complexity of the function.
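Stated schematically, the claim is that deterministic evaluation costs at most polynomially more than the nondeterministic lower bound. The shorthand below (D(f) for deterministic total communication, N(f) for nondeterministic communication complexity) is mine, not necessarily the paper's notation.

```latex
% Schematic restatement of the abstract's claim (shorthand notation, not the paper's):
% D(f): total communication of the best deterministic multiparty protocol for f
% N(f): nondeterministic communication complexity of f (the information lower bound)
D(f) \;\le\; \mathrm{poly}\bigl(N(f)\bigr)
```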
Learning Deep Architectures for AI
"... Theoretical results suggest that in order to learn the kind of complicated functions that can represent highlevel abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as i ..."
Abstract - Cited by 183 (30 self)
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles behind learning algorithms for deep architectures, in particular those that use unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for deeper models such as Deep Belief Networks.
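As a rough illustration of the single-layer building block mentioned in the abstract, the sketch below implements one contrastive-divergence (CD-1) update for a binary Restricted Boltzmann Machine in NumPy. It is a minimal sketch under my own assumptions (toy sizes, learning rate, mean-field negative phase), not the paper's code; a Deep Belief Network would then be built, as the abstract describes, by treating the hidden activations as input data for the next layer.

```python
# Minimal CD-1 sketch for a binary RBM (illustrative assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of binary visible vectors."""
    # Positive phase: hidden probabilities and a sample given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to the visibles (mean-field probabilities).
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate gradient: data correlations minus reconstruction correlations.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

# Toy usage: 6 visible units, 3 hidden units, random binary "data".
W = 0.01 * rng.standard_normal((6, 3))
b_vis, b_hid = np.zeros(6), np.zeros(3)
data = (rng.random((16, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid)
```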
Representation learning: A review and new perspectives
- IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), 2013
"... Abstract-The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract - Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Why does unsupervised pre-training help deep learning?, 2010
"... Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks ..."
Abstract - Cited by 155 (20 self)
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants, with impressive results obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature, including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pre-training appears to play a predominantly regularizing role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.
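The two-phase procedure the abstract refers to (unsupervised pre-training followed by supervised fine-tuning of the same weights) can be sketched as below. This is a minimal NumPy illustration under my own assumptions (a tied-weight autoencoder as the unsupervised stage, a logistic output for fine-tuning, toy data and learning rates); it is not the authors' experimental setup.

```python
# Sketch of pre-training then fine-tuning (illustrative assumptions, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 20, 8
X = rng.standard_normal((256, n_in))
y = (X[:, :2].sum(axis=1) > 0).astype(float)   # toy binary labels

# Phase 1: unsupervised pre-training of W (tied-weight autoencoder, squared error).
W = 0.01 * rng.standard_normal((n_in, n_hid))
for _ in range(200):
    H = sigmoid(X @ W)                 # encode
    err = H @ W.T - X                  # decode with tied weights: reconstruction error
    grad_W = X.T @ (err @ W * H * (1 - H)) + err.T @ H   # backprop through both uses of W
    W -= 1e-3 * grad_W / len(X)

# Phase 2: supervised fine-tuning of W together with a new logistic output layer.
w_out, lr = np.zeros(n_hid), 0.1
for _ in range(200):
    H = sigmoid(X @ W)
    p = sigmoid(H @ w_out)
    delta = p - y                                   # cross-entropy gradient at the output
    grad_hidden = np.outer(delta, w_out) * H * (1 - H)
    w_out -= lr * H.T @ delta / len(X)
    W -= lr * X.T @ grad_hidden / len(X)
```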
Curriculum Learning
"... Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum ..."
Abstract - Cited by 147 (15 self)
Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them “curriculum learning”. In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
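As a concrete, purely illustrative reading of the strategy described here, the sketch below orders training examples by a user-supplied difficulty score and feeds a growing, easiest-first pool of examples to the learner. The difficulty score, stage schedule, and toy data are my assumptions, not the paper's experimental protocol.

```python
# Minimal curriculum schedule: train on a growing pool of examples, easiest first
# (illustrative assumptions, not the paper's protocol).
import numpy as np

def curriculum_batches(X, y, difficulty, n_stages=4, epochs_per_stage=5,
                       batch_size=32, seed=0):
    """Yield mini-batches drawn from a pool that grows from easy to hard examples."""
    rng = np.random.default_rng(seed)
    order = np.argsort(difficulty)                        # easy -> hard
    for stage in range(1, n_stages + 1):
        pool = order[: int(len(order) * stage / n_stages)].copy()
        for _ in range(epochs_per_stage):
            rng.shuffle(pool)
            for start in range(0, len(pool), batch_size):
                idx = pool[start:start + batch_size]
                yield X[idx], y[idx]

# Toy usage: score "difficulty" by closeness to the decision boundary (an assumption).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(int)
difficulty = -np.abs(X[:, 0])          # near-boundary examples count as hard
for xb, yb in curriculum_batches(X, y, difficulty):
    pass                               # feed (xb, yb) to the learner's update step here
```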
Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, 2010
"... ..."
Noise sensitivity of Boolean functions and applications to percolation, 2008
"... It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrary small percent of random errors gives almost no prediction whether the event occurs. On the other hand, weighted majority ..."
Abstract - Cited by 101 (18 self)
It is shown that a large class of events in a product probability space are highly sensitive to noise, in the sense that with high probability, the configuration with an arbitrarily small percentage of random errors gives almost no prediction of whether the event occurs. On the other hand, weighted majority functions are shown to be noise-stable. Several necessary and sufficient conditions for noise sensitivity and stability are given. Consider, for example, bond percolation on an n + 1 by n grid. A configuration is a function that assigns to every edge the value 0 or 1. Let ω be a random configuration, selected according to the uniform measure. A crossing is a path that joins the left and right sides of the rectangle and consists entirely of edges e with ω(e) = 1. By duality, the probability of having a crossing is 1/2. Fix an ε ∈ (0, 1). For each edge e, let ω′(e) = ω(e) with probability 1 − ε, and ω′(e) = 1 − ω(e) with probability ε, independently of the other edges.
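A small Monte Carlo sketch of the perturbation experiment described above is given below, under my own assumptions about the grid convention, grid size, ε, and number of trials; it illustrates the setup in the abstract and is not taken from the paper.

```python
# Monte Carlo sketch of noise sensitivity for left-right crossings in bond percolation
# (illustrative assumptions: grid convention, n, eps, and trial count are mine).
import numpy as np
from collections import deque

rng = np.random.default_rng(1)

def has_crossing(h_open, v_open):
    """BFS over open bonds; True if some left-column vertex reaches the right column."""
    rows, cols = v_open.shape[0] + 1, h_open.shape[1] + 1
    seen = np.zeros((rows, cols), bool)
    queue = deque((r, 0) for r in range(rows))
    seen[:, 0] = True
    while queue:
        r, c = queue.popleft()
        if c == cols - 1:
            return True
        neighbors = (
            (r, c + 1, c < cols - 1 and h_open[r, c]),      # bond to the right
            (r, c - 1, c > 0 and h_open[r, c - 1]),         # bond to the left
            (r + 1, c, r < rows - 1 and v_open[r, c]),      # bond below
            (r - 1, c, r > 0 and v_open[r - 1, c]),         # bond above
        )
        for nr, nc, is_open in neighbors:
            if is_open and not seen[nr, nc]:
                seen[nr, nc] = True
                queue.append((nr, nc))
    return False

def noisy(bonds, eps):
    """Flip each bond independently with probability eps."""
    return bonds ^ (rng.random(bonds.shape) < eps)

# Vertices form an n-row by (n+1)-column rectangle; each bond is open with probability 1/2.
n, eps, trials = 10, 0.05, 2000
agree = 0
for _ in range(trials):
    h = rng.random((n, n)) < 0.5          # horizontal bonds
    v = rng.random((n - 1, n + 1)) < 0.5  # vertical bonds
    agree += has_crossing(h, v) == has_crossing(noisy(h, eps), noisy(v, eps))
print(f"fraction of trials where the crossing event is unchanged: {agree / trials:.3f}")
```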
Majority Gates vs. General Weighted Threshold Gates
- Computational Complexity, 1992
"... . In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth c ..."
Abstract - Cited by 99 (6 self)
In this paper we study small depth circuits that contain threshold gates (with or without weights) and parity gates. All circuits we consider are of polynomial size. We prove several results which complete the work on characterizing possible inclusions between many classes defined by small depth circuits. These results are the following:
1. A single threshold gate with weights cannot in general be replaced by a polynomial fan-in unweighted threshold gate of parity gates.
2. On the other hand, it can be replaced by a depth 2 unweighted threshold circuit of polynomial size. An extension of this construction is used to prove that whatever can be computed by a depth d polynomial size threshold circuit with weights can be computed by a depth d + 1 polynomial size unweighted threshold circuit, where d is an arbitrary fixed integer.
3. A polynomial fan-in threshold gate (with weights) of parity gates cannot in general be replaced by a depth 2 unweighted threshold circuit of polynomial size...
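For reference, the two gate types being compared can be written as follows; these are the standard definitions in my own notation, not necessarily the notation used in the paper.

```latex
% A weighted threshold gate with integer weights w_1,\dots,w_n and threshold t:
T_{w,t}(x_1,\dots,x_n) =
  \begin{cases}
    1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge t,\\
    0 & \text{otherwise,}
  \end{cases}
% and the unweighted (majority) gate is the special case w_1 = \dots = w_n = 1,
% t = \lceil n/2 \rceil:
\mathrm{MAJ}(x_1,\dots,x_n) = T_{\mathbf{1},\,\lceil n/2 \rceil}(x_1,\dots,x_n).
```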
Exploring strategies for training deep neural networks
- Journal of Machine Learning Research
"... Département d’informatique et de recherche opérationnelle ..."
Cited by 90 (12 self)