Results 1  10
of
125
Representation learning: A review and new perspectives.
 of IEEE Conf. Comp. Vision Pattern Recog. (CVPR),
, 2005
"... AbstractThe success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract

Cited by 173 (4 self)
 Add to MetaCart
(Show Context)
AbstractThe success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representationlearning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Algorithms for hyperparameter optimization
 In NIPS
, 2011
"... Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyperparameter optimization has been the job of humans because they can be very efficient ..."
Abstract

Cited by 50 (10 self)
 Add to MetaCart
(Show Context)
Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyperparameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyperparameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyperparameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P (yx) in which many elements of hyperparameter assignment (x) are known to be irrelevant given particular values of other elements. 1
AutoWEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
"... Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previo ..."
Abstract

Cited by 32 (8 self)
 Add to MetaCart
(Show Context)
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA’s standard distribution, spanning 2 ensemble methods, 10 metamethods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR10, we show classification performance often much better than using standard selection and hyperparameter optimization methods. We hope that our approach will help nonexpert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
Practical recommendations for gradientbased training of deep architectures
 Neural Networks: Tricks of the Trade
, 2013
"... ar ..."
(Show Context)
Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures
"... Many computer vision algorithms depend on configuration settings that are typically handtuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is freque ..."
Abstract

Cited by 20 (4 self)
 Add to MetaCart
Many computer vision algorithms depend on configuration settings that are typically handtuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method’s full potential. Compounding matters, these parameters often must be retuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned. In this work, we propose a metamodeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace handtuning with a reproducible and unbiased optimizaProceedings of the 30 th
Algorithm runtime prediction: Methods & evaluation.
 Artificial Intelligence,
, 2014
"... Abstract Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problemspecific instance features. Such models have many important appli ..."
Abstract

Cited by 17 (6 self)
 Add to MetaCart
Abstract Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problemspecific instance features. Such models have many important applications and over the past decade, a wide variety of techniques have been studied for building such models. In this extended abstract of our 2014 AI Journal article of the same title, we summarize existing models and describe new model families and various extensions. In a comprehensive empirical analyis using 11 algorithms and 35 instance distributions spanning a wide range of hard combinatorial problems, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously. Introduction NPcomplete problems are ubiquitous in AI. Luckily, while these problems may be hard to solve on worstcase inputs, it is often feasible to solve even large problem instances that arise in practice. Less luckily, stateoftheart algorithms often exhibit extreme runtime variation across instances from realistic distributions, even when problem size is held constant, and conversely the same instance can take dramatically different amounts of time to solve depending on the algorithm used • Algorithm selection. This classic problem of selecting the best from a given set of algorithms on a perinstance * This paper is an invited extended abstract of our 2014 AI Journal article
Recursive Autoencoders for ITGbased Translation
"... While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we pr ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
(Show Context)
While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variablesized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points. 1
Training Recurrent Neural Networks
, 2013
"... Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging probl ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessianfree (HF) optimizer and show that it can train RNNs on tasks that have extreme longrange temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to characterlevel language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with longterm dependencies. This directly contradicts widespread beliefs about the inability of firstorder methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
Bayesian optimization in high dimensions via random embeddings
 In Proceedings of the TwentyThird international joint conference on Artificial Intelligence
, 2013
"... Abstract Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, an ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
Abstract Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve highdimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.
RNADE: The realvalued neural autoregressive densityestimator
 In Advances in Neural Information Processing Systems 26 (NIPS 26
, 2013
"... We introduce RNADE, a new model for joint density estimation of realvalued vectors. Our model calculates the density of a datapoint as the product of onedimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, wh ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
(Show Context)
We introduce RNADE, a new model for joint density estimation of realvalued vectors. Our model calculates the density of a datapoint as the product of onedimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradientbased optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1