Results 1 - 10 of 50
Representation learning: A review and new perspectives.
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
"... Abstract-The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract - Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
"... Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previo ..."
Abstract - Cited by 32 (8 self)
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA’s standard distribution, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection and hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
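A minimal sketch of the combined algorithm selection and hyperparameter optimization (CASH) idea described above, assuming scikit-learn and plain random search purely for illustration; the paper's actual system searches WEKA's classifiers with the SMAC Bayesian optimizer, and the algorithms and ranges below are invented for this example.

# Sketch of CASH: the choice of learning algorithm is itself treated as a
# top-level hyperparameter. Illustrative only -- the paper's system builds
# on WEKA and SMAC, not on scikit-learn or random search.
import random

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Joint search space: algorithm choice plus per-algorithm hyperparameter ranges.
SPACE = {
    "random_forest": {
        "make": lambda p: RandomForestClassifier(n_estimators=p["n_estimators"],
                                                 max_depth=p["max_depth"]),
        "sample": lambda: {"n_estimators": random.randint(10, 300),
                           "max_depth": random.randint(2, 20)},
    },
    "svm": {
        "make": lambda p: SVC(C=p["C"], gamma=p["gamma"]),
        "sample": lambda: {"C": 10 ** random.uniform(-3, 3),
                           "gamma": 10 ** random.uniform(-4, 1)},
    },
}

def cash_random_search(X, y, n_trials=20, seed=0):
    """Return the best (algorithm, params, score) found by random search."""
    random.seed(seed)
    best = None
    for _ in range(n_trials):
        algo = random.choice(list(SPACE))
        params = SPACE[algo]["sample"]()
        model = SPACE[algo]["make"](params)
        score = cross_val_score(model, X, y, cv=3).mean()
        if best is None or score > best[2]:
            best = (algo, params, score)
    return best

if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    print(cash_random_search(X, y))
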
Practical recommendations for gradient-based training of deep architectures
- Neural Networks: Tricks of the Trade
, 2013
"... ar ..."
Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures
"... Many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is freque ..."
Abstract - Cited by 20 (4 self)
Many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method’s full potential. Compounding matters, these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned. In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process.
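The meta-modeling loop this abstract describes can be illustrated with the hyperopt library's TPE optimizer. The search space and the stand-in objective below are hypothetical placeholders, not the hierarchical vision architectures optimized in the paper.

# Sketch of a TPE-driven hyperparameter search. The objective is a toy
# stand-in for "train the model and return validation error".
import math

from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

# Illustrative space mixing log-scaled, quantized, and categorical parameters.
space = {
    "learning_rate": hp.loguniform("learning_rate", math.log(1e-5), math.log(1e-1)),
    "n_filters": hp.quniform("n_filters", 16, 256, 16),
    "pooling": hp.choice("pooling", ["max", "avg"]),
}

def objective(params):
    lr, nf = params["learning_rate"], params["n_filters"]
    loss = (math.log10(lr) + 3) ** 2 + (nf - 128) ** 2 / 1e4
    if params["pooling"] == "avg":
        loss += 0.1
    return {"loss": loss, "status": STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)
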
Bayesian optimization in high dimensions via random embeddings
- In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
, 2013
"... Abstract Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, an ..."
Abstract - Cited by 13 (6 self)
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve high-dimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.
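A rough sketch of the random-embedding idea behind REMBO, assuming only a few input dimensions matter: draw a fixed random matrix A of shape D x d, search a small d-dimensional box, and map each candidate z back to the full space as clip(A z). For brevity the inner optimizer here is plain random search rather than the Gaussian-process Bayesian optimization the paper uses, and the toy objective and bounds are invented for the example.

# Random-embedding search: optimize in d dimensions, evaluate in D dimensions.
import numpy as np

def rembo_style_search(objective, D, d=2, n_iters=200, box=5.0, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(D, d))             # fixed random embedding
    best_x, best_val = None, np.inf
    for _ in range(n_iters):
        z = rng.uniform(-box, box, size=d)  # low-dimensional candidate
        x = np.clip(A @ z, -1.0, 1.0)       # project back into [-1, 1]^D
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

if __name__ == "__main__":
    # Toy objective with only a few effective dimensions, the setting in
    # which random embeddings are expected to help.
    def objective(x):
        return (x[0] - 0.3) ** 2 + (x[7] + 0.5) ** 2

    x_star, f_star = rembo_style_search(objective, D=1000)
    print(f_star)
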
Towards an empirical foundation for assessing Bayesian optimization of hyperparameters
- In NIPS Workshop on Bayesian Optimization in Theory and Practice
, 2013
"... Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyper ..."
Abstract - Cited by 10 (3 self)
Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyperparameter optimization and use it to compare Spearmint, TPE, and SMAC, three recent Bayesian optimization methods for hyperparameter optimization.
On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning
Auto-WEKA: Automated selection and hyper-parameter optimization of classification algorithms
, 2012
"... There exists a large variety of machine learning algorithms; as most of these can be configured via hyper-parameters, there is a staggeringly large number of possible alternatives overall. There has been a consid-erable amount of previous work on choosing among learning algorithms and, separately, o ..."
Abstract - Cited by 7 (0 self)
There exists a large variety of machine learning algorithms; as most of these can be configured via hyper-parameters, there is a staggeringly large number of possible alternatives overall. There has been a considerable amount of previous work on choosing among learning algorithms and, separately, on optimizing hyper-parameters (mostly when these are continuous and very few in number) in a given use context. However, we are aware of no work that addresses both problems together. Here, we demonstrate the feasibility of using a fully automated approach for choosing both a learning algorithm and its hyper-parameters, leveraging recent innovations in Bayesian optimization. Specifically, we apply this approach to the full range of classifiers implemented in WEKA, spanning 3 ensemble methods, 14 meta-methods, 30 base classifiers, and a wide range of hyper-parameter settings for each of these. On each of 10 popular data sets from the UCI repository, we show classification performance better than that of complete cross-validation over the default hyper-parameter settings of our 47 classification algorithms. We believe that our approach, which we dubbed Auto-WEKA, will enable typical users of machine learning algorithms to make better choices and thus to obtain better performance in a fully automated fashion.
An efficient approach for assessing hyperparameter importance
- In Proc. of ICML-14
, 2014
"... The performance of many machine learning meth-ods depends critically on hyperparameter set-tings. Sophisticated Bayesian optimization meth-ods have recently achieved considerable successes in optimizing these hyperparameters, in several cases surpassing the performance of human ex-perts. However, bl ..."
Abstract - Cited by 7 (3 self)
The performance of many machine learning methods depends critically on hyperparameter settings. Sophisticated Bayesian optimization methods have recently achieved considerable successes in optimizing these hyperparameters, in several cases surpassing the performance of human experts. However, blind reliance on such methods can leave end users without insight into the relative importance of different hyperparameters and their interactions. This paper describes efficient methods that can be used to gain such insight, leveraging random forest models fit on the data already gathered by Bayesian optimization. We first introduce a novel, linear-time algorithm for computing marginals of random forest predictions and then show how to leverage these predictions within a functional ANOVA framework, to quantify the importance of both single hyperparameters and of interactions between hyperparameters. We conducted experiments with prominent machine learning frameworks and state-of-the-art solvers for combinatorial problems. We show that our methods provide insight into the relationship between hyperparameter settings and performance, and demonstrate that, even in very high-dimensional cases, most performance variation is attributable to just a few hyperparameters.
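The marginal-based importance measure described here can be approximated with a brute-force Monte Carlo sketch: fit a random forest to (configuration, performance) pairs, fix one hyperparameter at a grid of values while averaging predictions over the others, and report the variance of that marginal relative to the total predicted variance. The paper's contribution is an exact linear-time algorithm for these marginals plus a full functional ANOVA decomposition; the scikit-learn code below, on synthetic data, only illustrates the quantity being estimated.

# Brute-force sketch of single-hyperparameter importance via marginals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def marginal_importance(forest, X, dim, n_grid=20, n_samples=200, rng=None):
    """Fraction of predicted-performance variance explained by hyperparameter `dim`."""
    rng = rng or np.random.default_rng(0)
    grid = np.linspace(X[:, dim].min(), X[:, dim].max(), n_grid)
    background = X[rng.integers(0, len(X), size=n_samples)]
    marginal = []
    for v in grid:
        Xv = background.copy()
        Xv[:, dim] = v                      # fix hyperparameter `dim`, average over the rest
        marginal.append(forest.predict(Xv).mean())
    total_var = forest.predict(background).var()
    return np.var(marginal) / total_var if total_var > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake "configurations already gathered by Bayesian optimization":
    # column 0 matters a lot, column 1 barely matters.
    X = rng.uniform(0, 1, size=(500, 2))
    y = 5.0 * X[:, 0] ** 2 + 0.1 * X[:, 1] + rng.normal(0, 0.05, 500)
    forest = RandomForestRegressor(n_estimators=100).fit(X, y)
    for d in range(2):
        print(d, marginal_importance(forest, X, d))
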
Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images
"... Saliency prediction typically relies on hand-crafted (multiscale) features that are combined in different ways to form a “master ” saliency map, which encodes local image conspicuity. Recent improvements to the state of the art on standard benchmarks such as MIT1003 have been achieved mostly by incr ..."
Abstract - Cited by 7 (0 self)
Saliency prediction typically relies on hand-crafted (multiscale) features that are combined in different ways to form a “master” saliency map, which encodes local image conspicuity. Recent improvements to the state of the art on standard benchmarks such as MIT1003 have been achieved mostly by incrementally adding more and more hand-tuned features (such as car or face detectors) to existing models [18, 4, 22, 34]. In contrast, we here follow an entirely automatic data-driven approach that performs a large-scale search for optimal features. We identify those instances of a richly-parameterized bio-inspired model family (hierarchical neuromorphic networks) that successfully predict image saliency. Because of the high dimensionality of this parameter space, we use automated hyperparameter optimization to efficiently guide the search. The optimal blend of such multilayer features combined with a simple linear classifier achieves excellent performance on several image saliency benchmarks. Our models outperform the state of the art on MIT1003, on which features and classifiers are learned. Without additional training, these models generalize well to two other image saliency data sets, Toronto and NUSEF, despite their different image content. Finally, our algorithm scores best of all the 23 models evaluated to date on the MIT300 saliency challenge [16], which uses a hidden test set to facilitate an unbiased comparison.