Bayesian Optimization for More Automatic Machine Learning

(extended abstract for invited talk)

Frank Hutter¹

Bayesian optimization (see, e.g., [2]) is a framework for the optimization of expensive blackbox functions that combines prior assumptions about the shape of a function with evidence gathered by evaluating the function at various points. In this talk, I will briefly describe the basics of Bayesian optimization and how to scale it up to handle structured high-dimensional optimization problems in the sequential model-based algorithm configuration framework SMAC [6].
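To make the basic loop concrete, the following is a minimal sketch of Bayesian optimization on a one-dimensional toy function, using a Gaussian-process surrogate and the expected-improvement acquisition function. It is illustrative only and not SMAC itself: SMAC uses random-forest surrogates and handles structured, conditional spaces; the toy objective and all settings here are assumptions for the sketch.

```python
# Minimal Bayesian-optimization sketch: GP surrogate + expected improvement.
# Illustrative toy example only (SMAC uses random forests and structured spaces).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_blackbox(x):
    # Stand-in for the expensive function (e.g., validation error of a model).
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
bounds = (-3.0, 3.0)

# Initial design: a few random evaluations.
X = rng.uniform(*bounds, size=(3, 1))
y = np.array([expensive_blackbox(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    # Fit the surrogate to all evidence gathered so far.
    gp.fit(X, y)
    # Score a dense grid of candidates by expected improvement over the best value.
    cand = np.linspace(*bounds, 1000).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    imp = best - mu
    z = imp / np.maximum(sigma, 1e-12)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    # Evaluate the expensive function at the most promising candidate.
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_blackbox(x_next[0]))

print("best x:", X[np.argmin(y)].item(), "best value:", y.min())
```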

Then, I will discuss applications of SMAC to two structured high-dimensional optimization problems from the growing field of automatic machine learning:

• Feature selection, learning algorithm selection, and optimization of its hyperparameters are crucial for achieving good performance in practical applications of machine learning. We demonstrate that a combined optimization over all of these choices can be carried out effectively by formulating the problem of finding a good instantiation of the popular WEKA framework as a 768-dimensional optimization problem (a toy version of this combined formulation is sketched after this list). The resulting Auto-WEKA framework [7] allows non-experts with some available compute time to achieve state-of-the-art learning performance at the push of a button.

• Deep learning has celebrated many recent successes, but its performance is known to be very sensitive to architectural choices and hyperparameter settings. Therefore, so far its potential could only be unleashed by deep learning experts. We formulated the combined problem of selecting the right neural network architecture and its associated hyperparameters as an 81-dimensional optimization problem and showed that an automated procedure could find a network whose performance exceeded the previous state-of-the-art achieved by human domain experts using the same building blocks [3]. Computational time remains a challenge, but this result is a step towards deep learning for non-experts.
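The sketch below illustrates, under stated assumptions, the combined algorithm selection and hyperparameter optimization problem that Auto-WEKA formulates: the choice of learner is itself a categorical hyperparameter, and each learner's own hyperparameters are only active when that learner is chosen. For brevity it uses two scikit-learn learners instead of WEKA's, a handful of hyperparameters instead of 768 dimensions, and plain random search as a stand-in for SMAC's model-based search.

```python
# Toy version of combined algorithm selection + hyperparameter optimization:
# a structured space with a top-level categorical choice and conditional
# hyperparameters, optimized here by random search (stand-in for SMAC).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)

def sample_config():
    # The learner choice is a hyperparameter; its children are conditional.
    if rng.random() < 0.5:
        return {"algo": "svm",
                "C": 10 ** rng.uniform(-2, 2),
                "gamma": 10 ** rng.uniform(-4, 0)}
    return {"algo": "rf",
            "n_estimators": int(rng.integers(10, 200)),
            "max_depth": int(rng.integers(2, 20))}

def evaluate(cfg):
    # Cross-validated accuracy is the blackbox objective being optimized.
    if cfg["algo"] == "svm":
        model = SVC(C=cfg["C"], gamma=cfg["gamma"])
    else:
        model = RandomForestClassifier(n_estimators=cfg["n_estimators"],
                                       max_depth=cfg["max_depth"],
                                       random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

best_cfg, best_score = None, -np.inf
for _ in range(20):
    cfg = sample_config()
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score

print(best_cfg, best_score)
```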

To stimulate discussion, I will finish by highlighting several further opportunities for combining meta-learning and Bayesian optimization:

• Prediction of learning curves [3],

• Learning the importance of hyperparameters (and of meta-features) [4, 5], and

• Using meta-features to generalize hyperparameter performance across datasets [1, 8], providing a prior for Bayesian optimization (a minimal version of this idea is sketched below).
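As a rough illustration of the last point, a surrogate trained on (meta-features, hyperparameters) → performance observations from previous datasets can rank candidate configurations for a new dataset and thereby seed Bayesian optimization. This is only a sketch of the general idea; the concrete methods of [1, 8] differ, and the meta-features, data, and model choice below are invented placeholders.

```python
# Sketch: transfer hyperparameter performance across datasets via meta-features.
# All data and meta-feature names are synthetic placeholders for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Historical observations from previous datasets:
# columns = [log n_instances, log n_features, class entropy, log10 C, log10 gamma]
meta_and_hyper = rng.normal(size=(200, 5))
observed_error = rng.random(200)  # validation errors measured on those runs

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(meta_and_hyper, observed_error)

# For a new dataset, compute its meta-features once ...
new_meta = np.array([3.2, 1.7, 0.9])

# ... then rank candidate hyperparameter settings by predicted error to pick
# a warm-start initial design for Bayesian optimization on the new dataset.
candidates = rng.normal(size=(50, 2))
inputs = np.hstack([np.tile(new_meta, (len(candidates), 1)), candidates])
predicted = surrogate.predict(inputs)
initial_design = candidates[np.argsort(predicted)[:5]]
print(initial_design)
```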

Based on joint work with Tobias Domhan, Holger Hoos, Kevin Leyton-Brown, Jost Tobias Springenberg, and Chris Thornton.

¹ University of Freiburg, Germany. Email: fh@cs.uni-freiburg.de.

REFERENCES

[1] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, 'Collaborative hyperparameter tuning', in Proc. of ICML, (2013).

[2] E. Brochu, V. M. Cora, and N. de Freitas, 'A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning', CoRR, abs/1012.2599, (2010).

[3] T. Domhan, T. Springenberg, and F. Hutter, 'Extrapolating learning curves of deep neural networks', in ICML 2014 AutoML Workshop, (June 2014).

[4] F. Hutter, H. Hoos, and K. Leyton-Brown, 'Identifying key algorithm parameters and instance features using forward selection', in Learning and Intelligent Optimization, pp. 364–381, (2013).

[5] F. Hutter, H. Hoos, and K. Leyton-Brown, 'An efficient approach for assessing hyperparameter importance', in International Conference on Machine Learning, (2014).

[6] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Sequential model-based optimization for general algorithm configuration', in Proc. of LION-5, (2011).

[7] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms', in Proc. of KDD'13, (2013).

[8] D. Yogatama and G. Mann, 'Efficient transfer learning method for automatic hyperparameter tuning', in Proc. of AISTATS, (2014).
