
## Black-box optimization of noisy functions with unknown smoothness

### Citations

813 | Finite-time analysis of the multiarmed bandit problem
- Auer, Cesa-Bianchi, et al.
Citation Context ...bounded noise with E[εt | xt] = 0. After n evaluations, the algorithm outputs its best guess x(n), which can be different from xn. The performance measure we want to minimize is the value of the function at the returned point compared to the optimum, also referred to as the simple regret, Rn := sup_{x∈X} f(x) − f(x(n)). We assume there exists at least one point x⋆ ∈ X such that f(x⋆) = sup_{x∈X} f(x). The relationship with bandit settings motivated UCT [10, 8], an empirically successful heuristic that hierarchically partitions the domain X and selects the next point xt ∈ X using upper confidence bounds [1]. The empirical success of UCT on the one side, and the absence of performance guarantees for it on the other, incited research on similar but theoretically founded algorithms [4, 9, 12, 2, 6]. As the global optimization of an unknown function without any assumptions at all would be a daunting needle-in-a-haystack problem, most of the algorithms make at least the very weak assumption that the function does not decrease faster than a known rate around one of its global optima. In other words, they assume a certain local smoothn... |
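The context above defines the simple regret Rn = sup_{x∈X} f(x) − f(x(n)) for noisy evaluations rt = f(xt) + εt. As a minimal sketch of that measure (not the paper's algorithm), the toy objective, noise level, and uniform random search below are illustrative choices:

```python
import random

def simple_regret_demo():
    """Illustrates the simple-regret measure R_n = sup_x f(x) - f(x(n))
    using plain random search on a toy function (illustrative only)."""
    random.seed(0)
    f = lambda x: 1.0 - (x - 0.3) ** 2       # toy objective with maximum f(0.3) = 1
    f_star = 1.0                             # sup of f over X = [0, 1], known here

    n = 200
    best_x, best_r = None, float("-inf")
    for _ in range(n):
        x = random.random()                  # x_t chosen uniformly in X
        # noisy evaluation r_t = f(x_t) + eps_t, bounded zero-mean noise
        r = f(x) + random.uniform(-0.05, 0.05)
        if r > best_r:
            best_r, best_x = r, x
    # the returned point x(n) need not equal the last evaluated point x_n
    return f_star - f(best_x)                # simple regret R_n >= 0

print(simple_regret_demo())
```

Note that the returned guess is the point with the best *noisy* observation, so with unlucky noise x(n) can differ from the true best point sampled.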

442 | Bandit based Monte-Carlo planning
- Kocsis, Szepesvári
- 2006

71 | Bandit algorithms for tree search.
- Coquelin, Munos
- 2007

28 | X-armed Bandits.
- Bubeck, Munos, et al.
- 2011
Citation Context ...is the value of the function at the returned point compared to the optimum, also referred to as the simple regret, Rn := sup_{x∈X} f(x) − f(x(n)). We assume there exists at least one point x⋆ ∈ X such that f(x⋆) = sup_{x∈X} f(x). The relationship with bandit settings motivated UCT [10, 8], an empirically successful heuristic that hierarchically partitions the domain X and selects the next point xt ∈ X using upper confidence bounds [1]. The empirical success of UCT on the one side, and the absence of performance guarantees for it on the other, incited research on similar but theoretically founded algorithms [4, 9, 12, 2, 6]. As the global optimization of an unknown function without any assumptions at all would be a daunting needle-in-a-haystack problem, most of the algorithms make at least the very weak assumption that the function does not decrease faster than a known rate around one of its global optima. In other words, they assume a certain local smoothness property of f. This smoothness is often expressed in the form of a semi-metric ℓ that quantifies this regularity [4]. Naturally, this regularity also influences the guarantees that t... |

27 | Pure exploration in finitely-armed and continuous-armed bandits.
- Bubeck, Munos, et al.
- 2011
Citation Context ...regret bound for POO, whereas HOO ensures a bound on both the cumulative and the simple regret. Notice that since POO runs several HOOs with non-optimal values of the (ν, ρ) parameters, this algorithm explores much more than an optimally fitted HOO, which dramatically impacts the cumulative regret. As a consequence, our result applies to the simple regret only. (Footnotes: several values of those parameters are possible for the same function; up to a logarithmic term √(ln n) in the simple regret; in fact, the bound on the simple regret is a direct consequence of the bound on the cumulative regret [3].) [Figure 2: Regret of POO and HOO run for different values of ρ — simple regret against the number of evaluations, shown on linear and log scales, for HOO with ρ = 0.0, 0.3, 0.66, 0.9 and for POO.] (Section 4: Experiments) We ran experiments on the function plotted in Figure 1 for HOO algorithms with different values of ρ and the POO algorithm with ρmax = 0.9. This function, as described in Section 1, has an upper and lower e... |

20 | Optimistic Optimization of Deterministic Functions without the Knowledge of its Smoothness.
- Munos
- 2011
Citation Context ...Figure 1: Difficult function f : x ↦ s(log2 |x − 0.5|) · (√|x − 0.5| − (x − 0.5)²) − √|x − 0.5|, where s(x) = 1 if the fractional part of x, that is, x − ⌊x⌋, is in [0, 0.5], and s(x) = 0 if it is in (0.5, 1). Left: oscillation between two envelopes of different smoothness, leading to a nonzero d for a standard partitioning. Right: regret of HOO after 5000 evaluations for different values of ρ. Another direction has been followed by Munos [11], where in the deterministic case (the function evaluations are not perturbed by noise), their SOO algorithm performs almost as well as the best known algorithms without the knowledge of the function smoothness. SOO was later extended to StoSOO [15] for the stochastic case. However, StoSOO only extends SOO to a limited class of easy instances of functions for which there exists a semi-metric under which d = 0. Also, Bull [6] provided a similar regret bound for the ATB algorithm for a class of functions, called zooming continuous functions, which is related to the class of functions for which th... |
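The "difficult function" in the Figure 1 caption above is fully specified, so it can be reproduced directly. A small sketch, with the value at the singular optimum x⋆ = 0.5 taken as the limit f(x⋆) = 0 (an assumption made here for numerical convenience):

```python
import math

def s(x):
    """Indicator from the Figure 1 caption: 1 if the fractional part of x
    lies in [0, 0.5], else 0."""
    frac = x - math.floor(x)
    return 1.0 if frac <= 0.5 else 0.0

def f(x):
    """The 'difficult function' of Figure 1 on [0, 1]: it oscillates between
    a sqrt upper envelope and a quadratic lower envelope around x* = 0.5."""
    d = abs(x - 0.5)
    if d == 0.0:
        return 0.0                  # limit value at the optimum (assumed here)
    return s(math.log2(d)) * (math.sqrt(d) - d ** 2) - math.sqrt(d)

print(f(0.75))
```

When s(·) = 1 the value sits on the smooth envelope −d², and when s(·) = 0 it drops to the rougher envelope −√d, which is what produces the two envelopes of different smoothness described in the caption.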

16 | Multi-armed Bandit Problems in Metric Spaces.
- Kleinberg, Slivkins, et al.
- 2008

14 | From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning. Foundations and Trends
- Munos
- 2014

13 | Stochastic Simultaneous Optimistic Optimization.
- Valko, Carpentier, et al.
- 2013
Citation Context ...− √|x − 0.5|, where s(x) = 1 if the fractional part of x, that is, x − ⌊x⌋, is in [0, 0.5], and s(x) = 0 if it is in (0.5, 1). Left: oscillation between two envelopes of different smoothness, leading to a nonzero d for a standard partitioning. Right: regret of HOO after 5000 evaluations for different values of ρ. Another direction has been followed by Munos [11], where in the deterministic case (the function evaluations are not perturbed by noise), their SOO algorithm performs almost as well as the best known algorithms without the knowledge of the function smoothness. SOO was later extended to StoSOO [15] for the stochastic case. However, StoSOO only extends SOO to a limited class of easy instances of functions for which there exists a semi-metric under which d = 0. Also, Bull [6] provided a similar regret bound for the ATB algorithm for a class of functions, called zooming continuous functions, which is related to the class of functions for which there exists a semi-metric under which the near-optimality dimension is d = 0. But none of the prior work considers a more general class of functions where there is no semi-metric adapted to the standard partitioning for which d = 0. To give an exampl... |

12 | Lipschitz Bandits without the Lipschitz Constant. In Algorithmic Learning Theory,
- Bubeck, Stoltz, et al.
- 2011
Citation Context ...is and previous algorithms, such as HOO [4], TaxonomyZoom [14], or HCT [2], can be shown to scale with this new notion of d. Most of the prior bandit-based algorithms proposed for function optimization, in either the deterministic or the stochastic setting, assume that the smoothness of the optimized function is known. This is the case for the known semi-metric [4, 2] and pseudo-metric [9]. This assumption limits the applicability of these algorithms and opened a very compelling question of whether this knowledge is necessary. Prior work responded with algorithms not requiring this knowledge. Bubeck et al. [5] provided an algorithm for the optimization of Lipschitz functions without the knowledge of the Lipschitz constant. However, they have to assume that f is twice differentiable and that a bound on the second-order derivative is known. Combes and Proutiere [7] treat unimodal f restricted to dimension one. Slivkins [14] considered a general optimization problem embedded in a taxonomy and provided guarantees as a function of the quality of the taxonomy. The quality refers to the probability of reaching two cells belonging to the same branch that can have values that differ by more than half of the diamet... |

12 | Multi-armed Bandits on Implicit Metric Spaces.
- Slivkins
- 2011
Citation Context ...standard partitioning as one where each cell is split into regular, same-sized subcells [13]. An important insight, detailed in Section 2, is that a near-optimality dimension d that is independent of the partitioning used by an algorithm (as defined in prior work [4, 9, 2]) does not embody the optimization difficulty perfectly. This is easy to see, as for any f we could define a partitioning perfectly suited to f. An example is a partitioning that at the root splits X into {x⋆} and X \ {x⋆}, which makes the optimization trivial, whatever d is. This insight was already observed by Slivkins [14] and Bull [6], whose zooming dimension depends both on the function and the partitioning. In this paper, we define a notion of near-optimality dimension d which measures the complexity of the optimization problem directly in terms of the partitioning used by an algorithm. First, we make the following local smoothness assumption about the function, expressed in terms of the partitioning and not any metric: for a given partitioning P, we assume that there exist ν > 0 and ρ ∈ (0, 1) such that for all h ≥ 0 and all x ∈ P_{h,i⋆_h}, f(x) ≥ f(x⋆) − νρ^h, where (h, i⋆_h) is the (unique) cell of depth h containing x⋆. The... |
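The local smoothness assumption in the context above, f(x) ≥ f(x⋆) − νρ^h on the depth-h cell containing x⋆, can be checked numerically on a toy example. In this sketch the function f(x) = 1 − |x − x⋆| and the parameters ν = 1, ρ = 1/2 for the standard dyadic partitioning of [0, 1] are assumed choices, not values from the paper:

```python
# Toy check of the assumption f(x) >= f(x*) - nu * rho^h on the cell
# P_{h, i*_h} containing x*, under the standard dyadic partitioning of [0, 1].
x_star = 1.0 / 3.0
f = lambda x: 1.0 - abs(x - x_star)      # 1-Lipschitz, maximum at x_star
nu, rho = 1.0, 0.5                       # a depth-h cell has width rho^h

for h in range(20):
    i_star = int(x_star * 2 ** h)        # index of the cell containing x_star
    lo, hi = i_star * rho ** h, (i_star + 1) * rho ** h
    # f decreases linearly away from x_star, so its worst value over the cell
    # is attained at the endpoint farthest from x_star
    worst = min(f(lo), f(hi))
    assert worst >= f(x_star) - nu * rho ** h
print("assumption holds at all checked depths")
```

The bound holds here because every point of the depth-h cell containing x⋆ is within ρ^h of x⋆, so a Lipschitz f can lose at most νρ^h on that cell.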

5 | Online Stochastic Optimization under Correlated Bandit Feedback.
- Azar, Lazaric, et al.
- 2014

3 | Adaptive-treed bandits.
- Bull
- 2015

2 | Bandits Attack Function Optimization.
- Preux, Munos, et al.
- 2014
Citation Context ...an algorithm that uses a given hierarchical partitioning of the space X as its input. The kind of hierarchical partitioning {P_{h,i}} we consider is similar to the ones introduced in prior work: for any depth h ≥ 0 in the tree representation, the cells {P_{h,i}}_{1≤i≤I_h} form a partition of X, where I_h is the number of cells at depth h. At depth 0, the root of the tree, there is a single cell P_{0,1} = X. A cell P_{h,i} of depth h is split into several children subcells {P_{h+1,j}}_j of depth h + 1. We refer to the standard partitioning as one where each cell is split into regular, same-sized subcells [13]. An important insight, detailed in Section 2, is that a near-optimality dimension d that is independent of the partitioning used by an algorithm (as defined in prior work [4, 9, 2]) does not embody the optimization difficulty perfectly. This is easy to see, as for any f we could define a partitioning perfectly suited to f. An example is a partitioning that at the root splits X into {x⋆} and X \ {x⋆}, which makes the optimization trivial, whatever d is. This insight was already observed by Slivkins [14] and Bull [6], whose zooming dimension depends both on the function and the partitioning... |
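The hierarchical partitioning described in the context above can be sketched concretely. This is a minimal illustration of the *standard* partitioning of X = [0, 1): the branching factor K = 2 and the 0-based cell indices are assumptions of the sketch (the paper writes the root as P_{0,1}):

```python
# Standard partitioning of X = [0, 1): cell (h, i) is an interval of depth h,
# split into K regular, same-sized children at depth h + 1. K = 2 and the
# 0-based indexing are illustrative choices, not taken from the paper.
K = 2

def cell(h, i):
    """Return the interval [lo, hi) of the i-th cell at depth h, 0 <= i < K**h."""
    width = 1.0 / K ** h
    return i * width, (i + 1) * width

def children(h, i):
    """Indices (depth, index) of the K subcells of cell (h, i) at depth h + 1."""
    return [(h + 1, K * i + j) for j in range(K)]

# depth 0 holds the single root cell covering all of X
assert cell(0, 0) == (0.0, 1.0)
# the children of a cell exactly tile their parent interval
lo, hi = cell(2, 1)
kids = [cell(h, i) for h, i in children(2, 1)]
assert kids[0][0] == lo and kids[-1][1] == hi
```

With this layout, a cell at depth h has width K^(−h), which is the "regular same-sized subcells" property the context refers to.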

1 | Unimodal Bandits without Smoothness. ArXiv e-prints: http://arxiv.org/abs/1406.7447
- Combes, Proutiere
- 2015
Citation Context ...assume that the smoothness of the optimized function is known. This is the case for the known semi-metric [4, 2] and pseudo-metric [9]. This assumption limits the applicability of these algorithms and opened a very compelling question of whether this knowledge is necessary. Prior work responded with algorithms not requiring this knowledge. Bubeck et al. [5] provided an algorithm for the optimization of Lipschitz functions without the knowledge of the Lipschitz constant. However, they have to assume that f is twice differentiable and that a bound on the second-order derivative is known. Combes and Proutiere [7] treat unimodal f restricted to dimension one. Slivkins [14] considered a general optimization problem embedded in a taxonomy and provided guarantees as a function of the quality of the taxonomy. The quality refers to the probability of reaching two cells belonging to the same branch that can have values that differ by more than half of the diameter (expressed by the true metric) of the branch. The problem is that the algorithm needs a lower bound on this quality (which can be tiny), and the performance depends inversely on this quantity. It also assumes that the quality is strictly positive. ... |