Nicolo Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427-485, 1997.
....(Adaptive Replacement Cache) we study how to best combine disjoint lists of pages in order to construct an adaptive paging algorithm that has a miss rate lower than LRU. This paper contains our preliminary results. The approach we use takes advantage of the Expert Framework from online learning [7, 5, 8, 4, 3]. It is also an interesting application of Path Kernels [11], which are used to keep track of exponentially many combinations of the disjoint lists implicitly, by maintaining only polynomially many weights. We introduce the concept of rollover [6] in order to better track a dynamically changing ....
....on the observed sequence of requests. Unfortunately, adaptive is seldom clearly defined, and little theoretical or experimental evidence is ever given that policies are good at adapting. Even more recently, there has been a push by the online learning community to apply the Expert Framework [7, 5, 8, 4, 3] to systems problems. Specifically, there has been a push to use the framework to help develop more adaptive page replacement policies, and to better quantify what it means for a policy to be theoretically or experimentally [2, 1, 6] adaptive. Such approaches involve combining the criteria of ....
Nicolo Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427-485, 1997.
....decision sequence resembling that of the greedy algorithm, and thus will be vulnerable to adversarial attacks. On the other hand, values of $\beta$ which are close to 1 will cause the algorithm to respond slowly to changes. The optimal value of $\beta$ can be determined using the best expert framework of [33, 15]. See Sec. B.3. Final policy: distribution over outgoing edges. Our final policy is determined by each node making a probabilistic decision on an outgoing edge, selecting the edge which has the best chance of reaching the receiver. Our policy will discriminate packets based on the packet's hop ....
....sampling the less desirable edges it is able to respond quickly to changes in the adversarial fault pattern. 6 Existing Work. Existing Algorithmic Work: The only algorithmic results that possibly work under this strong adversarial model are based on a computational learning framework [33, 15]. Near-optimal learning algorithms, with reliable global information for finding a shortest path in a graph, where at each time a different known cost is assigned to each edge, were studied in [33, 15]; these solutions have an exponential computational overhead. Polynomial computational overhead ....
[Article contains additional citation context not shown here]
Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. pages 382-391, 1993. To appear, Journal of the Association for Computing Machinery.
....of a weighting over exponentially many different subtrees [3, 15, 8]. Weighted model mixtures are also widely used in constructing algorithms with online guarantees. In particular, the weighted majority algorithm and its variants can be proved to compete well with the best expert [11, 4, 6, 5]. The weighting used in online weighted majority algorithms is analogous to a Bayesian posterior distribution. In spite of the common use of posterior distributions, however, existing online algorithms do not provide analogous structural risk minimization guarantees; they are not guaranteed ....
Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. JACM, 44(3):427--485, 1997.
....apply to the more general statement in [22]. Weighted model mixtures are also widely used in constructing algorithms with online guarantees. In particular, the weighted majority algorithm and its variants can be proved to compete well with the best expert on an arbitrary sequence of labeled data [13, 6, 8, 7]. The posterior weighting used in most online algorithms is a Gibbs posterior $Q_\beta$ as defined in the statement of theorem 2. One difference between these online guarantees and theorem 1 is that for these algorithms one must know the appropriate value of $\beta$ before seeing the training data. ....
Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. JACM, 44(3):427--485, 1997.
....and Warmuth [40] also construct some semi-online algorithms; such algorithms must be a priori given a good upper estimate $K$ on the loss of the best regression function. For such algorithms, they derive bounds of the type $L_T \le L_T^* + O(\sqrt{K})$. Using the usual doubling trick (Cesa-Bianchi et al. [10]) it is possible to obtain bounds of the type $L_T \le L_T^* + O(\sqrt{L_T^*})$ in the pure online setting (see Cesa-Bianchi et al. [12], Theorem IV.4). In our example of a true regression function corrupted by Gaussian noise, the difference $L_T - L_T^*$ will be bounded by a linear function of $\sqrt{T}$, ....
.... weight performs very well, $\eta$ should be big (we should learn quickly, even if badly). Sometimes we will want to make $\eta$ infinitesimal, without actually setting $\eta = 0$ (e.g., in the case of the absolute loss function, where $c(\eta) \to 1$ as $\eta \to 0$); some way of doing this is described in Cesa-Bianchi et al. [10]. Cesa-Bianchi et al. [11] discuss using a prior distribution in the set of possible values of $\eta$ (it would be natural to take an approximation to the universal prior). In the case of perfectly mixable games the most natural thing to do is to take the largest $\eta$ such that $c(\eta) = 1$ (in other words, ....
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P Helmbold, Robert E Schapire, and Manfred K Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44:427--485, 1997.
....to the Bayesian algorithm but uses a slightly different formula for computing the posterior distribution. This formula is the exponential weights formula introduced by Littlestone and Warmuth in the context of the weighted majority algorithm [6] and further analyzed by Cesa-Bianchi et al. [4]. The analysis of the algorithm consists of two parts. First, we consider, for each instance x, the log of the ratio of the total weight between those hypotheses that predict 1 on x and those hypotheses that predict $-1$. We denote this ratio by (x). We prove that (x) is rather ....
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44(3):427--485, May 1997.
....suggested by Bayesian analysis but uses a slightly different formula for computing the posterior distribution. This formula is the exponential weights formula introduced by Littlestone and Warmuth in the context of the weighted majority algorithm [8] and further analyzed by Cesa-Bianchi et al. [5]. Note however that we are generating a fixed classification rule and are therefore working in the standard batch learning model and not in the online learning model. The analysis of the algorithm consists of two parts. First, we consider, for each instance x, the log of the ratio of the total ....
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44(3):427--485, May 1997.
....procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s [3] results on predicting using expert advice (where we view each pruning as an expert) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine [2, 1] and Willems, Shtarkov and Tjalkens [18, ....
....making a mistake (i.e. a prediction differing from the outcome $y_t$). The learner computes its predictions using predictions t P that are generated in a natural way by each pruning P of the given unpruned tree T. We first show how an algorithm developed and analyzed by Cesa-Bianchi et al. [3] can be applied immediately to obtain a learning algorithm whose loss is bounded by a function that, for any pruning P, is linear in the prediction loss of P and the size of P (roughly, the number of nodes in the pruning). Their algorithm is closely related to work by Vovk [14] and Littlestone ....
[Article contains additional citation context not shown here]
Nicolò Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. In Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, pages 382--391, 1993.
....be suggested by Bayesian analysis but uses a slightly different formula for computing the posterior distribution. This formula is the exponential weights formula introduced by Littlestone and Warmuth in the context of the weighted majority algorithm [8] and further analyzed by Cesa-Bianchi et al. [5]. Note however that we are generating a fixed classification rule and are therefore working in the standard batch learning model and not in the online learning model. The analysis of the algorithm consists of two parts. First, we consider, for each instance x, the log of the ratio of the total ....
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44(3):427--485, May 1997.