P. L. Bartlett. Learning with a slowly changing distribution. In Proceedings of the 5th Annual Workshop on Computational Learning Theory, pages 243-252. ACM Press, 1992.
.... to learn the target exactly [46, 12, 60, 97, 13] and in which the goal is to learn an approximation to the target in a probabilistic sense [87, 86]. Other useful variants not discussed here are those in which the distribution and target are permitted to change a little between observations, as in [23, 53, 54], models of weak learning in which the learner only has to do slightly better than random guessing [48, 90, 55], and variants in which the learning algorithm has access to the predictions of experts [36]. It is hoped that the reader has gained a flavour of this subject. There are ....
P. L. Bartlett. Learning with a slowly changing distribution. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 243--252. ACM Press, New York, NY, 1992.
....a learning algorithm must produce a hypothesis that will be evaluated with respect to some distribution $D \in \mathcal{D}$, based on examples it receives from a distribution $D'$ chosen by an adversary with the restriction that $d(D, D') \le \beta$, where $d(\cdot, \cdot)$ is defined as in Definition 10. Bartlett [2] studies learning when the sequence of examples comes from a sequence of distributions such that the shift between consecutive distributions in the sequence is bounded. Distribution shift differs from this model in that all of the training examples come from the same distribution. We next show the ....
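Bartlett's model, as summarized here, constrains only the shift between consecutive distributions. The sketch below illustrates that constraint on a finite domain. It assumes total variation, in the form $\sup_E |D(E) - D'(E)|$, as the distance (this need not coincide with the $d$ of Definition 10), and the linear-interpolation drift and all function names are hypothetical.

```python
from fractions import Fraction

def tv_distance(p, q):
    """sup_E |p(E) - q(E)|, i.e. half the L1 distance, for finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

def drifting_sequence(start, end, steps):
    """Hypothetical drift: D_t = (1 - t/steps) * start + (t/steps) * end, for t = 0..steps."""
    support = set(start) | set(end)
    seq = []
    for t in range(steps + 1):
        w = Fraction(t, steps)
        seq.append({x: (1 - w) * start.get(x, 0) + w * end.get(x, 0) for x in support})
    return seq

def max_consecutive_shift(seq):
    """Largest distance between D_t and D_{t+1}; the drift model bounds this quantity."""
    return max(tv_distance(a, b) for a, b in zip(seq, seq[1:]))

# Drift from a point mass at 0 to a point mass at 1 in ten equal steps.
seq = drifting_sequence({0: Fraction(1)}, {1: Fraction(1)}, steps=10)
assert max_consecutive_shift(seq) == Fraction(1, 10)
```

Interpolating in `steps` equal increments gives consecutive shifts of exactly $d_{TV}(\text{start}, \text{end})/\text{steps}$, so the environment can change completely over the whole sequence even though each individual step is small.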
P. Bartlett. Learning with a slowly changing distribution. In Proceedings of COLT '92, pages 243--252. Morgan Kaufmann, 1992.
....in our approach are tentative, since they always rely upon incomplete data sets. The capability of performing revisions of once-established version boundaries therefore constitutes an integral part of our concept drift model. This contrasts with the assumptions underlying the experiments in [5] and [6], where the complete training sets are given once and for all and learning proceeds under closed data sets. In accordance with previous approaches, we maintain the most recent concept description, which specifies the currently valid standards in the domain. Additionally, we need to ....
P. L. Bartlett, 'Learning with a slowly changing distribution', COLT'92: Proc. 5th Workshop on Computational Learning Theory, 243-252, 1992.
.... to learn the target exactly [49, 13, 65, 108, 14] and in which the goal is to learn an approximation to the target in a probabilistic sense [97, 96]. Other useful variants not discussed here are those in which the distribution and target are permitted to change a little between observations, as in [25, 58, 59], models of weak learning in which the learner only has to do slightly better than random guessing [51, 100, 60], and variants in which the learning algorithm has access to the predictions of experts [38]. Something which has not been discussed in any detail here is the use of real output neural ....
P. L. Bartlett. Learning with a slowly changing distribution. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 243--252. ACM Press, New York, NY, 1992.
....the PAC model [Val84] is that all the examples are drawn from the same distribution, and that the target function does not change with time. The drawbacks of this assumption have been widely recognized, and a considerable amount of work has been devoted to studying the cases where either the distribution [Bar92, BL96] or the target function [HL94, BBDK96] changes over time. Clearly, without constraints on the way the distribution or target function changes over time, it is hopeless to achieve any meaningful learning result. The most common and natural assumption is that the changes are not drastic. A formal way to say this is that the distance between two consecutive distributions (target functions) is bounded by some parameter. This approach was the main subject of previous research [HL94, Bar92, BL96, BBDK96], and has produced interesting learning results. Common to the results in [HL94, Bar92, BL96] is the assumption that the rate of drift is sufficiently small that the same hypothesis is good for a sufficiently long period of time. Based on this assumption, the learning method of choice is to ....
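Why a single hypothesis stays good for a while under bounded drift can be made precise in a couple of lines. This sketch assumes, for illustration, that each $D_t$ is a distribution on $X \times \{0,1\}$, that the distance is total variation in the form $d_{TV}(D, D') = \sup_E |D(E) - D'(E)|$, that consecutive distributions satisfy $d_{TV}(D_t, D_{t+1}) \le \Delta$, and that $\mathrm{er}_t(h) = D_t\{(x, y) : h(x) \ne y\}$; the symbols $\Delta$ and $\mathrm{er}_t$ are introduced here and are not taken from the cited papers.

$$
\bigl|\mathrm{er}_{t+k}(h) - \mathrm{er}_t(h)\bigr|
\le d_{TV}(D_t, D_{t+k})
\le \sum_{j=0}^{k-1} d_{TV}(D_{t+j}, D_{t+j+1})
\le k\Delta .
$$

The first inequality holds because $\mathrm{er}_t(h)$ is the $D_t$-probability of a fixed event, and the second is the triangle inequality. So a hypothesis with error at most $\epsilon$ at time $t$ has error at most $\epsilon + k\Delta$ over the next $k$ steps, which remains small as long as $k\Delta$ is small compared with $\epsilon$.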
[Article contains additional citation context not shown here]
P. L. Bartlett. Learning with a slowly changing distribution. In Proceedings of the Workshop on Computational Learning Theory, pages 243--252. Morgan Kaufmann Publishers, 1992.
....models for classification which we do not consider here. These include models which assume that there exists a target function belonging to a particular class of functions [41, 66], relax the assumption of independently generated examples [1, 7], allow for drift in the generating distribution [4, 8, 33], and relax the assumption that there is a fixed relationship between measurements and class labels [6, 12, 39]. Typically, the measurement space $X$ is taken to be some subset of $\mathbb{R}^d$. For the sake of simplicity, we will almost exclusively deal with the problem of binary classification with $Y =$ ....
P. L. Bartlett. Learning with a slowly changing distribution. In Proceedings of the 5th Annual Workshop on Computational Learning Theory, pages 243-252. ACM Press, 1992.
....$\bigl|p(x - y + n\alpha) - p\bigl(x - Q_\alpha(y) + n\alpha\bigr)\bigr|\,dx \le \int_{-\alpha/2}^{\alpha/2} \sum_{n=-\infty}^{\infty} \sup_{z \in (-\alpha/2,\, \alpha/2)} \bigl|p(x - y + n\alpha) - p(x - y + n\alpha + z)\bigr|\,dx \le \alpha v(\sigma)$. We will use the following lemma. The proof is by induction, and is implicit in the proof of Lemma 12 in [8]. Lemma 7. If $P_i$ and $Q_i$ ($i = 1, \ldots, m$) are distributions on a set $Y$, and $\phi$ is a $[0,1]$-valued random variable defined on $Y^m$, then $\left|\int_{Y^m} \phi\,dP - \int_{Y^m} \phi\,dQ\right| \le \frac{1}{2}\sum_{i=1}^{m} d_{TV}(P_i, Q_i)$, where $P = \prod_{i=1}^{m} P_i$ and $Q = \prod_{i=1}^{m} Q_i$ are ....
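For a finite set $Y$, the expectation form of Lemma 7 can be checked exactly by enumerating $Y^m$. The sketch below is only a numerical illustration, not a proof; it assumes that $d_{TV}$ is the $L_1$ distance $\sum_y |P_i(y) - Q_i(y)|$ (the convention under which the factor $\frac{1}{2}$ is correct), and the helper names and example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product

def l1_distance(p, q):
    """d_TV(p, q) taken as sum_y |p(y) - q(y)|; p and q must share the same keys."""
    return sum(abs(p[y] - q[y]) for y in p)

def product_expectation(dists, phi):
    """Exact E[phi(y_1, ..., y_m)] when the y_i are independent with y_i ~ dists[i]."""
    total = Fraction(0)
    for ys in product(*(d.keys() for d in dists)):
        weight = Fraction(1)
        for d, y in zip(dists, ys):
            weight *= d[y]
        total += weight * phi(ys)
    return total

def lemma7_check(ps, qs, phi):
    """Return (lhs, rhs) of Lemma 7 for a [0,1]-valued phi on Y^m."""
    lhs = abs(product_expectation(ps, phi) - product_expectation(qs, phi))
    rhs = Fraction(1, 2) * sum(l1_distance(p, q) for p, q in zip(ps, qs))
    return lhs, rhs

# Hypothetical example: Y = {0, 1}, m = 3, phi = majority indicator.
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Q = {0: Fraction(3, 5), 1: Fraction(2, 5)}
majority = lambda ys: 1 if sum(ys) >= 2 else 0
lhs, rhs = lemma7_check([P] * 3, [Q] * 3, majority)
assert lhs <= rhs  # lhs = 37/250, rhs = 3/10
```

In this example the bound is not tight; it is attained only when a single set separates every $P_i$ from $Q_i$ and $\phi$ is its indicator on one coordinate.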
P. L. Bartlett, Learning with a slowly changing distribution, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, New York, 1992.
....$p\bigl(x - Q_\alpha(y)\bigr)\,dx\Bigr| \le \int_{-\alpha/2}^{\alpha/2} \sum_{n=-\infty}^{\infty} \bigl|p(x - y + n\alpha) - p\bigl(x - Q_\alpha(y) + n\alpha\bigr)\bigr|\,dx \le \alpha v(\sigma)$. We will use the following lemma. The proof is by induction, and is implicit in the proof of Lemma 12 in [6]. Lemma 7. If $P_i$ and $Q_i$ are distributions on a set $Y$ ($i = 1, \ldots, m$) and $E$ is a measurable subset of $Y^m$, then $\left|\bigl(\prod_{i=1}^{m} P_i\bigr)(E) - \bigl(\prod_{i=1}^{m} Q_i\bigr)(E)\right| \le \frac{1}{2}\sum_{i=1}^{m} d_{TV}(P_i, Q_i)$. Proof (of Lemma 5). We will describe a randomized ....
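One standard way to carry out the induction for the event form of Lemma 7 is a hybrid (telescoping) argument. It is sketched here under the assumption that $d_{TV}$ denotes the $L_1$ distance, so that $\sup_A |P_k(A) - Q_k(A)| = \frac{1}{2} d_{TV}(P_k, Q_k)$; it is not necessarily the argument used in [6], and the hybrid measures $R_k$, the measures $\mu_k$, and the sections $E_z$ are introduced only for illustration. Let $R_k = P_1 \times \cdots \times P_k \times Q_{k+1} \times \cdots \times Q_m$, so that $R_m = \prod_i P_i$ and $R_0 = \prod_i Q_i$. Since $R_k$ and $R_{k-1}$ differ only in the $k$-th coordinate, Fubini's theorem gives $R_k(E) - R_{k-1}(E) = \int \bigl(P_k(E_z) - Q_k(E_z)\bigr)\,d\mu_k(z)$, where $\mu_k$ is the product of the other $m - 1$ factors and $E_z$ is the section of $E$ at $z$. Hence

$$
\Bigl|\Bigl(\prod_{i=1}^{m} P_i\Bigr)(E) - \Bigl(\prod_{i=1}^{m} Q_i\Bigr)(E)\Bigr|
\le \sum_{k=1}^{m} \bigl|R_k(E) - R_{k-1}(E)\bigr|
\le \sum_{k=1}^{m} \sup_{A \subseteq Y} \bigl|P_k(A) - Q_k(A)\bigr|
= \frac{1}{2} \sum_{k=1}^{m} d_{TV}(P_k, Q_k).
$$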
P. L. Bartlett, Learning with a slowly changing distribution, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, 1992.
....to uniform if, for all $k \in \mathbb{N}$ and all $x \in X_n^{\mathbb{N}}$, $P_{k|x}$ is defined and $d_{TV}(P_{k|x}, U_n) \le \gamma$. The following lemma shows that if a process is close to uniform, the expectation of a bounded function is close to its expectation under the uniform distribution. The proof is by induction, using Lemma 3 in [2]. In Lemma 3.2 we will apply this result to the mistake indicator function $M^t_{A,f}$. Lemma 3.1. If $m \in \mathbb{N}$, $\phi : X_n^m \to [0,1]$, $0 \le \gamma \le 1$, and $P$ is a stochastic process on $X_n$ that is $\gamma$-close to uniform, then $\left|\mathbf{E}_{x \in P}\,\phi(\mathrm{sub}_{1,m}(x)) - \mathbf{E}_{x \in U_n^m}\,\phi(x)\right| \le m\gamma/2$, where ....
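Lemma 3.1 can be checked exactly on a small example. The sketch assumes a two-symbol alphabet as a stand-in for $X_n$, takes $d_{TV}$ to be the $L_1$ distance (consistent with the factor $1/2$ in the bound), and uses a hypothetical "sticky" process in which each symbol repeats the previous one with probability $1/2 + \delta$; under these assumptions the process is $2\delta$-close to uniform. All names are illustrative and not from the cited paper.

```python
from fractions import Fraction
from itertools import product

SYMBOLS = (0, 1)  # hypothetical stand-in for the finite set X_n

def uniform():
    return {s: Fraction(1, len(SYMBOLS)) for s in SYMBOLS}

def sticky_kernel(history, delta):
    """P_{k|x} for a hypothetical process: x_1 is uniform, then each symbol
    repeats the previous one with probability 1/2 + delta (two symbols only)."""
    if not history:
        return uniform()
    last = history[-1]
    return {s: (Fraction(1, 2) + delta) if s == last else (Fraction(1, 2) - delta)
            for s in SYMBOLS}

def closeness(kernel, horizon):
    """Max L1 distance from P_{k|x} to uniform over all histories of length < horizon."""
    u = uniform()
    return max(sum(abs(kernel(h)[s] - u[s]) for s in SYMBOLS)
               for k in range(horizon) for h in product(SYMBOLS, repeat=k))

def process_expectation(kernel, phi, m):
    """Exact E_{x~P} phi(x_1, ..., x_m), multiplying conditionals along each prefix."""
    total = Fraction(0)
    for xs in product(SYMBOLS, repeat=m):
        prob = Fraction(1)
        for k in range(m):
            prob *= kernel(xs[:k])[xs[k]]
        total += prob * phi(xs)
    return total

def uniform_expectation(phi, m):
    """Exact E_{x~U_n^m} phi(x)."""
    seqs = list(product(SYMBOLS, repeat=m))
    return sum(Fraction(phi(xs)) for xs in seqs) / len(seqs)

delta, m = Fraction(1, 10), 3
kernel = lambda h: sticky_kernel(h, delta)
gamma = closeness(kernel, horizon=m)  # equals 2 * delta = 1/5
all_equal = lambda xs: 1 if len(set(xs)) == 1 else 0
gap = abs(process_expectation(kernel, all_equal, m) - uniform_expectation(all_equal, m))
assert gap <= m * gamma / 2  # gap = 11/100, bound = 3/10
```

The definition of $\gamma$-closeness quantifies over all $k$, while `closeness` inspects only finitely many histories; that suffices here because this kernel depends only on the last symbol.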
P. L. Bartlett. Learning with a slowly changing distribution. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 243--252. ACM Press, New York, NY, 1992.
....on $X_n$: $P_{k|x}(b) = \Pr_{y \in P}(y_{k+1} = b \mid y_i = x_i,\ i = 1, 2, \ldots, k)$. The stochastic process $P$ is said to be $\gamma$-close to uniform if, for all $k \in \mathbb{N}$ and all $x \in X_n^{\mathbb{N}}$, $P_{k|x}$ is defined and $d_{TV}(P_{k|x}, U_n) \le \gamma$. We will use the following lemma; it is proved inductively using Lemma 3 in [2]. Lemma 8. If $m \in \mathbb{N}$, $\phi : X_n^m \to [0,1]$, $0 \le \gamma \le 1$, and $P$ is a stochastic process on $X_n$ that is $\gamma$-close to uniform, then $\left|\mathbf{E}_{x \in P}\,\phi(\mathrm{sub}_{1,m}(x)) - \mathbf{E}_{x \in U_n}\,\phi(\mathrm{sub}_{1,m}(x))\right| \le m\gamma$, where $\mathrm{sub}_{1,m}(x) = (x_1, \ldots, x_m)$. Lemma 9. Let $P_{n,\gamma}$ be the class of stochastic processes on $X_n$ ....
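The conditional distributions $P_{k|x}$ in this definition can be computed directly when the process is given, up to some finite length, by a joint distribution over sequences. The sketch below is a hypothetical helper. It assumes a finite alphabet, a joint distribution stored as a dict from equal-length tuples to exact probabilities, and $d_{TV}$ taken as the $L_1$ distance. A prefix of probability zero leaves $P_{k|x}$ undefined, which the helper reports as `None`.

```python
from collections import defaultdict
from fractions import Fraction

def conditional_next(joint, prefix):
    """P_{k|x}(b) = Pr(y_{k+1} = b | y_1..y_k = prefix), from a joint distribution
    over sequences of one fixed length L > len(prefix); None if Pr(prefix) = 0."""
    k = len(prefix)
    mass = defaultdict(Fraction)
    for seq, p in joint.items():
        if seq[:k] == tuple(prefix):
            mass[seq[k]] += p
    total = sum(mass.values())
    if total == 0:
        return None
    return {b: w / total for b, w in mass.items()}

def gamma_closeness(joint, symbols):
    """Largest L1 distance from uniform over every defined P_{k|x} with k < L."""
    length = len(next(iter(joint)))
    u = Fraction(1, len(symbols))
    worst = Fraction(0)
    prefixes = {seq[:k] for seq in joint for k in range(length)}
    for prefix in prefixes:
        cond = conditional_next(joint, prefix)
        if cond is not None:
            worst = max(worst, sum(abs(cond.get(b, Fraction(0)) - u) for b in symbols))
    return worst

# Hypothetical length-2 process on {0, 1} that repeats its first symbol with probability 3/5.
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 5),
         (1, 0): Fraction(1, 5), (1, 1): Fraction(3, 10)}
assert conditional_next(joint, (0,)) == {0: Fraction(3, 5), 1: Fraction(2, 5)}
assert gamma_closeness(joint, (0, 1)) == Fraction(1, 5)
```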
P. L. Bartlett. Learning with a slowly changing distribution. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 243--252. ACM Press, New York, NY, 1992.
....version of this bound. The final result of the paper, when converted to the setting of the earlier work described above, shows that with this weaker constraint, the allowable drift rate decreases by no more than log factors: $\epsilon^2/d$ versus $\epsilon^2/(d \log^2(d/\epsilon))$. Several authors [1, 2, 3, 11] have considered learning problems in which a changing environment is modelled by a slowly changing distribution on the product space $X \times \{0, 1\}$. The allowable drift is restricted by ensuring that consecutive probability distributions are close in total variation distance. Clearly, allowing a ....
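Closeness in total variation on $X \times \{0, 1\}$ directly limits how much the error of any fixed hypothesis can change, because that error is the probability of the event $\{(x, y) : h(x) \ne y\}$. The sketch below illustrates this on a hypothetical three-point domain in which part of the mass at one point has its label flipped between consecutive distributions. It assumes total variation in the form $\sup_E |D(E) - D'(E)|$, and the distributions, the hypothesis `h`, and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def tv(d1, d2):
    """sup_E |D1(E) - D2(E)| (half the L1 distance) for finite distributions on X x {0, 1}."""
    keys = set(d1) | set(d2)
    return sum(abs(d1.get(z, 0) - d2.get(z, 0)) for z in keys) / 2

def error(d, h):
    """Probability under d of the event h(x) != y."""
    return sum((p for (x, y), p in d.items() if h(x) != y), Fraction(0))

third = Fraction(1, 3)
D_t = {(0, 0): third, (1, 1): third, (2, 1): third}
# Next distribution: 1/30 of the mass at x = 2 now carries label 0.
D_next = {(0, 0): third, (1, 1): third, (2, 1): Fraction(3, 10), (2, 0): Fraction(1, 30)}
h = lambda x: 1 if x >= 1 else 0

shift = abs(error(D_next, h) - error(D_t, h))
assert shift <= tv(D_t, D_next)  # here both equal 1/30
```

In this example the bound holds with equality, since the only mass that moves lies inside the error event of `h`.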
P. L. Bartlett. Learning with a slowly changing distribution. In Proceedings of the 1992 Workshop on Computational Learning Theory, pages 243--252, 1992.
CiteSeer.IST - Copyright Penn State and NEC