74 citations found.
N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz, 1989.


This paper is cited in the following contexts:


Researching System Administration - Anderson

....it works in practice. In general, the additional work described in section 6.6 on monitoring is a good direction for future work. The most interesting direction for work on CARD is automatic derivation of dependencies. The idea here is to use either machine learning [AL88, Kea93, BHL91, KL88, Lit89] or association rule mining [AIS93, AS94] techniques to automatically determine dependencies. This approach requires having some monitored values that indicate if a system is up or down. Then, if we can show that any time component 1 is down, component 2 is also down, but not the reverse, then it ....

Nick Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, U. C. Santa Cruz, March 1989.


Linear Hinge Loss and Average Margin - Gentile, Warmuth (1998)   (10 citations)

....how far w is from w_t. The divergence function has two purposes. It motivates the update and it becomes the potential function in the amortized analysis used to prove loss bounds for the corresponding algorithm. The use of an amortized analysis in the context of learning essentially goes back to [Lit89], and the method for deriving updates based on the divergence was introduced in [KW97]. The divergence may be seen as a regularization term and may also serve as a barrier function in the optimization problem (1) for the purpose of keeping the weights in a particular region. The additive ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz, 1989.


Combinatorial Variability of Vapnik-Chervonenkis Classes.. - Ben-David, Litman

....aspects of the combinatorial richness of classes, are invariant under the above notions of embedding. The following claim lists some examples of such parameters that arise in the context of computational learning theory. We refer the reader to [BEHW89] for a definition of PAC learnability and to [L89] for a definition of online learning. Claim 2: 1. C gemb C 0 implies VC dim(C) VC dim(C 0 ) 2. C gemb C 0 implies that, for any type of compression scheme discussed below, if C 0 has a compression scheme of some size, d, then so does C. 3. C gemb C 0 implies that, for ....

N. Littlestone, "Mistake bounds and logarithmic linear-threshold learning algorithms" PhD thesis, U.C. Santa Cruz, March 1989.


Tracking the Best Linear Predictor - Herbster (2001)   (5 citations)

....and the static bounds can be converted to shifting bounds. Keywords: on-line learning, amortized analysis, shifting, switching, Bregman divergence, projection. 1. Introduction. Consider the following by now standard on-line learning model which is a generalization of a model introduced by Littlestone (1989; 1988). (Footnote: The authors were supported by the NSF grants CCR 9700201 and CCR 9821087. Mark Herbster was also supported by ESPRC grant GR M15972. An extended abstract appeared in (Herbster and Warmuth, 1998b). © Mark Herbster and Manfred K. Warmuth.) Learning proceeds in ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.


Avrim Blum - School Of Computer

....In addition, we would like to discourage the use of learning methods that involve creating a list of all possible descriptive terms and then deleting the ones deemed unnecessary. In the standard theoretical models for learning Boolean functions from examples (Valiant, 1984; Kearns et al., 1987; Littlestone, 1989), we imagine there are n attributes that our algorithm is aware of and we are trying to learn some Boolean function f of these attributes. An example is some x ∈ {0, 1}^n with the interpretation that the ith component of x is 1 if x has the ith attribute. The learning algorithm gains information ....

Littlestone, N. (1989). Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, U. C. Santa Cruz.


A Dynamic Disk Spin-Down Technique for Mobile Computing - Sherrod (1997)

....learning The share algorithm is a member of the multiplicative weight algorithmic family that has been developed by the computational learning theory community. This family has a long history and excellent performance for a wide variety of on-line problems [CBFH 94, KW93, LW94, HKW94, Lit88, Lit89, Vov90]. Algorithms in this family receive as input a set of experts, other algorithms that make predictions. On each trial, each expert makes a prediction. The goal of the algorithm is to combine the predictions of the experts in a way that minimizes the total error, or loss, over the sequence ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. Ph.D. dissertation, University of California Santa Cruz, 1989.


Adaptive and Self-Confident On-Line Learning Algorithms - Auer, Cesa-Bianchi, Gentile (2000)   (6 citations)

....data. In an on-line setting this information is typically not available. Tuning is one of the most critical aspects of an on-line learning algorithm and might affect performance in a substantial way. In this introductory section we begin by using the (randomized) Weighted Majority algorithm of [24, 26, 29, 31, 7] as a motivating example to illustrate the tuning problem we are interested in. We then introduce our tuning techniques and compare them to those already available. In later sections we will apply our techniques to the much more general class of quasi-additive algorithms [16, 22] ....

....We call a sequence S = (x_1, y_1), (x_2, y_2), ... of instances and labels processed by the algorithm in a run a trial sequence. To analyze these algorithms, we adopt a well-established mathematical model which is a generalization of a learning model introduced by Littlestone and Warmuth [23, 24, 26] and Angluin [1]. We are given a comparison class of predictors and a loss function L. Broadly speaking, the goal of A is to learn on the fly the best off-line predictor in the comparison class for the whole .... (Footnote 1: The learning rate in those papers is actually a covariance matrix.)

[Article contains additional citation context not shown here]

Littlestone, N. (1989), Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms, PhD thesis, TR UCSC-CRL-89-11, University of California Santa Cruz.


The Relaxed Online Maximum Margin Algorithm - Li, Long (2000)   (17 citations)

....is the same as the previous algorithm, except that an update is made after any trial in which y t ( w t x t ) 1, not just after mistakes. 2.2. Upper bound on the number of mistakes made Now we prove a bound on the number of mistakes made by ROMMA. As in previous mistake bound proofs (e.g. [26]) we will show that mistakes result in an increase in a measure of progress , and then appeal to a bound on the total possible progress. Our proof will use the squared length of w t as its measure of progress. We begin with a couple of properties of ROMMA. Our analysis can proceed without ....

Littlestone, N.: 1989b, `Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms'. Ph.D. thesis, UC Santa Cruz.


The Curse of Dimensionality and the Perceptron Algorithm - Kivinen, Warmuth (1995)

....given by the algorithm's current weight vector w_t and threshold θ_t. The algorithm then receives a binary outcome and may update its weight vector and threshold to w_{t+1} and θ_{t+1}. If the outcome differs from the prediction, we say that the algorithm made a mistake. Following Littlestone [Lit89, Lit88], our goal is to minimize the total number of mistakes that the learning algorithm makes for certain sequences of trials. The standard on-line algorithm for learning with linear threshold functions is the simple Perceptron algorithm of Rosenblatt [Ros58]. An alternate algorithm called Winnow was ....

....the total number of mistakes that the learning algorithm makes for certain sequences of trials. The standard on-line algorithm for learning with linear threshold functions is the simple Perceptron algorithm of Rosenblatt [Ros58]. An alternate algorithm called Winnow was introduced by Littlestone [Lit89, Lit88]. To see how the algorithms work, consider a binary vector x_t ∈ {0, 1}^N as an instance, and assume that the algorithm predicted 0 while the outcome was 1. Then both algorithms increment those weights w_{t,i} for which the corresponding input x_{t,i} was 1, but do not change the weights w_{t,i} ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.
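
The update described in the excerpt above (after predicting 0 when the outcome was 1, both algorithms raise only the weights of inputs that were 1) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the formulation from the cited thesis: the function names, the learning rate eta, the promotion factor alpha, and the choice to keep the threshold fixed are all hypothetical.

```python
def predict(weights, x, threshold):
    """Linear-threshold prediction: 1 if w . x >= threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


def perceptron_update(weights, x, outcome, prediction, eta=1.0):
    """Additive, mistake-driven update (eta is an assumed learning rate)."""
    if outcome == prediction:
        return list(weights)  # no change on a correct prediction
    sign = 1.0 if outcome == 1 else -1.0
    # Inputs with x_i = 0 contribute nothing, so their weights stay unchanged.
    return [w + sign * eta * xi for w, xi in zip(weights, x)]


def winnow_update(weights, x, outcome, prediction, alpha=2.0):
    """Multiplicative, mistake-driven update (alpha is an assumed factor)."""
    if outcome == prediction:
        return list(weights)  # no change on a correct prediction
    factor = alpha if outcome == 1 else 1.0 / alpha
    # Only weights of active inputs (x_i = 1) are scaled.
    return [w * factor if xi == 1 else w for w, xi in zip(weights, x)]
```

In both functions the weight of an inactive input is left alone; the difference is that the Perceptron-style rule adds a constant while the Winnow-style rule multiplies by a constant.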


The Perceptron algorithm vs. Winnow: linear vs. logarithmic.. - Kivinen, Warmuth (1997)

....the algorithm produces its prediction ŷ_t using its current hypothesis. The algorithm then receives a binary outcome y_t and may update its weight vector and threshold to w_{t+1} and θ_{t+1}. If the outcome differs from the prediction, we say that the algorithm made a mistake. Following Littlestone [7, 6], our goal is to minimize the total number of mistakes that the learning algorithm makes for certain sequences of trials. The standard on-line algorithm for learning with linear classifiers is the simple Perceptron algorithm of Rosenblatt [12]. An alternate algorithm called Winnow was introduced ....

....to minimize the total number of mistakes that the learning algorithm makes for certain sequences of trials. The standard on-line algorithm for learning with linear classifiers is the simple Perceptron algorithm of Rosenblatt [12]. An alternate algorithm called Winnow was introduced by Littlestone [7, 6]. To see how the algorithms work, consider a binary vector x_t ∈ {0, 1}^N as an instance, and assume that the algorithm predicted 0 while the outcome was 1. Then both algorithms increment those weights w_{t,i} for which the corresponding input x_{t,i} was 1. These weights are called ....

[Article contains additional citation context not shown here]

N. Littlestone, Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms, PhD thesis, Report UCSC-CRL-89-11, University of California, Santa Cruz (1989).


Worst-case Quadratic Loss Bounds for Prediction Using.. - Cesa-Bianchi, Long.. (1996)

....smoothing, inner product spaces, computational learning theory, on-line learning, linear systems, worst-case loss bounds. 1 Introduction. In this paper we analyze algorithms in the on-line prediction model. This model was introduced by Angluin [Ang88] and Littlestone [Lit88, Lit89]. Unlike other settings, where the predictor's goal is to estimate a set of parameters in a nearly optimal way with respect to some criterion, the goal in this model is to generate predictions, in a sequential fashion, so as to minimize the total (sum) loss over the whole sequence of examples. ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.


Entropy Estimation - Bercher, Vignat (1996)   (78 citations)

.... approaches where statistical assumptions about the distribution of the instances and the dependence of the outcomes on the instances are used in order to derive probabilistic loss bounds for the prediction algorithm [WS85, Hay91]. The research reported in this paper was inspired by Littlestone [Lit89b, Lit88], who proved worst-case bounds for the case when the comparison class consists of Boolean monomials, or more generally linear threshold functions. In this case it was assumed that the components of the instances, as well as the predictions and the outcomes, were Boolean, and the total loss ....

....of distance from the weight vector w_t of the algorithm to a comparison vector u at each update, it is possible to prove the kind of worst-case loss bounds we consider here. This use of a distance measure for obtaining worst-case loss bounds was pioneered by Littlestone's analysis of Winnow [Lit89b], which also employs a variant of the relative entropy. Amari's [Ama94, Ama95] approach in using the relative entropy for deriving neural network learning algorithms is similar to the first use we have here for the distance measure. The distance term in the minimized function is ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.


The Relaxed Online Maximum Margin Algorithm - Li, Long (2000)   (17 citations)

....same as the previous algorithm, except that an update is made after any trial in which y t ( w t Delta x t ) 1, not just after mistakes. 2.2 Upper bound on the number of mistakes made Now we prove a bound on the number of mistakes made by ROMMA. As in previous mistake bound proofs (e.g. [8]) we will show that mistakes result in an increase in a measure of progress , and then appeal to a bound on the total possible progress. Our proof will use the squared length of w t as its measure of progress. First we will need the following lemmas. Lemma 1 On any run of ROMMA on linearly ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, UC Santa Cruz, 1989.


Direct and Indirect Algorithms for On-line Learning of.. - Helmbold, Panizza (1999)   (1 citation)

....indicating when the update function should be applied. The update policies considered in this paper are: 1) update after each trial, and 2) only update after trials where the algorithm makes an incorrect prediction. Algorithms with the latter policy are called mistake-driven (or conservative) [Lit89, Lit95]. When learning monotone disjunctions, some algorithms keep one weight per disjunction (i.e. a total of n k weights). We call such algorithms direct algorithms since the weights directly encode the confidence in or likelihood of each individual disjunction. There are other algorithms that ....

....indirect algorithms since they indirectly encode their confidences in the disjunctions using O(1) weights per attribute. Surprisingly, these more efficient algorithms learn disjunctions almost as well as the direct algorithms. The first such indirect algorithm was Littlestone's Winnow algorithm [Lit88, Lit89]. In this paper we are primarily interested in a performance criteria that makes no probabilistic assumptions about how the data is generated. On the contrary, the examples can be chosen by an adversary and the goal is to make relatively few mistakes compared to the number of mistakes made by the ....

[Article contains additional citation context not shown here]

Littlestone, N.: Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz (1989)


Exponentiated Gradient Versus Gradient Descent for Linear.. - Kivinen, Warmuth (1994)   (78 citations)

.... approaches where statistical assumptions about the distribution of the instances and the dependence of the outcomes on the instances are used in order to derive probabilistic loss bounds for the prediction algorithm [WS85, Hay91]. The research reported in this paper was inspired by Littlestone [Lit89b, Lit88], who proved worst-case bounds for the case when the comparison class consists of Boolean monomials, or more generally linear threshold functions. In this case it was assumed that the components of the instances, as well as the predictions and the outcomes, were Boolean, and the total loss ....

....of distance from the weight vector w_t of the algorithm to a comparison vector u at each update, it is possible to prove the kind of worst-case loss bounds we consider here. This use of a distance measure for obtaining worst-case loss bounds was pioneered by Littlestone's analysis of Winnow [Lit89b], which also employs a variant of the relative entropy. Amari's [Ama94] approach in using the relative entropy for deriving neural network learning algorithms is similar to the first use we have here for the distance measure. The distance term in the minimized function U is also somewhat ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California, Santa Cruz, 1989.


Learning Linear Threshold Functions in the Presence of.. - Bylander (1994)   (10 citations)

....is in essence a statistical query algorithm because it uses all the examples for each update. There has been much work on learning linear threshold functions for other kinds of noise and under different optimization criteria [4]. Perhaps of most relevance to this paper is previous work by [9, 10, 2]. [9, 10] show how the Winnow algorithm is affected by a fixed number of classification and attribute errors; in essence, there is a limit on how much the mistake bound can increase with each error. [10] shows that if a subset of the attributes are conditionally independent, then the Winnow ....

....essence a statistical query algorithm because it uses all the examples for each update. There has been much work on learning linear threshold functions for other kinds of noise and under different optimization criteria [4]. Perhaps of most relevance to this paper is previous work by [9, 10, 2]. [9, 10] show how the Winnow algorithm is affected by a fixed number of classification and attribute errors; in essence, there is a limit on how much the mistake bound can increase with each error. [10] shows that if a subset of the attributes are conditionally independent, then the Winnow algorithm ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Univ. of Calif., Santa Cruz, California, 1989.


Exponentiated Gradient Versus Gradient Descent for Linear.. - Kivinen, Warmuth (1995)   (78 citations)

.... approaches where statistical assumptions about the distribution of the instances and the dependence of the outcomes on the instances are used in order to derive probabilistic loss bounds for the prediction algorithm [WS85, Hay91]. The research reported in this paper was inspired by Littlestone [Lit89b, Lit88], who proved worst-case bounds for the case when the comparison class consists of Boolean monomials, or more generally linear threshold functions. In this case it was assumed that the components of the instances, as well as the predictions and the outcomes, were Boolean, and the total loss ....

....of distance from the weight vector w_t of the algorithm to a comparison vector u at each update, it is possible to prove the kind of worst-case loss bounds we consider here. This use of a distance measure for obtaining worst-case loss bounds was pioneered by Littlestone's analysis of Winnow [Lit89b], which also employs a variant of the relative entropy. Amari's [Ama94, Ama95] approach in using the relative entropy for deriving neural network learning algorithms is similar to the first use we have here for the distance measure. The distance term in the minimized function is also somewhat ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.


On-line Prediction and Conversion Strategies - Cesa-Bianchi, Freund.. (1994)   (4 citations)

....mistakes made by the BW algorithm independent of . There are two versions of the Halving algorithm: one that discards all inconsistent experts in each trial and one that does this only in trials when the Halving algorithm makes a mistake (such algorithms are called conservative by Littlestone [15]). Both versions of the Halving algorithm have the same worst-case mistake bound (log N), so nothing is lost by making the Version Space algorithm conservative. The Binomial Weighting algorithm is the implementation of the conservative Version Space algorithm with binomial weights and is ....

....incorrect. This is achieved by resetting the state of A after each trial in which A predicts correctly to the state of A before the trial. This conversion does not increase the worst-case number of mistakes on the subsequence-closed set Σ. The converted algorithm is called conservative (see [15]). For the rest of this section we shall always assume that the set of sequences is subsequence closed and that the prediction algorithm is conservative. Algorithm A is allowed to perform arbitrarily badly if given an instance-outcome sequence that is not in Σ. For example, if Σ = X ....

[Article contains additional citation context not shown here]

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California at Santa Cruz, 1989.
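
The two versions of the Halving algorithm mentioned in the excerpt above can be sketched in Python as follows. This is a minimal illustration, not code from the cited paper: it assumes experts are plain functions returning 0 or 1, that ties in the majority vote are broken toward 1, and that at least one expert in the pool is always correct. All function and parameter names are hypothetical.

```python
def halving_predict(experts, x):
    """Predict with the majority vote of the remaining experts (ties go to 1)."""
    votes = sum(expert(x) for expert in experts)
    return 1 if 2 * votes >= len(experts) else 0


def halving_update(experts, x, outcome, prediction, conservative=False):
    """Discard experts that were wrong on this trial.

    With conservative=True, experts are discarded only on trials where the
    majority prediction itself was a mistake.
    """
    if conservative and prediction == outcome:
        return list(experts)
    return [expert for expert in experts if expert(x) == outcome]


def run_halving(experts, trials, conservative=False):
    """Run over (instance, outcome) pairs; return (mistakes, surviving experts)."""
    mistakes = 0
    for x, outcome in trials:
        prediction = halving_predict(experts, x)
        if prediction != outcome:
            mistakes += 1
        experts = halving_update(experts, x, outcome, prediction, conservative)
    return mistakes, experts
```

Under the stated assumption that some expert never errs, every mistake removes at least half of the surviving pool in either version, which is where the log N bound quoted in the excerpt comes from.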


Computational Sample Complexity and Attribute-Efficient Learning - Servedio (2000)

.... attribute-efficient learning algorithms to real-world problems such as calendar scheduling [3], text categorization [8], and context-sensitive spelling correction [11]. Attribute efficiency has chiefly been studied in the on-line mistake bound model of concept learning which was introduced in [22, 23]. In this model learning proceeds in a series of trials, where in each trial the learner is given an unlabelled boolean example x ∈ {0, 1}^n and must predict the value c(x); after each prediction the learner is told the true value of c(x) and can update its hypothesis. The mistake bound of a ....

N. Littlestone, "Mistake bounds and logarithmic linear-threshold learning algorithms," Ph.D. thesis, Technical Report UCSC-CRL-89-11, Univ. of Calif., Santa Cruz, 1989.


On PAC Learning Using Winnow, Perceptron, and a Perceptron-Like.. - Servedio

....N00014 96 1 0550. [37] was introduced. Even today, learning problems involving linear threshold functions continue to be the subject of intensive research in both the applied and theoretical machine learning communities. The classical Perceptron algorithm [37] and Littlestone's Winnow algorithm [25, 26] are two algorithms for learning linear threshold functions which have been studied extensively in the on-line mistake bound model [6, 15, 24, 27, 30, 32, 42]. In this model the learning algorithm sequentially makes predictions on examples as they are received, using a hypothesis which it can ....

....false negative prediction, for all i set w_i ← α^{x_i} w_i. It should be noted that in this form, Winnow is only capable of expressing positive threshold functions as its hypotheses. This limitation can be easily overcome, however, by using various simple transformations on the input (see [25], [26]). 2.3 PAC LEARNING USING ON-LINE LEARNING ALGORITHMS. In Valiant's PAC learning model [41] the learning algorithm has access to an example oracle EX(c, D) which, in one time step, provides a labelled example ⟨x, c(x)⟩ where x is drawn from the distribution D on the example space X. The ....

N. Littlestone. Mistake Bounds and Logarithmic Linear-Threshold Learning Algorithms. Ph.D. thesis, Technical Report UCSC-CRL-89-11, University of Calif., Santa Cruz, 1989.


Journal of Machine Learning Research 7 (2006) 1205--1230 .. - For Linear..

No context found.

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz, 1989.


Toward Attribute Efficient Learning of Decision Lists and.. - Klivans, Servedio (2006)

No context found.

N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, University of California at Santa Cruz, 1989a.


Journal of Machine Learning Research 7 (2006) 551--585.. - Koby Crammer Crammer

No context found.

N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, U. C. Santa Cruz, March 1989.


Kernel Query By Committee (KQBC) - Ran Gilad-Bachrach Ranb (2003)

No context found.

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz, 1989.


Kernel Query By Committee (KQBC) - Ran Gilad-Bachrach Ranb

No context found.

N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California Santa Cruz, 1989.



CiteSeer.IST - Copyright Penn State and NEC