88 citations found.
J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.

This paper is cited in the following contexts:

Multiplicative Updates for Nonnegative Quadratic Programming.. - Sha, Saul, Lee (2002)   (5 citations)  (Correct)

....17] Gradient-based methods are the simplest possible approach, but their convergence depends on careful selection of the learning rate, as well as constant attention to the nonnegativity constraints, which may not be naturally enforced. Multiplicative updates based on exponentiated gradients (EG) [5, 10] have been investigated as an alternative to traditional gradient-based methods. Multiplicative updates are naturally suited to sparse nonnegative optimizations, but EG updates, like their additive counterparts, suffer the drawback of having to choose a learning rate. Subset selection methods ....

....as in eq. 2). We will refer to the learning algorithm for hard margin SVMs based on these updates as Multiplicative Margin Maximization (M^3). It is worth comparing the properties of these updates to those of other approaches. Like multiplicative updates based on exponentiated gradients (EG) [5, 10], the M^3 updates are well suited to sparse nonnegative optimizations; unlike EG updates, however, they do not involve a learning rate, and they come with a guarantee of monotonic improvement. Like the updates for Sequential Minimal Optimization (SMO) [15], the M^3 updates have a simple ....
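
The contrast these excerpts draw between additive gradient steps and multiplicative EG steps can be sketched as follows. This is a minimal illustration, not code from any of the citing papers: it assumes squared loss, a fixed learning rate `eta`, and weights normalized to sum to one, and the function names `gd_update` and `eg_update` are hypothetical.

```python
import math


def gd_update(w, x, y, eta):
    """Additive (gradient descent) step for the squared loss (y_hat - y)^2."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    # Subtract the learning rate times the gradient of the loss w.r.t. w.
    return [wi - 2.0 * eta * (y_hat - y) * xi for wi, xi in zip(w, x)]


def eg_update(w, x, y, eta):
    """Multiplicative (exponentiated gradient) step with renormalization."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    # Scale each weight by the exponential of the negative gradient component.
    unnorm = [wi * math.exp(-2.0 * eta * (y_hat - y) * xi) for wi, xi in zip(w, x)]
    z = sum(unnorm)
    # Renormalize so the weights stay positive and sum to one.
    return [u / z for u in unnorm]


# Assumed toy example: uniform start, only the first feature is active.
w0 = [0.25, 0.25, 0.25, 0.25]
x, y = [1.0, 0.0, 0.0, 0.0], 1.0
w_gd = gd_update(w0, x, y, eta=0.1)
w_eg = eg_update(w0, x, y, eta=0.1)
```

In this toy case both steps raise the weight on the only active feature. The EG step keeps every weight positive and the total equal to one, while the GD step leaves the weights of inactive features unchanged. Both still depend on the choice of `eta`, which is the drawback the excerpt above points out.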

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997.


Choosing feature sets for training and testing.. - Ahmad, Vrusias, Ledford (2001)   (1 citation)  (Correct)

....text categorisation. The first group, which includes David Lewis and colleagues, is using supervised learning algorithms for training neural networks, particularly Widrow-Hoff's error correction learning paradigm and its recent variant, the exponentiated gradient method, due to Kivinen and Warmuth [6]. Lewis [7] used the TREC data set, containing Associated Press (AP) newswire texts and a collection of medical abstracts, to compare Widrow-Hoff and exponentiated gradient methods with a conventional information retrieval technique (the Rocchio learning algorithm). Lewis has shown that both the ....

Kivinen, J & Warmuth, MK. Exponentiated gradient versus gradient descent for linear predictors. Technical Report No. UCSC-CRL-94-16, 1994. Santa Cruz, Baskin Center for Computer Engineering and Information Sciences.


Regret Bounds for Prediction Problems - Gordon (1999)   (2 citations)  (Correct)

.... theorems will unify results from classical statistics (inference in exponential families and generalized linear models) with those from computational learning theory (weighted majority, aggregating algorithm, exponentiated gradient). This regret bound framework has been studied before in [LW92, KW97, KW96, Vov90, CBFH 95] among others. Also, some of our results are similar to results from classical statistics such as the Cramér-Rao variance bound [SO91]. Our theorems are more general than each of these previous results in at least one of the following ways. First, they apply to more ....

....[Figure 4: Some examples of link functions.] Some examples of GGD algorithms are ordinary gradient descent, the perceptron learning rule, and the Exponentiated Gradient algorithm of [KW97]. We will examine some of these algorithms in more detail below. But first, we will prove regret bounds for a class of algorithms that includes GGD. 6 General regret bounds. 6.1 Preliminaries. In many common MAP algorithms, each individual loss function can be written as a Bregman divergence. ....

[Article contains additional citation context not shown here]

Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997. Preliminary version appeared as tech report UCSC-CRL-94-16; extended abstract appeared in 27th STOC.


Analysis of Two Gradient-based Algorithms for On-line Regression - Dsi (1999)   (Correct)

....trials, the difference between its cumulative loss (i.e. the sum of the losses incurred in each trial) and the corresponding cumulative loss of a reference predictor, whose predictions are kept hidden from the master. Using this sequential prediction model, we will show (extending results from [3, 10, 12]) that a well-known algorithm for linear regression, Gradient Descent, and a recently proposed variant, Exponentiated Gradient, have a reasonably good performance for a wide range of loss functions even when the regression problem is highly nonlinear and the data are generated with no statistical ....

....range of loss functions even when the regression problem is highly nonlinear and the data are generated with no statistical assumption. As a further motivation for the study of this prediction model, we point out the fact that any good sequential prediction algorithm can be efficiently transformed [2, 12, 15] into an algorithm that performs well in the more traditional statistical (or batch) frameworks, like those studied in [5, 9]. We use the sequential prediction model to analyze two types of on-line regression problems. In the linear regression problem the master algorithm predicts, in each trial ....

[Article contains additional citation context not shown here]

J. Kivinen and M.K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997.


The Exponentiated Subgradient Algorithm for Heuristic.. - Schuurmans, Southey.. (2001)   (12 citations)  (Correct)

....from being easily generalizable beyond SAT. 3 SDF uses a different penalty for its primal search. Multiplicative versus additive updates: The SDF procedure updates multiplicatively rather than additively, in an analogy to the work on multiplicative updates in machine learning theory [Kivinen and Warmuth, 1997] . A multiplicative update is naturally interpreted as following an exponentiated version of the subgradient; that is, instead of using the traditional additive update L=1mV e one uses L=L given the vector of penalized violation values e . Below we compare additive and multiplicative ....

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Infor. Comput., 132:1--63, 1997.


Evaluating Topic-Driven Web Crawlers - Menczer, Pant, Srinivasan, Ruiz (2001)   (11 citations)  (Correct)

....newly crawled pages. A positively classified page is viewed as a good page for the topic. In this sense this measure may be viewed as similar to precision where content-based relevance is decided by the classifier. We use Widrow-Hoff (WH), Exponentiated Gradient (EG), and Rocchio classifiers [23, 12, 11] with feature selection using Correlation Coefficient [16] to select the best 50 features for each topic. The optimal threshold is set by maximizing the F1 score [22] on the training set. Due to limited space we refer the reader to [14, 25] for details on the classifiers. It may be observed that ....

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Technical Report UCSC-CRL-94-16, Baskin Center for Computer Engineering & Information Sciences; University of California, Santa Cruz, CA, 1994.


Using Machine Learning To Improve Information Access - Sahami (1999)   (15 citations)  (Correct)

....representations seems to provide the best performance in their experiments. On a related note, Lewis et al. [111] compare a number of learning methods for linear text classifiers including Rocchio's algorithm, the Widrow-Hoff (WH) update rule [170] and the exponentiated gradient (EG) algorithm [89]. They find that both WH and EG yield consistently superior results to Rocchio. Furthermore, they point out that since EG seems to drive many of the linear discriminant coefficients to zero, effectively reducing the number of features used in making ....

Kivinen, J., and Warmuth, M. Exponentiated gradient versus gradient descent for linear predictors. Tech. Rep. UCSC-CRL-94-16, Baskin Center for Computer Engineering and Information Sciences; University of California, Santa Cruz, 1994.


The Exponentiated Subgradient Algorithm for Heuristic.. - Schuurmans, Southey.. (2001)   (12 citations)  (Correct)

....optimal nature of the dual updates, the hinge penalty appears to retain an advantage over the linear penalty. Multiplicative versus additive updates: The SDF procedure updates y multiplicatively rather than additively, in an analogy to the work on multiplicative updates in machine learning theory [Kivinen and Warmuth, 1997]. A multiplicative update is naturally interpreted as following an exponentiated version of the subgradient; that is, instead of using the traditional additive update $y' = y + \alpha\,\theta(v)$ one uses $y' = y\,\alpha^{\theta(v)}$, $\alpha > 1$, given the vector of penalized violation values $\theta(v)$. Below we compare both ....

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Infor. Comput., 132:1--63, 1997.


Convergence rates of the Voting Gibbs classifier, with.. - Ng, Jordan   (Correct)

....1. Plot of error vs. total number of features for an optimal classifier that knows which feature is relevant to a classification decision (solid) and Voting Gibbs algorithms using different priors (dash and dash-dot). Details are provided in Section 5. setting (Ng, 1998; Littlestone, 1988; Kivinen and Warmuth, 1994). That this result is not merely of theoretical interest is demonstrated by the empirical results shown in Figure 1. These results, which are described in more detail in Section 5, show classification error rates in an experiment in which one feature is relevant to a classification decision. The ....

....selection, and since it is only logarithmic in f, the total number of features, it means that Bayesian feature selection using the particular prior described earlier is very insensitive to the presence of irrelevant features. This result also recovers the best known such rates (Littlestone, 1988; Kivinen & Warmuth, 1994; Ng, 1998) and has sample complexity that beats that of the common wrapper model (Kohavi & John, 1997) feature selection algorithm (see the analysis in Ng, 1998). Indeed, the logarithmic dependence suggests that we can, for instance, square the total number of features, and need only twice ....

Kivinen, J., & Warmuth, M. K. (1994). Exponentiated gradient versus gradient descent for linear predictors (Technical Report UCSC-CRL-94-16). Univ. of California Santa Cruz, Computer Research Laboratory.


Learned Text Categorization By Backpropagation Neural Network - Yin, SAVIO   (Correct)

....model. Lewis et al. [33] studied the Adaline (see section 2.5.3) as the classifier model for text categorization. Three different training methods were compared, namely the Rocchio algorithm [50], the Widrow-Hoff algorithm or the delta rule [61], and Kivinen and Warmuth's EG algorithm [24], which is an extension to the delta rule. Batch training was used for the Rocchio algorithm to update the weights of the Adaline by taking into account the whole training set at once, while online training was used for the other two algorithms to perform weight update by running through the ....

J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors." Technical Report UCSC-CRL-94-16, Baskin Center for Computer Engineering and Information Sciences, University of California, Santa Cruz, 1994.


Journal of Machine Learning Research 7 (2006) 551--585.. - Koby Crammer Crammer   Self-citation (Warmuth)   (Correct)

No context found.

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.


Merl A Mitsubishi Electric Research Laboratory - Http Www Merl (1996)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. In ACM Symp. on the Theory of Computing, 1995.


Algorithmic Analysis And Implementation Of A Novel Natural - Gradient Adaptive Filter (2003)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors," Information and Computation, vol. 132, no. 1, pp. 1--64, Jan. 1997.


Exploiting Sparsity in Adaptive Filters - Martin, Sethares, Williamson.. (2002)   (1 citation)  (Correct)

No context found.

J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors," Inform. Comput., vol. 132, no. 1, pp. 1--64, Jan. 1997.


Style Mining of Electronic Messages for Multiple Authorship .. - Argamon, Saric, al. (2003)   (1 citation)  (Correct)

No context found.

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997.


Combining Machine Learning and Hierarchical Structures for Text.. - Ruiz (2001)   (1 citation)  (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Technical Report UCSC-CRL-94-16, Baskin Center for Computer Engineering & Information Sciences; University of California, Santa Cruz, CA, 1994.


Biologically Inspired Modular Neural Networks - Azam (2000)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997.


Smooth epsilon-Insensitive Regression by Loss Symmetrization - Dekel, Shalev-Shwartz.. (2003)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.


Multiplicative Updatings for Support-Vector Learning - Cristianini, Campbell.. (1998)   (Correct)

No context found.

J. Kivinen and M. Warmuth, Exponentiated Gradient Versus Gradient Descent for Linear Predictors, Journal of Information and Computation, vol. 132, no. 1, pp. 1-64, 1997


Online Independent Component Analysis with Local.. - Schraudolph.. (2000)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors", Tech. Rep. UCSC-CRL-94-16, University of California, Santa Cruz, June 1994.


Large Margin Methods for Structured.. - Bartlett..   (Correct)

No context found.

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--63, 1997.


Evaluating Topic-Driven Web Crawlers - Menczer, Pant, Srinivasan, Ruiz (2001)   (11 citations)  (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Technical Report UCSC-CRL-94-16, Baskin Center for Computer Engineering & Information Sciences; University of California, Santa Cruz, CA, 1994.


Online Convex Programming and Generalized Infinitesimal Gradient .. - Zinkevich (2003)   (6 citations)  (Correct)

No context found.

J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132:1-64, 1997.


Online Passive-Aggressive Algorithms - Crammer, Dekel, Shalev-Shwartz.. (2003)   (Correct)

No context found.

J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.


Unsupervised Neural Learning On Lie Group - Fiori (2002)   (Correct)

No context found.

J. Kivinen and M. Warmuth 1997, "Exponentiated gradient versus gradient descent for linear predictors," Information and Computation 132, 1--64.


CiteSeer.IST - Copyright Penn State and NEC