COLLINS, M., SCHAPIRE, R., AND SINGER, Y. 2002. Logistic regression, AdaBoost and Bregman distances. Machine Learning 48, 1-3, 253--285.

This paper is cited in the following contexts:
Boosting as a Regularized Path to a Maximum Margin Classifier - Rosset, Zhu, Hastie (2003)   (3 citations)

....and $l_1$ constrained optimization. The two examples we have seen so far have used squared error loss, and we should ask ourselves whether this equivalence stretches beyond this loss. Figure 6 shows a similar result, but this time for the binomial log likelihood loss, $C_l$. [Plot omitted; panel label: Stagewise.] Figure 5: Another example of the equivalence between the Lasso optimal solution path (left) and boosting with ....

....optimization. The two examples we have seen so far have used squared error loss, and we should ask ourselves whether this equivalence stretches beyond this loss. Figure 6 shows a similar result, but this time for the binomial log likelihood loss, $C_l$. [Plot omitted; panel label: Stagewise.] Figure 5: Another example of the equivalence between the Lasso optimal solution path (left) and boosting with squared error loss. Note ....

Collins, M., Schapire, R.E. & Singer, Y. (2000). Logistic regression, AdaBoost and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.


Multiplicative Updates for Classification by Mixture Models - Saul, Lee (2002)

....bound holds for arbitrary distributions, provided they are properly normalized: ij P = 1 for all n. The second term in the conditional log likelihood occurs with a minus sign, so for this term we require an upper bound. The same bounds can be used here as in derivations of iterative scaling [1, 2, 4, 13]. Note that the logarithm function is upper bounded by: $\log z \le z - 1$ for all $z > 0$. We can therefore write: [...] (16). To further bound the right hand side of eq. (16) we make the following observation: though the exponentials ....

M. Collins, R. Schapire, and Y. Singer (2000). Logistic regression, AdaBoost, and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.


Conditional Random Fields: Probabilistic Models for.. - Lafferty, McCallum.. (2001)   (47 citations)

.... 1997). Typically, boosting is applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling have treated each label as a separate classification problem (Abney et al., 1999). However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a forward-backward algorithm to compute efficiently certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators. Another attractive aspect of CRFs ....

....generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is standard in large vocabulary speech recognition (Schwartz & Austin, 1993) and has also been proposed for parsing (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass. Closest to our proposal are gradient descent methods that adjust the parameters of all of the local classifiers to minimize a smooth loss function (e.g. quadratic loss) combining loss terms for each label. If ....

Collins, M., Schapire, R., & Singer, Y. (2000). Logistic regression, AdaBoost, and Bregman distances. Proc. 13th COLT.


Conditional Random Fields: Probabilistic Models for.. - Lafferty, McCallum.. (2001)   (47 citations)

....) $\arg\min \sum_{i=1}^{N} \sum_{y} \frac{e^{h(x^{(i)}, y)}}{e^{h(x^{(i)}, y^{(i)})}} = \arg\min \sum_{i=1}^{N} \frac{1}{p(y^{(i)} \mid x^{(i)})}$. This optimization problem cannot be derived as a sampling likelihood for a statistical model. However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to our task. This requires a forward-backward algorithm to compute efficiently certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators. Another attractive aspect of CRFs is that one can implement ....

....permissive generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global model to rerank those candidates. This is the standard approach in large vocabulary speech recognition (Schwartz & Austin, 1993) and has also been proposed for parsing (Collins, 2000). However, these methods are suboptimal in that they are subject to failing to find the correct output because it was pruned away in the first pass. Closest to our proposal are gradient descent methods that adjust the parameters of all of the local classifiers to minimize a smooth loss function ....

Collins, M., Schapire, R., & Singer, Y. (2000). Logistic regression, AdaBoost, and Bregman distances. Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.


Automatic Photo Pop-up - Hoiem, Efros, Hebert (2005)

No context found.

COLLINS, M., SCHAPIRE, R., AND SINGER, Y. 2002. Logistic regression, AdaBoost and Bregman distances. Machine Learning 48, 1-3, 253--285.


Constituent Parsing by Classification - Turian, Melamed (2005)

No context found.

Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1-3), 253--285.


Discriminative Reranking for Natural Language Parsing - Collins, Koo (2000)   (35 citations)

No context found.

Collins, Michael, Robert E. Schapire, and Yoram Singer. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3).


Boosting with the L_2-Loss: Regression and Classification - Bühlmann, Yu (2002)

No context found.

Collins, M., Schapire, R.E. and Singer, Y. (2000). Logistic regression, AdaBoost and Bregman distances. Proc. Thirteenth Annual Conference Computational Learning Theory.


Boosting with the L_2-Loss: Regression and Classification - Bühlmann, Yu (2001)

No context found.

Collins, M., Schapire, R.E. and Singer, Y. (2000). Logistic regression, AdaBoost and Bregman distances. Proc. Thirteenth Annual Conference Computational Learning Theory.
