| COLLINS, M., SCHAPIRE, R., AND SINGER, Y. 2002. Logistic regression, AdaBoost and Bregman distances. Machine Learning 48, 1-3, 253--285. |
....and l1-constrained optimization. The two examples we have seen so far have used squared error loss, and we should ask ourselves whether this equivalence stretches beyond this loss. Figure 6 shows a similar result, but this time for the binomial log-likelihood loss, C_l. [Figure 5, plot omitted; panel title: Stagewise] Figure 5: Another example of the equivalence between the Lasso optimal solution path (left) and boosting with ....
....optimization. The two examples we have seen so far have used squared error loss, and we should ask ourselves whether this equivalence stretches beyond this loss. Figure 6 shows a similar result, but this time for the binomial log-likelihood loss, C_l. [Figure 5, plot omitted; panel title: Stagewise] Figure 5: Another example of the equivalence between the Lasso optimal solution path (left) and boosting with squared error loss. Note ....
Collins, M., Schapire, R.E. & Singer, Y. (2000). Logistic regression, AdaBoost and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
....bound holds for arbitrary distributions, provided they are properly normalized: ij P = 1 for all n. The second term in the conditional log likelihood occurs with a minus sign, so for this term we require an upper bound. The same bounds can be used here as in derivations of iterative scaling [1, 2, 4, 13]. Note that the logarithm function is upper bounded by: $\log z \le z - 1$ for all $z > 0$. We can therefore write: [equation (16), not recoverable from the extracted text]. To further bound the right-hand side of eq. (16) we make the following observation: though the exponentials ....
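The logarithm bound used in this excerpt, log z ≤ z − 1 for all z > 0, can be checked numerically. The following is a minimal sketch, not code from the cited paper; the function name `log_upper_bound_gap` and the sample points are illustrative assumptions.

```python
import math


def log_upper_bound_gap(z: float) -> float:
    """Return (z - 1) - log(z); the bound says this is >= 0 for z > 0."""
    if z <= 0:
        raise ValueError("z must be positive")
    return (z - 1.0) - math.log(z)


# Hypothetical sample points on both sides of z = 1.
samples = [0.01, 0.5, 1.0, 2.0, 10.0]
gaps = [log_upper_bound_gap(z) for z in samples]
```

The gap is zero only at z = 1, where the line z − 1 is tangent to log z, which is why the bound is tight near that point.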
M. Collins, R. Schapire, and Y. Singer (2000). Logistic regression, AdaBoost, and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
.... 1997). Typically, boosting is applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling have treated each label as a separate classification problem (Abney et al., 1999). However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a forward-backward algorithm to efficiently compute certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators. Another attractive aspect of CRFs ....
....generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is standard in large-vocabulary speech recognition (Schwartz & Austin, 1993) and has also been proposed for parsing (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass. Closest to our proposal are gradient descent methods that adjust the parameters of all of the local classifiers to minimize a smooth loss function (e.g. quadratic loss) combining loss terms for each label. If ....
Collins, M., Schapire, R., & Singer, Y. (2000). Logistic regression, AdaBoost, and Bregman distances. Proc. 13th COLT.
....) $\arg\min \sum_{i=1}^{N} \frac{\sum_{y} e^{h(x^{(i)}, y)}}{e^{h(x^{(i)}, y^{(i)})}} = \arg\min \sum_{i=1}^{N} \frac{1}{p(y^{(i)} \mid x^{(i)})}$. This optimization problem cannot be derived as a sampling likelihood for a statistical model. However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to our task. This requires a forward-backward algorithm to efficiently compute certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators. Another attractive aspect of CRFs is that one can implement ....
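The identity in this excerpt equates a ratio of exponentiated scores with the inverse conditional probability. The sketch below illustrates it for a single example, assuming p(y | x) is the usual normalized exponential of the scores h(x, y). The label set, the score values, and both function names are hypothetical, and the sum over y is taken over a small explicit label set rather than over label sequences.

```python
import math


def exp_loss_ratio(scores: dict, true_label: str) -> float:
    """Sum over y of exp(h(x, y)) divided by exp(h(x, y_true))."""
    return sum(math.exp(s - scores[true_label]) for s in scores.values())


def inverse_conditional_prob(scores: dict, true_label: str) -> float:
    """1 / p(y_true | x), with p defined as a softmax over the scores."""
    normalizer = sum(math.exp(s) for s in scores.values())
    p_true = math.exp(scores[true_label]) / normalizer
    return 1.0 / p_true


# Hypothetical scores h(x, y) for one input x and three labels.
scores = {"A": 1.2, "B": -0.3, "C": 0.5}
ratio = exp_loss_ratio(scores, "A")
inverse_p = inverse_conditional_prob(scores, "A")
```

Both quantities agree up to floating-point rounding, so minimizing the summed ratio over training examples is the same as minimizing the summed inverse probabilities.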
....permissive generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global model to rerank those candidates. This is the standard approach in large-vocabulary speech recognition (Schwartz & Austin, 1993) and has also been proposed for parsing (Collins, 2000). However, these methods are suboptimal in that they can fail to find the correct output when it was pruned away in the first pass. Closest to our proposal are gradient descent methods that adjust the parameters of all of the local classifiers to minimize a smooth loss function ....
Collins, M., Schapire, R., & Singer, Y. (2000). Logistic regression, AdaBoost, and Bregman distances. Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
No context found.
COLLINS, M., SCHAPIRE, R., AND SINGER, Y. 2002. Logistic regression, AdaBoost and Bregman distances. Machine Learning 48, 1-3, 253--285.
No context found.
Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1-3), 253--285.
No context found.
Collins, Michael, Robert E. Schapire, and Yoram Singer. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3).
No context found.
Collins, M., Schapire, R.E. and Singer, Y. (2000). Logistic regression, AdaBoost and Bregman distances. Proc. Thirteenth Annual Conference Computational Learning Theory.
No context found.