39 citations found.
R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, vol. 37, no. 3, pp. 1-40, 1999.


This paper is cited in the following contexts:


A Hybrid Filter/Wrapper Approach of Feature Selection using.. - Sebban, Nock (2001)

.... entropy becomes equivalent to Gini's impurity criterion, used in decision tree induction to evaluate the quality of a tree node in the well-known CART™ package [24]. It is worthwhile to remark that Gini's criterion, as well as more recent criteria such as Schapire-Singer's Z criterion [25], have been rigorously proven to be very efficient measures to grow decision trees, in particular more accurate than the accuracy itself [26]. Furthermore, in our case, a convenient statistical test, which we now describe, allows to estimate with confidence whether a feature subset can be preferred to ....

....should be done directly by maximizing the accuracy's increasing using a highly concave criterion, like Gini's or the entropy. In addition, [26] provide an optimal criterion which should give the maximal increase of the accuracy. This criterion was later used in the AdaBoost boosting algorithm of [25], and we refer to it as Schapire-Singer's Z criterion. It is a function from [0; 1] to [0; 1] 1 ; k ) Z( 1 ; k ) The results of [26] along with our results on comparing the RCG's and the accuracy's optimization on the 1-NN graph (that of the final classifier) are ....

[Article contains additional citation context not shown here]

R. E. Schapire, Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the 11th Computational Learning Theory (1998) pp. 80-91.


A PAC bound for mixture discriminants - Seeger (2000)

....paradigm, as shown in section 5. Although MED already proved quite successful in practice [9, 10], this paper attempts, to our knowledge, the first learning theoretical foundation of this paradigm. Further work will include the application of our bound to popular techniques such as AdaBoost [5] [15] and variants. This could be done by first relating these algorithms to the MRED framework and then use the comments in section 5. AdaBoost lacks the notion of a prior distribution over hypothesis space, complexity regularization mainly works because of the assumption of very weak base hypotheses, ....

R. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the 11th annual conference on computational learning theory, 1998.


Statistical Behavior and Consistency of Classification Methods.. - Zhang (2001)   (17 citations)

....of (1) is typically NP-hard. Recently a number of methods have been proposed to alleviate this computational problem. The basic idea is to minimize a convex upper bound of the classification error function I(p; y). For example, AdaBoost [6] employs the exponential loss function exp(-py) [2, 3, 14, 7], and support vector machines (SVMs) employ a loss function of the form max(1 - py; 0) [16]. In general, let be a one-variable convex function, we may consider the (approximate) minimization in a function class C with respect to the following empirical risk: f(X i )Y i ) 3) which can be ....

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297-336, 1999.
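
The Zhang excerpt above describes replacing the classification error with a convex upper bound. As a minimal sketch of that idea, the following Python compares the 0-1 error with the exponential loss exp(-py) and the hinge loss max(1 - py, 0). The function names are illustrative, and counting a zero margin as an error is an assumption made here, not something stated in the excerpt.

```python
import math

def zero_one_loss(p, y):
    """Classification error for a real-valued prediction p and a label y in {-1, +1}.
    Assumption: a margin of exactly zero is counted as an error."""
    return 1.0 if p * y <= 0 else 0.0

def exponential_loss(p, y):
    """Exponential loss exp(-py), the surrogate associated with AdaBoost."""
    return math.exp(-p * y)

def hinge_loss(p, y):
    """Hinge loss max(1 - py, 0), the surrogate associated with SVMs."""
    return max(1.0 - p * y, 0.0)

# Compare the three losses over a few margins p*y (label fixed to +1).
for p in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(p, zero_one_loss(p, 1), exponential_loss(p, 1), hinge_loss(p, 1))
```

Both surrogates sit on or above the 0-1 error at every margin, which is what makes minimizing them a tractable stand-in for minimizing the error itself.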


A Robust Boosting Algorithm - Nock, Lefaucheur

.... theory, the first evidences that the practical importance of Boosting is much more than possible (quote from [Kea88]) and can actually be of great help to solve challenging problems, culminates in the paper of [FS97] and its algorithm, AdaBoost, and more recently in refined analyses of AdaBoost [FHT00,SS98]. Interestingly, this approach follows the voting approach advocated by [Kea88] but with a powerful reweighting scheme, that Arcing further studied [Bre96b]. This scheme is a stepwise multiplicative update of the training examples' weights, so as to bring higher importance to those that have been ....

....Arcing further studied [Bre96b]. This scheme is a stepwise multiplicative update of the training examples' weights, so as to bring higher importance to those that have been hard to classify for the last hypothesis. Most approaches derived from Boosting are voting procedures (see e.g. the papers [Bre96b,FS97,SS98]) and more generally many ensemble methods are also voting procedures [Bre96a]. A set of voters is grown, which is a way to cast the initial examples onto a new representation space of different dimension, space into which each hypothesis built defines a new variable. Afterwards, a linear ....

[Article contains additional citation context not shown here]

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the 11th International Conference on Computational Learning Theory, pages 80-91, 1998.
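
The Nock and Lefaucheur excerpts above mention a stepwise multiplicative update of the training examples' weights. A minimal sketch of one such step, assuming the standard AdaBoost rule w_i <- w_i * exp(-alpha * y_i * h(x_i)) followed by normalization, might look as follows. The helper name reweight and the toy data are hypothetical, and this is not the robust variant proposed in that paper.

```python
import math

def reweight(weights, labels, predictions, alpha):
    """One multiplicative update: scale each weight by exp(-alpha * y * h(x)),
    then renormalize so the weights sum to 1."""
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return [w / total for w in updated]

# Toy data (hypothetical): four examples, the second one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]

# Weighted error of the hypothesis and the usual AdaBoost coefficient.
error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
alpha = 0.5 * math.log((1.0 - error) / error)

new_weights = reweight(weights, labels, predictions, alpha)
print(new_weights)
```

With this choice of alpha, the misclassified example ends up carrying half of the total weight, which illustrates how the update shifts importance toward examples that were hard for the last hypothesis.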


Learning Classification RBF Networks by Boosting - Diez, González

....xn ) In this case it is interesting the possibility of using the distance between each series independently. Each base classifier was only one literal, and their result was simply true or false. One of the improvements to the original AdaBoost algorithm is the use of confidence-based predictions [22], where each base classifier also returns, for each example, a confidence (a real number) on its prediction. A natural question is how to combine confidence-based predictions with similarity literals. The first option was to consider a given literal as a boolean attribute and, using the methods of ....

....where each base classifier also returns, for each example, a confidence (a real number) on its prediction. A natural question is how to combine confidence-based predictions with similarity literals. The first option was to consider a given literal as a boolean attribute and, using the methods of [22] for domain partitioning weak classifiers, to assign a confidence value corresponding to the values true and false of the literal. Nevertheless, when using distance literals it is natural to use, somehow, the value of the distance for the current example, to the reference example, and the ....

[Article contains additional citation context not shown here]

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In 11th Annual Conference on Computational Learning Theory (COLT 1998).


Boosting Interval Based Literals - Rodríguez, Alonso, Boström (2001)

....algorithms. Boosting binary stumps produces good results, but the effect of using more complex base learners, such as decision trees or rules (of interval literals) must be studied. Currently the base learners only return the classification of the example. The use of confidence-rated predictions [SS98] may however improve the method. Acknowledgements To the maintainers of the ML [BM98] and KDD [Bay99] UCI Repositories, and to all the donators of the used data sets. ....

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In 11th Annual Conference on Computational Learning Theory (COLT-98), pages 80-91. ACM, 1998.


Margins and Combined Classifiers - Mason (1999)

....of misclassified training examples. We then show that this general class of algorithms includes as special cases a number of popular and successful voting methods, including Freund and Schapire's AdaBoost [35], Schapire and Singer's extension of AdaBoost to combinations of real-valued functions [61], Breiman's ARC-X4 [15] and Friedman, Hastie and Tibshirani's LogitBoost [37]. That is, all of these algorithms implicitly minimize some cost function of the margin by gradient descent. In Chapter 8 we examine a restriction of the MarginBoost algorithm, DOOM II, designed to iteratively construct ....

....decision trees, by virtue of their representation as a voted combination, can be constructed directly using existing voting methods. In particular, in Section 4.2 we present an algorithm for learning alternating decision trees based upon Schapire and Singer's real-valued extension of AdaBoost [61]. Despite their generality, alternating decision trees can be interpreted using the same simple techniques that are applied to interpret single decision trees and voted combinations of decision stumps. In Section 4.3 we demonstrate these techniques by analyzing the alternating decision tree ....

[Article contains additional citation context not shown here]

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80-91, 1998.


Applying Boosting to Similarity Literals for Time Series .. - Rodríguez, Alonso (2000)

.... examples considered is of min(ir; e). Hence, the execution time for the boosting process is O(min(ir; e)ve d(n) irve lg e). Multiclass problems. The simpler AdaBoost algorithm is defined for binary classification problems [Schapire, 1999], although there are extensions for multiclass problems [Schapire and Singer, 1998]. In our case the base classifiers are also binary (only one literal) and it excludes some techniques for handling multiclass problems. We have used a simple approximation: the problem is reduced to several binary classification problems, as many as classes, which decides if an example is, or is ....

Schapire, R. E. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In 11th Annual Conference on Computational Learning Theory (COLT-98), pages 80-91. ACM.


Multiclass Alternating Decision Trees - Holmes, Pfahringer, Kirkby.. (2002)   (1 citation)

....gained prominence [3, 10]. Like many classification algorithms, most boosting procedures are formulated for the binary classification setting. Schapire and Singer generalize AdaBoost to the multiclass setting producing several alternative procedures of which the best (empirically) is AdaBoost.MH [14]. This version of AdaBoost covers the multilabel setting where an instance can have more than one class label as well as the multiclass setting where an instance can have a single class label taken from a set of (more than two) labels. Alternating decision trees are induced using a real-valued ....

....covers the multilabel setting where an instance can have more than one class label as well as the multiclass setting where an instance can have a single class label taken from a set of (more than two) labels. Alternating decision trees are induced using a real-valued formulation of AdaBoost [14]. At each boosting iteration three nodes are added to the tree. A splitter node that attempts to split sets of instances into pure subsets and two prediction nodes, one for each of the splitter node's subsets. The position of this new splitter node is determined by examining all predictor nodes ....

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In Proc. 11th Conf. on Computational Learning Theory, pages 80-91. ACM Press, 1998.


Optimizing the Induction of Alternating Decision Trees - Pfahringer, Holmes, Kirkby (2001)   (2 citations)

....paths that an instance successfully traverses. A positive sum implies membership of one class and a negative sum membership of the other. While the original algorithm was restricted to two class problems it appears that the algorithm can be extended to multiclass problems by using the framework of [7]. Each boosting iteration adds a test (weak hypothesis) and two predictor nodes to the tree. The test chosen to extend the tree is the one that minimizes a 2 function that measures the impurity of the test. The tree can be extended from any of its existing predictor nodes which means that for ....

Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37 (3) (1999) 297-336.


Geometric Bounds for Generalization in Boosting - Mannor, Meir (2001)

....on m, one can no longer guarantee a rate of convergence of order 1/√m. This observation motivates us to search for situations where may be chosen to be independent of m, while still guaranteeing that the empirical margin error (2) is small. 2.2 Training error. Boosting algorithms (e.g. [5, 7, 11, 12]) operate by successively constructing a sequence of weak learners based on a re-weighted version of the data. The final (composite) hypothesis is then formed by taking a weighted combination of the weak learners. Denote the weighted empirical error of the t-th weak learner h_t by t , where t ....

R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.


Automatic Multi-Lingual Information Extraction - Peng (2001)

....induce a decision list of contextual rules using the supervised learning method; and also label examples using the current contextual rules to induce a decision list of spelling rules. These procedures continue until the rule number reaches some constant. CoBoost was based on the AdaBoost algorithm [37, 82] and the Co-Training algorithm [7]. EM [31] based method is a common approach for unsupervised learning. 8 5 Promising Research Problems There are two main problems in i.e. Performance: how well the IE system behaves on the task ( precision recall ) Portability: how difficult to adapt the ....

Schapire, R. and Singer, Y. Improved boosting algorithms using confidence-rated predictions. In Machine Learning, 37(3):297-336, 1999.


The interaction of stability and weakness in AdaBoost - Kutin, Niyogi (2001)

....G t (x) P t s=1 s h s (x) The output of the algorithm is a classifier H T constructed from G T (see Section 5.1.1). Note that we allow our hypotheses h i to take values in [0; 1]. For such continuous hypotheses, the choice of t and t above may not be optimal; see Schapire and Singer [10] for a discussion of this question. However, since we need a concrete definition for our analysis, we stick with the original choices of t and t described above. 5.1.1 Constructing the final hypothesis. The remaining question is how to construct H T from G T . Our goal is that H T be a ....

R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.


SCAI Experiments on TREC-9 - Yu-Hwan Kim Sun

....the previous weak learner. The combined classifier is composed by weighted voting of weak learners. In the original AdaBoost, each hypothesis or weak learner is restricted to producing an output of -1 or +1. A more sophisticated version of the AdaBoost algorithm is AdaBoost with confidence ratio [3]. It interprets the sign of the weak learner, say t (x), as the predicted label (-1 or +1) to be assigned to instance, say x, and the magnitude | t (x)| as the confidence in this prediction. Thus if t (x) is close to or far from zero, it is interpreted as a low or high confidence prediction. ....

Schapire, R. E. and Singer, Y., Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning 37(3), pp. 297-336, 1999.
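
The excerpt in this entry reads the sign of a real-valued weak learner's output as the predicted label and its magnitude as the confidence. A minimal sketch of that reading follows, under the assumption that any per-round voting weights are already folded into the scores. The helper names interpret and combined_vote are hypothetical, as are the example scores.

```python
def interpret(score):
    """Split a real-valued weak-learner output into (label, confidence).
    Assumption: a score of exactly zero is mapped to the label +1."""
    label = 1 if score >= 0 else -1
    return label, abs(score)

def combined_vote(scores):
    """Combine weak-learner outputs by summing them and taking the sign,
    so high-confidence predictions count for more than low-confidence ones."""
    total = sum(scores)
    return 1 if total >= 0 else -1

# Hypothetical outputs of three weak learners on one instance.
scores = [0.1, 0.2, -0.9]
print([interpret(s) for s in scores])
print(combined_vote(scores))
```

Here two low-confidence positive votes are outweighed by one high-confidence negative vote, which a plain majority vote over labels of -1 or +1 would not capture.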


Learning Algorithms for Enclosing Points in Bregmanian Spheres - Crammer, Singer (2003)   Self-citation (Singer)

No context found.

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):1-40, 1999.


A New Family of Online Algorithms for Category Ranking - Crammer, Singer (2002)   (7 citations)  Self-citation (Singer)

....the family of algorithms uses Multiclass Multilabel feedback we refer to the various variants as the MMP algorithm. A few learning algorithms for multi-labeled data have been devised in the machine learning community. Two notable examples are a multilabel version of AdaBoost called AdaBoost.MH [10] and a multilabel generalization of Vapnik's Support Vector Machines by Elisseeff and Weston [2]. These two multilabel algorithms take the same general approach by reducing a multilabel problem into multiple binary problems by comparing all pairs of labels. Our starting point is similar as we use ....

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):1-40, 1999.


Fabio Aiolli - Dipartimento Di Informatica

No context found.

R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, vol. 37, no. 3, pp. 1-40, 1999.


An Improved Boosting Algorithm and Its Application to Text.. - Sebastiani, al. (2000)   (5 citations)

No context found.

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.


Discretizing Continuous Attributes in AdaBoost for.. - Nardiello.. (2003)

No context found.

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.


Discriminative Reranking for Natural Language Parsing - Collins, Koo (2000)   (35 citations)

No context found.

Schapire, Robert E. and Yoram Singer. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999. Schapire, Robert E. and Yoram Singer. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135-168, 2000.


An Efficient SMO-like Algorithm for Multiclass SVM - Aiolli, Sperduti

No context found.

R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, vol. 37, no. 3, pp. 1-40, 1999.


Learning Multi-label Alternating Decision Trees from.. - De Comite, Gilleron..

No context found.

Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT-98), pages 80-91, New York, July 24-26 1998. ACM Press.


Boosting as Entropy Projection - Kivinen, Warmuth (1999)   (18 citations)

No context found.

R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proc. 11th Annu. Conf. on Comput. Learning Theory, pages 80-91. ACM, New York, 1998.


Discriminative Learning for Label Sequences via Boosting - Altun, Hofmann, Johnson (2002)   (2 citations)

No context found.

R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.


Aggregated Estimators And Empirical Complexity For Least Square.. - Audibert

No context found.

Robert E. Schapire and Yoram Singer, Improved boosting algorithms using confidence-rated predictions, (1998), 80-91.



CiteSeer.IST - Copyright Penn State and NEC