Ensemble methods in machine learning
by T. G. Dietterich. Venue: LNCS
Results 1 - 10 of 625

Extremely Randomized Trees

by Pierre Geurts. Machine Learning, 2003
Cited by 267 (49 self)
This paper presents a new learning algorithm based on decision tree ensembles. In contrast to the classical decision tree induction method, the trees of the ensemble are built by selecting the tests during their induction fully at random. This extreme ...
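The core idea, choosing node tests fully at random rather than optimizing them, can be sketched in a few lines. The helper below is illustrative only, not the paper's exact algorithm (which grows a whole ensemble of such trees and averages their predictions):

```python
import random

def random_split(X, y):
    """Pick a node test fully at random: a random attribute and a random
    cut-point drawn between that attribute's observed min and max."""
    feature = random.randrange(len(X[0]))
    values = [row[feature] for row in X]
    threshold = random.uniform(min(values), max(values))
    left = [(x, label) for x, label in zip(X, y) if x[feature] < threshold]
    right = [(x, label) for x, label in zip(X, y) if x[feature] >= threshold]
    return feature, threshold, left, right

random.seed(0)
X = [[0.1, 1.0], [0.4, 0.2], [0.9, 0.8], [0.7, 0.3]]
y = [0, 0, 1, 1]
feature, threshold, left, right = random_split(X, y)
```

Because no split score is computed, building each tree is cheap; the variance of the individual random trees is averaged out by the ensemble.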

Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy.

by Ludmila I. Kuncheva (editor: Robert E. Schapire). Machine Learning, 2003
Cited by 238 (0 self)
Diversity among the members of a team of classifiers is deemed to be a key issue in classifier combination. However, measuring diversity is not straightforward because there is no generally accepted formal definition. We have found and studied ten statistics that can measure diversity among binary classifier outputs (correct or incorrect vote for the class label): four averaged pairwise measures (the Q statistic, the correlation, the disagreement, and the double fault) and six non-pairwise measures (the entropy of the votes, the difficulty index, the Kohavi-Wolpert variance, the interrater agreement, the generalized diversity, and the coincident failure diversity). Four experiments were designed to examine the relationship between the accuracy of the team and the measures of diversity, and among the measures themselves. Although there are proven connections between diversity and accuracy in some special cases, our results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.
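Two of the pairwise measures named above have compact definitions. A minimal sketch over 0/1 correctness vectors (1 = the classifier was correct on that sample), written for illustration rather than taken from the paper:

```python
def q_statistic(a, b):
    """Yule's Q over two classifiers' 0/1 correctness vectors:
    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    numerator = n11 * n00 - n01 * n10
    denominator = n11 * n00 + n01 * n10
    return numerator / denominator if denominator else 0.0

def disagreement(a, b):
    """Fraction of samples on which exactly one classifier is correct."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

# A maximally "independent-looking" pair: Q = 0, disagreement = 0.5.
a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
```

Q ranges from -1 to 1: classifiers that tend to err on the same samples give positive Q, and two classifiers with identical correctness vectors give Q = 1.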

Citation Context

...r, uncorrelated classifiers reduce the added error by a factor of 1/L, and negatively correlated classifiers reduce the error even further. But note that there is a limit on the largest absolute value of a negative pairwise correlation among L classifiers. Tumer and Ghosh (1996b, 1999) do not mention the case of negative correlation, although it clearly supports their thesis that the smaller the correlation, the better the ensemble. A negative correlation between the continuous-valued outputs has been sought, predominantly by altering the available training set or parameters of the classifier (Dietterich, 2000a; Hashem, 1999; Krogh & Vedelsby, 1995; Liu & Yao, 1999; Opitz & Shavlik, 1999; Parmanto et al., 1996; Giacinto & Roli, 2001; Rosen, 1996; Sharkey & Sharkey, 1997; Skalak, 1996; Tumer & Ghosh, 1999). When classifiers output class labels, the classification error can be decomposed into bias and variance terms (Bauer & Kohavi, 1999; Breiman, 1999; Kohavi & Wolpert, 1996) or into bias and spread terms. In both cases the second term can be taken as the diversity of the ensemble. These results have been used to study the behavior of classifier ensembles in terms of the bia...

Incorporating Contextual Information in Recommender Systems Using a Multidimensional Approach

by Gediminas Adomavicius, Ramesh Sankaranarayanan, Shahana Sen, Alexander Tuzhilin. ACM Transactions on Information Systems, 2005
Cited by 236 (9 self)
The paper presents a multidimensional (MD) approach to recommender systems that can provide recommendations based on additional contextual information besides the typical information on users and items used in most current recommender systems. This approach supports multiple dimensions, extensive profiling, and hierarchical aggregation of recommendations. The paper also presents a multidimensional rating estimation method capable of selecting two-dimensional segments of ratings pertinent to the recommendation context and applying standard collaborative filtering or other traditional two-dimensional rating estimation techniques to these segments. A comparison of the multidimensional and two-dimensional rating estimation approaches is made, and the tradeoffs between the two are studied. Moreover, the paper introduces a combined rating estimation method that identifies the situations where the MD approach outperforms the standard two-dimensional approach, uses the MD approach in those situations, and uses the standard two-dimensional approach elsewhere. Finally, the paper presents a pilot empirical study of the combined approach, using a multidimensional movie recommender system that was developed for implementing this approach and testing its performance.
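The reduction-based step (select a two-dimensional segment of ratings matching the context, then apply an ordinary 2D estimator to it) can be sketched as follows. The item-mean fallback here merely stands in for standard collaborative filtering, and the tuple layout is an assumption for illustration:

```python
def segment_then_estimate(ratings, context, item):
    """Keep only ratings whose context matches, then apply a plain 2D
    estimator (here: the item's mean rating) to that segment.
    `ratings` is a list of (user, item, context, rating) tuples."""
    segment = [r for (u, i, c, r) in ratings if c == context and i == item]
    if segment:
        return sum(segment) / len(segment)
    # Fall back to the context-free 2D estimate when the segment is empty.
    all_item = [r for (u, i, c, r) in ratings if i == item]
    return sum(all_item) / len(all_item) if all_item else None

ratings = [
    ("u1", "movie", "weekend", 5),
    ("u2", "movie", "weekend", 4),
    ("u3", "movie", "weekday", 2),
]
```

The combined method described in the abstract would, in addition, learn when the contextual segment is reliable enough to be preferred over the context-free estimate.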

Round Robin Classification

by Johannes Fürnkranz, 2002
Cited by 109 (18 self)
In this paper, we discuss round robin classification (also known as pairwise classification), a technique for handling multi-class problems with binary classifiers by learning one classifier for each pair of classes. We present an empirical evaluation of the method, implemented as a wrapper around the Ripper rule learning algorithm, on 20 multi-class datasets from the UCI repository. Our results show that the technique is very likely to improve Ripper's classification accuracy without a high risk of decreasing it. More importantly, we give a general theoretical analysis of the complexity of the approach and show that its run-time complexity is below that of the commonly used one-against-all technique.
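The pairwise scheme itself is easy to sketch. Below, a toy nearest-centroid learner stands in for Ripper; any binary learner fits the `fit_binary` slot:

```python
from itertools import combinations
from collections import Counter

def fit_centroid(X, y):
    """Toy binary learner: classify by nearest class centroid."""
    classes = sorted(set(y))
    cents = {c: [sum(v) / len(v) for v in
                 zip(*(x for x, t in zip(X, y) if t == c))] for c in classes}
    return lambda x: min(classes, key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(x, cents[c])))

def train_round_robin(X, y, fit_binary):
    """Train one binary classifier per unordered pair of classes, each on
    only the examples belonging to that pair."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, label in enumerate(y) if label in (a, b)]
        models[(a, b)] = fit_binary([X[i] for i in idx], [y[i] for i in idx])
    return models

def predict_round_robin(models, x):
    """Each pairwise model votes; the most-voted class wins."""
    votes = Counter(predict(x) for predict in models.values())
    return votes.most_common(1)[0][0]

X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = [0, 0, 1, 1, 2, 2]
models = train_round_robin(X, y, fit_centroid)
```

With c classes this trains c(c-1)/2 models, but each sees only two classes' worth of data, which is what drives the favorable run-time analysis mentioned above.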

Ensemble Selection from Libraries of Models

by Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, Alex Ksikes. In Proceedings of the 21st International Conference on Machine Learning, 2004
Cited by 94 (4 self)
We present a method for constructing ensembles from libraries of thousands of models. Model libraries are generated using different learning algorithms and parameter settings. Forward stepwise selection is used to add to the ensemble the models that maximize its performance. Ensemble selection allows ensembles to be optimized to a performance metric such as accuracy, cross entropy, mean precision, or ROC area. Experiments with seven test problems and ten metrics demonstrate the benefit of ensemble selection.
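The forward stepwise loop can be sketched compactly. This toy version re-adds models with replacement and uses thresholded accuracy as the metric; both choices, and all names, are illustrative rather than the paper's exact procedure:

```python
def average(preds_list):
    """Average a list of probability vectors position-wise."""
    return [sum(ps) / len(ps) for ps in zip(*preds_list)]

def accuracy(probs, y_true):
    """Thresholded accuracy of averaged probabilities."""
    return sum((p >= 0.5) == bool(t) for p, t in zip(probs, y_true)) / len(y_true)

def ensemble_selection(preds, y_true, metric, rounds=3):
    """Greedy forward selection (with replacement): each round, add the
    model whose inclusion maximizes `metric` on the hillclimb set.
    `preds` maps model name -> predicted probabilities."""
    chosen = []
    for _ in range(rounds):
        best = max(preds, key=lambda m: metric(average(chosen + [preds[m]]),
                                               y_true))
        chosen.append(preds[best])
    return chosen

preds = {"good": [0.9, 0.1, 0.8, 0.2], "bad": [0.5, 0.6, 0.4, 0.6]}
y_true = [1, 0, 1, 0]
chosen = ensemble_selection(preds, y_true, accuracy)
```

Because only cached predictions are combined, any metric can be plugged into `metric` without retraining the library.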

Constructing Diverse Classifier Ensembles using Artificial Training Examples

by Prem Melville, Raymond J. Mooney. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence
Cited by 78 (9 self)
Ensemble methods like bagging and boosting, which combine the decisions of multiple hypotheses, are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional artificially constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and bagging, and also obtains higher accuracy than boosting early in the learning curve when training data is limited.
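One round of the idea can be sketched as follows. All names are illustrative, and the Gaussian perturbation is an assumption; the point is only the data flow: artificial examples are labeled contrary to the current ensemble so the new member is pushed to disagree with it:

```python
import random

def diverse_round(X, y, ensemble_predict, fit, n_artificial=5):
    """Create artificial examples by perturbing real ones, label each with
    a class the current ensemble does NOT predict for it, and train a new
    member on real + artificial data."""
    classes = sorted(set(y))
    X_art, y_art = [], []
    for _ in range(n_artificial):
        base = random.choice(X)
        x_new = [v + random.gauss(0, 0.1) for v in base]
        wrong = [c for c in classes if c != ensemble_predict(x_new)]
        X_art.append(x_new)
        y_art.append(random.choice(wrong))
    return fit(X + X_art, y + y_art)

random.seed(1)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
# Stub learner that just exposes its training labels, to show the data flow.
new_member = diverse_round(X, y, lambda x: 0, lambda Xs, ys: ys)
```

In a real meta-learner the new member is kept only if it improves committee accuracy, and the loop repeats until the committee reaches the desired size.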

Relationship-based Clustering and Cluster Ensembles for High-dimensional Data Mining

by Alexander Strehl, 2002
Cited by 71 (0 self). Abstract not available.

Knowledge transfer via multiple model local structure mapping

by Jing Gao, Wei Fan, Jing Jiang, Jiawei Han. In International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, 2008
Cited by 59 (11 self)
The effectiveness of knowledge transfer using classification algorithms depends on the difference between the distribution that generates the training examples and the one from which test examples are to be drawn. The task can be especially difficult when the training examples are from one or several domains different from the test domain. In this paper, we propose a locally weighted ensemble framework to combine multiple models for transfer learning, where the weights are dynamically assigned according to a model's predictive power on each test example. It can integrate the advantages of various learning algorithms and the labeled information from multiple training domains into one unified classification model, which can then be applied to a different domain. Importantly, unlike many previously proposed methods, none of the base learning methods is required to be specifically designed for transfer learning. We show the optimality of a locally weighted ensemble framework as a general approach to combining multiple models for domain transfer. We then propose an implementation of the local weight assignments by mapping the structures of a model onto the structures of the test domain, and then weighting each model locally according to its consistency with the neighborhood structure around the test example. Experimental results on text classification, spam filtering, and intrusion detection data sets demonstrate significant improvements in classification accuracy gained by the framework. On a transfer learning task of newsgroup message categorization, the proposed locally weighted ensemble framework achieves 97% accuracy when the best single model predicts correctly on only 73% of the test examples. In summary, the improvement in accuracy is over 10% and up to 30% across different problems.
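A drastically simplified proxy for the per-example weighting can be sketched as follows. The paper maps model structure onto clusters of the test domain; here, purely for illustration, each model is weighted by its accuracy on a point's k nearest labeled training examples:

```python
def local_weights(x, models, X_train, y_train, k=3):
    """Weight each model by its accuracy on x's k nearest labeled
    examples (a stand-in for the paper's structure mapping)."""
    nearest = sorted(range(len(X_train)), key=lambda i: sum(
        (a - b) ** 2 for a, b in zip(x, X_train[i])))[:k]
    weights = [sum(m(X_train[i]) == y_train[i] for i in nearest) / k
               for m in models]
    total = sum(weights) or 1.0
    return [w / total for w in weights]

def weighted_vote(x, models, weights):
    """Combine model predictions with the per-example weights."""
    votes = {}
    for m, w in zip(models, weights):
        votes[m(x)] = votes.get(m(x), 0.0) + w
    return max(votes, key=votes.get)

m_good = lambda p: 1 if p[0] > 0 else 0   # locally accurate model
m_bad = lambda p: 0                        # locally inaccurate model
X_train = [[1, 0], [1, 2], [0.5, 1], [-1, -1]]
y_train = [1, 1, 1, 0]
x = [1, 1]
w = local_weights(x, [m_good, m_bad], X_train, y_train)
```

The key property this preserves is that the weights vary per test example, so a model that is unreliable in one region of the test domain can still dominate in another.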

Citation Context

...ngle source of information and try to learn a global single model that adapts well to the test set. Constructing a good ensemble of classifiers has been an active research area in supervised learning [12]. By combining decisions from individual classifiers, ensembles can usually reduce variance and achieve higher accuracy than individual classifiers. Such methods include Bayesian averaging [17], baggi...

Finding Consistent Clusters in Data Partitions

by Ana Fred. In Proc. 3rd Int. Workshop on Multiple Classifier Systems, 2001
Cited by 55 (6 self)
Given an arbitrary data set, for which no particular parametrical, statistical, or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, several partitions can also be obtained with a single clustering algorithm, due to dependence on initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions, proposing the analysis of the most common associations performed in a majority voting scheme. Combination of clustering results is performed by transforming data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples, using both simulated and real data, show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependence on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
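The voting scheme can be sketched directly: accumulate a co-association matrix over the input partitions, then link sample pairs that co-occur in a majority of them. This minimal version extracts clusters as connected components via union-find; a 0.5 threshold encodes majority voting:

```python
def co_association(partitions, n):
    """Entry (i, j) is the fraction of partitions in which samples i and j
    fall in the same cluster. Each partition is a list of n labels."""
    C = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / len(partitions)
    return C

def consistent_clusters(C, threshold=0.5):
    """Link pairs co-associated in more than `threshold` of the partitions
    and return a cluster id per sample (union-find components)."""
    n = len(C)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
C = co_association(partitions, 4)
labels = consistent_clusters(C)
```

Note that the number of clusters is never specified; it emerges from how many components survive the majority threshold.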

Citation Context

...or by simple split of the feature space for dimensionality reasons. A first aspect in combining classifiers is the production of an ensemble of classifiers. Methods for constructing ensembles include [3]: manipulation of the training samples, such as bootstrapping (bagging), reweighting the data (boosting), or using random subspaces; manipulation of the labelling of data, an example of which is error-c...

ASSAM: A Tool for Semi-Automatically Annotating Semantic Web Services

by Andreas Heß, Eddie Johnston, Nicholas Kushmerick. In Intl. Semantic Web Conf. (ISWC), 2004
Cited by 52 (4 self)
The semantic Web Services vision requires that each service be annotated with semantic metadata. Manually creating such metadata is tedious and error-prone, and many software engineers, accustomed to tools that automatically generate WSDL, might not want to invest the additional effort. We therefore propose ASSAM, a tool that assists a user in creating semantic metadata for Web Services. ASSAM is intended for service consumers who want to integrate a number of services and therefore must annotate them according to some shared ontology. ASSAM is also relevant for service producers who have deployed a Web Service and want to make it compatible with an existing ontology. ASSAM's capabilities to automatically create semantic metadata are supported by two machine learning algorithms. First, we have developed an iterative relational classification algorithm for semantically classifying Web Services, their operations, and input and output messages. Second, to aggregate the data returned by multiple semantically related Web Services, we have developed a schema mapping algorithm that is based on an ensemble of string distance metrics.
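The ensemble-of-string-distances idea can be sketched with two simple metrics: character-bigram Jaccard and normalized prefix overlap. These particular metrics are assumptions for illustration; the paper's actual metric set may differ:

```python
def bigram_jaccard(a, b):
    """Jaccard similarity of character-bigram sets, in [0, 1]."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

def prefix_overlap(a, b):
    """Length of the shared prefix, normalized by the longer string."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)

def combined_similarity(a, b, metrics):
    """Simple ensemble: average several [0, 1] similarity scores so that
    no single metric's blind spot dominates the mapping decision."""
    return sum(m(a, b) for m in metrics) / len(metrics)

metrics = [bigram_jaccard, prefix_overlap]
```

For schema mapping, such a combined score would be computed between candidate element names (e.g. matching `price` against `prices`) and the highest-scoring pairs proposed as mappings.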

Citation Context

...thm. In their approach, one single classifier is trained on all (intrinsic and extrinsic) features. In a variety of tasks, ensembles of several classifiers have been shown to be more effective (e.g., [7]). For this reason, we train two separate classifiers, one on the intrinsic features ("A") and one on the extrinsic features ("B"), and vote together their predictions. Another advantage of combining ...
