Results 1-10 of 20
That Elusive Diversity in Classifier Ensembles
 Lecture Notes in Computer Science
, 2003
Abstract

Cited by 24 (1 self)
Is "useful diversity" a myth? Many experiments and the little available theory on diversity in classifier ensembles are either inconclusive, too heavily assumption-bound, or openly non-supportive of the intuition that diverse classifiers fare better than non-diverse ones. Although a rough general tendency was confirmed in our previous studies, no prominent link appeared between the diversity of the ensemble and its accuracy.
Classifier Ensembles: Select Real-World Applications
, 2008
Abstract

Cited by 24 (0 self)
Broad classes of statistical classification algorithms have been developed and applied successfully to a wide range of real world domains. In general, ensuring that the particular classification algorithm matches the properties of the data is crucial in providing results that meet the needs of the particular application domain. One way in which the impact of this algorithm/application match can be alleviated is by using ensembles of classifiers, where a variety of classifiers (either different types of classifiers or different instantiations of the same classifier) are pooled before a final classification decision is made. Intuitively, classifier ensembles allow the different needs of a difficult problem to be handled by classifiers suited to those particular needs. Mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real world problems. In this paper, we survey select applications of ensemble methods to problems that have historically been most representative of the difficulties in classification. In particular, we survey applications of ensemble methods to remote sensing, person recognition, one vs. all recognition, and medicine.
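As a minimal illustration of the pooling step described in this abstract (not taken from the survey itself), the sketch below combines label predictions from several classifiers by plurality vote; the classifiers and their outputs are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label predictions by plurality vote.

    predictions: list of label sequences, one per classifier,
    all of the same length (one label per test instance).
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Three hypothetical classifiers disagreeing on some instances:
votes = [["a", "b", "a"],
         ["a", "a", "a"],
         ["a", "b", "b"]]
print(majority_vote(votes))  # -> ['a', 'b', 'a']
```

Plurality voting is only the simplest pooling rule; weighted voting or averaging of class probabilities are common refinements.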
A Consensus Framework for Integrating Distributed Clusterings Under Limited Knowledge Sharing
 In Proc. NSF Workshop on Next Generation Data Mining
, 2002
Abstract

Cited by 22 (3 self)
This paper examines the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. This problem is an abstraction of scenarios where different organizations have grouped some or all elements of a common underlying population, possibly using different features, algorithms or clustering criteria. Moreover, due to real-life constraints such as proprietary techniques, legal restrictions, different data ownerships, etc., it is not feasible to pool all the data into a central location and then apply clustering techniques: the only information that can be shared are the symbolic cluster labels. The cluster ensemble problem is formalized as a combinatorial optimization problem that obtains a consensus function in terms of shared mutual information among individual solutions. Three effective and efficient techniques for obtaining high-quality consensus functions are described and studied empirically for the following qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms were applied to non-identical sets of objects, and (iii) when the individual solutions provide varying numbers of clusters. Promising results are obtained in all three situations for synthetic as well as real data sets, even under severe restrictions on data and knowledge sharing.
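A minimal sketch of one way a consensus could be formed from symbolic labels alone. Note the paper's own techniques optimize shared mutual information; the co-association approach below is a simpler, well-known stand-in, and the data is invented:

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each object pair shares a cluster.
    partitions: list of label lists over the same n objects; only the
    symbolic labels are used, matching the limited-sharing setting."""
    n = len(partitions[0])
    counts = {}
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: c / len(partitions) for pair, c in counts.items()}

def consensus(partitions, threshold=0.5):
    """Merge objects whose co-association exceeds the threshold
    (connected components of the thresholded pair graph, via union-find)."""
    n = len(partitions[0])
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), w in co_association(partitions).items():
        if w > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Three partitionings of five objects under different label alphabets:
parts = [[0, 0, 1, 1, 2],
         ["a", "a", "b", "b", "b"],
         [1, 1, 2, 2, 2]]
print(consensus(parts))  # -> [[0, 1], [2, 3, 4]]
```

Because only co-occurrence of labels matters, the incompatible label alphabets of the three input partitionings never need to be reconciled.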
On the Value of Ensemble Effort Estimation
Abstract

Cited by 12 (3 self)
Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models. Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods. Method: 9 learners were combined with 10 preprocessing options to generate 9 × 10 = 90 solo-methods. These were applied to 20 data sets and evaluated using 7 error measures. This identified the best n (in our case n = 13) solo-methods that showed stable performance across multiple data sets and error measures. The top 2, 4, 8 and 13 solo-methods were then combined to generate 12 multi-methods, which were then compared to the solo-methods. Results: (i) The top 10 (out of 12) multi-methods significantly outperformed all 90 solo-methods. (ii) The error rates of the multi-methods were significantly lower than those of the solo-methods. (iii) The ranking of the best multi-method was remarkably stable. Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.
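As a toy illustration of a multi-method (not the paper's actual learners, preprocessors, or combination scheme), the sketch below combines hypothetical solo-methods by taking the median of their estimates; the models and constants are invented for illustration:

```python
from statistics import median

def multi_method_estimate(solo_methods, project):
    """Combine several effort-estimation methods by taking the median
    of their individual predictions (one simple multi-method scheme)."""
    return median(m(project) for m in solo_methods)

# Hypothetical solo-methods: each maps project size (KLOC) to person-months.
solo = [
    lambda p: 2.4 * p ** 1.05,  # power-law model with illustrative constants
    lambda p: 3.0 * p,          # naive linear productivity model
    lambda p: 2.0 * p + 5.0,    # linear model with fixed overhead
]
print(round(multi_method_estimate(solo, 10), 1))
```

The median is robust to a single wildly wrong solo-method, which is one reason combinations can beat any fixed individual estimator.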
Experimental Comparison of Cluster Ensemble Methods
 In International Conference on Information Fusion, 1-7
, 2006
Cited by 12 (1 self)
Architectures for Detecting and Solving Conflicts: Two-Stage Classification and Support Vector Classifiers
, 2003
Abstract

Cited by 11 (6 self)
In the majority of cases, a properly trained classifier or ensemble of classifiers may yield acceptable recognition results. However, in some cases recognition will fail due to typical conflicts that are encountered, like the confusion between [A] and [H] or [U] and [V]. In this paper, two architectures for the recognition of handwritten text are described. The key issue for each of these systems is to detect the event of a possible conflict and subsequently attempt to solve that particular problem. Both systems exploit a two-stage classification method. In the event that the first-stage classifiers are not certain about the result, the second-stage system engages a set of support vector classifiers for refining the output hypothesis.
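The two-stage idea can be sketched as a confidence-thresholded cascade. This is a minimal sketch, not the paper's architecture; the stage classifiers, threshold, and the U/V confusion below are all hypothetical:

```python
def two_stage_classify(instance, first_stage, second_stage, threshold=0.8):
    """Cascade: accept the first-stage answer when its confidence is high,
    otherwise defer to the (more expensive) second-stage classifier."""
    label, confidence = first_stage(instance)
    if confidence >= threshold:
        return label
    return second_stage(instance)

# Hypothetical stages: the first stage confuses 'U' and 'V'.
def first(x):
    return ("U", 0.55) if x in ("U", "V") else (x, 0.95)

def second(x):
    return x  # stands in for a support vector classifier refining the hypothesis

print([two_stage_classify(c, first, second) for c in "AUVH"])
# -> ['A', 'U', 'V', 'H']
```

Only the low-confidence cases ('U' and 'V' here) reach the second stage, so the expensive classifier is invoked exactly where the conflict detector fires.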
Selecting diversifying heuristics for cluster ensembles
 In 7th international workshop on multiple classifier systems (MCS), Prague, Czech Republic
, 2007
Abstract

Cited by 8 (0 self)
Cluster ensembles are deemed to be better than single clustering algorithms for discovering complex or noisy structures in data. Various heuristics for constructing such ensembles have been examined in the literature, e.g., random feature selection, weak clusterers, random projections, etc. Typically, one heuristic is picked at a time to construct the ensemble. To increase the diversity of the ensemble, several heuristics may be applied together. However, not every combination may be beneficial. Here we apply a standard genetic algorithm (GA) to select from 7 standard heuristics for k-means cluster ensembles. The ensemble size is also encoded in the chromosome. In this way the data is forced to guide the selection of heuristics as well as the ensemble size. Eighteen moderate-size datasets were used: 4 artificial and 14 real. The results resonate with our previous findings in that high diversity is not necessarily a prerequisite for high accuracy of the ensemble. No particular combination of heuristics appeared to be consistently chosen across all datasets, which justifies the existing variety of cluster ensembles. Among the most often selected heuristics were random feature extraction, random feature selection and a random number of clusters assigned to each ensemble member. Based on the experiments, we recommend that the current practice of using one or two heuristics for building k-means cluster ensembles should be revised in favour of using 3-5 heuristics.
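A sketch, under stated assumptions, of how two of the named heuristics (random feature selection and a random number of clusters) might generate diversified k-means ensemble members. The k-means implementation, parameter ranges, and data are all illustrative, and the GA selection step is not reproduced:

```python
import random

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's algorithm on lists of feature tuples."""
    rng = rng or random
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    labels = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(d.index(min(d)))
    return labels

def diversified_member(points, rng):
    """One ensemble member built with two of the paper's heuristics:
    random feature selection and a random number of clusters."""
    n_features = len(points[0])
    feats = rng.sample(range(n_features), rng.randint(1, n_features))
    projected = [tuple(p[f] for f in feats) for p in points]
    k = rng.randint(2, 4)  # illustrative range for the random-k heuristic
    return kmeans(projected, k, rng=rng)

rng = random.Random(0)
data = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
members = [diversified_member(data, rng) for _ in range(3)]
```

Each member sees a different random projection of the data and a different k, which is precisely how such heuristics inject diversity before a consensus function combines the label vectors.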
Bayes Error Rate Estimation Using Classifier Ensembles
, 2003
Abstract

Cited by 5 (0 self)
The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error in general yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information-theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
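The first of the three methods (averaging the a posteriori estimates, then reading off a plug-in error) can be sketched as follows. This is a simplified plug-in estimator, not the paper's exact derivation, and the posterior values are invented:

```python
def plugin_bayes_error(posterior_sets):
    """Plug-in estimate from averaged a posteriori outputs: average the
    per-classifier class-probability vectors for each instance, then
    take the mean of (1 - max averaged posterior) over instances."""
    n_instances = len(posterior_sets[0])
    n_classes = len(posterior_sets[0][0])
    total = 0.0
    for i in range(n_instances):
        avg = [sum(clf[i][c] for clf in posterior_sets) / len(posterior_sets)
               for c in range(n_classes)]
        total += 1.0 - max(avg)
    return total / n_instances

# Two hypothetical classifiers, three instances, two classes:
clf_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
clf_b = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
print(round(plugin_bayes_error([clf_a, clf_b]), 3))  # -> 0.3
```

Averaging first reduces the variance of the individual posterior estimates, which is what makes the plug-in value a more reliable proxy for the true Bayes rate than any single classifier's outputs.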
Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings
Abstract

Cited by 2 (1 self)
We recently introduced the idea of solving cluster ensembles using a Weighted Shared nearest neighbors Graph (WSnnG). Preliminary experiments have shown promising results in terms of integrating different clusterings into a combined one, such that the natural cluster structure of the data can be revealed. In this paper, we further study and extend the basic WSnnG. First, we introduce the use of a fixed number of nearest neighbors in order to reduce the size of the graph. Second, we use refined weights on the edges and vertices of the graph. Experiments show that it is possible to capture the similarity relationships between the data patterns on a compact refined graph. Furthermore, the quality of the combined clustering based on the proposed WSnnG surpasses the average quality of the ensemble and that of an alternative clustering combination method based on partitioning of the patterns' co-association matrix.
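A minimal sketch of the shared-nearest-neighbours quantity underlying such a graph; the paper's refined edge and vertex weights are not reproduced here, and the data and k are illustrative:

```python
def knn(points, k):
    """Index sets of the k nearest neighbours under squared Euclidean distance."""
    neigh = []
    for i, p in enumerate(points):
        d = sorted((sum((a - b) ** 2 for a, b in zip(p, q)), j)
                   for j, q in enumerate(points) if j != i)
        neigh.append({j for _, j in d[:k]})
    return neigh

def snn_graph(points, k):
    """Edge weights = size of the shared k-nearest-neighbour sets,
    the basic quantity behind a weighted shared-nearest-neighbours graph."""
    neigh = knn(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(neigh[i] & neigh[j])
            if shared:
                edges[(i, j)] = shared
    return edges

# Two well-separated groups of three points each:
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
print(snn_graph(pts, k=2))
# -> {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1}
```

Fixing k bounds each neighbour set, and hence the number of edges, which is the size-reduction idea the abstract describes; pairs from different groups share no neighbours and get no edge at all.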