Results 1–10 of 109
In defense of one-vs-all classification
 Journal of Machine Learning Research
, 2004
"... Editor: John ShaweTaylor We consider the problem of multiclass classification. Our main thesis is that a simple “onevsall ” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are welltuned regularized classifiers such as support vector machines. This the ..."
Abstract

Cited by 318 (0 self)
 Add to MetaCart
Editor: John Shawe-Taylor. We consider the problem of multiclass classification. Our main thesis is that a simple “one-vs-all” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
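The one-vs-all scheme the abstract defends is simple enough to sketch in a few lines. This is a toy illustration, not the paper's experimental setup: logistic regression stands in for the well-tuned SVMs, and the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# One-vs-all: train one regularized binary classifier per class,
# then predict the class whose classifier outputs the largest margin.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
classes = np.unique(y)
models = [LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
          for c in classes]

margins = np.column_stack([m.decision_function(X) for m in models])
pred = classes[np.argmax(margins, axis=1)]
train_acc = (pred == y).mean()
```

With K classes this needs only K binary classifiers, versus K(K-1)/2 for the all-pairs decomposition.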
Classifier Chains for Multi-label Classification
"... Abstract. The widely known binary relevance method for multilabel classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its labelindependence assumption. Instead, most current methods invest considerable ..."
Abstract

Cited by 162 (13 self)
 Add to MetaCart
The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence assumption. Instead, most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, especially in terms of scalability to large datasets. We exemplify this with a novel chaining method that can model label correlations while maintaining acceptable computational complexity. Empirical evaluation over a broad range of multi-label datasets with a variety of evaluation metrics demonstrates the competitiveness of our chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
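The chaining idea can be sketched as follows: the j-th label's classifier sees the original features plus the previous j-1 labels (true labels at training time, predicted labels at test time). This is a minimal sketch with logistic regression base learners on made-up data; the paper's full method also randomizes the chain order and ensembles chains, which is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# synthetic, correlated multi-label targets (3 labels)
Y = np.column_stack([
    (X[:, 0] > 0).astype(int),
    (X[:, 0] + X[:, 1] > 0).astype(int),
    (X[:, 2] > 0.5).astype(int),
])

# training: classifier j is fit on features augmented with true labels 0..j-1
chain, X_aug = [], X.copy()
for j in range(Y.shape[1]):
    chain.append(LogisticRegression(max_iter=1000).fit(X_aug, Y[:, j]))
    X_aug = np.hstack([X_aug, Y[:, [j]]])

def predict_chain(chain, X):
    # prediction: each classifier's output is fed forward to the next one
    Xa, preds = X.copy(), []
    for clf in chain:
        p = clf.predict(Xa)
        preds.append(p)
        Xa = np.hstack([Xa, p[:, None]])
    return np.column_stack(preds)

pred = predict_chain(chain, X)
```

The chain keeps binary relevance's linear cost in the number of labels while letting later labels condition on earlier ones.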
Distributional Word Clusters vs. Words for Text Categorization
 Journal of Machine Learning Research
, 2003
"... We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This wordcluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representati ..."
Abstract

Cited by 89 (7 self)
 Add to MetaCart
We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets.
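The word-cluster representation itself is a simple transformation once clusters exist: each document's bag-of-words counts are collapsed into per-cluster counts, giving a much lower-dimensional input for the SVM. In this sketch the vocabulary and cluster assignments are made up by hand; the paper derives the clusters with the Information Bottleneck method.

```python
import numpy as np

vocab = ["goal", "match", "team", "stock", "market", "price"]
cluster_of = np.array([0, 0, 0, 1, 1, 1])   # word index -> cluster id (assumed)
bow = np.array([[2, 1, 0, 0, 0, 1],          # doc 1: bag-of-words counts
                [0, 0, 0, 3, 2, 1]])         # doc 2

# collapse word counts into cluster counts: 6-dim BOW -> 2-dim representation
n_clusters = cluster_of.max() + 1
cluster_rep = np.zeros((bow.shape[0], n_clusters))
for j, c in enumerate(cluster_of):
    cluster_rep[:, c] += bow[:, j]
```

Here `cluster_rep` is `[[3, 1], [0, 6]]`: doc 1 is dominated by the sports cluster, doc 2 by the finance cluster, while the representation shrank from 6 features to 2.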
Multi-label classification via calibrated label ranking
 Machine Learning
, 2008
"... Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a ..."
Abstract

Cited by 69 (10 self)
 Add to MetaCart
Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated, in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning-by-pairwise-comparison approach to the multi-label scenario, a setting previously not amenable to the pairwise decomposition technique. The key idea of the approach is to introduce an artificial calibration label that, in each example, separates the relevant from the irrelevant labels. We show that this technique can be viewed as a combination of pairwise preference learning and the conventional relevance classification technique, where a separate classifier is trained to predict whether a label is relevant or not. Empirical results in the areas of text categorization, image classification, and gene analysis underscore the merits of the calibrated model in comparison to state-of-the-art multi-label learning methods.
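The calibration-label idea reduces to a simple decision rule at prediction time: rank labels by their pairwise-duel votes, and declare relevant exactly those labels that beat the artificial calibration label. The vote counts below are made up for one instance; in the actual method they come from trained pairwise classifiers.

```python
import numpy as np

labels = ["sports", "politics", "tech", "finance"]
# assumed pairwise vote counts for one instance (duels won by each label,
# including the duel against the artificial calibration label)
votes = np.array([3.0, 1.0, 4.0, 0.0])
calibration_votes = 2.0   # duels won by the artificial zero-point label

ranking = [labels[i] for i in np.argsort(-votes)]               # full ranking
relevant = [l for l, v in zip(labels, votes) if v > calibration_votes]
```

The calibration label acts as a per-instance threshold: here the ranking is tech > sports > politics > finance, and only "sports" and "tech" beat the threshold, so the multi-label prediction is {sports, tech}.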
Investigation of the random forest framework for classification of hyperspectral data
"... Abstract—Statistical classification of byperspectral data is challenging because the inputs are high in dimension and represent multiple classes that are sometimes quite mixed, while the amount and quality of ground truth in the form of labeled data is typically limited. The resulting classifiers ar ..."
Abstract

Cited by 54 (12 self)
 Add to MetaCart
(Show Context)
Statistical classification of hyperspectral data is challenging because the inputs are high in dimension and represent multiple classes that are sometimes quite mixed, while the amount and quality of ground truth in the form of labeled data is typically limited. The resulting classifiers are often unstable and have poor generalization. This paper investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier system, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited. A new classifier is proposed that incorporates bagging of training samples and adaptive random subspace feature selection within a binary hierarchical classifier (BHC), such that the number of features that is selected at each node of the tree is dependent on the quantity of associated training data. Results are compared to a random forest implementation based on the framework of classification and regression trees. For both methods, classification results obtained from experiments on data acquired ...
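The two ingredients named in the abstract, bagging of training samples and random subspace feature selection, can be illustrated propositionally in a few lines. This is only the generic mechanism on synthetic data; the paper embeds it in a binary hierarchical classifier with an adaptive subspace size per node, which is omitted here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    rows = rng.integers(0, len(X), len(X))                  # bagging: bootstrap rows
    cols = rng.choice(X.shape[1], size=4, replace=False)    # random feature subspace
    t = DecisionTreeClassifier(random_state=0).fit(X[rows][:, cols], y[rows])
    trees.append((t, cols))

# majority vote over the ensemble
votes = np.stack([t.predict(X[:, c]) for t, c in trees])
pred = np.array([np.bincount(v).argmax() for v in votes.T])
acc = (pred == y).mean()
```

Each tree sees a different bootstrap sample and a different feature subset, which is what stabilizes the ensemble when labeled data is scarce.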
Pairwise Preference Learning and Ranking
 Proceedings of the 14th European Conference on Machine Learning
, 2003
"... We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a rank ..."
Abstract

Cited by 53 (11 self)
 Add to MetaCart
We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a ranking function by reducing the original problem to a number of binary classification problems, one for each pair of labels. The main objective of this work is to investigate the tradeoff between the quality of the induced ranking function and the computational complexity of the algorithm, both depending on the amount of preference information given for each example. To this end, we present theoretical results on the complexity of pairwise preference learning.
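The pairwise reduction described above, one binary problem per pair of labels with a voting step at prediction time, can be sketched as follows. The data and latent utilities are synthetic, and logistic regression is an assumed stand-in for the paper's base learners.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, d, k = 150, 4, 3                     # instances, features, labels
X = rng.normal(size=(n, d))
U = X @ rng.normal(size=(d, k))         # assumed latent utilities defining rankings

# one binary classifier per label pair: does label a precede label b?
models = {}
for a, b in combinations(range(k), 2):
    models[(a, b)] = LogisticRegression(max_iter=1000).fit(
        X, (U[:, a] > U[:, b]).astype(int))

def rank_labels(x):
    # aggregate pairwise predictions by voting, then sort by vote count
    votes = np.zeros(k)
    for (a, b), m in models.items():
        winner = a if m.predict(x[None, :])[0] == 1 else b
        votes[winner] += 1
    return np.argsort(-votes)

order = rank_labels(X[0])
```

The cost of the reduction is the k(k-1)/2 binary problems, which is exactly the complexity trade-off the paper analyzes.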
Nonlinear Models Using Dirichlet Process Mixtures
"... We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, nonparametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relations ..."
Abstract

Cited by 43 (0 self)
 Add to MetaCart
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, nonparametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into subpopulations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.
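A rough sketch of the idea, locally linear components whose mixture yields a globally nonlinear classifier, is below. This is not the paper's Bayesian inference: scikit-learn's variational, truncated Dirichlet-process mixture stands in for the nonparametric model, and per-component logistic regression stands in for the MNL components.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# two latent subpopulations with different linear decision rules
X1 = rng.normal(loc=-2.0, size=(150, 2))
X2 = rng.normal(loc=+2.0, size=(150, 2))
y1 = (X1[:, 0] + X1[:, 1] > -4.0).astype(int)   # rule in subpopulation 1
y2 = (X2[:, 0] - X2[:, 1] > 0.0).astype(int)    # different rule in subpopulation 2
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])

# truncated DP mixture over the covariates discovers the subpopulations
dpm = BayesianGaussianMixture(
    n_components=5, weight_concentration_prior_type="dirichlet_process",
    random_state=0).fit(X)
z = dpm.predict(X)                               # component assignments

# a linear classifier within each (non-degenerate) component
experts = {c: LogisticRegression(max_iter=1000).fit(X[z == c], y[z == c])
           for c in np.unique(z) if len(np.unique(y[z == c])) > 1}
```

Because each component has its own coefficients, the combined decision boundary is piecewise linear, hence nonlinear overall, mirroring the abstract's construction.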
TildeCRF: Conditional random fields for logical sequences
 In Proceedings of the 15th European Conference on Machine Learning (ECML-06)
, 2006
"... Abstract. Conditional Random Fields (CRFs) provide a powerful instrument for labeling sequences. So far, however, CRFs have only been considered for labeling sequences over flat alphabets. In this paper, we describe TildeCRF, the first method for training CRFs on logical sequences, i.e., sequences o ..."
Abstract

Cited by 37 (17 self)
 Add to MetaCart
(Show Context)
Conditional Random Fields (CRFs) provide a powerful instrument for labeling sequences. So far, however, CRFs have only been considered for labeling sequences over flat alphabets. In this paper, we describe TildeCRF, the first method for training CRFs on logical sequences, i.e., sequences over an alphabet of logical atoms. TildeCRF’s key idea is to use relational regression trees in Dietterich et al.’s gradient tree boosting approach. Thus, the CRF potential functions are represented as weighted sums of relational regression trees. Experiments show a significant improvement over established results achieved with hidden Markov models and Fisher kernels for logical sequences.
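TildeCRF lifts gradient tree boosting to relational regression trees; the propositional core of that boosting loop, a score function F built as a sum of small regression trees fit to the log-loss gradient, looks roughly like the sketch below. The relational trees, logical atoms, and CRF structure are all omitted, and the binary toy task here merely stands in for sequence labeling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
F = np.zeros(len(X))          # the boosted potential/score function, initially 0
trees, lr = [], 0.3
for _ in range(30):
    p = 1.0 / (1.0 + np.exp(-F))      # current probability estimates
    residual = y - p                   # negative gradient of the log-loss
    t = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F += lr * t.predict(X)             # F stays a weighted sum of regression trees
    trees.append(t)

acc = ((F > 0).astype(int) == y).mean()
```

In TildeCRF the same functional-gradient step is taken in the space of relational regression trees, so the learned potentials can test logical structure rather than fixed feature vectors.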
New results on error correcting output codes of kernel machines
 IEEE Transactions on Neural Networks
, 2004
"... Abstract—We study the problem of multiclass classification within the framework of error correcting output codes (ECOC) using marginbased binary classifiers. Specifically, we address two important open problems in this context: decoding and model selection. The decoding problem concerns how to map ..."
Abstract

Cited by 33 (0 self)
 Add to MetaCart
(Show Context)
We study the problem of multiclass classification within the framework of error-correcting output codes (ECOC) using margin-based binary classifiers. Specifically, we address two important open problems in this context: decoding and model selection. The decoding problem concerns how to map the outputs of the classifiers into class codewords. In this paper we introduce a new decoding function that combines the margins through an estimate of their class-conditional probabilities. Concerning model selection, we present new theoretical results bounding the leave-one-out (LOO) error of ECOC of kernel machines, which can be used to tune kernel hyperparameters. We report experiments using support vector machines as the base binary classifiers, showing the advantage of the proposed decoding function over other functions of the margin commonly used in practice. Moreover, our empirical evaluations on model selection indicate that the bound leads to good estimates of kernel parameters. Index Terms—Error-correcting output codes (ECOC), machine learning, statistical learning theory, support vector machines.
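The decoding problem the abstract refers to can be made concrete with a toy example: each class has a binary codeword, each binary classifier emits a margin, and decoding picks the class whose codeword incurs the smallest total loss against the margins. The codeword matrix and margins below are made up, and the hinge-style loss is only a stand-in for the paper's probability-based decoding function.

```python
import numpy as np

# one row per class, one column per binary classifier
codes = np.array([[+1, +1, -1],
                  [+1, -1, +1],
                  [-1, +1, +1]])

def decode(margins, codes):
    # loss-based decoding: sum a hinge-like loss between each margin
    # and the corresponding codeword bit, per class
    losses = np.maximum(0.0, 1.0 - codes * margins).sum(axis=1)
    return int(np.argmin(losses))

margins = np.array([2.0, -1.5, 0.3])   # assumed outputs of the 3 classifiers
pred = decode(margins, codes)          # class 1 matches the margin signs best
```

Note how loss-based decoding uses the margin magnitudes, not just their signs, which is precisely what Hamming-distance decoding throws away.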