Clustering: Science or art
NIPS 2009 Workshop on Clustering Theory, 2009
Cited by 27 (1 self)
This paper deals with the question of whether the quality of different clustering algorithms can be compared by a general, scientifically sound procedure that is independent of particular clustering algorithms. In our opinion, the major obstacle is the difficulty of evaluating a clustering algorithm without taking into account the context: why does the user cluster his data in the first place, and what does he want to do with the clustering afterwards? We suggest that clustering should not be treated as an application-independent mathematical problem, but should always be studied in the context of its end use. Different techniques to evaluate clustering algorithms have to be developed for different uses of clustering. To simplify this procedure, it will be useful to build a “taxonomy of clustering problems” to identify clustering applications that can be treated in a unified way.

Preamble. Every year, dozens of papers on clustering algorithms get published. Researchers continuously invent new clustering algorithms and work on improving existing ones.
Composite multiclass losses
In Neural Information Processing Systems, 2011
Cited by 21 (8 self)
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a "proper composite loss", which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We subsume results on "classification calibration" by relating it to properness. We determine the stationarity condition, Bregman representation, order-sensitivity, and quasiconvexity of multiclass proper losses. We then characterise the existence and uniqueness of the composite representation for multiclass losses. We show how the composite representation is related to other core properties of a loss: mixability, admissibility and (strong) convexity of multiclass losses, which we characterise in terms of the Hessian of the Bayes risk. We show that the simple integral representation for binary proper losses cannot be extended to multiclass losses, but offer concrete guidance regarding how to design different loss functions. The conclusion drawn from these results is that the proper composite representation is a natural and convenient tool for the design of multiclass loss functions.
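To make the "proper loss composed with a link" idea concrete, here is a minimal sketch assuming the most familiar multiclass pairing: log loss as the proper loss and softmax as the inverse link, which together give the multinomial logistic loss. The pairing and all function names are illustrative choices, not taken from the paper. The check shows that the expected composite loss under a class distribution eta is smallest when the link maps the scores back to eta itself, which is what properness means.

```python
import math
import random

# Illustrative helpers; log loss + softmax is an assumed example pairing.

def softmax(v):
    """Inverse link: map a real score vector to a probability vector."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(y, q):
    """Proper loss: negative log of the probability assigned to class y."""
    return -math.log(q[y])

def composite_loss(y, v):
    """Proper composite loss: apply the inverse link, then the proper loss."""
    return log_loss(y, softmax(v))

def conditional_risk(eta, v):
    """Expected composite loss when the label is drawn from distribution eta."""
    return sum(p * composite_loss(y, v) for y, p in enumerate(eta))

eta = [0.5, 0.3, 0.2]
v_star = [math.log(p) for p in eta]   # softmax(v_star) recovers eta
best = conditional_risk(eta, v_star)  # equals the entropy of eta

random.seed(0)
for _ in range(1000):
    v = [x + random.gauss(0.0, 0.5) for x in v_star]
    # Any other scores give an expected loss at least as large.
    assert conditional_risk(eta, v) >= best - 1e-12
print(best)
```

Perturbations that shift every score by the same constant leave softmax unchanged, so they tie with the optimum rather than beating it.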
A unified view of performance metrics: Translating threshold choices into expected classification loss
Journal of Machine Learning Research, 2012
Cited by 12 (3 self)
Many performance metrics have been introduced in the literature for the evaluation of classification performance, each of them with different origins and areas of application. These metrics include accuracy, unweighted accuracy, the area under the ROC curve or the ROC convex hull, the mean absolute error and the Brier score or mean squared error (with its decomposition into refinement and calibration). One way of understanding the relations among these metrics is by means of variable operating conditions (in the form of misclassification costs and/or class distributions). Thus, a metric may correspond to some expected loss over different operating conditions. One dimension for the analysis has been the distribution for this range of operating conditions, leading to some important connections in the area of proper scoring rules. We demonstrate in this paper that there is an equally important dimension which has so far received much less attention in the analysis of performance metrics. This dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the expected loss obtained with these threshold choice methods for a uniform ...
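The "expected loss over operating conditions" view can be checked numerically in a small case. The sketch below assumes one particular normalization (a false positive costs c, a false negative costs 1 - c, with c uniform on [0, 1]) and the score-driven rule that sets the threshold equal to c. Under these assumptions the expected loss works out to half the Brier score, because integrating over c gives s²/2 for a negative example with score s and (1 - s)²/2 for a positive one. The paper's own definitions may scale losses differently, and the function names are illustrative.

```python
# Illustrative helpers; the cost normalization below is an assumption.

def cost_loss(y, s, c):
    """Loss for label y in {0, 1} and score s under the score-driven threshold
    t = c: predict 1 iff s > c. A false positive costs c, a false negative 1 - c."""
    pred = 1 if s > c else 0
    if pred == 1 and y == 0:
        return c
    if pred == 0 and y == 1:
        return 1.0 - c
    return 0.0

def expected_loss_uniform_cost(ys, ss, n_grid=20_000):
    """Average cost_loss over c ~ Uniform[0, 1] (midpoint rule) and over examples."""
    total = 0.0
    for y, s in zip(ys, ss):
        acc = sum(cost_loss(y, s, (k + 0.5) / n_grid) for k in range(n_grid))
        total += acc / n_grid
    return total / len(ys)

def brier(ys, ss):
    """Mean squared difference between scores and 0/1 labels."""
    return sum((s - y) ** 2 for y, s in zip(ys, ss)) / len(ys)

ys = [0, 1, 1, 0, 1]
ss = [0.2, 0.9, 0.4, 0.7, 0.6]
approx = expected_loss_uniform_cost(ys, ss)
exact = brier(ys, ss) / 2
print(approx, exact)
```

Swapping in a different threshold choice rule (for example a fixed threshold of 0.5) inside `cost_loss` changes which metric the average corresponds to, which is the dimension the abstract emphasizes.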
Mixability is Bayes risk curvature relative to log loss
Cited by 9 (6 self)
Given K codes, a standard result from source coding tells us how to design a single universal code with code lengths within log(K) bits of the best code, on any data sequence. Translated to the online learning setting of prediction with expert advice, this result implies that for logarithmic loss one can guarantee constant regret, which does not grow with the number of outcomes that need to be predicted. In this setting, it is known for which other losses the same guarantee can be given: these are the losses that are mixable. We show that among the mixable losses, log loss is special: in fact, one may understand the class of mixable losses as those that behave like log loss in an essential way. More specifically, a loss is mixable if and only if the curvature of its Bayes risk is at least as large as the curvature of the Bayes risk for log loss (for which the Bayes risk equals the entropy).
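The log-loss guarantee mentioned at the start of this abstract can be checked directly. For log loss, a Bayesian mixture over K experts with a uniform prior (the aggregating algorithm with learning rate 1) has cumulative loss within ln K nats of the best expert on any outcome sequence, because the product of its predictions equals the prior-weighted sum of the experts' likelihoods. A minimal sketch for binary outcomes; the random experts and data are purely illustrative, and natural logarithms are used, so the bound is in nats rather than bits.

```python
import math
import random

def mixture_vs_best_expert(expert_probs, outcomes):
    """Bayesian mixture with a uniform prior over K experts (binary outcomes).

    expert_probs[t][i] is expert i's predicted probability that outcome t is 1.
    Returns (cumulative log loss of the mixture, best expert's cumulative log
    loss), both in nats. The function name is illustrative.
    """
    k = len(expert_probs[0])
    log_w = [-math.log(k)] * k        # log prior weights, normalized at each step
    expert_loss = [0.0] * k
    mixture_loss = 0.0
    for probs, y in zip(expert_probs, outcomes):
        m = max(log_w)                # stabilize before exponentiating
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        p_mix = sum(wi * p for wi, p in zip(w, probs)) / total
        mixture_loss -= math.log(p_mix if y == 1 else 1.0 - p_mix)
        for i, p in enumerate(probs):
            loss_i = -math.log(p if y == 1 else 1.0 - p)
            expert_loss[i] += loss_i
            log_w[i] -= loss_i        # Bayes update: weight *= likelihood
    return mixture_loss, min(expert_loss)

random.seed(1)
K, T = 8, 500
experts = [[random.uniform(0.05, 0.95) for _ in range(K)] for _ in range(T)]
ys = [1 if random.random() < 0.6 else 0 for _ in range(T)]
mix, best = mixture_vs_best_expert(experts, ys)
regret = mix - best
print(regret, math.log(K))  # regret stays between 0 and ln K
```

The bound does not depend on T, which is the "constant regret" property the abstract describes.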
Surrogate losses and regret bounds for cost-sensitive classification with example-dependent costs
2011
Surrogate regret bounds for bipartite ranking via strongly proper losses
Journal of Machine Learning Research, 2014
Cited by 5 (1 self)
The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of misranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, ...
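The pairwise formulation is easy to see in code: the empirical AUC is the fraction of (positive, negative) pairs that a scoring function orders correctly, so one minus the AUC is the empirical misranking probability. A minimal sketch; counting ties as one half is a common convention adopted here, and the scores and labels are made up.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    Illustrative O(n_pos * n_neg) implementation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
auc = empirical_auc(scores, labels)
print(auc, 1 - auc)  # 8 of 9 pairs are ordered correctly
```

Treating each pair as one binary classification example is exactly the reduction the abstract refers to; the quadratic number of pairs is one reason non-pairwise surrogates are attractive.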
Surrogate regret bounds for the area under the ROC curve via strongly proper losses
In COLT, 2013
Cited by 4 (2 self)
The area under the ROC curve (AUC) is a widely used performance measure in machine learning and has been widely studied in recent years, particularly in the context of bipartite ranking. A dominant theoretical and algorithmic framework for AUC optimization/bipartite ranking has been to reduce the problem to pairwise classification; in particular, it is well known that the AUC regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for binary classification. Recently, Kotlowski et al. (2011) showed AUC regret bounds in terms of the regret associated with ‘balanced’ versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we obtain such (non-pairwise) surrogate regret bounds for the AUC in terms of a broad class of proper (composite) losses that we term strongly proper. Our proof technique is considerably simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2009, 2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact AUC-consistent; moreover, our results allow us to quantify the AUC regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate regret bounds under certain low-noise conditions via a recent result of Clémençon and Robbiano (2011).
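A numerical illustration of strong properness, assuming the definition that a proper loss is λ-strongly proper when its conditional regret at prediction q, given P(y = 1) = η, is at least (λ/2)(η − q)². For unscaled binary log loss the regret is the Bernoulli KL divergence, so Pinsker's inequality gives λ = 4; for unscaled square loss (y − q)² the regret is exactly (η − q)², so λ = 2. These constants depend on how each loss is scaled and may differ from the normalization used in the paper; the helper names are illustrative.

```python
import math

# Illustrative helpers. Assumed definition: a proper loss is lambda-strongly
# proper if its conditional regret is >= (lambda / 2) * (eta - q)^2.

def log_loss_regret(eta, q):
    """Regret of binary log loss at prediction q when P(y = 1) = eta.
    This is KL(Bernoulli(eta) || Bernoulli(q)) in nats."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(eta, q) + term(1.0 - eta, 1.0 - q)

def square_loss_regret(eta, q):
    """Regret of binary square loss (y - q)^2; it simplifies to (eta - q)^2."""
    def risk(r):
        return eta * (1.0 - r) ** 2 + (1.0 - eta) * r ** 2
    return risk(q) - risk(eta)

def smallest_ratio(regret, n=199):
    """Smallest regret / ((eta - q)^2 / 2) over an interior grid of (eta, q)."""
    grid = [(k + 1) / (n + 1) for k in range(n)]
    return min(regret(e, q) / (0.5 * (e - q) ** 2)
               for e in grid for q in grid if e != q)

lam_log = smallest_ratio(log_loss_regret)    # at least 4, by Pinsker's inequality
lam_sq = smallest_ratio(square_loss_regret)  # 2, up to rounding
print(lam_log, lam_sq)
```

For log loss the smallest ratio on the grid occurs near η = q = 1/2, where KL divergence is closest to its quadratic lower bound.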
Proper losses for learning from partial labels
Cited by 4 (0 self)
This paper discusses the problem of calibrating posterior class probabilities from partially labelled data. Each instance is assumed to be labelled as belonging to one of several candidate categories, at most one of them being true. We generalize the concept of proper loss to this scenario, establish a necessary and sufficient condition for a loss function to be proper, and show a direct procedure to construct a proper loss for partial labels from a conventional proper loss. The problem can be characterized by the mixing probability matrix relating the true class of the data and the observed labels. Full knowledge of this matrix is not required, and losses can be constructed that are proper for a wide set of mixing probability matrices.
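One simple way to turn a conventional proper loss into one for observed label sets, sketched here under the assumption that the mixing matrix M (with M[z][y] = P(observed set z | true class y)) is fully known and invertible: weight the ordinary losses by the entries of Y = M⁻¹. The expected loss under the observed-label distribution then equals the expected loss under the true labels, so properness carries over. This is a known-matrix special case chosen for illustration; the abstract notes that the paper's construction does not require full knowledge of M, so this is not presented as the paper's procedure. The matrix, class distribution and helper names are hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as lists of floats."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical mixing matrix M[z][y] = P(observed set z | true class y) for
# classes 0, 1, 2 and observed sets z0 = {0}, z1 = {1}, z2 = {0, 1, 2}.
M = [[0.7, 0.0, 0.0],
     [0.0, 0.6, 0.0],
     [0.3, 0.4, 1.0]]
Y = invert(M)  # Y[y][z], with Y M = I

def log_loss(y, q):
    """Conventional proper loss on a true class y."""
    return -math.log(q[y])

def partial_label_loss(z, q):
    """Loss on observed set z: sum_y Y[y][z] * loss(y, q). It can be negative."""
    return sum(Y[y][z] * log_loss(y, q) for y in range(len(q)))

eta = [0.5, 0.3, 0.2]  # hypothetical true class probabilities
p_z = [sum(M[z][y] * eta[y] for y in range(3)) for z in range(3)]
for q in ([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.2, 0.2, 0.6]):
    clean = sum(eta[y] * log_loss(y, q) for y in range(3))
    partial = sum(p_z[z] * partial_label_loss(z, q) for z in range(3))
    print(q, round(clean, 6), round(partial, 6))  # the two expectations agree
```

Because the partial-label expectation matches the clean one for every q, its minimizer is still q = eta; the price is that individual loss values may be negative.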
Saliency Detection via Divergence Analysis: A Unified Perspective
Cited by 4 (0 self)
A number of bottom-up saliency detection algorithms have been proposed in the literature. Since these have been developed from intuition and principles inspired by psychophysical studies of human vision, the theoretical relations among them are unclear. In this paper, we present a unifying perspective. Saliency of an image area is defined in terms of divergence between certain feature distributions estimated from the central part and its surround. We show that various, seemingly different saliency estimation algorithms are in fact closely related. We also discuss some commonly used center-surround selection strategies. Experiments with two datasets are presented to quantify the relative advantages of these algorithms.
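A minimal sketch of saliency as a center-surround divergence. The abstract only says "divergence between certain feature distributions"; the specific choices here (KL divergence, 8-bin grayscale intensity histograms with additive smoothing, square center and surround windows) are assumptions for illustration, and the helper names and toy image are hypothetical.

```python
import math

def histogram(values, bins=8, lo=0.0, hi=1.0, eps=1e-3):
    """Smoothed, normalized intensity histogram (eps avoids empty bins)."""
    counts = [eps] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def center_surround_saliency(image, row, col, r_center=1, r_surround=3):
    """KL(center histogram || surround histogram) at pixel (row, col).
    Center: square of radius r_center; surround: the rest of the square of
    radius r_surround, clipped at the image border."""
    center, surround = [], []
    for i in range(row - r_surround, row + r_surround + 1):
        for j in range(col - r_surround, col + r_surround + 1):
            if not (0 <= i < len(image) and 0 <= j < len(image[0])):
                continue
            if max(abs(i - row), abs(j - col)) <= r_center:
                center.append(image[i][j])
            else:
                surround.append(image[i][j])
    return kl_divergence(histogram(center), histogram(surround))

# Hypothetical 9x9 grayscale image: dark background with a bright 3x3 blob.
img = [[0.1] * 9 for _ in range(9)]
for i in range(3, 6):
    for j in range(3, 6):
        img[i][j] = 0.9

print(center_surround_saliency(img, 4, 4) > center_surround_saliency(img, 1, 1))
```

Replacing `kl_divergence` or `histogram` with another divergence or feature distribution gives a different member of the family of center-surround detectors the abstract describes.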