Results 1 - 10
of
126
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
- ADVANCES IN LARGE MARGIN CLASSIFIERS
, 1999
"... The output of a classifier should be a calibrated posterior probability to enable post-processing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. Howev ..."
Abstract
-
Cited by 503 (0 self)
- Add to MetaCart
The output of a classifier should be a calibrated posterior probability to enable post-processing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce non-sparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
An Efficient Boosting Algorithm for Combining Preferences
, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract
-
Cited by 383 (13 self)
- Add to MetaCart
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborative-filtering task for making movie recommendations. Here, we present results comparing RankBoost to nearest-neighbor and regression algorithms.
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vec ..."
Abstract
-
Cited by 380 (5 self)
- Add to MetaCart
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and state-of-the-art `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while oering a number of additional advantages. These include the benets of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non-`Mercer' kernels).
Contour and Texture Analysis for Image Segmentation
, 2001
"... This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. Natural images contain both textured and untextured regions, so the cues of contour and texture differences are exploited simultaneously. Contours are treated in the interveni ..."
Abstract
-
Cited by 233 (27 self)
- Add to MetaCart
This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. Natural images contain both textured and untextured regions, so the cues of contour and texture differences are exploited simultaneously. Contours are treated in the intervening contour framework, while texture is analyzed using textons. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on the texturedness of the neighborhood at a pixel. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
Ultraconservative Online Algorithms for Multiclass Problems
- Journal of Machine Learning Research
, 2001
"... In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity-score between each prototype and the input instance and th ..."
Abstract
-
Cited by 175 (18 self)
- Add to MetaCart
In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity-score between each prototype and the input instance and then sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity-scores which are higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity-scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm) is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems.
Model-Based Clustering, Discriminant Analysis, and Density Estimation
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract
-
Cited by 171 (23 self)
- Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", "Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation
- INTERNATIONAL JOURNAL OF COMPUTER VISION
, 2002
"... This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonst ..."
Abstract
-
Cited by 152 (8 self)
- Add to MetaCart
This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonstrate the potentials of the proposed framework. The textured feature space is generated by filtering the given textured images using isotropic and anisotropic filters, and analyzing their responses as multi-component conditional probability density functions. The texture segmentation is obtained by unifying region and boundary-based information as an improved Geodesic Active Contour Model. The defined objective function is minimized using a gradient-descent method where a level set approach is used to implement the obtained PDE. According to this PDE, the curve propagation towards the final solution is guided by boundary and region-based segmentation forces, and is constrained by a regularity force. The level set implementation is performed using a fast front propagation algorithm where topological changes are naturally handled. The performance of our method is demonstrated on a variety of synthetic and real textured frames.
Probabilistic Data Association Methods for Tracking Multiple and Compound Visual Objects
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2000
"... We describe a framework that explicitly reasons about data association to improve tracking performance in many difficult visual environments. A hierarchy of tracking strategies results from ascribing ambiguous or missing data to: (1) noise-like visual occurrences; (2) persistent, known scene element ..."
Abstract
-
Cited by 89 (2 self)
- Add to MetaCart
We describe a framework that explicitly reasons about data association to improve tracking performance in many difficult visual environments. A hierarchy of tracking strategies results from ascribing ambiguous or missing data to: (1) noise-like visual occurrences; (2) persistent, known scene elements (i.e. other tracked objects); or (3) persistent, unknown scene elements. First, we introduce a randomized tracking algorithm adapted from an existing probabilistic data association filter (PDAF) that is resistant to clutter and follows agile motion. The algorithm is applied to three different tracking modalities -- homogeneous regions, textured regions, and snakes -- and extensibly defined for straightforward inclusion of other methods. Second, we add the capacity to track multiple objects by adapting to vision a joint PDAF which oversees correspondence choices between same-modality trackers and image features. We then derive a related technique that allows mixed tracker modalities and handles object...
A Data-Clustering Algorithm On Distributed Memory Multiprocessors
- In Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence
, 2000
"... To cluster increasingly massive data sets that are common today in data and text mining, we propose a parallel implementation of the k-means clustering algorithm based on the message passing model. The proposed algorithm exploits the inherent data-parallelism in the k-means algorithm. We analyticall ..."
Abstract
-
Cited by 79 (1 self)
- Add to MetaCart
To cluster increasingly massive data sets that are common today in data and text mining, we propose a parallel implementation of the k-means clustering algorithm based on the message passing model. The proposed algorithm exploits the inherent data-parallelism in the k-means algorithm. We analytically show that the speedup and the scaleup of our algorithm approach the optimal as the number of data points increases. We implemented our algorithm on an IBM POWERparallel SP2 with a maximum of 16 nodes. On typical test data sets, we observe nearly linear relative speedups, for example, 15.62 on 16 nodes, and essentially linear scaleup in the size of the data set and in the number of clusters desired. For a 2 gigabyte test data set, our implementation drives the 16 node SP2 at more than 1.8 gigaflops. Keywords: k-means, data mining, massive data sets, message-passing, text mining. 1 Introduction Data sets measuring in gigabytes and even terabytes are now quite common in data and text minin...
Decision templates for multiple classifier fusion: an experimental comparison
- Pattern Recognition
, 2001
"... Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for ada ..."
Abstract
-
Cited by 77 (7 self)
- Add to MetaCart
Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for adapting the class combiner to the application. c decision templates (one per class) are estimated with the same training set that is used for the set of classifiers. These templates are then matched to the decision profile of new incoming objects by some similarity measure. We compare 11 versions of our model with 14 other techniques for classifier fusion on the Satimage and Phoneme datasets from the database ELENA. Our results show that decision templates based on integral type measures of similarity are superior to the other schemes on both data sets.

