Results 1  10
of
681
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
, 2010
"... ..."
(Show Context)
Distance metric learning for large margin nearest neighbor classification
 In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract

Cited by 695 (14 self)
 Add to MetaCart
(Show Context)
We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
 Journal of Machine Learning Research
, 2003
"... The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. ..."
Abstract

Cited by 385 (10 self)
 Add to MetaCart
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
Unsupervised Learning of Image Manifolds by Semidefinite Programming
, 2004
"... Can we detect low dimensional structure in high dimensional data sets of images and video? The problem of dimensionality reduction arises often in computer vision and pattern recognition. In this paper, we propose a new solution to this problem based on semidefinite programming. Our algorithm can be ..."
Abstract

Cited by 270 (9 self)
 Add to MetaCart
Can we detect low dimensional structure in high dimensional data sets of images and video? The problem of dimensionality reduction arises often in computer vision and pattern recognition. In this paper, we propose a new solution to this problem based on semidefinite programming. Our algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold. It overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.
The Entire Regularization Path for the Support Vector Machine
, 2004
"... The Support Vector Machine is a widely used tool for classification. Many efficient implementations exist for fitting a twoclass SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a ..."
Abstract

Cited by 204 (11 self)
 Add to MetaCart
(Show Context)
The Support Vector Machine is a widely used tool for classification. Many efficient implementations exist for fitting a twoclass SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.
Probability product kernels
 Journal of Machine Learning Research
, 2004
"... The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product i ..."
Abstract

Cited by 180 (9 self)
 Add to MetaCart
(Show Context)
The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product is then evaluated as the integral of the product of pairs of distributions. The kernel is straightforward to evaluate for all exponential family models such as multinomials and Gaussians and yields interesting nonlinear kernels. Furthermore, the kernel is computable in closed form for latent distributions such as mixture models, hidden Markov models and linear dynamical systems. For intractable models, such as switching linear dynamical systems, structured meanfield approximations can be brought to bear on the kernel evaluation. For general distributions, even if an analytic expression for the kernel is not feasible, we show a straightforward sampling method to evaluate it. Thus, the kernel permits discriminative learning methods, including support vector machines, to exploit the properties, metrics and invariances of the generative models we infer from each datum. Experiments are shown using multinomial models for text, hidden Markov models for biological data sets and linear dynamical systems for time series data.
Maximum margin clustering
 Advances in Neural Information Processing Systems 17
, 2005
"... We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the ha ..."
Abstract

Cited by 136 (4 self)
 Add to MetaCart
(Show Context)
We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hardclustering constraints can be relaxed to a softclustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semisupervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
Gaussian processes for ordinal regression
 Journal of Machine Learning Research
, 2004
"... We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation ..."
Abstract

Cited by 116 (4 self)
 Add to MetaCart
(Show Context)
We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and realworld data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs
 Proceedingsof theSIGKDD Conference. Paris,France
, 2009
"... Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical me ..."
Abstract

Cited by 106 (7 self)
 Add to MetaCart
(Show Context)
Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical methods to discover the telltale lexical and hostbased properties of malicious Web site URLs. These methods are able to learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. The resulting classifiers obtain 95–99 % accuracy, detecting large numbers of malicious Web sites from their URLs, with only modest false positives.
Graph Cut based Inference with Cooccurrence Statistics
"... Abstract. Markov and Conditional random fields (CRFs) used in computer vision typically model only local interactions between variables, as this is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily ..."
Abstract

Cited by 100 (13 self)
 Add to MetaCart
(Show Context)
Abstract. Markov and Conditional random fields (CRFs) used in computer vision typically model only local interactions between variables, as this is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field. This result can be directly used for the problem of class based image segmentation which has seen increasing recent interest within computer vision. Here the aim is to assign a label to each pixel of a given image from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or superpixels. One of the cues that helps recognition is global object cooccurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur in the same image together. There have been several approaches proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application on large images. We find that the new model we propose produces an improvement in the labelling compared to just using a pairwise model. 1