Results 1 - 10
of
32
Distance metric learning for large margin nearest neighbor classification
- In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract
-
Cited by 177 (7 self)
- Add to MetaCart
We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Learning low-rank kernel matrices
- In ICML
, 2006
"... Kernel learning plays an important role in many machine learning tasks. However, algorithms for learning a kernel matrix often scale poorly, with running times that are cubic in the number of data points. In this paper, we propose efficient algorithms for learning lowrank kernel matrices; our algori ..."
Abstract
-
Cited by 24 (7 self)
- Add to MetaCart
Kernel learning plays an important role in many machine learning tasks. However, algorithms for learning a kernel matrix often scale poorly, with running times that are cubic in the number of data points. In this paper, we propose efficient algorithms for learning lowrank kernel matrices; our algorithms scale linearly in the number of data points and quadratically in the rank of the kernel. We introduce and employ Bregman matrix divergences for rank-deficient matrices—these divergences are natural for our problem since they preserve the rank as well as positive semi-definiteness of the kernel matrix. Special cases of our framework yield faster algorithms for various existing kernel learning problems. Experimental results demonstrate the effectiveness of our algorithms in learning both low-rank and full-rank kernels. 1.
A Discriminative Learning Framework with Pairwise Constraints for Video Object Classification
- In Proc. of CVPR
, 2004
"... In video object classification, insufficient labeled data may at times be easily augmented with pairwise constraints on sample points, i.e, whether they are in the same class or not. In this paper, we proposed a discriminative learning approach which incorporates pairwise constraints into a conventi ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
In video object classification, insufficient labeled data may at times be easily augmented with pairwise constraints on sample points, i.e, whether they are in the same class or not. In this paper, we proposed a discriminative learning approach which incorporates pairwise constraints into a conventional margin-based learning framework. The proposed approach offers several advantages over existing approaches dealing with pairwise constraints. First, as opposed to learning distance metrics, the new approach derives its classification power by directly modeling the decision boundary. Second, most previous work handles labeled data by converting them to pairwise constraints and thus leads to much more computation. The proposed approach can handle pairwise constraints together with labeled data so that the computation is greatly reduced. The proposed approach is evaluated on a people classification task with two surveillance video datasets.
Formulating context-dependent similarity functions
- In ACM International Conference on Multimedia (MM
, 2005
"... Tasks of information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent (also application-, data-, and user-dependent) way. In this paper, we present a novel method, which learns ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Tasks of information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent (also application-, data-, and user-dependent) way. In this paper, we present a novel method, which learns a distance function by capturing the nonlinear relationships among contextual information provided by the application, data, or user. We show that through a process called the “kernel trick, ” such nonlinear relationships can be learned efficiently in a projected space. In addition to using the kernel trick, we propose two algorithms to further enhance efficiency and effectiveness of function learning. For efficiency, we propose a SMO-like solver to achieve O(N 2) learning performance. For effectiveness, we propose using unsupervised learning in an innovative way to address the challenge of lack of labeled data (contextual information). Theoretically, we substantiate that our method is both sound and optimal. Empirically, we demonstrate that our method is effective and useful.
An efficient algorithm for local distance metric learning
- in Proceedings of AAAI
, 2006
"... Learning application-specific distance metrics from labeled data is critical for both statistical classification and information retrieval. Most of the earlier work in this area has focused on finding metrics that simultaneously optimize compactness and separability in a global sense. Specifically, ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
Learning application-specific distance metrics from labeled data is critical for both statistical classification and information retrieval. Most of the earlier work in this area has focused on finding metrics that simultaneously optimize compactness and separability in a global sense. Specifically, such distance metrics attempt to keep all of the data points in each class close together while ensuring that data points from different classes are separated. However, particularly when classes exhibit multimodal data distributions, these goals conflict and thus cannot be simultaneously satisfied. This paper proposes a Local Distance Metric (LDM) that aims to optimize local compactness and local separability. We present an efficient algorithm that employs eigenvector analysis and bound optimization to learn the LDM from training data in a probabilistic framework. We demonstrate that LDM achieves significant improvements in both classification and retrieval accuracy compared to global distance learning and kernel-based KNN.
Kernel relevant component analysis for distance metric learning
- In IEEE International Joint Conference on Neural Networks (IJCNN
, 2005
"... Abstract — Defining a good distance measure between patterns is of crucial importance in many classification and clustering algorithms. Recently, relevant component analysis (RCA) is proposed which offers a simple yet powerful method to learn this distance metric. However, it is confined to linear t ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Abstract — Defining a good distance measure between patterns is of crucial importance in many classification and clustering algorithms. Recently, relevant component analysis (RCA) is proposed which offers a simple yet powerful method to learn this distance metric. However, it is confined to linear transforms in the input space. In this paper, we show that RCA can also be kernelized, which then results in significant improvements when nonlinearities are needed. Moreover, it becomes applicable to distance metric learning for structured objects that have no natural vectorial representation. Besides, it can be used in an incremental setting. Performance of this kernel method is evaluated on both toy and real-world data sets with encouraging results. I.
BoostCluster: Boosting Clustering by Pairwise Constraints
"... Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using the side information that is often encoded as pairwise constraints. However, these studies focus on designing special clustering algorithms that can effectively exploit t ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using the side information that is often encoded as pairwise constraints. However, these studies focus on designing special clustering algorithms that can effectively exploit the pairwise constraints. We present a boosting framework for data clustering, termed as BoostCluster, that is able to iteratively improve the accuracy of any given clustering algorithm by exploiting the pairwise constraints. The key challenge in designing a boosting framework for data clustering is how to influence an arbitrary clustering algorithm with the side information since clustering algorithms by definition are unsupervised. The proposed framework addresses this problem by dynamically generating new data representations at each iteration that are, on the one hand, adapted to the clustering results at previous iterations by the given algorithm, and on the other hand consistent with the given side information. Our empirical study shows that the proposed boosting framework is effective in improving the performance of a number of popular clustering algorithms (Kmeans, partitional SingleLink, spectral clustering), and its performance is comparable to the state-of-the-art algorithms for data clustering with side information.
Formulating Distance Functions via the Kernel Trick
- In Conf. on Knowledge Discovery and Data Mining (KDD
, 2005
"... Tasks of data mining and information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a contextdependent (also application-, data-, and user-dependent) way. In this paper, we propose to learn a di ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
Tasks of data mining and information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a contextdependent (also application-, data-, and user-dependent) way. In this paper, we propose to learn a distance function by capturing the nonlinear relationships among contextual information provided by the application, data, or user. We show that through a process called the “kernel trick, ” such nonlinear relationships can be learned efficiently in a projected space. Theoretically, we substantiate that our method is both sound and optimal. Empirically, using several datasets and applications, we demonstrate that our method is effective and useful.
Low-Rank Kernel Learning with Bregman Matrix Divergences
"... In this paper, we study low-rank matrix nearness problems, with a focus on learning lowrank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. E ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
In this paper, we study low-rank matrix nearness problems, with a focus on learning lowrank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. Existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness—these divergences are natural for learning low-rank kernels since they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.

