Results 1–10 of 29
Large margin multitask metric learning
In NIPS, 2010.
Abstract

Cited by 41 (1 self)
Multitask learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multitask learning algorithms is an extension to support vector machines (SVM) by Evgeniou et al. [15]. Although very elegant, multitask SVM is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector, which, in a multitask setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multitask learning by extending the recently published large margin nearest neighbor (LMNN) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule, which inherently extends to many classes and becomes a natural fit for multitask learning. We evaluate the resulting multitask LMNN on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.
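The nearest-neighbor decision rule the abstract builds on is driven by a learned metric. As a rough illustration only (not the paper's multitask formulation), a minimal single-task sketch of the LMNN objective, assuming distances are parameterized by a linear map `L` and a weighting `mu` between the pull and push terms, might look like this:

```python
import numpy as np

def lmnn_loss(L, X, y, mu=0.5, k=1):
    """Sketch of the LMNN objective: pull same-class target neighbors
    close, push differently labeled 'impostors' beyond a unit margin.
    The multitask extension would share part of L across tasks."""
    Z = X @ L.T                                  # project all points
    pull, push = 0.0, 0.0
    for i in range(len(X)):
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        # the k nearest same-class points act as target neighbors
        targets = sorted(same, key=lambda j: np.sum((Z[i] - Z[j]) ** 2))[:k]
        for j in targets:
            d_ij = np.sum((Z[i] - Z[j]) ** 2)
            pull += d_ij
            for l in range(len(X)):
                if y[l] != y[i]:                 # impostor inside the margin?
                    d_il = np.sum((Z[i] - Z[l]) ** 2)
                    push += max(0.0, 1.0 + d_ij - d_il)
    return (1 - mu) * pull + mu * push
```

With well-separated classes the hinge (push) term vanishes and only the pull term remains.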
Learning via gaussian herding
 In NIPS
Abstract

Cited by 10 (2 self)
We introduce a new family of online learning algorithms based upon constraining the velocity flow over a distribution of weight vectors. In particular, we show how to effectively herd a Gaussian weight vector distribution by trading off velocity constraints with a loss function. By uniformly bounding this loss function, we demonstrate how to solve the resulting optimization analytically. We compare the resulting algorithms on a variety of real-world datasets, and demonstrate how these algorithms achieve state-of-the-art robust performance, especially with high label noise in the training data.
OM2: An Online Multiclass Multikernel Learning Algorithm
Abstract

Cited by 9 (3 self)
Efficient learning from massive amounts of information is a hot topic in computer vision. Available training sets contain many examples with several visual descriptors, a setting in which current batch approaches are typically slow and do not scale well. In this work we introduce a theoretically motivated and efficient online learning algorithm for the Multi Kernel Learning (MKL) problem. For this algorithm we prove a theoretical bound on the number of multiclass mistakes made on any arbitrary data sequence. Moreover, we empirically show that its performance is on par with, or better than, standard batch MKL (e.g. SILP, SimpleMKL) algorithms.
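The "multiclass mistakes" counted by such bounds come from the classic online protocol: predict a label, observe the truth, update on error. A minimal sketch of a plain multiclass perceptron update (a stand-in for illustration, not the paper's multi-kernel variant) shows the mechanics:

```python
import numpy as np

def multiclass_perceptron_step(W, x, y):
    """One round of online multiclass learning: predict with the current
    weight matrix W (one row per class), and on a mistake promote the
    true class while demoting the predicted one."""
    scores = W @ x
    y_hat = int(np.argmax(scores))
    mistake = y_hat != y
    if mistake:
        W[y] += x        # promote the correct class
        W[y_hat] -= x    # demote the wrongly predicted class
    return W, mistake
```

Mistake bounds for such algorithms limit how many rounds can set `mistake` to true on any data sequence satisfying a margin assumption.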
Similarity Learning for Provably Accurate Sparse Linear Classification
In ICML, 2012.
Abstract

Cited by 9 (2 self)
In recent years, the crucial importance of metrics in machine learning algorithms has led to increasing interest in optimizing distance and similarity functions. Most of the state of the art focuses on learning Mahalanobis distances (which must fulfill a constraint of positive semidefiniteness) for use in a local kNN algorithm. However, no theoretical link is established between the learned metrics and their performance in classification. In this paper, we make use of the formal framework of (ε,γ,τ)-good similarities introduced by Balcan et al. to design an algorithm for learning a non-PSD linear similarity optimized in a nonlinear feature space, which is then used to build a global linear classifier. We show that our approach has uniform stability and derive a generalization bound on the classification error. Experiments performed on various datasets confirm the effectiveness of our approach compared to state-of-the-art methods and provide evidence that it (i) is fast, (ii) is robust to overfitting, and (iii) produces very sparse classifiers.
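The global linear classifier in the (ε,γ,τ)-good framework is built on similarity features: each example is represented by its similarities to a set of landmark points. A minimal sketch of that construction, using a placeholder dot-product similarity rather than the learned non-PSD one:

```python
import numpy as np

def similarity_features(X, landmarks, sim):
    """Map each example to the vector of its similarities to a set of
    landmark examples; a (sparse) linear classifier is then trained on
    these features instead of the raw inputs."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])
```

Sparsity in the resulting classifier corresponds to selecting few landmarks, which is what makes this family of approaches produce very compact models.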
Online Incremental Feature Learning with Denoising Autoencoders
Abstract

Cited by 8 (0 self)
While determining model complexity is an important problem in machine learning, many feature learning algorithms rely on cross-validation to choose an optimal number of features, which is usually challenging for online learning from a massive stream of data. In this paper, we propose an incremental feature learning algorithm to determine the optimal model complexity for large-scale, online datasets based on the denoising autoencoder. This algorithm is composed of two processes: adding features and merging features. Specifically, it adds new features to minimize the objective function’s residual and merges similar features to obtain a compact feature representation and prevent overfitting. Our experiments show that the proposed model quickly converges to the optimal number of features in a large-scale online setting. In classification tasks, our model outperforms the (non-incremental) denoising autoencoder, and deep networks constructed from our algorithm perform favorably compared to deep belief networks and stacked denoising autoencoders. Further, the algorithm is effective in recognizing new patterns when the data distribution changes over time in the massive online data stream.
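The residual the add-features process tries to reduce is the denoising autoencoder's reconstruction error. A minimal sketch of one such pass, assuming sigmoid units and tied weights (an illustration, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, W, b, c, corruption=0.3):
    """One denoising-autoencoder pass: corrupt the input, encode with W,
    decode with tied weights W.T, and return the squared reconstruction
    error; the incremental algorithm adds or merges hidden features to
    drive this residual down."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    mask = rng.random(x.shape) > corruption      # randomly zero inputs
    x_tilde = x * mask
    h = sigmoid(W @ x_tilde + b)                 # encode corrupted input
    x_hat = sigmoid(W.T @ h + c)                 # decode (tied weights)
    return np.mean((x - x_hat) ** 2)
```

Adding a feature corresponds to appending a row to `W` (and an entry to `b`); merging replaces two similar rows with one.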
Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
Abstract

Cited by 7 (0 self)
We describe a latent variable model for supervised dimensionality reduction and distance metric learning. The model discovers linear projections of high-dimensional data that shrink the distance between similarly labeled inputs and expand the distance between differently labeled ones. The model’s continuous latent variables locate pairs of examples in a latent space of lower dimensionality. The model differs significantly from classical factor analysis in that the posterior distribution over these latent variables is not always multivariate Gaussian. Nevertheless, we show that inference is completely tractable and derive an Expectation-Maximization (EM) algorithm for parameter estimation. We also compare the model to other approaches in distance metric learning. The model’s main advantage is its simplicity: at each iteration of the EM algorithm, the distance metric is re-estimated by solving an unconstrained least-squares problem. Experiments show that these simple updates are highly effective.
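The "unconstrained least-squares problem" in each M-step is the familiar weighted variant, where E-step posteriors supply the weights. A generic sketch of that building block (not the paper's exact update):

```python
import numpy as np

def weighted_lstsq(A, b, w):
    """Unconstrained weighted least squares: minimize
    sum_i w_i * (a_i . x - b_i)^2, solved by rescaling each row by
    sqrt(w_i) and calling an ordinary least-squares solver."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```

Because the problem is unconstrained, each EM iteration is a single linear solve, which is what makes the updates so simple compared with PSD-constrained metric learning.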
Convex Optimizations for Distance Metric Learning and Pattern Classification
Abstract

Cited by 5 (0 self)
The goal of machine learning is to build automated systems that can classify and recognize complex patterns in data. Not surprisingly, the representation of the data plays an important role in determining what types of patterns can be automatically discovered. Many algorithms for machine learning assume that the data are represented as elements in a metric space. For example, in popular algorithms such as nearest-neighbor classification, vector quantization, and kernel density estimation, the metric distances between different examples provide a measure of their dissimilarity [1]. The performance of these algorithms can depend sensitively on the manner in which distances are measured. When data are represented as points in a multidimensional vector space, simple Euclidean distances are often used to measure the dissimilarity between different examples. However, such distances often do not yield reliable judgments; in addition, they cannot highlight the distinctive features that play a role in certain types of classification, but not others. For example, consider two schemes for clustering images of faces: one by age, one by gender. Images can be represented as points in a multidimensional vector space in many ways: for example, by enumerating their pixel values, or by computing color histograms. However the images are represented, different components of these feature vectors are likely to be relevant for clustering by age versus clustering by gender. Naturally, for these different types of clustering, we need different ways of measuring dissimilarity; in particular, we need different metrics for computing distances between feature vectors. This article describes two algorithms for learning such distance metrics based on recent developments in convex optimization.
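The metrics such algorithms learn are typically of Mahalanobis form, which generalizes the Euclidean distance the article says is often unreliable. A minimal sketch:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y). With M = I this
    reduces to the plain squared Euclidean distance; metric learning
    instead chooses a positive semidefinite M from labeled data so that
    task-relevant feature directions are weighted more heavily."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)
```

For the face-clustering example, a metric learned for age would weight different components of `M` than one learned for gender.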
Face Identification Using Reference-based Features with Message Passing Model
, 2012
Abstract

Cited by 4 (1 self)
In this paper, we propose a system for face identification. Given two query face images, our task is to tell whether or not they are of the same person. The main contribution of this paper comes from two aspects: (1) we adopt the one-shot similarity kernel [35] for learning the similarity of two face images; the learned similarity measures are then used to map a face image to reference images. (2) We propose a graph-based method for selecting an optimal set of reference images. Instead of directly working on the image features, we use the learned similarity to the reference images as the new features and compute the corresponding matching score of the two query images. Our approach is effective and easy to implement. We show encouraging and favorable results on “Labeled Faces in the Wild”, a challenging data set of faces.
Learning to Match Images in Large-Scale Collections
Abstract

Cited by 3 (2 self)
Many computer vision applications require computing structure and feature correspondence across a large, unorganized image collection. This is a computationally expensive process, because the graph of matching image pairs is unknown in advance, so methods for quickly and accurately predicting which of the O(n²) pairs of images match are critical. Image comparison methods such as bag-of-words models or global features are often used to predict similar pairs, but can be very noisy. In this paper, we propose a new image matching method that uses discriminative learning techniques, applied to training data gathered automatically during the image matching process, to gradually compute a better similarity measure for predicting whether two images in a given collection overlap. By using such a learned similarity measure, our algorithm can select image pairs that are more likely to match for performing further feature matching and geometric verification, improving the overall efficiency of the matching process. Our approach processes a set of images in an iterative manner, alternately performing pairwise feature matching and learning an improved similarity measure. Our experiments show that our learned measures can significantly improve match prediction over the standard tf-idf-weighted similarity and more recent unsupervised techniques, even with small amounts of training data, and can improve the overall speed of the image matching process by more than a factor of two.
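The tf-idf-weighted similarity used as the baseline treats visual words like terms in a document. A minimal sketch of that baseline (the paper's learned measure replaces this):

```python
import numpy as np

def tfidf_cosine(counts_a, counts_b, doc_freq, n_docs):
    """Standard tf-idf-weighted similarity between two bag-of-visual-words
    histograms: reweight word counts by inverse document frequency, then
    compare by cosine similarity."""
    idf = np.log(n_docs / np.asarray(doc_freq, dtype=float))
    a = np.asarray(counts_a, dtype=float) * idf
    b = np.asarray(counts_b, dtype=float) * idf
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Ranking all O(n²) pairs by this score and verifying only the top candidates is the standard pipeline the learned similarity improves upon.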