Results 1 - 10 of 105
The pyramid match kernel: Discriminative classification with sets of image features
In ICCV, 2005
Cited by 544 (29 self)
Discriminative learning is challenging when examples are sets of features, and the sets vary in cardinality and lack any sort of meaningful ordering. Kernel-based classification methods can learn complex decision boundaries, but a kernel over unordered set inputs must somehow solve for correspondences, generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function which maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in this space. This “pyramid match” computation is linear in the number of features, and it implicitly finds correspondences based on the finest resolution histogram cell where a matched pair first appears. Since the kernel does not penalize the presence of extra features, it is robust to clutter. We show the kernel function is positive-definite, making it valid for use in learning algorithms whose optimal solutions are guaranteed only for Mercer kernels. We demonstrate our algorithm on object recognition tasks and show it to be accurate and dramatically faster than current approaches.
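The weighted histogram intersection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes non-negative coordinates smaller than a known `diameter`, uses bins of side length 2^i at level i with weight 1/2^i, and omits the normalization and random grid shifts a practical version might use. The function names are hypothetical.

```python
from collections import Counter
from math import ceil, log2


def histogram(points, side):
    """Count points per grid cell, using cells of the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of per-cell minimum counts."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(xs, ys, diameter):
    """Unnormalized pyramid match score between two point sets.

    Assumption: all coordinates lie in [0, diameter). Level i uses cells of
    side 2**i, so level 0 is the finest and the last level is a single cell.
    """
    levels = ceil(log2(diameter)) + 1
    score = 0.0
    previous = 0  # matches already counted at finer levels
    for i in range(levels):
        side = 2 ** i
        current = intersection(histogram(xs, side), histogram(ys, side))
        # Only matches that first appear at this level are counted,
        # weighted by 1 / side so that coarser matches contribute less.
        score += (current - previous) / side
        previous = current
    return score


if __name__ == "__main__":
    xs = [(0,), (5,)]
    ys = [(0,), (6,), (15,)]
    print(pyramid_match(xs, ys, 16))
```

In this toy run, one pair matches at the finest level and a second pair first shares a cell at side length 4, while the unmatched point in `ys` adds no penalty, which mirrors the robustness to clutter claimed in the abstract.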
The pyramid match kernel: Efficient learning with sets of features
Journal of Machine Learning Research, 2007
Cited by 136 (10 self)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations
IEEE Trans. Pattern Analysis and Machine Intelligence, 2007
Cited by 130 (11 self)
We address the problem of comparing sets of images for object recognition, where the sets may represent variations in an object’s appearance due to changing camera pose and lighting conditions. Canonical Correlations (also known as principal or canonical angles), which can be thought of as the angles between two d-dimensional subspaces, have recently attracted attention for image set matching. Canonical correlations offer many benefits in accuracy, efficiency, and robustness compared to the two main classical methods: parametric distribution-based and nonparametric sample-based matching of sets. Here, this is first demonstrated experimentally for reasonably sized data sets using existing methods exploiting canonical correlations. Motivated by their proven effectiveness, a novel discriminative learning method over sets is proposed for set classification. Specifically, inspired by classical Linear Discriminant Analysis (LDA), we develop a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. Image sets transformed by the discriminant function are then compared by the canonical correlations. The classical orthogonal subspace method (OSM) is also investigated for a similar purpose and compared with the proposed method. The proposed method is evaluated on various object recognition problems using face image sets with arbitrary motion captured under different illuminations and image sets of 500 general objects taken at different views. The method is also applied to object category recognition using the ETH-80 database. The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency.
Index Terms: Object recognition, face recognition, image sets, canonical correlation, principal angles, canonical correlation analysis, linear discriminant analysis, orthogonal subspace method.
Face Recognition with Image Sets Using Manifold Density Divergence
2005
Cited by 108 (16 self)
In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semiparametric model for learning probability densities confined to highly nonlinear but intrinsically low-dimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other state-of-the-art algorithms in the literature, achieving a 94% recognition rate on average.
Manifold-Manifold Distance with Application to Face Recognition Based on Image Set
Proc. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008
Cited by 75 (7 self)
In this paper, we address the problem of classifying image sets, each of which contains images belonging to the same class but covering large variations in, for instance, viewpoint and illumination. We innovatively formulate the problem as the computation of Manifold-Manifold Distance (MMD), i.e., calculating the distance between nonlinear manifolds each representing one image set. To compute MMD, we also propose a novel manifold learning approach, which expresses a manifold by a collection of local linear models, each depicted by a subspace. MMD is then converted to integrating the distances between pairs of subspaces, one from each of the involved manifolds. The proposed MMD method is evaluated on the task of Face Recognition based on Image Set (FRIS). In FRIS, each known subject is enrolled with a set of facial images and modeled as a gallery manifold, while a testing subject is modeled as a probe manifold, which is then matched against all the gallery manifolds by MMD. Identification is achieved by seeking the minimum MMD. Experimental results on two public face databases, Honda/UCSD and CMU MoBo, demonstrate that the proposed MMD method outperforms the competing methods.
Efficient match kernels between sets of features for visual recognition
In NIPS, 2009
Cited by 62 (17 self)
In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their significant computational cost. To address this problem, we propose efficient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a set-level feature. The local feature maps are learned so their inner products preserve, as closely as possible, the values of the specified kernel function. Classifiers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efficient and achieve the current state of the art on three difficult computer vision datasets: Scene-15, Caltech-101 and Caltech-256.
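The claim that a bag-of-words representation with a linear kernel acts as a match kernel can be checked with a small sketch. It assumes a fixed toy codebook and nearest-codeword quantization; the learned low-dimensional feature maps of EMK are not reproduced here, and a one-hot word indicator stands in for them. All function names are hypothetical.

```python
from collections import Counter


def nearest_word(feature, codebook):
    """Index of the codebook entry closest to the feature (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )


def match_kernel_pairs(xs, ys, codebook):
    """Count 1 for every cross-set pair quantized to the same visual word."""
    return sum(
        nearest_word(x, codebook) == nearest_word(y, codebook)
        for x in xs
        for y in ys
    )


def match_kernel_hist(xs, ys, codebook):
    """Same quantity as a dot product of the two word-count histograms."""
    hx = Counter(nearest_word(x, codebook) for x in xs)
    hy = Counter(nearest_word(y, codebook) for y in ys)
    return sum(n * hy[word] for word, n in hx.items())


def set_level_feature(xs, codebook):
    """Average of one-hot word indicators: a stand-in for an averaged feature map."""
    counts = Counter(nearest_word(x, codebook) for x in xs)
    return [counts[k] / len(xs) for k in range(len(codebook))]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    xs = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]
    ys = [(0.0, 0.2), (1.1, 0.9)]
    print(match_kernel_pairs(xs, ys, codebook), match_kernel_hist(xs, ys, codebook))
    fx, fy = set_level_feature(xs, codebook), set_level_feature(ys, codebook)
    print(sum(a * b for a, b in zip(fx, fy)))
```

The pairwise count is quadratic in the set sizes, whereas the histogram and averaged-vector forms need one pass per set followed by a dot product, which is the efficiency argument the abstract makes for set-level features.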
Object Classification from a Single Example Utilizing Class Relevance Metrics
In Advances in Neural Information Processing Systems (NIPS), 2004
Cited by 45 (0 self)
We describe a framework for learning an object classifier from a single example, by emphasizing relevant dimensions using available examples of related classes. Learning to accurately classify objects from a single training example is often infeasible due to overfitting effects. However, if the instance representation ensures that the distance between any two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two-stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel-based metric learning algorithm. Our setting encodes object instances as sets of locality-based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task.
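The two-stage strategy in this abstract can be sketched in a heavily simplified form. The paper uses a kernel-based metric learning algorithm; the sketch below swaps in a much cruder stand-in, a diagonal metric that weights each dimension by the inverse of its within-class variance on the related classes, purely to show how a metric learned elsewhere changes a one-example nearest neighbor decision. The function names and the toy data are hypothetical.

```python
def dimension_weights(related):
    """Stage 1 (stand-in): weight each dimension by inverse within-class variance.

    `related` maps a class label to a list of equal-length feature vectors
    drawn from related classes, not from the classes to be recognized.
    """
    dims = len(next(iter(related.values()))[0])
    within = [0.0] * dims
    count = 0
    for vectors in related.values():
        mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
        for v in vectors:
            for d in range(dims):
                within[d] += (v[d] - mean[d]) ** 2
            count += 1
    # Small constant avoids division by zero for constant dimensions.
    return [1.0 / (within[d] / count + 1e-9) for d in range(dims)]


def classify(query, prototypes, weights):
    """Stage 2: nearest neighbor over one example per class, under the metric."""
    def distance(a, b):
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

    return min(prototypes, key=lambda label: distance(query, prototypes[label]))


if __name__ == "__main__":
    # Dimension 0 is stable within each related class; dimension 1 is not.
    related = {"a": [(0.0, 0.0), (0.1, 5.0)], "b": [(1.0, 0.0), (1.1, 5.0)]}
    prototypes = {"c": (2.0, 0.0), "d": (3.0, 4.0)}  # one example per new class
    query = (2.9, 0.5)
    print(classify(query, prototypes, [1.0, 1.0]))
    print(classify(query, prototypes, dimension_weights(related)))
```

With uniform weights the noisy second dimension dominates the distance; with the weights derived from the related classes the decision is driven by the first dimension instead, which is the effect the abstract attributes to a class relevance metric.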
Face Recognition in Subspaces
In: S.Z. Li, A.K. Jain (Eds.), Handbook of Face Recognition, 2004
Cited by 44 (0 self)
Images of faces, represented as high-dimensional pixel arrays, often belong to a manifold of intrinsically low dimension. Face recognition, and computer vision research in general, has witnessed a growing interest in techniques that capitalize on this observation, and apply algebraic and statistical tools for extraction and analysis of the underlying manifold. In this chapter we describe, in roughly chronological order, techniques that identify, parameterize and analyze linear and nonlinear subspaces, from the original Eigenfaces technique to the recently introduced Bayesian method for probabilistic similarity analysis, and discuss comparative experimental evaluation of some of these techniques. We also discuss practical issues related to the application of subspace methods for varying pose, illumination and expression.
Sparse Approximated Nearest Points for Image Set Classification
Cited by 42 (2 self)
Classification based on image sets has recently attracted great research interest as it holds more promise than single image based classification. In this paper, we propose an efficient and robust algorithm for image set classification. An image set is represented as a triplet: a number of image samples, their mean and an affine hull model. The affine hull model is used to account for unseen appearances in the form of affine combinations of sample images. We introduce a novel between-set distance called Sparse Approximated Nearest Point (SANP) distance. Unlike existing methods, the dissimilarity of two sets is measured as the distance between their nearest points, which can be sparsely approximated from the image samples of their respective set. Different from standard sparse modeling of a single image, this novel sparse formulation for the image set enforces sparsity on the sample coefficients rather than the model coefficients and jointly optimizes the nearest points as well as their sparse approximations. A convex formulation for searching the optimal SANP between two sets is proposed and the accelerated proximal gradient method is adapted to efficiently solve this optimization. Experimental evaluation was performed on the Honda, MoBo and YouTube datasets. Comparison with existing techniques shows that our method consistently achieves better results.
Figure 1. Sparse Approximated Nearest Points (SANPs) of two image sets. Given the affine hull models (µi,Ui) and (µj,Uj) of two image sets, the points on each set can be represented as a linear combination of bases plus the mean image. They can also be represented as the linear combination of sample images. The SANPs are dynamically chosen by the joint optimization which simultaneously searches for sparse approximated points (maximize sparsity of sample coefficients) that are the nearest (minimize distance) between the two sets. The optimal SANPs of the two image sets are shown in the center, each of which is sparsely approximated by the sample images marked with red boxes.
Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes
International Journal of Computer Vision, 2005
Cited by 42 (15 self)
We derive a family of kernels on dynamical systems by applying the Binet-Cauchy theorem to trajectories of states. Our derivation provides a unifying framework for all kernels on dynamical systems currently used in machine learning, including kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. In the case of linear time-invariant systems, we derive explicit formulae for computing the proposed Binet-Cauchy kernels by solving Sylvester equations, and relate the proposed kernels to existing kernels based on cepstrum coefficients and subspace angles. Besides their theoretical appeal, these kernels can be used efficiently in the comparison of video sequences of dynamic scenes that can be modeled as the output of a linear time-invariant dynamical system. One advantage of our kernels is that they take the initial conditions of the dynamical systems into account. As a first example, we use our kernels to compare video sequences of dynamic textures. As a second example, we apply our kernels to the problem of clustering short clips of a movie. Experimental evidence shows superior performance of our kernels.
Keywords: Binet-Cauchy theorem, ARMA models and dynamical systems, Sylvester