Results 1 - 10 of 23
Attribute and Simile Classifiers for Face Verification
- In IEEE International Conference on Computer Vision (ICCV), 2009
Abstract - Cited by 325 (14 self)
We present two novel methods for face verification. Our first method – “attribute” classifiers – uses binary classifiers trained to recognize the presence or absence of describable aspects of visual appearance (e.g., gender, race, and age). Our second method – “simile” classifiers – removes the manual labeling required for attribute classification and instead learns the similarity of faces, or regions of faces, to specific reference people. Neither method requires costly, often brittle, alignment between image pairs; yet both methods produce compact visual descriptions and work on real-world images. Furthermore, both the attribute and simile classifiers improve on the current state of the art for the LFW data set, reducing the error rates of the current best method by 23.92% and 26.34%, respectively, and by 31.68% when combined. For further testing across pose, illumination, and expression, we introduce a new data set – termed PubFig – of real-world images of public figures (celebrities and politicians) acquired from the internet. This data set is both larger (60,000 images) and deeper (300 images per individual) than existing data sets of its kind. Finally, we present an evaluation of human performance.
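The attribute-classifier idea can be sketched in a few lines: each face is summarized by the outputs of per-attribute binary classifiers, and a pair is declared "same person" when the two attribute vectors are close. Everything below is a toy illustration under assumed names and scores, not the authors' trained classifiers:

```python
# Toy sketch of attribute-based face verification. The "classifiers" here
# are hand-written stand-ins operating on a feature dict; real attribute
# classifiers would be trained on labeled images.

def attribute_vector(face, classifiers):
    """Run each attribute classifier on the face; collect signed scores."""
    return [clf(face) for clf in classifiers]

def verify(face_a, face_b, classifiers, threshold=1.0):
    """Declare a match when the attribute vectors are close in L2 distance."""
    va = attribute_vector(face_a, classifiers)
    vb = attribute_vector(face_b, classifiers)
    dist = sum((a - b) ** 2 for a, b in zip(va, vb)) ** 0.5
    return dist < threshold

# Hypothetical stand-in attribute classifiers.
classifiers = [
    lambda f: 1.0 if f["gender"] == "male" else -1.0,
    lambda f: (f["age"] - 40) / 40.0,
]
alice1 = {"gender": "female", "age": 30}
alice2 = {"gender": "female", "age": 32}
bob = {"gender": "male", "age": 55}

print(verify(alice1, alice2, classifiers))  # True: vectors nearly identical
print(verify(alice1, bob, classifiers))     # False: gender scores differ by 2
```

The simile variant replaces the hand-labeled attributes with similarities to reference people, but the verification step over the resulting vector is the same.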
Is that you? Metric learning approaches for face identification
- In ICCV, 2009
Abstract - Cited by 159 (8 self)
Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. In this paper we present two methods for learning robust distance measures: (a) a logistic discriminant approach which learns the metric from a set of labelled image pairs (LDML) and (b) a nearest neighbour approach which computes the probability for two images to belong to the same class (MkNN). We evaluate our approaches on the Labeled Faces in the Wild data set, a large and very challenging data set of faces from Yahoo! News. The evaluation protocol for this data set defines a restricted setting, where a fixed set of positive and negative image pairs is given, as well as an unrestricted one, where faces are labelled by their identity. We are the first to present results for the unrestricted setting, and show that our methods benefit from this richer training data, much more so than the current state-of-the-art method. Our results of 79.3% and 87.5% correct for the restricted and unrestricted settings, respectively, significantly improve over the current state-of-the-art result of 78.5%. Confidence scores obtained for face identification can be used for many applications, e.g. clustering or recognition from a single training example. We show that our learned metrics also improve performance for these tasks.
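The core of the LDML approach is to model P(same | x_i, x_j) as a sigmoid of a bias minus a learned Mahalanobis distance, and fit the metric by logistic-loss gradient descent on labelled pairs. A minimal toy re-implementation with a diagonal metric (not the authors' code; data and hyperparameters are illustrative):

```python
import math

# Toy LDML sketch: P(same | x, y) = sigmoid(bias - d_W(x, y)) with a
# diagonal metric W, fit by gradient descent on labelled pairs.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dist(w, x, y):
    """Diagonal Mahalanobis distance: sum_k w_k (x_k - y_k)^2."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

def train_ldml(pairs, dim, lr=0.1, bias=1.0, epochs=200):
    w = [1.0] * dim
    for _ in range(epochs):
        for x, y, same in pairs:
            p = sigmoid(bias - dist(w, x, y))
            for k in range(dim):
                # Gradient of the logistic loss w.r.t. w_k.
                w[k] += lr * (p - same) * (x[k] - y[k]) ** 2
                w[k] = max(w[k], 0.0)  # keep the metric non-negative
    return w

# Dimension 0 carries identity; dimension 1 is nuisance variation.
pairs = [([0, 0], [0, 1], 1), ([1, 0], [1, 1], 1),
         ([0, 0], [1, 0], 0), ([0, 1], [1, 1], 0)]
w = train_ldml(pairs, 2)
print(w[0] > w[1])  # True: the nuisance dimension is down-weighted
```

After training, same-identity pairs end up closer than different-identity pairs under the learned metric, which is exactly what a verification threshold needs.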
Probabilistic models for inference about identity
- IEEE TPAMI, 2012
Abstract - Cited by 52 (0 self)
Many face recognition algorithms use “distance-based” methods: feature vectors are extracted from each face and distances in feature space are compared to determine matches. In this paper we argue for a fundamentally different approach. We consider each image as having been generated from several underlying causes, some of which are due to identity (latent identity variables, or LIVs) and some of which are not. In recognition we evaluate the probability that two faces have the same underlying identity cause. We make these ideas concrete by developing a series of novel generative models which incorporate both within-individual and between-individual variation. We consider both the linear case, where signal and noise are represented by a subspace, and the non-linear case, where an arbitrary face manifold can be described and noise is position-dependent. We also develop a “tied” version of the algorithm that allows explicit comparison of faces across quite different viewing conditions. We demonstrate that our model produces results that are comparable to or better than the state of the art for both frontal face recognition and face recognition under varying pose.
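In its simplest form, the latent-identity-variable idea reduces to a likelihood-ratio test: each observation is identity plus noise, and verification compares the marginal likelihood of "same identity" (shared latent variable) against "different identities" (independent latent variables). A 1-D Gaussian sketch with assumed variances, far simpler than the paper's models:

```python
import math

# 1-D LIV sketch: x = identity + noise, identity ~ N(0, s_id),
# noise ~ N(0, s_noise). Under "same", the identity variable is shared,
# so x1 and x2 are correlated; under "different", they are independent.

def log_gauss2(x1, x2, v11, v22, v12):
    """Log density of a zero-mean bivariate Gaussian at (x1, x2)."""
    det = v11 * v22 - v12 * v12
    quad = (v22 * x1 * x1 - 2 * v12 * x1 * x2 + v11 * x2 * x2) / det
    return -0.5 * (quad + math.log(det) + 2 * math.log(2 * math.pi))

def log_likelihood_ratio(x1, x2, s_id=4.0, s_noise=1.0):
    v = s_id + s_noise
    same = log_gauss2(x1, x2, v, v, s_id)  # identity shared: covariance s_id
    diff = log_gauss2(x1, x2, v, v, 0.0)   # identities independent
    return same - diff

print(log_likelihood_ratio(2.0, 2.1) > 0)   # True: close pair, likely same
print(log_likelihood_ratio(2.0, -2.0) > 0)  # False: far pair, likely different
```

A positive log ratio favours "same person"; the paper's linear and non-linear models generalize this test to high-dimensional subspaces and manifolds.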
Building a classification cascade for visual identification from one example
- In International Conference on Computer Vision (ICCV), 2005
Abstract - Cited by 36 (3 self)
Object identification (OID) is specialized recognition where the category is known (e.g. cars) and the algorithm recognizes an object’s exact identity (e.g. Bob’s BMW). Two special challenges characterize OID. (1) Inter-class variation is often small (many cars look alike) and may be dwarfed by illumination or pose changes. (2) There may be many classes but few, or just one, positive “training” examples per class. Due to (1), a solution must locate possibly subtle object-specific salient features (a door handle) while avoiding distracting ones (a specular highlight). However, (2) rules out direct techniques of feature selection. We describe an online algorithm that takes one model image from a known category and builds an efficient “same” vs. “different” classification cascade by predicting the most discriminative feature set for that object. Our method not only estimates the saliency and scoring function for each candidate feature, but also models the dependency between features, building an ordered feature sequence unique to a specific model image, maximizing cumulative information content. Learned stopping thresholds make the classifier very efficient. To make this possible, category-specific characteristics are learned automatically in an off-line training procedure from labeled
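The cascade with learned stopping thresholds can be sketched generically: features are scored in a fixed most-discriminative-first order, and a candidate pair is rejected as "different" as soon as the cumulative score falls below that stage's threshold. Scores and thresholds below are illustrative, not the paper's learned values:

```python
# Toy same-vs-different cascade: cheap early rejection via per-stage
# stopping thresholds on the cumulative feature score.

def cascade_match(scores, stage_thresholds, accept_at=3.0):
    """Return (is_same, stages_evaluated) for one candidate pair."""
    total = 0.0
    for stage, (score, floor) in enumerate(zip(scores, stage_thresholds)):
        total += score
        if total < floor:              # early exit: reject as "different"
            return False, stage + 1
    return total >= accept_at, len(scores)

# Per-stage similarity scores for two candidate pairs, best feature first.
good_pair = [1.5, 1.2, 0.9]
bad_pair = [-0.5, 0.1, 0.2]
floors = [0.0, 0.5, 1.0]

print(cascade_match(good_pair, floors))  # (True, 3)
print(cascade_match(bad_pair, floors))   # (False, 1): rejected at stage 1
```

The efficiency gain comes from the early exits: most non-matching candidates are discarded after evaluating only the first, cheapest stages.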
Talking Pictures: Temporal Grouping and Dialog-Supervised Person Recognition
Abstract - Cited by 16 (1 self)
We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is dialog cues: first-, second- and third-person references (such as “I’m Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and the final character naming on several hours of TV and movies.
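To give a flavour of dynamic-programming grouping, here is a much-simplified stand-in: a DP that groups a sequence of face tracks (reduced to 1-D appearance scores) into contiguous segments, trading within-segment spread against a per-segment penalty. The paper's actual model is richer, scoring partitions of the k most recent tracks and transitions between partitions; this is only an illustration of the DP machinery:

```python
# Simplified DP grouping: best[i] is the minimum cost of grouping the
# first i tracks; each candidate last segment j..i-1 pays its variance
# plus a fixed per-segment penalty.

def group_tracks(appearance, penalty=1.0):
    n = len(appearance)
    best = [0.0] + [float("inf")] * n
    cut = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):               # last segment covers tracks j..i-1
            seg = appearance[j:i]
            mean = sum(seg) / len(seg)
            spread = sum((a - mean) ** 2 for a in seg)
            cost = best[j] + spread + penalty
            if cost < best[i]:
                best[i], cut[i] = cost, j
    bounds, i = [], n                    # recover segment boundaries
    while i > 0:
        bounds.append((cut[i], i))
        i = cut[i]
    return bounds[::-1]

# Two visually distinct characters appearing in consecutive runs of tracks:
print(group_tracks([0.1, 0.2, 0.15, 5.0, 5.1, 4.9]))  # [(0, 3), (3, 6)]
```

The penalty plays the role of the model's compatibility constraints: without it, every track would form its own group.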
Human Action Recognition from a Single Clip per Action
Abstract - Cited by 8 (2 self)
Learning-based approaches for human action recognition often rely on large training sets. Most of these approaches do not perform well when only a few training samples are available. In this paper, we consider the problem of human action recognition from a single clip per action. Each clip contains at most 25 frames. Using a patch-based motion descriptor and matching scheme, we can achieve promising results on three different action datasets with a single clip as the template. Our results are comparable to previously published results using much larger training sets. We also present a method for learning a transferable distance function for these patches. The transferable distance function learning extracts generic knowledge of patch weighting from previous training sets, and can be applied to videos of new actions without further learning. Our experimental results show that the transferable distance function learning not only improves the recognition accuracy of single-clip action recognition, but also significantly enhances the efficiency of the matching scheme.
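The matching step can be sketched as weighted nearest-template classification: patch weights learned on earlier action classes re-weight per-patch distances when matching a single template clip to a query. All descriptors, templates, and weights below are toy stand-ins, not values from the paper:

```python
# Toy single-clip action recognition with transferable patch weights.
# Each "clip" is a short list of 1-D patch descriptors.

def weighted_clip_distance(template_patches, query_patches, weights):
    """Sum of weighted per-patch distances between two clips."""
    return sum(w * abs(t - q)
               for w, t, q in zip(weights, template_patches, query_patches))

def classify(query, templates, weights):
    """Nearest template wins; only one clip per action is available."""
    return min(templates, key=lambda name: weighted_clip_distance(
        templates[name], query, weights))

templates = {"wave": [0.9, 0.8, 0.1], "walk": [0.1, 0.2, 0.9]}
# Hypothetical transferable weights: the first two patches were
# discriminative on previous training sets; the third is background.
weights = [1.0, 1.0, 0.05]

print(classify([0.8, 0.7, 0.9], templates, weights))  # wave
```

Because the weights are learned once on previous classes, no further learning is needed when a new action's single template arrives.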
Efficient Human Action Detection using a Transferable Distance Function
Abstract - Cited by 7 (1 self)
In this paper, we address the problem of efficient human action detection with only one template. We choose the standard sliding-window approach to scan the template video against test videos, and the template video is represented by patch-based motion features. Using generic knowledge learnt from previous training sets, we weight the patches on the template video by a transferable distance function. Based on the patch weighting, we propose a cascade structure which can efficiently scan the template video over test videos. Our method is evaluated on a human action dataset with cluttered background, and on a ballet video with complex human actions. The experimental results show that our cascade structure not only achieves very reliable detection, but also speeds up patch-based human action detection by an order of magnitude.
Object fingerprints for content analysis with applications to street landmark localization
- In ACM Multimedia, 2008
Abstract - Cited by 4 (2 self)
An object can be a basic unit for multimedia content analysis. Besides similarity among common objects, each object has its own unique characteristics which we cannot find in other surrounding objects in multimedia data. We call such unique characteristics object fingerprints. In this paper, we propose a novel approach to extract and match object fingerprints for multimedia content analysis. In particular, we focus on the problem of street landmark localization from images. Instead of modeling and matching a street landmark as a whole, our proposed approach extracts the landmark’s object fingerprints in a given image and matches them to a new image or video in order to localize the landmark. We formulate matching the landmark’s object fingerprints as a classification problem solved by a cascade of 1NN classifiers. We develop a street landmark localization system that combines salient region detection, segmentation, and object fingerprint extraction techniques for this purpose. For evaluation, we have compiled a novel dataset consisting of images and videos of 15 U.S. street landmarks. Our experiments on this dataset show superior performance to state-of-the-art recognition algorithms [20, 33]. The proposed approach also generalizes well to other objects of interest and content analysis tasks. We demonstrate this by applying our approach to refine web image search results, with encouraging results.
Face Identification Using Reference-based Features with Message Passing Model
, 2012
Abstract - Cited by 4 (1 self)
In this paper, we propose a system for face identification. Given two query face images, our task is to tell whether or not they are of the same person. The main contribution of this paper comes from two aspects: (1) We adopt the one-shot similarity kernel [35] for learning the similarity of two face images. The learned similarity measures are then used to map a face image to reference images. (2) We propose a graph-based method for selecting an optimal set of reference images. Instead of working directly on the image features, we use the learned similarity to the reference images as the new features and compute the corresponding matching score of the two query images. Our approach is effective and easy to implement. We show encouraging and favorable results on the “Labeled Faces in the Wild” data set, a challenging collection of face images.
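The reference-based representation can be sketched simply: each face is re-described by its similarities to a fixed reference set, and a pair is scored by comparing those similarity vectors. The similarity function and data below are toy 1-D stand-ins, not the one-shot similarity kernel or a learned reference selection:

```python
# Toy sketch of reference-based features for face identification.

def similarity(a, b):
    """Toy similarity in (0, 1]; real systems would use a learned kernel."""
    return 1.0 / (1.0 + abs(a - b))

def reference_features(face, references):
    """Re-describe a face by its similarity to each reference face."""
    return [similarity(face, r) for r in references]

def match_score(face_a, face_b, references):
    """Cosine similarity between the two reference-feature vectors."""
    fa = reference_features(face_a, references)
    fb = reference_features(face_b, references)
    dot = sum(x * y for x, y in zip(fa, fb))
    na = sum(x * x for x in fa) ** 0.5
    nb = sum(y * y for y in fb) ** 0.5
    return dot / (na * nb)

references = [0.0, 2.0, 5.0]
# A near-duplicate pair scores higher than a mismatched pair:
print(match_score(1.0, 1.1, references) > match_score(1.0, 4.0, references))
```

The paper's graph-based reference selection then chooses which reference images to keep so that these derived features discriminate best.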