Reverse Training: An Efficient Approach for Image Set Classification
Cited by 2 (0 self)
Abstract. This paper introduces a new approach, called reverse training, to efficiently extend binary classifiers to the task of multi-class image set classification. Unlike existing binary-to-multi-class extension strategies, which require multiple binary classifiers, the proposed approach is very efficient because it trains a single binary classifier to optimally discriminate the class of the query image set from all others. For this purpose, the classifier is trained with the images of the query set (labelled positive) and a randomly sampled subset of the training data (labelled negative). The trained classifier is then evaluated on the rest of the training images. The class whose images have the largest percentage classified as positive is predicted as the class of the query image set. The confidence level of the prediction is also computed and integrated into the proposed approach to further enhance its robustness and accuracy. Extensive experiments and comparisons with existing methods show that the proposed approach achieves state-of-the-art performance for face and object recognition on a number of datasets.
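The reverse-training decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for whatever binary classifier is actually used, the helper names (`train_binary`, `sample_negatives`, `reverse_training_predict`) are hypothetical, and the confidence-level computation is omitted because its form is not given in the abstract.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train_binary(pos, neg):
    """Hypothetical stand-in binary classifier (nearest centroid).

    Returns a function mapping a feature vector to True (positive) or False.
    """
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_pos) < sq_dist(x, c_neg)


def sample_negatives(n_train, fraction, seed=0):
    """Randomly choose which training indices are used as negatives."""
    k = max(1, int(n_train * fraction))
    return set(random.Random(seed).sample(range(n_train), k))


def reverse_training_predict(query_set, train_data, neg_idx):
    """Predict the class of query_set.

    train_data: list of (feature_vector, class_label) pairs.
    neg_idx: indices of train_data used as negatives; the rest are evaluated.
    """
    clf = train_binary(query_set, [train_data[i][0] for i in neg_idx])
    pos_count, total = defaultdict(int), defaultdict(int)
    for i, (x, label) in enumerate(train_data):
        if i in neg_idx:
            continue  # negatives were used for training, not evaluation
        total[label] += 1
        pos_count[label] += int(clf(x))
    # Fraction of each class's held-out images that were labelled positive.
    fractions = {c: pos_count[c] / total[c] for c in total}
    return max(fractions, key=fractions.get), fractions
```

Usage: draw `neg_idx = sample_negatives(len(train_data), 0.5)` and pass it in. A class whose images all land in the negative subset cannot be scored by this sketch, so a real implementation would need to handle that case.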
A hierarchical training and identification method using gaussian process models for face recognition in videos
In The 11th IEEE International Conference on Automatic Face and Gesture Recognition
Cited by 1 (1 self)
Abstract — In a video-based face identification task, a sequence of frames can be utilized to identify the subject in the video. The information extracted from frames can provide samples of the subject in different head poses and facial expressions and under various lighting conditions, which enriches the training process. However, some of these frames may not be useful for identification due to noise from various sources (such as occlusion, low resolution, and face tracking errors). It is important to reduce the effect of noisy samples by designing a representation structure that is capable of alleviating the noise in each sequence, complemented by a recognition procedure that rejects wrong decisions affected by noise. In this paper we propose a sequence representation called Ensemble of Abstract Sequence Representatives (EASR) that is aimed at reducing the effect of noisy frames in a sequence. EASRs are used to guide the sampling process in a learning scheme called specialization-generalization, which is used to train an ensemble of binary Gaussian Process (GP) models. Identification is done using: (i) the similarity between the EASRs of the gallery and probe images, and (ii) the label provided by the ensemble of GP classifier models. Evaluation of our approach on three publicly available benchmark datasets demonstrates significantly better performance than the state of the art.
Accio: A Data Set for Face Track Retrieval in Movies Across Age
Video face recognition is a very popular task and has come a long way. The primary challenges such as illumination, resolution and pose are well studied through multiple data sets. However, there are no video-based data sets dedicated to studying the effects of aging on facial appearance. We present a challenging face track data set, Harry Potter Movies Aging Data set (Accio), to study and develop age-invariant face recognition methods for videos. Our data set not only has strong challenges of pose, illumination and distractors, but also spans a period of ten years, providing substantial variation in facial appearance. We propose two primary tasks, within-movie and across-movie face track retrieval, and two protocols which differ in their freedom to use external data. We present baseline results for the retrieval performance using a state-of-the-art face track descriptor. Our experiments show clear trends of reduction in performance as the age gap between the query and database increases. We will make the data set publicly available for further exploration in age-invariant video face recognition.
FACE-BASED ACTIVE AUTHENTICATION ON MOBILE DEVICES
As mobile devices become more ubiquitous, it becomes important to continuously verify the identity of the user during all interactions rather than just at login time. This paper investigates the effectiveness of methods for fully-automatic face recognition in solving the Active Authentication (AA) problem for smartphones. We report the results of face authentication using videos recorded by the front camera. The videos were acquired while the users were performing a number of tasks under three different ambient conditions to capture the type of variations caused by the 'mobility' of the devices. An inspection of these videos reveals a combination of favorable and challenging properties unique to smartphone face videos. In addition to the variations caused by the mobility of the device, other challenges in the dataset include partial faces, occasional pose changes, blur and face/fiducial point localization errors. We evaluate still image and image set-based authentication algorithms using intensity features extracted around fiducial points. The recognition rates drop dramatically when enrollment and test videos come from different sessions. We will make the dataset and the computed features publicly available to help the design of algorithms that are more robust to variations due to the factors mentioned above. Index Terms — Face recognition, mobile devices, active authentication, biometrics recognition.
Patch Set Based Representation for Alignment-Free Image Set Classification
Abstract—This paper presents a patch set based sparse representation for image set classification. Compared with image-based image set representation, our patch set based representation is alignment-free and thus has an advantage for tasks such as video-based face recognition, image set based object recognition, and video-based hand gesture recognition, where precise alignment is usually difficult or even impossible due to large variance in view angle or pose. Specifically, to bypass the alignment issue, we propose to adopt the patch based image set representation by dividing each image within each set into patches; we then cluster all the training patches into multiple clusters and classify the test patches based on the cluster centers of training patches. The labels of test patches within each cluster are inferred from a Patch Set based Sparse Representation for Classification (PS-SRC), and the labels of all test patches from all the clusters are then aggregated to predict a single label for the test set. Experimental results on video-based face recognition datasets (CMU-MoBo and YouTube Celebrities), an image set based object recognition dataset (ETH-80) and a video-based hand gesture recognition dataset (Kinect Hand Gesture) demonstrate that our proposed method consistently outperforms all existing ones, and the improvement is very significant on the YouTube Celebrities and Kinect Hand Gesture datasets. Moreover, we also quantitatively show the robustness of our method to misalignment on the Multi-PIE dataset. Index Terms—image set classification; patch set based representation; alignment-free; video-based face recognition.
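A rough sketch of the patch-level pipeline follows, assuming grayscale images given as lists of rows and cluster centres that have already been computed from the training patches (e.g., by k-means, not shown). The per-cluster classifier here is a nearest-neighbour stand-in for PS-SRC, and the majority vote is an assumed aggregation rule; the abstract specifies neither. All function names are hypothetical.

```python
from collections import Counter


def extract_patches(image, size):
    """Split a 2-D image (list of rows) into non-overlapping size x size
    patches, each flattened row by row. Pixels that do not fill a whole
    patch at the right or bottom border are dropped."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(size) for j in range(size)]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, candidates):
    """Index of the candidate vector closest to x."""
    return min(range(len(candidates)), key=lambda k: sq_dist(x, candidates[k]))


def classify_image_set(test_images, size, centers, cluster_members):
    """Predict one label for a set of test images.

    centers: cluster centres of the training patches (hypothetical input).
    cluster_members: for each cluster, a non-empty list of (patch, label)
    pairs taken from the training sets.
    """
    votes = Counter()
    for image in test_images:
        for patch in extract_patches(image, size):
            # Route the test patch to its closest training-patch cluster.
            members = cluster_members[nearest(patch, centers)]
            # Stand-in for PS-SRC: copy the label of the nearest member.
            best = nearest(patch, [vec for vec, _ in members])
            votes[members[best][1]] += 1
    # Assumed aggregation: majority vote over all test patches.
    return votes.most_common(1)[0][0]
```

Because every image is cut into patches independently, no cross-image alignment is needed; only the patch size must match between training and test data.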
Multi-observation Face Recognition in Videos based on Label Propagation
In order to deal with the huge amount of content generated by social media, especially for indexing and retrieval purposes, the focus shifted from single object recognition to multi-observation object recognition. Of particular interest is the problem of face recognition (used as the primary cue for assessing persons' identity), since it is in high demand for popular social media platforms such as Facebook and YouTube. Recently, several approaches for graph-based label propagation were proposed. However, the associated graphs were constructed in an ad-hoc manner (e.g., using the KNN graph) that cannot cope properly with the rapid and frequent changes in data appearance, a phenomenon intrinsically related to video sequences. In this paper, we propose a novel approach for efficient and adaptive graph construction, based on a two-phase scheme: (i) the first phase is used to adaptively find the neighbors of a sample and also to find the adequate weights for the minimization function of the second phase; (ii) in the second phase, the selected neighbors along with their corresponding weights are used to locally and collaboratively estimate the sparse affinity matrix weights. Experiments performed on the Honda Video Database (HVDB) and a subset of video sequences extracted from the popular TV series 'Friends' show a distinct advantage of the proposed method over the existing standard graph construction methods.
Hierarchical Hybrid Statistic based Video Binary Code and Its Application to Face Retrieval in TV-Series
Abstract — We address the problem of video face retrieval in TV series, which searches for video clips based on the presence of a particular character, given one video clip of him or her. This is tremendously challenging because, on the one hand, faces in TV series are captured in largely uncontrolled conditions with complex appearance variations, and, on the other hand, the retrieval task typically needs a highly efficient representation with low time and space complexity. To handle these problems, we propose a compact and discriminative binary representation for the huge body of video data based on a novel hierarchical hybrid statistic. Our method, named Hierarchical Hybrid Statistic based Video Binary Code (HHSVBC), first utilizes differently parameterized Fisher Vectors (FVs) as frame representations that can encode multi-granularity low-level variation information within the frame, and then models the video by its frame covariance matrix to capture high-level variation information among video frames. To incorporate discriminative information and obtain a more compact video signature, the high-dimensional video representation is further encoded into a much lower-dimensional binary vector, which finally yields the proposed HHSVBC. Specifically, each bit of the code is produced via supervised learning in a max-margin framework, which aims to make a trade-off between code discriminability and stability. Face retrieval experiments on two challenging large-scale TV series video databases demonstrate the competitiveness of the proposed HHSVBC over state-of-the-art retrieval methods.
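The covariance-then-binarize idea can be illustrated with a simplified sketch. It assumes per-frame feature vectors are already available (the Fisher Vector stage is not shown) and replaces the supervised max-margin hyperplanes with caller-supplied ones; the names `covariance`, `binary_code` and `hamming` are hypothetical, and this is not the HHSVBC implementation.

```python
def covariance(frames):
    """Unbiased sample covariance (d x d) of n >= 2 frame feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    centered = [[f[k] - mean[k] for k in range(d)] for f in frames]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def upper_triangle(matrix):
    """Vectorise a symmetric matrix by keeping entries on and above the diagonal."""
    d = len(matrix)
    return [matrix[i][j] for i in range(d) for j in range(i, d)]


def binary_code(frames, hyperplanes):
    """Encode a video clip as bits.

    hyperplanes: list of (w, b) pairs, one per bit; hypothetical stand-ins
    for the learned max-margin projections. Bit = 1 if w . x + b > 0.
    """
    x = upper_triangle(covariance(frames))
    return [
        1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        for w, b in hyperplanes
    ]


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))
```

Retrieval would then rank database clips by the Hamming distance between their codes and the query clip's code, which is cheap in both time and storage.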
THE REQUIREMENTS FOR THE DEGREE OF
The wealth of information extracted from a sequence of frames in a video provides samples of the subject under different illuminations, head poses, and facial expressions. However, various sources can introduce noise into the data (e.g., occlusion, low resolution, and face detection failure). In this thesis, a novel framework is proposed that employs well-studied concepts from quantum probability theory to design a representation structure capable of making inferences with multiple sources of uncertainty. The dual extension of this framework is aimed at reducing the effect of noisy frames in a video. It is also used to guide the sampling process in a novel learning scheme, called specialization-generalization, which is designed to support efficient learning as well as to neutralize the effect of noisy samples in the identification process. The contributions of this thesis are not method-specific and can be utilized to enhance other face identification approaches in the literature.
Robust Face Recognition via Adaptive Sparse Representation
Abstract—Sparse Representation (or coding) based Classification (SRC) has achieved great success in face recognition in recent years. However, SRC emphasizes sparsity too much and overlooks the correlation information, which has been demonstrated to be critical in real-world face recognition problems. Conversely, some work considers the correlation but overlooks the discriminative ability of sparsity. Different from these existing techniques, in this paper we propose a framework called Adaptive Sparse Representation based Classification (ASRC) in which sparsity and correlation are jointly considered. Specifically, when the samples are of low correlation, ASRC selects the most discriminative samples for representation, like SRC; when the training samples are highly correlated, ASRC selects most of the correlated and discriminative samples for representation, rather than choosing some related samples randomly. In general, the representation model adapts to the correlation structure and thus benefits from both the ℓ1-norm and the ℓ2-norm. Extensive experiments conducted on publicly available data sets verify the effectiveness and robustness of the proposed algorithm by comparing it with state-of-the-art methods. Index Terms—sparse representation based classification, trace Lasso, correlation, face recognition.
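For reference, the trace Lasso named in the index terms (as defined by Grave, Obozinski and Bach) interpolates between the ℓ1 and ℓ2 norms. The objective below is an assumption: a least-squares fit with this regularizer plus an SRC-style minimum-residual decision rule; the exact ASRC formulation may differ. Here X stacks the training samples as unit-norm columns, y is the test sample, ‖·‖_* is the nuclear norm, and δ_c(w) keeps only the coefficients belonging to class c.

```latex
\begin{aligned}
\hat{w} &= \arg\min_{w}\ \tfrac{1}{2}\lVert y - Xw\rVert_2^2 + \lambda\,\lVert X\,\mathrm{Diag}(w)\rVert_*,\\
\text{orthonormal columns } (X^\top X = I):\quad & \lVert X\,\mathrm{Diag}(w)\rVert_* = \lVert w\rVert_1,\\
\text{identical columns } (X = x\mathbf{1}^\top,\ \lVert x\rVert_2 = 1):\quad & \lVert X\,\mathrm{Diag}(w)\rVert_* = \lVert x\,w^\top\rVert_* = \lVert w\rVert_2,\\
\mathrm{identity}(y) &= \arg\min_{c}\ \lVert y - X\,\delta_c(\hat{w})\rVert_2 .
\end{aligned}
```

The two limiting cases show why the regularizer behaves like SRC's ℓ1 penalty when samples are uncorrelated and like an ℓ2 penalty, which spreads weight over correlated samples, when they are highly correlated.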
Face Recognition in Videos by Label Propagation
Abstract—We consider the problem of automatic identification of faces in videos such as movies, given a dictionary of known faces from a public or an alternate database. This has applications in video indexing, content-based search, surveillance, and real-time recognition on wearable computers. We propose a two-stage approach for this problem. First, we recognize the faces in a video within a sparse representation framework using ℓ1-minimization and select a few key-frames based on a robust confidence measure. We then use transductive learning to propagate the labels from the key-frames to the remaining frames by incorporating constraints simultaneously in the temporal and feature spaces. This is in contrast to some previous approaches, where every test frame/track is identified independently, ignoring the correlation between the faces in video tracks. Propagating labels from a few key-frames belonging to a few subjects, rather than matching against a large dictionary of actors, reduces confusion. We evaluate the performance of our algorithm on the Movie Trailer face dataset and five movie clips, and achieve a significant improvement in labeling accuracy compared to previous approaches.
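The propagation stage can be sketched with a generic graph-based label propagation in which key-frames are clamped to their labels and every other frame repeatedly takes the weighted average of its neighbours' class scores. This is a swapped-in textbook technique, not necessarily the paper's transductive formulation: the graph (Gaussian feature similarity plus an extra weight between consecutive frames) and the names `affinity`, `propagate`, `sigma` and `temporal_weight` are assumptions for illustration.

```python
import math


def affinity(features, sigma=1.0, temporal_weight=1.0):
    """Hypothetical frame graph: Gaussian feature similarity between every
    pair of frames, plus temporal_weight between consecutive frames."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
            if abs(i - j) == 1:
                W[i][j] += temporal_weight  # temporal constraint
    return W


def propagate(features, seeds, classes, iters=100, **graph_kw):
    """seeds maps key-frame index -> label; returns one label per frame."""
    W = affinity(features, **graph_kw)
    n, c = len(features), len(classes)
    clamped = {i: [1.0 if k == lab else 0.0 for k in classes] for i, lab in seeds.items()}
    # Unlabelled frames start with uniform class scores.
    F = [clamped.get(i, [1.0 / c] * c) for i in range(n)]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total = sum(W[i])
            if i in clamped or total == 0.0:
                new_F.append(F[i])  # key-frames (and isolated frames) keep their scores
                continue
            new_F.append([sum(W[i][j] * F[j][k] for j in range(n)) / total for k in range(c)])
        F = new_F
    # Each frame takes the class with the highest propagated score.
    return [classes[max(range(c), key=lambda k: F[i][k])] for i in range(n)]
```

The extra temporal weight is a simple way to encode the temporal constraint: consecutive frames pull toward the same label even when their features differ somewhat.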