Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-based Classification
Cited by 10 (0 self)
This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state of the art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
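The reduction described in this abstract (one ℓ1-minimization over the mean of the face track, then a decision by class-wise reconstruction residual) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the ISTA solver, the regularization weight `lam`, the function names, and the synthetic data are all assumptions, and the rejection of unknown identities is not modeled.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def classify_track_by_mean(A, labels, track, lam=0.01):
    """Classify a face track (columns = frame features) from its mean vector.

    A      : (dim, n_atoms) dictionary of still-image features, unit-norm columns
    labels : (n_atoms,) identity label of each dictionary column
    track  : (dim, n_frames) features of the frames in one face track
    """
    y = track.mean(axis=1)                 # single vector standing in for the track
    y = y / np.linalg.norm(y)
    x = ista_l1(A, y, lam)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Residual when only the coefficients of identity c are kept.
        residuals[int(c)] = float(np.linalg.norm(y - A[:, mask] @ x[mask]))
    return min(residuals, key=residuals.get), residuals

# Synthetic example (hypothetical data): 3 identities, 5 stills each.
rng = np.random.default_rng(0)
dim, n_classes, per_class = 50, 3, 5
protos = rng.standard_normal((dim, n_classes))
A = np.repeat(protos, per_class, axis=1)
A = A + 0.1 * rng.standard_normal(A.shape)
A /= np.linalg.norm(A, axis=0)
labels = np.repeat(np.arange(n_classes), per_class)
track = protos[:, [1]] + 0.1 * rng.standard_normal((dim, 8))  # 8 frames of identity 1
predicted, residuals = classify_track_by_mean(A, labels, track)
```

The cost of the sparse solve is independent of the number of frames in the track, which is the efficiency argument the abstract makes against frame-by-frame ℓ1-minimization.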
Video face matching using subset selection and clustering of probabilistic multi-region histograms
- In International Conference on Image and Vision Computing New Zealand (IVCNZ), 2010
Cited by 3 (1 self)
Balancing computational efficiency with recognition accuracy is one of the major challenges in real-world video-based face recognition. A significant design decision for any such system is whether to process and use all possible faces detected over the video frames, or whether to select only a few ‘best’ faces. This paper presents a video face recognition system based on probabilistic Multi-Region Histograms to characterise performance trade-offs in: (i) selecting a subset of faces compared to using all faces, and (ii) combining information from all faces via clustering. Three face selection metrics are evaluated for choosing a subset: face detection confidence, random subset, and sequential selection. Experiments on the recently introduced MOBIO dataset indicate that using all faces through clustering always outperformed selecting only a subset of faces. The experiments also show that the face selection metric based on face detection confidence generally provides better recognition performance than random or sequential sampling. Moreover, the optimal number of faces varies drastically across selection metrics and subsets of MOBIO. Given the trade-offs between computational effort, recognition accuracy and robustness, it is recommended that face feature clustering would be most advantageous in batch processing (particularly for video-based watchlists), whereas face selection methods should be limited to applications with significant computational restrictions.
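The three subset-selection metrics named in this abstract can be illustrated with a small sketch. The function name `select_faces` and its parameters are hypothetical, and "sequential selection" is assumed here to mean taking the first k detected faces in temporal order; the paper may define it differently.

```python
import numpy as np

def select_faces(confidences, k, metric="confidence", seed=0):
    """Return the sorted frame indices of k faces chosen from one video.

    confidences : per-face detection confidence scores, in temporal order
    metric      : "confidence", "random", or "sequential"
    """
    scores = np.asarray(confidences, dtype=float)
    n = scores.shape[0]
    k = min(k, n)
    if metric == "confidence":
        # Stable sort on negated scores so ties keep their temporal order.
        idx = np.argsort(-scores, kind="stable")[:k]
    elif metric == "random":
        idx = np.random.default_rng(seed).choice(n, size=k, replace=False)
    elif metric == "sequential":
        idx = np.arange(k)                 # assumption: the first k faces
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.sort(idx)

# Hypothetical detection scores for a five-face track.
scores = [0.2, 0.9, 0.5, 0.95, 0.1]
by_confidence = select_faces(scores, 2, "confidence")
by_sequence = select_faces(scores, 2, "sequential")
by_random = select_faces(scores, 2, "random")
```

Only the selected faces would then be passed to the feature extractor, which is where the computational saving comes from.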
Hierarchical Hybrid Statistic based Video Binary Code and Its Application to Face Retrieval in TV-Series
We address the problem of video face retrieval in TV-Series, which searches for video clips based on the presence of a particular character, given one video clip of him/her. This is tremendously challenging because, on one hand, faces in TV-Series are captured in largely uncontrolled conditions with complex appearance variations, and on the other hand, the retrieval task typically needs a highly efficient representation with low time and space complexity. To handle such problems, we propose a compact and discriminative binary representation for the huge body of video data based on a novel hierarchical hybrid statistic. Our method, named Hierarchical Hybrid Statistic based Video Binary Code (HHSVBC), first utilizes different parameterized Fisher Vectors (FVs) as the frame representation, which can encode multi-granularity low-level variation information within the frame, and then models the video by its frame covariance matrix to capture high-level variation information among video frames. To incorporate discriminative information and obtain a more compact video signature, the high-dimensional video representation is further encoded as a much lower-dimensional binary vector, which finally yields the proposed HHSVBC. Specifically, each bit of the code is produced via supervised learning in a max-margin framework, which aims to make a trade-off between code discriminability and stability. Face retrieval experiments on two challenging large-scale TV-Series video databases demonstrate the competitiveness of the proposed HHSVBC over state-of-the-art retrieval methods.
Face Video Retrieval with Image Query via Hashing across Euclidean Space and Riemannian Manifold
Retrieving videos of a specific person given his/her face image as query becomes more and more appealing for applications like smart movie fast-forwards and suspect searching. It also forms an interesting but challenging computer vision task, as the visual data to match, i.e., still image and video clip, are usually represented quite differently. Typically, a face image is represented as a point (i.e., vector) in Euclidean space, while a video clip is seemingly modeled as a point (e.g., covariance matrix) on some particular Riemannian manifold in the light of its recent promising success. It thus incurs a new hashing-based retrieval problem of matching two heterogeneous representations, respectively in Euclidean space and Riemannian manifold. This work makes the first attempt to embed the two heterogeneous spaces into a common discriminant Hamming space. Specifically, we propose Hashing across Euclidean space and Riemannian manifold (HER) by deriving a unified framework to firstly embed the two spaces into corresponding reproducing kernel Hilbert spaces, and then iteratively optimize the intra- and inter-space Hamming distances in a max-margin framework to learn the hash functions for the two spaces. Extensive experiments demonstrate the impressive superiority of our method over the state-of-the-art competitive hash learning methods.
TAMING WILD FACES: WEB-SCALE, OPEN-UNIVERSE FACE IDENTIFICATION IN STILL AND VIDEO IMAGERY
With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g. social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities, while accurately rejecting those of no interest. Recent advancements in face recognition research have seen Sparse Representation-based Classification (SRC) advance to the forefront of competing methods. However, its drawbacks, slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are three-fold: 1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for ℓ1-minimization, thus harnessing the speed of least-squares and the robustness of
Video surveillance Face recognition
, 2015
Keywords: Watch-list screening; Single sample per person; Face tracking; Online and incremental learning; Adaptive appearance modeling
Systems for still-to-video face recognition (FR) seek to detect the presence of target individuals based on reference facial still images or mug-shots. These systems encounter several challenges in video surveillance applications due to variations in capture conditions (e.g., pose, scale, illumination, blur and expression) and to camera inter-operability. Beyond these issues, few reference stills are available during enrollment to design representative facial models of target individuals. Systems for still-to-video FR must therefore rely on adaptation, multiple face representation, or synthetic generation of reference stills to enhance the intra-class variability of face models. Moreover, many FR systems only match high quality faces captured in video, which further reduces the probability of detecting target individuals. Instead of matching faces captured through segmentation to reference stills, this paper exploits Adaptive Appearance Model Tracking (AAMT) to
Multi-observation Face Recognition in Videos based on Label Propagation
In order to deal with the huge amount of content generated by social media, especially for indexing and retrieval purposes, the focus has shifted from single object recognition to multi-observation object recognition. Of particular interest is the problem of face recognition (used as the primary cue for assessing a person's identity), since it is highly required by popular social media search engines like Facebook and YouTube. Recently, several approaches for graph-based label propagation were proposed. However, the associated graphs were constructed in an ad-hoc manner (e.g., using the KNN graph) that cannot cope properly with the rapid and frequent changes in data appearance, a phenomenon intrinsically related to video sequences. In this paper, we propose a novel approach for efficient and adaptive graph construction, based on a two-phase scheme: (i) the first phase is used to adaptively find the neighbors of a sample and also to find the adequate weights for the minimization function of the second phase; (ii) in the second phase, the selected neighbors along with their corresponding weights are used to locally and collaboratively estimate the sparse affinity matrix weights. Experimental results on the Honda Video Database (HVDB) and a subset of video sequences extracted from the popular TV-series 'Friends' show a distinct advantage of the proposed method over the existing standard graph construction methods.
Compact Video Code and Its Application to Robust Face Retrieval in TV-Series
We address the problem of video face retrieval in TV-Series, which searches for video clips based on the presence of a specific character, given one video clip of him/her. This is tremendously challenging because, on one hand, faces in TV-Series are captured in largely uncontrolled conditions with complex appearance variations, and on the other hand, the retrieval task typically needs an efficient representation with low time and space complexity. To handle this problem, we propose a compact and discriminative representation for the huge body of video data, named Compact Video Code (CVC). Our method first models the video clip by its sample (i.e., frame) covariance matrix to capture the video data variations in a statistical manner. To incorporate discriminative information and obtain a more compact video signature, the high-dimensional covariance matrix is further encoded as a much lower-dimensional binary vector, which finally yields the proposed CVC. Specifically, each bit of the code, i.e., each dimension of the binary vector, is produced via supervised learning in a max-margin framework, which aims to strike a balance between the discriminability and stability of the code. Face retrieval experiments on two challenging TV-Series video databases demonstrate the competitiveness of the proposed CVC over state-of-the-art retrieval methods. In addition, as a general video matching algorithm, CVC is also evaluated in the traditional video face recognition task on a standard Internet database, i.e., YouTube Celebrities, showing quite promising performance using an extremely compact code with only 128 bits.
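The pipeline in this abstract (frame covariance matrix, then a 128-bit binary code compared by Hamming distance) can be sketched in a few lines. Note the substitution: the paper learns each bit with supervised max-margin training, whereas this sketch uses fixed random hyperplanes purely as a stand-in so that it stays self-contained. The function names, feature dimension, and synthetic clips are assumptions.

```python
import numpy as np

def clip_to_binary_code(frames, projections):
    """Encode one video clip as a binary vector.

    frames      : (n_frames, d) per-frame feature vectors
    projections : (n_bits, d*(d+1)//2) hyperplanes; random here, whereas the
                  paper learns each bit with a supervised max-margin criterion
    """
    cov = np.cov(frames, rowvar=False)     # (d, d) sample covariance of the clip
    upper = np.triu_indices(cov.shape[0])
    v = cov[upper]                         # covariance is symmetric: keep one triangle
    return projections @ v > 0             # one bit per hyperplane

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Hypothetical data: two clips of different lengths with 16-dim frame features.
rng = np.random.default_rng(0)
d, n_bits = 16, 128
projections = rng.standard_normal((n_bits, d * (d + 1) // 2))
clip_a = rng.standard_normal((40, d))
clip_b = 2.0 * rng.standard_normal((60, d))
code_a = clip_to_binary_code(clip_a, projections)
code_b = clip_to_binary_code(clip_b, projections)
distance = hamming(code_a, code_b)
```

Because the covariance matrix has a fixed size regardless of clip length, clips with different numbers of frames map to codes of the same length and can be ranked by Hamming distance.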