Results 1 - 10 of 17
DeepFace: Closing the gap to human-level performance in face verification. In: IEEE CVPR, 2014
Abstract - Cited by 103 (4 self)
In modern face recognition, the conventional pipeline consists of four stages: detect ⇒ align ⇒ represent ⇒ classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to date, an identity-labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations, coupling the accurate model-based alignment with the large facial database, generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.25% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 25%, closely approaching human-level performance.
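The locally connected layers the abstract contrasts with standard convolutions can be sketched minimally: like a convolution, each output cell sees a small patch, but every position has its own filter bank. The shapes, filter size, and random inputs below are illustrative toy values, not the paper's.

```python
import numpy as np

def locally_connected(x, weights):
    """x: (H, W) input; weights: (H-k+1, W-k+1, k, k), one filter per output cell."""
    oh, ow, k, _ = weights.shape
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * weights[i, j])  # position-specific filter
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((6, 6, 3, 3))  # a 3x3 filter for each of the 6x6 outputs
y = locally_connected(x, w)
print(y.shape)  # (6, 6)
```

The per-position weights are why such layers have far more parameters than a shared convolution of the same footprint, which fits the 120-million-parameter figure quoted above.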
Deep learning face representation by joint identification-verification. In: Advances in Neural Information Processing Systems, 2014
Abstract - Cited by 22 (3 self)
The key challenge of face recognition is to develop effective feature representations for reducing intra-personal variations while enlarging inter-personal differences. In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. The Deep IDentification-verification features (DeepID2) are learned with carefully designed deep convolutional networks. The face identification task increases the inter-personal variations by drawing DeepID2 extracted from different identities apart, while the face verification task reduces the intra-personal variations by pulling DeepID2 extracted from the same identity together, both of which are essential to face recognition. The learned DeepID2 features can be well generalized to new identities unseen in the training data. On the challenging LFW dataset [11], 99.15% face verification accuracy is achieved. Compared with the best deep learning result [21] on LFW, the error rate has been significantly reduced by 67%.
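The two supervisory signals described above can be sketched as toy losses on 2-D features; the feature vectors, margin, and dimensionality here are illustrative stand-ins, not DeepID2's actual values:

```python
import numpy as np

def identification_loss(logits, label):
    """Cross-entropy over identity classes: pushes different identities apart."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label])

def verification_loss(f1, f2, same, margin=1.0):
    """Contrastive signal: same identity -> close, different -> at least `margin` apart."""
    d = np.linalg.norm(f1 - f2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

f_a = np.array([0.9, 0.1])
f_b = np.array([1.0, 0.0])   # same identity as f_a
f_c = np.array([-1.0, 0.2])  # different identity
print(verification_loss(f_a, f_b, same=True))   # small: the pair is already close
print(verification_loss(f_a, f_c, same=False))  # zero once the pair exceeds the margin
```

Training on a weighted sum of both losses is what couples the "drawing apart" and "pulling together" effects the abstract describes.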
Surpassing human-level face verification performance on LFW with GaussianFace, 2014
Abstract - Cited by 11 (2 self)
Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on the Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source domains to improve the generalization performance of face verification in an unknown target domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalizing to unseen domains. Specifically, our algorithm achieves an accuracy of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark [23]. For the first time, the human-level performance in face verification (97.53%) [28] on LFW is surpassed.
Eigen-PEP for Video Face Recognition
Abstract - Cited by 11 (3 self)
To effectively solve the problem of large-scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It shall also be flexible enough to be incrementally updated, incorporating new observations or retiring obsolete ones. In search of such a representation, we present the Eigen-PEP, which is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by a part-based average pooling through the PEP model, which produces an intermediate high-dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept (as small as 100). We evaluate the Eigen-PEP representation for both video-based face verification and identification, on the YouTube Faces dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces, we further improve the state-of-the-art recognition accuracy. On Celebrity-1000, we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects.
[Fig. 1: Sample images in three unconstrained face recognition datasets: (a) LFW, (b) YouTube Faces, (c) Celebrity-1000.]
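The two Eigen-PEP stages above (average pooling over frames, then PCA compression) can be sketched on synthetic data; the subject counts, descriptor dimension, and the 20 retained components are toy stand-ins for the paper's values (~100 dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_frames, dim = 50, 8, 400

# Stage 1: part-based average pooling over each subject's video frames
# (here each frame is a random stand-in for a PEP part descriptor).
frames = rng.standard_normal((n_subjects, n_frames, dim))
pooled = frames.mean(axis=1)                  # (n_subjects, dim)

# Stage 2: PCA compression of the pooled high-dimensional representations.
mean = pooled.mean(axis=0)
centered = pooled - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 20                                        # stand-in for the ~100 kept dimensions
eigen_pep = centered @ vt[:k].T               # (n_subjects, k) compact representation
print(eigen_pep.shape)
```

Because pooling is a running mean, adding or retiring a frame only requires updating that mean, which is where the incremental-update flexibility comes from.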
Unconstrained face recognition: Identifying a person of interest from a media collection, 2014
Abstract - Cited by 11 (2 self)
As face recognition applications progress from constrained sensing and cooperative-subject scenarios (e.g., driver’s license and passport photos) to unconstrained scenarios with uncooperative subjects (e.g., video surveillance), new challenges are encountered. These challenges are due to variations in ambient illumination, image resolution, background clutter, facial pose, expression, and occlusion. In forensic investigations where the goal is to identify a “person of interest,” often based on low-quality face images and videos, we need to utilize whatever source of information is available about the person. This could include one or more video tracks, multiple still images captured by bystanders (using, for example, their mobile phones), 3D face models, and verbal descriptions of the subject provided by witnesses. These verbal descriptions can be used to generate a face sketch and provide ancillary information about the person of interest (e.g., gender, race, and age). While traditional face matching methods take a single medium (i.e., a still face image, video track, or face sketch) as input, our work considers using the entire gamut of a media collection as a probe to generate a single candidate list for the person of interest. We show that the proposed approach boosts the likelihood of correctly identifying the person of interest through the use of different fusion schemes, 3D face models, and the incorporation of quality measures for fusion and video frame selection.
Index Terms: Unconstrained face recognition, uncooperative subjects, media collection, quality-based fusion, still face image, video track, 3D face model, face sketch, demographics
Exploring the Geo-Dependence of Human Face Appearance
Abstract - Cited by 3 (3 self)
The expected appearance of a human face depends strongly on age, ethnicity, and gender. While these relationships are well studied, our work explores the little-studied dependence of facial appearance on geographic location. To support this effort, we constructed GeoFaces, a large dataset of geotagged face images. We examine the geo-dependence of Eigenfaces and use two supervised methods for extracting geo-informative features. The first, canonical correlation analysis, is used to find location-dependent component images as well as the spatial direction of most significant face appearance change. The second, linear discriminant analysis, is used to find countries with relatively homogeneous, yet distinctive, facial appearance.
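Canonical correlation analysis, the first supervised method named above, finds directions in two views that are maximally correlated. A minimal sketch on synthetic two-view data (standing in for face features and location variables; the data, regularizer, and dimensions are all assumptions, not GeoFaces values):

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """Return the first pair of canonical weight vectors (wx, wy)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view, then take the top singular pair of the cross-covariance.
    Lx = np.linalg.cholesky(np.linalg.inv(Cxx))
    Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
    u, s, vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
    return Lx @ u[:, 0], Ly @ vt[0]

# Two toy views sharing one latent component z (plus noise).
rng = np.random.default_rng(2)
z = rng.standard_normal(500)
X = np.column_stack([z + 0.1 * rng.standard_normal(500), rng.standard_normal(500)])
Y = np.column_stack([z + 0.1 * rng.standard_normal(500), rng.standard_normal(500)])
wx, wy = cca_first_pair(X, Y)
r = np.corrcoef((X - X.mean(0)) @ wx, (Y - Y.mean(0)) @ wy)[0, 1]
print(abs(r) > 0.9)  # the shared component is recovered
```

In the paper's setting, projecting face features onto such a direction yields the location-dependent component images described above.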
Deeply learned face representations are sparse, selective, and robust
Abstract - Cited by 2 (0 self)
This paper designs a high-performance deep convolutional network (DeepID2+) for face recognition. It is learned with the identification-verification supervisory signal. By increasing the dimension of hidden representations and adding supervision to early convolutional layers, DeepID2+ achieves new state-of-the-art results on the LFW and YouTube Faces benchmarks. Through empirical studies, we have discovered three properties of its deep neural activations critical for the high performance: sparsity, selectiveness, and robustness. (1) It is observed that neural activations are moderately sparse. Moderate sparsity maximizes the discriminative power of the deep net as well as the distance between images. It is surprising that DeepID2+ can still achieve high recognition accuracy even after the neural responses are binarized. (2) Its neurons in higher layers are highly selective to identities and identity-related attributes. We can identify different subsets of neurons which are either constantly excited or inhibited when different identities or attributes are present. Although DeepID2+ is not taught to distinguish attributes during training, it has implicitly learned such high-level concepts. (3) It is much more robust to occlusions, although occlusion patterns are not included in the training set.
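The binarization finding in point (1) can be illustrated on toy features: threshold synthetic "activations" at zero and classify identities by Hamming distance to binarized templates. The prototypes, noise level, and dimensions below are invented for the sketch, not taken from DeepID2+.

```python
import numpy as np

rng = np.random.default_rng(3)
proto = rng.standard_normal((5, 128))   # one synthetic "identity activation" each
samples = proto[:, None, :] + 0.3 * rng.standard_normal((5, 10, 128))

binary = (samples > 0).astype(np.uint8)            # binarized "neural responses"
templates = (proto > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.sum(a != b))

# Nearest-template identification purely in Hamming space.
correct = 0
for i in range(5):
    for s in binary[i]:
        pred = min(range(5), key=lambda j: hamming(s, templates[j]))
        correct += pred == i
accuracy = correct / 50
print(accuracy)
```

Because noise flips only the weakly activated bits, the sign pattern alone keeps identities well separated, which is the intuition behind the paper's observation.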
Web-scale training for face identification, 2014
Abstract - Cited by 1 (0 self)
Scaling machine learning methods to massive datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web. Face recognition is a task of great practical interest for which (i) very large labeled datasets exist, containing billions of images; (ii) the number of classes can reach tens of millions or more; and (iii) complex features are necessary in order to encode subtle differences between subjects, while maintaining invariance to factors such as pose, illumination, and aging. We present an elaborate pipeline that consists of a crucial network compression step followed by a new bootstrapping scheme for selecting a challenging subset of the dataset for efficient training of a higher-capacity network. By using this approach, we are able to greatly improve face recognition accuracy on the widely used LFW benchmark. Moreover, as performance on supervised face verification (1:1) benchmarks saturates, we propose to shift the attention of the research community to the unsupervised Probe-Gallery (1:N) identification benchmarks. On this task, we bridge between the literature and the industry, for the first time, by directly comparing with the state-of-the-art Commercial-Off-The-Shelf system and show a sizable leap in performance. Lastly, we demonstrate an intriguing trade-off between the number of training samples and the optimal size of the network.
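The bootstrapping step described above amounts to ranking the dataset by how hard a smaller model finds each example and keeping only the hardest subset for the larger network. A minimal sketch, with random numbers standing in for the per-example losses a compressed network would produce:

```python
import numpy as np

rng = np.random.default_rng(4)
losses = rng.random(1000)                 # stand-in per-example losses from a small net
keep = 100

# Keep the `keep` examples with the highest loss as the challenging subset.
hard_subset = np.argsort(losses)[-keep:]
print(len(hard_subset))
```

Training the higher-capacity network only on `hard_subset` concentrates its capacity on the cases the compressed model could not already solve.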
Deep Learning in Visual Computing and Signal Processing
Abstract
Deep learning is a subfield of machine learning which aims to learn a hierarchy of features from input data. Researchers have intensively investigated deep learning algorithms for solving challenging problems in many areas, such as image classification, speech recognition, signal processing, and natural language processing. In this study, we not only review typical deep learning algorithms in computer vision and signal processing but also provide detailed information on how to apply deep learning to specific areas such as road crack detection, fault diagnosis, and human activity detection. In addition, this study discusses the challenges of designing and training deep neural networks.
Face Recognition Using Smoothed High-Dimensional Representation
Abstract
Recent studies have underlined the significance of high-dimensional features and their compression for face recognition. Partly motivated by these findings, we propose a novel method for building unsupervised face representations based on binarized descriptors and efficient compression by soft assignment and unsupervised dimensionality reduction. For binarized descriptors, we consider Binarized Statistical Image Features (BSIF), a learning-based descriptor that computes a binary code for each pixel by thresholding the outputs of a linear projection between a local image patch and a set of independent basis vectors estimated from a training data set using independent component analysis. In this work, we propose application-specific learning to train a separate BSIF descriptor for each of the local face regions. Our method then constructs a high-dimensional representation from an input face by collecting histograms of BSIF codes in a blockwise manner. Before dropping the dimension to obtain a more compressed representation, an important step in the pipeline of our method is soft feature assignment, where the region histograms of the binarized codes are smoothed using kernel density estimation, achieved by a simple and fast matrix-vector product. Finally, we provide a thorough evaluation on the FERET and LFW benchmarks, comparing our face representation method to the state of the art in face recognition, showing enhanced performance on FERET and promising results on LFW.
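The BSIF coding step described above (threshold filter responses to one bit each, then histogram the codes over a region) can be sketched as follows; the random filters below stand in for the ICA-learned basis vectors, and the filter count and patch size are toy choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n_filters, k = 8, 7                      # 8 bits per pixel, 7x7 filters (toy values)
filters = rng.standard_normal((n_filters, k * k))

def bsif_code(patch):
    """patch: (k, k) image patch -> integer code in [0, 2**n_filters)."""
    responses = filters @ patch.ravel()
    bits = (responses > 0).astype(int)   # one bit per filter, by thresholding at zero
    return int(np.dot(bits, 2 ** np.arange(n_filters)))

def region_histogram(image):
    """Normalized histogram of BSIF codes over all k x k patches of a region."""
    h, w = image.shape
    hist = np.zeros(2 ** n_filters)
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            hist[bsif_code(image[i:i + k, j:j + k])] += 1
    return hist / hist.sum()

img = rng.standard_normal((20, 20))      # stand-in for one local face region
hist = region_histogram(img)
print(hist.shape)
```

Concatenating such histograms over all face regions gives the high-dimensional representation that the paper then smooths and compresses.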