Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
Cited by 81 (6 self)
Face recognition has benefitted greatly from the many databases that have been produced to study it. Most of these databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, expression, background, camera quality, occlusion, age, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters. This database is provided as an aid in studying the latter, unconstrained, face recognition problem. The database represents an initial attempt to provide a set of labeled face photographs spanning the range of conditions typically encountered by people in their everyday lives. The database exhibits “natural” variability in pose, lighting, focus, resolution, facial expression, age, gender, race, accessories, make-up, occlusions, background, and photographic quality. Despite this variability, the images in the database are presented in a simple and consistent format for maximum ease of use. In addition to describing the details of the database and its acquisition, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible.
Decomposing a Scene into Geometric and Semantically Consistent Regions
Cited by 27 (4 self)
High-level, or holistic, scene understanding involves reasoning about objects, regions, and the 3D relationships between them. This requires a representation above the level of pixels that can be endowed with high-level attributes such as class of object/region, its orientation, and (rough 3D) location within the scene. Towards this goal, we propose a region-based model which combines appearance and scene geometry to automatically decompose a scene into semantically meaningful regions. Our model is defined in terms of a unified energy function over scene appearance and structure. We show how this energy function can be learned from data and present an efficient inference technique that makes use of multiple over-segmentations of the image to propose moves in the energy-space. We show, experimentally, that our method achieves state-of-the-art performance on the tasks of both multi-class image segmentation and geometric reasoning. Finally, by understanding region classes and geometry, we show how our model can be used as the basis for 3D reconstruction of the scene.
Non-homogeneous content-driven video-retargeting
- In ICCV’07
Cited by 24 (2 self)
Video retargeting is the process of transforming an existing video to fit the dimensions of an arbitrary display. A compelling retargeting aims at preserving the viewers’ experience by maintaining the information content of important regions in the frame, whilst keeping their aspect ratio. An efficient algorithm for video retargeting is introduced. It consists of two stages. First, the frame is analyzed to detect the importance of each region in the frame. Then, a transformation that respects the analysis shrinks less important regions more than important ones. Our analysis is fully automatic and based on local saliency, motion detection and object detectors. The performance of the proposed algorithm is demonstrated on a variety of video sequences, and compared to the state of the art in image retargeting.
Region-based Segmentation and Object Detection
Cited by 10 (1 self)
Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task, making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.
Towards unconstrained face recognition
- In The Sixth IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, IEEE CVPR, 2008
Cited by 4 (1 self)
In this paper, we argue that the most difficult face recognition problems (unconstrained face recognition) will be solved by simultaneously leveraging the solutions to multiple vision problems including segmentation, alignment, pose estimation, and the estimation of other hidden variables such as gender and hair color. While in theory a single unified principle could solve all these problems simultaneously in a giant hidden variable model, we believe that such an approach will be computationally and, more importantly, statistically intractable. Instead, we promote studying the interactions among mid-level vision features, such as segmentations and pose estimates, as a route toward solving very difficult recognition problems. We discuss and provide results showing how pose and face segmentations mutually influence each other, and provide a surprisingly simple method for estimating pose from segmentations.
On the design of robust classifiers for computer vision
Cited by 4 (4 self)
The design of robust classifiers, which can contend with the noisy and outlier-ridden datasets typical of computer vision, is studied. It is argued that such robustness requires loss functions that penalize both large positive and negative margins. The probability elicitation view of classifier design is adopted, and a set of necessary conditions for the design of such losses is identified. These conditions are used to derive a novel robust Bayes-consistent loss, denoted Tangent loss, and an associated boosting algorithm, denoted TangentBoost. Experiments with data from the computer vision problems of scene classification, object tracking, and multiple instance learning show that TangentBoost consistently outperforms previous boosting algorithms.
Domestic Interaction on a Segway Base
Cited by 1 (0 self)
To be useful in a home environment, an assistive robot needs to be capable of a broad range of interactive activities such as locating objects, following specific people, and distinguishing among different people. This paper presents a Segway-based robot that successfully performed all of these tasks en route to a second-place finish in the RoboCup@Home 2007 competition. The main contribution is a complete description and analysis of the robot system and its implemented algorithms that enabled the robot’s successful human-robot interaction in this broad and challenging forum. We describe in detail a novel person recognition algorithm, a key component of our overall success, that included two co-trained classifiers, each focusing on different aspects of the person (face and shirt color).
Wide-angle Micro Sensors for Vision on a Tight Budget
Cited by 1 (1 self)
Achieving computer vision on micro-scale devices is a challenge. On these platforms, the power and mass constraints are severe enough for even the most common computations (matrix manipulations, convolution, etc.) to be difficult. This paper proposes and analyzes a class of miniature vision sensors that can help overcome these constraints. These sensors reduce power requirements through template-based optical convolution, and they enable a wide field-of-view within a small form through a novel optical design. We describe the trade-offs between the field of view, volume, and mass of these sensors and we provide analytic tools to navigate the design space. We also demonstrate milli-scale prototypes for computer vision tasks such as locating edges, tracking targets, and detecting faces.
Difference Images
2005
- Generative: model p(z, s) = p(s) p(z | s); then use Bayes’ rule to infer p(s | z).
- Discriminative: model p(s | z) directly.
- Here we could think of s as the existence of a pedestrian at an image location, z as the image.
- We cannot generate new images of pedestrians, but why would we want to?

Modeling p(s | z)
- Don’t really need the actual posterior distribution.
- Just need to know where the pedestrians are.
- Idea: loop through patches of image at various scales with a pedestrian classifier.

Building the Classifier
- Feature-based approach.
- Define feature values with simple rectangle filter responses and a threshold.
- Define the classifier with a sum of selected features and a threshold.
- Use AdaBoost (Freund and Schapire, 1995) learning.
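
The classifier construction described for "Difference Images" can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the integral image is a standard trick for fast rectangle sums that the slides do not mention, the two-rectangle feature layout is one possible choice, and the thresholds, polarities, and weights are hand-picked rather than learned with AdaBoost.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def two_rect_feature(ii, top, left, height, width):
    """Rectangle filter response: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


def weak_classify(value, threshold, polarity):
    """Thresholded feature value: 1 for object, 0 for background."""
    return 1 if polarity * value < polarity * threshold else 0


def strong_classify(ii, weak_learners):
    """Weighted sum of weak decisions compared with half the total weight.

    weak_learners is a list of (rect, threshold, polarity, alpha) tuples;
    in a trained detector AdaBoost would choose these values.
    """
    total = sum(alpha for _, _, _, alpha in weak_learners)
    score = sum(alpha * weak_classify(two_rect_feature(ii, *rect), thr, pol)
                for rect, thr, pol, alpha in weak_learners)
    return score >= 0.5 * total


if __name__ == "__main__":
    # Hypothetical 4x4 patch that is bright on the left and dark on the right.
    patch = np.zeros((4, 4))
    patch[:, :2] = 1.0
    learners = [((0, 0, 4, 4), 4.0, -1, 1.0)]  # hand-picked, not learned
    print(strong_classify(integral_image(patch), learners))
```

To scan a full image, the same `strong_classify` call would be applied to each patch at each scale, which is the loop the slides describe.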
Conditional Random Fields for Multi-Camera Object Detection
We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time that this problem is addressed taking into account different object classes simultaneously. Given several images of the scene taken from different angles, our system estimates the ground plane location of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling of the entire scene. Our CRF is able to take into account occlusions between objects and contextual constraints among them. We propose an effective iterative strategy that renders the underlying optimization problem tractable, and learn the parameters of the model with the max-margin paradigm. We evaluate the performance of our model on several challenging multi-camera pedestrian detection datasets, namely PETS 2009 [5] and the EPFL terrace sequence [9]. We also introduce a new dataset in which multiple classes of objects appear simultaneously in the scene. It is here that we show that our method effectively handles occlusions in the multi-class case.

