Learning methods for generic object recognition with invariance to pose and lighting
In Proceedings of CVPR’04, 2004
Cited by 254 (18 self)
Abstract
We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features, were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVMs and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVMs proved impractical, while Convolutional Nets yielded 14% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
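As a point of reference for the simplest baseline named above, here is a minimal 1-nearest-neighbor sketch on raw pixels. It assumes each low-resolution grayscale image has been flattened into a list of floats and that Euclidean distance is the metric; the helper names and toy data are hypothetical, not taken from the paper. The test error rate is the fraction of unseen instances assigned the wrong category.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened images of equal size."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_predict(train_x: List[Sequence[float]], train_y: List[str],
               query: Sequence[float]) -> str:
    """Label of the training image closest to `query` (1-nearest neighbor)."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]


def error_rate(train_x, train_y, test_x, test_y) -> float:
    """Fraction of test images whose predicted category is wrong."""
    wrong = sum(nn_predict(train_x, train_y, x) != y for x, y in zip(test_x, test_y))
    return wrong / len(test_x)


# Toy usage: 2x2 "images" flattened to 4 pixels (hypothetical data).
train_x = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
train_y = ["car", "truck"]
print(error_rate(train_x, train_y, [[0.1, 0.0, 0.2, 0.0]], ["car"]))  # prints 0.0
```

Every query is compared against every stored training image, which is why nearest-neighbor methods become expensive as the training set grows.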
Discriminant Analysis of Principal Components for Face Recognition
1998
Cited by 247 (12 self)
Abstract
In this paper we describe a face recognition method based on PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis). The method consists of two steps: first, we project the face image from the original vector space to a face subspace via PCA; second, we use LDA to obtain a linear classifier. The basic idea of combining PCA and LDA is to improve the generalization capability of LDA when only a few samples per class are available. Using the FERET dataset, we demonstrate a significant improvement when principal components rather than original images are fed to the LDA classifier. The hybrid classifier using PCA and LDA provides a useful framework for other image recognition tasks as well.
1 Introduction
The problem of automatic face recognition is a composite task that involves detection and location of faces in a cluttered background, normalization, recognition, and verification. Depending on the nature of the application, e.g. sizes of training and testing database, clutter...
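The two-step PCA-then-LDA pipeline can be sketched in pure Python. This is a simplified illustration under stated assumptions: only two classes, a fixed 2-dimensional PCA subspace, and power iteration with deflation as the eigensolver (the abstract does not specify an eigensolver, and the paper handles many face classes). All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def mean(rows: Mat) -> Vec:
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: Mat, mu: Vec) -> Mat:
    """Sum of outer products of centred rows (an unnormalised covariance)."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += c[i] * c[j]
    return s


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def pca_basis(rows: Mat, k: int, iters: int = 500) -> Tuple[Vec, Mat]:
    """Mean and top-k principal directions via power iteration with deflation."""
    mu = mean(rows)
    s = scatter(rows, mu)
    basis: Mat = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(len(mu))]  # deterministic start vector
        for _ in range(iters):
            w = [dot(row, v) for row in s]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in s])  # Rayleigh quotient
        # Deflate: remove the component just found so the next pass finds the next one
        s = [[s[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
        basis.append(v)
    return mu, basis


def project(x: Vec, mu: Vec, basis: Mat) -> Vec:
    centred = [a - m for a, m in zip(x, mu)]
    return [dot(centred, b) for b in basis]


def fit_pca_lda(class0: Mat, class1: Mat) -> Callable[[Vec], int]:
    """Step 1: PCA to 2 dims. Step 2: two-class Fisher LDA in that subspace."""
    mu, basis = pca_basis(class0 + class1, k=2)
    z0 = [project(x, mu, basis) for x in class0]
    z1 = [project(x, mu, basis) for x in class1]
    m0, m1 = mean(z0), mean(z1)
    s0, s1 = scatter(z0, m0), scatter(z1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]  # within-class scatter Sw
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction w = Sw^{-1} (m1 - m0), via the closed-form 2x2 inverse
    w = [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]
    threshold = dot(w, [(p + q) / 2 for p, q in zip(m0, m1)])

    def predict(x: Vec) -> int:
        """Return 1 for class1, 0 for class0."""
        return int(dot(w, project(x, mu, basis)) > threshold)

    return predict
```

Reducing the dimension before LDA keeps the within-class scatter matrix small and, with few samples per class, invertible; this is one common reading of why the hybrid generalizes better than LDA on raw images.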
Support vector machines: Training and applications
A.I. Memo 1602, MIT A.I. Lab, 1997
Cited by 221 (3 self)
Abstract
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Laboratories [3, 6, 8, 24]. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function, and Multilayer Perceptron classifiers. The main idea behind the technique is to separate the classes with a surface that maximizes the margin between them. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle [23]. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical interpretation are discussed in this paper. Since Structural Risk Minimization is an inductive principle that aims at minimizing a bound on the generalization error of a model, rather than minimizing the Mean Square Error over the data set (as Empirical Risk Minimization methods do), training an SVM to obtain the maximum margin classifier requires a different objective function. This objective function is then optimized by solving a large-scale quadratic programming problem with linear and box constraints. The problem is considered challenging because the quadratic form is completely dense, so the memory ...
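The memo trains SVMs by solving a dense quadratic program. As a lighter-weight illustration of the same maximum-margin idea, the sketch below instead minimizes the regularized hinge loss of a linear SVM in the primal by full-batch subgradient descent. This is a swapped-in training method, not the memo's QP approach, and `lam`, `lr`, and `epochs` are illustrative defaults.

```python
from typing import List, Tuple


def train_linear_svm(xs: List[List[float]], ys: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Minimise lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent.

    Labels must be +1 or -1. The bias b is not regularised.
    """
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wi for wi in w]  # gradient of the regulariser
        gb = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for i in range(d):
                    gw[i] -= y * x[i] / n
                gb -= y / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_predict(w: List[float], b: float, x: List[float]) -> int:
    """Sign of the decision function w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Only points with margin below 1 contribute to the update, which mirrors the SVM property that the solution is determined by the points on or inside the margin.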
Rotation invariant neural network-based face detection
1998
Cited by 217 (4 self)
Abstract
In this paper, we present a neural network-based face detection system. Unlike similar systems, which are limited to detecting upright, frontal faces, this system detects faces at any degree of rotation in the image plane. The system employs multiple networks: a “router” network first processes each input window to determine its orientation and then uses this information to prepare the window for one or more “detector” networks. We present the training methods for both types of networks. We also perform sensitivity analysis on the networks and present empirical results on a large test set. Finally, we present preliminary results for detecting faces rotated out of the image plane, such as profiles and semi-profiles.
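The router-then-detector control flow can be sketched as follows. `router` and `detector` are hypothetical callables standing in for the trained networks (the router is assumed to return the in-plane angle in degrees), and the nearest-neighbor de-rotation is an assumed implementation detail rather than the paper's.

```python
import math
from typing import Callable, List

Image = List[List[float]]


def rotate(window: Image, degrees: float) -> Image:
    """Rotate a window about its centre using nearest-neighbour sampling.

    Output pixels whose source falls outside the input are filled with 0.0.
    """
    h, w = len(window), len(window[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: which source pixel lands at output (y, x)?
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = window[iy][ix]
    return out


def detect_rotated(window: Image,
                   router: Callable[[Image], float],
                   detector: Callable[[Image], bool]) -> bool:
    """Router estimates the window's angle; the window is de-rotated; detector decides."""
    angle = router(window)
    upright = rotate(window, -angle)
    return detector(upright)
```

The design keeps the detector simple: it only ever sees upright candidates, and the burden of rotation invariance is moved to the router.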
Unsupervised learning of invariant feature hierarchies with application to object recognition
CVPR, 2007
Cited by 188 (17 self)
Abstract
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a pointwise sigmoid nonlinearity and a feature-pooling layer that computes the max of each filter output within adjacent windows. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST and a 54% average recognition rate on Caltech 101.
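One stage of the feature extractor described above (convolution filters, pointwise sigmoid, max over adjacent windows) can be sketched in pure Python. The filters are passed in as fixed inputs here; the unsupervised procedure that learns them is not shown, and the "valid" (no padding) convolution and non-overlapping pooling windows are assumptions.

```python
import math
from typing import List

Image = List[List[float]]


def conv2d_valid(img: Image, kernel: Image) -> Image:
    """2-D 'valid' cross-correlation (no padding), as is usual in conv nets."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]


def sigmoid_map(fmap: Image) -> Image:
    """Pointwise logistic sigmoid."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in fmap]


def max_pool(fmap: Image, size: int) -> Image:
    """Max over non-overlapping size x size windows (ragged edges dropped)."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


def feature_stage(img: Image, filters: List[Image], pool: int) -> List[Image]:
    """One stage: convolution -> sigmoid -> max pooling, one output map per filter."""
    return [max_pool(sigmoid_map(conv2d_valid(img, k)), pool) for k in filters]
```

Because the max is taken over each pooling window, moving a feature by a pixel within the same window leaves the pooled output unchanged, which is where the small-shift invariance comes from. A second level would apply `feature_stage` again to these output maps.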
Estimating 3D Hand Pose From a Cluttered Image
2003
Cited by 172 (7 self)
Abstract
A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
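The paper approximates the chamfer distance through an embedding; for reference, the quantity being approximated can be computed by brute force over two sets of edge points. This sketch uses the mean nearest-neighbor distance, one of several normalizations in use, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_chamfer(src: List[Point], dst: List[Point]) -> float:
    """Mean distance from each edge point in `src` to its nearest edge point in `dst`."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def symmetric_chamfer(a: List[Point], b: List[Point]) -> float:
    """Sum of both directed distances."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)


# Hypothetical edge points: (5, 5) plays the role of background clutter.
model = [(0.0, 0.0), (1.0, 0.0)]
image = [(0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(directed_chamfer(model, image))  # prints 1.0
```

The two directions behave differently under clutter: the far-away point (5, 5) inflates `directed_chamfer(image, model)` but leaves `directed_chamfer(model, image)` at 1.0. The brute-force cost, proportional to the product of the two point counts per comparison, is what makes an efficient approximation attractive for database indexing.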
Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis
1999
Cited by 134 (4 self)
Abstract
Automatically detecting and recognizing human faces in digital images strongly enhances content-based video indexing systems. In this paper, a novel scheme for human face detection in color images under unconstrained scene conditions, such as the presence of a complex background and uncontrolled illumination, is presented. Color clustering and filtering using approximations of the YCbCr and HSV skin color subspaces are applied to the original image, providing quantized skin color regions. A merging stage is then iteratively performed on the set of homogeneous skin color regions in the color-quantized image in order to provide a set of potential face areas. Constraints related to the shape and size of faces are applied, and face intensity texture is analyzed by performing a wavelet packet decomposition on each face area candidate in order to detect human faces. The wavelet coefficients of the band-filtered images characterize the face texture, and a set of simple statistical deviations is ...
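The YCbCr part of the skin-color step can be sketched as a per-pixel test. The conversion below is the standard full-range BT.601 (JPEG/JFIF) formula; the rectangular Cb/Cr box is an assumed set of illustrative thresholds, not the paper's subspace approximation, and the HSV test, clustering, and region-merging stages are not shown.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range ITU-R BT.601 conversion as used by JPEG/JFIF (0..255 inputs)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed, illustrative skin box in the Cb/Cr plane (not taken from the paper).
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def is_skin(r: float, g: float, b: float) -> bool:
    """Chrominance-only test: luma Y is ignored to reduce sensitivity to illumination."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(pixels: List[List[RGB]]) -> List[List[bool]]:
    """Map a 2-D grid of (r, g, b) tuples to a boolean skin mask."""
    return [[is_skin(*p) for p in row] for row in pixels]
```

Testing only Cb and Cr is a common way to make a skin test less sensitive to brightness changes, which fits the abstract's emphasis on uncontrolled illumination.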
Robust Principal Component Analysis for Computer Vision
2001
Cited by 133 (3 self)
Abstract
Principal Component Analysis (PCA) has been widely used for the representation of shape, appearance, and motion. One drawback of typical PCA methods is that they are least-squares estimation techniques and hence fail to account for "outliers", which are common in realistic training sets. In computer vision applications, outliers typically occur within a sample (image) due to pixels that are corrupted by noise, alignment errors, or occlusion. We review previous approaches for making PCA robust to outliers and present a new method that uses an intra-sample outlier process to account for pixel outliers. We develop the theory of Robust Principal Component Analysis (RPCA) and describe a robust M-estimation algorithm for learning linear multivariate representations of high-dimensional data such as images. Quantitative comparisons with traditional PCA and previous robust algorithms illustrate the benefits of RPCA when outliers are present. Details of the algorithm are described, and a software implementation is being made publicly available.
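The robustness in RPCA comes from M-estimation: residuals pass through a robust ρ function, so outlying pixels receive little weight. The sketch below shows that mechanism on the simplest possible case, a robust estimate of a mean computed by iteratively reweighted least squares (IRLS) with the Geman-McClure function ρ(r) = r²/(r² + σ²). It is not the RPCA algorithm itself, and `sigma` and the iteration count are assumptions.

```python
from statistics import median
from typing import List


def geman_mcclure_weight(r: float, sigma: float) -> float:
    """IRLS weight psi(r)/r for rho(r) = r^2/(r^2 + sigma^2), up to the constant 2."""
    return sigma ** 2 / (r ** 2 + sigma ** 2) ** 2


def robust_mean(xs: List[float], sigma: float = 1.0, iters: int = 50) -> float:
    """Robust location estimate by iteratively reweighted least squares."""
    mu = median(xs)  # robust starting point
    for _ in range(iters):
        ws = [geman_mcclure_weight(x - mu, sigma) for x in xs]
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu


data = [1.0, 1.1, 0.9, 1.0, 100.0]  # one gross outlier
print(sum(data) / len(data), robust_mean(data))  # plain mean 20.8 vs. a value near 1.0
```

The least-squares mean is dragged to 20.8 by a single corrupted value, while the robust estimate stays near 1.0 because the outlier's weight collapses to roughly 1e-8. RPCA applies the same down-weighting per pixel when fitting the principal subspace.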
Diffusion snakes: introducing statistical shape knowledge into the Mumford-Shah functional
J. of Computer Vision, 2002
Cited by 130 (15 self)
Abstract
We present a modification of the Mumford-Shah functional and its cartoon limit that facilitates the incorporation of a statistical prior on the shape of the segmenting contour. By minimizing a single energy functional, we obtain a segmentation process that maximizes both the grey value homogeneity in the separated regions and the similarity of the contour with respect to a set of training shapes. We propose a closed-form, parameter-free solution for incorporating invariance with respect to similarity transformations in the variational framework. We show segmentation results on artificial and real-world images with and without prior shape information. In the presence of noise, occlusion, or strongly cluttered background, the shape prior significantly improves segmentation. Finally, we compare our results to those obtained by a level set implementation of geodesic active contours.
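The grey-value homogeneity part of the energy can be illustrated with a discrete, two-region piecewise-constant (cartoon-limit) Mumford-Shah energy. The sketch omits the diffusion, the contour parameterization, and the statistical shape prior, and it approximates contour length by counting 4-neighbor label changes; these are simplifying assumptions for illustration.

```python
from typing import List


def cartoon_energy(img: List[List[float]], mask: List[List[bool]], nu: float) -> float:
    """Two-region piecewise-constant (cartoon-limit) Mumford-Shah energy.

    Data term: squared deviation of each pixel from its region's mean grey value.
    Length term: nu times the number of 4-neighbour pixel pairs with different
    labels, a crude discrete stand-in for the contour length.
    """
    h, w = len(img), len(img[0])
    data = 0.0
    for label in (True, False):
        vals = [img[y][x] for y in range(h) for x in range(w) if mask[y][x] == label]
        if vals:
            m = sum(vals) / len(vals)
            data += sum((v - m) ** 2 for v in vals)
    length = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and mask[y][x] != mask[y][x + 1]:
                length += 1
            if y + 1 < h and mask[y][x] != mask[y + 1][x]:
                length += 1
    return data + nu * length
```

A segmentation that follows the true edge pays only the length term, while one that ignores it pays a large homogeneity penalty; a shape prior would add a third term that rewards contours resembling the training shapes.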
A Statistical Approach to 3D Object Detection Applied to Faces and Cars
2000
Cited by 102 (1 self)
Abstract
In this thesis, we describe a statistical method for 3D object detection. In this method, we decompose the 3D geometry of each object into a small number of viewpoints. For each viewpoint, we construct a decision rule that determines if the object is present at that specific orientation. Each decision rule uses the statistics of both object appearance and “non-object” visual appearance. We represent each set of statistics using a product of histograms. Each histogram represents the joint statistics of a subset of wavelet coefficients and their position on the object. Our approach is to use many such histograms representing a wide variety of visual attributes. Using this method, we have developed the first algorithm that can reliably detect faces that vary from frontal view to full profile view and the first algorithm that can reliably detect cars over a wide range of viewpoints.
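The decision rule amounts to a likelihood-ratio test in which each class-conditional probability is approximated by a product of histograms, so the log ratio becomes a sum of per-histogram log ratios. The sketch assumes the wavelet coefficients have already been quantized into hashable patterns that include position (that quantization is not shown); Laplace smoothing, the class name, and the zero threshold are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


class HistogramPair:
    """Object and non-object histograms for one subset of quantised features."""

    def __init__(self, num_bins: int):
        self.num_bins = num_bins
        self.obj = Counter()
        self.non = Counter()

    def add(self, pattern: Hashable, is_object: bool) -> None:
        (self.obj if is_object else self.non)[pattern] += 1

    def log_ratio(self, pattern: Hashable) -> float:
        # Laplace smoothing so patterns unseen in training never give log(0)
        p_obj = (self.obj[pattern] + 1) / (sum(self.obj.values()) + self.num_bins)
        p_non = (self.non[pattern] + 1) / (sum(self.non.values()) + self.num_bins)
        return math.log(p_obj / p_non)


def classify(hists: Sequence[HistogramPair], patterns: Sequence[Hashable],
             threshold: float = 0.0) -> bool:
    """Product of histogram ratios -> sum of log ratios; object if above threshold."""
    return sum(h.log_ratio(p) for h, p in zip(hists, patterns)) > threshold
```

Working in log space turns the product of many small probabilities into a numerically stable sum, and the threshold trades detection rate against false positives.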