Mean shift: A robust approach toward feature space analysis
 In PAMI
, 2002
Cited by 2401 (37 self)
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
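As an illustration of the recursive mean shift procedure mentioned in this abstract, the following is a minimal one-dimensional sketch, not the paper's implementation. It assumes a Gaussian kernel; the function name `mean_shift_mode`, the `bandwidth` parameter (standing in for the "resolution of the analysis"), and the sample data are all hypothetical.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Move `start` uphill to a nearby density mode (1D, Gaussian kernel).

    Hypothetical helper for illustration; the kernel choice and stopping
    rule are assumptions, not taken from the paper.
    """
    x = float(start)
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: no sample is within reach of x.
            return x
        # The weighted mean of the samples is the next estimate; the
        # difference new_x - x is the mean shift vector.
        new_x = sum(w * p for w, p in zip(weights, points)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated clusters; each start converges to its nearby mode.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
mode_a = mean_shift_mode(data, start=0.5, bandwidth=0.5)
mode_b = mean_shift_mode(data, start=5.5, bandwidth=0.5)
```

Running the procedure from many starting points and grouping the starts that reach the same mode gives one way to delineate clusters; the bandwidth controls how many modes survive.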
Layered Representation of Motion Video using Robust Maximum-Likelihood Estimation of Mixture Models and MDL Encoding
, 1995
Cited by 204 (4 self)
Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated effort into developing a precise mathematical formulation of the problem. This paper presents such a formulation based on maximum likelihood estimation of mixture models and the minimum description length (MDL) encoding principle. The three major issues in layered motion representation are: (i) how many motion models adequately describe image motion, (ii) what are the motion model parameters, and (iii) what is the spatial support layer for each motion model. In order to allow multiple models in the description of image motion, the likelihood function for change in intensity of a pixel is modeled a...
A Framework for Robust Subspace Learning
 International Journal of Computer Vision
, 2003
Cited by 177 (10 self)
Many computer vision, signal processing, and statistical problems can be posed as problems of learning low-dimensional linear or multilinear models. These models have been widely used for the representation of shape, appearance, motion, etc., in computer vision applications.
Compact Representations Of Videos Through Dominant And Multiple Motion Estimation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1996
Cited by 172 (0 self)
An explosion of online image and video data in digital form is already well underway. With the exponential rise in interactive information exploration and dissemination through the World Wide Web (WWW), the major inhibitors of rapid access to online video data are costs and management of capture and storage, lack of real-time delivery, and non-availability of content-based intelligent search and indexing techniques. The solutions for capture, storage and delivery may be on the horizon or a little beyond. However, even with rapid delivery, the lack of efficient authoring and querying tools for visual content-based indexing may still prevent video information from being used as widely as text and traditional tabular data currently are. In order to be able to non-linearly browse and index into videos through visual content, it is necessary to develop authoring tools that can automatically separate moving objects and significant components of the scene, and represent these in a compact ...
Estimating Optical Flow in Segmented Images using Variable-order Parametric Models with Local Deformations
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1996
Cited by 101 (4 self)
This paper presents a new model for estimating optical flow based on the motion of planar regions plus local deformations. The approach exploits brightness information to organize and constrain the interpretation of the motion by using segmented regions of piecewise smooth brightness to hypothesize planar regions in the scene. Parametric flow models are estimated in these regions in a two-step process which first computes a coarse fit and estimates the appropriate parameterization of the motion of the region (two, six, or eight parameters). The initial fit is refined using a generalization of the standard area-based regression approaches. Since the assumption of planarity is likely to be violated, we allow local deformations from the planar assumption in the same spirit as physically-based approaches which model shape using coarse parametric models plus local deformations. This parametric+deformation model exploits the strong constraints of parametric approaches while retaining the ada...
Cooperative Robust Estimation Using Layers of Support
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1991
Cited by 96 (5 self)
We present an approach to the problem of representing images that contain multiple objects or surfaces. Rather than use an edge-based approach to represent the segmentation of a scene, we propose a multi-layer estimation framework which uses support maps to represent the segmentation of the image into homogeneous chunks. This support-based approach can represent objects that are split into disjoint regions, or have surfaces that are transparently interleaved. Our framework is based on an extension of robust estimation methods which provide a theoretical basis for support-based estimation. The Minimum Description Length principle is used to decide how many support maps to use in describing a particular image. We show results applying this framework to heterogeneous interpolation and segmentation tasks on range and motion imagery. 1 Introduction Real-world perceptual systems must deal with complicated and cluttered environments. To succeed in such environments, a system must be able to r...
Model-based 2D&3D Dominant Motion Estimation for Mosaicing and Video Representation
, 1995
Cited by 60 (3 self)
It is fairly common in video sequences that a mostly fixed background (scene) is imaged with or without independently moving objects. The dominant background changes in the image plane mostly due to camera operations and motion (zoom, pan, tilt, track, etc.). In this paper we address the problem of computing the dominant image transformation over time and demonstrate how this can be used for efficient video representation through video mosaicing and image registration. We formulate the problem of dominant component estimation as that of model-based robust estimation using M-estimators with direct, multiresolution methods. In addition to the 2D affine and plane projective models that have been used in the past for describing image motion using direct methods, we also employ a true 3D model of motion and scene structure imaged with uncalibrated cameras. This model parameterizes the image motion as that due to a planar component and a parallax component. For rigid 3D scenes...
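To make the idea of robust estimation with M-estimators concrete, here is a minimal sketch of a one-dimensional M-estimate of location computed by iteratively reweighted least squares. It is an illustration under stated assumptions, not the paper's method: the Huber weight function, the threshold `k` (given directly in data units, with no separate scale estimate), the function name `huber_location`, and the sample values are all hypothetical.

```python
def huber_location(values, k=1.0, tol=1e-9, max_iter=100):
    """Robust location estimate with Huber weights (illustrative sketch).

    Assumes `k` is expressed in the same units as the data; a practical
    estimator would also estimate a scale for the residuals.
    """
    # Start from the (upper) median, which is already resistant to outliers.
    mu = sorted(values)[len(values) // 2]
    for _ in range(max_iter):
        # Residuals within k get full weight; larger ones are down-weighted.
        weights = [
            1.0 if abs(v - mu) <= k else k / abs(v - mu) for v in values
        ]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Five consistent measurements plus one gross outlier.
samples = [2.0, 2.1, 1.9, 2.05, 1.95, 40.0]
robust = huber_location(samples)
plain = sum(samples) / len(samples)
```

The plain mean is pulled far toward the outlier, while the reweighted estimate stays close to the dominant group of samples, which is the property that dominant motion estimation relies on.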
Robust Parameterized Component Analysis: Theory and Applications to 2D Facial Modeling
 Computer Vision and Image Understanding, 91:53–71
, 2002
Cited by 53 (12 self)
Principal Component Analysis (PCA) has been successfully applied to construct linear models of shape, gray-level, and motion. In particular, PCA has been widely used to model the variation in the appearance of people's faces. We extend previous work on facial modeling for tracking faces in video sequences as they undergo significant changes due to facial expressions. Here we develop person-specific facial appearance models (PSFAM), which use modular PCA to model complex intra-person appearance changes. Such models require aligned visual training data; in previous work, this has involved a time-consuming and error-prone hand alignment and cropping process. Instead, we introduce parameterized component analysis to learn a subspace that is invariant to affine (or higher order) geometric transformations. The automatic learning of a PSFAM given a training image sequence is posed as a continuous optimization problem and is solved with a mixture of stochastic and deterministic techniques, achieving subpixel accuracy.
Optical Flow Estimation
, 2005
"... This chapter provides a tutorial introduction to gradient-based optical flow estimation. We discuss least-squares and robust estimators, iterative coarse-to-fine refinement, different forms of parametric motion models, different conservation assumptions, probabilistic formulations, and robust mixtur ..."
Cited by 41 (4 self)
