An integrated Bayesian approach to layer extraction from image sequences
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Cited by 124 (19 self)
This paper describes a Bayesian approach for modeling 3D scenes as a collection of approximately planar layers that are arbitrarily positioned and oriented in the scene. In contrast to much of the previous work on layer-based motion modeling, which computes layered descriptions of 2D image motion, our work leads to a 3D description of the scene. There are two contributions within the paper. The first is to formulate the prior assumptions about the layers and scene within a Bayesian decision-making framework, which is used to automatically determine the number of layers and the assignment of individual pixels to layers. The second is algorithmic: to achieve the optimization, a Bayesian version of RANSAC is developed with which to initialize the segmentation, and a generalized expectation-maximization method is then used to find the MAP solution. Index Terms: Layer extraction, segmentation, stereo matching, motion estimation.
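The segmentation is initialized with a Bayesian version of RANSAC. For reference, the sketch below shows plain, classical (non-Bayesian) RANSAC fitting a single plane to 3D points; it is not the paper's variant. The function names, the inlier threshold and the iteration count are illustrative assumptions.

```python
import random

def plane_from_points(p, q, r):
    """Plane through three 3D points as (unit normal n, offset d) with n.x + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d

def ransac_plane(points, iters=200, thresh=0.05, seed=0):
    """Classical RANSAC: keep the 3-point plane hypothesis with the most inliers."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate (collinear) sample
        n, d = model
        inliers = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) + d) < thresh]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers

# Usage: a 10 x 10 grid on the plane z = 1 plus three off-plane outliers.
pts = [(x * 0.1, y * 0.1, 1.0) for x in range(10) for y in range(10)]
pts += [(0.5, 0.5, 3.0), (0.2, 0.7, -2.0), (0.9, 0.1, 5.0)]
plane, inliers = ransac_plane(pts)
```

The Bayesian version replaces the raw inlier count with a posterior score, which is also what lets the number of layers be decided rather than fixed in advance.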
A Layered Approach to Stereo Reconstruction
1998
Cited by 116 (24 self)
We propose a framework for extracting structure from stereo which represents the scene as a collection of approximately planar layers. Each layer consists of an explicit 3D plane equation, a colored image with per-pixel opacity (a sprite), and a per-pixel depth offset relative to the plane. Initial estimates of the layers are recovered using techniques taken from parametric motion estimation. These initial estimates are then refined using a resynthesis algorithm which takes into account both occlusions and mixed pixels. Reasoning about such effects allows the recovery of depth and color information with high accuracy, even in partially occluded regions. Another important benefit of our framework is that the output consists of a collection of approximately planar regions, a representation which is far more appropriate than a dense depth map for many applications such as rendering and video parsing.
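To make the layer representation concrete, here is a minimal sketch of how a pixel's depth could follow from a layer's plane equation plus its per-pixel offset. It assumes a pinhole camera with pixel coordinates measured from the principal point, and it assumes the offset is simply added to the plane depth. The class and field names are hypothetical, and the sprite colors and opacities are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """Hypothetical layer: plane a*X + b*Y + c*Z + d = 0 in camera coordinates,
    plus optional per-pixel depth offsets keyed by pixel (x, y)."""
    normal: tuple
    d: float
    offsets: dict = field(default_factory=dict)

    def depth(self, x, y, f):
        """Depth Z where the viewing ray through pixel (x, y) meets the plane,
        plus this pixel's offset. The ray is X = Z * (x/f, y/f, 1), with f the
        focal length in pixels (a pinhole-camera assumption)."""
        a, b, c = self.normal
        denom = a * x / f + b * y / f + c
        if abs(denom) < 1e-12:
            raise ValueError("viewing ray is parallel to the layer plane")
        z_plane = -self.d / denom
        return z_plane + self.offsets.get((x, y), 0.0)

# Usage: a fronto-parallel plane Z = 5 with one pixel pushed 0.25 units back.
wall = Layer(normal=(0.0, 0.0, 1.0), d=-5.0, offsets={(1, 2): 0.25})
```

Keeping the plane explicit means most pixels need no stored depth at all. Only the small residual offsets capture departures from planarity.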
Modelling and interpretation of architecture from several images
Cited by 111 (6 self)
The modelling of 3-dimensional (3D) environments has become a requirement for many applications in engineering design, virtual reality, visualisation and entertainment. However, the scale and complexity demanded from such models has risen to the point where the acquisition of 3D models can require a vast amount of specialist time and equipment. Because of this, much research has been undertaken in the computer vision community into automating all or part of the process of acquiring a 3D model from a sequence of images. This thesis focuses specifically on the automatic acquisition of architectural models from short image sequences. An architectural model is defined as a set of planes corresponding to walls which contain a variety of labelled primitives such as doors and windows. As well as a label defining its type, each primitive contains parameters defining its shape and texture. The key advantage of this representation is that the model defines not only geometry and texture, but also an interpretation of the scene. This is crucial as it enables reasoning about the scene; for instance, structure and texture can be inferred in areas of the model which are unseen in any ...
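The model definition above (walls as planes, each holding labelled primitives with shape and texture parameters) maps naturally onto a small data structure. The sketch below is a hypothetical illustration of that description, not the thesis's actual representation. The field names and the parameterisation of shape as a dict are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Primitive:
    label: str          # type label, e.g. "door" or "window"
    shape: dict         # hypothetical shape parameters (position and size on the wall)
    texture: str = ""   # hypothetical texture reference

@dataclass
class Wall:
    normal: tuple       # plane normal (a, b, c)
    offset: float       # plane offset d in a*X + b*Y + c*Z + d = 0
    primitives: list = field(default_factory=list)

@dataclass
class ArchitecturalModel:
    walls: list = field(default_factory=list)

    def find(self, label):
        """Return (wall index, primitive) pairs with the given label; the
        labels are what make queries such as 'all windows' possible."""
        return [(i, p) for i, w in enumerate(self.walls)
                for p in w.primitives if p.label == label]

# Usage: one wall with two windows and a door.
facade = Wall((0.0, 0.0, 1.0), -10.0, [
    Primitive("window", {"x": 1.0, "y": 2.0, "w": 1.2, "h": 1.5}),
    Primitive("window", {"x": 4.0, "y": 2.0, "w": 1.2, "h": 1.5}),
    Primitive("door", {"x": 2.5, "y": 0.0, "w": 1.0, "h": 2.2}),
])
model = ArchitecturalModel([facade])
```

Because every primitive carries a type label, reasoning about the scene becomes a query over the model, for example copying a window's shape and texture to a part of the wall that was never observed.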
Object tracking with Bayesian estimation of dynamic layer representations
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 105 (6 self)
Decomposing video frames into coherent two-dimensional motion layers is a powerful method for representing videos. Such a representation provides an intermediate description that enables applications such as object tracking, video summarization and visualization, video insertion, and sprite-based video compression. Previous work on motion layer analysis has largely concentrated on two-frame or multi-frame batch formulations. The temporal coherency of motion layers and the domain constraints on shapes have not been exploited. This paper introduces a complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion, and layer appearance are modeled and estimated in a maximum a posteriori (MAP) framework using the generalized expectation-maximization (EM) algorithm. In order to limit the computational complexity of tracking arbitrarily shaped layer ownership, we propose a shape prior that parameterizes the representation of shape and prevents motion layers from evolving into arbitrary shapes. In this work, a Gaussian shape prior is chosen specifically to develop a near-real-time tracker for vehicle tracking in aerial videos. However, the general idea of using a parametric shape representation as part of the state of a tracker is a powerful one that can be extended to other domains as well. Based on the dynamic layer representation, an iterative algorithm is developed for continuous object tracking over time. The proposed method has been successfully applied in an airborne vehicle tracking system. Its performance is compared with that of a correlation-based tracker and a motion-change-based tracker to demonstrate the advantages of the new method. Examples of tracking when the backgrounds are cluttered and the vehicles undergo various rigid motions and complex interactions such as passing, turning, and stop-and-go demonstrate the strength of the complete dynamic layer representation.
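A Gaussian shape prior keeps a layer's ownership map blob-shaped instead of letting it drift into an arbitrary shape. Here is a minimal sketch under stated assumptions: each layer's prior at a pixel is an unnormalized 2D Gaussian in the pixel coordinates, and layers compete against a hypothetical constant background weight. The paper's exact parameterization and its combination with motion and appearance terms may differ.

```python
import math

def gaussian_shape_prior(x, y, mean, cov):
    """Unnormalized 2D Gaussian exp(-0.5 * Mahalanobis distance^2) of pixel (x, y).

    cov = ((sxx, sxy), (syx, syy)) is assumed symmetric positive definite.
    """
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if det <= 0 or sxx <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x - mean[0], y - mean[1]
    # d^T cov^{-1} d using the closed-form inverse of a 2x2 matrix
    m2 = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * m2)

def ownership(x, y, layers, background=0.1):
    """Normalize each layer's shape prior against a constant background weight.

    layers is a list of (mean, cov) pairs; returns (per-layer weights, background weight).
    """
    weights = [gaussian_shape_prior(x, y, m, c) for m, c in layers]
    total = sum(weights) + background
    return [w / total for w in weights], background / total

# Usage: an elongated blob centred at (50, 20), evaluated at its centre.
car = ((50.0, 20.0), ((100.0, 0.0), (0.0, 9.0)))
layer_w, bg_w = ownership(50.0, 20.0, [car])
```

Since the shape is summarized by a mean and a covariance, it can sit in the tracker's state and be predicted from frame to frame like the motion parameters.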
Twist Based Acquisition and Tracking of Animal and Human Kinematics
International Journal of Computer Vision, 2004
Cited by 97 (1 self)
This paper demonstrates a new visual motion estimation technique that is able to recover high-degree-of-freedom articulated human body configurations in complex video sequences. We introduce the use and integration of a mathematical technique, the product of exponential maps and twist motions, into differential motion estimation. This results in solving simple linear systems, and enables us to robustly recover the kinematic degrees of freedom in the presence of noise and in complex self-occluded configurations. A new factorization technique also lets us recover the kinematic chain model itself. We are able to track several human walk cycles, several wallaby hop cycles, and two walk cycles from Eadweard Muybridge's famous motion studies of the 19th century. To the best of our knowledge, this is the first computer-vision-based system that is able to process such challenging footage.
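The product of exponentials describes a kinematic chain's pose as a product of one exponential map per joint. The sketch below is a simplified planar (SE(2)) version with revolute joints only, assuming a reference configuration given by each joint's pivot point. The paper works with full 3D twists and estimates the joint angles from image motion, which is not shown here.

```python
import math

def revolute_exp(theta, q):
    """SE(2) exponential of a revolute twist about pivot q: returns (R, t) with x -> R x + t."""
    c, s = math.cos(theta), math.sin(theta)
    R = ((c, -s), (s, c))
    # translation (I - R) q keeps the pivot fixed
    t = (q[0] - (c * q[0] - s * q[1]), q[1] - (s * q[0] + c * q[1]))
    return R, t

def apply(g, p):
    R, t = g
    return (R[0][0] * p[0] + R[0][1] * p[1] + t[0],
            R[1][0] * p[0] + R[1][1] * p[1] + t[1])

def compose(g1, g2):
    """Return g1 * g2 (apply g2 first, then g1)."""
    R1, _ = g1
    R2, t2 = g2
    R = tuple(tuple(sum(R1[i][k] * R2[k][j] for k in range(2)) for j in range(2))
              for i in range(2))
    return R, apply(g1, t2)

def forward_kinematics(thetas, pivots, p0):
    """Product of exponentials exp(xi_1 th_1) ... exp(xi_n th_n) applied to reference point p0."""
    g = (((1.0, 0.0), (0.0, 1.0)), (0.0, 0.0))  # identity transform
    for th, q in zip(thetas, pivots):
        g = compose(g, revolute_exp(th, q))
    return apply(g, p0)

# Usage: a two-link arm with unit links, joints at (0, 0) and (1, 0), tip at (2, 0).
tip = forward_kinematics([math.pi / 2, math.pi / 2], [(0.0, 0.0), (1.0, 0.0)], (2.0, 0.0))
```

Every joint is parameterized by a single angle, so the pose is linear in the joint angles to first order. That property is what turns the motion update into the simple linear systems mentioned in the abstract.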
Stereo Matching with Transparency and Matting
International Journal of Computer Vision, 1998
Cited by 97 (17 self)
This paper formulates and solves a new variant of the stereo correspondence problem: simultaneously recovering the disparities, true colors, and opacities of visible surface elements. This problem arises in newer applications of stereo reconstruction, such as view interpolation and the layering of real imagery with synthetic graphics for special effects and virtual studio applications. While this problem is intrinsically more difficult than traditional stereo correspondence, where only the disparities are being recovered, it provides a principled way of dealing with commonly occurring problems such as occlusions and the handling of mixed (foreground/background) pixels near depth discontinuities. It also provides a novel means for separating foreground and background objects (matting), without the use of a special blue screen. We formulate the problem as the recovery of colors and opacities in a generalized 3D (x, y, d) disparity space, and solve the problem using a combination of initial evidence aggregation followed by iterative energy minimization.
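At a single (x, y), the generalized disparity space holds a color and an opacity for each disparity d. The sketch below shows the forward model, compositing those samples front to back with the standard "over" operator, and how a mixed pixel arises at a depth discontinuity. It assumes grayscale colors and that larger disparity means nearer the camera. Recovering the colors and opacities from the images, which is the paper's actual problem, is not shown.

```python
def composite_pixel(colors, alphas):
    """Front-to-back 'over' compositing along the disparity axis at one (x, y).

    colors[d] and alphas[d] are the (grayscale) color and opacity stored at
    disparity d. Larger disparity is nearer the camera, so d is visited from
    the largest value down. Returns (composite color, total opacity).
    """
    if len(colors) != len(alphas):
        raise ValueError("colors and alphas must have the same length")
    color, transmittance = 0.0, 1.0
    for d in reversed(range(len(colors))):
        color += transmittance * alphas[d] * colors[d]
        transmittance *= 1.0 - alphas[d]
    return color, 1.0 - transmittance

# Usage: a half-transparent foreground sample (d = 1) over an opaque background
# sample (d = 0), i.e. a mixed pixel at a depth discontinuity.
mixed = composite_pixel([10.0, 200.0], [1.0, 0.5])
```

A fractional opacity at the front disparity is what lets such a pixel be split between foreground and background instead of being forced onto one surface. The same split is what provides matting without a blue screen.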
Robustly Estimating Changes in Image Appearance
Computer Vision and Image Understanding, 2000
Cited by 72 (2 self)
In this paper we formulate a robust statistical framework for representing certain classes of appearance changes. In so doing we have three primary goals. First, we wish to "explain" appearance changes in an image sequence as resulting from a "mixture" of causes. Second, we wish to locate where particular types of appearance change are taking place in an image. Third, we want to provide a framework that generalizes previous work on motion estimation.
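"Explaining" a pixel's change as a mixture of causes typically involves computing, for each pixel, how strongly each cause accounts for it. The sketch below is the E-step of a plain Gaussian mixture over per-cause residuals. The noise scales and mixing proportions are hypothetical inputs, and the paper's robust formulation is not reproduced here.

```python
import math

def ownership_weights(residuals, sigmas, priors):
    """Posterior probability that each candidate cause explains one pixel.

    residuals[k] is the pixel's residual under cause k; sigmas[k] and priors[k]
    are hypothetical per-cause noise scales and mixing proportions.
    """
    lik = [p * math.exp(-0.5 * (r / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)
           for r, s, p in zip(residuals, sigmas, priors)]
    total = sum(lik)
    if total == 0.0:
        # every likelihood underflowed: fall back to a uniform assignment
        return [1.0 / len(lik)] * len(lik)
    return [v / total for v in lik]

# Usage: cause 0 (say, motion) fits this pixel far better than cause 1.
w = ownership_weights([0.5, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Mapping these weights over the image answers the second goal above: it shows where each type of appearance change is taking place.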
Extraction of 2D motion trajectories and its application to hand gesture recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 68 (1 self)
We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions between consecutive frames are then matched to obtain two-view correspondences. Affine transformations are computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories. Index Terms: Motion segmentation, motion analysis, motion trajectory, American Sign Language, hand gesture recognition, time-delay neural network.
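The concatenation step can be pictured as chaining one affine map per consecutive frame pair. Below is a minimal sketch, assuming the per-pair affine transforms (which the paper estimates from matched regions) are already given as 2 x 3 matrices. The function names are hypothetical.

```python
def apply_affine(A, p):
    """A = ((a11, a12, tx), (a21, a22, ty)) maps p = (x, y) to (a11*x + a12*y + tx, a21*x + a22*y + ty)."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])

def trajectory(p0, affines):
    """Concatenate per-frame-pair affine pixel matches into one pixel trajectory."""
    traj = [p0]
    for A in affines:
        traj.append(apply_affine(A, traj[-1]))
    return traj

# Usage: a pixel on a region drifting 2 px right and 1 px up per frame for 3 frames.
path = trajectory((10.0, 5.0), [((1.0, 0.0, 2.0), (0.0, 1.0, -1.0))] * 3)
```

The resulting list of positions over time is the kind of input a time-delay neural network can consume as a sequence.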
A Robust Subspace Approach to Layer Extraction
2002
Cited by 64 (6 self)
Representing images with layers has many important applications, such as video compression, motion analysis, and 3D scene analysis. This paper presents a robust subspace approach to reliably extracting layers from images by taking advantage of the fact that homographies induced by planar patches in the scene form a low-dimensional linear subspace. Such a subspace provides not only a feature space where layers in the image domain are mapped onto denser and better-defined clusters, but also a constraint for detecting outliers in the local measurements, thus making the algorithm robust to outliers. By enforcing the subspace constraint, spatial and temporal redundancy from multiple frames is exploited simultaneously, and noise can be effectively reduced. The experimental results show that good layer descriptions are extracted.
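Using the subspace as an outlier constraint amounts to measuring how far each local measurement lies from the subspace. The sketch below assumes each measurement (for example, a local homography flattened into a vector) is already available and that a spanning set for the subspace is known. It builds an orthonormal basis by Gram-Schmidt and flags measurements whose residual exceeds a hypothetical threshold. In practice the subspace would be estimated from the measurements themselves, for instance by SVD, which is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis spanning the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis

def subspace_residual(x, basis):
    """Distance from x to the span of an orthonormal basis."""
    r = list(x)
    for b in basis:
        c = dot(x, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))

def flag_outliers(measurements, basis, thresh):
    """True for measurements too far from the subspace to be trusted."""
    return [subspace_residual(m, basis) > thresh for m in measurements]

# Usage: a 2D subspace of R^3 and two measurements, one inside it and one far outside.
basis = orthonormal_basis([(1, 0, 0), (0, 1, 0)])
flags = flag_outliers([(3, 4, 0), (1, 1, 5)], basis, thresh=1.0)
```

Projecting the surviving measurements onto the basis also removes the noise component orthogonal to the subspace, which is one way to read the noise-reduction claim above.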
Efficient spatiotemporal grouping using the Nyström method
Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001
Cited by 60 (5 self)
Spectral graph-theoretic methods have recently shown great promise for the problem of image segmentation, but due to their computational demands, applications of such methods to spatiotemporal data have been slow to appear. For even a short video sequence, the set of all pairwise voxel similarities is a huge quantity of data: one second of a W × H sequence captured at f Hz entails on the order of (W·H·f)² pairwise similarities. The contribution of this paper is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large spatiotemporal grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows extrapolation of the complete grouping solution using only a small number of "typical" samples. In doing so, we successfully exploit the fact that there are far fewer coherent groups in an image sequence than pixels.
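The savings can be made concrete by counting affinity entries. With n voxels, the full affinity matrix has n² entries. The Nyström approach needs only the m × m block A among m samples and the m × (n − m) block B between the samples and the remaining voxels; the unseen block is approximated as Bᵀ A⁻¹ B. The sketch below only counts entries and does not compute eigenvectors. The resolution, frame rate and sample count in the usage are arbitrary illustrative values, not figures from the paper.

```python
def similarity_counts(width, height, fps, seconds, samples):
    """Compare storage for full pairwise affinities vs. a Nystrom approximation.

    Full: n * n entries for n voxels. Nystrom: the m x m sample block A plus the
    m x (n - m) block B linking samples to the remaining voxels.
    """
    n = width * height * fps * seconds
    m = samples
    if not 0 < m <= n:
        raise ValueError("need 0 < samples <= number of voxels")
    full = n * n
    nystrom = m * m + m * (n - m)  # equals m * n
    return n, full, nystrom

# Usage: a toy 10 x 10 clip at 10 Hz for one second, with 100 sampled voxels.
n, full, nystrom = similarity_counts(10, 10, 10, 1, 100)
```

The ratio full / nystrom is n / m, so the reduction grows with the length and resolution of the sequence. The method pays off when a small m suffices, which is the case when there are far fewer coherent groups than pixels.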