Graphcut Textures: Image and Video Synthesis Using Graph Cuts, 2003
Cited by 267 (5 self)
In this paper we introduce a new algorithm for image and video texture synthesis. In our approach, patch regions from a sample image or video are transformed and copied to the output and then stitched together along optimal seams to generate a new (and typically larger) output. In contrast to other techniques, the size of the patch is not chosen a priori; instead, a graph cut technique is used to determine the optimal patch region for any given offset between the input and output texture. Unlike dynamic programming, our graph cut technique for seam optimization is applicable in any dimension. We specifically explore it in 2D and 3D to perform video texture synthesis in addition to regular image synthesis. We present approximate offset search techniques that work well in conjunction with the presented patch size optimization. We show results for synthesizing regular, random, and natural images and videos. We also demonstrate how this method can be used to interactively merge different images to generate new scenes.
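The seam-optimization idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes grayscale pixels, a 4-connected overlap grid of width at least 2, the pairwise matching cost M(s, t) = |A(s) - B(s)| + |A(t) - B(t)| between neighbouring pixels s and t of the old patch A and new patch B, and hard constraints tying the leftmost overlap column to A and the rightmost to B. A plain Edmonds-Karp max-flow stands in for whatever min-cut solver the paper uses, and the function names are hypothetical.

```python
from collections import deque

INF = float("inf")


def min_cut(num_nodes, edges, source, sink):
    """Edmonds-Karp max-flow on an undirected graph (hypothetical helper).

    Returns (cut cost, set of nodes on the source side of a minimum cut).
    """
    cap = [dict() for _ in range(num_nodes)]
    for u, v, c in edges:
        # An undirected edge of capacity c gives residual capacity c in both directions.
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v][u] = cap[v].get(u, 0) + c
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # Nodes still reachable from the source form the source side of a minimum cut.
            return flow, set(parent)
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck


def graphcut_seam(A, B):
    """Assign each overlap pixel to old patch A or new patch B along a minimum-cost seam.

    A and B are equally sized 2-D lists of grayscale values (width >= 2) covering the overlap.
    """
    h, w = len(A), len(A[0])

    def node(r, c):
        return r * w + c

    source, sink = h * w, h * w + 1  # source side = keep A, sink side = take B
    edges = []
    for r in range(h):
        for c in range(w):
            for r2, c2 in ((r, c + 1), (r + 1, c)):  # each 4-neighbour pair once
                if r2 < h and c2 < w:
                    # Assumed matching cost M(s, t) = |A(s) - B(s)| + |A(t) - B(t)|.
                    cost = abs(A[r][c] - B[r][c]) + abs(A[r2][c2] - B[r2][c2])
                    edges.append((node(r, c), node(r2, c2), cost))
        # Hard constraints: the leftmost column stays A, the rightmost becomes B.
        edges.append((source, node(r, 0), INF))
        edges.append((node(r, w - 1), sink, INF))
    cost, source_side = min_cut(h * w + 2, edges, source, sink)
    labels = [["A" if node(r, c) in source_side else "B" for c in range(w)] for r in range(h)]
    return cost, labels
```

Because the cut is computed over the whole overlap graph rather than along a single path, the same construction extends to 3-D (space-time) overlaps by adding temporal neighbour edges, which is the property the abstract contrasts with dynamic programming.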
Dynamic Textures, 2002
Cited by 223 (14 self)
Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties in time; examples include sea waves, smoke, foliage, and whirlwinds. We present a novel characterization of dynamic textures that poses the problems of modeling, learning, recognizing and synthesizing dynamic textures on a firm analytical footing. We borrow tools from system identification to capture the "essence" of dynamic textures; we do so by learning (i.e. identifying) models that are optimal in the sense of maximum likelihood or minimum prediction error variance. For the special case of second-order stationary processes, we identify the model sub-optimally in closed form. Once learned, a model has predictive power and can be used for extrapolating synthetic sequences to infinite length with negligible computational cost. We present experimental evidence that, within our framework, even low-dimensional models can capture very complex visual phenomena.
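A closed-form, suboptimal identification step of this kind can be sketched in NumPy along the lines of the widely used SVD-based procedure. The sketch assumes the linear model x_{t+1} = A x_t + B v_t, y_t = C x_t + y_mean with Gaussian driving noise v_t and a mean frame y_mean; the function names are hypothetical. An SVD of the mean-subtracted frames gives C and a state trajectory, A comes from least squares on consecutive states, and B is a factor of the residual covariance.

```python
import numpy as np


def learn_dynamic_texture(frames, n_states):
    """Hypothetical closed-form estimate of x_{t+1} = A x_t + B v_t, y_t = C x_t + y_mean.

    frames: array of shape (tau, height, width); each frame is flattened into a column y_t.
    """
    tau = frames.shape[0]
    Y = frames.reshape(tau, -1).T.astype(float)       # m x tau data matrix
    y_mean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    C = U[:, :n_states]                               # observation matrix, orthonormal columns
    X = np.diag(s[:n_states]) @ Vt[:n_states]         # state trajectory, n x tau
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])          # least-squares state transition
    V = X[:, 1:] - A @ X[:, :-1]                      # driving-noise residuals
    Q = V @ V.T / (tau - 1)                           # residual covariance
    # Factor Q = B B^T via its eigendecomposition, clipping tiny negative eigenvalues.
    evals, evecs = np.linalg.eigh(Q)
    B = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None)))
    return A, B, C, y_mean, X[:, 0]


def synthesize(A, B, C, y_mean, x0, length, frame_shape, seed=0):
    """Run the learned model forward with fresh Gaussian noise to produce new frames."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    frames = []
    for _ in range(length):
        frames.append((C @ x + y_mean[:, 0]).reshape(frame_shape))
        x = A @ x + B @ rng.standard_normal(B.shape[1])
    return np.stack(frames)
```

Synthesis costs one small matrix-vector update per frame, which is consistent with the abstract's remark that extrapolation is computationally cheap once the model is learned.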
Event-based analysis of video. In Proc. CVPR, 2001
Cited by 68 (2 self)
Dynamic events can be regarded as long-term temporal objects, which are characterized by spatiotemporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences (possibly of different lengths) based on their behavioral content. This measure is non-parametric and can thus handle a wide range of dynamic events. Having an event-based distance measure between sequences, we use it for a variety of tasks, including: (i) event-based search and indexing into long video sequences (for “intelligent fast forward”), (ii) temporal segmentation of long video sequences based on behavioral content, and (iii) clustering events within a long video sequence into event-consistent sub-sequences (i.e., into event-consistent “clusters”). These tasks are performed without prior knowledge of the types of events, their models, or their temporal extents. Our simple event representation and associated distance measure support event-based search and indexing even when only one short example clip is available. However, when multiple example clips of the same event are available (either as a result of the clustering process, or supplied manually), these can be used to refine the event representation, the associated distance measure, and accordingly the quality of the detection and clustering process.
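A much-simplified sketch of a distance of this kind follows, under stated assumptions: each frame is a 2-D list of grey levels, coarser temporal scales are built by averaging consecutive frame pairs, each scale is summarized only by a normalized histogram of absolute temporal derivatives (the paper's spatio-temporal features are richer), and histograms are compared with a chi-square distance averaged over scales. Function names, bin count, and value range are hypothetical choices.

```python
def temporal_pyramid(frames, levels):
    """Coarser temporal scales by averaging consecutive frame pairs (frames: list of 2-D lists)."""
    pyramid = [frames]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        coarser = [
            [[(a + b) / 2 for a, b in zip(row1, row2)] for row1, row2 in zip(f1, f2)]
            for f1, f2 in zip(prev[0::2], prev[1::2])
        ]
        pyramid.append(coarser)
    return pyramid


def gradient_histogram(frames, bins, max_value):
    """Normalized histogram of absolute temporal derivatives |I(t+1) - I(t)| (simplified feature)."""
    counts = [0] * bins
    for f1, f2 in zip(frames, frames[1:]):
        for row1, row2 in zip(f1, f2):
            for a, b in zip(row1, row2):
                k = min(int(abs(b - a) / max_value * bins), bins - 1)
                counts[k] += 1
    total = sum(counts) or 1  # an empty scale yields an all-zero histogram
    return [c / total for c in counts]


def chi_square(h1, h2):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def event_distance(seq1, seq2, levels=3, bins=8, max_value=255.0):
    """Hypothetical event distance: chi-square between per-scale histograms, averaged over scales."""
    d = 0.0
    for s1, s2 in zip(temporal_pyramid(seq1, levels), temporal_pyramid(seq2, levels)):
        d += chi_square(gradient_histogram(s1, bins, max_value),
                        gradient_histogram(s2, bins, max_value))
    return d / levels
```

Normalizing the histograms is what lets the two sequences have different lengths, and no parametric model of the event is fitted, in the spirit of the non-parametric measure described above.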
Dynamic Texture Segmentation. In IEEE International Conference on Computer Vision, 2003
Cited by 45 (7 self)
We address the problem of segmenting a sequence of images of natural scenes into disjoint regions that are characterized by constant spatio-temporal statistics. We model the spatio-temporal dynamics in each region by Gauss-Markov models, and infer the model parameters as well as the boundary of the regions in a variational optimization framework. Numerical results demonstrate that, in contrast to purely texture-based segmentation schemes, our method is effective in segmenting regions that differ in their dynamics even when spatial statistics are identical.
Motion Recognition Using Nonparametric Image Motion Models Estimated from Temporal and Multiscale Co-Occurrence Statistics. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003
Cited by 35 (2 self)
A new approach for motion characterization in image sequences is presented. It relies on the probabilistic modeling of temporal and scale co-occurrence distributions of local motion-related measurements directly computed over image sequences. Temporal multiscale Gibbs models allow us to handle both spatial and temporal aspects of image motion content within a unified statistical framework. Since this modeling mainly involves the scalar product between co-occurrence values and Gibbs potentials, we can formulate and address several fundamental issues: model estimation according to the maximum-likelihood (ML) criterion (hence, model training and learning) and motion classification. We have conducted motion recognition experiments over a large set of real image sequences comprising various motion types, such as temporal texture samples, human motion examples, and rigid motion situations.
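The "scalar product between co-occurrence values and Gibbs potentials" can be made concrete with a simplified, temporal-only analogue. This sketch assumes per-pixel sequences of already-quantized motion measurements, counts transitions between consecutive frames, and takes the potentials to be smoothed log conditional co-occurrence frequencies (the maximum-likelihood estimate for a first-order Markov chain). It is not the paper's multiscale model, and all names and the smoothing constant are hypothetical.

```python
import math


def cooccurrence(sequences, levels):
    """Temporal co-occurrence counts gamma[a][b] of quantized value a followed by b.

    sequences: per-pixel lists of quantized motion measurements in range(levels).
    """
    gamma = [[0] * levels for _ in range(levels)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            gamma[a][b] += 1
    return gamma


def estimate_potentials(gamma, eps=1e-3):
    """Log conditional frequencies (ML for a first-order Markov chain), smoothed by eps."""
    potentials = []
    for row in gamma:
        total = sum(row) + eps * len(row)
        potentials.append([math.log((c + eps) / total) for c in row])
    return potentials


def log_likelihood(gamma, potentials):
    """Scalar product <gamma, potentials>: conditional log-likelihood of the observed transitions."""
    return sum(g * p for grow, prow in zip(gamma, potentials) for g, p in zip(grow, prow))


def classify(sequences, models, levels):
    """Pick the motion class whose potentials give the highest log-likelihood."""
    gamma = cooccurrence(sequences, levels)
    return max(models, key=lambda name: log_likelihood(gamma, models[name]))
```

Once the co-occurrence counts of a test sequence are computed, scoring each class costs only one dot product, which is the practical benefit of the scalar-product form.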
Probabilistic kernels for the classification of auto-regressive visual processes. In IEEE Conference on Computer Vision and Pattern Recognition, 2005
Cited by 33 (13 self)
We present a framework for the classification of visual processes that are best modeled with spatio-temporal autoregressive models. The new framework combines the modeling power of a family of models known as dynamic textures with the generalization guarantees of the support vector machine (SVM) classifier. This combination is achieved by deriving a new probabilistic kernel based on the Kullback-Leibler (KL) divergence between Gauss-Markov processes. In particular, we derive the KL kernel for dynamic textures in both 1) the image space, which describes both the motion and appearance components of the spatio-temporal process, and 2) the hidden state space, which describes the temporal component alone. Together, the two kernels cover a large variety of video classification problems, including cases where classes differ in both appearance and motion and cases where appearance is similar for all classes and only motion is discriminant. Experimental evaluation on two databases shows that the new classifier outperforms existing solutions.
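To show how a KL divergence between Gauss-Markov processes becomes a kernel, here is a scalar illustration rather than the paper's image-space or state-space derivation. It assumes stationary zero-mean AR(1) processes x_t = a x_{t-1} + e_t with e_t ~ N(0, s2), uses the KL divergence rate (expected per-step KL of the conditional Gaussians under the stationary distribution), and symmetrizes it inside an exponential. The function names and the scale parameter gamma are hypothetical.

```python
import math


def kl_rate_ar1(p, q):
    """KL divergence rate between stationary zero-mean AR(1) processes.

    p and q are (a, s2) pairs for x_t = a x_{t-1} + e_t, e_t ~ N(0, s2), with |a| < 1.
    """
    a1, s1 = p
    a2, s2 = q
    var_x = s1 / (1.0 - a1 ** 2)  # stationary variance of x under p
    # E_p[ KL( N(a1 x, s1) || N(a2 x, s2) ) ] over the stationary distribution of x.
    return 0.5 * (math.log(s2 / s1) + (s1 + (a1 - a2) ** 2 * var_x) / s2 - 1.0)


def kl_kernel(p, q, gamma=1.0):
    """Symmetrized KL kernel exp(-gamma * (D(p||q) + D(q||p)))."""
    return math.exp(-gamma * (kl_rate_ar1(p, q) + kl_rate_ar1(q, p)))


def gram_matrix(processes, gamma=1.0):
    """Kernel matrix over a list of (a, s2) models, e.g. one fitted AR(1) model per video."""
    return [[kl_kernel(p, q, gamma) for q in processes] for p in processes]
```

A Gram matrix of this kind can be handed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.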
Machine recognition of human activities: A survey, 2008
Cited by 31 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications, such as content-based video annotation and retrieval, highlight extraction, and video summarization, require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing make this problem hard to solve: robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes, known as atomic or primitive actions, that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.
A Brief Survey of Dynamic Texture Description and Recognition. Proc. Int’l Conf. Computer Recognition Systems, 2005
Cited by 28 (7 self)
This paper is a brief survey of approaches to the description and recognition of dynamic textures. To the best of our knowledge, no such survey is currently available. Our survey is limited to temporal textures: we do not consider the other two classes of motion patterns. Even within the dynamic texture (DT) area, our attention is further limited to characterisation and recognition. In particular, we do not address DT modelling and synthesis, except when model parameters are used for recognition. (For recent work on synthesis, see [16, 7, 8, 33].) Essentially, we deal with dynamic texture descriptors, or features, that have the potential to be used for DT detection, segmentation, recognition and indexing in video.
Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 2005
Cited by 22 (6 self)
We derive a family of kernels on dynamical systems by applying the Binet-Cauchy theorem to trajectories of states. Our derivation provides a unifying framework for all kernels on dynamical systems currently used in machine learning, including kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. In the case of linear time-invariant systems, we derive explicit formulae for computing the proposed Binet-Cauchy kernels by solving Sylvester equations, and relate the proposed kernels to existing kernels based on cepstrum coefficients and subspace angles. Besides their theoretical appeal, these kernels can be used efficiently in the comparison of video sequences of dynamic scenes that can be modeled as the output of a linear time-invariant dynamical system. One advantage of our kernels is that they take the initial conditions of the dynamical systems into account. As a first example, we use our kernels to compare video sequences of dynamic textures. As a second example, we apply our kernels to the problem of clustering short clips of a movie. Experimental evidence shows superior performance of our kernels. Keywords: Binet-Cauchy theorem, ARMA models and dynamical systems, Sylvester
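For the linear time-invariant case, one member of this family can be illustrated with a short pure-Python sketch. It assumes noise-free systems x_{t+1} = A x_t, y_t = C x_t and an exponential discount lam, and computes k = sum_t lam^t <y_t, y'_t> = x1^T M x2, where M satisfies the Sylvester-type equation M = C1^T C2 + lam A1^T M A2. Instead of a direct Sylvester solver, M is obtained here by fixed-point iteration, which converges when lam times the spectral radii of A1 and A2 is below 1; all function names are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def binet_cauchy_trace_kernel(A1, C1, x1, A2, C2, x2, lam, iters=500):
    """Hypothetical sketch: k = sum_t lam^t <y_t, y'_t> for noise-free LTI systems.

    Equals x1^T M x2, where M solves M = C1^T C2 + lam * A1^T M A2. After k iterations
    M holds the partial sum sum_{t<=k} lam^t (A1^T)^t C1^T C2 A2^t.
    """
    base = matmul(transpose(C1), C2)
    A1t = transpose(A1)
    M = base
    for _ in range(iters):
        step = matmul(matmul(A1t, M), A2)
        M = [[b + lam * s for b, s in zip(brow, srow)] for brow, srow in zip(base, step)]
    # The initial conditions enter only through this final bilinear form x1^T M x2.
    return sum(x1[i] * M[i][j] * x2[j] for i in range(len(x1)) for j in range(len(x2)))
```

Because the kernel is evaluated at the initial states x1 and x2, two systems with identical (A, C) but different initial conditions generally receive different kernel values, which reflects the abstract's point about initial conditions.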
Statistical analysis of dynamic actions. IEEE Trans. Pattern Anal. Mach. Intell., 2006
Cited by 19 (2 self)
Real-world action recognition applications require systems that are fast, can handle a large variety of actions without a priori knowledge of the action types, need a minimal number of parameters, and require as short a learning stage as possible. In this paper we suggest such an approach. We regard dynamic activities as long-term temporal objects, which are characterized by spatio-temporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences that captures the similarities in their behavioral content. This measure is non-parametric and can thus handle a wide range of complex dynamic actions. Having a behavior-based distance measure between sequences, we use it for a variety of tasks, including video indexing, temporal segmentation, and action-based video clustering. These tasks are performed without prior knowledge of the types of actions, their models, or their temporal extents.

