Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Cited by 770 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
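For orientation, the plain HMM that DBNs generalize (a single discrete state variable) can be filtered in one left-to-right pass over the sequence. The sketch below is the standard forward algorithm, not code from the thesis. The function name, the argument layout, and the two-state numbers are assumptions made purely for illustration.

```python
import math


def forward_filter(prior, transition, emission, observations):
    """Forward (filtering) pass for a discrete HMM.

    prior[i]         : P(state_1 = i)
    transition[i][j] : P(state_{t+1} = j | state_t = i)
    emission[j][o]   : P(obs_t = o | state_t = j)
    Returns the list of filtered beliefs P(state_t | obs_1..t)
    and the log-likelihood of the whole observation sequence.
    """
    n = len(prior)
    belief = list(prior)
    filtered = []
    log_lik = 0.0
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update: weight each state by the likelihood of the observation.
        belief = [belief[j] * emission[j][obs] for j in range(n)]
        norm = sum(belief)
        log_lik += math.log(norm)
        belief = [b / norm for b in belief]
        filtered.append(belief)
    return filtered, log_lik


# Hypothetical two-state model with binary observations.
PRIOR = [0.5, 0.5]
TRANSITION = [[0.7, 0.3], [0.3, 0.7]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]

beliefs, log_likelihood = forward_filter(PRIOR, TRANSITION, EMISSION, [0, 0, 1])
```

Each time step costs O(n^2) for n states, so the pass is linear in the sequence length T. A factored (DBN) state space would replace the single `belief` vector with a structured representation, which this sketch does not attempt.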
Graphcut textures: Image and video synthesis using graph cuts
- ACM Transactions on Graphics, SIGGRAPH 2003
, 2003
Cited by 490 (9 self)
This banner was generated by merging the source images in Figure 6 using our interactive texture merging technique. In this paper we introduce a new algorithm for image and video texture synthesis. In our approach, patch regions from a sample image or video are transformed and copied to the output and then stitched together along optimal seams to generate a new (and typically larger) output. In contrast to other techniques, the size of the patch is not chosen a priori; instead, a graph cut technique is used to determine the optimal patch region for any given offset between the input and output texture. Unlike dynamic programming, our graph cut technique for seam optimization is applicable in any dimension. We specifically explore it in 2D and 3D to perform video texture synthesis in addition to regular image synthesis. We present approximative offset search techniques that work well in conjunction with the presented patch size optimization. We show results for synthesizing regular, random, and natural images and videos. We also demonstrate how this method can be used to interactively merge different images to generate new scenes.
Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2007
Cited by 291 (35 self)
Dynamic texture is an extension of texture to the temporal domain. Description and recognition of dynamic textures have attracted growing attention. In this paper, a novel approach for recognizing dynamic textures is proposed and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), which are an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events, such as facial expressions, in which local information and its spatial locations should also be taken into account. In experiments with two dynamic texture databases, DynTex and MIT, both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. Advantages of our approach include local processing, robustness to monotonic gray-scale changes and simple computation.
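The robustness to monotonic gray-scale changes mentioned above follows from how the basic 2D LBP operator is defined: each neighbour is only compared with the centre pixel. The sketch below shows that basic 3x3 operator, not the paper's VLBP or LBP-TOP descriptors. The function name, the neighbour ordering, and the sample patch are assumptions for illustration.

```python
def lbp_code(patch):
    """8-bit local binary pattern of the centre pixel of a 3x3 patch.

    `patch` is a list of three rows of three gray values. A neighbour
    contributes a 1 bit when it is at least as bright as the centre.
    """
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left pixel. The order is
    # arbitrary but must be fixed so that codes are comparable.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Hypothetical patch, and the same patch after a monotonic brightness change.
PATCH = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
BRIGHTER = [[2 * v + 10 for v in row] for row in PATCH]

print(lbp_code(PATCH))     # 241
print(lbp_code(BRIGHTER))  # 241, since only the ordering of gray values matters
```

A texture descriptor is then typically a histogram of these codes over a region. Extending the idea to the temporal domain, as the abstract describes, would apply the same comparison on planes that include the time axis.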
Machine recognition of human activities: A survey
, 2008
Cited by 218 (0 self)
The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) “actions” and 2) “activities.” “Actions” are characterized by simple motion patterns typically executed by a single human. “Activities” are more complex and involve coordinated actions among a small number of humans. We will discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches to model the simplest of action classes known as atomic or primitive actions that do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.
Motion Texture: A Two-Level Statistical Model for Character Motion Synthesis
- ACM Transactions on Graphics
, 2002
Cited by 211 (2 self)
In this paper, we describe a novel technique, called motion texture, for synthesizing complex human-figure motion (e.g., dancing) that is statistically similar to the original motion captured data. We define motion texture as a set of motion textons and their distribution, which characterize the stochastic and dynamic nature of the captured motion. Specifically, a motion texton is modeled by a linear dynamic system (LDS) while the texton distribution is represented by a transition matrix indicating how likely each texton is switched to another. We have designed a maximum likelihood algorithm to learn the motion textons and their relationship from the captured dance motion. The learnt motion texture can then be used to generate new animations automatically and/or edit animation sequences interactively. Most interestingly, motion texture can be manipulated at different levels, either by changing the fine details of a specific motion at the texton level or by designing a new choreography at the distribution level. Our approach is demonstrated by many synthesized sequences of visually compelling dance motion.
A review of statistical approaches to level set segmentation: Integrating color, texture, motion and shape
- International Journal of Computer Vision
, 2007
Cited by 169 (4 self)
Since their introduction as a means of front propagation and their first application to edge-based segmentation in the early 1990s, level set methods have become increasingly popular as a general framework for image segmentation. In this paper, we present a survey of a specific class of region-based level set segmentation methods and clarify how they can all be derived from a common statistical framework. Region-based segmentation schemes aim at partitioning the image domain by progressively fitting statistical models to the intensity, color, texture or motion in each of a set of regions. In contrast to edge-based schemes such as the classical Snakes, region-based methods tend to be less sensitive to noise. For typical images, the respective cost functionals tend to have fewer local minima, which makes them particularly well-suited for local optimization methods such as the level set method. We detail a general statistical formulation for level set segmentation. Subsequently, we clarify how the integration of various low-level criteria leads to a set of cost functionals and point out relations between the different segmentation schemes. In experimental results, we demonstrate how the level set function is driven to partition the image plane into domains of coherent color, texture, dynamic texture or motion. Moreover, the Bayesian formulation allows prior shape knowledge to be introduced into the level set method. We briefly review a number of advances in this domain.
Motion-Based Background Subtraction Using Adaptive Kernel Density Estimation
, 2004
Cited by 166 (1 self)
Background modeling is an important component of many vision systems. Existing work in the area has mostly addressed scenes that consist of static or quasi-static structures. When the scene exhibits a persistent dynamic behavior in time, such an assumption is violated and detection performance deteriorates. In this paper, we propose a new method for the modeling and subtraction of such scenes. Towards the modeling of the dynamic characteristics, optical flow is computed and utilized as a feature in a higher dimensional space. Inherent ambiguities in the computation of features are addressed by using a data-dependent bandwidth for density estimation using kernels. Extensive experiments demonstrate the utility and performance of the proposed approach.
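The idea of a data-dependent bandwidth can be illustrated with a kernel density estimate in which every background sample carries its own kernel width. The sketch below is a one-dimensional simplification, not the paper's method: the paper works with optical-flow features in a higher-dimensional space, and how it derives the bandwidths is not given here. The function names, sample values, bandwidths, and threshold are all assumptions.

```python
import math


def kde_density(x, samples, bandwidths):
    """Average of 1-D Gaussian kernels, one per sample, each with its own width."""
    total = 0.0
    for sample, h in zip(samples, bandwidths):
        z = (x - sample) / h
        total += math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
    return total / len(samples)


def is_foreground(x, samples, bandwidths, threshold):
    """Flag an observation whose density under the background model is low."""
    return kde_density(x, samples, bandwidths) < threshold


# Hypothetical background history for one pixel feature. A wider bandwidth
# marks a sample whose measurement is assumed to be less certain.
SAMPLES = [10.0, 10.5, 9.5]
BANDWIDTHS = [1.0, 1.0, 2.0]
THRESHOLD = 1e-3

print(is_foreground(10.2, SAMPLES, BANDWIDTHS, THRESHOLD))  # close to the history
print(is_foreground(20.0, SAMPLES, BANDWIDTHS, THRESHOLD))  # far from the history
```

Samples with larger bandwidths spread their probability mass more widely, so uncertain measurements constrain the background model less sharply than reliable ones.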
Gaussian process dynamical models for human motion
- IEEE TRANS. PATTERN ANAL. MACHINE INTELL
, 2008
Cited by 158 (5 self)
We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces.