Active and Dynamic Information Fusion for Facial Expression Understanding from Image Sequences
- IEEE Trans. Pattern Analysis & Machine Intelligence, 2005
Cited by 111 (8 self)
This paper explores the use of a multisensory information fusion technique with Dynamic Bayesian networks (DBNs) for modeling and understanding the temporal behaviors of facial expressions in image sequences. Our facial feature detection and tracking based on active IR illumination provides reliable visual information under variable lighting and head motion. Our approach to facial expression recognition lies in the proposed dynamic and probabilistic framework based on combining DBNs with Ekman's Facial Action Coding System (FACS) for systematically modeling the dynamic and stochastic behaviors of spontaneous facial expressions. The framework not only provides a coherent and unified hierarchical probabilistic framework to represent spatial and temporal information related to facial expressions, but also allows us to actively select the most informative visual cues from the available information sources to minimize the ambiguity in recognition. Facial expressions are recognized by fusing not only the current visual observations but also the previous visual evidence. Consequently, the recognition becomes more robust and accurate through explicitly modeling the temporal behavior of facial expressions. In this paper, we present the theoretical foundation underlying the proposed probabilistic and dynamic framework for facial expression modeling and understanding. Experimental results demonstrate that our approach can accurately and robustly recognize spontaneous facial expressions from an image sequence under different conditions.
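The abstract above describes fusing the current observation with earlier evidence. One simple way to picture that idea is the forward (predict/update) recursion of a discrete Bayes filter, sketched below in Python. This is not the paper's DBN and FACS model: the two hidden states, the transition matrix, and the likelihood values are hypothetical numbers chosen only for illustration.

```python
def forward_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes (HMM forward) filter."""
    n = len(belief)
    # Predict: carry the previous belief through the transition model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Update: weight by the likelihood of the current observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalized]


def filter_sequence(prior, transition, likelihoods):
    """Run the filter over a sequence of per-frame likelihood vectors."""
    belief = list(prior)
    history = []
    for likelihood in likelihoods:
        belief = forward_step(belief, transition, likelihood)
        history.append(belief)
    return history


if __name__ == "__main__":
    # Hypothetical two-state example: state 0 = "neutral", state 1 = "smile".
    prior = [0.5, 0.5]
    transition = [[0.9, 0.1], [0.2, 0.8]]
    likelihoods = [[0.3, 0.7], [0.4, 0.6]]  # P(observation | state) per frame
    for t, belief in enumerate(filter_sequence(prior, transition, likelihoods)):
        print(t, belief)
```

Because each belief depends on the previous one, a weak observation in one frame is tempered by what earlier frames already suggested.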
Context-aware visual tracking
- IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009
Cited by 55 (8 self)
Enormous uncertainties in unconstrained environments lead to a fundamental dilemma that many tracking algorithms have to face in practice: Tracking has to be computationally efficient, but verifying whether or not the tracker is following the true target tends to be demanding, especially when the background is cluttered and/or when occlusion occurs. Due to the lack of a good solution to this problem, many existing methods tend to be either effective but computationally intensive, because they use sophisticated image observation models, or efficient but vulnerable to false alarms. This greatly challenges long-duration robust tracking. This paper presents a novel solution to this dilemma by considering the context of the tracking scene. Specifically, we integrate into the tracking process a set of auxiliary objects that are automatically discovered in the video on the fly by data mining. Auxiliary objects have three properties, at least in a short time interval: 1) persistent co-occurrence with the target, 2) consistent motion correlation to the target, and 3) ease of tracking. Regarding these auxiliary objects as the context of the target, the collaborative tracking of these auxiliary objects leads to efficient computation as well as strong verification. Our extensive experiments have exhibited exciting performance in very challenging real-world testing cases.
Measurement Integration Under Inconsistency for Robust Tracking
- IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006
Cited by 15 (3 self)
The solutions to many vision problems involve integrating measurements from multiple sources. Most existing methods rely on a hidden assumption, i.e., that these measurements are consistent. In reality, unfortunately, this may not hold. The fact that naively fusing inconsistent measurements causes these methods to fail indicates that this is not a trivial problem. This paper presents a novel approach to handling it. A new theorem is proven that gives two algebraic criteria to test for consistency and inconsistency. In addition, a more general criterion is presented. Based on the theoretical analysis, a new information integration method is proposed and leads to encouraging results when applied to the task of visual tracking.
Variational Learning in Mixed-State Dynamic Graphical Models
- In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), 1999
Cited by 10 (2 self)
Many real-valued stochastic time-series are locally linear (Gaussian), but globally nonlinear. For example, the trajectory of a human hand gesture can be viewed as a linear dynamic system driven by a nonlinear dynamic system that represents muscle actions. We present a mixed-state dynamic graphical model in which a hidden Markov model drives a linear dynamic system. This combination allows us to model both the discrete and continuous causes of trajectories such as human gestures. The number of computations needed for exact inference is exponential in the sequence length, so we derive an approximate variational inference technique that can also be used to learn the parameters of the discrete and continuous models. We show how the mixed-state model and the variational technique can be used to classify human hand gestures made with a computer mouse.
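To make the phrase "a hidden Markov model drives a linear dynamic system" concrete, the Python sketch below samples from a toy one-dimensional switching model. It only illustrates the generative structure and does not attempt the paper's variational inference. The function name `sample_switching_lds`, the scalar dynamics `x = a * x + b + noise`, and all parameter values are assumptions made for this example.

```python
import random


def sample_switching_lds(transition, dynamics, noise_std, steps, seed=0):
    """Sample a discrete switch sequence and the continuous state it drives.

    transition: row-stochastic matrix of the discrete Markov chain
    dynamics:   one (a, b) pair per discrete state, giving x' = a * x + b + noise
    """
    rng = random.Random(seed)
    switch = 0      # assumed initial discrete state
    x = 0.0         # assumed initial continuous state
    switches, xs = [], []
    for _ in range(steps):
        # Discrete layer: the switch evolves as a Markov chain.
        switch = rng.choices(range(len(transition)), weights=transition[switch])[0]
        # Continuous layer: linear-Gaussian dynamics chosen by the switch.
        a, b = dynamics[switch]
        x = a * x + b + rng.gauss(0.0, noise_std)
        switches.append(switch)
        xs.append(x)
    return switches, xs


if __name__ == "__main__":
    transition = [[0.95, 0.05], [0.10, 0.90]]
    dynamics = [(0.9, 0.5), (0.9, -0.5)]  # regime 0 drifts up, regime 1 drifts down
    switches, xs = sample_switching_lds(transition, dynamics, 0.1, steps=20)
    for s, x in zip(switches, xs):
        print(s, round(x, 3))
```

With K discrete states and T time steps there are K^T possible switch sequences, which is one way to see why exact inference grows exponentially with sequence length, as the abstract notes.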
Applying Dynamic Bayesian Networks in Transliteration Identification
Cited by 8 (1 self)
This report presents work associated with the application of Dynamic Bayesian Networks (DBNs) in transliteration identification. Transliteration identification is mainly needed to help deal with Out Of Vocabulary words in Cross Language Information Retrieval (CLIR) and Machine Translation (MT). In related transliteration studies, transliteration identification refers
Narrative Spaces: bridging architecture and entertainment via interactive technology
- 6th International Conference on Generative Art, 2002
Cited by 8 (0 self)
Our society's modalities of communication are rapidly changing. Large panel displays and screens are being installed in many public spaces, ranging from open plazas, to shopping malls, to private houses, to theater stages, classrooms, and museums. In parallel, wearable computers are transforming our technological landscape by reshaping the heavy, bulky desktop computer into a lightweight, portable device that is accessible to people at any time. Computation and sensing are moving from computers and devices into the environment itself. The space around us is instrumented with sensors and displays, and it tends to reflect a diffused need to combine the information space with our physical space. This combination of large public and miniature personal digital displays together with distributed computing and sensing intelligence offers unprecedented opportunities to merge the virtual and the real, the information landscape of the Internet with the urban landscape of the city, and to transform digital animated media into storytellers, in public installations and through personal wearable technology. This paper describes technological platforms built at the MIT Media Lab from 1994 to 2002 that contribute to defining new trends in architecture that merge virtual and real spaces, and are reshaping the way we live and experience the museum, the house, the theater, and the modern city.
Variational Maximum A Posteriori by Annealed Mean Field Analysis
Cited by 7 (2 self)
This paper proposes a novel probabilistic variational method with deterministic annealing for the maximum a posteriori (MAP) estimation of complex stochastic systems. Since the MAP estimation involves global optimization, in general, it is very difficult to achieve. Therefore, most probabilistic inference algorithms are only able to achieve either the exact or the approximate posterior distributions. Our method constrains the mean field variational distribution to be multivariate Gaussian. Then, a deterministic annealing scheme is incorporated into the mean field fixed-point iterations to obtain the optimal MAP estimate. This is based on the observation that when the covariance of the variational Gaussian distribution approaches zero, the infimum point of the Kullback-Leibler (KL) divergence between the variational Gaussian and the real posterior will be the same as the supremum point of the real posterior. Although global optimality may not be guaranteed, our extensive synthetic and real experiments demonstrate the effectiveness and efficiency of the proposed method. Index Terms: Mean field variational analysis, deterministic annealing, maximum a posteriori estimation, graphical model, Markov network.
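The observation about the KL divergence can be made plausible with a short calculation. The sketch below is not taken from the paper: it assumes, for simplicity, an isotropic Gaussian q with mean \mu and fixed variance \sigma^2 in d dimensions, and a posterior density p(x | z) whose logarithm is continuous and well enough behaved for the limit to hold.

```latex
% Assumption: q(x) = N(x; \mu, \sigma^2 I) with \sigma held fixed while \mu varies.
\begin{align}
\mathrm{KL}(q \,\|\, p)
  &= \mathbb{E}_{q}[\log q(x)] - \mathbb{E}_{q}[\log p(x \mid z)] \\
  &= -\tfrac{d}{2}\log\!\left(2\pi e\,\sigma^{2}\right)
     - \mathbb{E}_{x \sim \mathcal{N}(\mu,\,\sigma^{2} I)}[\log p(x \mid z)] .
\end{align}
% The entropy term does not depend on \mu, so minimizing over \mu gives
\begin{equation}
\arg\min_{\mu}\, \mathrm{KL}(q \,\|\, p)
  = \arg\max_{\mu}\, \mathbb{E}_{x \sim \mathcal{N}(\mu,\,\sigma^{2} I)}[\log p(x \mid z)] .
\end{equation}
% As \sigma \to 0 the Gaussian concentrates at \mu, so under the stated assumptions
\begin{equation}
\mathbb{E}_{x \sim \mathcal{N}(\mu,\,\sigma^{2} I)}[\log p(x \mid z)]
  \;\longrightarrow\; \log p(\mu \mid z) ,
\end{equation}
% and the minimizing mean tends to a maximizer of the posterior, i.e. a MAP estimate.
```

Read this way, shrinking \sigma gradually plays the role of the annealing schedule: large \sigma smooths the objective, and small \sigma sharpens it toward the posterior itself.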
Human-robot interface with anticipatory characteristics based on Laban Movement Analysis and Bayesian models
- Proceedings of the IEEE 10th International Conference on Rehabilitation Robotics (ICORR), 2007
Cited by 6 (6 self)
In this work we contribute to the field of human-machine interaction with a system that anticipates human movements using the concept of Laban Movement Analysis (LMA). The implementation uses a Bayesian model for learning and classification, and results are presented for the application to online gesture recognition. The merging of assistive robotics and socially interactive robotics has recently led to the definition of socially assistive robotics. What is necessary, and what we found still missing, are socially interactive robots with a higher-level cognitive system that deeply analyzes the observed human movement. In this article we provide a framework for cognitive processes to be implemented in human-machine interfaces based on present-day technologies. We present LMA as a concept that helps to identify useful low-level features, defines a framework of mid-level descriptors for movement properties, and helps to develop a classifier of expressive actions. Our interface anticipates a performed action observed from a stream of monocular camera images by using a Bayesian framework. With this work we define the required qualities and characteristics of future embodied agents in terms of social interaction with humans. This article searches for human qualities like anticipation and empathy and presents possible ways towards implementation in the cognitive system of a social robot. We present results through its embodiment in the social robot 'Nicole' in the context of a person performing gestures and 'Nicole' reacting by means of audio output and robot movement. ∗This work is partially supported by FCT-Fundação para a Ciência
Gesture recognition using a marionette model and dynamic Bayesian networks (DBNs)
- ICIAR 2006, LNCS, 2006
Cited by 5 (1 self)
This paper presents a framework for gesture recognition by modeling a system based on Dynamic Bayesian Networks (DBNs) from a Marionette point of view. To incorporate human qualities like anticipation and empathy inside the perception system of a social robot remains, so far, an open issue. It is our goal to search for ways of implementation and test the feasibility. Towards this end we started the development of the guide robot 'Nicole', equipped with a monocular camera and an inertial sensor to observe its environment. The context of interaction is a person performing gestures and 'Nicole' reacting by means of audio output and motion. In this paper we present a solution to the gesture recognition task based on a Dynamic Bayesian Network (DBN). We show that using a DBN is a human-like concept of recognizing gestures that encompasses the quality of anticipation through the concept of prediction and update. A novel approach is used by incorporating a marionette model in the DBN as a trade-off between simple constant acceleration models and complex articulated models.
Vision And Learning For Intelligent Human-Computer Interaction
- 2001
Cited by 4 (0 self)
It was a dream to make computers see. The research in computer vision provides promising technologies to capture, analyze, transmit, retrieve and interpret visual information. However, due to the richness and large variations in the visual inputs, the practice of many statistical learning techniques for visual motion capturing and recognition is confronted by some similar problems, so that making intelligent and visually capable machines is still a challenging task. This dissertation concentrates on two important problems: capturing and recognizing human motion in video sequences, which are crucial for the research and applications of intelligent human-computer interaction, multimedia communication, and smart environments.