Automatic Analysis of Multimodal Group Actions in Meetings
, 2003
Cited by 90 (26 self)
This paper investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modelled using different HMM-based approaches, where the observations are provided by a set of audio-visual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modelling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
Multimodal integration for meeting group action segmentation and recognition
- in Proc. Workshop on Machine Learning for Multimodal Interaction (MLMI)
, 2005
Cited by 15 (7 self)
We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and four modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multi-stream DBN, as well as a multi-stream mixed-state DBN for disturbed data.
Multimodal group action clustering in meetings
- in Proc. ACM Int. Conf. on Multimedia, Workshop on Video Surveillance and Sensor Networks (ACM MM-VSSN)
, 2004
Cited by 11 (6 self)
We address the problem of clustering multimodal group actions in meetings using a two-layer HMM framework. Meetings are structured as sequences of group actions. Our approach aims at creating one cluster for each group action, where the number of group actions and the action boundaries are unknown a priori. In our framework, the first layer models typical actions of individuals in meetings using supervised HMM learning and low-level audio-visual features. A number of options that explicitly model certain aspects of the data (e.g., asynchrony) were considered. The second layer models the group actions using unsupervised HMM learning. The two layers are linked by a set of probability-based features produced by the individual action layer as input to the group action layer. The methodology was assessed on a set of multimodal turn-taking group actions, using a public five-hour meeting corpus. The results show that the use of multiple modalities and the layered framework are advantageous, compared to various baseline methods.
Keywords: Automatic meeting analysis, multi-person event modeling, multi-sensor networks
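The link between the two layers described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes discrete observations, toy two-state HMMs with made-up parameters, a uniform prior over individual actions, and hypothetical helper names (`forward_loglik`, `action_posteriors`, `group_features`). Each participant's observation window is scored against every individual-action HMM, the likelihoods are normalised into a probability vector, and the vectors are concatenated into one feature vector for the group layer.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete sequence under one HMM (scaled forward pass)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)          # P(obs[t] | obs[0..t-1])
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def action_posteriors(window, models):
    """Turn per-model likelihoods into probabilities (uniform prior is an assumption)."""
    logs = [forward_loglik(window, *m) for m in models]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

def group_features(windows, models):
    """Concatenate every participant's posterior vector into one group-layer input."""
    features = []
    for window in windows:
        features.extend(action_posteriors(window, models))
    return features

# Toy individual-action models (illustrative parameters only): (start, trans, emit).
TRANS = [[0.9, 0.1], [0.1, 0.9]]
SPEAKING = ([0.5, 0.5], TRANS, [[0.2, 0.8], [0.3, 0.7]])  # mostly emits symbol 1
SILENT = ([0.5, 0.5], TRANS, [[0.8, 0.2], [0.7, 0.3]])    # mostly emits symbol 0
MODELS = [SPEAKING, SILENT]

# One observation window per participant (two participants here).
windows = [[1, 1, 1, 0], [0, 0, 0, 0]]
features = group_features(windows, MODELS)
print(features)
```

In a full system the group layer would receive one such vector per time step and fit its own HMM on the resulting sequence; that second stage is omitted here.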
Analyzing group interactions in conversations: a review
- in Proc. IEEE Int. Conf. Multisensor Fusion and Integration for Intelligent Systems ’06
, 2006
Audio-visual Information Fusion In Human Computer Interfaces and Intelligent Environments: A Survey
Cited by 6 (5 self)
Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. The human brain processes the audio and video modalities, extracting complementary and robust information from them. Intelligent systems with audio-visual sensors should be capable of achieving similar goals. The audio-visual information fusion strategy is a key component in designing such systems. In this paper we exclusively survey the fusion techniques used in various audio-visual information fusion tasks. The fusion strategy used tends to depend mainly on the model, probabilistic or otherwise, used in the particular task to process sensory information to obtain higher level semantic information. The models themselves are task oriented. In this paper we describe the fusion strategies and the corresponding models used in audio-visual tasks such as speech recognition, tracking, biometrics, affective state recognition and meeting scene analysis. We also review the challenges, the existing solutions, and the unresolved or partially resolved issues in these fields. Specifically, we discuss established and upcoming work in hierarchical fusion strategies and crossmodal learning techniques, identifying these as critical areas of research in the future development of intelligent systems.
Toward generic intelligent knowledge extraction from video and audio: the EU-funded CARETAKER project
- in IET Conference on Imaging for Crime Detection and Prevention (ICDP 2006)
, 2006
Cited by 4 (1 self)
The CARETAKER project, which is a 30-month project that has just kicked off, aims at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness, diagnosis and decision support. More precisely, CARETAKER will focus on the extraction of structured knowledge from large multimedia collections recorded over networks of cameras and microphones deployed in real sites. The produced audio-visual streams, in addition to surveillance and safety issues, could represent a useful source of information if stored and automatically analyzed, in urban/environment planning, resource optimization, disabled/elderly person monitoring,...
Multi channel sequence processing
- in Proc. PASCAL Machine Learning Workshop
, 2004
Cited by 2 (2 self)
This paper summarizes some of the current research challenges arising from multichannel sequence processing. Indeed, multiple real life applications involve simultaneous recording and analysis of multiple information sources, which may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. Some of these problems can already be tackled by one of the many statistical approaches to sequence modeling. However, several challenging research issues are still open, such as taking into account asynchrony and correlation between several feature streams, or handling the underlying growing complexity. In this framework, we discuss here two novel approaches, which recently started to be investigated with success in the context of large multimodal problems. These include the asynchronous HMM, providing a principled approach towards the processing of multiple feature streams, and the layered HMM approach, providing a good formalism for decomposing large and complex (multi-stream) problems into layered architectures. As briefly reported here, the combination of these two approaches yielded successful results on several multi-channel tasks, ranging from audio-visual speech recognition to automatic meeting analysis.
IDIAP–RR 05-04, submitted for publication.
Layered HMM for motion intention recognition
- in IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS’06
, 2006
Cited by 2 (1 self)
Acquiring, representing and modeling human skills is one of the key research areas in teleoperation, programming-by-demonstration and human-machine collaborative settings. One of the common approaches is to divide the task that the operator is executing into several subtasks in order to provide manageable modeling. In this paper we consider the use of a Layered Hidden Markov Model (LHMM) to model human skills. We evaluate a gestem classifier that classifies motions into basic action-primitives, or gestems. The gestem classifiers are then used in an LHMM to model a simulated teleoperated task. We investigate the online and offline classification performance with respect to noise, number of gestems, type of HMM and the available number of training sequences. We also apply the LHMM to data recorded during the execution of a trajectory-tracking task in 2D and 3D with a robotic manipulator in order to give qualitative as well as quantitative results for the proposed approach. The results indicate that the LHMM is suitable for modeling teleoperative trajectory-tracking tasks and that the difference in classification performance between one- and multi-dimensional HMMs for gestem classification is small. It can also be seen that the LHMM is robust with respect to misclassifications in the underlying gestem classifiers.
Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing
, 2007
Recognition of coordinated multi agent activities, the individual vs the group
Cited by 1 (1 self)
The problem of identifying coordinated group activity has received relatively little attention compared to the many approaches to recognising individual actions. Here a comparison between a global and an individual modelling approach to classifying coordinated group activity is presented. Results show that by taking a global perspective on classifying coordinated activity in the sports domain, better classification performance can be achieved within a relatively simple framework.

