Modeling dominance in group conversations using non-verbal activity cues
- IDIAP Research Report, 2007
Cited by 10 (4 self)
Abstract: Dominance, a behavioral expression of power, is a fundamental mechanism of social interaction, expressed and perceived in conversations through spoken words and audio-visual nonverbal cues. The automatic modeling of dominance patterns from sensor data represents a relevant problem in social computing. In this paper, we present a systematic study on dominance modeling in group meetings from fully automatic nonverbal activity cues, in a multi-camera, multi-microphone setting. We investigate efficient audio and visual activity cues for the characterization of dominant behavior, analyzing single and joint modalities. Unsupervised and supervised approaches for dominance modeling are also investigated. Activity cues and models are objectively evaluated on a set of dominance-related classification tasks, derived from an analysis of the variability of human judgment of perceived dominance in group discussions. Our investigation highlights the power of relatively simple yet efficient approaches and the challenges of audio-visual integration. This constitutes the most detailed study on automatic dominance modeling in meetings to date. Index Terms: group meetings, dominance modeling, nonverbal communication, audio-visual activity cues
Analyzing group interactions in conversations: a review
- In Proc. IEEE Int. Conf. Multisensor Fusion and Integration for Intelligent Systems ’06, 2006
Learning Situation Models for Providing Context-Aware Services
Cited by 6 (0 self)
In order to provide information and communication services without disrupting human activity, information services must implicitly conform to the current context of human activity. However, the variability of human environments and human preferences makes it impossible to preprogram the appropriate behaviors for a context-aware service. One approach to overcoming this obstacle is to have services adapt their behavior to individual preferences through feedback from users. This article describes a method for learning situation models to drive context-aware services. To bootstrap the process, an initial simplified situation model is acquired by applying an automatic segmentation process to sample observations of human activities. This model is subsequently adapted to different operating environments and human preferences through interaction with users, using a supervised learning algorithm driven by user feedback.
Predicting Two Facets of Social Verticality in Meetings from Five-Minute Time Slices and Nonverbal Cues
Cited by 6 (1 self)
This paper addresses the automatic estimation of two aspects of social verticality (status and dominance) in small-group meetings using nonverbal cues. The correlation of nonverbal behavior with these social constructs has been extensively documented in social psychology, but its value for computational models is, in many cases, still unknown. We present a systematic study of automatically extracted cues (including vocalic, visual activity, and visual attention cues) and investigate their relative effectiveness in predicting both the most-dominant person and the high-status project manager from relatively short observations. We use five hours of task-oriented meeting data with natural behavior for our experiments. Our work suggests that, although dominance and role-based status are related concepts, they are not equivalent and are thus not equally explained by the same nonverbal cues. Furthermore, the best cues can correctly predict the person with the highest dominance or role-based status with an accuracy of approximately 70%.
Extracting Information from Multimedia Meeting Collections
- ACM SIGMM Information Workshop on Multimedia Information Retrieval, in conjunction with ACM Multimedia, 2005
Cited by 3 (2 self)
Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest due to the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that the extraction of semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experiences. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work in various research areas addressing some of these requirements. In more detail, we describe our work on automatic analysis of human interaction patterns from audio-visual sensors, discussing open issues in this domain.
Multi-Person Visual Focus of Attention from Head Pose and Meeting Contextual Cues
Cited by 3 (1 self)
This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from his own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor-holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined and sometimes contradictory impact on gazing behavior, our model can handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging dataset of 12 real meetings (five hours of data). The results demonstrated that integrating the presentation and conversation dynamical context using our model can lead to significant performance improvements.
Activity Recognition Using a Combination of Category Components and Local Models for Video Surveillance
Abstract: This paper presents a novel approach for automatic recognition of human activities for video surveillance applications. We propose to represent an activity by a combination of category components and demonstrate that this approach offers the flexibility to add new activities to the system and the ability to build models for activities that lack training data. To improve recognition accuracy, a confident-frame-based recognition algorithm is also proposed, in which the video frames with high confidence for recognizing an activity are used as a specialized local model to help classify the remainder of the video frames. Experimental results show the effectiveness of the proposed approach. Index Terms: category components, event detection, local model, video surveillance.
Behavior Modeling with Probabilistic Context Free Grammars
Abstract: Identifying behavioral patterns in a social network setting helps in understanding how people behave in certain application domains. Such patterns can also be used to characterize social signals, such as social roles, from interactions. In this work, we examine how probabilistic context-free grammars (PCFGs) can be used to model interactions and role taking in a social network. We describe how to automatically build a PCFG given a set of interactions as the training data. Our experiments on the Mission Survival Corpus 1 (MSC-1) dataset show that PCFGs are a concise way of modeling social entity behaviors and are useful in understanding the probability distribution of interactions as well as the behavior types that are observed.
Analysis and Indexing – indexing methods
Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest due to the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that the extraction of semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experiences. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work in various research areas addressing some of these requirements. In more detail, we describe our work on automatic analysis of human interaction patterns from audio-visual sensors, discussing open issues in this domain.
Exploring Contextual Information in a Layered Framework for Group Action Recognition
Contextual information is important for sequence modeling. Hidden Markov Models (HMMs) and their extensions, which have been widely used for sequence modeling, make simplifying and often unrealistic assumptions about the conditional independence of observations given the class labels, and thus cannot accommodate overlapping features or long-term contextual information. In this paper, we introduce a principled layered framework with three implementation methods that take into account contextual information (as available in the whole or part of the sequence). The first two methods are based on state alpha and gamma posteriors (as they are usually referred to in the HMM formalism). The third method is based on Conditional Random Fields (CRFs), a conditional model that relaxes the independence assumption on the observations required by HMMs for computational tractability. We illustrate our methods with the application of recognizing group actions in meetings. Experiments and comparison with a standard HMM baseline showed the validity of the proposed approach.

