Results 1 - 10 of 119
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions
, 2009
"... Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypi ..."
Abstract
-
Cited by 374 (51 self)
- Add to MetaCart
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.
Multimodal fusion for multimedia analysis: a survey
, 2010
"... This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several c ..."
Abstract
-
Cited by 58 (1 self)
- Add to MetaCart
This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
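To make the feature/decision distinction concrete, here is a minimal Python sketch (not from the survey; the feature dimensions, the toy classifier, and the fusion weights are illustrative assumptions). Feature-level fusion concatenates modality features before classification; decision-level fusion combines per-modality scores afterwards:

# Minimal sketch of feature-level vs. decision-level fusion, assuming
# hypothetical audio/visual feature vectors and a stand-in classifier.
import numpy as np

def feature_level_fusion(audio_feat, visual_feat, classifier):
    """Concatenate modality features into one vector before classification."""
    fused = np.concatenate([audio_feat, visual_feat])
    return classifier(fused)

def decision_level_fusion(audio_score, visual_score, w_audio=0.5, w_visual=0.5):
    """Combine per-modality class scores with a weighted sum."""
    return w_audio * np.asarray(audio_score) + w_visual * np.asarray(visual_score)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_feat, visual_feat = rng.normal(size=13), rng.normal(size=32)
    # Stand-in classifier: returns pseudo class scores for 3 classes.
    toy_clf = lambda x: np.tanh(x[:3])
    print(feature_level_fusion(audio_feat, visual_feat, toy_clf))
    print(decision_level_fusion([0.2, 0.7, 0.1], [0.3, 0.4, 0.3]))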
Affective Feedback: An Investigation into the Role of Emotions in the Information Seeking Process
"... User feedback is considered to be a critical element in the information seeking process, especially in relation to relevance assessment. Current feedback techniques determine content relevance with respect to the cognitive and situational levels of interaction that occurs between the user and the re ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
(Show Context)
User feedback is considered to be a critical element in the information seeking process, especially in relation to relevance assessment. Current feedback techniques determine content relevance with respect to the cognitive and situational levels of interaction that occur between the user and the retrieval system. However, apart from real-life problems and information objects, users interact with intentions, motivations, and feelings, which can be seen as critical aspects of cognition and decision-making. The study presented in this paper serves as a starting point for exploring the role of emotions in the information seeking process. Results show that emotions not only interweave with different physiological, psychological, and cognitive processes, but also form distinctive patterns according to the specific task and the specific user.
Gaze-X: Adaptive affective multimodal interface for single-user office scenarios
- Proc. ACM Int’l Conf. Multimodal Interfaces
, 2006
"... This paper describes an intelligent system that we developed to support affective multimodal human-computer interaction (AMM-HCI) where the user’s actions and emotions are modeled and then used to adapt the HCI and support the user in his or her activity. The proposed system, which we named Gaze-X, ..."
Abstract
-
Cited by 23 (8 self)
- Add to MetaCart
(Show Context)
This paper describes an intelligent system that we developed to support affective multimodal human-computer interaction (AMM-HCI) where the user’s actions and emotions are modeled and then used to adapt the HCI and support the user in his or her activity. The proposed system, which we named Gaze-X, is based on sensing and interpretation of the human part of the computer’s context, known as W5+ (who, where, what, when, why, how). It integrates a number of natural human communicative modalities including speech, eye gaze direction, face and facial expression, and a number of standard HCI modalities like keystrokes, mouse movements, and active software identification, which, in turn, are fed into processes that provide decision making and adapt the HCI to support the user in his or her activity according to his or her preferences. To attain a system that can be educated, that can improve its knowledge and decision making through experience, we use case-based reasoning as the inference engine of Gaze-X. The utilized case base is a dynamic, incrementally self-organizing event-content-addressable memory that allows fact retrieval and evaluation of encountered events based upon the user preferences and the generalizations formed from prior input. To support concepts of concurrency, modularity/scalability, persistency, and mobility, Gaze-X has been built as an agent-based system where different agents are responsible for different parts of the processing. A usability study conducted in an office scenario with a number of users indicates that Gaze-X is perceived as effective, easy to use, useful, and affectively qualitative.
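As a rough illustration of the case-based reasoning idea described above (a hypothetical sketch, not the Gaze-X implementation; the context encoding, case attributes, and adaptation actions are invented for the example), retrieval can be viewed as finding the stored case whose context is nearest to the current one and reusing its adaptation action:

# Illustrative case-based reasoning retrieval: pick the stored case closest
# to the current (encoded) user context and reuse its adaptation action.
import numpy as np

# Each case: (context feature vector, adaptation action the system took).
case_base = [
    (np.array([0.9, 0.1, 0.0]), "enlarge_font"),
    (np.array([0.1, 0.8, 0.3]), "suppress_notifications"),
    (np.array([0.2, 0.2, 0.9]), "offer_help_dialog"),
]

def retrieve(context, cases):
    """Return the action of the nearest stored case (Euclidean distance)."""
    dists = [np.linalg.norm(context - feat) for feat, _ in cases]
    return cases[int(np.argmin(dists))][1]

current_context = np.array([0.15, 0.75, 0.25])  # e.g. encoded gaze/affect cues
print(retrieve(current_context, case_base))     # -> "suppress_notifications"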
Multimodal Interfaces: A Survey of Principles, Models and Frameworks
"... Abstract. The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real-time. This opens a number of associated issues covered by this chapter, such as heterogeneous data types fusion, architectures fo ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
(Show Context)
The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This opens a number of associated issues covered by this chapter, such as heterogeneous data type fusion, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, frameworks, etc. This chapter does not intend to cover exhaustively all the issues related to multimodal interface creation, and some hot topics, such as error handling, have been left aside. The chapter starts with the features and advantages associated with multimodal interaction, with a focus on particular findings and guidelines, as well as the cognitive foundations underlying multimodal interaction. The chapter then focuses on the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission issues. Modeling of multimodal interaction as well as tools allowing rapid creation of multimodal interfaces are then presented. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland and summarizes the major future challenges in the field.
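One recurring low-level concern in the time-sensitive fusion architectures mentioned above is temporal alignment: input events from different modalities must be grouped before they can be fused. The following sketch is a simplified, assumed illustration (the event fields and the 0.5 s window are arbitrary choices, not taken from the chapter):

# Group input events from different modalities that fall within a short
# temporal window, a common precondition for event-based fusion.
from dataclasses import dataclass

@dataclass
class Event:
    modality: str     # e.g. "speech", "gesture"
    payload: str
    timestamp: float  # seconds

def window_fuse(events, window=0.5):
    """Yield groups of events whose timestamps lie within `window` seconds."""
    events = sorted(events, key=lambda e: e.timestamp)
    group = []
    for e in events:
        if group and e.timestamp - group[0].timestamp > window:
            yield group
            group = []
        group.append(e)
    if group:
        yield group

stream = [Event("speech", "put that", 0.10),
          Event("gesture", "point@map(12,48)", 0.32),
          Event("speech", "there", 1.40),
          Event("gesture", "point@map(15,50)", 1.55)]
for g in window_fuse(stream):
    print([(e.modality, e.payload) for e in g])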
Human-Computer Interaction: Overview on State of the Art
"... The intention of this paper is to provide an overview on the subject of Human-Computer Interaction. The overview includes the basic definitions and terminology, a survey of existing technologies and recent advances in the field, common architectures used in the design of HCI systems which includes u ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
(Show Context)
The intention of this paper is to provide an overview of the subject of Human-Computer Interaction. The overview includes the basic definitions and terminology, a survey of existing technologies and recent advances in the field, common architectures used in the design of HCI systems, including unimodal and multimodal configurations, and finally the applications of HCI. The paper also offers a comprehensive set of references for each concept, method, and application in HCI.
Audio-visual Information Fusion In Human Computer Interfaces and Intelligent Environments: A Survey
"... Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. Human brain processes the audio and video modalities extracting complementary and robust information from them. Intelligent s ..."
Abstract
-
Cited by 14 (7 self)
- Add to MetaCart
Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. The human brain processes the audio and video modalities, extracting complementary and robust information from them. Intelligent systems with audio-visual sensors should be capable of achieving similar goals. The audio-visual information fusion strategy is a key component in designing such systems. In this paper we exclusively survey the fusion techniques used in various audio-visual information fusion tasks. The fusion strategy used tends to depend mainly on the model, probabilistic or otherwise, used in the particular task to process sensory information and obtain higher-level semantic information. The models themselves are task oriented. We describe the fusion strategies and the corresponding models used in audio-visual tasks such as speech recognition, tracking, biometrics, affective state recognition, and meeting scene analysis. We also review the challenges, existing solutions, and unresolved or partially resolved issues in these fields. Specifically, we discuss established and upcoming work in hierarchical fusion strategies and cross-modal learning techniques, identifying these as critical areas of research in the future development of intelligent systems.
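A common decision-level strategy in the audio-visual tasks surveyed here weights each modality by an estimate of its reliability (e.g., audio SNR or a face-detection confidence). The sketch below is a minimal assumed example of that idea; the posteriors and reliability scores are made up:

# Late (decision-level) fusion where each modality's class posterior is
# weighted by an estimated reliability before renormalization.
import numpy as np

def reliability_weighted_fusion(p_audio, p_video, r_audio, r_video):
    """Weight each posterior by its modality reliability and renormalize."""
    w_a = r_audio / (r_audio + r_video)
    w_v = r_video / (r_audio + r_video)
    p = w_a * np.asarray(p_audio) + w_v * np.asarray(p_video)
    return p / p.sum()

# Noisy audio (low reliability) defers to the video stream.
print(reliability_weighted_fusion([0.4, 0.6], [0.9, 0.1], r_audio=0.2, r_video=0.8))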
Implicit Human-Centered Tagging
, 2009
"... This paper provides a general introduction to the concept of Implicit Human-Centered Tagging (IHCT)- the automatic extraction of tags from nonverbal behavioral feedback of media users. The main idea behind IHCT is that nonverbal behaviors displayed when interacting with multimedia data (e.g., facial ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
This paper provides a general introduction to the concept of Implicit Human-Centered Tagging (IHCT): the automatic extraction of tags from the nonverbal behavioral feedback of media users. The main idea behind IHCT is that nonverbal behaviors displayed when interacting with multimedia data (e.g., facial expressions, head nods, etc.) provide information useful for improving the tag sets associated with the data. As such behaviors are displayed naturally and spontaneously, no effort is required from the users, which is why the resulting tagging process is said to be “implicit”. Tags obtained through IHCT are expected to be more robust than tags associated with the data explicitly, at least in terms of generality (they make sense to everybody) and statistical reliability (all tags will be sufficiently represented). The paper discusses these issues in detail and provides an overview of pioneering efforts in the field.
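As a toy illustration of the IHCT idea (hypothetical; the behavior-to-tag mapping and detector outputs are invented, not taken from the paper), implicit tags can be accumulated from detected nonverbal reactions and kept only when enough evidence supports them:

# Accumulate implicit tag evidence from detected nonverbal reactions
# while a user views an item; keep tags with sufficient support.
from collections import Counter

# Assumed mapping from detected behaviors to candidate implicit tags.
REACTION_TO_TAG = {"smile": "funny", "laugh": "funny",
                   "frown": "unpleasant", "lean_in": "interesting"}

def implicit_tags(detected_reactions, min_count=2):
    """Return tags supported by at least `min_count` detected reactions."""
    counts = Counter(REACTION_TO_TAG[r] for r in detected_reactions
                     if r in REACTION_TO_TAG)
    return [tag for tag, c in counts.items() if c >= min_count]

print(implicit_tags(["smile", "laugh", "lean_in", "smile"]))  # -> ['funny']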
Multimodal interfaces: Challenges and perspectives
- Journal of Ambient Intelligence and Smart Environments
, 2009
"... Abstract. The development of interfaces has been a technology-driven process. However, the newly developed multimodal interfaces are using recognition-based technologies that must interpret human-speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, the interface design ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
(Show Context)
The development of interfaces has been a technology-driven process. However, the newly developed multimodal interfaces use recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, interface design requires a human-centered approach. In this paper we review the major approaches to multimodal Human-Computer Interaction, giving an overview of user and task modeling and of multimodal fusion. We highlight the challenges, open issues, and future trends in multimodal interfaces research.
Automatic Facial Expression Recognition on A Single 3D Face by Exploring Shape Deformation
- 17th ACM International Conference on Multimedia
, 2009
"... Facial expression recognition has many applications in multime-dia processing and the development of 3D data acquisition tech-niques makes it possible to identify expressions using 3D shape information. In this paper, we propose an automatic facial expres-sion recognition approach based on a single ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
Facial expression recognition has many applications in multimedia processing and the development of 3D data acquisition techniques makes it possible to identify expressions using 3D shape information. In this paper, we propose an automatic facial expression recognition approach based on a single 3D face. The shape of an expressional 3D face is approximated as the sum of two parts, a basic facial shape component (BFSC) and an expressional shape component (ESC). The BFSC represents the basic face structure and neutral-style shape and the ESC contains shape changes caused by facial expressions. To separate the BFSC and ESC, our method firstly builds a reference face for each input 3D non-neutral face by a learning method, which well represents the basic facial shape. Then, based on the BFSC and the original expressional face, a facial expression descriptor is designed. The surface depth changes are considered in the descriptor. Finally, the descriptor is input into an SVM to recognize the expression. Unlike previous methods which recognize a facial expression with the help of manually labeled key points and/or a neutral face, our method works on a single 3D face without any manual assistance. Extensive experiments are carried out on the BU-3DFE database and comparisons with existing methods are conducted. The experimental results show the effectiveness of our method.
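The pipeline described above can be sketched at a high level as follows. This is an assumed, simplified reconstruction, not the authors' code: the BFSC is approximated here by the closest learned neutral template, the ESC is the residual, and the resulting depth-change descriptor is fed to an SVM (scikit-learn's SVC):

# Simplified BFSC/ESC decomposition followed by SVM classification on
# toy depth maps; the template-matching step stands in for the learned
# reference-face construction described in the abstract.
import numpy as np
from sklearn.svm import SVC

def estimate_bfsc(face_depth, neutral_templates):
    """Approximate the neutral-style shape as the closest learned template."""
    dists = [np.linalg.norm(face_depth - t) for t in neutral_templates]
    return neutral_templates[int(np.argmin(dists))]

def expression_descriptor(face_depth, bfsc):
    """Descriptor built from surface depth changes (here: the raw ESC residual)."""
    esc = face_depth - bfsc
    return esc.ravel()

# Toy data: 20 "faces" as 8x8 depth maps with 3 expression labels.
rng = np.random.default_rng(0)
templates = [rng.normal(size=(8, 8)) for _ in range(3)]
faces = [templates[i % 3] + 0.1 * rng.normal(size=(8, 8)) for i in range(20)]
labels = [i % 3 for i in range(20)]

X = [expression_descriptor(f, estimate_bfsc(f, templates)) for f in faces]
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict([X[0]]))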