Toward an Affect-Sensitive Multimodal Human-Computer Interaction
- Proceedings of the IEEE
, 2003
Cited by 98 (24 self)
The ability to recognize affective states of a person... This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence -- the ability to recognize a user's affective states -- in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In a face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys the past work in solving these problems by a computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI -- an automatic personalized analyzer of a user's nonverbal affective feedback.
Towards detecting emotions in spoken dialogs
- IEEE Transactions on Speech and Audio Processing
, 2005
Cited by 58 (7 self)
The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This paper explores the detection of domain-specific emotions using language and discourse information in conjunction with acoustic correlates of emotion in speech signals. The specific focus is on a case study of detecting negative and non-negative emotions using spoken language data obtained from a call center application. Most previous studies in emotion recognition have used only the acoustic information contained in speech. In this paper, a combination of three sources of information (acoustic, lexical, and discourse) is used for emotion recognition. To capture emotion information at the language level, an information-theoretic notion of emotional salience is introduced. Optimization of the acoustic correlates of emotion with respect to classification error was accomplished by investigating different feature sets obtained from feature selection, followed by principal component analysis. Experimental results on our call center data show that the best results are obtained when acoustic and language information are combined. Results show that combining all the information, rather than using only acoustic information, improves emotion classification by 40.7% for males and 36.4% for females (linear discriminant classifier used for acoustic information). Index Terms: acoustic correlates, dialog systems, emotion recognition, emotional salience, feature selection, information fusion, principal component analysis, spoken language processing.
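As a rough illustration of what an information-theoretic salience score at the language level might look like, the sketch below scores each word by the Kullback-Leibler divergence between the emotion distribution given that word and the prior emotion distribution. This formulation, the function name `emotional_salience`, and the toy utterances are assumptions made for illustration; the abstract does not give the paper's exact definition.

```python
import math
from collections import Counter, defaultdict


def emotional_salience(utterances):
    """Score each word by KL(P(emotion | word) || P(emotion)), in nats.

    `utterances` is a list of (list_of_words, emotion_label) pairs.
    This is an assumed formulation, not necessarily the paper's.
    """
    label_counts = Counter(label for _, label in utterances)
    total = sum(label_counts.values())
    prior = {label: n / total for label, n in label_counts.items()}

    # For each word, count how many utterances of each label contain it.
    word_label = defaultdict(Counter)
    for words, label in utterances:
        for w in set(words):
            word_label[w][label] += 1

    salience = {}
    for w, counts in word_label.items():
        n_w = sum(counts.values())
        score = 0.0
        for label, c in counts.items():
            posterior = c / n_w
            score += posterior * math.log(posterior / prior[label])
        salience[w] = score
    return salience


# Hypothetical toy data: two negative and two non-negative utterances.
toy = [
    (["this", "is", "terrible"], "negative"),
    (["terrible", "service"], "negative"),
    (["this", "is", "fine"], "non-negative"),
    (["thanks", "this", "works"], "non-negative"),
]
scores = emotional_salience(toy)
```

Under this scoring, a word that appears only with one emotion class (here "terrible") scores high, while a word spread evenly across classes (here "is") scores zero.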
Emotional speech: Towards a new generation of databases
, 2003
Cited by 57 (9 self)
Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human–computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses four main issues that need to be considered in developing databases of emotional speech: scope, naturalness, context and descriptors. The state of the art is reviewed. A good deal has been done to address the key issues, but there is still a long way to go. The paper shows how the challenge of developing appropriate databases is being addressed in three major recent projects: the Reading–Leeds project, the Belfast project and the CREST–ESP project. From these and other studies the paper draws together the tools and methods that have been developed, addresses the problems that arise and indicates the future directions for the development of emotional speech databases.
Analysis of emotion recognition using facial expressions, speech and multimodal information
- in Sixth International Conference on Multimodal Interfaces ICMI 2004
, 2004
Cited by 33 (5 self)
The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of the emotion recognition system. This paper analyzes the strengths and the limitations of systems based only on facial expressions or acoustic information. It also discusses two approaches used to fuse these two modalities: decision level and feature level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. By the use of markers on her face, detailed facial motions were captured with motion capture, in conjunction with simultaneous speech recordings. The results reveal that the system based on facial expression gave better performance than the system based on just acoustic information for the emotions considered. Results also show the complementarity of the two modalities and that when these two modalities are fused, the performance and the robustness of the emotion recognition system improve measurably.
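The difference between the two fusion strategies named in this abstract can be sketched as follows. Feature-level integration joins the per-modality feature vectors before a single classifier sees them, while decision-level integration combines the outputs of separate per-modality classifiers. The weighted-average combination rule, the function names, and all numbers below are assumptions for illustration; the abstract does not state which combination rule the paper uses.

```python
def feature_level_fusion(face_features, audio_features):
    """Concatenate per-modality feature vectors into one vector,
    which would then be passed to a single classifier."""
    return list(face_features) + list(audio_features)


def decision_level_fusion(face_probs, audio_probs, face_weight=0.5):
    """Combine per-modality class probabilities by weighted average
    (an assumed rule) and return (winning_label, fused_probabilities)."""
    fused = {
        label: face_weight * face_probs[label]
        + (1.0 - face_weight) * audio_probs[label]
        for label in face_probs
    }
    return max(fused, key=fused.get), fused


# Hypothetical classifier outputs for one utterance over the four classes.
face_probs = {"sadness": 0.10, "anger": 0.30, "happiness": 0.50, "neutral": 0.10}
audio_probs = {"sadness": 0.05, "anger": 0.60, "happiness": 0.25, "neutral": 0.10}

label, fused = decision_level_fusion(face_probs, audio_probs)

# Hypothetical feature vectors: 2 facial features, 3 acoustic features.
joint = feature_level_fusion([0.2, 0.7], [110.0, 0.4, 3.1])
```

In this toy case the face-only classifier would pick "happiness" and the audio-only classifier "anger"; with equal weights the fused decision is "anger", which shows how one modality can override the other when it is more confident.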
Social Signal Processing: Survey of an Emerging Domain
, 2008
Cited by 32 (10 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement – in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for Social Signal Processing (SSP) are rather difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Spotting "Hot Spots" in Meetings: Human Judgments and Prosodic Cues
- in Proc. Eurospeech
, 2003
Cited by 30 (3 self)
Recent interest in the automatic processing of meetings is motivated by a desire to summarize, browse, and retrieve important information from lengthy archives of spoken data. One of the most useful capabilities such a technology could provide is a way for users to locate "hot spots" or regions in which participants are highly involved in the discussion (e.g. heated arguments, points of excitement, etc.). We ask two questions about hot spots in meetings in the ICSI Meeting Recorder corpus. First, we ask whether involvement can be judged reliably by human listeners. Results show that despite the subjective nature of the task, raters show significant agreement in distinguishing involved from non-involved utterances. Second, we ask whether there is a relationship between human judgments of involvement and automatically extracted prosodic features of the associated regions. Results show that there are significant differences in both F0 and energy between involved and non-involved utterances. These findings suggest that humans do agree to some extent on the judgment of hot spots, and that acoustic-only cues could be used for automatic detection of hot spots in natural meetings.
The Relationship between Dialogue Acts and Hot Spots in Meetings
- in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Virgin Islands
, 2003
Cited by 28 (2 self)
We examine the relationship between hot spots (annotated in terms of involvement) and dialogue acts (DAs, annotated in an independent effort) in roughly 32 hours of speech data from naturally occurring meetings. Results reveal that four independently motivated involvement categories (non-involved, disagreeing, amused, and other) show statistically significant associations with particular DAs. Further examination shows that involvement is associated with contextual features (such as the speaker or type of meeting), as well as with lexical features (such as utterance length and perplexity). Finally, we found (surprisingly) that perplexities are similar for involved and non-involved utterances. This suggests that it may not be the amount of propositional content, but rather participants' attitudes toward that content, that differentiates hot spots from other regions in a meeting. Overall, these specific correlations, and their relationships to other features such as perplexity, could provide useful information for the automatic archiving and browsing of natural meetings.
RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA
- in Proceedings of the Workshop Embodied
, 2002
Cited by 28 (9 self)
In this paper, we describe the Rich Representation Language (RRL) which is used in the NECA system. The NECA system generates interactions between two or more animated characters. The RRL is a formal framework for representing the information that is exchanged at the interfaces between the various NECA system modules.
Detecting group interest level in meetings
- in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
, 2005
Cited by 26 (6 self)
Finding relevant segments in meeting recordings is important for summarization, browsing, and retrieval purposes. In this paper, we define relevance as the interest-level that meeting participants manifest as a group during the course of their interaction (as perceived by an external observer), and investigate the automatic detection of segments of high-interest from audio-visual cues. This is motivated by the assumption that there is a relationship between segments of interest to participants, and those of interest to the end user, e.g. of a meeting browser. We first address the problem of human annotation of group interest-level. On a 50-meeting corpus, recorded in a room equipped with multiple cameras and microphones, we found that the annotations generated by multiple people exhibit a good degree of consistency, providing a stable ground-truth for automatic methods. For the automatic detection of high-interest segments, we investigate a methodology based on Hidden Markov Models (HMMs) and a number of audio and visual features. Single- and multi-stream approaches were studied. Using precision and recall as performance measures, the results suggest that the automatic detection of group interest-level is promising, and that while audio in general constitutes the predominant modality in meetings, the use of a multi-modal approach is beneficial.
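The precision and recall measures mentioned in this abstract can be computed as in the minimal sketch below, which treats a meeting as a sequence of fixed-length frames labeled 1 (high interest) or 0. The frame-level framing, the function name `precision_recall`, and the example labels are assumptions for illustration; the abstract does not say at what granularity the paper scores its detections.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for binary per-frame labels.

    precision = correctly detected frames / all detected frames
    recall    = correctly detected frames / all reference frames
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # true positives
    fp = sum(1 for p, r in pairs if p and not r)    # false alarms
    fn = sum(1 for p, r in pairs if not p and r)    # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical annotation (reference) and detector output (predicted).
reference = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
p, r = precision_recall(predicted, reference)
```

Here the detector finds three of the four high-interest frames and raises one false alarm, so both precision and recall are 0.75.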
Emotion in Human-Computer Interaction
, 2002
Cited by 21 (1 self)
Emotion is a fundamental component of being human. Joy, hate, anger, and pride, among the plethora of other emotions, motivate action and add meaning and richness to virtually all human experience. Traditionally, human-computer interaction has been viewed as the “ultimate” exception: Users must discard their emotional selves to work efficiently and rationally with computers, the quintessentially unemotional artifact. Emotion seemed at best marginally relevant to human-computer interaction, and at worst oxymoronic. Recent research in psychology and technology suggests a very different view of the relationship between humans, computers, and emotion. After a long period of dormancy and confusion, there has been an explosion of research on the psychology of emotion (Gross, 1999). Emotion is no longer seen as limited to the occasional outburst of fury when a computer crashes inexplicably, excitement when a videogame character leaps past an obstacle, or frustration at an incomprehensible error message. It is now understood that a wide range of emotions plays a critical role in every computer-related, goal-directed activity, from developing a 3-D CAD model and running calculations on a spreadsheet, to searching the Web and sending an email, to making an online purchase and playing solitaire. Indeed, many psychologists now argue that it is

