Results 1 - 10 of 40
Emotional speech: Towards a new generation of databases
, 2003
"... Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human–computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses four main issues ..."
Abstract
-
Cited by 116 (12 self)
- Add to MetaCart
Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human–computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses four main issues that need to be considered in developing databases of emotional speech: scope, naturalness, context and descriptors. The state of the art is reviewed. A good deal has been done to address the key issues, but there is still a long way to go. The paper shows how the challenge of developing appropriate databases is being addressed in three major recent projects: the Reading–Leeds project, the Belfast project and the CREST–ESP project. From these and other studies the paper draws together the tools and methods that have been developed, addresses the problems that arise and indicates future directions for the development of emotional speech databases.
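A minimal sketch, not from the paper: one way to encode the four database-design issues the paper names (scope, naturalness, context, descriptors) as a record type. All field names and example values are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EmotionalSpeechDatabase:
    name: str
    language: str
    num_speakers: int
    naturalness: str                 # e.g. "acted", "induced", "natural"
    context: str                     # e.g. "read sentences", "dialogue"
    # Descriptors: e.g. categorical emotion labels, or dimensional
    # activation/evaluation ratings.
    descriptors: list[str] = field(default_factory=list)
```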
Adaptive interfaces and agents
, 2003
"... As its title suggests, this chapter covers a broad range of in-teractive systems. But they all have one idea in common: that it can be worthwhile for a system to learn something about each individual user and adapt its behavior to them in ..."
Abstract
-
Cited by 101 (10 self)
- Add to MetaCart
(Show Context)
As its title suggests, this chapter covers a broad range of interactive systems. But they all have one idea in common: that it can be worthwhile for a system to learn something about each individual user and adapt its behavior to them …
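A minimal sketch, not from the chapter, of the one idea it names: a system that learns something about each individual user and adapts its behavior. Here a per-user count of menu choices reorders a menu; all names are illustrative assumptions.

```python
from collections import Counter

class AdaptiveMenu:
    def __init__(self, items):
        self.items = list(items)
        self.usage = Counter()      # per-user model: how often each item is chosen

    def choose(self, item):
        self.usage[item] += 1       # learn from every interaction

    def ordered(self):
        # Adapt: surface the user's most frequently chosen items first.
        return sorted(self.items, key=lambda i: -self.usage[i])
```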
Emotional speech recognition: resources, features, and methods
- Speech Communication
, 2006
"... In this paper we overview emotional speech recognition having in mind three goals. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed. ..."
Abstract
-
Cited by 90 (4 self)
- Add to MetaCart
(Show Context)
In this paper we overview emotional speech recognition with three goals in mind. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed. The second goal is to present the acoustic features most frequently used for emotional speech recognition and to assess how emotion affects them. Typical features are the pitch, the formants, the vocal-tract cross-section areas, the mel-frequency cepstral coefficients, the Teager energy operator-based features, the intensity of the speech signal, and the speech rate. The third goal is to review appropriate techniques for classifying speech into emotional states. We examine separately classification techniques that exploit timing information from those that ignore it. Classification techniques based …
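A minimal sketch, not from the survey: extracting a few of the acoustic features it lists (MFCCs, pitch, intensity, Teager-energy-based features) with librosa and NumPy, then summarizing each contour as fixed-length statistics as many classifiers expect. Parameters and the choice of summary statistics are illustrative assumptions.

```python
import numpy as np
import librosa

def acoustic_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour
    rms = librosa.feature.rms(y=y)[0]                   # intensity proxy
    teo = y[1:-1] ** 2 - y[:-2] * y[2:]                 # Teager energy operator

    def stats(x):
        # Collapse a time-varying contour into a few summary numbers.
        return [np.mean(x), np.std(x), np.min(x), np.max(x)]

    return np.array(stats(f0) + stats(rms) + stats(teo)
                    + stats(mfcc.mean(axis=1)))
```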
StressSense: Detecting Stress in Unconstrained Acoustic Environments using Smartphones
- In UbiComp ’12. ACM
, 2012
"... Stress can have long term adverse effects on individuals’ physical and mental well-being. Changes in the speech production process is one of many physiological changes that happen during stress. Microphones, embedded in mobile phones and carried ubiquitously by people, provide the opportunity to con ..."
Abstract
-
Cited by 43 (4 self)
- Add to MetaCart
(Show Context)
Stress can have long-term adverse effects on individuals’ physical and mental well-being. Changes in the speech production process are among the many physiological changes that happen during stress. Microphones, embedded in mobile phones and carried ubiquitously by people, provide the opportunity to continuously and non-invasively monitor stress in real-life situations. We propose StressSense for unobtrusively recognizing stress from human voice using smartphones. We investigate methods for adapting a one-size-fits-all stress model to individual speakers and scenarios. We demonstrate that the StressSense classifier can robustly identify stress across multiple individuals in diverse acoustic environments: using model adaptation, StressSense achieves 81% and 76% accuracy for indoor and outdoor environments, respectively. We show that StressSense can be implemented on commodity Android phones and run in real time. To the best of our knowledge, StressSense represents the first system to consider voice-based stress detection and model adaptation in diverse real-life conversational situations using smartphones.
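A minimal sketch, not StressSense’s actual adaptation method: one simple way to adapt a generic ("one-size-fits-all") stress classifier to a new speaker is to retrain it with a handful of that speaker’s labeled utterances weighted more heavily. `X_generic`, `y_generic`, `X_speaker`, `y_speaker` are assumed feature matrices and labels; the SVM and weighting are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def adapt_stress_model(X_generic, y_generic, X_speaker, y_speaker, boost=5.0):
    X = np.vstack([X_generic, X_speaker])
    y = np.concatenate([y_generic, y_speaker])
    # Up-weight the target speaker's samples so the decision boundary
    # shifts toward their voice characteristics.
    w = np.concatenate([np.ones(len(y_generic)),
                        boost * np.ones(len(y_speaker))])
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=w)
    return clf
```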
Gesture-based affective computing on motion capture data
- In 1st Int. Conf. Affective Computing and Intelligent Interaction (ACII 2005)
, 2005
"... This paper presents research using full body skeletal movements captured using video-based sensor technology developed by Vicon Motion Systems, to train a machine to identify different human emotions. The Vicon system uses a series of 6 cameras to capture lightweight markers placed on various poin ..."
Abstract
-
Cited by 24 (3 self)
- Add to MetaCart
This paper presents research using full-body skeletal movements, captured with video-based sensor technology developed by Vicon Motion Systems, to train a machine to identify different human emotions. The Vicon system uses a series of 6 cameras to capture lightweight markers placed on various points of the body in 3D space, and digitizes movement into x, y, and z displacement data. Gestural data from five subjects were collected depicting four emotions: sadness, joy, anger, and fear. Experimental results with different machine learning techniques show that automatic classification accuracy on this data ranges from 84% to 92%, depending on how it is calculated. To put these automatic classification results into perspective, a user study on the human perception of the same data was conducted, with an average classification accuracy of 93%.
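A minimal sketch, not the paper’s pipeline: turning per-marker x/y/z displacement sequences into summary motion features and classifying the four emotions. `clips` is an assumed list of arrays shaped (frames, markers, 3) with labels for sadness, joy, anger, and fear; the features and classifier are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def clip_features(clip):
    vel = np.diff(clip, axis=0)                 # frame-to-frame displacement
    speed = np.linalg.norm(vel, axis=2)         # per-marker speed over time
    return np.concatenate([speed.mean(axis=0),  # average speed per marker
                           speed.std(axis=0),   # movement variability
                           clip.std(axis=0).ravel()])  # spatial spread

def train_emotion_classifier(clips, labels):
    X = np.stack([clip_features(c) for c in clips])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf
```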
UTDrive: driver behavior and speech interactive systems for in-vehicle environments
- In IEEE Intelligent Vehicles Symposium
, 2007
"... Abstract-This paper describes an overview of the UTDrive project. UTDrive is part of an on-going international collaboration to collect and research rich multi-modal data recorded for modeling driver behavior for in-vehicle environments. The objective of the UTDrive project is to analyze behavior w ..."
Abstract
-
Cited by 11 (7 self)
- Add to MetaCart
(Show Context)
This paper describes an overview of the UTDrive project. UTDrive is part of an on-going international collaboration to collect and research rich multi-modal data recorded for modeling driver behavior in in-vehicle environments. The objective of the UTDrive project is to analyze behavior while the driver is interacting with speech-activated systems or performing common secondary tasks, as well as to better understand the speech characteristics of a driver undergoing additional cognitive load. The corpus consists of audio, video, gas/brake pedal pressure, forward distance, GPS information, and CAN-bus information. The resulting corpus, analysis, and modeling will contribute to more effective speech interactive systems which are less distracting and adjustable to the driver’s cognitive capacity and driving situations.
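A minimal sketch, not from the paper: one way to represent a time-synchronized sample from a multi-modal driving corpus of the kind UTDrive describes. Field names and units are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DriveSample:
    t: float                    # seconds since session start
    audio_frame: bytes          # raw microphone frame
    gas_pedal: float            # pedal pressure, normalized 0..1
    brake_pedal: float          # pedal pressure, normalized 0..1
    forward_distance_m: float   # distance to the lead vehicle
    lat: float                  # GPS latitude
    lon: float                  # GPS longitude
    can_speed_kmh: float        # vehicle speed read from the CAN bus
```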
Automatic Affective Feedback in an Email Browser
- In MIT Media Lab Software Agents Group
, 2002
"... This paper demonstrates a new approach to recognizing and presenting the affect of text. The approach starts with a corpus of 400,000 responses to questions about everyday life in Open Mind Common Sense. This so-called commonsense knowledge is the basis of a textual affect sensing engine. The engine ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
(Show Context)
This paper demonstrates a new approach to recognizing and presenting the affect of text. The approach starts with a corpus of 400,000 responses to questions about everyday life in Open Mind Common Sense. This so-called commonsense knowledge is the basis of a textual affect sensing engine. The engine dynamically analyzes a user’s text and senses broad affective qualities of the story at the sentence level. This paper shows how a commonsense affect model was constructed and incorporated into Chernoff-face-style feedback in an affectively responsive email browser called EmpathyBuddy. This experimental system reacts to sentences as they are typed. It is robust enough that it is being used to send email. The response of the few dozen people who have typed into it has been dramatically enthusiastic. This paper debuts a new style of user interface technique for creating intelligent responses. Instead of relying on specialized handcrafted knowledge bases, this approach relies on a generic commonsense repository. Instead of relying on linguistic or statistical analysis alone to “understand” the affect of text, it relies on a small society of approaches based on the commonsense repository.
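A minimal sketch, not the paper’s commonsense engine: a toy lexicon-based sentence-level affect scorer of the kind such an email browser could drive face-style feedback with. The lexicon and emotion categories here are invented for illustration; the real system derives its affect model from the commonsense corpus rather than a hand-built word list.

```python
# Hypothetical word-to-emotion lexicon; the paper's engine instead builds
# its affect model from the Open Mind Common Sense corpus.
AFFECT_LEXICON = {
    "happy": "joy", "great": "joy", "love": "joy",
    "sad": "sadness", "miss": "sadness",
    "angry": "anger", "hate": "anger",
    "afraid": "fear", "worried": "fear",
}

def sentence_affect(sentence):
    counts = {}
    for word in sentence.lower().split():
        emotion = AFFECT_LEXICON.get(word.strip(".,!?"))
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    # Return the dominant emotion for this sentence, or neutral.
    return max(counts, key=counts.get) if counts else "neutral"
```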
A Review of Emotional Speech Databases
"... Abstract. Thirty-two emotional speech databases are reviewed. Each database consists of a corpus of human speech pronounced under different emotional conditions. A basic description of each database and its applications is provided. The conclusion of this study is that automated emotion recognition ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
Thirty-two emotional speech databases are reviewed. Each database consists of a corpus of human speech pronounced under different emotional conditions. A basic description of each database and its applications is provided. The first conclusion of this study is that automated emotion recognition cannot achieve a correct classification exceeding 50% for the four basic emotions, i.e., only twice the 25% rate expected from random selection among four classes. Second, natural emotions cannot be classified as easily as simulated (acted) ones. Third, the emotions most commonly investigated are, in decreasing order of frequency: anger, sadness, happiness, fear, disgust, joy, surprise, and boredom.
A State of the Art Review on Emotional Speech Databases
"... Thirty-two emotional speech databases are reviewed. Each database consists of a corpus of human speech pronounced under different emotional conditions. A basic description of each database and its applications is provided. The conclusion of this study is that automated emotion recognition on these d ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
(Show Context)
Thirty-two emotional speech databases are reviewed. Each database consists of a corpus of human speech pronounced under different emotional conditions. A basic description of each database and its applications is provided. The first conclusion of this study is that automated emotion recognition on these databases cannot achieve a correct classification exceeding 50% for the four basic emotions, i.e., only twice the 25% rate expected from random selection among four classes. Second, natural emotions cannot be classified as easily as simulated (acted) ones. Third, the emotions most commonly investigated are, in decreasing order of frequency: anger, sadness, happiness, fear, disgust, joy, surprise, and boredom.
Assessment of a User’s Time Pressure and Cognitive Load on the Basis of Features of Speech
"... Abstract. One of the central questions addressed in the project READY was that of how a system can automatically recognize situationally determined resource limitations of its user—in particular, time pressure and cognitive load. This chapter summarizes most of the work done in READY on this topic, ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
One of the central questions addressed in the project READY was that of how a system can automatically recognize situationally determined resource limitations of its user, in particular time pressure and cognitive load. This chapter summarizes most of the work done in READY on this topic, presenting as well some previously unpublished results. We first consider why on-line recognition of resource limitations can be useful, by discussing the ways in which a system might adapt its behavior to perceived resource limitations. We then summarize a number of approaches to the recognition problem that have been taken in READY and other projects, before focusing on one particular approach: the analysis of features of a user’s speech. In each of two similarly structured experiments, we created four experimental conditions that varied in terms of whether the user was (a) required to produce spoken utterances quickly or not; and (b) navigating within a simulated airport terminal or standing still. In the second experiment, additional distraction was caused by continuous loudspeaker announcements. The speech produced by the experimental subjects (32 in each experiment) was coded in terms of 7 variables. We report on the extent to which each of these variables was influenced by the subjects’ resource limitations. We also trained dynamic Bayesian networks on the resulting data in order to see how well the information in the users’ speech could serve as evidence as to which condition the user had been in. The results yield information about the accuracy that can be attained in this way and about the diagnostic value of some specific features of speech.
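A minimal sketch, not READY’s actual models: the chapter trains dynamic Bayesian networks; as a simplified stand-in, a Gaussian naive Bayes classifier shows how the 7 speech variables coded per utterance can serve as evidence about the hidden condition (e.g., time pressure vs. none). `X_train` (utterances × 7 features) and `y_train` (condition labels) are assumed inputs.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def condition_posterior(X_train, y_train, x_new):
    model = GaussianNB().fit(X_train, y_train)
    # Posterior probability of each condition given one utterance's features.
    # A dynamic Bayesian network would additionally chain such evidence
    # across successive utterances via temporal links.
    return model.predict_proba(np.atleast_2d(x_new))[0]
```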