Results 1 - 10 of 19
Learning Smooth, Human-Like Turntaking in Realtime Dialogue
- In Proceedings of Intelligent Virtual Agents (IVA 08), 2008
"... Abstract. Giving synthetic agents human-like realtime turntaking skills is a challenging task. Attempts have been made to manually construct such skills, with systematic categorization of silences, prosody and other candidate turn-giving signals, and to use analysis of corpora to produce static deci ..."
Cited by 23 (4 self)
Abstract. Giving synthetic agents human-like realtime turntaking skills is a challenging task. Attempts have been made to manually construct such skills, with systematic categorization of silences, prosody and other candidate turn-giving signals, and to use analysis of corpora to produce static decision trees for this purpose. However, for general-purpose turntaking skills which vary between individuals and cultures, a system that can learn them on-the-job would be best. We are exploring ways to use machine learning to have an agent learn proper turntaking during interaction. We have implemented a talking agent that continuously adjusts its turntaking behavior to its interlocutors based on realtime analysis of the other party’s prosody. Initial results from experiments on collaborative, content-free dialogue show that, for a given subset of turntaking conditions, our modular reinforcement learning techniques allow the system to learn to take turns in an efficient, human-like manner.
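The paper gives no code, but the core idea lends itself to a compact sketch: one learner per prosodic context that adjusts how long to wait at a pause before taking the turn, rewarded for smooth transitions and penalized for overlaps and long gaps. Everything below (the TurnTakingLearner class, the silence bins, the reward values) is an illustrative assumption, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a learner that adjusts, per prosodic context,
# how long to wait at a pause before taking the turn.
SILENCE_BINS = [0.2, 0.4, 0.8, 1.6]  # candidate wait times in seconds (assumed)

class TurnTakingLearner:
    def __init__(self, epsilon=0.1, alpha=0.3):
        self.q = defaultdict(float)   # (context, wait_time) -> learned value
        self.epsilon = epsilon        # exploration rate
        self.alpha = alpha            # learning rate

    def choose_wait(self, context):
        """Pick a wait time for this prosodic context (e.g. 'falling-pitch')."""
        if random.random() < self.epsilon:
            return random.choice(SILENCE_BINS)
        return max(SILENCE_BINS, key=lambda w: self.q[(context, w)])

    def update(self, context, wait, interrupted, awkward_silence):
        """Reward smooth transitions; penalize overlaps and long gaps."""
        reward = 1.0 - (2.0 if interrupted else 0.0) - (1.0 if awkward_silence else 0.0)
        key = (context, wait)
        self.q[key] += self.alpha * (reward - self.q[key])

learner = TurnTakingLearner()
wait = learner.choose_wait("falling-pitch")  # decide when to speak
learner.update("falling-pitch", wait, interrupted=False, awkward_silence=False)
```

Modularity in the paper's sense would amount to running several such learners in parallel, one per turntaking condition.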
Whiteboards: Scheduling blackboards for semantic routing of messages & streams
- In AAAI-05, AAAI Technical Report, 2005
"... This paper presents a type of scheduling blackboard called whiteboards. Blackboards can simplify construction of systems with large numbers of heterogeneous components requiring a high number of fine-grained interactions. An increase in systems integration, for example in humanoid robotics and intel ..."
Cited by 19 (7 self)
This paper presents a type of scheduling blackboard called whiteboards. Blackboards can simplify construction of systems with large numbers of heterogeneous components requiring a high number of fine-grained interactions. An increase in systems integration, for example in humanoid robotics and intelligent environments, has called for better solutions to support multi-module integration. Whiteboards extend the blackboard model in a number of significant ways that allow them to fill this role. Chief among their features are: an explicit temporal model; quality of service; both publish-subscribe and querying for data; both discrete and streaming data using the same API; explicit data wrappers; programming language independence; as well as a number of solutions to practical issues for improving development effort and runtime performance. Whiteboards consist of (i) a general-purpose message type format, (ii) ontologically-defined message and data stream types, and (iii) specifications for routing between system components. Whiteboards thus provide a development tool especially relevant for simulations of complex natural systems where symbolic data meets raw numerical data; systems with ill-defined boundaries between sub-systems; and systems where the number of component states and interactions is considered to be relatively large. Examples include computer vision, speech recognition and robotics, ecosystems and biological systems. This paper describes the main constituents of whiteboards and their use.
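As a rough illustration of the whiteboard model described above, the sketch below combines publish-subscribe with retrospective queries under one API and stamps every message with a time, standing in for the explicit temporal model. The Whiteboard class, the dot-separated type names and the prefix matching are assumptions made for this example, not the published specification.

```python
import time
from collections import defaultdict

# Hypothetical sketch of a whiteboard-style component: publish/subscribe
# plus queries over past data, with an explicit timestamp on every message.
class Whiteboard:
    def __init__(self):
        self.subscribers = defaultdict(list)  # type pattern -> callbacks
        self.history = []                     # (timestamp, type, payload)

    def subscribe(self, pattern, callback):
        self.subscribers[pattern].append(callback)

    def post(self, msg_type, payload):
        """One call for discrete messages and stream samples alike."""
        stamp = time.time()
        self.history.append((stamp, msg_type, payload))
        # Dot-separated types allow coarse prefix matching, e.g. 'audio.prosody.*'
        for pattern, callbacks in self.subscribers.items():
            if msg_type.startswith(pattern.rstrip('*')):
                for cb in callbacks:
                    cb(stamp, msg_type, payload)

    def query(self, pattern, since=0.0):
        """Retrieve past messages of a type, honoring the temporal model."""
        return [m for m in self.history
                if m[1].startswith(pattern.rstrip('*')) and m[0] >= since]

wb = Whiteboard()
wb.subscribe("audio.prosody.*", lambda t, mt, p: print(mt, p))
wb.post("audio.prosody.pitch", {"hz": 182.0})
recent = wb.query("audio.prosody.*", since=time.time() - 5.0)
```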
Modeling Multimodal Communication as a Complex System
- In Modeling Communication with Robots and Virtual Humans, 2008
"... The overall behavior and nature of complex natural systems is in large part determined by the number and variety of the mechanisms involved – and the complexity of their interactions. Embodied natural communication belongs to this class of systems, encompassing many cognitive mechanisms that intera ..."
Cited by 10 (8 self)
The overall behavior and nature of complex natural systems is in large part determined by the number and variety of the mechanisms involved – and the complexity of their interactions. Embodied natural communication belongs to this class of systems, encompassing many cognitive mechanisms that interact in highly complex ways, both within and between communicating individuals, constituting a heterogeneous, large, densely-coupled system (HeLD). HeLDs call for finer model granularity than other types of systems, lest the models be not only incomplete but likely incorrect. Consequently, models of communication must encompass a large subset of the functions and couplings that make up the real system, calling for a powerful methodology for integrating information from multiple fields and for producing runnable models. In this paper I propose such an approach, abstract module hierarchies, which leverages the benefits of modular construction without forcing modularity on the phenomena being modeled.
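A minimal sketch of what an abstract module hierarchy could look like in code, under the assumption that a coarse module is a placeholder that can later be refined into finer-grained children without changing how the rest of the model refers to it; the Module class and the example decomposition are illustrative only.

```python
# Hypothetical sketch of an "abstract module hierarchy": a coarse module can
# be refined into finer-grained children while keeping its external identity.
class Module:
    def __init__(self, name, process=None):
        self.name = name
        self.process = process      # leaf behavior, if any
        self.children = []          # refinements of this abstraction

    def refine(self, *children):
        """Replace this abstraction's behavior with finer-grained parts."""
        self.children.extend(children)
        return self

    def run(self, signal):
        if self.children:           # delegate to the finer granularity
            for child in self.children:
                signal = child.run(signal)
            return signal
        return self.process(signal) if self.process else signal

# Start coarse, refine as knowledge about the mechanism accumulates.
perception = Module("speech-perception")
perception.refine(
    Module("prosody-analysis", lambda s: {**s, "pitch": "falling"}),
    Module("word-recognition", lambda s: {**s, "words": ["hello"]}),
)
print(perception.run({"audio": "..."}))
```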
Achieving Artificial General Intelligence Through Peewee Granularity
"... The general intelligence of any autonomous system must in large part be measured by its ability to automatically learn new skills and integrate these with prior skills. Cognitive architectures addressing these topics are few and far between – possibly because of their difficulty. We argue that archi ..."
Cited by 8 (8 self)
The general intelligence of any autonomous system must in large part be measured by its ability to automatically learn new skills and integrate these with prior skills. Cognitive architectures addressing these topics are few and far between – possibly because of their difficulty. We argue that architectures capable of diverse skill acquisition and integration, and real-time management of these, require an approach of modularization that goes well beyond the current practices, leading to a class of architectures we refer to as peewee-granule systems. The building blocks (modules) in such systems have simple operational semantics and result in architectures that are heterogeneous at the cognitive level but homogeneous at the computational level.
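The abstract suggests a concrete reading: every module shares one tiny computational interface (messages in, messages out), so a single uniform scheduler can run a cognitively heterogeneous set of granules. The sketch below is one hypothetical rendering of that idea; the granule decorator and the example modules are invented for illustration.

```python
# Hypothetical sketch of a peewee-granule system: heterogeneous at the
# cognitive level, homogeneous at the computational level.
def granule(in_types):
    """Decorator marking a function as a granule triggered by message types."""
    def wrap(fn):
        fn.in_types = in_types
        return fn
    return wrap

@granule({"pitch"})
def detect_question(msg):
    if msg["pitch"] == "rising":
        yield {"type": "dialogue-act", "value": "question"}

@granule({"dialogue-act"})
def plan_answer(msg):
    if msg["value"] == "question":
        yield {"type": "intent", "value": "answer"}

def dispatch(modules, msg):
    """Single uniform scheduler: route a message to every matching granule."""
    for m in modules:
        if msg["type"] in m.in_types:
            for out in m(msg):
                yield out
                yield from dispatch(modules, out)  # cascade downstream

mods = [detect_question, plan_answer]
for out in dispatch(mods, {"type": "pitch", "pitch": "rising"}):
    print(out)
```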
A Granular Architecture for Dynamic Realtime Dialogue
- In Intelligent Virtual Agents (IVA), 2008
"... Abstract. We present a dialogue architecture that addresses perception, planning and execution of multimodal dialogue behavior. Motivated by realtime human performance and modular architectural principles, the architecture is full-duplex (“open-mic”); prosody is continuously analyzed and used for mi ..."
Cited by 7 (6 self)
Abstract. We present a dialogue architecture that addresses perception, planning and execution of multimodal dialogue behavior. Motivated by realtime human performance and modular architectural principles, the architecture is full-duplex (“open-mic”); prosody is continuously analyzed and used for mixed-control turntaking behaviors (reactive and deliberative) and incremental utterance production. The architecture is fine-grained and highly expandable; we are currently applying it in more complex multimodal interaction and dynamic task environments. We describe here the theoretical underpinnings of the architecture, compare it to prior efforts, discuss the methodology and give a brief overview of its current runtime characteristics.
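One way to picture the mixed-control scheme is a fast reactive loop acting on prosody events while a slower deliberative loop extends the utterance plan incrementally, both running concurrently over an always-open input channel. The threading sketch below is an illustrative reduction, not the published architecture; all names, events and timings are assumed.

```python
import queue, threading, time

# Hypothetical sketch: reactive layer (milliseconds) and deliberative layer
# (seconds) sharing state over a full-duplex ("open-mic") prosody stream.
prosody_events = queue.Queue()

def reactive_layer(state):
    """Millisecond-scale: yield the turn the instant the user resumes speaking."""
    while state["running"]:
        event = prosody_events.get()
        if event == "speech-onset" and state["speaking"]:
            state["speaking"] = False        # back off immediately
        elif event == "long-silence" and not state["speaking"]:
            state["speaking"] = True         # take the turn

def deliberative_layer(state):
    """Second-scale: incrementally extend the utterance while speaking."""
    while state["running"]:
        if state["speaking"] and state["plan"]:
            print("say:", state["plan"].pop(0))  # produce one increment
        time.sleep(0.5)

state = {"running": True, "speaking": False,
         "plan": ["Well,", "I think", "that works."]}
threading.Thread(target=reactive_layer, args=(state,), daemon=True).start()
threading.Thread(target=deliberative_layer, args=(state,), daemon=True).start()
prosody_events.put("long-silence")           # simulated prosody-analysis output
time.sleep(2); state["running"] = False
```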
Spontaneous Avatar Behavior for Human Territoriality
- In Proceedings of the 9th International Conference on Intelligent Virtual Agents, 2009
"... This paper presents a new approach for generating believable social behavior in avatars. The focus is on human territorial behaviors during social interactions, such as during conversations and gatherings. Driven by theories on human territoriality, we define a reactive framework which allows avatar ..."
Cited by 6 (1 self)
This paper presents a new approach for generating believable social behavior in avatars. The focus is on human territorial behaviors during social interactions, such as during conversations and gatherings. Driven by theories of human territoriality, we define a reactive framework that supports avatar group dynamics during social interaction. We model the territorial dynamics of social interactions as a set of social norms that constrain each avatar’s reactive motion through a set of behaviors which blend together. The resulting social group behavior appears relatively robust, but perhaps more importantly, it starts to bring a new sense of relevance and continuity to virtual bodies that often get left behind when social situations are simulated. We carried out an evaluation of the technology and the results confirm the validity of our approach.
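A hedged sketch of the behavior-blending idea: each territorial norm proposes a weighted 2D adjustment to the avatar's position, and the avatar moves along the blend. The two norms shown (keeping personal distance, keeping the conversational circle closed) and their weights are assumptions chosen for illustration, not the paper's actual behavior set.

```python
import math

# Hypothetical sketch of territorial behavior blending: each social norm
# proposes a 2D adjustment, and the avatar moves along the weighted blend.
def keep_personal_distance(me, others, preferred=1.2):
    dx = dy = 0.0
    for ox, oy in others:
        d = math.hypot(me[0] - ox, me[1] - oy) or 1e-6
        if d < preferred:                       # too close: push away
            dx += (me[0] - ox) / d * (preferred - d)
            dy += (me[1] - oy) / d * (preferred - d)
    return dx, dy

def maintain_circle(me, others):
    """Drift toward the group's centroid to keep the formation closed."""
    cx = sum(o[0] for o in others) / len(others)
    cy = sum(o[1] for o in others) / len(others)
    return (cx - me[0]) * 0.1, (cy - me[1]) * 0.1

def blend(me, others, behaviors):
    dx = sum(w * b(me, others)[0] for b, w in behaviors)
    dy = sum(w * b(me, others)[1] for b, w in behaviors)
    return me[0] + dx, me[1] + dy

me, others = (0.0, 0.0), [(0.5, 0.0), (2.0, 2.0)]
behaviors = [(keep_personal_distance, 1.0), (maintain_circle, 0.5)]
print(blend(me, others, behaviors))   # next position after one reactive step
```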
Two approaches to a plug-and-play vision architecture—CAVIAR and Psyclone
- In Proceedings of AAAI Workshop on Modular Construction of Human-Like Intelligence, 2005
"... This paper compares two solutions for human-like perception using two different modular “plug-and-play” frameworks, CAVIAR (List et al, 2005) and Psyclone (Thórisson et al, 2004, 2005a). Each uses a central point of configuration and requires the modules to be auto-descriptive, auto-critical and aut ..."
Cited by 3 (2 self)
This paper compares two solutions for human-like perception using two different modular “plug-and-play” frameworks, CAVIAR (List et al, 2005) and Psyclone (Thórisson et al, 2004, 2005a). Each uses a central point of configuration and requires the modules to be auto-descriptive, auto-critical and auto-regulative (Crowley and Reignier, 2003) for fully autonomous configuration of processing and dataflow. This allows new modules to be added to or removed from the system with minimal reconfiguration. CAVIAR uses a centralised global controller (Bins et al, 2005) whereas Psyclone supports a fully distributed control architecture. We implemented a computer vision-based human behaviour tracker for public scenes in the two frameworks. CAVIAR’s global controller uses offline learned knowledge to regulate module parameters and select between competing results, whereas in Psyclone dynamic multi-level control modules adjust parameters, data and process flow. The two frameworks thus yield very different solutions to control issues such as dataflow regulation and module substitution. However, we found that both frameworks allow easy incremental development of modular architectures with increasingly complex functionality. Their main differences lie in runtime efficiency and module interface semantics.
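The auto-descriptive/auto-critical/auto-regulative module contract can be made concrete with a small sketch: each module advertises its inputs, outputs and parameters, rates its own results, and adjusts itself under load, so a controller can wire dataflow from descriptions alone. The interface below is hypothetical, not CAVIAR's or Psyclone's actual API.

```python
# Hypothetical sketch of the module contract both frameworks rely on for
# plug-and-play configuration (Crowley and Reignier's three properties).
class TrackerModule:
    def describe(self):
        """Auto-descriptive: advertise inputs, outputs and tunable parameters."""
        return {"inputs": ["video.frame"], "outputs": ["person.position"],
                "params": {"detection_threshold": (0.0, 1.0)}}

    def confidence(self, result):
        """Auto-critical: report how much a result should be trusted."""
        return result.get("score", 0.0)

    def regulate(self, load):
        """Auto-regulative: trade accuracy for speed under high load."""
        self.detection_threshold = 0.8 if load > 0.9 else 0.5

def autoconfigure(modules):
    """A controller wires dataflow from module descriptions alone."""
    routes = {}
    for m in modules:
        for out in m.describe()["outputs"]:
            routes[out] = [n for n in modules
                           if out in n.describe()["inputs"]]
    return routes

print(autoconfigure([TrackerModule()]))
```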
Teaching computers to conduct spoken interviews: Breaking the realtime barrier with learning
- In IVA ’09: Proceedings of the 9th International Conference on Intelligent Virtual Agents, 2009
"... Abstract. Several challenges remain in the effort to build software capable of conducting realtime dialogue with people. Part of the problem has been a lack of realtime flexibility, especially with regards to turntaking. We have built a system that can adapt its turntaking behavior in natural dialog ..."
Cited by 3 (2 self)
Abstract. Several challenges remain in the effort to build software capable of conducting realtime dialogue with people. Part of the problem has been a lack of realtime flexibility, especially with regards to turntaking. We have built a system that can adapt its turntaking behavior in natural dialogue, learning to minimize unwanted interruptions and “awkward silences”. The system learns this dynamically during the interaction in less than 30 turns, without special training sessions. Here we describe the system and its performance when interacting with people in the role of an interviewer. A prior evaluation of the system included 10 interactions with a single artificial agent (a non-learning version of itself); the new data consists of 10 interaction sessions with 10 different humans. Results show performance to be close to a human’s in natural, polite dialogue, with 20% of the turn transitions taking place in under 300 msecs and 60% in under 500 msecs. The system works in real-world settings, achieving robust learning in spite of noisy data. The modularity of the architecture gives it significant potential for extensions beyond the interview scenario described here.
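Since the paper reports convergence within about 30 turns and no separate training phase, a plausible minimal mechanism is a single pause threshold nudged after every turn transition. The sketch below assumes that mechanism purely for illustration; the step sizes, bounds and outcome labels are invented, not taken from the paper.

```python
# Hypothetical sketch of on-the-job adaptation: one pause threshold is
# adjusted after every turn transition, converging within a few dozen turns.
class PauseThreshold:
    def __init__(self, initial=1.0, step=0.1, floor=0.2, ceiling=3.0):
        self.value = initial      # seconds of silence before we take the turn
        self.step = step
        self.floor, self.ceiling = floor, ceiling

    def observe(self, outcome):
        """outcome: 'interrupted' (we spoke too soon) or 'awkward' (too late)."""
        if outcome == "interrupted":
            self.value = min(self.value + self.step, self.ceiling)
        elif outcome == "awkward":
            self.value = max(self.value - self.step, self.floor)

threshold = PauseThreshold()
for outcome in ["interrupted", "interrupted", "awkward"]:  # simulated turns
    threshold.observe(outcome)
print(round(threshold.value, 2))  # 1.1 after the three observations
```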
A Software Framework for Multimodal Human-Computer Interaction Systems
"... Abstract—This paper describes a software framework we designed and implemented for the development and research in the area of multimodal human-computer interface. The proposed framework is based on publish / subscribe architecture, which allows developers and researchers to conveniently configure, ..."
Cited by 2 (1 self)
Abstract—This paper describes a software framework we designed and implemented for development and research in the area of multimodal human-computer interfaces. The proposed framework is based on a publish/subscribe architecture, which allows developers and researchers to conveniently configure, test and expand their system in a modular and incremental manner. In order to achieve reliable and efficient data transport between modules while still providing a high degree of system flexibility, the framework uses a shared-memory based data transport protocol for message delivery together with a TCP based system management protocol to maintain the integrity of system structure at runtime. The framework is delivered as communication middleware, providing a basic system manager and well-documented, easy-to-use and open-source C++ SDKs supporting both module development and server extension. An experimental comparison between the proposed framework and other similar tools available to the community indicates that our framework greatly outperforms the others in terms of average message latency, maximum data throughput and CPU consumption, especially in heavy workload scenarios. To demonstrate the performance of our framework in real-world applications, we have built a demo system which detects faces and facial feature points in video captured in real time. The results show our framework is capable of delivering tens of megabytes of data per second effectively and efficiently even under tight resource constraints.
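To make the transport split tangible: bulk message payloads travel through a shared-memory segment, while a separate TCP channel (not shown) would carry management traffic such as module registration and route updates. The length-prefixed layout below is an assumption for this example, not the framework's actual wire format.

```python
from multiprocessing import shared_memory
import struct

# Hypothetical sketch: length-prefixed messages in a shared-memory segment,
# standing in for the framework's shared-memory data transport.
def write_message(shm, payload: bytes):
    """Length-prefixed write into the shared segment."""
    struct.pack_into("I", shm.buf, 0, len(payload))
    shm.buf[4:4 + len(payload)] = payload

def read_message(shm) -> bytes:
    (length,) = struct.unpack_from("I", shm.buf, 0)
    return bytes(shm.buf[4:4 + length])

shm = shared_memory.SharedMemory(create=True, size=1 << 20)  # 1 MiB segment
try:
    write_message(shm, b'{"type": "face.detected", "x": 120, "y": 48}')
    print(read_message(shm))  # a subscriber in another process would do this
finally:
    shm.close()
    shm.unlink()
```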
HCI^2 Workbench: A development tool for multimodal human-computer interaction systems
- In Proceedings of the International Workshop on Facial and Bodily Expressions for Control and Adaptation of Games (ECAG ’11), 2011
"... Abstract—In this paper, we present a novel software tool designed and implemented to simplify the development process of Multimodal Human-Computer Interaction (MHCI) systems. This tool, which is called the HCI^2 Workbench, exploits a Publish / Subscribe (P/S) architecture [13] [14] to facilitate eff ..."
Cited by 1 (0 self)
Abstract—In this paper, we present a novel software tool designed and implemented to simplify the development process of Multimodal Human-Computer Interaction (MHCI) systems. This tool, called the HCI^2 Workbench, exploits a Publish/Subscribe (P/S) architecture [13][14] to facilitate efficient and reliable inter-module data communication and runtime system management. In addition, through a combination of an SDK, software tools, and standardized description/configuration file semantics, the HCI^2 Workbench provides an easy-to-follow procedure for developing highly flexible and reusable modules. Moreover, the HCI^2 Workbench features system persistence and portability by using a standardized module packaging method and system configuration files. Last but not least, usability was another major concern. Unlike other similar tools, including Psyclone [9] and ActiveMQ [10], the HCI^2 Workbench provides a complete graphical environment to support every step in a typical MHCI system development process, including module development and debugging, module packaging, module management, system configuration, and module and system testing, all in a convenient and intuitive manner. To demonstrate the HCI^2 Workbench, we also present a readily applicable system developed using our tool. This open-source demo system, called CamGame, allows users to play a computer game using hand-held marker(s) and low-cost camera(s) instead of a keyboard and mouse.
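A sketch of what a standardized module description file might contain, given the abstract's mention of description/configuration file semantics and module packaging; the JSON fields and the face-detector example are invented for illustration and are not the HCI^2 Workbench's actual file format.

```python
import json

# Hypothetical module description file of the kind a P/S development tool
# could use for packaging, discovery and system configuration.
MODULE_DESCRIPTION = """
{
  "name": "face-detector",
  "executable": "face_detector.exe",
  "inputs":  [{"topic": "camera.frame", "format": "image/raw"}],
  "outputs": [{"topic": "face.points", "format": "application/json"}],
  "settings": {"min_face_size": 24}
}
"""

def load_module_description(text):
    """Parse and validate a description before registering the module."""
    desc = json.loads(text)
    for field in ("name", "executable", "inputs", "outputs"):
        if field not in desc:
            raise ValueError(f"module description missing '{field}'")
    return desc

desc = load_module_description(MODULE_DESCRIPTION)
print(desc["name"], "subscribes to",
      [i["topic"] for i in desc["inputs"]])
```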