Results 1 - 10 of 112
QuickSet: Multimodal Interaction for Distributed Applications
1997
"... This paper presents an emerging application of multimodal interface research to distributed applications. We have developed the QuickSet prototype, a pen/voice system running on a hand-held PC, communicating via wireless LAN through an agent architecture to a number of systems, including NRaD's ..."
Abstract
-
Cited by 289 (35 self)
- Add to MetaCart
This paper presents an emerging application of multimodal interface research to distributed applications. We have developed the QuickSet prototype, a pen/voice system running on a hand-held PC, communicating via wireless LAN through an agent architecture to a number of systems, including NRaD's LeatherNet system, a distributed interactive training simulator built for the US Marine Corps. The paper describes the overall system architecture and a novel multimodal integration strategy offering mutual compensation among modalities, and provides examples of multimodal simulation setup. Finally, we discuss our application experience and evaluation.
Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions
2000
"... The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of humancomputer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applic ..."
Abstract
-
Cited by 150 (15 self)
- Add to MetaCart
The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, to be usable by a broader spectrum of the average population, and to function more reliably under realistic and challenging usage conditions. In this paper, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches, and the new hybrid symbolic/statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multi-person use. Before this new class of systems can proliferate, toolkits will also be needed to promote software development for both simulated and functioning systems.
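The early/late fusion distinction this abstract surveys is easy to make concrete. Below is a minimal late (decision-level) fusion sketch in Python: two recognizers produce n-best lists independently, and a fusion step scores cross-product pairs. The hypotheses, scores, and the compatible() filter are invented for illustration, not taken from the paper.

```python
from itertools import product

# Hypothetical n-best outputs from independent recognizers:
# (interpretation, posterior-like confidence score).
speech_nbest = [("create platoon", 0.6), ("delete platoon", 0.4)]
gesture_nbest = [("point:map(12,47)", 0.7), ("circle:area", 0.3)]

def compatible(speech, gesture):
    """Toy semantic filter: a 'create' command needs a pointed-at location."""
    return not (speech.startswith("create") and not gesture.startswith("point"))

def late_fusion(speech_nbest, gesture_nbest):
    """Score every cross-product pair that passes the semantic filter."""
    joint = [(s, g, ps * pg)
             for (s, ps), (g, pg) in product(speech_nbest, gesture_nbest)
             if compatible(s, g)]
    return max(joint, key=lambda t: t[2]) if joint else None

print(late_fusion(speech_nbest, gesture_nbest))
# ('create platoon', 'point:map(12,47)', 0.42)
```

Early fusion would instead combine feature streams before recognition; the hybrid symbolic/statistical approach the abstract mentions adds learned weights to a symbolic integration step.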
Mutual Disambiguation of Recognition Errors in a Multimodal Architecture
1999
"... As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well designed multimodal systems should be able to integrate complementary modalities in a manner that s ..."
Abstract
-
Cited by 143 (12 self)
- Add to MetaCart
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a...
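A minimal sketch of the mutual-disambiguation effect the study measures, under the common n-best formulation: the top speech hypothesis is wrong, but only the second one unifies semantically with the observed gesture, so the joint ranking recovers it. The utterance, scores, and UNIFIES table here are hypothetical, not from the study's corpus.

```python
# Accent degrades speech recognition: the top speech item is wrong.
speech_nbest = [("door", 0.50), ("ditch", 0.45)]
gesture_nbest = [("line-stroke", 0.80), ("tap", 0.20)]

# Hypothetical semantic-type table: which spoken words can unify
# with which gesture shapes.
UNIFIES = {("door", "tap"), ("ditch", "line-stroke")}

ranked = sorted(
    ((s, g, ps * pg)
     for s, ps in speech_nbest
     for g, pg in gesture_nbest
     if (s, g) in UNIFIES),
    key=lambda t: -t[2])

print(ranked[0])
# ('ditch', 'line-stroke', ~0.36): the gesture pulls the 2nd speech choice up,
# which is why multimodal rates for accented speakers can match native ones.
```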
Multimodal Interfaces That Process What Comes Naturally
Communications of the ACM, 2000
"... this article, we summarize the nature of new multimodal systems and how they work, with a focus on multimodal speech and pen-based systems. The primary reasons for building multimodal systems are outlined, including expansion of the accessibility of computing for diverse users, support for new forms ..."
Abstract
-
Cited by 93 (2 self)
- Add to MetaCart
(Show Context)
In this article, we summarize the nature of new multimodal systems and how they work, with a focus on multimodal speech and pen-based systems. The primary reasons for building multimodal systems are outlined, including expansion of the accessibility of computing for diverse users, support for new forms of computing not available in the past, enhancement of performance stability and robustness, and improved expressive power.
Creating Interactive Virtual Humans: Some Assembly Required
IEEE Intelligent Systems, 2002
"... ..."
Unification-based Multimodal Parsing
In COLING/ACL, 1998
"... In order to realize their full potential, multimodal systems need to support not just input from multiple modes, but also synchronized integration of modes. Johnston et al (1997) model this integration using a unification operation over typed feature structures. This is an effective solution for a b ..."
Abstract
-
Cited by 84 (4 self)
- Add to MetaCart
In order to realize their full potential, multimodal systems need to support not just input from multiple modes, but also synchronized integration of modes. Johnston et al. (1997) model this integration using a unification operation over typed feature structures. This is an effective solution for a broad class of systems, but limits multimodal utterances to combinations of a single spoken phrase with a single gesture. We show how the unification-based approach can be scaled up to provide a full multimodal grammar formalism. In conjunction with a multidimensional chart parser, this approach supports integration of multiple elements distributed across the spatial, temporal, and acoustic dimensions of multimodal interaction. Integration strategies are stated in a high-level unification-based rule formalism supporting rapid prototyping and iterative development of multimodal systems.
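The unification operation at the heart of this approach can be sketched compactly. The snippet below unifies plain Python dicts standing in for typed feature structures (a real implementation would also consult a type hierarchy); the speech and gesture structures are invented examples.

```python
def unify(a, b):
    """Return the most general structure consistent with both, or None."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, bval in b.items():
            if key in out:
                merged = unify(out[key], bval)
                if merged is None:
                    return None        # feature clash: unification fails
                out[key] = merged
            else:
                out[key] = bval
        return out
    return a if a == b else None       # atoms must match exactly

# Speech contributes the command type; the pen gesture contributes location.
speech = {"cmd": "create", "object": {"type": "platoon"}}
gesture = {"object": {"location": (12, 47)}}
print(unify(speech, gesture))
# {'cmd': 'create', 'object': {'type': 'platoon', 'location': (12, 47)}}
```

The paper's contribution is to lift this single-operation integration into a full grammar, so a chart parser can unify many such fragments across space, time, and the acoustic stream.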
Coordination and Context-Dependence in the Generation of Embodied Conversation
2000
"... We describe the generation of communicative actions in an implemented embodied conversational agent. Our agent plans each utterance so that mul- tiple communicative goals may be realized opportunistically by a composite action including not only speech but also coverbat gesture that fits the con- te ..."
Abstract
-
Cited by 68 (20 self)
- Add to MetaCart
We describe the generation of communicative actions in an implemented embodied conversational agent. Our agent plans each utterance so that multiple communicative goals may be realized opportunistically by a composite action including not only speech but also coverbal gesture that fits the context and the ongoing speech in ways representative of natural human conversation. We accomplish this by reasoning from a grammar which describes gesture declaratively in terms of its discourse function, semantics, and synchrony with speech.
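For illustration only, here is a toy rendering of what a declarative gesture entry indexed by discourse function might look like; the entries, names, and planner below are hypothetical, not the paper's grammar.

```python
GESTURE_GRAMMAR = {
    # discourse function -> (gesture form, synchrony constraint)
    "emphasis": ("beat", "align stroke with the pitch accent"),
    "spatial-description": ("iconic:trace-shape", "co-occur with the object NP"),
}

def plan_utterance(goals, speech):
    """Opportunistically attach a gesture when a goal has a grammar entry."""
    for goal in goals:
        if goal in GESTURE_GRAMMAR:
            form, sync = GESTURE_GRAMMAR[goal]
            return {"speech": speech, "gesture": form, "sync": sync}
    return {"speech": speech, "gesture": None, "sync": None}

print(plan_utterance(["spatial-description"], "the room has a curved wall"))
# {'speech': ..., 'gesture': 'iconic:trace-shape', 'sync': 'co-occur with the object NP'}
```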
Finite-state multimodal parsing and understanding
In Proceedings of COLING 2000
"... Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston 1998 presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a mul ..."
Abstract
-
Cited by 62 (14 self)
- Add to MetaCart
(Show Context)
Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multidimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal understanding with speech recognition, and provides a general probabilistic framework for multimodal ambiguity resolution.
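The weighted finite-state idea can be sketched as a single transition table over a speech tape, a gesture tape, and a meaning tape, with a shortest-path search in the tropical semiring (min over summed costs) picking the cheapest joint interpretation. The toy grammar, symbols, and weights below are invented; a real system composes large transducers with the recognizer's output lattice.

```python
import heapq

EPS = None  # empty move on a tape

# (state, speech_in, gesture_in, next_state, meaning_out, cost)
ARCS = [
    (0, "email", EPS,     1, "EMAIL(",        0.0),
    (1, "this",  "point", 2, "person(12,47)", 0.2),
    (1, "this",  EPS,     2, "person(?)",     1.5),  # unimodal reading costs more
    (2, EPS,     EPS,     3, ")",             0.0),
]
FINAL = {3}

def best_parse(speech, gesture):
    """Cheapest path that consumes both input streams completely."""
    heap = [(0.0, 0, 0, 0, [])]  # cost, state, speech_pos, gesture_pos, output
    seen = set()
    while heap:
        cost, q, i, j, out = heapq.heappop(heap)
        if q in FINAL and i == len(speech) and j == len(gesture):
            return cost, "".join(out)
        if (q, i, j) in seen:
            continue
        seen.add((q, i, j))
        for (src, s, g, dst, m, w) in ARCS:
            if src != q:
                continue
            if s is not EPS and (i >= len(speech) or speech[i] != s):
                continue
            if g is not EPS and (j >= len(gesture) or gesture[j] != g):
                continue
            ni = i + (0 if s is EPS else 1)
            nj = j + (0 if g is EPS else 1)
            heapq.heappush(heap, (cost + w, dst, ni, nj, out + [m]))
    return None

print(best_parse(["email", "this"], ["point"]))
# (0.2, 'EMAIL(person(12,47))')
```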
Multimodal Integration - A Statistical View
IEEE Transactions on Multimedia, 1999
"... This paper presents a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of parallel input signals involved in the multimodal system. We first identify the primary factors that influence multimodal recognition performance ..."
Abstract
-
Cited by 60 (11 self)
- Add to MetaCart
(Show Context)
This paper presents a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of parallel input signals involved in the multimodal system. We first identify the primary factors that influence multimodal recognition performance by evaluating the multimodal recognition probabilities. We then develop two techniques, an estimate approach and a learning approach, which are designed to optimize accurate recognition during the multimodal integration process. We evaluate these methods using Quickset, a speech/gesture multimodal system, and report evaluation results based on an empirical corpus collected with Quickset. From an architectural perspective, the integration technique presented here offers enhanced robustness. It is also premised on more realistic assumptions than previous multimodal systems using semantic fusion. From a methodological standpoint, the evaluation techniques that we describe provide a valuable too...
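A minimal sketch of decision-level statistical integration in this spirit: per-mode posteriors are combined log-linearly with weights and renormalized. The classes, posteriors, and weights below are invented; the paper's learning approach would fit such weights on an empirical corpus.

```python
import math

def combine(posteriors_by_mode, weights):
    """Log-linear combination: P(c) proportional to prod_m P_m(c)**w_m."""
    classes = posteriors_by_mode[0].keys()
    scores = {
        c: math.exp(sum(w * math.log(p[c])
                        for p, w in zip(posteriors_by_mode, weights)))
        for c in classes
    }
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

speech = {"create": 0.55, "delete": 0.45}   # hypothetical per-mode posteriors
gesture = {"create": 0.80, "delete": 0.20}
print(combine([speech, gesture], weights=[0.5, 0.5]))
# 'create' dominates (~0.69) once the gesture evidence is weighted in
```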