Herzog, G. and Wazinski, P. (1994) VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artif. Intell. Rev. 8,
....includes attribute selection, syntactic structures and the visual semantics of words. A second difference is that we take the notion of grounding semantics in subsymbolic representations to be a critical aspect of linking natural language to visual scenes. The Visual Translator system (VITRA) [8] grounds language generation in visual input (dynamic scenes from automobile traffic and soccer games). In contrast to our work, VITRA is not designed as a learning system. Thus porting it to a new domain would presumably be an arduous and labor-intensive task. 1.1. The Learning Problems In this ....
Gerd Herzog and Peter Wazinski, "VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions," Artificial Intelligence Review, vol. 8, pp. 175--187, 1994.
....to be a critical aspect of linking natural language to visual scenes. Third, we limit the scope of our work to generating referring expressions based solely on information available in static visual scenes. Thus, discourse history is not used by our system. The Visual Translator system (VITRA) [10] is a natural language generation system which is grounded directly in perceptual input. VITRA generates natural language descriptions of dynamic scenes from multiple domains including automobile traffic and soccer games. Semantic representations are extracted from video image sequences. Detailed ....
Gerd Herzog and Peter Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8:175--187, 1994.
....of the degree of applicability. More formally, Gapp (1994) provides a computational model of functions which define the degree of applicability for a number of basic spatial relations with respect to geometrical object properties. For a more complete overview of the entire Vitra project see Herzog & Wazinski (1994). Unlike the Vitra project where the primary concern has been to produce a natural language dialog (or running commentary) of situations occurring in a scene, the Views (Visual Inspection and Evaluation of Wide area Scenes) project (Corrall & Hill 1992) concentrates on Visual Surveillance in ....
Herzog, G. & Wazinski, P. (1994), `Visual Translator: Linking perceptions and natural language descriptions', Artificial Intelligence Review 8, 175--187.
....between visual data and linguistic structures have to be considered in the process of language production. 3 The Visual Translator Conception As depicted in Fig. 3, the transformation of visual data into a verbal description can roughly be subdivided into three subtasks (see [Herzog & Wazinski 94]). Starting from a sequence of digitized video frames, the processes on the sensory level concentrate on the recognition and tracking of visible objects. They provide a geometrical reconstruction of the perceived scene. Within the VITRA framework, low level vision is carried out by our ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8(2/3):175--187, 1994.
....become a move in the appearance space for dogs to a Collie model. Words associated 6 There is a large body of relevant work (Herskovits 1986, Herskovits 1985, Abella and Kender 1993, Andre et al. 1987, Olivier and Tsujii 1994, Yamada 1995, Yamada et al. 1992, Gapp and Maa 1994, Herzog 1995, Herzog and Wazinski 1994, Eschenbach et al. 1997, Vandeloise 1991) 7 This is the chief technical point that distinguishes the sense of dynamic used here from work in dynamic semantics (Kamp 1981, Heim 1982, Groenendijk and Stokhof 1991) An obvious analogy can also be drawn between the notion of dynamic VOL predicate ....
Herzog, G., and P. Wazinski. 1994. Visual translator: Linking perceptions and natural language descriptions. Artificial Intelligence Review 8(2/3):175--187. Also available as Universität des Saarlandes VITRA Bericht Nr. 100.
....phrases. Nagel (1994), Kollnig & Nagel (1993) and Kollnig et al. (1995) describe a system which analyzes traffic scenes in image sequences and generates symbolic descriptions of movements in the scene which correspond to motion verbs in natural language. The system of André et al. (1988) and Herzog & Wazinski (1994) has football scenes as its domain and produces speech output describing movements of players and the ball. Olivier & Tsujii (1994) visualize spatial prepositions supporting different reference frames. Other systems are designed to understand spatial prepositions (Gapp, 1994) or emotive ....
Herzog, G. & Wazinski, P. (1994). VIsual TRanslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Special Issue on Integration of Natural Language and Vision Processing 8(2-3), 83--95.
....the less specific motion verb zip with an accompanying driving (two hands on the wheel) gesture. The rules instantiated by the system are the following: In the unmarked case, distribute semantic features across speech and gesture. That is, look first at what is perceptually salient in the scene (Herzog & Wazinski, 1994), and then look at the lexicon of the language for what is likely and able to be marked in language, and what in gesture (see Kita, 1993 and McNeill, this volume) among the salient features. In the marked case, overmark, or add redundance to the expression of concepts by conveying them in ....
Herzog, G. & Wazinski, P. (1994). VIsual TRAnslator: linking perceptions and natural language descriptions.
.... spoken language dialogue system for train schedule inquiries (EFFENDI, [Poller & Heisterkamp 95]), a dialogue system managing negotiations in the used car sales domain (PRACMA, [Jameson et al. 94]), the natural language description of simultaneously interpreted real world image sequences (VITRA, [Herzog & Wazinski 94]), a natural language interface to an autonomous mobile robot (KANTRA, [Längle et al. 95]), and a system for explaining machine-found proofs (PROVERB, [Huang 94]). Nevertheless, we have identified several shortcomings of the system that are worth examining in future research. Since the ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8(2):175--187, 1994.
....language descriptions, explanations, and queries. An incremental generator, which is based on Tree Adjoining Grammars, generates the surface structures [Harbusch et al. 91]. KANTRA is an extension of the VITRA (Visual Translator) system, which allows for natural language access to visual data [Herzog & Wazinski 94]. A referential semantics has been defined which connects verbal descriptions to visual and geometric information. This approach provides powerful methods to treat the problem of spatial reference. In order to use and understand localization expressions, the interface has to take into account ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal, 8, 1994. Special Volume on the Integration of Natural Language and Vision Processing, edited by P. McKevitt, to appear.
....into natural language descriptions, explanations, and queries. An incremental generator, which is based on Tree Adjoining Grammars, generates the surface structures [24]. KANTRA is an extension of the VITRA (Visual Translator) system, which allows for natural language access to visual data [15]. A referential semantics has been defined which connects verbal descriptions to visual and geometric information. This approach provides powerful methods to treat the problem of spatial reference. In order to use and understand localization expressions, the interface has to take into account how ....
Herzog, G.; Wazinski, P.: VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal, 8, 1994, to appear
....between data perception and information presentation) the medium used in presentation. The work described in this paper aims at a multimedia reporting system. Following the paradigm of rapid prototyping, we rely on our previous work in both analysis and interpretation of image sequences (cf. [1, 13]) and generation of multimedia presentations (cf. [12, 14]). Short sections of video recordings of soccer games have been chosen as the domain of discourse since they offer interesting possibilities for the automatic interpretation of visual data in a restricted domain. Also, the broad variety of ....
....to building a reporting system is to rely on existing modules for the interpretation of image sequences, and the generation of multimedia presentations. When conceiving our prototype system, called VIPS, we consequently follow this approach when the reuse of modules from our previous systems VITRA [13] and WIP [16] is possible. In the following, we sketch the processing mechanisms of VIPS core modules. 3.1 Image Analysis For technical reasons, we do not directly incorporate a low-level vision component for the processing of the camera data into VIPS. Rather, this task is done with the systems ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. In: McKevitt [31]. To appear.
.... [Lobin 92; Nilsson 84; Torrance 94]. Other approaches have been concerned with natural language control of autonomous agents within simulated 2D or 3D environments (cf. [Badler et al. 91; Chapman 91; Vere & Bickmore 90]) or with natural language access to visual data (cf. [Bajcsy et al. 85; Herzog & Wazinski 94; Neumann 89; Wahlster et al. 83]). The aim of our joint effort is the integration of the Karlsruhe autonomous mobile robot KAMRO [Lüth & Rembold 94] and the natural language component VITRA (VIsual TRAnslator) developed in Saarbrücken [Herzog & Wazinski 94]. Focused on here is the problem of ....
.... to visual data (cf. [Bajcsy et al. 85; Herzog & Wazinski 94; Neumann 89; Wahlster et al. 83]). The aim of our joint effort is the integration of the Karlsruhe autonomous mobile robot KAMRO [Lüth & Rembold 94] and the natural language component VITRA (VIsual TRAnslator) developed in Saarbrücken [Herzog & Wazinski 94]. Focused on here is the problem of spatial reference. The specific need for generating and understanding localization expressions will be shown. In addition, we will describe how such natural language utterances can be processed taking into account information provided by vision sensors and the ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8(2), 1994.
....from sensory data on the one hand and verbal expressions on the other hand constitutes a prominent issue for natural language access to robot systems. First results concerning the integration of vision and language processing have already been achieved in the context of image sequence analysis [Herzog and Wazinski, 1994, Neumann, 1989, Wahlster et al. 1983]. In addition, the approaches of [Badler et al. 1991, Chapman, 1991, Vere and Bickmore, 1990, Wachsmuth and Cao, 1994] are relevant for verbal man-machine interaction, since they consider natural language control of autonomous agents within simulated 2D or ....
G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8(2), 1994.
No context found.
Herzog, G. and Wazinski, P. (1994) VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artif. Intell. Rev. 8,
No context found.
Gerd Herzog and Peter Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review, 8:175--187, 1994.