23 citations found.
Hauptmann, A. and Smith, M. "Text, Speech, and Vision for Video Segmentation: The Informedia Project." In AAAI Symposium on Computational Models for Integrating Language and Vision, Cambridge, MA, Nov. 10--12, 1995.

This paper is cited in the following contexts:
Simplifying Video Editing with Intelligent Interaction - Juan Casares Brad

....has over 2,000 hours of material, including documentaries and news broadcasts. Informedia adds two hours of additional news material every day. For all of its video content, Informedia creates a textual transcript of the audio track using closed captioning information and speech recognition [7, 10]. The transcript is time aligned with the video using CMU's Sphinx speech recognition system [14]. It is important to note that no other system described here uses metadata derived from the audio track. Informedia also performs image analysis to detect shot boundaries and extracts representative ....

Hauptmann, A. and Smith, M. "Text, Speech, and Vision for Video Segmentation: The Informedia Project," in AAAI Symposium on Computational Models for Integrating Language and Vision. 1995.


A Self-Adaptive Semantic Schema Mechanism for Multimedia.. - Yang, Li, Zhuang

....types of media. Substantial modifications and extensions can be found in both relational databases (e.g. STARBURST system [4]) and object oriented databases (e.g. [10] [12]) as an effort to accommodate multimedia features. Some emerging multimedia systems, such as QBIC [2] and Informedia [5], manifest certain characteristics of a multimedia database. Nevertheless, even till now there is no widely accepted definition of multimedia database and its architecture, and many relevant problems have not been successfully addressed. Object oriented data modeling is an established technique ....

Hauptmann, A., Smith, M., "Text, speech, and vision for video segmentation: The Informedia project", AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995.


Multimedia Content Analysis Using Both Audio and Visual Cues - Wang, Liu, Huang (2000)   (17 citations)

....a companion project Name It. Segmentation into Story Units: Each video document is partitioned into story units. This is done manually in the current running version of the system. However, various automatic segmentation schemes have been developed and evaluated. The approach described in [50] makes use of text, audio, and image information jointly. In [50] video paragraph refers to a story unit, and scene actually refers to a shot. When closed caption is available, text markers such as punctuation are used to identify story segments. Otherwise, silence periods are detected ....

....Each video document is partitioned into story units. This is done manually in the current running version of the system. However, various automatic segmentation schemes have been developed and evaluated. The approach described in [50] makes use of text, audio, and image information jointly. In [50], video paragraph refers to a story unit, and scene actually refers to a shot. When closed caption is available, text markers such as punctuation are used to identify story segments. Otherwise, silence periods are detected based on the audio sample energy. Either approach yields boundaries of ....

[Article contains additional citation context not shown here]

A.G. Hauptmann and M.A. Smith, "Text, speech and vision for video segmentation: the informedia project," in Proc. AAAI Fall Symp. Computational Models for Integrating Language and Vision, Boston, MA, Nov 10-12, 1995.


A Fully Automated Content-Based Video Search.. - Chang, Chen.. (1998)   (50 citations)

....stream. Many retrieval systems, such as PhotoBook, VisualSEEk, and Virage, share this paradigm, but only support still image retrieval. Video retrieval systems should evolve toward a systematic integration of all available media such as audio, video, and captions. While video engines such as [15], [14], [31] and [24] attempt such an integration, much research on the representation and analysis of each of these different media remains to be done. Those that concentrate on the visual media alone fall into two distinct categories: query by example (QBE) and visual sketches. In the ....

A. G. Hauptmann and M. Smith, "Text, speech and vision for video segmentation: The informedia project," presented at the AAAI Fall Symp., Computational Models for Integrating Language and Vision, Boston, MA, Nov. 1995.


A Multi-View Intelligent Editor for Digital Video.. - Myers, Casares.. (2001)   (2 citations)

....6.3 Transcript View An important innovation in the Silver video editor, which is enabled by Informedia, is the provision of a textual transcript of the video. This is displayed in a conventional text-editor-like window (see Figure 4). Informedia generates the transcript from various sources [16]. If the video provides closed captioning, then this is used. Speech recognition is used to recognize other speech, and also to align the transcript with the place in the audio track that each word appears [7]. Because the recognition can contain mistakes, Silver inserts green s where there ....

Hauptmann, A. and Smith, M. "Text, Speech, and Vision for Video Segmentation: The Informedia Project," in AAAI Symposium on Computational Models for Integrating Language and Vision. 1995.


Use of Transforms for Indexing in Audio Databases - Subramanya, Youssef..

....data, good indices which characterize the data features well, while still being of low dimensionality are desirable, which facilitate accurate and fast retrievals. There has been considerable attention devoted to the problem of content based image indexing ([1, 2, 3, 4, 5]) and video indexing ([6, 7, 8, 9, 10]) but comparatively little to content based audio indexing. Although tremendous amount of work has been done in the analysis, synthesis, recognition, coding, and transmission of speech data, and the analysis and synthesis of music, the on line storage of digital audio data in computers and their ....

A. Hauptmann and M. A. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project".


Columbia's VoD and Multimedia Research Testbed.. - Chang.. (1996)   (3 citations)

.... For example, dedicated storage architectures for real time multi access have been studied in [9, 10, 11, 12, 13]. Systematic approaches to the design of video servers (VS) are reported in [14, 15, 16, 17]. Innovative methods for indexing searching images by image contents were addressed in [4, 5, 6, 7, 8]. In addition, many field trials of VOD services using proprietary high performance VS technologies have made news headlines. Lastly, a major international forum, DAVIC, has been active in specifying standards for critical protocols and interfaces for achieving interoperability between various ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project," AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, November 10-12, 1995.


Columbia's VoD and Multimedia Research Testbed.. - Chang.. (1997)   (3 citations)

....fields as well. For example, dedicated storage architectures for real time multi access have been studied in [8, 9, 10]. Systematic approaches to the design of video servers (VS) are reported in [11, 12, 13]. Innovative methods for indexing searching images by image contents were addressed in [4, 5, 6, 7]. In addition, many field trials of VoD services using proprietary high performance VS technologies have made news headlines. Lastly, a major international forum, DAVIC, has been active ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project," AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, November 10-12, 1995.


Self-Describing Schemes for Interoperable MPEG-7.. - Paek, Benitez, Chang

.... edit data about the image video we get from the content provider (e.g. stock video companies). The descriptions that are used by current state of the art image video search engines can be viewed primarily as falling into the class of visual feature description [6] [8] and semantic description [9] [17] [18] [22]. The visual feature based approach has been to obtain and utilize discriminants (features) that are useful in conducting similarity queries for visual information. Recent efforts have focused on a few specific visual dimensions such as color, texture, shape, motion and spatial ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project", AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, MA, Nov. 1995.


Self-Describing Schemes for Interoperable MPEG-7.. - Paek, Benitez, Chang (1999)

.... edit data about the image video we get from the content provider (e.g. stock video companies). The descriptions that are used by current state of the art image video search engines can be viewed primarily as falling into the class of visual feature description [6] [8] and semantic description [9] [17] [18] [22]. The visual feature based approach has been to obtain and utilize discriminants (features) that are useful in conducting similarity queries for visual information. Recent efforts have focused on a few specific visual dimensions such as color, texture, shape, motion and spatial ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project", AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, MA, Nov. 1995.


Acoustic Segmentation for Audio Browsers - Kimber, Wilcox (1996)   (15 citations)

....time consuming, and methods of producing indices automatically or semi automatically are desirable. There are a number of ways of generating useful indices automatically, such as keyword spotting [19], detection of regions of emphatic speech [4], and alignment of speech with textual transcription [7]. Another method is based on segmentation of audio into regions corresponding to different speakers or acoustic classes [20]. In this paper, we review this method and describe our experiences with its use on various types of audio recordings. We also describe a graphical audio tool which both ....

A. Hauptmann, and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project," Proc. AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995.


Visual Information Retrieval from Large Distributed.. - Chang, Smith, Beigi.. (1997)   (24 citations)

....from humans or supporting data to index the visual information in a direct, semantic way. Video icons are generated via manual annotations of objects (e.g. people, boat) or semantic events (e.g. sunsets) in [7]. Textual indexes are generated from the captions and transcripts of broadcast video [8,9,10] for news video retrieval. A complementary function with visual search is summarization. Scene based techniques are used in efficient browsing interfaces and event detection and clustering [11,12]. Video analysis techniques are used to construct mosaic images for efficient browsing and indexing ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project," AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, November 10-12, 1995.


Wavelet-based Indexing of Audio Data in Audio/Multimedia.. - Subramanya, Youssef (1998)   (1 citation)

....and use of multimedia data such as video, audio, and images in computer applications and this is expected to continue at an accelerated pace in the near future. There has been considerable attention devoted to the problem of content based image indexing ([1, 2, 3, 4, 5]) and video indexing ([6, 7, 8, 9, 10]) and very little to content based audio indexing. Although tremendous amount of work has been done in the analysis, synthesis, recognition, coding, and transmission of speech data, and analysis and synthesis of music, the on line storage of digital audio data in com ....

A. Hauptmann and M. A. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project".


Visual Information Retrieval from Large Distributed.. - Chang, Smith, Beigi.. (1997)   (24 citations)  Self-citation (Smith)

....data to better index the visual information. For example, in [5] video icons are generated by a manual process for annotating objects (e.g. people, boat) and semantic events (e.g. sunsets) in videos. Textual indexes have also been generated from the captions and transcripts of broadcast video [6] for news video retrieval. 2 Complementary to visual search is visual summarization. By decomposing the video, e.g. by using automated scene detection, a more spatially and or temporally compact presentation of the video can be generated. For example, [7] has developed news video summarization ....

....modalities, including images, video, text, audio and graphics. VIR systems differ in their treatment of the multiple modalities. Typically, if multiple modalities are considered, they are indexed independently. The integration of multiple modalities has been investigated in a few systems [6,12] but has not yet been fully exploited. Adaptability Most VIR systems use a static set of previously extracted features. The selection of these features by the system and or system designer involves trade offs in the indexing costs and search functionalities. However, due to the ....

A. G. Hauptmann and M. Smith, "Text, Speech and Vision for Video Segmentation: The Informedia Project," AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, November 10-12, 1995.


Informedia: News-on-Demand Multimedia Information.. - Hauptmann, Witbrock (1997)   (28 citations)  Self-citation (Hauptmann)

.... yellow flowers, that might not have been easily identified from the annotations or image qualities alone. 1.3 Component Technologies There are three broad categories of technologies we can bring to bear to create and search a digital video library built from broadcast video and audio materials [HauptmannSmith95]: Text processing looks at the textual (ASCII) representation of the words that were spoken, and at other text annotations that may be derived from the transcript, from the production notes, or from closed captioning that is sometimes broadcast with the news stories. Text analysis can work on an ....

....in natural sounding English. Natural language processing also has a role in query matching for optimal retrieval from the story texts. Finally, the system would greatly improve if queries could be parsed to separate out dates, major concepts and types of news sources. Image processing research [HauptmannSmith95] is continuing to refine the identification of cuts in the video for scene segmentation. Within a scene and within a story, image processing gives us a key frame to represent that scene or story. The choice of a single key frame to best represent a whole scene is a subject of active research. In ....

Hauptmann, A. G., and Smith, M., "Text, Speech, and Vision for Video Segmentation: The Informedia Project," AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, Cambridge, MA: MIT, November 1995, pp. 90-95.


Informedia News-On Demand: Using Speech Recognition to .. - Wactlar, Hauptmann.. (1998)   (2 citations)  Self-citation (Hauptmann)

....architecture and the information retrieval component. No image processing and no speech recognition is performed. Component Technologies There are three broad categories of technologies we can bring to bear to create and search a digital video library from broadcast video and audio materials (Hauptmann Smith 1995) Text processing looks at the textual (ASCII) representation of the words that were spoken, as well as other text annotations. These may be derived from the transcript, from the production notes or from the closed captioning that might be available. Text analysis can work on an existing transcript ....

....news stories in normal English. Natural language processing also has a role in query matching for optimal retrieval from the story texts. Finally, the system would greatly improve if queries could be parsed to separate out dates, major concepts and types of news sources. Image processing research (Hauptmann Smith 1995) is continuing to refine the scene segmentation component. The choice of a single key frame to best represent a whole scene is a subject of active research. In the longer term, we plan to add text detection and apply OCR capabilities to reading text off the screen. We also hope to include ....

Hauptmann, A. G., and Smith, M., 1995, "Text, Speech, and Vision for Video Segmentation: The Informedia Project," AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, Cambridge, MA: MIT, pp. 90-95.


Simplifying Video Editing Using Metadata - Juan Casares Chris (2002)   (1 citation)

No context found.

Hauptmann, A. and Smith, M. "Text, Speech, and Vision for Video Segmentation: The Informedia Project." In AAAI Symposium on Computational Models for Integrating Language and Vision, Cambridge, MA, Nov. 10--12, 1995.


A Multi-View Intelligent Editor - For Digital Video

No context found.

Hauptmann, A. and Smith, M. "Text, Speech, and Vision for Video Segmentation: The Informedia Project," in AAAI Symposium on Computational Models for Integrating Language and Vision. 1995.


Portable Meeting Recorder - Dar-Shyang Lee Berna (2002)

No context found.

Hauptmann, A. G., and Smith, M., "Text speech and vision for video segmentation: The informedia project," Proceedings of the AAAI Fall Symposium on Computational Models for Integrating Language and Vision, 1995.


Video Scouting: An Architecture and System for the .. - Jasinschi.. (2001)

No context found.

A. Hauptmann and M. Smith, "Text, speech, and vision for video segmentation: The Informedia project," AAAI Symp. on Comp. Models for Integrating Lang. and Vision, 1995.


A Multi-paradigm Querying Approach for a Generic Multimedia .. - Wen, Li, Ma, Zhang (2002)

No context found.

Hauptmann, A., Smith, M., "Text, speech, and vision for video segmentation: The Informedia project", In AAAI Fall 1995.


From Low Level Features To High Level - Zhang

No context found.

A. G. Hauptmann and M. A. Smith. "Text, Speech, and Vision for Video Segmentation: The Informedia Project." In AAAI Fall Symposium, Computational Models for Integrating Language and Vision, Boston, 1995.

