Comparison of Video Shot Boundary Detection Techniques
, 1996
Cited by 174 (4 self)
Many algorithms have been proposed for detecting video shot boundaries and classifying shot and shot transition types. Few published studies compare available algorithms, and those that do have looked at a limited range of test material. This paper
A Feature-Based Algorithm for Detecting and Classifying Scene Breaks
Cited by 170 (2 self)
We describe a new approach to the detection and classification of scene breaks in video sequences. Our method can detect and classify a variety of scene breaks, including cuts, fades, dissolves and wipes, even in sequences involving significant motion. We detect the appearance of intensity edges that are distant from edges in the previous frame. A global motion computation is used to handle camera or object motion. The algorithm we propose withstands JPEG and MPEG artifacts, even at very high compression rates. Experimental evidence demonstrates that our method can detect and classify scene breaks that are difficult to detect with previous approaches. An initial implementation runs at approximately 2 frames per second on a Sun workstation.
Multimodal Video Indexing: A Review of the State-of-the-art
- Multimedia Tools and Applications
, 2003
Cited by 103 (18 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time and resource consuming process. Good reviews on single modality based video indexing have appeared in literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in literature. It furthermore forms the basis for categorizing these different methods.
Video Indexing Based on Mosaic Representations
, 1998
Cited by 99 (0 self)
Video is a rich source of information. It provides visual information about scenes. However, this information is implicitly buried inside the raw video data, and is provided with the cost of very high temporal redundancy. While the standard sequential form of video storage is adequate for viewing in a "movie mode", it fails to support rapid access to information of interest that is required in many of the emerging applications of video. This paper presents an approach for efficient access, use and manipulation of video data. The video data is first transformed from its sequential and redundant frame-based representation in which the information about the scene is distributed over many frames, to an explicit and compact scene-based representation, to which each frame can be directly related. This compact reorganization of the video data supports non-linear browsing and efficient indexing to provide rapid access directly to information of interest. The paper describes a new set of metho...
Comparison of automatic shot boundary detection algorithms
, 1999
Cited by 86 (2 self)
Various methods of automatic shot boundary detection have been proposed and claimed to perform reliably. Although the detection of edits is fundamental to any kind of video analysis since it segments a video into its basic components, the shots, only a few comparative investigations on early shot boundary detection algorithms have been published. These investigations mainly concentrate on measuring the edit detection performance but do not consider the algorithms' ability to classify the types and to locate the boundaries of the edits correctly. This paper extends these comparative investigations. More recent algorithms designed explicitly to detect specific complex editing operations such as fades and dissolves are taken into account, and their ability to classify the types and locate the boundaries of such edits is examined. The algorithms' performance is measured in terms of hit rate, number of false hits, and miss rate for hard cuts, fades, and dissolves over a large and diverse set of video sequences. The experiments show that while hard cuts and fades can be detected reliably, dissolves are still an open research issue. The false hit rate for dissolves is usually unacceptably high, ranging from 50% up to over 400%. Moreover, all algorithms seem to fail under roughly the same conditions.
A User Attention Model for Video Summarization
- In Proceedings of ACM Multimedia
, 2003
Cited by 68 (10 self)
Automatic generation of video summaries is one of the key techniques in video management and browsing. In this paper, we present a generic framework for video summarization based on the modeling of viewer's attention. Without full semantic understanding of video content, this framework takes advantage of computational attention models and eliminates the need for complex heuristic rules in video summarization. A set of methods for extracting audio-visual attention model features is proposed and presented. The experimental evaluations indicate that the computational attention based approach is an effective alternative to video semantic analysis for video summarization.
Event-based analysis of video
- In Proc. CVPR
, 2001
Cited by 68 (2 self)
Dynamic events can be regarded as long-term temporal objects, which are characterized by spatiotemporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences (possibly of different lengths) based on their behavioral content. This measure is non-parametric and can thus handle a wide range of dynamic events. Having an event-based distance measure between sequences, we use it for a variety of tasks, including: (i) event-based search and indexing into long video sequences (for “intelligent fast forward”), (ii) temporal segmentation of long video sequences based on behavioral content, and (iii) clustering events within a long video sequence into event-consistent sub-sequences (i.e., into event-consistent “clusters”). These tasks are performed without prior knowledge of the types of events, their models, or their temporal extents. Our simple event representation and associated distance measure support event-based search and indexing even when only one short example-clip is available. However, when multiple example-clips of the same event are available (either as a result of the clustering process, or supplied manually), these can be used to refine the event representation, the associated distance measure, and accordingly the quality of the detection and clustering process.
Constructing Table-of-Content for Videos
- ACM Multimedia Systems
, 1999
Cited by 64 (1 self)
A fundamental task in video analysis is to extract structures from the video to facilitate user's access (browsing and retrieval). Motivated by the important role that Table-of-Content (ToC) plays in a book, in this paper we introduce the concept of ToC in the video domain. Some existing approaches implicitly use the ToC, but are mainly limited to low-level entities (e.g. shots and key frames). The drawbacks are that low-level structures (1) contain too many entries to be efficiently presented to the user; and (2) do not capture the underlying semantic structure of the video based on which the user may wish to browse/retrieve. To address these limitations, in this paper we present an effective semantic-level ToC construction technique based on intelligent unsupervised clustering. It has the characteristics of better modeling the time locality and scene structure. Experiments based on real-world movie videos validate the effectiveness of the proposed approach. Examples are given to demonstrate the usage of the scene based ToC in facilitating user's access to the video. Key words: video accessing, scene level ToC construction
Automatic Audio Content Analysis
, 1996
Cited by 64 (3 self)
This paper describes the theoretic framework and applications of automatic audio content analysis. After explaining the tools for audio analysis such as analysis of the pitch or the frequency spectrum, we describe new applications which can be developed using the toolset. We discuss content-based segmentation of the audio stream, music analysis and violence detection.
A survey of technologies for parsing and indexing digital video
- Journal of Visual Communication and Image Representation
, 1996
Cited by 64 (8 self)
In the future we envision systems that will provide video information delivery services to customers on a very large scale. These systems must provide customers with mechanisms to select programs of their choice from live broadcasts. Customers should also be provided with easy means of browsing and accessing pre-recorded digital data (e.g., distributed digital multimedia libraries), and downloading data from other information sources. To be viable for such large information sets, these systems must understand customer preferences and tailor the available information to the customer’s needs. To support this vision, a number of issues must be addressed and obstacles overcome. Intuitive interfaces, powerful query formulation and evaluation techniques, comprehensive data models, and flexible presentation functionalities must be developed. To realize these components, an effective query evaluation engine with the capabilities of query resolution in different content-specific formats (e.g., by graphics, by image, by sound) and in different domain-specific models (e.g., database of movies, database of newsclips) should be present. Additionally, the digital video database will require an efficient indexing system for easy access to the stored information. In this paper we discuss existing research trends in this

