Multimedia Content Analysis Using Both Audio and Visual Cues
, 2000
Cited by 70 (0 self)
Including all the scenes/shots that contain special events may generate too long an abstract. Also, simply staggering them together may not be visually or aurally appealing. In the MoCA project, it was determined that only 50% of the abstract should contain special events. The remaining part should be left for filler clips. The special event clips to be included are chosen uniformly and randomly from different types of events. The selection of a short clip from a scene is subject to some additional criteria, such as the amount of action and the similarity to the overall color composition of the movie. Closeness to the desired AV characteristics of certain scene types is also considered. The filler clips are chosen so that they do not overlap with the content of chosen special event clips, to ensure a good coverage of all parts of a movie. MPEG-7 Standard for Multimedia Content Description Interface: MPEG-7 is an on-going standardization effort for content description of AV documen...
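The selection rule described in this excerpt (half special-event clips drawn uniformly across event types, the rest filler clips that do not repeat already chosen material) might be sketched as follows. This is only an illustration: the function name, the clip identifiers, and the treatment of "overlap" as identical clip ids are assumptions, and the additional criteria mentioned above (amount of action, color similarity) are not modeled.

```python
import random

def pick_abstract_clips(special_events, filler_clips, total_clips, seed=0):
    """Pick clips for a video abstract (hypothetical helper, not MoCA code).

    special_events: dict mapping event type -> list of clip ids
    filler_clips:   list of clip ids usable as filler
    """
    rng = random.Random(seed)
    n_special = total_clips // 2  # the 50% share stated in the excerpt
    pools = {t: list(clips) for t, clips in special_events.items()}
    chosen = []
    while len(chosen) < n_special and any(pools.values()):
        # Choose an event type uniformly among types that still have clips,
        # then a clip uniformly within that type.
        event_type = rng.choice(sorted(t for t in pools if pools[t]))
        pool = pools[event_type]
        chosen.append(pool.pop(rng.randrange(len(pool))))
    # Filler clips must not repeat clips already chosen (assumed overlap test).
    remaining = [c for c in filler_clips if c not in chosen]
    n_filler = min(total_clips - len(chosen), len(remaining))
    chosen += rng.sample(remaining, n_filler)
    return chosen

# Made-up clip ids for illustration only.
events = {"explosion": ["e1", "e2"], "dialog": ["d1", "d2"], "gunfire": ["g1"]}
fillers = ["f1", "f2", "f3", "f4"]
abstract = pick_abstract_clips(events, fillers, total_clips=6)
```

A real system would compare clip time ranges rather than ids when checking for overlap.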
Audio content analysis for online audiovisual data segmentation and classification
- 62 IEEE SIGNAL PROCESSING MAGAZINE MARCH 2004
, 2001
Cited by 46 (2 self)
While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and classified into basic types such as speech, music, song, environmental sound, speech with music background, environmental sound with music background, silence, etc. Simple audio features including the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak tracks are extracted to ensure the feasibility of real-time processing. A heuristic rule-based procedure is proposed to segment and classify audio signals; it is built upon morphological and statistical analysis of the time-varying functions of these audio features. Experimental results show that the proposed scheme achieves an accuracy rate of more than 90% in audio classification. Index Terms: Audio analysis, audio indexing, audio segmentation, audiovisual content parsing, information filtering and retrieval, multimedia database management.
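Two of the features named in this abstract, the short-time energy function and the average zero-crossing rate, can be computed per frame as in the sketch below. The frame length, hop size, and function name are assumptions chosen for illustration; the paper's own parameters and its rule-based classifier are not reproduced here.

```python
import math

def frame_features(signal, frame_len=400, hop=200):
    """Return a list of (energy, zero_crossing_rate) tuples, one per frame.

    frame_len and hop are in samples and are illustrative defaults only.
    """
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Short-time energy: mean of squared samples in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features

# Synthetic inputs: a 100 Hz tone sampled at 8 kHz, and digital silence.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
silence = [0.0] * 8000
tone_features = frame_features(tone)
silence_features = frame_features(silence)
```

For the pure tone, each frame has energy close to 0.5 and a low zero-crossing rate, while silence yields zero for both; noisy or unvoiced sounds typically show much higher zero-crossing rates.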
Automatic Video Scene Extraction by Shot Grouping
- Proc. ICPR
, 2000
Cited by 25 (5 self)
For more efficient organizing, browsing, and retrieving digital video content, it is important to extract video structure information at both scene and shot levels. This paper presents an effective approach to video scene segmentation based on a pseudo-object-based shot correlation analysis. A new measure of the semantic correlation of consecutive shots based on dominant color grouping and tracking is proposed. A new shot grouping method named expanding window is designed to cluster correlated consecutive shots into one scene. Evaluations based on real-world sports video programs validate the efficiency and effectiveness of our shot correlation measure and scene structure construction.
Survey of Compressed-Domain Features used in Audio-Visual Indexing and Analysis
Cited by 11 (0 self)
In this paper, we attempt to provide a comprehensive and high-level review of audiovisual features that can be extracted from the standard compressed domains, such as MPEG-1 and MPEG-2. The paper is motivated by the myriad of active research works in extraction and application of compressed-domain features in various fields, such as indexing, filtering, and manipulation. Compressed-domain approaches avoid expensive computation and memory requirements involved in decoding and/or re-encoding. Selected features are categorized into four groups: spatial visual (e.g., color, texture, edge, shape), motion (e.g., motion field, trajectory), audio (e.g., energy, spectral features, pitch), and coding (e.g., bit rate, frame/block type). For each feature, we briefly discuss the extraction methods, computational complexity, potential effectiveness in applications, and possible limitations caused by compressed-domain approaches. Finally, we briefly describe audio-visual features specified in the MPEG-7 standard and discuss the possibility of extracting them in the compressed domain.
Joint Video Scene Segmentation And Classification Based On Hidden Markov Model
- ICME-2000
, 2000
Cited by 7 (0 self)
Video classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing large amount of video data. We have developed a scene classification scheme using a Hidden Markov Model (HMM)-based classifier. By utilizing the temporal behaviors of different scene classes, HMM classifier can effectively classify video segments into one of the predefined scene classes. In this paper, we describe two approaches for joint video classification and segmentation based on HMM, which works by searching for the most likely class transition path utilizing the dynamic programming technique. 1. INTRODUCTION Video classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing large amount of video data. Recently, several research groups have developed algorithms to detect scene change by incorporating audio and visual information. Most of these works [1, 2, 3] are based on some prior scene models, (e.g. dialog, setting, etc.) and accomplish ...
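Searching for the most likely class transition path with dynamic programming is commonly done with a Viterbi-style recursion. The sketch below is a generic version under stated assumptions, not the paper's two approaches: it assumes per-segment class likelihoods are already available (in the paper they would come from the class HMMs), and the transition probabilities, prior, and likelihood values are made-up numbers.

```python
import math

def most_likely_class_path(log_emissions, log_trans, log_prior):
    """Viterbi search over scene classes.

    log_emissions[t][c]: log-likelihood of segment t under class c (assumed given)
    log_trans[p][c]:     log-probability of moving from class p to class c
    log_prior[c]:        log-probability of starting in class c
    """
    n_classes = len(log_prior)
    score = [log_prior[c] + log_emissions[0][c] for c in range(n_classes)]
    back_pointers = []
    for t in range(1, len(log_emissions)):
        new_score, pointers = [], []
        for c in range(n_classes):
            # Best predecessor class for ending segment t in class c.
            prev = max(range(n_classes), key=lambda p: score[p] + log_trans[p][c])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][c] + log_emissions[t][c])
        score = new_score
        back_pointers.append(pointers)
    # Trace back from the best final class.
    state = max(range(n_classes), key=lambda c: score[c])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Illustrative two-class example with "sticky" transitions (all values assumed).
likelihoods = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
               [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
log_em = [[math.log(p) for p in row] for row in likelihoods]
log_tr = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_pr = [math.log(0.5), math.log(0.5)]
path = most_likely_class_path(log_em, log_tr, log_pr)  # → [0, 0, 0, 0, 1, 1, 1]
```

The third segment weakly favors class 1 on its own, but the transition penalty keeps it in class 0, so the single class change in the path marks one scene boundary. This is how classification and segmentation can be obtained jointly from one search.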
Dialogue scene detection in movies using low and mid-level visual features
- proceedings of International Workshop on Image, Video, and Audio Retrieval
, 2001
Cited by 4 (1 self)
This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots, and which are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features used is motivated by a consideration of formal film syntax. The system is designed so that the analysis may be applied in order to detect different types of scenes, although in this paper we focus on dialogue sequences as these are the most prevalent scenes in the movies considered to date.
Video Content Representation For Shot Retrieval And Scene Extraction
- International Journal of Image & Graphics
, 2001
Cited by 3 (0 self)
this paper was mainly performed when this author was working at Microsoft Research, China as a research intern
Action sequence detection in motion pictures
- in: Proc. European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology
, 2004
Cited by 3 (3 self)
This paper describes an approach for automatically detecting action sequences in movies. We examine the filmmaking conventions that are inherent in action sequences and use these as a basis for our analysis. A system of low-, mid- and high-level features is presented that combines pure digital video analysis at the low- and mid-levels, with high-level filmmaking knowledge. A state machine is used to combine these features and detect the action sequences. The overall system is designed so that the analysis can be used to detect different types of scenes, but in this paper we focus on action sequence detection.
Detection of documentary scene changes by audio-visual fusion
- In Proc. CIVR
, 2004
Cited by 3 (0 self)
The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the amount of information from the visual component alone was not enough to convey a semantic context to most portions of these videos, but a joint observation of the visual component and the audio component conveyed a better semantic context. From the observations that we made on the video data, we generated an audio score and a visual score. We later generated a weighted audio-visual score within an interval and adaptively expanded or shrunk this interval until we found a local maximum score value. The video ultimately will be divided into a set of intervals that correspond to the documentary scenes in the video. After we obtained a set of documentary scenes, we made a check for any redundant detections.
Football Video Segmentation Based on Video Production Strategy
- In ECIR 2005
, 2005
Cited by 3 (1 self)
We present a statistical approach for parsing football video structures. Based on video production conventions, a new generic structure called ‘attack’ is identified, which is an equivalent of scene in other video domains. We define four video segments to construct it, namely play, focus, replay and break. Two middle-level visual features, play field ratio and zoom size, are also computed. The detection process includes a two-pass classifier, a combination of Gaussian Mixture Model and Hidden Markov Models. A general suffix tree is introduced to identify and organize ‘attack’. In experiments, video structure classification accuracy of about 86% is achieved on broadcast World Cup 2002 video data.

