Results 1 - 10 of 44
Multimodal Video Indexing: A Review of the State-of-the-art
- Multimedia Tools and Applications, 2003
Cited by 103 (18 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time- and resource-consuming process. Good reviews on single-modality-based video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
News Video Classification Using SVM-based Multimodal Classifiers and Combination Strategies
- In ACM Multimedia, Juan-les-Pins, 2002
Cited by 28 (4 self)
Video classification is the first step toward multimedia content understanding. When video is classified into conceptual categories, it is usually desirable to combine evidence from multiple modalities. However, combination strategies in previous studies were usually ad hoc. We investigate a meta-classification combination strategy using a Support Vector Machine, and compare it with probability-based strategies. Text features from closed captions and visual features from images are combined to classify broadcast news video. The experimental results show that combining multimodal classifiers can significantly improve recall and precision, and our meta-classification strategy gives better precision than the approach of taking the product of the posterior probabilities.
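The probability-based baseline mentioned in this abstract, taking the product of the posterior probabilities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `product_rule`, the class labels, and the posterior values are all hypothetical, and the rule implicitly assumes the modalities are conditionally independent given the class.

```python
from math import prod

def product_rule(posteriors):
    """Combine per-modality class posteriors by multiplying them per class.

    posteriors: list of dicts mapping class label -> probability,
    one dict per modality (all dicts share the same labels).
    Returns a dict of renormalised combined scores.
    """
    labels = list(posteriors[0].keys())
    scores = {c: prod(p[c] for p in posteriors) for c in labels}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all combined scores are zero; cannot normalise")
    return {c: s / total for c, s in scores.items()}

# Hypothetical posteriors for one news story (illustrative values only).
text_posterior = {"politics": 0.6, "sports": 0.3, "weather": 0.1}
image_posterior = {"politics": 0.2, "sports": 0.7, "weather": 0.1}

combined = product_rule([text_posterior, image_posterior])
best = max(combined, key=combined.get)
print(best, round(combined[best], 3))
```

A meta-classification strategy would instead feed the per-modality outputs into a second-stage classifier; that stage is not shown here.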
Unsupervised Discovery Of Multilevel Statistical Video Structures Using Hierarchical Hidden Markov Models
- In Proc. ICME, 2003
Cited by 25 (3 self)
Structure elements in a time sequence (e.g. video) are repetitive segments with consistent deterministic or stochastic characteristics. While most existing work in detecting structures follows a supervised paradigm, we propose a fully unsupervised statistical solution in this paper. We present a unified approach to structure discovery from long video sequences as simultaneously finding the statistical descriptions of structure and locating segments that match the descriptions. We model the multilevel statistical structure as hierarchical hidden Markov models, and present efficient algorithms for learning both the parameters and the model structure. When tested on a specific domain, soccer video, the unsupervised learning scheme achieves very promising results: it automatically discovers the statistical descriptions of high-level structures, and at the same time achieves even slightly better accuracy in detecting discovered structures in unlabelled videos than a supervised approach designed with domain knowledge and trained with comparable hidden Markov models.
Unsupervised Mining of Statistical Temporal Structures
- Video Mining, Azriel Rosenfeld, David Doermann, Daniel DeMenthon (eds.), 2003
Cited by 17 (8 self)
In this paper, we present algorithms for unsupervised mining of structures in video using multiscale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim at alleviating the burden of labelling and training, as well as providing a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing unsupervised video structuring works primarily use clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularity remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact relevant feature sets using hybrid wrapper-filter methods. When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) The automatically selected feature set...
Audiovisual integration for tennis broadcast structuring
- In International Workshop on (CBMI’03), 2003
Cited by 12 (0 self)
This paper focuses on the integration of multimodal features for sport video structure analysis. The method relies on a statistical model which takes into account both the shot content and the interleaving of shots. This stochastic modelling is performed in the global framework of Hidden Markov Models (HMMs) that can be efficiently applied to merge audio and visual cues. Our approach is validated in the particular domain of tennis videos. The model integrates prior information about tennis content and editing rules. The basic temporal unit is the video shot. Visual features are used to characterize the type of shot view. Audio features describe the audio events within a video shot. As a result, typical tennis scenes are simultaneously segmented and identified.
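To make concrete how an HMM over shots can segment and label scenes in one pass, here is a minimal Viterbi decoding sketch. The abstract does not state which decoder the authors use; Viterbi is assumed here as the standard choice. The state names, observation symbols, and all probabilities are hypothetical, and every probability must be non-zero because the code works in log space.

```python
from math import log

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a list of observations."""
    # Log-probability of the best path ending in each state at time 0.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-scene model; one observation symbol per video shot.
states = ["rally", "break"]
start_p = {"rally": 0.5, "break": 0.5}
trans_p = {"rally": {"rally": 0.7, "break": 0.3},
           "break": {"rally": 0.4, "break": 0.6}}
emit_p = {"rally": {"court_view": 0.9, "close_up": 0.1},
          "break": {"court_view": 0.2, "close_up": 0.8}}

shots = ["court_view", "court_view", "close_up", "close_up"]
labels = viterbi(shots, states, start_p, trans_p, emit_p)
print(labels)
```

In a multimodal setting, each observation symbol could encode both the visual shot type and the audio events detected within the shot; that encoding is left out of this sketch.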
Classifying Music By Genre Using The Wavelet Packet Transform And Round-Robin Ensemble
- 2002
Cited by 11 (1 self)
The vast amount of music available electronically presents considerable challenges for information retrieval. There is a need to annotate music items with descriptors in order to facilitate retrieval. In this paper we present a process for determining the music genre of an item using the Discrete Wavelet Transform and a round-robin classification technique. The wavelet transform is used to extract time and frequency features that are used to classify items by genre. Rather than use a single multi-class classifier, we use an ensemble of binary classifiers, with each classifier trained on a pair of genres. Our evaluation shows that this approach achieves very high classification accuracy.
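The round-robin idea, one binary classifier per pair of genres with a majority vote at prediction time, can be sketched as below. This is an illustration under stated assumptions: a nearest-centroid rule stands in for whatever binary learner the authors actually use, and the genre names and two-dimensional feature vectors are made up rather than wavelet features.

```python
from collections import Counter
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_round_robin(data):
    """data: dict genre -> list of feature vectors.

    Builds one binary classifier per unordered pair of genres, each fitted
    only on the examples of its two genres (stand-in: nearest centroid).
    """
    models = {}
    for a, b in combinations(sorted(data), 2):
        models[(a, b)] = (centroid(data[a]), centroid(data[b]))
    return models

def predict(models, x):
    """Each pairwise classifier casts one vote; the most-voted genre wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy features for three genres.
training = {
    "rock": [(0.9, 0.1), (0.8, 0.2)],
    "jazz": [(0.1, 0.9), (0.2, 0.8)],
    "classical": [(0.1, 0.1), (0.2, 0.2)],
}
models = train_round_robin(training)
print(len(models), predict(models, (0.8, 0.1)), predict(models, (0.2, 0.9)))
```

With k genres the ensemble holds k(k-1)/2 binary classifiers, so three genres give three classifiers here.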
Using MPEG standards for multimedia customization
- 2004
Cited by 11 (0 self)
The multimedia content delivery chain today poses many challenges. The increasing terminal diversity, network heterogeneity and the pressure to satisfy the user preferences are raising the need for content to be customized in order to provide the user the best possible experience. This paper addresses the problem of multimedia customization by (1) presenting the MPEG-7 multimedia content description standard and the MPEG-21 multimedia framework; (2) classifying multimedia customization processing algorithms; (3) discussing multimedia customization systems; and (4) presenting some customization experiments.
Audio-based event detection for sports video
- In CIVR2003, 2003
Cited by 11 (4 self)
In this paper, we present an audio-based event detection approach shown to be effective when applied to sports broadcast data. The main benefit of this approach is the ability to recognise patterns that indicate high levels of crowd response which can be correlated to key events. By applying Hidden Markov Model-based classifiers, where the predefined content classes are parameterised using Mel-Frequency Cepstral Coefficients, we were able to eliminate the need for defining a heuristic set of rules to determine event detection, thus avoiding a two-class approach shown not to be suitable for this problem. Experimentation indicated that this is an effective method for classifying crowd response in soccer matches, thus providing a basis for automatic indexing and summarisation.
Learning Hierarchical Hidden Markov Models for Video Structure Discovery
- 2002
Cited by 9 (4 self)
Structure elements in a time sequence are repetitive segments that bear consistent deterministic or stochastic characteristics. While most existing work in detecting structures follows a supervised paradigm, we propose a fully unsupervised statistical solution in this paper. We present a unified approach to structure discovery from long video sequences as simultaneously finding the statistical descriptions of structure and locating segments that match the descriptions. We model the multilevel statistical structure as hierarchical hidden Markov models, and present efficient algorithms for learning both the parameters and the model structure, including the complexity of each structure element and the number of elements in the stream. We have also proposed feature selection algorithms that iterate between a wrapper and a filter method to partition the large feature pool into consistent and compact subsets, upon which the hierarchical hidden Markov model is learned. When tested on a specific domain, soccer video, the unsupervised learning scheme achieves very promising results: the automatically selected feature set includes the manually identified, intuitively most significant feature, and the system automatically discovers the statistical descriptions of high-level structures, and at the same time achieves even slightly better accuracy in detecting discovered structures in unlabelled videos than a supervised approach designed with domain knowledge and trained with comparable hidden Markov models. Keywords: video structure, statistical learning, unsupervised learning, feature selection, hierarchical hidden Markov model, hidden Markov model, model selection, MCMC
Information theory-based shot cut/fade detection and video summarization
- IEEE Transactions on Circuits and Systems for Video Technology, 2006
Cited by 7 (0 self)
New methods for detecting shot boundaries in video sequences and for extracting key frames using metrics based on information theory are proposed. The method for shot boundary detection relies on the mutual information (MI) and the joint entropy (JE) between the frames. It can detect cuts, fade-ins and fade-outs. The detection technique was tested on the TRECVID2003 video test set having different types of shots and containing significant object and camera motion inside the shots. It is demonstrated that the method detects both fades and abrupt cuts with high accuracy. The information theory measure provides us with better results because it exploits the inter-frame information in a more compact way than frame subtraction. It was also compared successfully with other methods published in the literature. The method for key frame extraction uses mutual information as well. We show that it captures satisfactorily the visual content of the shot. Index Terms: shot boundary detection, entropy, mutual information, detection accuracy, video segmentation, video analysis, key frame extraction.
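The two quantities this abstract relies on can be computed from the joint histogram of pixel values in two frames. The sketch below is an illustration only: it assumes each frame is a flat list of grey levels of equal length (the paper's exact colour handling and thresholds are not given here), and the function name is hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information_and_joint_entropy(frame_a, frame_b):
    """Return (MI, JE) in bits for two equally sized frames.

    frame_a, frame_b: flat sequences of pixel grey levels, where position i
    in both sequences refers to the same pixel location.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))  # co-occurrence counts
    count_a = Counter(frame_a)              # marginal counts, frame A
    count_b = Counter(frame_b)              # marginal counts, frame B
    mi = 0.0
    je = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_a[x] * count_b[y]).
        mi += p_xy * log2(c * n / (count_a[x] * count_b[y]))
        je -= p_xy * log2(p_xy)
    return mi, je

# Tiny 4-pixel frames with two grey levels (illustrative values only).
same = mutual_information_and_joint_entropy([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = mutual_information_and_joint_entropy([0, 0, 1, 1], [0, 1, 0, 1])
print(same, unrelated)
```

Identical frames share all their information (high MI), while statistically unrelated frames share none, which is why a sharp drop in MI between consecutive frames is a natural cue for an abrupt cut.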

