Results 1 - 10 of 18
VENKATESH S.: Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications and Applications.
"... The demand for various multimedia applications is rapidly increasing due to the recent advance in the computing and network infrastructure, together with the widespread use of digital video technology. Among the key elements for the success of these applications is how to effectively and efficiently ..."
Abstract
-
Cited by 107 (0 self)
- Add to MetaCart
The demand for various multimedia applications is rapidly increasing due to recent advances in computing and network infrastructure, together with the widespread use of digital video technology. Among the key elements for the success of these applications is how to effectively and efficiently manage and store a huge amount of audiovisual information, while at the same time providing user-friendly access to the stored data. This has fueled a quickly evolving research area known as video abstraction. As the name implies, video abstraction is a mechanism for generating a short summary of a video, which can be either a sequence of stationary images (keyframes) or moving images (video skims). In terms of browsing and navigation, a good video abstract enables the user to gain maximum information about the target video sequence within a specified time constraint, or sufficient information in the minimum time. Over the past years, various ideas and techniques have been proposed towards the effective abstraction of video contents. The purpose of this article is to provide a systematic classification of these works. We identify and detail, for each approach, the underlying components and how they are addressed in specific works.
A Survey on Visual Content-Based Video Indexing and Retrieval
"... Abstract—Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for vi ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
(Show Context)
Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval. It focuses on methods for video structure analysis (shot boundary detection, key frame extraction, and scene segmentation); extraction of features, including static key frame features, object features, and motion features; video data mining; video annotation; video retrieval, including query interfaces, similarity measures, and relevance feedback; and video browsing. Finally, we analyze future research directions. Index Terms: feature extraction, video annotation, video browsing, video retrieval, video structure analysis.
M.: Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs. In: CIVR '08: Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval, Niagara Falls, 2008.
"... The SenseCam is a passive capture wearable camera, worn around the neck, and when worn continuously it takes an average of 1,900 images per day. It can be used to create a personal lifelog or visual recording of the wearer’s life which can be helpful as an aid to human memory. For such a large amoun ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
The SenseCam is a passive-capture wearable camera, worn around the neck, which when worn continuously takes an average of 1,900 images per day. It can be used to create a personal lifelog or visual recording of the wearer’s life, which can be helpful as an aid to human memory. For such a large amount of visual information to be useful, it needs to be structured into “events”, which can be achieved through automatic segmentation. An important component of this structuring process is the selection of keyframes to represent individual events. This work investigates a variety of techniques for selecting a single representative keyframe image from each event, in order to provide the user with an instant visual summary of that event. In our experiments we use a large test set of 2,232 lifelog events collected by 5 users over a period of one month each (equating to 194,857 images). We propose a novel keyframe selection technique which seeks to select the image with the highest “quality” as the keyframe. The inclusion of “quality” approaches in keyframe selection is demonstrated to be useful owing to the high variability in image visual quality within passively captured image collections.
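The snippet does not specify how image “quality” is computed; as a minimal sketch, assuming a scalar quality score built from simple proxies such as contrast and sharpness (hypothetical stand-ins for the paper's measure), keyframe selection reduces to an argmax over the event's images:

```python
import numpy as np

def quality_score(img):
    """Toy quality proxy: contrast (intensity std) plus sharpness
    (variance of a 4-neighbour Laplacian response). The paper's actual
    quality measure is not given in this snippet."""
    img = img.astype(np.float64)
    contrast = img.std()
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return contrast + lap.var()

def select_keyframe(event_images):
    """Return the index of the highest-scoring image in one event."""
    return int(np.argmax([quality_score(img) for img in event_images]))
```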
Active frame selection for label propagation in videos. In ECCV, 2012.
"... Manually segmenting and labeling objects in video sequences is quite tedious, yet such annotations are valuable for learning-based approaches to object and activity recognition. While automatic label propagation can help, existing methods simply propagate annotations from arbitrarily selected fram ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
(Show Context)
Manually segmenting and labeling objects in video sequences is quite tedious, yet such annotations are valuable for learning-based approaches to object and activity recognition. While automatic label propagation can help, existing methods simply propagate annotations from arbitrarily selected frames (e.g., the first one) and so may fail to best leverage the human effort invested. We define an active frame selection problem: select k frames for manual labeling, such that automatic pixel-level label propagation can proceed with minimal expected error. We propose a solution that directly ties a joint frame selection criterion to the predicted errors of a flow-based random field propagation model. It selects the set of k frames that together minimize the total mislabeling risk over the entire sequence. We derive an efficient dynamic programming solution to optimize the criterion. Further, we show how to automatically determine how many total frames k should be labeled in order to minimize the total manual effort spent labeling and correcting propagation errors. We demonstrate our method’s clear advantages over several baselines, saving hours of human effort per video.
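The paper's risk estimates come from its flow-based propagation model; as a rough sketch of the dynamic-programming idea only, assume a precomputed matrix cost[s, t] estimating the error of propagating labels from frame s to frame t (with cost[s, s] near zero), plus the simplification, not the paper's exact formulation, that each of the k labeled frames serves one contiguous segment:

```python
import numpy as np

def select_frames(cost, k):
    """Pick k frames to label so total estimated propagation cost is
    minimal, under the contiguous-segment simplification."""
    n = cost.shape[0]

    # seg[a, b]: best single-source cost for frames a..b (inclusive);
    # seg_src[a, b]: which frame inside [a, b] to label.
    seg = np.full((n, n), np.inf)
    seg_src = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(a, n):
            totals = cost[a:b + 1, a:b + 1].sum(axis=1)
            s = int(np.argmin(totals))
            seg[a, b] = totals[s]
            seg_src[a, b] = a + s

    # dp[m, b]: min cost of covering frames 0..b with m segments.
    dp = np.full((k + 1, n), np.inf)
    choice = np.zeros((k + 1, n), dtype=int)
    dp[1, :] = seg[0, :]
    for m in range(2, k + 1):
        for b in range(n):
            for a in range(1, b + 1):
                c = dp[m - 1, a - 1] + seg[a, b]
                if c < dp[m, b]:
                    dp[m, b] = c
                    choice[m, b] = a

    # Backtrack segment starts and their labeled frames.
    frames, b = [], n - 1
    for m in range(k, 0, -1):
        a = choice[m, b] if m > 1 else 0
        frames.append(int(seg_src[a, b]))
        b = a - 1
    return sorted(frames), float(dp[k, n - 1])
```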
Learning Discriminative Key Poses for Action Recognition
"... Abstract—In this paper, we present a new approach for human action recognition based on key-pose selection and representation. Poses in video frames are described by the proposed extensive pyramidal features (EPFs), which include the Gabor, Gaussian, and wavelet pyramids. These features are able to ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
(Show Context)
In this paper, we present a new approach for human action recognition based on key-pose selection and representation. Poses in video frames are described by the proposed extensive pyramidal features (EPFs), which include the Gabor, Gaussian, and wavelet pyramids. These features encode orientation, intensity, and contour information and therefore provide an informative representation of human poses. Because not all poses in a sequence are discriminative and representative, we further utilize the AdaBoost algorithm to learn a subset of discriminative poses. Given the boosted poses for each video sequence, a new classifier named weighted local naive Bayes nearest neighbor (WLNBNN) is proposed for the final action classification, which is demonstrated to be more accurate and robust than other classifiers, e.g., the support vector machine (SVM) and naive Bayes nearest neighbor. The proposed method is systematically evaluated on the KTH data set, the Weizmann data set, the multiview IXMAS data set, and the challenging HMDB51 data set. Experimental results show that our method outperforms state-of-the-art techniques in terms of recognition rate. Index Terms: AdaBoost, computer vision, extensive pyramidal features (EPFs), human action recognition, pose selection, weighted local naive Bayes nearest neighbor (WLNBNN) classifier.
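WLNBNN's local weighting scheme is not detailed in this snippet, but the plain naive Bayes nearest neighbor baseline it extends can be sketched as follows, assuming per-class pools of descriptors (e.g., boosted key-pose EPFs) are already given:

```python
import numpy as np

def nbnn_classify(query_descriptors, class_pools):
    """Naive Bayes Nearest Neighbor: for each class, sum each query
    descriptor's squared distance to its nearest descriptor in that
    class's pool; return the class with the smallest total.
    query_descriptors: (Q, D) array; class_pools: {label: (P, D) array}."""
    best, best_cost = None, np.inf
    for label, pool in class_pools.items():
        # Pairwise squared distances: (Q, P)
        d = ((query_descriptors[:, None, :] - pool[None, :, :]) ** 2).sum(-1)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best, best_cost = label, cost
    return best
```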
Automatic 3D video summarization: Key frame extraction from self-similarity. In Proceedings of the Fourth International Symposium on 3D Data Processing, Visualization and Transmission, 2008.
"... In this paper we present an automatic key frame selection method to summarise 3D video sequences. Key-frame selection is based on optimisation for the set of frames which give the best representation of the sequence according to a rate-distortion trade-off. Distortion of the summarization from the o ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
In this paper we present an automatic key frame selection method to summarise 3D video sequences. Key-frame selection is based on optimisation over the set of frames that gives the best representation of the sequence according to a rate-distortion trade-off. Distortion of the summarization from the original sequence is measured via self-similarity using volume histograms. The method evaluates the globally optimal set of key-frames to represent the entire sequence without requiring pre-segmentation of the sequence into shots or temporal correspondence. Results demonstrate that, for 3D video sequences of people wearing a variety of clothing, the summarization automatically selects a set of key-frames which represent the dynamics. Comparative evaluation of rate-distortion characteristics against previous 3D video summarization demonstrates improved performance.
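The paper solves for the globally optimal key-frame set; purely to illustrate the rate-distortion idea, the sketch below greedily adds key-frames while the drop in histogram-based distortion still outweighs a rate penalty lam. The chi-squared distance and the stopping rule are assumptions, not the paper's formulation:

```python
import numpy as np

def summarize(histograms, max_rate, lam=1.0):
    """Greedy rate-distortion key-frame selection sketch. histograms[i]
    is a (volume) histogram for frame i; distortion is each frame's
    chi^2 distance to its nearest key-frame, rate is the key-frame count."""
    def chi2(p, q):
        return 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-12))

    n = len(histograms)
    d = np.array([[chi2(histograms[i], histograms[j]) for j in range(n)]
                  for i in range(n)])
    keys = [int(np.argmin(d.sum(axis=0)))]        # best single key-frame
    while len(keys) < max_rate:
        nearest = d[:, keys].min(axis=1)
        cand = int(np.argmax(nearest))            # frame worst represented
        gain = nearest.sum() - np.minimum(nearest, d[:, cand]).sum()
        if gain < lam:                            # stop when distortion gain
            break                                 # no longer pays for rate
        keys.append(cand)
    return sorted(keys)
```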
A rapid and robust method for shot boundary detection and classification in uncompressed MPEG video sequences. Int. J. Comput. Sci. Issues, 2012.
"... Abstract Shot boundary and classification is the first and most important step for further analysis of video content. Shot transitions include abrupt changes and gradual changes. A rapid and robust method for shot boundary detection and classification in MPEG compressed sequences is proposed in thi ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Shot boundary detection and classification is the first and most important step for further analysis of video content. Shot transitions include abrupt changes and gradual changes. A rapid and robust method for shot boundary detection and classification in MPEG compressed sequences is proposed in this paper. We first partially decode only the I frames in the video sequence to generate DC images, and then compute histogram differences between these DC images to roughly detect shot boundaries. For abrupt change detection, the shot boundary is then precisely located using the motion information of B frames. Gradual changes are located by the difference values of N successive I frames and classified by the variation in the number of intra-coded macroblocks (MBs) in P frames. All features, such as the number of MBs in frames, are extracted from uncompressed video sequences. Experiments on the standard TRECVid video database and others demonstrate the performance of the proposed method.
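The rough boundary-detection step described above (histogram differences between consecutive DC images) can be sketched as follows; the bin count and threshold are illustrative assumptions, and the B-frame refinement and MB-based classification stages are omitted:

```python
import numpy as np

def detect_cuts(dc_images, bins=64, thresh=0.4):
    """Flag an abrupt boundary wherever the normalized L1 difference
    between consecutive DC-image histograms exceeds a threshold.
    dc_images: iterable of 2D arrays with values in [0, 255]."""
    hists = [np.histogram(img, bins=bins, range=(0, 255))[0]
             for img in dc_images]
    hists = [h / max(h.sum(), 1) for h in hists]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > thresh]
```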
An Event-Based Approach to Video Analysis and Keyframe Selection
"... We propose an event based approach for locating keyframes in natural video through detection of locally correlated spectral targets. Temporal Decomposition (TD) is used to describe a set of spectral parameters of the video as a linear combination of a set of temporally overlapping event functions. T ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We propose an event-based approach for locating keyframes in natural video through detection of locally correlated spectral targets. Temporal Decomposition (TD) is used to describe a set of spectral parameters of the video as a linear combination of temporally overlapping event functions. This process provides preliminary information about keyframes by selecting the frames located at event centroids as keyframe candidates. No shot or shot-cluster boundary detection is needed, and keyframes are extracted directly from the event centroids, which are far fewer in number than the frames. Generalized Gaussian Density (GGD) parameters, extracted from the 2D wavelet transform subbands of the frames, are used as the spectral parameters in the event detection process, and the Kullback-Leibler distance (KLD) is employed as a measure to select salient keyframes. Experimental results confirm the superiority of the proposed scheme over conventional keyframe selection approaches. Index Terms: video scene analysis, keyframe selection, temporal decomposition, event function.
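The KLD between two GGDs has a closed form (from Do and Vetterli's wavelet-retrieval work, which this parameterization of subband statistics follows); a sketch of the per-subband distance that such a keyframe-saliency measure would sum over, assuming the density p(x; alpha, beta) proportional to exp(-(|x|/alpha)^beta):

```python
import math

def ggd_kld(a1, b1, a2, b2):
    """Closed-form KL divergence KLD(p1 || p2) between two generalized
    Gaussian densities with scales a1, a2 and shapes b1, b2."""
    g = math.gamma
    return (math.log((b1 * a2 * g(1.0 / b2)) / (b2 * a1 * g(1.0 / b1)))
            + (a1 / a2) ** b2 * g((b2 + 1.0) / b1) / g(1.0 / b1)
            - 1.0 / b1)
```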
Autoregressive Video Modeling through 2D Wavelet Statistics
"... Abstract — We present an Autoregressive (AR) modeling method for video signal analysis based on 2D Wavelet Statistics. The video signal is assumed to be a combination of spatial feature time series that are temporally approximated by the AR model. The AR model yields a linear approximation to the te ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
We present an Autoregressive (AR) modeling method for video signal analysis based on 2D wavelet statistics. The video signal is assumed to be a combination of spatial feature time series that are temporally approximated by the AR model. The AR model yields a linear approximation to the temporal evolution of a stationary stochastic process. Generalized Gaussian Density (GGD) parameters, extracted from 2D wavelet transform subbands, are used as the spatial features. The wavelet transform closely matches Human Visual System (HVS) characteristics and captures more suitable features than color histogram features. The AR model describes each spatial feature vector as a linear combination of the previous vectors within a reasonable time interval. Shot boundaries are detected from the AR prediction errors, and then at least one keyframe is extracted from each shot. Experimental results confirm the high accuracy of the proposed method compared to existing methods, such as [5]. Keywords: video scene analysis, autoregressive modeling, 2D wavelet marginal statistics, keyframe selection.
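As a minimal sketch of the idea, assuming features is a (T, d) array of per-frame GGD feature vectors with T > p, an AR(p) model can be fitted by least squares and shot boundaries flagged where the prediction error spikes; the order p and the mean-plus-k-sigma threshold are illustrative choices, not the paper's:

```python
import numpy as np

def ar_prediction_errors(features, p=3):
    """Fit a linear AR(p) model to a (T, d) feature sequence by least
    squares; return per-frame prediction-error magnitudes."""
    T, d = features.shape
    # Row r stacks the p frames preceding target frame t = p + r.
    X = np.hstack([features[p - i - 1:T - i - 1] for i in range(p)])
    Y = features[p:]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    errs = np.linalg.norm(Y - X @ W, axis=1)
    return np.concatenate([np.zeros(p), errs])  # align with frame index

def detect_boundaries(errs, k=3.0):
    """Flag frames whose AR prediction error exceeds mean + k * std."""
    t = errs.mean() + k * errs.std()
    return np.where(errs > t)[0].tolist()
```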
Keyframe Extraction using Local Visual Semantics in the Form of a Region Thesaurus
"... This paper presents an approach for efficient keyframe extraction, using local semantics in form of a region thesaurus. More specifically, certain MPEG-7 color and texture features are locally extracted from keyframe regions. Then, using a hierarchical clustering approach a local region thesaurus is ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
This paper presents an approach for efficient keyframe extraction using local semantics in the form of a region thesaurus. More specifically, certain MPEG-7 color and texture features are locally extracted from keyframe regions. Then, a local region thesaurus is constructed using a hierarchical clustering approach to facilitate the description of each frame in terms of higher-level semantic features. The thesaurus consists of the most common region types encountered within the video shot, along with their synonyms; these region types carry semantic information. Each keyframe is represented by a vector of degrees of confidence in the presence of each region type within the shot. Using this keyframe representation, the most representative keyframe is then selected for each shot. Where a single keyframe is not adequate, more keyframes are extracted using the same algorithm and exploiting the coverage of the visual thesaurus.
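A rough reading of this pipeline, with the confidence computation filled in by assumption (inverse-distance to cluster centroids, max-pooled over a frame's regions) and scipy's hierarchical clustering standing in for the paper's unspecified variant, might look like this:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def build_thesaurus(region_descriptors, n_types):
    """Hierarchically cluster region descriptors (e.g., MPEG-7 color and
    texture vectors) into n_types region types; return type centroids.
    Assumes every cluster ends up non-empty."""
    Z = linkage(region_descriptors, method='ward')
    labels = fcluster(Z, t=n_types, criterion='maxclust')
    return np.array([region_descriptors[labels == c].mean(axis=0)
                     for c in range(1, n_types + 1)])

def frame_vector(frame_regions, centroids):
    """Soft 'degree of confidence' per region type: inverse distance of
    each region to each centroid, max-pooled over the frame's regions."""
    conf = 1.0 / (1.0 + cdist(frame_regions, centroids))
    return conf.max(axis=0)

def pick_keyframe(frames_regions, centroids):
    """Select the frame whose confidence vector is closest to the shot mean."""
    vecs = np.array([frame_vector(r, centroids) for r in frames_regions])
    mean = vecs.mean(axis=0)
    return int(np.argmin(np.linalg.norm(vecs - mean, axis=1)))
```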