Content-Based Video Description for Automatic Video Genre Categorization
Cited by 3 (3 self)
Abstract. In this paper, we propose an audio-visual approach to video genre categorization. Audio information is extracted at block-level, which has the advantage of capturing local temporal information. At the temporal structural level, we assess action content with respect to human perception. Further, color perception is quantified with statistics of color distribution, elementary hues, color properties, and relationships between colors. The last category of descriptors determines statistics of contour geometry. An extensive evaluation of this multi-modal approach, based on more than 91 hours of video footage, is presented. We obtain average precision and recall ratios within [87%, 100%] and [77%, 100%], respectively, while average correct classification is up to 97%. Additionally, movies displayed according to feature-based coordinates in a virtual 3D browsing environment tend to regroup with respect to genre, which has potential applications in real content-based browsing systems.
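The precision and recall ratios quoted above are per-genre measures. As a minimal sketch of how such figures are computed, assuming single-label genre predictions (the genre names and label lists below are made up for illustration and are not data from the paper):

```python
def precision_recall(true_labels, predicted_labels, genre):
    """Precision and recall for one genre from parallel label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == genre and p == genre)  # correctly tagged
    fp = sum(1 for t, p in pairs if t != genre and p == genre)  # wrongly tagged
    fn = sum(1 for t, p in pairs if t == genre and p != genre)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical ground truth and classifier output for six videos.
true_genres = ["news", "news", "sports", "music", "sports", "news"]
pred_genres = ["news", "sports", "sports", "music", "sports", "news"]

news_precision, news_recall = precision_recall(true_genres, pred_genres, "news")
sports_precision, sports_recall = precision_recall(true_genres, pred_genres, "sports")

# Overall correct classification: fraction of videos with the right genre.
accuracy = sum(t == p for t, p in zip(true_genres, pred_genres)) / len(true_genres)
```

In this toy data one news video is mistaken for sports, which lowers recall for news and precision for sports while leaving the other two ratios at 1.0.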
CoPhIR Image Collection under the Microscope
In Proc. SISAP, 2009
Cited by 3 (1 self)
Abstract. The Content-based Photo Image Retrieval (CoPhIR) dataset is the largest available database of digital images with corresponding visual descriptors. It contains five MPEG-7 global descriptors extracted from more than 106 million images from the Flickr photo-sharing system. In this paper, we analyze this dataset, focusing on 1) the efficiency of similarity-based indexing and searching and 2) the expressiveness of the combination of descriptors with respect to subjective perception of visual similarity. We treat the descriptors as metric spaces and then combine them into a multi-metric space. We analyze the distance distributions of individual descriptors, measure the intrinsic dimensionality of these datasets, and statistically evaluate the correlation between the descriptors. Further, we use two methods to assess the subjective accuracy of, and satisfaction with, similarity retrieval based on the combination of descriptors recommended for CoPhIR, and we compare these results on databases of 10 and 100 million CoPhIR images. Finally, we suggest, explore, and evaluate two approaches to improve the accuracy: 1) applying logarithms to weaken the influence of a single descriptor's contribution if it deviates from the rest, and 2) categorizing the dataset and identifying the visual characteristics important for individual categories. Keywords: metric space; MPEG-7; visual descriptors; CoPhIR dataset; dataset analysis
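The first improvement listed in this abstract, applying logarithms so that one deviating descriptor cannot dominate the combined distance, can be illustrated with a small sketch. The equal weights, the example distances, and the `log1p` form are assumptions chosen for illustration; the abstract does not give the aggregation formula actually used for CoPhIR.

```python
import math

def combined_distance(distances, weights, use_log=False):
    """Weighted sum of per-descriptor distances (hypothetical aggregation)."""
    total = 0.0
    for d, w in zip(distances, weights):
        # log1p(d) = log(1 + d) grows slowly, so a single large distance
        # contributes less than it would in a plain weighted sum.
        total += w * (math.log1p(d) if use_log else d)
    return total

weights = [1.0, 1.0, 1.0, 1.0, 1.0]   # assumed equal weights for five descriptors
agree = [0.2, 0.2, 0.2, 0.2, 0.2]     # all five descriptors report "similar"
outlier = [0.2, 0.2, 0.2, 0.2, 5.0]   # one descriptor deviates strongly

# How much the single deviating descriptor inflates the combined distance.
plain_ratio = combined_distance(outlier, weights) / combined_distance(agree, weights)
log_ratio = (combined_distance(outlier, weights, use_log=True)
             / combined_distance(agree, weights, use_log=True))
```

With these made-up numbers the deviating descriptor inflates the plain sum far more than the log-dampened sum, which is the effect the abstract describes.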
A Novel Structural-Description Approach For Image Retrieval
Cited by 3 (2 self)
We tested our image classification methodology in the photo-annotation task of the ImageCLEF competition [Nowak, 2010], using a visual-only approach to perform automated labeling. Our labeling process consisted of three phases: 1) feature extraction using color histogramming and a novel method of structural description, which was exploited in a statistical manner only; 2) classification using Linear Discriminant (LD) or Average-Retrieval Rank (ARR) methods, which provided scalar confidence values that were then thresholded to obtain binary values; 3) elimination of labels (setting binary values to 0) on the testing set, exploiting the joint probabilities calculated for pairs of concepts from the training set. The results show that our present system performs better on ’whole-image’ labels than on object labels.
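Phases 2 and 3 of this labeling process can be sketched as follows. The elimination rule used here (drop the less confident label of any predicted pair that almost never co-occurs in training) is an assumption made for illustration, since the abstract does not state the exact rule; the concept names, thresholds, and data are likewise hypothetical.

```python
from itertools import combinations

def joint_probabilities(train_labels):
    """Fraction of training images in which each pair of concepts co-occurs."""
    n_images = len(train_labels)
    n_concepts = len(train_labels[0])
    joint = {}
    for a, b in combinations(range(n_concepts), 2):
        both = sum(1 for row in train_labels if row[a] and row[b])
        joint[(a, b)] = both / n_images
    return joint

def label_image(confidences, joint, threshold=0.5, min_joint=0.01):
    # Phase 2: threshold the scalar confidences into binary labels.
    labels = [1 if c >= threshold else 0 for c in confidences]
    # Phase 3 (assumed rule): if two predicted concepts almost never co-occur
    # in the training set, set the less confident of the two to 0.
    for (a, b), p in joint.items():
        if labels[a] and labels[b] and p < min_joint:
            weaker = a if confidences[a] < confidences[b] else b
            labels[weaker] = 0
    return labels

# Hypothetical concepts: 0 = day, 1 = night, 2 = sky.
train = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
joint = joint_probabilities(train)
result = label_image([0.9, 0.6, 0.7], joint)
```

Here "day" and "night" never co-occur in the toy training set, so the less confident "night" label is removed even though it passed the threshold.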
Fisher kernel based relevance feedback for multimodal video retrieval
Cited by 3 (0 self)
This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the derivative of the log-likelihood of the generative probability distribution that models the feature distribution, taken with respect to the parameters of that distribution. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results. Hence, during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples. In addition, we propose to use the Fisher Kernel to capture temporal information by cutting a video into smaller segments, extracting a feature vector from each segment, and representing the resulting feature set using the Fisher Kernel representation. We evaluate our method on the MediaEval 2012 Video Genre Tagging Task, a large dataset containing 15,000 videos in 26 categories, totalling up to 2,000 hours of footage. Results show that our method significantly improves on existing state-of-the-art relevance feedback techniques. Furthermore, we show significant improvements from using the Fisher Kernel to capture temporal information, and we demonstrate that Fisher kernels are well suited for this task.
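The idea of fitting the generative model only on the top retrieved results and then encoding each video's segment features as a gradient of the log-likelihood can be sketched in a few lines. This is a simplified illustration, not the authors' method: it assumes a single diagonal Gaussian in place of a mixture model, and the feature values and function names are hypothetical.

```python
import math

def fit_gaussian(vectors):
    """Fit a diagonal Gaussian (per-dimension mean and std) to feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Fall back to a tiny std if a dimension is constant, to avoid dividing by 0.
    std = [math.sqrt(sum((v[d] - mean[d]) ** 2 for v in vectors) / n) or 1e-6
           for d in range(dim)]
    return mean, std

def fisher_vector(segments, mean, std):
    """Average log-likelihood gradient w.r.t. mean and std, scaled by the
    inverse square root of the Fisher information of each parameter."""
    n, dim = len(segments), len(mean)
    grad_mean = [sum((s[d] - mean[d]) / std[d] for s in segments) / n
                 for d in range(dim)]
    grad_std = [sum(((s[d] - mean[d]) / std[d]) ** 2 - 1 for s in segments)
                / (n * math.sqrt(2)) for d in range(dim)]
    return grad_mean + grad_std

# Hypothetical segment features pooled from the top retrieved videos.
top_segments = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
mean, std = fit_gaussian(top_segments)

# Encoding the data the model was fitted on gives a zero gradient; a video
# whose segments are shifted away from the model gives a non-zero vector.
fv_same = fisher_vector(top_segments, mean, std)
fv_shifted = fisher_vector([[3.0, 4.0], [5.0, 6.0]], mean, std)
```

Each video ends up as a fixed-length vector regardless of how many segments it has, so videos of different lengths can be compared with an ordinary kernel or distance.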
Background Invariant Static Hand Gesture Recognition based on Hidden Markov Models
Cited by 1 (0 self)
Abstract. This paper addresses the problem of Static Hand Gesture Recognition (SHGR) and proposes a fast yet simple solution based on Discrete Hidden Markov Models (DHMMs) that use features extracted from the hand contours. Going beyond previous work, the use of depth information makes the overall system robust and background invariant. Experiments carried out on a challenging noisy dataset reveal the superior discriminating and generalizing abilities of statistical models when compared to state-of-the-art methods.
Video Genre Categorization and Representation using Audio-Visual Information
2012
We propose an audio-visual approach to video genre classification using content descriptors that exploit audio, color, temporal, and contour information. Audio information is extracted at block-level, which has the advantage of capturing local temporal information. At the temporal structure level, we consider action content in relation to human perception. Color perception is quantified using statistics of color distribution, elementary hues, color properties, and relationships between colors. Further, we compute statistics of contour geometry and relationships. The main contribution of our work lies in harnessing the descriptive power of the combination of these descriptors in genre classification. Validation was carried out on over 91 hours of video footage encompassing 7 common video genres, yielding average precision and recall ratios of 87% to 100% and 77% to 100%, respectively, and an overall average correct classification of up to 97%. Also, experimental comparison as part of the MediaEval 2011 benchmarking campaign demonstrated the superiority of the proposed audio-visual descriptors over other existing approaches. Finally, we discuss a 3D video browsing platform that displays movies using feature-based coordinates and thus regroups them according to genre.
Approaching Shape Matching with the Local/Global Space
A shape matching approach is introduced, which is based on a novel curve description, namely a (local/global) amplitude space. Two matching principles are tested with this description. First, a point-based (correspondence) matching is carried out with the entire amplitude space, for which the MPEG7 retrieval score is 78.74%. Second, a segment-based matching with abstracted boundary segments is introduced, with the goal of moving away from the typical constraints of point-based matching. Those segments are obtained by analyzing the local/global space. The retrieval score for this type of matching is 70.48% and, although it is lower than the former, it can be applied to gray-scale images. When the two matching metrics are combined, a retrieval score of 84.80% is obtained, which is near the top-performing reported methods. Using an optimization method for the distance matrix, the score can be driven up to 95.01% (2nd best reported so far). The particular advantage of the presented approach is that it allows part interpretation (irrespective of the matching type).
Grouping and Description of Partitioned Segments
A methodology for the detection and geometric characterization of groups of segments is introduced. One set of groups focuses on a precise geometric characterization of the alignment of two and four segments, and on a geometric characterization of shapes with up to five corners, whose outlines are obtained from iso-contours. Another set of groups focuses on a loose geometric characterization of three or more segments. The grouping processes run relatively fast because only keypoints are used, such as the segments' end- and midpoints. The grouping output is tested in an image classification task, evaluated on three image collections (Urban&Natural, Landuse and Caltech 101), whereby a structural as well as a statistical form of representation is tested. The classification accuracy is comparable to other approaches.
The Representative Capacity of Parameters Derived from the Radial Signature
A method for the boundary representation of ’simple’ shapes is presented. It is based on the radial signature and exploits its extrema information to arrive at a low-dimensional geometric description (ca. 10 dimensions). This short description can represent shapes well, which is demonstrated on the Corel and MPEG7 collections. Its key advantage is the short computation time. If this description is extended by further radial-based parameters and combined with Fourier descriptors, it leads to almost the same retrieval performance as the best-performing signature approach, which also uses Fourier descriptors. In a classification task, the radial-based descriptors clearly outperform the Fourier descriptors.
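A radial signature records the distance from a shape's center to successive boundary points, and its extrema mark features such as corners and indentations. The following is a minimal sketch under stated assumptions: the center is taken as the mean of the sampled boundary points, and the extrema are plain strict local maxima and minima. The parameters the paper actually derives from these extrema are not given in the abstract.

```python
import math

def radial_signature(boundary):
    """Distance from the mean of the boundary points to each boundary point."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]

def local_extrema(signature):
    """Indices of strict local maxima and minima of the closed (cyclic) signature."""
    n = len(signature)
    maxima, minima = [], []
    for i in range(n):
        prev, cur, nxt = signature[i - 1], signature[i], signature[(i + 1) % n]
        if cur > prev and cur > nxt:
            maxima.append(i)
        elif cur < prev and cur < nxt:
            minima.append(i)
    return maxima, minima

# A square sampled at its corners and edge midpoints, in boundary order.
square = [(1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0)]
signature = radial_signature(square)
maxima, minima = local_extrema(signature)
```

For the sampled square the maxima fall on the four corners and the minima on the four edge midpoints, so a handful of extrema already summarizes the outline.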