8 citations found.
J. Nam, A. Enis Cetin, and A.H. Tewfik, "Speaker identification and video analysis for hierarchical video shot classification," in IEEE International Conference on Image Processing, Washington DC, USA, 1997, Vol. 2.

This paper is cited in the following contexts:
Multimodal Video Indexing: A Review of the State-of-the-art - Snoek, Worring (2001)   (10 citations)

....cepstral coefficients (MFCC) and linear prediction coefficients (LPC) which achieve a much better classification accuracy. When a segment is labeled as speech, speaker recognition can be used to identify a person based on his or her speech utterance. Different techniques are proposed, e.g. [54, 61]. A generic speaker identification system consisting of three modules is presented in [61]. In the first module feature extraction is performed using a set of 14 MFCC from each window. In the second module those features are used to classify each moving window using a nearest neighbor classifier. ....

J. Nam, A. Enis Cetin, and A.H. Tewfik. Speaker identification and video analysis for hierarchical video shot classification. In IEEE International Conference on Image Processing, volume 2, Washington DC, USA, 1997.
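
The first two modules mentioned in the excerpt above (one feature vector per window, then nearest-neighbor classification of each window) can be illustrated with a minimal sketch. This is not the implementation from [61]: the function names, the Euclidean distance, and the made-up 14-dimensional vectors standing in for MFCC features are all assumptions, and the third module is left out because the excerpt does not describe it.

```python
import math

def nearest_neighbor_label(window, training):
    """Return the label of the training vector closest to `window`.

    `training` is a list of (label, feature_vector) pairs. Euclidean
    distance is an assumption; the excerpt does not name a metric.
    """
    best_label, best_dist = None, math.inf
    for label, vector in training:
        dist = math.dist(window, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def classify_windows(windows, training):
    """Classify each moving window independently."""
    return [nearest_neighbor_label(w, training) for w in windows]

# Hypothetical 14-dimensional vectors standing in for per-window MFCCs.
training = [
    ("speaker_a", [0.0] * 14),
    ("speaker_b", [1.0] * 14),
]
windows = [[0.1] * 14, [0.9] * 14, [0.2] * 14]
labels = classify_windows(windows, training)
print(labels)
```

In practice the feature vectors would come from an MFCC front end applied to each audio window; the synthetic values here only show the per-window classification step.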


A State-of-the-art Review on Multimodal Video Indexing - Snoek, Worring (2002)

....cepstral coefficients (MFCC) and linear prediction coefficients (LPC) which achieve a much better classification accuracy. When a segment is labelled as speech, speaker recognition can be used to identify a person based on his or her speech utterance. Different techniques are proposed, e.g. [25, 30]. A generic speaker identification system consisting of three modules is presented in [30]. The authors report encouraging performance using speech segments of a feature film. A strong textual cue for the appearance of people in a video document are words which are names. Such an approach is ....

J. Nam, A. E. Cetin, and A. Tewfik. Speaker identification and video analysis for hierarchical video shot classification. In IEEE International Conference on Image Processing, volume 2, Washington DC, USA, 1997.


Web-based Video Database Management: Issues, Mechanisms.. - Li, Chan, Wu, Zhuang

....and attribute values describing its content. In [CJ91] both the video objects and their spatial relationships are manually annotated in order to support complex spatial queries. In order to index video automatically, not only the visual contents but also the audio content should be used (see [NK98, NC97, WZ01]). Recent work also proposed the use of closed caption. An obvious trend in video indexing and retrieval is to use all the different features like audio, closed captions and so on instead of simply visual features. Another trend is towards web enabled based video management system. Over the last ....

J. Nam, A.E. Cetin, A.H. Tewfik, "Speaker Identification and Video Analysis for Hierarchical Video Shot Classification" ICIP97, vol.2, pp.550-555.


A Hybrid Approach to Video Retrieval in a Generic Video.. - Chan, Li, Wu, Zhuang (2001)

....attributes and attribute values describing its content. In [CJ91] both the video objects and their spatial relationships are manually annotated in order to support complex spatial queries. In order to index video automatically, not only the visual contents but also the audio content was used in [NK98,NC97,WZ01]. Recent work also proposed the use of closed caption and video. An obvious trend in video indexing and retrieval is to use all the different features like audio, closed captions and so on instead of simply visual features. 3. Hybrid Approach to Video Retrieval With the various video ....

J. Nam, A.E. Cetin, A.H. Tewfik, "Speaker Identification and Video Analysis for Hierarchical Video Shot Classification" ICIP 97, Vol. 2, pp. 550-555.


A Hidden Markov Model Framework for Video Segmentation Using .. - John Boreczky And (1998)   (21 citations)

....in order to detect gradual transitions. Audio and motion features have been used to improve shot boundary detection. Saraceno et al. [8] classify audio according to silence, speech, music, or noise and use this information to verify shot boundaries hypothesized by image based features. In [5], speaker identification is used to cluster shots. Phillips and Wolf [6] use motion features alone or with histogram differences to improve boundary detection. Shahraray [9] combines motion features with pixel differences. In this work, we combine information from features that are based on image ....

....pair of adjacent frames at the frame rate of 30 times per second. 2.2 Audio Features In contrast to other approaches to using audio features to aid in video segmentation, we do not attempt to categorize audio into classes such as speech, silence, music, and noise [8] or to identify speakers [5]. Rather, we take the approach used with the video feature and compute an audio distance measure. This distance is computed between two adjacent intervals of audio X and Y (see Figure 1). In order for this distance to accurately reflect differences in the type of audio (speech, silence, etc.), it ....

Nam, J., Cetin, E., and Tewfik, A. "Speaker Identification and Video Analysis for Hierarchical Video Shot Classification", Proc. Int. Conf. Image Processing, Santa Barbara, CA, October, 1997.


Probabilistic Multimedia Objects (Multijects): A.. - Naphade.. (1998)   (11 citations)

....semantics. This paper introduces early ideas, which fall in the third category of domain independent video indexing by providing a framework which supports indexing and thereby retrieval at a semantic level. An additional issue is the use of multiple media for indexing and retrieval. Recent work [7] uses video as well as audio in an attempt to derive the structure of videos. Recent work also proposes the use of closed caption and video. In this paper we present a novel framework for fusing multiple media i.e. video and audio for indexing. While each of the above three approaches are ....

J. Nam, A.E. Cetin, A.H. Tewfik, "Speaker Identification and Video Analysis for Hierarchical Video Shot Classification" ICIP 97, Vol. 2, pp. 550-555.


Multimodal Video Indexing: A Review of the State-of-the-art - Snoek, Worring (2005)   (10 citations)

No context found.

J. Nam, A. Enis Cetin, and A.H. Tewfik, "Speaker identification and video analysis for hierarchical video shot classification," in IEEE International Conference on Image Processing, Washington DC, USA, 1997, Vol. 2.


Stochastic Modeling of Soundtrack for Efficient Segmentation.. - Naphade, Huang (2000)   (1 citation)

No context found.

J. Nam, A. Cetin, and A. Tewfik, "Speaker identification and video analysis for hierarchical video shot classification," in Proceedings of the IEEE Intl. Conference on Image Processing, vol. 2, pp. 550-555, (Santa Barbara, CA), Oct. 1997.
