Multimedia Content Analysis Using Both Audio and Visual Cues
2000
Cited by 70 (0 self)
Including all the scenes/shots that contain special events may generate an abstract that is too long. Also, simply stringing them together may not be visually or aurally appealing. In the MoCA project, it was determined that only 50% of the abstract should contain special events. The remaining part should be left for filler clips. The special event clips to be included are chosen uniformly at random from the different types of events. The selection of a short clip from a scene is subject to some additional criteria, such as the amount of action and the similarity to the overall color composition of the movie. Closeness to the desired AV characteristics of certain scene types is also considered. The filler clips are chosen so that they do not overlap with the content of the chosen special event clips, to ensure good coverage of all parts of a movie.
MPEG-7 Standard for Multimedia Content Description Interface
MPEG-7 is an on-going standardization effort for content description of AV documen...
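The MoCA clip-selection rules in this excerpt (half of the abstract for special events, drawn evenly across event types, the rest for non-overlapping filler) can be sketched as below. This is a hypothetical illustration, not MoCA's implementation: the `Clip` type, `build_abstract`, the round-robin draw over event types, and the random filler order are all assumptions, and the action-amount, color-composition, and scene-type criteria from the excerpt are left out.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clip:
    start: float                      # seconds from the start of the movie
    end: float
    event_type: Optional[str] = None  # None = ordinary footage (filler candidate)

    @property
    def duration(self) -> float:
        return self.end - self.start


def overlaps(a: Clip, b: Clip) -> bool:
    """True if the two clips share any span of the movie timeline."""
    return a.start < b.end and b.start < a.end


def build_abstract(clips: List[Clip], target_len: float,
                   event_share: float = 0.5, seed: int = 0) -> List[Clip]:
    """Hypothetical sketch: event clips fill at most event_share of target_len."""
    rng = random.Random(seed)
    event_budget = event_share * target_len

    # Group special-event clips by type (copies, so the input list is untouched).
    by_type = {}
    for c in clips:
        if c.event_type is not None:
            by_type.setdefault(c.event_type, []).append(c)

    # Assumed policy: round-robin over event types, one random clip per type per
    # pass, so no single event type dominates the event half of the abstract.
    chosen: List[Clip] = []
    used = 0.0
    types = list(by_type)
    rng.shuffle(types)
    while types:
        for t in list(types):
            pool = by_type[t]
            if not pool:
                types.remove(t)
                continue
            clip = pool.pop(rng.randrange(len(pool)))
            if used + clip.duration <= event_budget:
                chosen.append(clip)
                used += clip.duration

    # Fill the remaining time with ordinary clips that overlap nothing already
    # chosen, visited in random order to spread coverage over the movie.
    fillers = [c for c in clips if c.event_type is None]
    rng.shuffle(fillers)
    total = used
    for clip in fillers:
        if total + clip.duration <= target_len and not any(overlaps(clip, s) for s in chosen):
            chosen.append(clip)
            total += clip.duration

    # Present the abstract in movie order.
    return sorted(chosen, key=lambda c: c.start)


# Usage with made-up clip boundaries:
clips = [
    Clip(0, 5, "gunfire"), Clip(10, 14, "dialog"), Clip(20, 26, "gunfire"),
    Clip(30, 33, "explosion"), Clip(3, 8), Clip(40, 45), Clip(50, 54),
    Clip(60, 66), Clip(70, 72),
]
abstract = build_abstract(clips, target_len=20)
```

Drawing one clip per type per pass is one simple reading of "uniformly and randomly from different types of events"; a weighted or fully random draw would also fit the description.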
Audio-Visual Integration In Multimodal Communication
Proc. IEEE, 1998
Cited by 54 (5 self)
In this paper, we review recent research that examines audio-visual integration in multimodal communication. The topics include bimodality in human speech, human and automated lip-reading, facial animation, lip synchronization, joint audio-video coding, and bimodal speaker verification. We also study the enabling technologies for these research topics, including automatic facial feature tracking and audio-to-visual mapping. Recent progress in audio-visual research shows that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently.
Keywords: Multimedia communication, Speech processing, Speech communication, Video signal processing, Image analysis
1. Introduction
Multimedia is more than simply the combination of various forms of data: text, speech, audio, music, images, graphics, and video. When we discuss multimedia signal processing, it is the integration and interaction among these different media types t...
Survey of Compressed-Domain Features used in Audio-Visual Indexing and Analysis
Cited by 11 (0 self)
In this paper, we attempt to provide a comprehensive, high-level review of audio-visual features that can be extracted from standard compressed domains such as MPEG-1 and MPEG-2. The paper is motivated by the many active research efforts on extracting and applying compressed-domain features in fields such as indexing, filtering, and manipulation. Compressed-domain approaches avoid the expensive computation and memory requirements involved in decoding and/or re-encoding. Selected features are categorized into four groups: spatial visual (e.g., color, texture, edge, shape), motion (e.g., motion field, trajectory), audio (e.g., energy, spectral features, pitch), and coding (e.g., bit rate, frame/block type). For each feature, we briefly discuss the extraction methods, computational complexity, potential effectiveness in applications, and possible limitations caused by compressed-domain approaches. Finally, we briefly describe the audio-visual features specified in the MPEG-7 standard and discuss the possibility of extracting them in the compressed domain.
Automatic Music Style Classification: Towards the Detection of Perceptually Similar Music (Proposal for Ph.D. Dissertation)
2001
The fundamental problem investigated is to develop a model space that automatically classifies music into categories by detecting perceptually similar patterns in it. The concept is based on a modular system consisting of three main stages. The first stage preprocesses the raw audio data by passing it through a number of independent feature extractors. Each feature extractor reduces the information content of the raw music data to a vector with a small number of dimensions. In the second stage, the set of feature vectors is classified (indexed) into clusters by a pattern-matching algorithm. In the final stage, the query engine detects and retrieves trajectories in the database that are similar to a given example, using a similarity-matching algorithm. In addition, several challenging issues will be addressed in this dissertation. The first is an exploration of ways to measure the subjective perceptual qualities of musical signals: how do musical dimensions (e.g., pitch, rhythm, melody, tempo, harmony) integrate and interact perceptually so that we perceive a comparative sense of similarity between two sounds that share particular attributes and structures? The second is the development of a good similarity measure for perceptually salient index attributes. Finally, if the model space can be evaluated and improved against a human-classified "ground truth" dataset, the result would be a good psychological model of how humans perceive and categorize music.
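The three stages described in this proposal (independent feature extractors, clustering into an index, similarity-based query) can be sketched as a toy pipeline. Everything concrete here is an assumption, since the proposal names no specific features or algorithms: RMS energy and zero-crossing rate stand in for the feature extractors, k-means stands in for the pattern-matching stage, Euclidean nearest neighbours within the closest cluster stand in for similarity matching, each piece is reduced to one vector rather than a trajectory, and `tone` is a hypothetical helper that synthesizes test signals.

```python
import math
import random


# Stage 1 (assumed extractors): each reduces a raw signal to one number.
def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


EXTRACTORS = [rms, zero_crossing_rate]


def extract(signal):
    """Concatenate the outputs of the independent extractors into a feature vector."""
    return [f(signal) for f in EXTRACTORS]


# Stage 2 (assumed k-means): cluster feature vectors to build an index.
def kmeans(vectors, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def build_index(database, k, seed=0):
    names = list(database)
    labels, centroids = kmeans([database[n] for n in names], k, seed=seed)
    clusters = {j: [n for n, lab in zip(names, labels) if lab == j] for j in range(k)}
    return clusters, centroids


# Stage 3 (assumed nearest-neighbour search): rank only the closest cluster.
def query(example_signal, database, clusters, centroids, top_n=1):
    q = extract(example_signal)
    nearest = min(range(len(centroids)), key=lambda j: math.dist(q, centroids[j]))
    return sorted(clusters[nearest], key=lambda n: math.dist(q, database[n]))[:top_n]


def tone(freq, amp, n=800, sr=8000):
    """Hypothetical test signal: a pure sine wave."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]


# Usage with synthetic "pieces": two low and two high tones.
db = {
    "low_a": extract(tone(100, 0.5)), "low_b": extract(tone(150, 0.6)),
    "high_a": extract(tone(1000, 0.5)), "high_b": extract(tone(1200, 0.55)),
}
clusters, centroids = build_index(db, k=2)
matches = query(tone(1100, 0.5), db, clusters, centroids, top_n=2)
```

Searching only the nearest cluster is what makes the second stage act as an index: the query avoids comparing against every item in the database, at the cost of possibly missing close matches that fall just across a cluster boundary.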
A Short Study on Music Classification System (MUS 319: Research Seminar on Computational Models of Sound Perception, 3 units)
2000
The goal of this study is to improve my understanding of audio information retrieval, particularly musical style classification. I have also included an experiment with my own music classification system, which is still incomplete. Even though the system is still rudimentary (and a low-level implementation), building it helped me understand the structure of the audio information retrieval process. I have also become aware of the areas in which I need to deepen my knowledge in order to pursue this field of research. This paper is based on various articles I have studied in this field. I intend to describe the procedures and terms related to audio information retrieval in a basic yet detailed manner, so that not only engineers but also musicians can understand the fundamentals of this field of research. The process is also meaningful to me because it helps me clarify and answer specific questions about the subject. Explanations of certain parts are missing or incomplete; a full account will be provided in the near future along with my research.

