Results 1 - 10 of 109
Automatic Musical Genre Classification Of Audio Signals
- IEEE Transactions on Speech and Audio Processing, 2002
Cited by 829 (35 self)
... describe music. They are commonly used to structure the increasing amounts of music available in digital form on the Web and are important for music information retrieval. Genre categorization for audio has traditionally been performed manually. A particular musical genre is characterized by statistical properties related to the instrumentation, rhythmic structure and form of its members. In this work, algorithms for the automatic genre categorization of audio signals are described. More specifically, we propose a set of features for representing texture and instrumentation. In addition, a novel set of features for representing rhythmic structure and strength is proposed. The performance of these feature sets has been evaluated by training statistical pattern recognition classifiers on real-world audio collections. Based on the automatic hierarchical genre classification, two graphical user interfaces for browsing and interacting with large audio collections have been developed.
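The statistical-classifier stage described above can be illustrated with a toy example. The following is a minimal sketch, not the paper's actual system: a nearest-centroid classifier over per-clip feature vectors, where the two-dimensional "features", genre labels, and training values are all invented for illustration.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labelled):
    """labelled: dict mapping genre -> list of feature vectors.
    Returns genre -> centroid."""
    return {genre: centroid(vecs) for genre, vecs in labelled.items()}

def classify(model, vec):
    """Assign vec to the genre with the nearest centroid (Euclidean)."""
    def dist(c):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, c)))
    return min(model, key=lambda g: dist(model[g]))

# Hypothetical 2-D "texture/rhythm" features, purely illustrative.
model = train({
    "classical": [[0.1, 0.2], [0.2, 0.1]],
    "rock":      [[0.9, 0.8], [0.8, 0.9]],
})
pred = classify(model, [0.85, 0.9])
```

Real systems in this line of work replace the centroid model with Gaussian or k-NN classifiers over much richer timbral and rhythmic features.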
A comparative study on content-based music genre classification
- in Proc. SIGIR, 2003
Cited by 117 (17 self)
Content-based music genre classification is a fundamental component of music information retrieval systems and has been gaining importance and attention with the emergence of digital music on the Internet. Currently little work has been done on automatic music genre classification, and the reported classification accuracies are relatively low. This paper proposes a new feature extraction method for music genre classification, DWCHs. DWCHs capture the local and global information of music signals simultaneously by computing histograms over their Daubechies wavelet coefficients. The effectiveness of this new feature and of previously studied features is compared using various machine learning classification algorithms, including Support Vector Machines and Linear Discriminant Analysis. It is demonstrated that the use of DWCHs significantly improves the accuracy of music genre classification.
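The histogram idea behind DWCH-style features can be sketched roughly as follows. Assumptions to note: a single-level, unnormalized Haar step (the simplest Daubechies filter) stands in for the deeper Daubechies decompositions used in the paper, and the bin range is arbitrary; the signal values are invented.

```python
def haar_step(signal):
    """One level of an (unnormalized) Haar wavelet transform:
    pairwise averages (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def coefficient_histogram(coeffs, bins=8, lo=-1.0, hi=1.0):
    """Normalized histogram of wavelet coefficients, the core of a
    DWCH-style feature."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for c in coeffs:
        idx = min(bins - 1, max(0, int((c - lo) / width)))
        counts[idx] += 1
    total = len(coeffs) or 1
    return [c / total for c in counts]

# Tiny example signal; real DWCHs decompose ~30 s of audio over several
# levels and summarize each histogram by its statistical moments.
signal = [0.0, 0.5, 0.25, -0.25, 0.5, 0.5, -0.5, 0.0]
approx, detail = haar_step(signal)
hist = coefficient_histogram(detail)
```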
Environmental sound recognition with time-frequency audio
Cited by 55 (1 self)
The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as the chirping of insects and the sound of rain, are typically noise-like with a broad, flat spectrum but may include strong temporal-domain signatures. However, few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose to use the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method utilizes a dictionary of atoms for feature selection, resulting in a flexible, intuitive and physically interpretable set of features. The MP-based features are adopted to supplement the MFCC features to yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system has been shown to produce performance comparable to that of human listeners. Index Terms—Audio classification, auditory scene recognition, data representation, feature extraction, feature selection, matching pursuit, Mel-frequency cepstral coefficient (MFCC).
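The matching pursuit step described above can be sketched as a greedy loop. This is a toy illustration, not the paper's implementation: the three hand-made unit-norm atoms stand in for a real Gabor dictionary, whose frequency and scale parameters would serve as the features.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_iter=3):
    """Greedy matching pursuit: at each step pick the unit-norm atom with
    the largest inner product with the residual, record (index, coefficient),
    and subtract that atom's contribution from the residual."""
    residual = list(signal)
    selected = []
    for _ in range(n_iter):
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        selected.append((best, coeff))
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return selected, residual

# Toy dictionary of unit-norm atoms (purely illustrative).
dictionary = [
    [1.0, 0.0, 0.0, 0.0],    # impulse atom
    [0.5, 0.5, 0.5, 0.5],    # constant atom
    [0.5, -0.5, 0.5, -0.5],  # alternating atom
]
signal = [1.5, 0.5, 1.0, 0.0]
atoms, resid = matching_pursuit(signal, dictionary, n_iter=2)
```

The indices and coefficients of the selected atoms form the sparse representation; in the MP-feature approach it is the identity of the chosen atoms, not the reconstruction, that characterizes the sound.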
Authentication Gets Personal with Biometrics
, 2004
"... this article, we outline the state-of-the-art of several popular biometric modalities and technologies and provide specific applications where biometric recognition may be beneficially incorporated. In addition, we discuss integration strategies of biometric authentication technologies into DRM syst ..."
Cited by 48 (5 self)
In this article, we outline the state of the art of several popular biometric modalities and technologies and provide specific applications where biometric recognition may be beneficially incorporated. In addition, we discuss integration strategies for biometric authentication technologies in DRM systems so that the whole process meets the needs and requirements of consumers, content providers, and payment brokers, securing delivery channels and contents.
Speaker role recognition in multiparty audio recordings using Social Network Analysis and duration distribution modeling
- IEEE Transactions on Multimedia, 2007
Cited by 35 (6 self)
This paper presents two approaches for speaker role recognition in multiparty audio recordings. The experiments are performed over a corpus of 96 radio bulletins corresponding to roughly 19 hours of material. Each recording involves, on average, eleven speakers, each playing one of six roles from a predefined set. Both proposed approaches start by automatically segmenting the recordings into single-speaker segments, but perform role recognition using different techniques. The first approach is based on Social Network Analysis; the second relies on the distribution of intervention durations across different speakers. The two approaches are applied both separately and in combination, and the results show that around 85 percent of the recording time can be labeled correctly in terms of role.
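The duration-distribution idea might be sketched as follows. This is a hypothetical simplification, not the paper's actual model: one Gaussian per role over intervention durations, with maximum-likelihood assignment; the role names and duration values are invented.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def fit_roles(training):
    """training: dict role -> list of observed intervention durations (s).
    Returns role -> (mean, std)."""
    models = {}
    for role, durs in training.items():
        m = sum(durs) / len(durs)
        var = sum((d - m) ** 2 for d in durs) / len(durs)
        models[role] = (m, math.sqrt(var) or 1.0)  # guard against zero std
    return models

def recognize(models, durations):
    """Pick the role whose duration model best explains a speaker's
    sequence of intervention durations."""
    def loglik(role):
        mean, std = models[role]
        return sum(gaussian_logpdf(d, mean, std) for d in durations)
    return max(models, key=loglik)

models = fit_roles({
    "anchor": [40.0, 50.0, 45.0, 55.0],  # long turns (invented data)
    "guest":  [8.0, 12.0, 10.0, 9.0],    # short interventions
})
role = recognize(models, [48.0, 52.0])
```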
Audio-visual event recognition in surveillance video sequences
- IEEE Transactions on Multimedia, 2006
Cited by 29 (0 self)
In the context of the automated surveillance field, automatic scene analysis and understanding systems typically consider only visual information, whereas other modalities, such as audio, are typically disregarded. This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. These processes permit the detection of separate audio and visual patterns representing unusual unimodal events in a scene. The integration of audio and visual data is subsequently performed by exploiting the concept of synchrony between such events. The audio-visual (AV) association is carried out on-line and without the need for training sequences, and is based on the computation of a characteristic feature called the audio-video concurrence matrix, which allows AV events to be detected, segmented, and discriminated from one another. Experimental tests involving classification and clustering of events demonstrate the potential of the proposed approach, also in comparison with results obtained using the single modalities alone and without considering synchrony. Index Terms—Audio-visual analysis, automated surveillance, event classification and clustering, multimodal background modelling and foreground detection, multimodality, scene analysis.
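A concurrence-style count between audio and video events can be sketched minimally as below. The frame sets and event labels are invented, and the paper's actual matrix construction is more involved; this only shows the co-occurrence counting that underlies the synchrony idea.

```python
def concurrence_matrix(audio_events, video_events):
    """audio_events / video_events: dict mapping event label -> set of frame
    indices in which that foreground event is active. Returns a nested dict
    counting, for each (audio, video) label pair, the frames where both
    events co-occur."""
    matrix = {}
    for a_label, a_frames in audio_events.items():
        row = {}
        for v_label, v_frames in video_events.items():
            row[v_label] = len(a_frames & v_frames)  # synchronous frames
        matrix[a_label] = row
    return matrix

# Invented detections: frame indices where each unimodal event fired.
audio = {"scream": {3, 4, 5}, "engine": {10, 11}}
video = {"person": {4, 5, 6}, "car": {10, 11, 12}}
m = concurrence_matrix(audio, video)
```

High counts along a row/column pair suggest that an audio event and a video event belong to the same AV event.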
Audio-based semantic concept classification for consumer video
- IEEE TASLP, 2010
Cited by 26 (10 self)
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback–Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips. Index Terms—Audio classification, consumer video classification, semantic concept detection, soundtrack analysis.
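For the single-Gaussian clip representation, the Kullback–Leibler divergence has a closed form; the sketch below assumes diagonal covariances (an assumption of this example, not necessarily the paper's setup) and symmetrizes the divergence, which can then feed a kernel such as exp(-gamma * D).

```python
import math

def fit_diag_gaussian(frames):
    """Fit per-dimension mean and variance to a sequence of MFCC-like
    frames, summarizing a whole clip as one diagonal Gaussian."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    var = [max(1e-6, sum((f[i] - mean[i]) ** 2 for f in frames) / n)
           for i in range(d)]
    return mean, var

def kl_diag(p, q):
    """Closed-form KL(p || q) for diagonal Gaussians p = (mean, var)."""
    mp, vp = p
    mq, vq = q
    return 0.5 * sum(
        math.log(vq[i] / vp[i]) + (vp[i] + (mp[i] - mq[i]) ** 2) / vq[i] - 1
        for i in range(len(mp))
    )

def symmetric_kl(p, q):
    """Symmetrized divergence, suitable as an SVM kernel distance."""
    return kl_diag(p, q) + kl_diag(q, p)

# Two invented "clips" of 2-D frames.
p = fit_diag_gaussian([[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
q = fit_diag_gaussian([[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]])
```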
Information extraction from sound for medical telemonitoring
- IEEE Transactions on TITB, 2006
Cited by 21 (9 self)
Today, the growth of the aging population in Europe requires an increasing number of health care professionals and facilities for aged persons. Medical telemonitoring at home (and, more generally, telemedicine) improves the patient's comfort and reduces hospitalization costs. Using sound surveillance as an alternative to video telemonitoring, this paper deals with the detection and classification of alarming sounds in a noisy environment. The proposed sound analysis system can detect distress or everyday sounds anywhere in the monitored apartment, and is connected to classical medical telemonitoring sensors through a data fusion process. The sound analysis system is divided into two stages: sound detection and classification. The first stage (sound detection) must extract significant sounds from a continuous signal flow. A new detection algorithm based on the discrete wavelet transform is proposed in this paper, which leads to accurate results when applied to nonstationary signals (such as impulsive sounds). The algorithm was evaluated in a noisy environment and compares favorably with state-of-the-art algorithms in the field. The second stage of the system is sound classification, which uses a statistical approach to identify unknown sounds. A statistical study was carried out to identify the most discriminant acoustic parameters for the input of the classification module. New wavelet-based parameters, better adapted to noise, are proposed in this paper. The validation of the telemonitoring system is presented through various real and simulated test sets. The global sound-based system achieves a 3% missed-alarm rate and could be fused with other medical sensors to improve performance. Index Terms—Gaussian mixture model (GMM), medical telemonitoring, sound classification, sound detection, wavelet transform.
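The impulsive-sound detection stage might be approximated as follows. Two substitutions to note: a first-difference filter stands in for the paper's discrete-wavelet detail band, and the median-based threshold is an invented heuristic, not the paper's criterion.

```python
def detect_impulses(signal, win=4, factor=5.0):
    """Flag windows whose high-frequency energy exceeds `factor` times the
    median window energy. The first difference is a crude high-pass filter
    standing in for the finest wavelet detail band."""
    detail = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    energies = []
    for start in range(0, len(detail) - win + 1, win):
        energies.append(sum(x * x for x in detail[start:start + win]))
    ordered = sorted(energies)
    median = ordered[len(ordered) // 2]
    floor = max(median, 1e-12)  # avoid dividing a silent background by zero
    return [i for i, e in enumerate(energies) if e > factor * floor]

# Quiet background with one impulsive burst landing in the third window.
sig = [0.0] * 8 + [0.0, 1.0, -1.0, 0.5] + [0.0] * 8
hits = detect_impulses(sig)
```

A wavelet-based detector would apply the same energy test per decomposition level, gaining robustness to stationary background noise.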
Environmental Sound Recognition Using MP-Based Features
Cited by 19 (2 self)
Defining suitable features for environmental sounds is an important problem in an automatic acoustic scene recognition system. As with most pattern recognition problems, extracting the right feature set is the key to effective performance. A variety of features have been proposed for audio recognition, but the vast majority of past work uses features that are well known for structured data, such as speech and music, and assumes this association will transfer naturally to unstructured sounds. In this paper, we propose a novel method based on matching pursuit (MP) to analyze environmental sounds for feature extraction. The proposed MP-based method selects features from a dictionary, resulting in a representation that is flexible, yet intuitive and physically interpretable. We show that these features are less sensitive to noise and are capable of effectively representing sounds that originate from different sources and different frequency ranges. The MP-based features can be used to supplement another well-known audio feature, i.e., MFCCs, to yield higher recognition accuracy for environmental sounds. Index Terms — Environmental sounds, feature extraction, audio classification, auditory scene recognition, matching pursuit
Automatic Classification of Audio Data
- IEEE Transactions on Systems, Man, and Cybernetics, 2004
Cited by 18 (1 self)
In this paper a novel content-based musical genre classification approach that uses a combination of classifiers is proposed. First, musical surface features and beat-related features are extracted from different segments of digital music in MP3 format. Three 15-dimensional feature vectors are extracted from three different parts of a music clip, and three different classifiers are trained with these feature vectors. In the classification stage, the outputs provided by the individual classifiers are combined using a majority vote rule. Experimental results show that the proposed approach, which combines the outputs of the classifiers, achieves a higher correct musical genre classification rate than using single feature vectors and single classifiers.
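The majority-vote combination rule is straightforward to sketch. The genre labels are invented, and tie-breaking in favor of the first-seen label is an assumption of this sketch, since the abstract does not specify one.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-segment classifier outputs by majority vote; ties are
    broken in favor of the label that appears first in the input."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:  # first label reaching the top count wins
        if counts[label] == top:
            return label

# Three classifiers, one per segment of the clip (labels illustrative).
decision = majority_vote(["jazz", "rock", "jazz"])
```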