Results 1 - 10 of 82
Semantic annotation and retrieval of music and sound effects
- IEEE TASLP, 2008
Cited by 40 (16 self)
Abstract: We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our “query-by-text” system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects. Index Terms: Audio annotation and retrieval, music information retrieval, semantic music analysis.
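The per-word modeling idea in this abstract can be illustrated with a small sketch. It assumes one diagonal-covariance GMM per vocabulary word and scores a track by its average frame log-likelihood under each word model. The model parameters, the names `WORD_MODELS` and `TRACKS`, and the two-dimensional features are all hypothetical; the parameters are written by hand rather than estimated with the weighted mixture hierarchies EM algorithm the paper describes, and the ranking rule is a simplification of the paper's probabilistic model.

```python
import math

def gmm_loglik(x, components):
    """Log-density of feature vector x under a diagonal-covariance GMM.

    components is a list of (weight, means, variances) tuples.
    """
    logs = []
    for weight, means, variances in components:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def track_score(frames, components):
    """Average per-frame log-likelihood of a track under one word model."""
    return sum(gmm_loglik(x, components) for x in frames) / len(frames)

def annotate(frames, word_models, k=2):
    """Return the k words whose models best explain the track's frames."""
    ranked = sorted(word_models,
                    key=lambda w: track_score(frames, word_models[w]),
                    reverse=True)
    return ranked[:k]

def retrieve(word, tracks, word_models):
    """Rank track names by how well the query word's model explains them."""
    return sorted(tracks,
                  key=lambda t: track_score(tracks[t], word_models[word]),
                  reverse=True)

# Hypothetical hand-written models and features, for illustration only.
WORD_MODELS = {
    "calm": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "loud": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
TRACKS = {
    "track_a": [[0.1, -0.2], [0.3, 0.0]],
    "track_b": [[4.2, 3.9], [5.8, 4.1]],
}

print(annotate(TRACKS["track_a"], WORD_MODELS, k=1))
print(retrieve("loud", TRACKS, WORD_MODELS))
```

Annotation and retrieval share the same scores here: annotation ranks words for a fixed track, while retrieval ranks tracks for a fixed word.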
Content-Based Music Information Retrieval: Current Directions and Future Challenges
, 2008
Personal communication with A. Agogino
- IEEE Trans. Audio, Speech, and Language Proc., 2007
Cited by 16 (4 self)
Abstract: Although the process of analyzing an audio recording of a music performance is complex and difficult even for a human listener, there are limited forms of information that may be tractably extracted and yet still enable interesting applications. We discuss melody – roughly, the part a listener might whistle or hum – as one such reduced descriptor of music audio, and consider how to define it, and what use it might be. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were conducted, and a discussion of the results. For our definition of melody, current systems can achieve around 70% correct transcription at the frame level, including distinguishing between the presence or absence of the melody. Melodies transcribed at this level are readily recognizable, and show promise for practical applications.
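A frame-level score of the kind quoted above could be computed roughly as follows. This is only a sketch: it assumes pitches are given in semitones (for example MIDI note numbers), that `None` marks a frame with no melody, and that a pitch counts as correct within a hypothetical `tolerance` of half a semitone. The exact scoring rules of the 2004 and 2005 evaluations are not given in the abstract and may differ.

```python
def frame_accuracy(reference, estimate, tolerance=0.5):
    """Fraction of frames where the estimate matches the reference.

    A frame is correct when both sequences mark it as melody-free (None),
    or when both carry a pitch and the pitches differ by at most
    `tolerance` semitones.
    """
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("sequences must not be empty")
    correct = 0
    for ref, est in zip(reference, estimate):
        if ref is None or est is None:
            # Voicing decision: correct only if both agree on "no melody".
            if ref is None and est is None:
                correct += 1
        elif abs(ref - est) <= tolerance:
            correct += 1
    return correct / len(reference)

# Hypothetical five-frame example: one pitch error and one missed melody frame.
reference = [60, 60, None, 62, 64]
estimate = [60, 61, None, 62, None]
print(frame_accuracy(reference, estimate))
```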
FIVE APPROACHES TO COLLECTING TAGS FOR MUSIC
Cited by 16 (3 self)
We compare five approaches to collecting tags for music: conducting a survey, harvesting social tags, deploying annotation games, mining web documents, and autotagging audio content. The comparison includes a discussion of both scalability (financial cost, human involvement, and computational resources) and quality (the cold start problem & popularity bias, strong vs. weak labeling, vocabulary structure & size, and annotation accuracy). We then describe one state-of-the-art system for each approach. The performance of each system is evaluated using a tag-based music information retrieval task. Using this task, we are able to quantify the effect of popularity bias on each approach by making use of a subset of more popular (short-head) songs and a set of less popular (long-tail) songs. Lastly, we propose a simple hybrid context-content system that combines our individual approaches and produces superior retrieval results.
Downie: “Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata”
- Proceedings of the International Conference on Music Information Retrieval, 2007
Cited by 11 (2 self)
There is a growing interest in developing and then evaluating Music Information Retrieval (MIR) systems that can provide automated access to the mood dimension of music. Mood as a music access feature, however, is not well understood in that the terms used to describe it are not standardized and their application can be highly idiosyncratic. To better understand how we might develop methods for comprehensively developing and formally evaluating useful automated mood access techniques, we explore the relationships that mood has with genre, artist and usage metadata. Statistical analyses of term interactions across three metadata collections (AllMusicGuide.com, epinions.com and Last.fm) reveal important consistencies within the genre-mood and artist-mood relationships. These consistencies lead us to recommend a cluster-based approach that overcomes specific term-related problems by creating a relatively small set of data-derived “mood spaces” that could form the ground-truth for a proposed MIREX “Automated Mood Classification” task.
Classification-based melody transcription
- Machine Learning Journal, 2006
Cited by 10 (2 self)
The melody of a musical piece – informally, the part you would hum along with – is a useful and compact summary of a full audio recording. The extraction of melodic content has practical applications ranging from content-based audio retrieval to the analysis of musical structure. Whereas previous systems generate transcriptions based on a model of the harmonic (or periodic) structure of musical pitches, we present a classification-based system for performing automatic melody transcription that makes no assumptions beyond what is learned from its training data. We evaluate the success of our algorithm by predicting the melody of the ADC 2004 Melody Competition evaluation set, and we show that a simple frame-level note classifier, temporally smoothed by post-processing with a hidden Markov model, produces results comparable to state-of-the-art model-based transcription systems.
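The smoothing step mentioned in this abstract, frame-level classifier output post-processed with a hidden Markov model, can be sketched with Viterbi decoding. The sketch assumes the classifier emits a strictly positive probability for every note class in every frame, treats those probabilities directly as emission scores (a common shortcut, not necessarily the paper's formulation), and uses a hypothetical "sticky" transition matrix controlled by `stay_prob`. The paper's actual transition model is not described in the abstract.

```python
import math

def viterbi_smooth(frame_probs, stay_prob=0.9):
    """Return the most likely class sequence under a sticky transition model.

    frame_probs: list of per-frame lists of class probabilities (all > 0).
    stay_prob:   assumed probability of staying in the same class between
                 frames; the remainder is split evenly over other classes.
    """
    n_states = len(frame_probs[0])
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch)
                  for j in range(n_states)] for i in range(n_states)]

    # Uniform initial state distribution (constant term omitted).
    score = [math.log(p) for p in frame_probs[0]]
    backpointers = []
    for probs in frame_probs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + math.log(probs[j]))
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-class example: frame 3 is a one-frame glitch.
noisy = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]]
print(viterbi_smooth(noisy))
```

A per-frame argmax on the example would flip to class 1 for a single frame; the transition penalty removes that isolated switch while still allowing sustained changes.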
Multimodal music mood classification using audio and lyrics
- In Proceedings of the 7th International Conference on Machine Learning and Applications (ICMLA'08), December 2008
Cited by 10 (0 self)
In this paper we present a study on music mood classification using audio and lyrics information. The mood of a song is expressed by means of musical features, but a relevant part also seems to be conveyed by the lyrics. We evaluate each factor independently and explore the possibility of combining both, using Natural Language Processing and Music Information Retrieval techniques. We show that standard distance-based methods and Latent Semantic Analysis are able to classify the lyrics significantly better than random, but the performance is still quite inferior to that of audio-based techniques. We then introduce a method based on differences between language models that gives performance closer to that of audio-based classifiers. Moreover, integrating this in a multimodal system (audio+text) allows an improvement in the overall performance. We demonstrate that lyrics and audio information are complementary, and can be combined to improve a classification system.
A large publicly accessible prototype audio database for music research
- in Proc. of the International Conference on Music Information Retrieval (ISMIR), 2006
Cited by 8 (5 self)
This paper introduces Codaich, a large and diverse publicly accessible database of musical recordings for use in music information retrieval (MIR) research. The issues that must be dealt with when constructing such a database are discussed, as are ways of addressing these problems. It is suggested that copyright restrictions may be overcome by allowing users to make customized feature extraction queries rather than allowing direct access to recordings themselves. The jMusicMetaManager software is introduced as a tool for improving metadata associated with recordings by automatically detecting inconsistencies and redundancies.
Using artist similarity to propagate semantic information
- ISMIR, 2009
Cited by 8 (2 self)
Tags are useful text-based labels that encode semantic information about music (instrumentation, genres, emotions, geographic origins). While there are a number of ways to collect and generate tags, there is generally a data sparsity problem in which very few songs and artists have been accurately annotated with a sufficiently large set of relevant tags. We explore the idea of tag propagation to help alleviate the data sparsity problem. Tag propagation, originally proposed by Sordo et al., involves annotating a novel artist with tags that have been frequently associated with other similar artists. In this paper, we explore four approaches for computing artist similarity based on different sources of music information (user preference data, social tags, web documents, and audio content). We compare these approaches in terms of their ability to accurately propagate three different types of tags (genres, acoustic descriptors, social tags). We find that the approach based on collaborative filtering performs best. This is somewhat surprising considering that it is the only approach that is not explicitly based on notions of semantic similarity. We also find that tag propagation based on content-based music analysis results in relatively poor performance.
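Tag propagation as described here, annotating a novel artist with tags frequently associated with similar artists, can be sketched in a few lines. The sketch assumes similarity scores to already-tagged artists are available from some source (the abstract lists four), and it weights each neighbor's tags by its similarity, which is an illustrative choice rather than the paper's stated scoring rule. The artist names, tags, and the `n_neighbors` and `n_tags` parameters are hypothetical.

```python
from collections import Counter

def propagate_tags(similarities, artist_tags, n_neighbors=2, n_tags=2):
    """Suggest tags for a novel artist from its most similar tagged artists.

    similarities: dict mapping known artist name -> similarity to the novel artist.
    artist_tags:  dict mapping known artist name -> list of tags.
    """
    neighbors = sorted(similarities, key=similarities.get,
                       reverse=True)[:n_neighbors]
    votes = Counter()
    for name in neighbors:
        for tag in artist_tags.get(name, ()):
            votes[tag] += similarities[name]  # similarity-weighted vote
    # Highest vote first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n_tags]]

# Hypothetical data for illustration only.
similarities = {"artist_a": 0.9, "artist_b": 0.7, "artist_c": 0.1}
artist_tags = {
    "artist_a": ["jazz", "saxophone"],
    "artist_b": ["jazz", "piano"],
    "artist_c": ["metal"],
}
print(propagate_tags(similarities, artist_tags))
```

Swapping the source of `similarities` (user preference data, social tags, web documents, or audio content) changes which neighbors vote, which is the comparison the abstract describes.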
Predicting genre labels for artists using freedb
- In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), 2006
Cited by 7 (3 self)
This paper explores the value of FreeDB as a source of genre and music similarity information. FreeDB is a public, dynamic, uncurated database for identifying and labeling CDs with album, song, artist and genre information. One quality of FreeDB is that there is high variance in, e.g., the genre labels assigned to a particular disc. We investigate here the ability to use these genre labels to predict a more constrained set of “canonical” genres as decided by the curated but private database AllMusic (i.e. multi-class learning). This work is relevant for study in music similarity: we present an automatic, data-driven method for embedding artists in a continuous space that corresponds to genre similarity judgments over a large population of music fans. At the same time, we observe that FreeDB is a valuable resource to researchers developing music classification algorithms; it serves as a reference for what music is popular over a large population, and provides relevant targets for supervised learning algorithms.

