Pixels that sound
- In Proc. Computer Vision and Pattern Recognition, 2005
Cited by 22 (1 self)
People and animals fuse auditory and visual information to obtain robust perception. A particular benefit of such cross-modal analysis is the ability to localize visual events associated with sound sources. We aim to achieve this using computer vision aided by a single microphone. Past efforts encountered problems stemming from the huge gap between the dimensions involved and the available data, which led to solutions with low spatio-temporal resolution. We present a rigorous analysis of the fundamental problems associated with this task. We then present a stable and robust algorithm that overcomes past deficiencies. It grasps dynamic audio-visual events with high spatial resolution and derives a unique solution. The algorithm effectively detects pixels that are associated with the sound, while filtering out other dynamic pixels. It is based on canonical correlation analysis (CCA), where we remove inherent ill-posedness by exploiting the typical spatial sparsity of audio-visual events. The algorithm is simple and efficient thanks to its reliance on linear programming, and it is free of user-defined parameters. To quantitatively assess the performance, we devise a localization criterion. The algorithm's capabilities were demonstrated in experiments, where it overcame substantial visual distractions and audio noise.
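To make the localization task concrete, here is a deliberately naive baseline: score every pixel by the Pearson correlation between its intensity over time and the audio energy envelope, then keep the highest-scoring pixels. This is not the authors' method, which uses CCA regularized by sparsity and solved by linear programming. The function names, the use of raw pixel intensity, and the toy data are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def localize_by_correlation(frames: List[List[float]], audio_energy: List[float],
                            top_k: int = 1) -> List[int]:
    """Hypothetical baseline: rank pixels by |correlation| with the audio envelope.

    frames[t][p] is the intensity of pixel p at frame t, and audio_energy[t] is the
    audio energy in the same time slot. Returns the indices of the top_k pixels.
    """
    n_pixels = len(frames[0])
    scores = []
    for p in range(n_pixels):
        trace = [frame[p] for frame in frames]
        scores.append(abs(pearson(trace, audio_energy)))
    return sorted(range(n_pixels), key=lambda p: scores[p], reverse=True)[:top_k]
```

With three pixels (one static, one that brightens whenever there is sound, one that moves independently of the sound), the sound-synchronized pixel ranks first. Unlike the sparse CCA formulation, this per-pixel score has no mechanism to suppress pixels that correlate with the audio by chance, which is the ill-posedness the abstract refers to.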
You're The Conductor: A Realistic Interactive Conducting System for Children
- In Proceedings of the NIME 2004 Conference on New Interfaces for Musical Expression, 2004
Cited by 14 (4 self)
This paper describes the first system designed to allow children to conduct an audio and video recording of an orchestra. No prior musical experience is required to control the orchestra, and the system uses an advanced algorithm to time-stretch the audio in real time at high quality without altering the pitch. We discuss the requirements and challenges of designing an interface for our particular user group (children), followed by some system implementation details. An overview of the algorithm used for audio time stretching is also presented. We are currently using this technology to study and compare professional and non-professional conducting behavior, and its implications for designing new interfaces for multimedia. You're the Conductor is currently a successful exhibit at the Children's Museum in Boston, USA.
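The abstract does not describe its time-stretching algorithm, so the sketch below only illustrates the general idea of changing duration without resampling: a plain windowed overlap-add (OLA). It is a stand-in under stated assumptions, not the exhibit's high-quality method. Plain OLA copies each frame's samples at the original rate, so local waveform shape (and roughly the pitch) is kept, but it produces phasing artifacts that better algorithms avoid. The function name and parameter defaults are hypothetical.

```python
import math
from typing import List


def ola_time_stretch(x: List[float], rate: float, frame: int = 256,
                     hop_out: int = 64) -> List[float]:
    """Naive overlap-add time stretch (hypothetical helper).

    rate > 1 plays faster (shorter output); rate < 1 plays slower (longer output).
    Frames are read every hop_out * rate input samples and written every hop_out
    output samples, so duration changes while samples inside a frame are not resampled.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(x) < frame:
        raise ValueError("input shorter than one frame")
    hop_in = hop_out * rate
    # Periodic Hann window to taper each frame before overlapping.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = int((len(x) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for n in range(frame):
            w = window[n]
            out[start_out + n] += x[start_in + n] * w
            norm[start_out + n] += w
    # Divide by the summed window weight so overlapping frames keep the original level.
    return [o / s if s > 1e-12 else 0.0 for o, s in zip(out, norm)]
```

For example, stretching one second of a 440 Hz sine sampled at 8 kHz with `rate=0.5` yields roughly two seconds of audio whose samples never exceed the input's peak amplitude, since each output sample is a weighted average of input samples.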
Interacting with a Virtual Conductor
- In 5th International Conference on Entertainment Computing, LNCS 4161, 2006
Cited by 9 (7 self)
This paper presents a virtual embodied agent that can conduct musicians in a live performance. The virtual conductor conducts music specified by a MIDI file and uses input from a microphone to react to the tempo of the musicians. The current implementation of the virtual conductor can interact with musicians, leading and following them while they are playing music. Different time signatures and dynamic markings in music are supported.
Recognition, Analysis and Performance with Expressive Conducting Gestures
- In Proceedings of the International Computer Music Conference, 2004
Cited by 7 (0 self)
Although a number of conducting gesture analysis and following systems have been developed over the years, most projects either concentrated primarily on tracking tempo- and amplitude-indicating gestures without taking expressive gestures into account, or implemented individual mapping techniques for expressive gestures that varied from project to project. There is a clear need for a uniform process that can be applied to the analysis of both indicative and expressive gestures. The conducting gesture recognition system is implemented on the basis of hidden Markov models (HMMs). An external HMM object is developed for the Max/MSP software. Training and recognition procedures are applied to both right-hand beat- and amplitude-indicative gestures and left-hand expressive gestures. Continuous recognition of right-hand gestures is incorporated into a real-time gesture analysis and performance system in the Max/MSP/Jitter environment.
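Since the recognizer is built on hidden Markov models, a generic example of how an HMM labels a gesture sequence may help. The sketch below is standard Viterbi decoding over discretized observations, not the authors' Max/MSP external; the two states, the observation alphabet (`"down"`, `"up"`, `"still"`), and all probabilities are made-up assumptions.

```python
import math
from typing import Dict, List, Tuple


def viterbi(obs: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Most likely hidden-state path for a non-empty obs, computed in log space."""
    if not obs:
        raise ValueError("observation sequence is empty")

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any state path ending in s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                              key=lambda item: item[1])
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


STATES = ["downstroke", "upstroke"]
START = {"downstroke": 0.6, "upstroke": 0.4}
TRANS = {"downstroke": {"downstroke": 0.7, "upstroke": 0.3},
         "upstroke": {"downstroke": 0.3, "upstroke": 0.7}}
EMIT = {"downstroke": {"down": 0.7, "up": 0.1, "still": 0.2},
        "upstroke": {"down": 0.1, "up": 0.7, "still": 0.2}}
```

Decoding `["down", "down", "up", "up"]` with these toy parameters gives the path `downstroke, downstroke, upstroke, upstroke` with probability 0.6 × 0.7⁶ × 0.3 ≈ 0.0212. Working in log space avoids numerical underflow on long gesture streams.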
Beat Estimation on the Beat
- In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2003
Cited by 5 (0 self)
This paper presents a novel method for estimating the beat interval and the exact locations of the beats in audio files. As a first step, a feature extracted from the waveform is used to identify note onsets. The estimated note onsets are used as input to a beat induction algorithm, which finds the most probable beat intervals. The note onsets corresponding to the beat locations are then identified. Several enhancements are proposed in this work, including methods for identifying the optimum audio feature, a novel weighting system in the beat induction algorithm, and a simple, robust method for identifying the beat locations. The resulting system runs in real time and is shown to work well for a wide variety of contemporary and popular rhythmic music.
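The first step described here, detecting note onsets from a waveform feature, can be sketched with one common choice of feature: frame energy, with an onset wherever the energy jumps sharply relative to the previous frame. The paper selects its own "optimum" feature, so the energy feature, the ratio threshold, and the function name below are assumptions.

```python
from typing import List


def energy_onsets(x: List[float], frame: int = 512, ratio: float = 2.0,
                  floor: float = 1e-6) -> List[int]:
    """Return start sample indices of frames whose energy jumps by at least `ratio`.

    Hypothetical helper: a frame is an onset if its energy is above a small absolute
    floor (to ignore near-silence) and at least `ratio` times the previous frame's
    energy. The first frame is never reported because it has no predecessor.
    """
    energies = [sum(v * v for v in x[i:i + frame])
                for i in range(0, len(x) - frame + 1, frame)]
    onsets = []
    for k in range(1, len(energies)):
        if energies[k] > floor and energies[k] >= ratio * energies[k - 1]:
            onsets.append(k * frame)
    return onsets
```

For a signal of silence with two 512-sample bursts starting at samples 1024 and 3072, the function reports `[1024, 3072]`. The frame length sets the time resolution, so a real system would likely use overlapping frames or a finer hop to place onsets more precisely.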
Real-Time Beat Estimation Using Feature Extraction
- 2003
Cited by 4 (2 self)
This paper presents a novel method for estimating the beat interval from audio files. As a first step, a feature extracted from the waveform is used to identify note onsets. The estimated note onsets are used as input to a beat induction algorithm, which finds the most probable beat interval. Several enhancements over existing beat estimation systems are proposed in this work, including methods for identifying the optimum audio feature and a novel weighting system in the beat induction algorithm. The resulting system works in real time and is shown to work well for a wide variety of contemporary and popular rhythmic music. Several real-time music control systems have been built using the presented beat estimation method.
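The beat-induction step takes onset times and picks the most probable beat interval. One simple way to do this, assumed here and not the paper's own weighting system, is to let every pair of onsets whose spacing falls within a plausible tempo range vote for a histogram bin of inter-onset intervals (IOIs), then return the fullest bin. The parameter defaults and function name are hypothetical.

```python
from collections import Counter
from typing import List


def estimate_beat_interval(onsets: List[float], min_ioi: float = 0.3,
                           max_ioi: float = 1.0, bin_width: float = 0.02) -> float:
    """Most common inter-onset interval (seconds) among all pairs of sorted onsets.

    Hypothetical helper: each pair of onsets whose spacing lies in [min_ioi, max_ioi]
    votes for the histogram bin round(ioi / bin_width); the centre of the bin with
    the most votes is returned.
    """
    votes: Counter = Counter()
    for i, a in enumerate(onsets):
        for b in onsets[i + 1:]:
            ioi = b - a
            if min_ioi <= ioi <= max_ioi:
                votes[round(ioi / bin_width)] += 1
    if not votes:
        raise ValueError("no inter-onset intervals in the allowed range")
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_width
```

With onsets at 0.0, 0.5, 1.0, 1.5 and 2.0 s plus an off-beat note at 0.75 s, the 0.5 s interval collects 4 votes against 3 for 1.0 s and 2 for 0.75 s, so the estimate is 0.5 s (120 BPM). The narrow margin over the 1.0 s multiple shows why a weighting scheme, such as the one the abstract mentions, is useful in practice.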
conga: a framework for adaptive conducting gesture analysis
- Proc. of the International Conference on New Interfaces for Musical Expression (NIME06), 2006
Cited by 4 (0 self)
Designing a conducting gesture analysis system for public spaces poses unique challenges. We present conga, a software framework that enables automatic recognition and interpretation of conducting gestures. conga is able to recognize multiple types of gestures with varying levels of difficulty for the user to perform, from a standard four-beat pattern, to simplified up-down conducting movements, to no pattern at all. conga provides an extendable library of feature detectors linked together into a directed acyclic graph; these graphs represent the various conducting patterns as gesture profiles. At run time, conga searches in real time for the profile that best matches a user’s gestures, and uses a beat prediction algorithm to provide results at the sub-beat level, in addition to output values such as tempo, gesture size, and the gesture’s geometric center. Unlike some previous approaches, conga does not need to be trained with sample data before use. Our preliminary user tests show that conga has a beat recognition rate of over 90%. conga is deployed as the gesture recognition system for Maestro!, an interactive conducting exhibit that opened in the ...
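conga links feature detectors into a directed acyclic graph. Assuming each detector is simply a function of its parents' outputs, one minimal way to evaluate such a graph is to run the nodes in topological order using Python's standard `graphlib` module. The node names (`center`, `size`), the point-list input, and the helper names are hypothetical and are not conga's actual detector library.

```python
import math
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A detector node: (names of parent nodes, function applied to the parents' outputs in order).
Node = Tuple[List[str], Callable[..., object]]


def evaluate_graph(nodes: Dict[str, Node], inputs: Dict[str, object]) -> Dict[str, object]:
    """Evaluate every detector once, after all of its parents (hypothetical helper)."""
    graph = {name: set(parents) for name, (parents, _) in nodes.items()}
    values: Dict[str, object] = dict(inputs)
    # static_order() also yields parent-only names (the raw inputs) before their children.
    for name in TopologicalSorter(graph).static_order():
        if name in inputs:
            continue
        parents, fn = nodes[name]
        values[name] = fn(*(values[p] for p in parents))
    return values


# Toy gesture profile: geometric center of the baton points, then gesture size
# as the largest distance from that center.
GESTURE_NODES: Dict[str, Node] = {
    "center": (["points"], lambda pts: (sum(x for x, _ in pts) / len(pts),
                                        sum(y for _, y in pts) / len(pts))),
    "size": (["points", "center"],
             lambda pts, c: max(math.hypot(x - c[0], y - c[1]) for x, y in pts)),
}
```

Feeding the corners of a 2 × 2 square as `points` yields a center of (1.0, 1.0) and a size of √2. Because the graph is acyclic, `TopologicalSorter` guarantees that `center` is computed before `size` needs it, and a cycle would raise `graphlib.CycleError` instead of looping.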
Tracking a Conductor's Baton
- 2003
Cited by 3 (2 self)
A system to track an uninstrumented conductor's baton by computer vision techniques is developed. An initial seek mode locates a baton in an image frame without prior knowledge; a subsequent track mode uses temporal information to cope with motion blur and noise. The seek mode utilises the characteristic form of detected edges to guide a maximal intensity trace. The track mode enhances this technique by dovetailing with optical flow to give robust tracking, even with considerable motion blur and poor quality images.
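The seek/track split in this abstract can be illustrated in heavily simplified form on grayscale frames given as nested lists. Seek scans the whole frame for the brightest pixel, while track searches only a small window around the previous position, which is what lets it ignore a brighter distractor elsewhere in the image. The real system follows edges and uses optical flow; the brightest-pixel rule, the window radius, the re-seek threshold, and all function names here are assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Point = Tuple[int, int]  # (row, col)


def seek(img: Image) -> Point:
    """Seek mode (simplified): brightest pixel anywhere in the frame."""
    return max(((r, c) for r in range(len(img)) for c in range(len(img[0]))),
               key=lambda rc: img[rc[0]][rc[1]])


def track(img: Image, prev: Point, radius: int = 2) -> Point:
    """Track mode (simplified): brightest pixel within `radius` of the previous position."""
    r0, c0 = prev
    rows = range(max(0, r0 - radius), min(len(img), r0 + radius + 1))
    cols = range(max(0, c0 - radius), min(len(img[0]), c0 + radius + 1))
    return max(((r, c) for r in rows for c in cols), key=lambda rc: img[rc[0]][rc[1]])


def follow(frames: List[Image], radius: int = 2, min_intensity: float = 0.5) -> List[Point]:
    """Seek on the first frame, then track; fall back to seek if the window looks empty."""
    positions: List[Point] = []
    prev: Optional[Point] = None
    for img in frames:
        if prev is None:
            prev = seek(img)
        else:
            cand = track(img, prev, radius)
            prev = cand if img[cand[0]][cand[1]] >= min_intensity else seek(img)
        positions.append(prev)
    return positions
```

In a toy sequence where the baton tip moves diagonally one pixel per frame while a brighter static light sits in a far corner, `seek` alone would jump to the light, whereas `follow` stays on the baton because the light never enters the tracking window.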
The “Air Worm”: An Interface for Real-Time Manipulation of Expressive Music Performance
Cited by 1 (0 self)
Expressive performance of traditional Western music is a complex phenomenon which is mastered by few, and yet appreciated by many. In this paper we explore various ways of interacting with expressive performances using methods that are accessible to non-expert music-lovers. A digital theremin is used as an input device, and users can control the two most important expressive parameters, tempo and loudness, during playback of an audio or MIDI file. Several modes of operation are possible: the “Air Worm” builds on previous work in performance visualisation, where the tempo is displayed on the horizontal axis and loudness on the vertical axis in a two-dimensional animation; the “Air Tapper” uses a conducting metaphor where the beat is given by the minimum vertical point in a quasi-periodic trajectory; and the “Mouse-Worm” allows users without a theremin to use a standard input device as controller.
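The “Air Tapper” places the beat at the lowest point of a quasi-periodic hand trajectory. A minimal sketch, assuming the theremin (or mouse) delivers vertical positions at a fixed sampling rate, marks a beat at each local minimum and derives the tempo from the mean spacing between beats. The sampling rate, the local-minimum rule, and the function names are assumptions, not the paper's implementation.

```python
from typing import List


def beat_indices(y: List[float]) -> List[int]:
    """Indices where the vertical position falls into a low point and then stops falling."""
    return [i for i in range(1, len(y) - 1) if y[i - 1] > y[i] <= y[i + 1]]


def tempo_bpm(beats: List[int], sample_rate_hz: float) -> float:
    """Tempo in beats per minute from the mean spacing (in samples) between beats."""
    if len(beats) < 2:
        raise ValueError("need at least two beats")
    mean_gap = (beats[-1] - beats[0]) / (len(beats) - 1)
    return 60.0 * sample_rate_hz / mean_gap
```

For a trajectory sampled at 10 Hz that bottoms out every 4 samples, e.g. `[4, 2, 0, 2, 4, 2, 0, 2, 4, 2, 0, 2]`, the beats fall at indices 2, 6 and 10, giving 60 × 10 / 4 = 150 BPM. Real sensor data is noisy, so some smoothing or a minimum beat spacing would likely be needed before this rule is usable.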
Cross-Modal Localization via Sparsity
Cross-modal analysis is a natural progression beyond processing of single-source signals. Simultaneous processing of two sources can reveal information that is unavailable when handling the sources separately. Indeed, human and animal perception, computer vision, weather forecasting, and various other scientific and technological fields can benefit from such a paradigm. A particular cross-modal problem is localization: out of the entire data array originating from one source, localize the components that best correlate with the other. For example, auditory and visual data sampled from a scene can be used to localize visual events associated with the sound track. In this paper we present a rigorous analysis of fundamental problems associated with the localization task. We then develop an approach that leads efficiently to a unique, high definition localization outcome. Our method is based on canonical correlation analysis (CCA), where inherent ill-posedness is removed by exploiting sparsity of cross-modal events. We apply our approach to localization of audio-visual events. The proposed algorithm grasps such dynamic audio-visual events with high spatial resolution. The algorithm effectively detects the pixels that are associated with sound, while filtering out other dynamic pixels, overcoming substantial visual distractions and audio noise. The algorithm is simple and efficient thanks to its reliance on linear programming, while being free of user-defined parameters. Index terms: computer vision, cross-sensor fusion, multimedia, multimodal analysis, multisensor fusion, overfitting, regularization, stochastic analysis.

