Results 1 - 10 of 17
Robust automatic speech recognition with missing and unreliable acoustic data - Speech Communication, 2001
Understanding Speech Understanding: Towards A Unified Theory Of Speech Perception, 1996
Cited by 36 (6 self)
Ever since Helmholtz, the perceptual basis of speech has been associated with the energy distribution across frequency. However, there is now accumulating evidence that speech understanding does not require a detailed spectral portraiture of the signal. As a consequence, a new theoretical perspective, focused on time, is beginning to emerge. This framework emphasizes the temporal evolution of coarse spectral patterns as the primary carrier of information within the speech signal, and provides an efficient and effective means of shielding linguistic information against the potentially hostile forces of the natural soundscape, such as reverberation and background acoustic interference. The auditory system may extract this relational information through computation of the low-frequency modulation spectrum in the auditory cortex, and this representation provides a principled basis for segmentation of the speech signal into syllabic units. Because of the systematic relationship between the syllable and higher-level lexicogrammatical organization, it is possible, in principle, to gain direct access to the lexicon and grammar through such an auditory analysis of speech.
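The abstract's central computation, a low-frequency modulation spectrum, can be illustrated with a minimal sketch. The paper does not prescribe an algorithm here, so the envelope extractor (full-wave rectification plus averaging over 10 ms frames), the plain DFT, and the function names `envelope` and `modulation_spectrum` are all assumptions. The test signal is a 500 Hz tone amplitude-modulated at 4 Hz, chosen only to resemble a syllable-rate fluctuation.

```python
import cmath
import math


def envelope(signal, frame_len):
    """Full-wave rectify the signal, then average over non-overlapping frames.

    Assumed envelope extractor: the frame rate becomes the envelope sample rate.
    """
    return [
        sum(abs(s) for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def modulation_spectrum(env, env_rate, max_hz=16.0):
    """Return (modulation frequency in Hz, magnitude) pairs up to max_hz.

    The envelope mean is removed first so the DC term does not dominate.
    """
    n = len(env)
    mean = sum(env) / n
    centered = [e - mean for e in env]
    spectrum = []
    for k in range(n // 2 + 1):
        freq = k * env_rate / n
        if freq > max_hz:
            break
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        spectrum.append((freq, abs(coeff) / n))
    return spectrum


# Usage: 1 s of a 500 Hz carrier whose amplitude fluctuates at 4 Hz.
fs = 8000
x = [(1 + 0.8 * math.sin(2 * math.pi * 4 * i / fs)) * math.sin(2 * math.pi * 500 * i / fs)
     for i in range(fs)]
env = envelope(x, frame_len=80)  # 80 samples = 10 ms, giving a 100 Hz envelope rate
spec = modulation_spectrum(env, env_rate=100)
peak_hz = max(spec, key=lambda p: p[1])[0]
```

The strongest low-frequency modulation component lands in the 4 Hz bin, which is the kind of coarse temporal information the abstract argues carries the linguistic message.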
Multiresolution spectrotemporal analysis of complex sounds - J Acoust Soc Am
Cited by 17 (1 self)
A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features of sound likely critical in the perception of timbre. Several types of complex stimuli are used to demonstrate the spectrotemporal information extracted and represented by the model. Also outlined are several reconstruction algorithms to resynthesize the sound so as to evaluate the fidelity of the representation and the contribution of different features and cues to the sound percept. Simplified versions of this model's representations have already been used in a variety of applications, such as assessing speech intelligibility [Elhilali et al., 2003; Chi et al., 1999] and explaining the perception of monaural phase sensitivity [Carlyon and Shamma, 2002].
Distant melodies: Statistical learning of non-adjacent dependencies in tone sequences - Journal of Experimental Psychology: Learning, Memory, and Cognition, 2004
Cited by 15 (4 self)
Human listeners can keep track of statistical regularities among temporally adjacent elements in both speech and musical streams. However, for speech streams, when statistical regularities occur among nonadjacent elements, only certain types of patterns are acquired. Here, using musical tone sequences, the authors investigate nonadjacent learning. When the elements were all similar in pitch range and timbre, learners acquired moderate regularities among adjacent tones but did not acquire highly consistent regularities among nonadjacent tones. However, when elements differed in pitch range or timbre, learners acquired statistical regularities among the similar, but temporally nonadjacent, elements. Finally, with a moderate grouping cue, both adjacent and nonadjacent statistics were learned, indicating that statistical learning is governed not only by temporal adjacency but also by Gestalt principles of similarity. How do listeners organize and learn a patterned sequence of elements? Recent studies of a mechanism we have called statistical learning have shown that adults, young children, and infants are capable of computing transitional probabilities among adjacent syllables in rapidly presented streams of speech and of using these statistics to group syllables into word-like units (Aslin, Saffran, &
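The abstract contrasts adjacent and nonadjacent statistics. A minimal sketch, assuming the usual definition of transitional probability from this line of work, P(Y | X) = frequency(XY) / frequency(X), computed here at a chosen lag: lag 1 gives adjacent statistics, while lag 2 skips one intervening element. The function name and the toy tone labels are hypothetical and are not the authors' stimuli.

```python
from collections import Counter


def transitional_probabilities(seq, lag=1):
    """Estimate P(seq[i + lag] == y | seq[i] == x) from pair counts.

    lag=1 gives adjacent transitional probabilities; lag=2 gives
    nonadjacent ones that skip the element in between.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    pairs = Counter(zip(seq, seq[lag:]))
    firsts = Counter(seq[:-lag])  # only positions that have a partner at i + lag
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}


# Toy stream: "A" always predicts "C" two steps later, but the middle tone varies.
stream = ["A", "x", "C", "A", "y", "C", "A", "z", "C"]
adjacent = transitional_probabilities(stream, lag=1)
nonadjacent = transitional_probabilities(stream, lag=2)
print(adjacent[("A", "x")])     # about 0.33: the tone right after "A" is unpredictable
print(nonadjacent[("A", "C")])  # 1.0: the tone two steps later is fully predictable
```

In this toy stream the A-to-C dependency is invisible in the adjacent table and only appears at lag 2; the study asks when listeners actually pick up such nonadjacent regularities.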
Effects of attention and unilateral neglect on auditory stream segregation - Journal of Experimental Psychology: Human Perception and Performance, 2001
Cited by 15 (0 self)
Two pairs of experiments studied the effects of attention and of unilateral neglect on auditory streaming. The first pair showed that the buildup of auditory streaming in normal participants is greatly reduced or absent when they attend to a competing task in the contralateral ear. It was concluded that the effective buildup of streaming depends on attention. The second pair showed that patients with an attentional deficit toward the left side of space (unilateral neglect) show less stream segregation of tone sequences presented to their left than to their right ears. Streaming in their right ears was similar to that for stimuli presented to either ear of healthy and of brain-damaged controls, who showed no across-ear asymmetry. This result is consistent with an effect of attention on streaming, constrains the neural sites involved, and reveals a qualitative difference between the perception of left- and right-sided sounds by neglect patients. Auditory streaming is an example of the grouping or binding processes that have been extensively studied both in the auditory (e.g., Bregman, 1990; Darwin & Carlyon, 1995) and visual (e.g., Treisman & Gormican, 1988) domains. It is well illustrated by the stimulus shown in Figure 1 (van Noorden, 1975), in which a pair
Monaural speech separation - Proc. NIPS, 2002
Cited by 5 (2 self)
Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the high-frequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and high-frequency signals differently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.
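The segmentation of resolved harmonics relies on cross-channel correlation. One common formulation in computational auditory scene analysis, assumed here rather than taken from this paper, correlates the normalized (zero-mean, unit-norm) response frames of adjacent filter channels and merges runs of channels whose correlation exceeds a threshold. The function names, the 0.9 threshold, and the toy frames are hypothetical.

```python
import math


def normalize(frame):
    """Remove the mean and scale to unit Euclidean norm."""
    mean = sum(frame) / len(frame)
    centered = [v - mean for v in frame]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else [0.0] * len(frame)


def cross_channel_correlation(a, b):
    """Correlation between two equally long frames from neighbouring channels."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def segment_channels(frames, threshold=0.9):
    """Group adjacent channels into segments while their correlation stays high."""
    segments = [[0]]
    for ch in range(1, len(frames)):
        if cross_channel_correlation(frames[ch - 1], frames[ch]) > threshold:
            segments[-1].append(ch)
        else:
            segments.append([ch])
    return segments


# Toy frames (not real autocorrelations): channels 0 and 1 share a period-4
# pattern; channel 2 is shifted by a quarter period and so does not correlate.
frames = [
    [1, 0, -1, 0] * 4,
    [2, 0, -2, 0] * 4,
    [0, 1, 0, -1] * 4,
]
print(segment_channels(frames))  # [[0, 1], [2]]
```

Because each frame is normalized before correlating, channel 1's larger amplitude does not matter; only the shape of the response does, which is why channels driven by the same harmonic tend to merge.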
Sound Scene Segmentation by Dynamic Detection of Correlogram Comodulation - the International Joint Conference on AI Workshop on Computational Auditory Scene Analysis, 1999
Cited by 5 (2 self)
A new technique for sound-scene analysis is presented. This technique operates by discovering common modulation behavior among groups of frequency subbands in the autocorrelogram domain. The analysis is conducted by first analyzing the autocorrelogram to estimate the amplitude modulation and period modulation of each channel of data at each time step, and then using dynamic clustering techniques to group together channels with similar modulation behavior. Implementation details of the analysis technique are presented, and its performance is demonstrated on a test sound.
Pitch Variation is Unnecessary (and Sometimes Insufficient) for the Formation of Auditory Objects, 2003
Cited by 2 (0 self)
In this paper, evidence is presented that suggests that differences in pitch are unnecessary for the formation of auditory objects and that, under some circumstances, pitch differences are insufficient for the formation of auditory objects.
Object-Based Sound Source Modeling for Musical Signals
Cited by 2 (0 self)
This study presents a framework for audio and music processing which consists of an analysis and a synthesis path that are connected at three representational levels. Auditory signal analysis techniques include a multi-pitch analysis model, an event detector, and sinusoidal modeling that are combined in an iterative sound separation system. Techniques are presented for detection of perceptually relevant features, such as inharmonicity, vibrato, and decay characteristic, from polyphonic mixtures of harmonic sounds. The integration of the analysis and synthesis parts is demonstrated with examples where two-voice acoustic guitar signals are analyzed into an object-based representation and resynthesized using sound source models.

