C. Avendano, "Temporal Processing of Speech in a Time-Feature Space," Ph.D. Thesis, Oregon Graduate Institute of Science & Technology, April 1997.
....speech spectrum and the filter spectrum. In practice, processing is based on a short-term Fourier transform, where n is the time index around which a windowed DFT is taken. If the analysis window is long and smooth enough, then the product property still approximately holds [3]. Taking the logs of the magnitudes of both sides, we find that the log magnitude of the observed spectrum is approximately the sum of the log magnitudes of the speech and filter spectra, and thus in theory we can approximately remove the filter term, along with any constant portion of the log speech spectrum, by subtracting the time average over n of the log observed spectrum from the log observed spectrum. ....
C. Avendano, Temporal Processing of Speech in a Time-Feature Space, Ph.D. thesis, Oregon Graduate Institute, 1997.
....(3) The notation denotes the transpose of vector . m stand for the sequences of subband energy vectors of the th microphone signal, the target speech source and the noise sources, respectively. Note that equation (3) is not exactly valid, yet it is a fair approximation [4]. Besides, a typical frame rate is 100 Hz, which corresponds to a frame shift of 10 ms. For a standard sound speed of 342 m/s, this suggests that the propagation delays in the subband energy domain for a given source are roughly equal unless the microphones are more than about 3 m apart. ....
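The "about 3 m" figure in the excerpt above is the distance sound travels during one frame shift. A minimal sketch of that arithmetic follows; the function name and its defaults are illustrative and do not come from the cited paper.

```python
def distance_per_frame_shift(frame_rate_hz=100.0, sound_speed_m_s=342.0):
    """Distance (in metres) that sound travels during one frame shift."""
    # A frame rate of 100 Hz means one new frame every 1/100 s = 10 ms.
    frame_shift_s = 1.0 / frame_rate_hz
    return sound_speed_m_s * frame_shift_s


if __name__ == "__main__":
    # With the excerpt's values this is roughly 3.4 m.
    print(f"{distance_per_frame_shift():.2f} m")
```

Microphones closer together than this see propagation delays that differ by less than one frame, which is why the delays can be treated as roughly equal in the subband energy domain.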
C. Avendano, "Temporal Processing of Speech in a Time-Feature Space", Ph.D. Thesis, Oregon Graduate Institute of Science and Technology, Apr. 1997.
....) and the channel spectrum C(ω). In practice, processing is based on a short-term Fourier transform X(n, ω), where n is the time index around which a windowed DFT is taken. If the analysis window is long and smooth enough, then the product property still approximately holds: X(n, ω) ≈ S(n, ω) C(ω) [1]. Here the channel has been assumed not to vary over time. Taking the logs of both sides, we find that log X(n, ω) ≈ log C(ω) + log S(n, ω), and thus in theory we could approximately remove C(ω), along with any constant portion of the speech spectrum, by subtracting the time average ....
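The mean-subtraction idea in the excerpt above can be illustrated with a small NumPy experiment. This is only a sketch under stated assumptions: the white-noise "speech", the two-tap channel, the 256-sample Hamming window, and all function names are illustrative choices, not taken from the cited work.

```python
import numpy as np


def log_mag_stft(x, win_len=256, hop=128):
    """Log-magnitude short-term spectrum, shape (frames, frequency bins)."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * window for i in range(n_frames)]
    )
    # Small constant avoids log(0) for empty bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)


def mean_normalize(log_spec):
    """Subtract the time average over frames n from every frequency bin."""
    return log_spec - log_spec.mean(axis=0, keepdims=True)


def channel_mismatch(seed=0):
    """Mean log-spectral mismatch before and after mean subtraction."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(16000)        # stand-in for the speech signal
    h = np.array([1.0, -0.7])             # short, time-invariant channel (assumed)
    x = np.convolve(s, h)[:len(s)]        # observed signal
    log_s, log_x = log_mag_stft(s), log_mag_stft(x)
    before = np.mean(np.abs(log_x - log_s))
    after = np.mean(np.abs(mean_normalize(log_x) - mean_normalize(log_s)))
    return before, after


if __name__ == "__main__":
    before, after = channel_mismatch()
    print(f"mismatch before: {before:.3f}, after: {after:.3f}")
```

Because the channel is much shorter than the analysis window, its log spectrum appears as a nearly constant offset in each frequency bin, and subtracting the per-bin time average removes most of it.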
C. Avendano, Temporal Processing of Speech in a Time-Feature Space, Ph.D. thesis, Oregon Graduate Institute, 1997.
.... transfer function H(ω), then the resulting signal will be x(t) = h(t) * s(t), the convolution of h(t) and s(t). The short-time Fourier transform of x(t) will be X(n, ω) ≈ H(ω) S(n, ω), provided that h(t) is short compared to the length of the windowing function used in the short-time Fourier transform [Ave97b]. In the log power spectral domain, S(n, ω) and H(ω) will be approximately additive, and filtering of the log spectral trajectories will be effective for suppressing the effects of the unknown filtering, provided that H(ω) is stationary or changes at rates outside the linguistically important 1 to 16 Hz ....
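One way to picture the filtering of log spectral trajectories described above is an idealized band-pass on the modulation frequencies of each trajectory. The sketch below uses a brick-wall FFT mask, which is a simplification and not the RASTA filter or any filter from the cited thesis; the 100 Hz frame rate and the function name are assumptions.

```python
import numpy as np


def bandpass_trajectories(log_spec, frame_rate_hz=100.0, low_hz=1.0, high_hz=16.0):
    """Keep only modulation frequencies in [low_hz, high_hz].

    log_spec has shape (frames, bands); each column is one time trajectory.
    """
    n_frames = log_spec.shape[0]
    # Modulation-frequency axis of the trajectories, in Hz.
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)
    mod_spec = np.fft.rfft(log_spec, axis=0)
    keep = (mod_freqs >= low_hz) & (mod_freqs <= high_hz)
    mod_spec[~keep, :] = 0.0              # drops DC and fast fluctuations
    return np.fft.irfft(mod_spec, n=n_frames, axis=0)


if __name__ == "__main__":
    t = np.arange(400) / 100.0
    # Constant offset (stationary channel) + 4 Hz and 30 Hz modulations.
    traj = 3.0 + np.sin(2 * np.pi * 4 * t) + np.sin(2 * np.pi * 30 * t)
    out = bandpass_trajectories(traj[:, None])
    print(np.max(np.abs(out[:, 0] - np.sin(2 * np.pi * 4 * t))))
```

In this toy case the constant offset, which plays the role of a stationary log H(ω), and the 30 Hz component are removed, while the 4 Hz modulation inside the pass band is left intact.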
....speech with one form of spectral shaping and tested on speech with a different spectral shaping. It is also possible, however, to design the filters automatically from training data. Data-driven methods for designing filters have been successfully applied to both speech enhancement [HWA95, AH96, Ave97b] and to robust ASR [AvVH96, HAvVT97, Ave97b]. The signal processing systems used for both the speech enhancement and speech recognition tasks are very similar: 1. A short-time Fourier transform is performed on the input signal and the resulting complex spectrum is split into magnitude and phase ....
Carlos Avendano. Temporal Processing of Speech in a Time-Feature Space. PhD thesis, Oregon Graduate Institute, 1997.
....both speaker as well as channel information need to be suppressed and CMS mainly reduces the channel variability. It confirms our observation [5] that the speaker variability might not be concentrated in the DC but might be spread in the very low frequency ( 1Hz) in modulation spectrum [6]. This suggests that a proper modulation frequency (RASTA) filter eliminating low modulation frequency components could be effective in suppressing speaker variability. The nature of speaker and channel variability for different phonemes indicates that sonorants contain more speaker ....
Carlos Avendano, Temporal Processing of Speech in a Time-Feature Space, PhD Thesis, Oregon Graduate Institute of Science and Technology, 1997.
....the speech signal. For example, assuming that the environment acts like a time-invariant filter, it has an approximately constant multiplicative effect on the short-term frequency response [4, 76, 26]. In general, however, the environment may be nonlinear, time-varying, noisy and not well modeled [7]. Given that the environment affects the time sequences, one way to gain an understanding of the effects is to analyze the environment in terms of its modulation spectrum and compare this to the modulation spectrum of speech. In this dissertation, the strategy will be to determine the relative ....
....
- In [30] an RMS distance on the logarithmic spectral energies was reported to be meaningful for speech processing. That a logarithmic representation may be suitable is also supported by statistical modeling considerations [14] and a model of convolutional distortion for the environment [76, 37, 7].
- Based on auditory considerations [43, 16, 67, 34], a log-like (Mel or Bark) warping of the frequency axis (to allow higher resolution at low frequencies) may be warranted.
- Similarly, based on the theory of auditory masking, which holds that frequency resolution is limited to critical ....
Carlos Avendano. Temporal Processing of Speech in a Time-Feature Space. PhD thesis, Oregon Graduate Institute of Science and Technology, Portland, Oregon, April 1997.
No context found.
C. Avendano, "Temporal Processing of Speech in a Time-Feature Space," Ph.D. Thesis, Oregon Graduate Institute of Science & Technology, April 1997.
No context found.
C. Avendano, "Temporal Processing of Speech in a Time-Feature Space," Ph.D. Thesis, Oregon Graduate Institute of Science & Technology, April 1997.
....processing) such as the dynamic cepstrum [1] or the RASTA processing [2] used ad hoc designed processing and appear to be suboptimal. In this paper we present the results that we have obtained by using a systematic data-driven design of temporal processing. First, based on analytic results [3], we discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background. Then, we illustrate the data-driven design of temporal processing with an adaptive speech enhancement system. We conclude by reporting on other ....
....to the time index n. The modification F(n, k) can be interpreted as a set of time trajectory filters, each operating on a different time trajectory indexed by the center frequency k. A modified time domain signal y(n) can be obtained by applying a synthesis formula to (1) [6]. It can be shown [3] that the time domain signal y(n) can also be obtained by directly filtering s(n) with an equivalent time domain linear time-invariant filter. This equivalent filter depends on the inverse STFT of the modification F(n, k) and the analysis and synthesis windows of the STFT. In this respect ....
Avendano, C. "Temporal Processing of Speech in a Time-Feature Space," Ph.D. Thesis, Oregon Graduate Institute of Science & Technology, April 1997.
....above considerations, it is reasonable to use a window with low aliasing, smooth edges, and long enough to satisfy the frequency and time domain conditions required. It is our experience that a Hamming window with a length at least 4 times the channel length provides a good approximation to (5.17) [9].
5.2.2 Discussion
Fortunately, some commonly encountered channels, like telephone channels, have short impulse responses compared to the analysis window lengths ....
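The rule of thumb quoted above (a Hamming window at least 4 times the channel length) translates directly into a window-sizing helper. The following is a minimal sketch; the function name, the `factor` parameter, and the example channel length are hypothetical.

```python
import numpy as np


def analysis_window(channel_len_samples, factor=4):
    """Hamming window whose length is `factor` times the channel length."""
    win_len = factor * channel_len_samples
    return np.hamming(win_len)


if __name__ == "__main__":
    # Example: a channel impulse response of 64 samples (assumed value).
    window = analysis_window(64)
    print(len(window))
```

The factor of 4 is the lower bound reported in the excerpt; a longer window trades time resolution for a closer multiplicative approximation of the channel.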
....constraints will determine specific behavior of the filters.
8.2.1 Technique
Considering that the effect of the channel can be approximated as multiplicative in the short-time frequency domain, and approximately additive in the critical bands (see Chapter 5 and the discussion on this issue in [9]), we chose our CIT MIF modification to be implemented as FIR filters. The objective function to derive the RASTA-like filter for each critical band can be written as
J_k = E{ Σ_{j=1}^{J-1} Σ_{i=j+1}^{J} [Y_j(n, k) - Y_i(n, k)]^2 }   (8.3)
From (8.3) we see that the ....
Avendano, C., and Hermansky, H. On the effects of short-term spectrum smoothing in channel normalization. To appear in IEEE Trans. on Speech and Audio Processing (July 1997).
Carlos Avendano, "Temporal Processing of Speech in a Time-Feature Space", Ph.D. thesis, Oregon Graduate Institute, April 1997.
....not hold. We propose a technique which uses a high frequency resolution (i.e. very long analysis window) frequency bank analysis, performs the channel normalization by dc removal (or high pass filtering) on the time trajectories of the logarithmic spectral energies, and does a partial synthesis [3]. The partial reconstruction does not have to synthesize the complete speech signal but only a set of sub band signals with frequency and time resolution necessary for computing the appropriate ASR features. The algorithm is basically an extension of the traditional short term analysis and mean ....
Avendano, C.: Temporal Processing of Speech in a Time-Feature Space, PhD. Thesis, Oregon Graduate Institute of Science and Technology, March 1997.
....frequency variable, and the m operator denotes convolution with respect to the time dimension. If the analysis window is smooth and long relative to the length of h(n), it can be shown that the effect of the transmission medium can be considered as only multiplicative in the frequency dimension [3], i.e. X(m, k) ≈ S(m, k) H(k) (2), where H(k) is the discrete Fourier transform (DFT) of h(n). The effect of the reverberation cannot be approximated as multiplicative within a single analysis segment of the conventional short-time analysis. Therefore, techniques which are typically ....
....to use a long analysis window. We can do this by using a medium-time transform, say X(l, Ω_j). For example, for an impulse response with a reverberation time of T60 = 0.5 s, a smooth analysis window 2 s long would be adequate to render the channel as an approximately multiplicative term [3]. This representation has low time resolution and high frequency resolution compared to traditional short-time representations. In this domain the approximation in (2) is close and we can apply normalization. Now the problem is that the normalized parameters have a lower time resolution than that ....
C. Avendano, "Temporal Processing of Speech in a Time-Feature Space," Ph.D. Thesis, Oregon Graduate Institute of Science & Technology, April 1997.
No context found.
Carlos Avendano, Temporal Processing of Speech in a Time-Feature Space, PhD Thesis, Oregon Graduate Institute of Science and Technology, 1997.