14 citations found.
H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, Automatic generation of synthesis units for trainable text-to-speech systems, in: Proceedings of ICASSP

This paper is cited in the following contexts:
Corpus-Based Unit Selection for Natural-Sounding Speech Synthesis - Yi (2003)   (Correct)

....the advent of using real human speech, rule based formant methods as shown in Figure 1-1 were popular. Unit selection was trivial with a database of very few (one) examples, while much manual tuning was required in the modification and generation stages. Hard decisions made by decision trees [13, 59, 148, 29] or omission of units from the corpus [9, 38] are typically made before the unit selection search to prune the search space. Intonation contours can be applied by waveform modification as part of a post processing stage [12] or it can be modelled in the unit selection search [29, 19, 21]. Speech ....

H. Hon, A. Acero, X. Huang, J. Liu, and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech synthesis," in Proc. ICASSP '98, Seattle, WA, May 1998, vol. 1, pp. 293--296.


Robust Splicing Costs And Efficient Search With BMM.. - Bulyko, Ostendorf.. (2002)   (Correct)

....points with a negligible loss in synthesis quality. 1. INTRODUCTION Recently, a growing amount of attention in speech synthesis research has been drawn toward unit selection methods, which use dynamic programming to search for speech segments in a database that minimize some cost function [1, 2, 3]. The cost function is designed to quantify distortion introduced when selected units are modified and concatenated. Typically there are two components to the unit selection cost function: the target cost, which is an estimate of distance between the database unit and the target, and the ....

Hon, H., Acero, A., Huang, X., Liu, J., and Plumpe, M., "Automatic generation of synthesis units for trainable text-to-speech systems", Proc. ICASSP, 293--296, 1998.


The Impact Of Speech Recognition On Speech Synthesis - Ostendorf, Bulyko (2002)   (1 citation)  (Correct)

....are typically not optimized for the phonetic segmentation task. To reduce forced alignment errors researchers have investigated automatic error correction methods [48], edge detector outputs as features [73], and non uniform HMM topologies (1 state HMM for fricatives and 2 state HMM for stops) [4]. Decision tree clustering has also been used in speech synthesis to define the inventory of units (similar to speech recognition) either by tying HMM parameters to maximize likelihood of the training data [24, 4] or by minimizing the average distance between units in each cluster [10]. While ASR ....

.... and non uniform HMM topologies (1 state HMM for fricatives and 2 state HMM for stops) [4]. Decision tree clustering has also been used in speech synthesis to define the inventory of units (similar to speech recognition) either by tying HMM parameters to maximize likelihood of the training data [24, 4] or by minimizing the average distance between units in each cluster [10]. While ASR includes only spectral envelope features in the objective function, synthesis may include acoustic prosodic information such as pitch and duration. A more important difference, however, is that ASR uses the ....

H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech systems," Proc. ICASSP, 1:293-296, 1998.


Unit Selection for Speech Synthesis Using Splicing Costs.. - Bulyko, Ostendorf (2001)   (2 citations)  (Correct)

....we summarize the key advances and outline future work. 2. Background Recently, a growing amount of attention in speech synthesis research has been drawn toward unit selection methods, based on using dynamic programming to search for speech segments in a database that minimize some cost function [3, 4, 5]. The cost function is designed to quantify distortion introduced when selected units are modified and concatenated. Typically there are two components to the unit selection cost function: the target cost, which is an estimate of distance between the database unit and the target, and the ....

Hon, H., Acero, A., Huang, X., Liu, J., and Plumpe, M., "Automatic generation of synthesis units for trainable text-to-speech systems", In Proc. of ICASSP, 293--296, 1998.


Joint Prosody Prediction And Unit Selection For.. - Bulyko, Ostendorf (2001)   (4 citations)  (Correct)

....real time performance with an unrestricted vocabulary. 2. BACKGROUND Recently, a growing amount of attention in speech synthesis research has been drawn toward unit selection methods, based on using dynamic programming to search for speech segments in a database that minimize some cost function [7, 6, 1]. The cost function is designed to quantify distortion introduced when selected units are modified and concatenated. Typically there are two components to the unit selection cost function: the target cost, which is an estimate of distortion that the database unit will be subject to when modified ....

....# # and the first frame in unit # ### which follows unit # # . This approach is more robust than computing a distance between two consecutive frames, because it does not imply continuity at join points. However, it still can be improved by including F0, energy and amplitude in the distance metric [6], which we plan to implement for our future experiments. The units in the database can be of arbitrary size. It is, however, important to match the unit inventory to the output of the prosody prediction module in order to satisfy necessary conditions for composing the prosody prediction and the ....

H. Hon, A. Acero, X. Huang, J. Liu, and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech systems," In Proceedings of ICASSP, 293--296, 1998.


Segment Pre-Selection In Decision-Tree Based Speech Synthesis.. - Donovan (2000)   (8 citations)  (Correct)

....the many hundreds of versions of each unit which may be available in the training data. Some previous systems reported in the literature have used simple heuristics to select a single version of each unit [4], [9], while others have used more complex procedures to select multiple versions [2], [6]. This paper reports on the results of a research effort to develop a pre-selection algorithm for use with the decision-tree based concatenative speech synthesiser described in [3]. The paper is structured as follows. A brief summary of the construction and operation of the synthesiser is given in ....

....[Figure 2: The results obtained synthesising test sentences from datasets pre-selected using various algorithms from different amounts of training data. Legend entries: Algorithms 0, 1, and 87, with 231 mins and 68 mins of training data.] ....similar to algorithms used in [4] and [6]. It first removed very quiet and very short segments from consideration, before selecting the n segments in each leaf with the highest log likelihood of being observed using the acoustic Gaussian in that leaf. Finally, Algorithm 87 is that described in Section 5 operated with the optimised system ....

Hon, H., Acero, A., Huang, X., Liu, J., and Plumpe, M. (1998) Automatic Generation of Synthesis Units for Trainable Text-to-Speech Systems, Proc. ICASSP'98, Seattle, pp. 293--296.


Corpus-Based Speech Synthesis: Methods and Challenges - Möbius   (Correct)

.... contexts that do not occur in the database and were unseen during training can be reconstructed and mapped appropriately, a standard procedure in speech recognition (Jelinek and Mercer, 1980; Young, 1992). A similar approach was implemented in Microsoft's TTS system (Huang et al. 1996; Hon et al. 1998). 7 Word and syllable concatenation Attempts to record and play back words have not been successful, largely due to the large and changing number of words and the need to make contextual adjustments (Allen, 1992, page 768). For restricted domains a version of the unit selection method might ....

Hon, Hsiao-Wuen, Alex Acero, Xuedong Huang, Jingsong Liu, and Mike Plumpe. 1998. Automatic generation of synthesis units for trainable text-to-speech systems. In Proceedings of the IEEE International Conference on Acoustics and Speech Signal Processing (Seattle, WA), volume 1, pages 293-296.


Survey of Data-Driven Approaches to Speech Synthesis - Ng (1998)   (Correct)

....this section, we describe, compare, and analyze several different data driven and machine-learning approaches to concatenative synthesis. In particular we examine the appropriate processing components of the following three speech synthesis systems: Bell Labs synthesizer [27], Microsoft's Whistler [11, 13], and ATR's Chatr [4]. The Bell Labs synthesizer takes a more traditional diphone based approach to concatenative synthesis. An augmented diphone synthesis unit inventory is designed manually using expert linguistic knowledge. A relatively small speech corpus is then designed to cover all the ....

....measure gives more importance to unit distortions. Clearly a better distance measure that is more indicative of perceived speech quality is needed. The Whistler system also uses a dynamic programming search to find the optimal sequence of synthesis units (context dependent phones) to concatenate [11]. The search tries to minimize an objective function that takes into account phonetic mismatch (via HMM scores), unit concatenation distortion, and prosody mismatch distortion. Similar to the Chatr objective function, the concatenation cost between two units is zero if they occurred in sequence in ....

[Article contains additional citation context not shown here]

H. Hon, A. Acero, X. Huang, J. Liu, and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech systems," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, vol. 1, pp. 293--296, May 1998.


Flexible Speech Synthesis Using Weighted Finite State Transducers - Bulyko (1996)   (1 citation)  Self-citation (Acero)   (Correct)

....and offer flexibility in terms of model adaptation, the quality of speech is not yet good enough for general applications. Synthetic speech of higher quality is obtained when statistical modeling techniques are applied to the concatenative approach. Decision trees have been used for unit selection [30, 11, 38, 31], while applications of HMMs include automatic speech segmentation [100, 55, 38] and smoothing waveforms at the concatenation points [70]. 2.4 Unit Selection 2.4.1 Synthesis Unit Choosing the inventory of units is a subject of ongoing research. Diphone based systems have been offered for many ....

....good enough for general applications. Synthetic speech of higher quality is obtained when statistical modeling techniques are applied to the concatenative approach. Decision trees have been used for unit selection [30, 11, 38, 31], while applications of HMMs include automatic speech segmentation [100, 55, 38] and smoothing waveforms at the concatenation points [70]. 2.4 Unit Selection 2.4.1 Synthesis Unit Choosing the inventory of units is a subject of ongoing research. Diphone based systems have been offered for many years [51]. A diphone database contains the transitions between all pairs of ....

[Article contains additional citation context not shown here]

H. Hon, A. Acero, X. Huang, J. Liu, and M. Plumpe. Automatic generation of synthesis units for trainable text-to-speech systems. In Proceedings of the Intl. Conf. on Acoustic, Speech, and Signal Processing, volume 1, pages 293-296,


Journal of Information Computational Science 3: 1 (2006).. - Available At Http   (Correct)

No context found.

H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, Automatic generation of synthesis units for trainable text-to-speech systems, in: Proceedings of ICASSP


Speech Parameter Generation Algorithms for.. - Tokuda.. (2000)   (7 citations)  (Correct)

No context found.

H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, "Automatic generation of synthesis units for trainable text-to-speech synthesis," in Proc. ICASSP, 1998, pp. 293--306.


Selecting Non-Uniform Units From A Very - Large Corpus For   (Correct)

No context found.

Hon, H., Acero, A., Huang, S., Liu, J. and Plumpe, M., "Automatic generation of synthesis units for trainable text-to-speech systems", ICASSP'98, vol. 1, 293-296


Trainable Speech Synthesis With Trended Hidden Markov Models - John Dines And (2001)   (Correct)

No context found.

H. Hon, A. Acero, J. Liu, and M. Plumpe. Automatic generation of synthesis units for trainable text-to-speech systems. In Proc. ICASSP-98, volume 1, pages 293--296, Seattle, WA, May 1998.


Efficient Integrated Response Generation from Multiple.. - Bulyko, Ostendorf (2002)   (Correct)

No context found.

H. Hon, A. Acero, X. Huang, J. Liu and M. Plumpe, Automatic generation of synthesis units for trainable text-to-speech systems, In Proceedings of ICASSP, 293-296. (1998).

CiteSeer.IST - Copyright Penn State and NEC