11 citations found.
R.E. Donovan and P.C. Woodland. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, Vol. 13, No. 3, pp. 223-241, 1999.


This paper is cited in the following contexts:
The Impact Of Speech Recognition On Speech Synthesis - Ostendorf, Bulyko (2002)   (1 citation)  (Correct)

....more suitable for a given phonetic and prosodic context, hence reducing the amount of additional signal processing. Selection of variable-size units from large single-speaker speech databases usually involves minimizing distortion introduced when selected units are modified and concatenated, e.g. [54, 35, 34, 19, 24, 7]. This distortion is represented in terms of two cost functions: 1) a target cost, which is an estimate of the difference between the database unit u_i and the target t_i, and 2) a concatenation cost, which is an estimate of the quality of concatenation of two units u_{i-1} and u_i. The task of ....

.... and non-uniform HMM topologies (1-state HMM for fricatives and 2-state HMM for stops) [4]. Decision tree clustering has also been used in speech synthesis to define the inventory of units (similar to speech recognition), either by tying HMM parameters to maximize likelihood of the training data [24, 4] or by minimizing the average distance between units in each cluster [10]. While ASR includes only spectral envelope features in the objective function, synthesis may include acoustic prosodic information such as pitch and duration. A more important difference, however, is that ASR uses the ....

[Article contains additional citation context not shown here]

R. Donovan and P. Woodland, "A hidden Markov-model-based trainable speech synthesizer," Computer Speech and Language, 13(3):223-241, 1999.


Flexible Speech Synthesis Using Weighted Finite State Transducers - Bulyko (1996)   (1 citation)  (Correct)

....and offer flexibility in terms of model adaptation, the quality of speech is not yet good enough for general applications. Synthetic speech of higher quality is obtained when statistical modeling techniques are applied to the concatenative approach. Decision trees have been used for unit selection [30, 11, 38, 31], while applications of HMMs include automatic speech segmentation [100, 55, 38] and smoothing waveforms at the concatenation points [70]. 2.4 Unit Selection 2.4.1 Synthesis Unit Choosing the inventory of units is a subject of ongoing research. Diphone-based systems have been offered for many ....

....and prosodic context, hence reducing the amount of additional signal processing. The unit selection process, however, becomes more complex as it requires a dynamic search. 2.4.2 Selecting Units from Multiple Candidates Selection of variable-size units from large single-speaker speech databases [82, 42, 39, 11, 24, 38, 6, 31] is typically based on minimizing acoustic distortion introduced when selected units are modified and concatenated. This distortion is represented in terms of two cost functions (Figure 2.4): 1) target cost C(u_i; t_i), which is an estimate of the difference between the database unit u_i and ....

R. E. Donovan and P. C. Woodland. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, 13(3):223-241, 1999.


A Characterization of Speech Recognition on Modern.. - Agaram, Keckler, Burger (2001)   (8 citations)  (Correct)

....achieves speaker-independent word recognition accuracies of 71-96%, depending on the complexity of the grammatical structure in the sentences. Both benchmarks are typical in terms of the algorithms used in modern recognition systems; SPHINX uses HMM-based algorithms that are currently prevalent [16], and RASTA processing is also widely used [8]. In general, current desktop machines have sufficient resources to perform dedicated large-vocabulary speech recognition in real time. However, this performance is attained at the expense of substantial memory capacity and bandwidth requirements that ....

D. Robert and E. Woodland. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, (13:223-241), 1999.


Corpus-Based Speech Synthesis: Methods and Challenges - Möbius   (Correct)

....whole language. We also show that word- or syllable-based approaches are only feasible in strictly closed application domains. 1 Introduction It has been argued that the large number of concatenation points in a synthesized utterance produces a perceptual impression of unnaturalness (e.g. (Donovan and Woodland, 1999)) even if the spectral discontinuities at the concatenation points are reduced by a careful inventory design based on phonetic criteria. In diphone synthesis, there is a concatenation point in each segment. A paradigm shift occurred when researchers began to design corpus-based synthesis ....

....because the authors argue that essential coarticulatory effects are captured within the units as a consequence of the applied context-oriented clustering algorithm. A modified version of the clustering method has been implemented in the English speech synthesizer developed at Cambridge University (Donovan and Woodland, 1999) and in the IBM speech synthesizer (Donovan and Eide, 1998). As stated previously, the concept of non-uniform unit concatenation was first proposed by Sagisaka (Sagisaka, 1988; Takeda, Abe, and Sagisaka, 1990). In this work we also encounter for the first time the distinction between unit ....

[Article contains additional citation context not shown here]

Donovan, Robert E. and P. C. Woodland. 1999. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, 13:223-241.


Rare Events and Closed Domains: Two Questionable Concepts in.. - Möbius (2001)   (Correct)

....4 Conclusion The LNRE characteristics of language and speech are often unrecognized and the pertinent problems underestimated. For example, it is a common attitude to accept poor modeling of less frequently seen or unseen contexts because they are less frequently used in synthesis (Donovan and Woodland, 1999, page 228). The perverse nature of LNRE distributions is the following: the number of rare events is so large that the probability of encountering at least one of these events in a particular sample, such as in a sentence to be synthesized, approaches certainty. In this paper we have discussed ....

Donovan, Robert E. and P. C. Woodland. 1999. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, 13:223-241.


Current Status of the IBM Trainable Speech Synthesis.. - Donovan, Ittycheriah.. (2001)   (3 citations)  Self-citation (Donovan)   (Correct)

....synthesis system. This paper describes the current status of the system, which was previously introduced in [1]. The system uses hidden Markov model (HMM) state-sized segments as its basic synthesis units and decision trees in its segment search. The system is based on the work described in [2], [3], and also has similarities with that described in [4] and [5]. Unlike [2], [3], the current system combines the decision tree approach with a dynamic programming search similar to that used in [6], [7]. This paper is structured as follows. Section 2 describes the methods used to prepare a dataset ....

....was previously introduced in [1]. The system uses hidden Markov model (HMM) state-sized segments as its basic synthesis units and decision trees in its segment search. The system is based on the work described in [2], [3], and also has similarities with that described in [4] and [5]. Unlike [2], [3], the current system combines the decision tree approach with a dynamic programming search similar to that used in [6], [7]. This paper is structured as follows. Section 2 describes the methods used to prepare a dataset for use with the synthesiser. Section 3 describes the runtime operation of the ....

Donovan, R.E. & Woodland, P.C. (1999) A Hidden Markov Model Based Trainable Speech Synthesiser, Computer Speech and Language, Vol. 13, No. 3, pp. 223--242.


Segment Pre-Selection In Decision-Tree Based Speech Synthesis.. - Donovan (2000)   (8 citations)  Self-citation (Donovan)   (Correct)

....1. INTRODUCTION In recent years corpus-based approaches to unit selection for concatenative speech synthesis have become increasingly popular due to their improved sensitivity to unit context, both phonetic and prosodic, compared to earlier diphone and polyphone approaches [1], [3], [4], [7], [8]. These systems are usually based on large speech databases, typically from 30 minutes to several hours in duration, and use sophisticated search algorithms to determine which segments to concatenate to synthesise a given sentence. In many of these systems however, all, or nearly all, ....

....reduce the severity of the runtime pruning problems associated with quickly choosing between the many hundreds of versions of each unit which may be available in the training data. Some previous systems reported in the literature have used simple heuristics to select a single version of each unit [4], [9], while others have used more complex procedures to select multiple versions [2], [6]. This paper reports on the results of a research effort to develop a pre-selection algorithm for use with the decision-tree-based concatenative speech synthesiser described in [3]. The paper is structured ....

[Article contains additional citation context not shown here]

Donovan, R.E., and Woodland, P.C. (1999) A Hidden Markov Model Based Trainable Speech Synthesiser, Computer Speech and Language, Vol. 13, No. 3, pp. 223--242.


High-Quality and Flexible Speech Synthesis with Segment Selection.. - Toda (2003)   (Correct)

No context found.

R.E. Donovan and P.C. Woodland. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language, Vol. 13, No. 3, pp. 223-241, 1999.


Recent Improvements on ARTIC: Czech Text-to-Speech System - Jindřich Matoušek (2004)   (Correct)

No context found.

Donovan, R. E., Woodland, P. C., "A Hidden Markov-Model-Based Trainable Speech Synthesizer", Computer Speech and Language 13:223--241, 1999.


Unknown -   (Correct)

No context found.

Donovan, R. E., and Woodland, P. C. 1999. A hidden Markov-model-based trainable speech synthesizer. Computer Speech and Language: 1--19.


Rare Events and Closed Domains: Two Delicate Concepts in Speech.. - Möbius (2003)   (2 citations)  (Correct)

No context found.

Robert E. Donovan and P. C. Woodland, "A hidden Markov-model-based trainable speech synthesizer," Computer Speech and Language, vol. 13, pp. 223--241, 1999.


CiteSeer.IST - Copyright Penn State and NEC