


Pronunciation modelling in Speech Synthesis (1998)

by C Miller
Results 1 - 5 of 5

Sub-phonetic modeling for capturing pronunciation variation in conversational speech synthesis

by Kishore Prahallad, Alan W Black, Ravishankhar Mosur - in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing , 2006
Abstract - Cited by 14 (5 self)
In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (fully connected state model and forward connected state model) for sub-phonetic modeling to capture the deletion and insertion of sub-phonetic states during speech production process. We show that the experimented HMM topologies have higher log likelihood than the traditional 5-state sequential model. We also study the first and second mentions of content words and their influence on the pronunciation variation. Finally we report phone recognition experiments using the modified HMM topologies. 1.
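The contrast between the three HMM topologies the abstract compares can be sketched as transition matrices. This is a minimal illustration with invented uniform probabilities, not the authors' actual parameterization:

```python
import numpy as np

def sequential_topology(n=5):
    """Traditional left-to-right 5-state model: each state transitions
    only to itself or the next state, so no sub-phonetic state can be
    skipped (deleted)."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = A[i, i + 1] = 0.5
    A[n - 1, n - 1] = 1.0
    return A

def forward_connected_topology(n=5):
    """Forward-connected model: each state may jump to any *later*
    state, which lets the model delete sub-phonetic states."""
    A = np.triu(np.ones((n, n)))
    return A / A.sum(axis=1, keepdims=True)

def fully_connected_topology(n=5):
    """Fully connected (ergodic) model: any state can follow any other,
    accommodating both deletions and insertions of sub-phonetic
    states."""
    return np.full((n, n), 1.0 / n)
```

The extra non-zero transitions are what give the relaxed topologies the capacity to absorb pronunciation variation, at the cost of more parameters to estimate.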

Citation Context

... n-gram language model to predict word durations and segmental pronunciation [1]. Miller trained neural network models using syntactic and prosodic information to predict the pronunciation variations [2]. Jande used a phonological rule system for adapting the pronunciation to a faster speech rate [3]. Bennett et al. used acoustic models trained on a single-speaker database to label the alternate pronunci...

Automatic Building of Synthetic Voices from Large Multi-Paragraph Speech Databases

by Kishore Prahallad, Arthur R Toth
Abstract - Cited by 9 (1 self)
Large multi-paragraph speech databases encapsulate prosodic and contextual information beyond the sentence level which could be exploited to build natural-sounding voices. This paper discusses our efforts on automatically building synthetic voices from large multi-paragraph speech databases. We show that the primary issue of segmenting large speech files can be addressed with modifications to the forced-alignment technique, and that the proposed technique is independent of the duration of the audio file. We also discuss how this framework could be extended to build a large number of voices from public-domain large multi-paragraph recordings. Index Terms: speech synthesis, large multi-paragraph speech databases, forced-alignment, public domain recordings
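The duration-independent segmentation idea can be sketched as a moving-window alignment loop. This is a hypothetical sketch, not the authors' algorithm: `align_window` stands in for a real forced aligner, and the commit-all-but-last heuristic (the last boundary in a window may improve once more context arrives) is an assumption:

```python
def segment_long_audio(sentences, align_window, window_sents=3):
    """Align only a small moving window of sentences at a time, commit
    all boundaries except the last one in the window, then slide
    forward. Memory and compute then depend on the window size,
    not on the total duration of the recording."""
    boundaries = []
    start = 0.0
    i = 0
    while i < len(sentences):
        window = sentences[i:i + window_sents]
        ends = align_window(window, start)  # end time of each sentence
        # Commit everything; but if more audio remains, hold back the
        # last boundary so the next window can refine it.
        commit = ends if i + window_sents >= len(sentences) else ends[:-1]
        boundaries.extend(commit)
        start = commit[-1]
        i += len(commit)
    return boundaries
```

With a toy aligner in which every sentence lasts one second, five sentences yield the five boundaries 1.0 through 5.0 regardless of how long the "recording" is overall.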

Citation Context

...ed/uttered by human beings. For example, it is known that the prosody and acoustics of a word spoken in a sentence differ significantly from those of the same word spoken in isolation. The work done in [3] [4] [5] [6] [7] [8] suggests that a similar prosodic and acoustic difference exists for sentences spoken in isolation versus sentences spoken in paragraphs, and similarly for paragraphs too. Some of th...

TTS from zero: Building synthetic voices for new languages

by John Kominek, Alexander I. Rudnicky , 2009
Abstract - Cited by 4 (0 self)
A developer wanting to create a speech synthesizer in a new voice for an under-resourced language faces hard problems. These include difficult decisions in defining a phoneme set and a laborious process of accumulating a pronunciation lexicon. Previously this has been handled through involvement of a language technologies expert. By definition, experts are in short supply. The goal of this thesis is to lower barriers facing a non-technical user in building “TTS from Zero.” Our approach focuses on simplifying the lexicon building task by having the user listen to and select from a list of pronunciation alternatives. The candidate pronunciations are predicted by grapheme-to-phoneme (G2P) rules that are learned incrementally as the user works through the vocabulary. Studies demonstrate success for Iraqi, Hindi, German, and Bulgarian, among others. We compare various word selection strategies that the active learner uses to acquire maximally predictive rules. Incremental G2P learning enables iterative voice building. Beginning with 20 minutes of recordings, a bootstrapped synthesizer provides pronunciation examples for lexical review, which is fed into the next round of training with more recordings to create a larger, better voice... and so
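The listen-and-select workflow over G2P candidates can be illustrated with a toy ranked-candidate generator. The rule table and its scores are invented for illustration; the thesis learns its rules incrementally from the user's confirmed pronunciations rather than from a fixed table:

```python
from itertools import product

# Toy G2P rule table: each grapheme maps to candidate phonemes with
# scores (hypothetical values for illustration only).
RULES = {
    "c": [("k", 0.7), ("s", 0.3)],
    "a": [("ae", 0.6), ("ah", 0.4)],
    "t": [("t", 1.0)],
}

def candidate_pronunciations(word, top_n=3):
    """Enumerate pronunciations by combining per-letter candidates,
    ranked by the product of rule scores -- the short list a user
    would listen to and select from."""
    options = [RULES[ch] for ch in word]
    cands = []
    for combo in product(*options):
        phones = " ".join(p for p, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s
        cands.append((phones, score))
    cands.sort(key=lambda x: -x[1])
    return cands[:top_n]
```

For the word "cat" this toy table ranks "k ae t" first, followed by "k ah t" and "s ae t"; the user's choice then becomes a new training example for the next round of rule learning.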

Citation Context

...ic. Miller assessed the ability of artificial neural networks to execute post-lexical rules in a speech synthesizer. Time-delay neural networks converted from the phonemic layer to the narrow phonetic [118]. [Garbled extraction residue of Table 2.8, TIMIT-based allophones (base/allophone pairs for phones such as /d/, /m/, /u/, /t/, /n/, /l/), omitted.] ...

Re-Engineering Letter-to-Sound Rules

by Martin Jansche , 2001
Abstract - Cited by 3 (0 self)
Using finite-state automata for the text analysis component in a text-to-speech system is problematic in several respects: the rewrite rules from which the automata are compiled are difficult to write and maintain, and the resulting automata can become very large and therefore inefficient. Converting the knowledge represented explicitly in rewrite rules into a more efficient format is difficult. We take an indirect route, learning an efficient decision tree representation from data and tapping information contained in existing rewrite rules, which increases performance compared to learning exclusively from a pronunciation lexicon.
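The training data a decision-tree letter-to-sound learner splits on can be sketched as letter-context examples. This simplified sketch assumes a 1:1 letter-to-phone alignment, which real lexicon training must first establish, and the window width is an arbitrary choice:

```python
def letter_contexts(word, phones, width=1):
    """Pair each letter with its left/right context window. These
    (context -> phoneme) examples are what a decision-tree learner
    would split on when inducing letter-to-sound rules."""
    pad = "#" * width                    # word-boundary padding symbol
    padded = pad + word + pad
    examples = []
    for i, ph in enumerate(phones):
        left = padded[i:i + width]
        focus = padded[i + width]
        right = padded[i + width + 1:i + 2 * width + 1]
        examples.append(((left, focus, right), ph))
    return examples
```

For "cat" aligned to /k ae t/, the middle letter yields the example (("c", "a", "t"), "ae"): the tree can then learn that "a" between "c" and "t" maps to /ae/, while other contexts may map it elsewhere.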

Contents

by unknown authors
Abstract not found

Citation Context

...quality. A model trained from data should learn the correct surface realization of a particular phone in a particular setting, regardless of what the symbol attached to that phone during training is. [8] Some Scottish models, for example, should learn to weaken or drop /l/ at the end of a word in certain contexts and to realize it as dark otherwise. Considering the very ... [footnote 1: Located at http://hts.sp.nitec...]

