Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis
, 2003
Cited by 22 (6 self)
One of the most serious challenges for speech synthesis is the systematic treatment of events in language and speech that are known to have low frequencies of occurrence. The problems that extremely unbalanced frequency distributions pose for rule-based or data-driven models are often underestimated or even unrecognized. This paper discusses these problems in the contexts of morphology, syllabification, segmental duration and unit selection, and also suggests possible solutions. The design of databases for restricted application domains, where the distributions of linguistic and phonetic factors are known, is also critically reviewed.
Multilingual Syllabification Using Weighted Finite-State Transducers
- In Proc. 3rd ESCA Workshop on Speech Synthesis (Jenolan Caves
, 1998
Cited by 20 (2 self)
"Just as vowels with consonants are the matter of syllables, so also syllables are the matter for the construction of nouns and verbs, and of the elements which are made out of them." (Antony the Rhetorician of Tagrit († c. 445?), Knowledge of Rhetoric, Book Five, Canon One)
This paper describes an approach to syllabification that has been incorporated into the English and German text-to-speech systems at Bell Labs. Implemented as a weighted finite-state transducer, the syllabifier is easily integrated, via mathematical composition, into the finite-state based text analysis component of the text-to-speech system. The weights are based on frequencies of onset, nucleus and coda types obtained from training data. While the training data is language-dependent, the formal approach is multilingual.
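The frequency-based weighting can be illustrated with a small sketch. It is not the Bell Labs transducer: instead of composing finite-state machines, it enumerates every candidate split of a toy phoneme string and scores each one with negative log relative frequencies of onset, nucleus and coda types. The training syllables, the vowel inventory, the flat penalty for unseen types and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

VOWELS = set("aeiou")  # assumption: toy inventory, one symbol per phoneme

# Hypothetical training data: syllables as (onset, nucleus, coda) triples.
TRAINING = [
    ("b", "a", ""), ("n", "a", ""), ("st", "a", "r"), ("", "a", "n"),
    ("t", "i", ""), ("k", "o", "n"), ("tr", "a", "st"), ("m", "i", "s"),
]

def costs(items):
    """Map each observed type to -log(relative frequency)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: -math.log(v / total) for k, v in counts.items()}

ONSET = costs(s[0] for s in TRAINING)
NUCLEUS = costs(s[1] for s in TRAINING)
CODA = costs(s[2] for s in TRAINING)
UNSEEN = 10.0  # assumption: flat penalty for types absent from training

def syllable_cost(onset, nucleus, coda):
    """Sum the three component costs, like adding weights along a path."""
    return (ONSET.get(onset, UNSEEN)
            + NUCLEUS.get(nucleus, UNSEEN)
            + CODA.get(coda, UNSEEN))

def candidates(word):
    """Yield every split that treats each vowel as one syllable nucleus."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return
    # Consonant clusters between consecutive nuclei; each may be cut anywhere.
    gaps = [word[a + 1:b] for a, b in zip(nuclei, nuclei[1:])]
    first_onset = word[:nuclei[0]]
    last_coda = word[nuclei[-1] + 1:]
    for cuts in product(*(range(len(g) + 1) for g in gaps)):
        onsets = [first_onset] + [g[c:] for g, c in zip(gaps, cuts)]
        codas = [g[:c] for g, c in zip(gaps, cuts)] + [last_coda]
        yield [(o, word[n], c) for o, n, c in zip(onsets, nuclei, codas)]

def syllabify(word):
    """Return the lowest-cost candidate, with syllables joined by '-'."""
    best = min(candidates(word),
               key=lambda syls: sum(syllable_cost(*s) for s in syls))
    return "-".join(o + n + c for o, n, c in best)

print(syllabify("kontrast"))
print(syllabify("banana"))
```

A weighted transducer would reach the same minimum by a shortest-path search rather than by brute-force enumeration; the exhaustive loop here is only practical for short words.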
Inducing Probabilistic Syllable Classes Using Multivariate Clustering
- IN PROC. OF ACL
, 2000
Cited by 15 (9 self)
An approach to automatic detection of syllable structure is presented. We demonstrate a novel application of EM-based clustering to multivariate data, exemplified by the induction of 3- and 5-dimensional probabilistic syllable classes. The qualitative evaluation shows that the method yields phonologically meaningful syllable classes. We then propose a novel approach to grapheme-to-phoneme conversion and show that syllable structure represents valuable information for pronunciation systems.
Hybrid Language Models Using Mixed Types of Sub-lexical Units for Open Vocabulary German LVCSR
Cited by 9 (4 self)
German is a highly inflected language with a large number of words derived from the same root. It makes heavy use of word compounding, which leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. For such languages, the use of sub-lexical units for Large Vocabulary Continuous Speech Recognition (LVCSR) becomes a natural choice. In this paper, we investigate the use of mixed types of sub-lexical units in the same recognition lexicon, namely morphemic or syllabic units combined with pronunciations (called graphones) and normal graphemic morphemes or syllables, along with full words. This mixture of units is used for building hybrid LMs suitable for open vocabulary LVCSR, where the system operates over an open, constantly changing vocabulary, as in broadcast news, political debates, etc. A relative reduction of around 5.0% in Word Error Rate (WER) is obtained compared to a traditional full-word system. Moreover, around 40% of the OOVs are recognized. Index Terms: open vocabulary, morpheme, syllable, graphone
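A relative reduction is measured against the baseline error rate, not in absolute percentage points. The sketch below shows the arithmetic; the baseline WER of 30.0 is a made-up figure for illustration, since the abstract reports only the relative value.

```python
def relative_reduction(baseline, improved):
    """Reduction of an error rate, expressed as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline

# Hypothetical baseline: a full-word system at 30.0% WER (not from the paper).
baseline_wer = 30.0
# WER implied by a 5.0% relative reduction from that baseline.
hybrid_wer = baseline_wer * (1 - 0.05)

print(f"hybrid WER: {hybrid_wer:.2f}%")
print(f"absolute gain: {baseline_wer - hybrid_wer:.2f} points")
print(f"relative gain: {relative_reduction(baseline_wer, hybrid_wer):.1%}")
```

Under this assumed baseline, the absolute gain is 1.5 points, which is why relative and absolute figures should not be compared directly.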
Probabilistic Context-Free Grammars for Syllabification and Grapheme-to-Phoneme Conversion
, 2001
Cited by 3 (0 self)
We investigated the applicability of probabilistic context-free grammars to syllabification and grapheme-to-phoneme conversion. The results show that the standard probability model of context-free grammars performs very well in predicting syllable boundaries. However, our results indicate that the standard probability model does not solve grapheme-to-phoneme conversion sufficiently, although we varied all free parameters of the probabilistic reestimation procedure.
Vocabulary German LVCSR
, 2016
German is a highly inflected language with a large number of words derived from the same root. It makes heavy use of word compounding, which leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. For such languages, the use of sub-lexical units for Large Vocabulary Continuous Speech Recognition (LVCSR) becomes a natural choice. In this paper, we investigate the use of mixed types of sub-lexical units in the same recognition lexicon, namely morphemic or syllabic units combined with pronunciations (called graphones) and normal graphemic morphemes or syllables, along with full words. This mixture of units is used for building hybrid LMs suitable for open vocabulary LVCSR, where the system operates over an open, constantly changing vocabulary, as in broadcast news, political debates, etc. A relative reduction of around 5.0% in Word Error Rate (WER) is obtained compared to a traditional full-word system. Moreover, around 40% of the OOVs are recognized. Index Terms: open vocabulary, morpheme, syllable, graphone
AUTOMATIC SEGMENTATION OF MANIPURI (MEITEILON) WORD INTO SYLLABIC UNITS
Are Rule-based Syllabification Methods Adequate for Languages with Low Syllabic Complexity? The Case of Italian
Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been shown to be incorrect for English, where data-driven algorithms outperform rule-based methods. It may be, however, that data-driven methods are only better for languages with complex syllable structures. In this paper, three rule-based automatic syllabification systems and two data-driven ones (Syllabification by Analogy and the Look-Up Procedure) are compared on Italian, a language with lower syllabic complexity. Using a leave-one-out procedure on 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule-based method correctly syllabified 89.77% of the words. These results show that data-driven methods can also outperform rule-based methods on Italian syllabification, indicating that these may be the best approaches to the syllabification component of speech synthesis systems.
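In a leave-one-out evaluation, each word is held out in turn, a model is built from the remaining words, and the held-out word counts as correct only if its whole syllabification matches the reference. A minimal sketch of that protocol follows. The four-word lexicon, the `build_model` interface and the stand-in model (which ignores its training data) are assumptions for illustration and are not the systems compared in the paper.

```python
VOWELS = set("aeiou")

# Hypothetical mini-lexicon of (word, reference syllabification) pairs.
LEXICON = [
    ("casa", "ca-sa"),
    ("pane", "pa-ne"),
    ("gatto", "gat-to"),
    ("libro", "li-bro"),
]

def cv_rule_model(training):
    """Stand-in model: ignores `training` and places a boundary before
    any word-internal consonant that is directly followed by a vowel."""
    def predict(word):
        out = []
        for i, ch in enumerate(word):
            starts_syllable = (
                0 < i < len(word) - 1
                and ch not in VOWELS
                and word[i + 1] in VOWELS
            )
            if starts_syllable:
                out.append("-")
            out.append(ch)
        return "".join(out)
    return predict

def leave_one_out_accuracy(lexicon, build_model):
    """Hold out each word once, train on the rest, score exact matches."""
    correct = 0
    for i, (word, gold) in enumerate(lexicon):
        training = lexicon[:i] + lexicon[i + 1:]
        predict = build_model(training)
        if predict(word) == gold:
            correct += 1
    return correct / len(lexicon)

print(leave_one_out_accuracy(LEXICON, cv_rule_model))
```

A data-driven method would plug in through the same `build_model` interface, using `training` to build its model before predicting the held-out word.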
Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis
One of the most serious challenges for speech synthesis is the systematic treatment of events in language and speech that are known to have low frequencies of occurrence. The problems that extremely unbalanced frequency distributions pose for rule-based or data-driven models are often underestimated or even unrecognized. This paper discusses these problems in the contexts of morphology, syllabification, segmental duration and unit selection, and also suggests possible solutions. The design of databases for restricted application domains, where the distributions of linguistic and phonetic factors are known, is also critically reviewed.
Rare Events and Closed Domains: Two Questionable Concepts in Speech Synthesis
, 2001
In this paper we intend to point out two common concepts in speech synthesis that we consider questionable, if not misguided and wrong. The first of these concepts is the treatment of phenomena in language and speech that are known or assumed to have low frequencies of occurrence. The second concept that we consider questionable is the notion of a "restricted" application domain. We conclude that word or syllable concatenation schemes are only feasible in strictly closed domains, i.e. those domains that have a fixed and unchanging vocabulary.