19 citations found.
J.L. Gauvain, L. Lamel and G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. of the International Conference on Speech and Language Processing, vol. 4, pp. 1335-1338, Sydney, Dec 1998.


This paper is cited in the following contexts:
Detection Of Speaker Changes In An Audio Document - Delacourt, Kryze, Wellekens (1999)   (3 citations)  (Correct)

....obtaining speaker-homogeneous segments: resulting segments should be as long as possible and related to a single speaker. This problem received attention recently in the literature [1, 2, 3, 4, 5] since it can be used as a preliminary step in several indexing applications: news transcription tasks [6, 7], automatic grouping of speech messages [8] or speaker tracking [9, 10]. The segmentation algorithm we describe in this paper is designed to be embedded in a speaker-based indexing system. The goal of this system is to know who speaks and when. This indexing system is divided into two parts: first, the ....

J.-L. Gauvain, L. Lamel, and G. Adda, "Partitioning and transcription of broadcast news data," in ICSLP, 1998.


Improving Speaker Diarization - Claude Barras Xuan (2004)   (1 citation)  Self-citation (Gauvain)   (Correct)

No context found.

J.L. Gauvain, L. Lamel and G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. of the International Conference on Speech and Language Processing, vol. 4, pp. 1335-1338, Sydney, Dec 1998.


Automatic Processing of Broadcast Audio in Multiple Languages - Lori Lamel And (2002)   (1 citation)  Self-citation (Gauvain)   (Correct)

....and a speech recognizer. The goal of audio partitioning is to divide the acoustic signal into homogeneous segments, labeling and structuring the acoustic content of the data, and identifying and removing non-speech segments. The BN audio partitioner relies on an audio stream mixture model [12]. While it is possible to transcribe the continuous stream of audio data without any prior segmentation, partitioning offers several advantages over this straightforward solution. Additional nonlinguistic information can be extracted from the audio signal, such as the segmentation into speaker ....

J.L. Gauvain et al., "Partitioning and Transcription of Broadcast News Data," ICSLP'98.


Transcribing Audio-Video Archives - Barras, Allauzen, Lamel, Gauvain (2002)   (3 citations)  Self-citation (Gauvain Lamel)   (Correct)

....acoustic conditions for which accurate manual transcription was not possible. About 40 minutes of data were discarded. 3. TRANSCRIPTION SYSTEM DESCRIPTION The LIMSI broadcast news transcription system has two main components, the audio partitioner and the word recognizer [3]. Data partitioning [4] serves to divide the continuous audio stream into homogeneous segments, associating cluster, gender and bandwidth labels with each segment. The speech recognizer uses continuous density (CD) HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on large text corpora ....

....(FSE) and resulting word error rate (WER) on the complete Eurodelphes corpus and a 1 hour test subset using baseline and adapted partitioning models. 4.3. Speech/non-speech detection As was noted in the error analysis (see Section 4.1), some speech segments were discarded by the partitioner [4], resulting in unrecoverable errors for the transcription system. Comparing the automatic labeling with the manual transcriptions, there is a 9.5% speech/non-speech frame segmentation error using the baseline models, with 5.6% missed detections (MD) of speech frames, and 4.8% false alarms (FA) ....

[Article contains additional citation context not shown here]

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and transcription of broadcast news data," in Proc. ICSLP, 1998.


The LIMSI Broadcast News Transcription System - Gauvain, Lamel, Adda (2002)   (13 citations)  Self-citation (Gauvain Lamel Adda)   (Correct)

....a speaker in a particular background and channel condition, is in turn modeled by a mixture of Gaussians. The segment boundaries and labels are jointly identified using the iterative procedure described below. 3.2 Audio Stream Mixture Model The segmentation and labeling procedure introduced in [17, 18] is shown in Figure 1. First, the non-speech segments are detected (and rejected) using Gaussian mixture models (GMMs). These GMMs, each with 64 Gaussians, serve to detect speech, pure music and other (background). The acoustic feature vector used for segmentation contains 38 parameters. It is the ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. ICSLP'98, 5, pp. 1335-1338, Sydney, Australia, December 1998.


Automatic Transcription Of Compressed Broadcast Audio - Barras, Lamel, Gauvain (2001)   (1 citation)  Self-citation (Gauvain Lamel)   (Correct)

....explore two ways to reduce the difference between PCM and compressed data, either by using bandwidth-limited acoustic models, or by training the models on compressed data. 2. AUTOMATIC TRANSCRIPTION SYSTEM The LIMSI broadcast news automatic transcription system consists of an audio partitioner [7] and a speech recognizer [6]. Combined with an indexation module, it has been used for building the SDR indexation system [8]. The recognition system initially developed for American English has been ported to the French language [1, 2], which is one of the target languages of the ALERT project ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, Dec. 1998.


Fast Decoding for Indexation of Broadcast Data - Gauvain, Lamel (2000)   Self-citation (Gauvain Lamel)   (Correct)

....out to address the influence of acoustic model size and of language model order on performance. We then discuss the impact of the word error rate on the information retrieval process. 2. SYSTEM OVERVIEW The LIMSI broadcast news automatic indexation system [3] consists of an audio partitioner [6], a speech recognizer [7, 8] and an indexation module [5]. The goal of audio partitioning is to divide the acoustic signal into homogeneous segments, labeling and structuring the acoustic content of the data. Partitioning consists of identifying and removing non-speech segments, and then ....

....decoding. Note that processing the non-speech segments as if they were speech does not significantly increase the word error rate, but does considerably increase the processing time. The partitioning approach used in the LIMSI BN transcription system relies on an audio stream mixture model [6]. Each component audio source, representing a speaker in a particular background and channel condition, is in turn modeled by a GMM. The segment boundaries and labels are jointly identified by an iterative maximum likelihood segmentation clustering procedure using GMMs and agglomerative ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, Dec. 1998.


Broadcast News Transcription in Mandarin - Chen, Lamel, Adda, Gauvain (2000)   (1 citation)  Self-citation (Gauvain Lamel Adda)   (Correct)

....the segment, associating start and end times and optionally a confidence measure with each word. Data partitioning, developed for the American English system, is based on an iterative maximum likelihood segmentation clustering procedure using Gaussian mixture models and agglomerative clustering [5]. In contrast to partitioning algorithms that incorporate phoneme recognition, this approach is language-independent, and the same models are used to partition English, French, German and Mandarin data. The result of the partitioning process is a set of speech segments with speaker, gender and ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. ICSLP'98, 5, pp. 1335-1338, Sydney, Australia, Dec. 1998.


An Overview of Speech Recognition Activities at LIMSI - Gauvain, Adda..   Self-citation (Gauvain Lamel Adda)   (Correct)

....on information channels is increasing at a close rate. Therefore processing time is an important factor in making a speech transcription system viable for audio data mining and other related applications. The LIMSI broadcast news automatic indexation system [11] consists of an audio partitioner [9], a speech recognizer [10, 12] and an indexation module [6]. The transcription components are shown in Figure 1. Partitioning the Audio stream The goal of audio partitioning is to divide the acoustic signal into homogeneous segments, removing non-speech segments, and labeling and structuring the ....

....be significantly improved. Finally, eliminating non-speech segments and dividing the data into shorter segments (which can still be several minutes long) substantially reduces the computation time and simplifies decoding. The LIMSI partitioning approach relies on an audio stream mixture model [9]. Each component audio source, representing a speaker in a particular background and channel condition, is in turn modeled by a GMM. The segment boundaries and labels are jointly identified by an iterative maximum likelihood segmentation clustering procedure using GMMs and agglomerative ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, Dec. 1998.


OLIVE: Speech Based Video Retrieval - Franciska De Jong (1998)   (2 citations)  Self-citation (Gauvain)   (Correct)

....HMM system with a 65k word trigram language model. Decoding is carried out in multiple passes, incorporating cluster-based test set acoustic adaptation. Prior to word recognition, the acoustic signal is partitioned into homogeneous segments, and appropriate labels are associated with the segments [2]. This partitioning algorithm first detects (and rejects) non-speech segments using Gaussian mixture models (GMMs). An iterative maximum likelihood segmentation clustering procedure is then applied to the speech segments using GMMs and an agglomerative clustering algorithm. The result of the ....

....tree. Word recognition is performed in three steps: initial hypothesis generation, word graph generation, and final hypothesis generation. The initial hypotheses are used for cluster-based acoustic model adaptation. Taking advantage of the corpora available through the LDC, the speech recognizer [1, 2] has been developed and tested on American English. The acoustic models are trained on 150 hours of transcribed audio data, with the language models trained on 200M words of broadcast news transcriptions and 400M words of newspaper and newswire texts. Using broadcast data collected in Olive, LIMSI ....

[Article contains additional citation context not shown here]

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. ICSLP'98, Sydney, Nov. 1998, pp. 1335-1338.


The LIMSI 1999 Hub-4E Transcription System - Gauvain, Lamel, Adda   Self-citation (Gauvain Lamel Adda)   (Correct)

....the Nov98 evaluation test set used for system development, as well as the 1999 evaluation test set. All the reported runs were done on a Compaq XP1000 500MHz machine with Digital Unix. 2. SYSTEM OVERVIEW The LIMSI broadcast news automatic transcription system [3] consists of an audio partitioner [9] and a speech recognizer [4, 11]. The goal of audio partitioning is to divide the acoustic signal into homogeneous segments, labeling and structuring the acoustic content of the data. Partitioning consists of identifying and removing non-speech segments, and then clustering the speech segments ....

....cluster, gender and telephone wideband labels, which can be used to generate metadata annotations. While it is possible to transcribe the continuous stream of audio data without any prior segmentation, some of the advantages partitioning offers over such a straightforward solution are given in [9]. The partitioning approach used in the LIMSI BN transcription system relies on an audio stream mixture model [9]. Each component audio source, representing a speaker in a particular background and channel condition, is in turn modeled by a GMM. The segment boundaries and labels are jointly ....

[Article contains additional citation context not shown here]

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. ICSLP'98, 5, pp. 1335-1338, Sydney, Dec. 1998.


The LIMSI SDR System for TREC-9 - Gauvain, Lamel, Barras, Adda, de.. (2000)   (7 citations)  Self-citation (Gauvain Lamel Adda)   (Correct)

....the terse query condition in this year's evaluation. Comparative results are given on the development queries from SDR 99 and this year's query set, and some conclusions are made. 2. TRANSCRIPTION SYSTEM OVERVIEW The LIMSI broadcast news transcription system [5] consists of an audio partitioner [10] and a speech recognizer [11, 12]. The goal of audio partitioning is to divide the acoustic signal into homogeneous segments, labeling and structuring the acoustic content of the data. Partitioning consists of identifying and removing non-speech segments, and then clustering the speech segments ....

....to each segment. The result of the partitioning process is a set of speech segments with cluster, gender and telephone wideband labels, which can be used to generate metadata annotations. The partitioning approach used in the LIMSI BN transcription system relies on an audio stream mixture model [10]. Each component audio source, representing a speaker in a particular background and channel condition, is modeled by a GMM. The segment boundaries and labels are jointly identified by an iterative maximum likelihood segmentation clustering procedure using GMMs and agglomerative clustering. For ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, December 1998.


Language-Based Multimedia Information Retrieval - De Jong (2000)   (2 citations)  Self-citation (Gauvain)   (Correct)

No context found.

J.L. Gauvain, L. Lamel & G. Adda (1998), "Partitioning and Transcription of Broadcast News Data." In : Proceedings of ICSLP'98, Sydney, pp. 1335-1338.


The LIMSI SDR System for TREC-8 - Gauvain, de Kercadio, Lamel, Adda   (7 citations)  Self-citation (Gauvain Lamel Adda)   (Correct)

....a decision tree. Audio Partitioner The goal of partitioning is to divide the continuous audio stream into homogeneous acoustic segments, to remove non-speech segments and to assign bandwidth and gender labels to each segment. The audio partitioning procedure, introduced for the Nov 97 evaluation [4, 5] and used in the LIMSI Nov 98 Hub 4E system [9], is as follows: 1. First, the non-speech segments are detected (and rejected) using Gaussian mixture models (GMMs). Four GMMs, each with 64 Gaussians, serve to detect speech, pure music and other (background). All test segments labeled as music or ....

J.L. Gauvain, L. Lamel and G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. ICSLP'98, 5, pp. 1335-1338, Sydney, December 1998.


Recent Advances in Transcribing Television and Radio.. - Gauvain, Lamel, Adda..   (2 citations)  Self-citation (Gauvain Lamel Adda)   (Correct)

....hypotheses are used for cluster-based acoustic model adaptation using the MLLR technique. The final hypothesis is generated using a 4-gram interpolated with a category trigram model with 270 automatically generated word classes [8]. Our development work aimed at improving the partitioning algorithm [5, 6] and improving the acoustic and language models. The main differences from our Nov97 system are the use of additional acoustic and language model training data, the use of divisive decision tree clustering instead of agglomerative clustering for state tying, the generation of word graphs using ....

....the recognizer performance after each step on three test sets. All of our system development was carried out using the eval96 data. The frame level segmentation error (as used in [7]) was evaluated on the eval96 test data using the manual segmentation provided in the reference transcriptions [6]. The average speech/non-speech segmentation frame error rate on the 4 half-hour shows was 3.7%, and the gender label frame error was 1%. The first show had a significantly higher frame error rate of 7.9% due to deletion of a long, very noisy speech segment. In general more clusters are found than ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, Sydney, Dec. 1998.


The LIMSI 1998 Hub-4E Transcription System - Jean-Luc Gauvain Lori   Self-citation (Gauvain Lamel Adda)   (Correct)

....of the BN acoustic data. The recognition vocabulary contains 65K words and has a lexical coverage of over 99% on all evaluation test sets. A pronunciation graph is associated with each word, represented using a set of 48 phones. Our development work was aimed at improving the partitioning algorithm [4, 5] and improving the acoustic and language models. The main differences relative to our Nov97 system are the use of additional acoustic and language model training data, the use of divisive decision tree clustering instead of agglomerative clustering for state tying, the generation of word graphs ....

J.L. Gauvain, L. Lamel, G. Adda, "Partitioning and Transcription of Broadcast News Data," ICSLP'98, 5, pp. 1335-1338, Sydney, Dec. 1998.


Speaker Change Detection and Speaker Clustering Using VQ.. - MORI, NAKAGAWA (2001)   (1 citation)  (Correct)

No context found.

J-L. Gauvain, L. Lamel and G. Adda, "Partitioning and transcription of broadcast news data", Proc. ICSLP '98, pp. 1335-1338 (1998)


The Alert System: Advanced Broadcast Speech Recognition.. - Rigoll (2001)   (3 citations)  (Correct)

No context found.

J.L. Gauvain, L. Lamel, G. Adda. Partitioning and Transcription of Broadcast News Data. Proc ICSLP'98, Sydney, 1998.


Transcriber: Development and Use of a Tool for.. - Barras, Geoffrois.. (2000)   (18 citations)  (Correct)

No context found.

Gauvain, J.-L., Lamel, L., Adda, G., 1998. Partitioning and Transcription of Broadcast News Data. In: Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP'98), Sydney, Australia, 30 November-4 December 1998, pp. 1335-1338.


CiteSeer.IST - Copyright Penn State and NEC