S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Proc. Broadcast News Trans. and Under. Workshop, 1998.
....of these acoustic changes will correspond to speaker turns. The second step is then used to validate or discard these possible turns. The techniques used for the first step can be classified into three groups: phone decoding [3], hypothesis testing [4] and distance-based segmentation [5, 6]. Distance-based segmentation approaches have proved to be more robust for non-collaborative speaker segmentation, and thus, in this paper we will use an algorithm called DISTBIC (see [6] for details) to segment the audio data. DISTBIC is also a two-step segmentation technique, which is inspired ....
....have proved to be more robust for non-collaborative speaker segmentation, and thus, in this paper we will use an algorithm called DISTBIC (see [6] for details) to segment the audio data. DISTBIC is also a two-step segmentation technique, which is inspired by the BIC algorithm developed by IBM [5]. In the first step, the distance between adjacent windows is computed every 100 ms. This results in a distance signal d(t). In our implementation we use the symmetric Kullback-Leibler distance [7]. Then, the significant peaks of d(t) are considered as turn candidates. In the second step the turn ....
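The first DISTBIC pass described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a full-covariance Gaussian fitted to each window, takes the symmetric Kullback-Leibler distance as the sum of both KL directions (one common convention), works in frame indices (e.g. `step=10` would correspond to 100 ms at an assumed 10 ms frame rate), and replaces DISTBIC's own "significant peak" rule with a simple local-maximum-above-threshold test. `symmetric_kl`, `distance_signal` and `turn_candidates` are hypothetical names.

```python
import numpy as np


def symmetric_kl(x, y, ridge=1e-6):
    """Symmetric KL divergence (sum of both directions) between Gaussians fitted to x and y.

    x, y: arrays of shape (frames, dims). A small ridge keeps the covariances invertible.
    """
    d = x.shape[1]
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False) + ridge * np.eye(d)
    cov_y = np.cov(y, rowvar=False) + ridge * np.eye(d)
    inv_x, inv_y = np.linalg.inv(cov_x), np.linalg.inv(cov_y)
    diff = mu_x - mu_y
    # The log-determinant terms of KL(p||q) and KL(q||p) cancel in the sum.
    return 0.5 * (np.trace(inv_y @ cov_x) + np.trace(inv_x @ cov_y) - 2 * d
                  + diff @ (inv_x + inv_y) @ diff)


def distance_signal(frames, win, step):
    """d(t): distance between the `win` frames before t and the `win` frames after t."""
    times = np.arange(win, len(frames) - win + 1, step)
    dists = np.array([symmetric_kl(frames[t - win:t], frames[t:t + win]) for t in times])
    return times, dists


def turn_candidates(times, dists, threshold):
    """Local maxima of d(t) that exceed a threshold (simplified peak-picking rule)."""
    return [int(times[i]) for i in range(1, len(dists) - 1)
            if dists[i] > dists[i - 1] and dists[i] >= dists[i + 1] and dists[i] > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-D features: 300 frames from one source, then 300 from another.
    frames = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(3, 1, (300, 2))])
    times, dists = distance_signal(frames, win=100, step=10)
    print(turn_candidates(times, dists, threshold=5.0))  # expect a candidate near frame 300
```

The threshold here is arbitrary; in practice it would be tuned, and the candidates would still be passed to a second validation step.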
[Article contains additional citation context not shown here]
S. S. Chen and P. S. Gopalakrishnan, "Speaker environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Speech Recognition Workshop, Lansdowne, VA, Feb. 1998, pp. 127-132.
....[Figure 1: Sliding windows] .... to speaker turns. The second step is then used to validate or discard these possible turns. The techniques used for the first step can be classified into three groups: phone decoding [7, 8], hypothesis testing [14] and distance-based segmentation [12, 4, 5]. Distance-based segmentation approaches have proved to be more robust for non-collaborative speaker segmentation, and thus, in this paper we will use an algorithm called DISTBIC (see [5] for details) to segment the audio data. DISTBIC is also a two-step segmentation technique, which is inspired by ....
....have proved to be more robust for non-collaborative speaker segmentation, and thus, in this paper we will use an algorithm called DISTBIC (see [5] for details) to segment the audio data. DISTBIC is also a two-step segmentation technique, which is inspired by the BIC algorithm developed by IBM [4]. In the first step, the distance between adjacent windows is computed every 100 ms. This results in a distance signal d(t) (see Figure 1). In our implementation we use the symmetric Kullback-Leibler distance [12]. The significant peaks of d(t) are considered as turn candidates. In the second step ....
[Article contains additional citation context not shown here]
S. S. Chen and P. S. Gopalakrishnan. Speaker environment and channel change detection and clustering via the Bayesian information criterion. In DARPA Speech Recognition Workshop, pages 127-132, Lansdowne, VA, February 1998.
....adjacent windows is computed every 100 ms, yielding a distance signal d(t). In our implementation we use the symmetric Kullback-Leibler distance [4]. The significant peaks of d(t) are considered as turn candidates. In the second step the turn candidates are validated using the BIC criterion [7]. To that end, the acoustic vectors of adjacent segments are modeled separately using Gaussian models. The model of the union of the acoustic vectors of both segments is also computed, and the criterion is then used to check whether the likelihood of the union is greater than the likelihood of both ....
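The second-pass check described above (separate Gaussians for two adjacent segments versus one Gaussian for their union) is commonly expressed as a ΔBIC value; a minimal sketch follows. It assumes full-covariance Gaussians with maximum-likelihood covariances and a penalty weight `lam` (λ) defaulting to 1, and, as a simplification, judges each candidate against its neighbouring candidates in the original list (DISTBIC itself may behave differently after a rejection). `ml_logdet`, `delta_bic` and `validate_turns` are hypothetical names.

```python
import numpy as np


def ml_logdet(x):
    """log|Sigma| of the maximum-likelihood (biased) covariance of the rows of x."""
    _, logdet = np.linalg.slogdet(np.cov(x, rowvar=False, bias=True))
    return logdet


def delta_bic(x, y, lam=1.0):
    """Delta-BIC > 0 favours two Gaussians (keep the turn); <= 0 favours one (discard it)."""
    z = np.vstack([x, y])
    n, d = z.shape
    # Extra free parameters of the two-model hypothesis: one mean plus one full covariance.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * ml_logdet(z)
            - 0.5 * len(x) * ml_logdet(x)
            - 0.5 * len(y) * ml_logdet(y)
            - lam * penalty)


def validate_turns(frames, candidates, lam=1.0):
    """Keep each candidate whose surrounding segments give a positive Delta-BIC."""
    bounds = [0] + sorted(candidates) + [len(frames)]
    kept = []
    for i in range(1, len(bounds) - 1):
        left = frames[bounds[i - 1]:bounds[i]]
        right = frames[bounds[i]:bounds[i + 1]]
        if delta_bic(left, right, lam) > 0:
            kept.append(bounds[i])
    return kept
```

For synthetic frames whose statistics change only at frame 300, `validate_turns(frames, [150, 300, 450])` would be expected to keep just 300; raising `lam` makes the check stricter.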
S. S. Chen and P. S. Gopalakrishnan, "Speaker environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Speech Recognition Workshop, 1998.
....are more clusters than speakers, as a cluster can represent a speaker in a given acoustic environment. The second measure is the cluster purity, defined as the percentage of frames in the given cluster associated with the most represented speaker in the cluster. A similar measure was proposed in [1], but at the segment level. The table shows the weighted average cluster purities for the 4 shows. On average 96% of the data in a cluster comes from a single speaker. When clusters are impure, they tend to include speakers with similar acoustic conditions. The best cluster coverage is a ....
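The cluster purity defined above (the share of a cluster's frames that come from its most represented speaker) and its weighted average can be computed from frame-level labels as below. This is a sketch assuming one cluster label and one reference speaker label per frame; weighting each cluster by its size in frames is our reading of "weighted average" and may differ from the authors' exact weighting. The function names are hypothetical.

```python
from collections import Counter


def cluster_purities(cluster_of_frame, speaker_of_frame):
    """Per-cluster purity: share of a cluster's frames that come from its most represented speaker."""
    counts = {}
    for cluster, speaker in zip(cluster_of_frame, speaker_of_frame):
        counts.setdefault(cluster, Counter())[speaker] += 1
    return {c: max(cnt.values()) / sum(cnt.values()) for c, cnt in counts.items()}


def weighted_average_purity(cluster_of_frame, speaker_of_frame):
    """Average of per-cluster purities, each weighted by the cluster's size in frames."""
    purities = cluster_purities(cluster_of_frame, speaker_of_frame)
    sizes = Counter(cluster_of_frame)
    total = sum(sizes.values())
    return sum(purities[c] * sizes[c] for c in sizes) / total


if __name__ == "__main__":
    clusters = ["A"] * 4 + ["B"] * 6
    speakers = ["s1", "s1", "s1", "s2"] + ["s2"] * 6
    print(cluster_purities(clusters, speakers))         # {'A': 0.75, 'B': 1.0}
    print(weighted_average_purity(clusters, speakers))  # (0.75 * 4 + 1.0 * 6) / 10 = 0.9
```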
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", DARPA Broadcast News Transcription and Understanding Workshop, pp. 127-132, February 1998.
....impact on errors because if a detected boundary is somewhat displaced and falls outside the tolerance window, two errors will occur: a deletion and an insertion. This segmentation system is sufficiently accurate and at the same time much less computationally intensive than, for instance, the more widely used BIC [2, 3], which evaluates three full covariance matrices at each time frame. 3. SPEECH/NON-SPEECH DISCRIMINATION After the acoustic segmentation stage, each segment is classified using a speech/non-speech discriminator, tagging audio portions without speech, with too much noise, or with pure music. This ....
....and the two closest ones are considered for joining in a new cluster. Clusters are linked together until the distances exceed a predefined value. At that point the clustering ends. Several appropriate distance measures can be used, namely the KL2 [1], the generalized likelihood ratio or the BIC [2, 3]. Our first experiments were conducted using the KL2 metric to evaluate cluster distances. Later on, we developed a more efficient distance measure based on the BIC. The distance measure when comparing two clusters using the BIC can be stated as a model selection criterion where one model is ....
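The effect described above, where a boundary displaced beyond the tolerance window costs both a deletion and an insertion, shows up directly in a tolerance-based scorer. The sketch below greedily matches each reference boundary to the closest unmatched detection within ±`tolerance` and derives recall and precision from the matches; the greedy one-to-one matching and the name `score_boundaries` are assumptions, not a published scoring tool.

```python
def score_boundaries(reference, detected, tolerance):
    """Match reference and detected boundary times one-to-one within +/- tolerance."""
    unmatched = sorted(detected)
    hits = 0
    for ref in sorted(reference):
        # Closest detection not yet used by an earlier reference boundary.
        best = min(unmatched, key=lambda t: abs(t - ref), default=None)
        if best is not None and abs(best - ref) <= tolerance:
            hits += 1
            unmatched.remove(best)
    return {
        "hits": hits,
        "deletions": len(reference) - hits,   # reference boundaries never matched
        "insertions": len(detected) - hits,   # detections matching no reference boundary
        "recall": hits / len(reference) if reference else 1.0,
        "precision": hits / len(detected) if detected else 1.0,
    }


if __name__ == "__main__":
    # The boundary at 25.0 s is detected at 27.0 s, outside a 1 s tolerance:
    # it counts once as a deletion and once as an insertion.
    print(score_boundaries([10.0, 25.0], [10.3, 27.0], tolerance=1.0))
```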
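The bottom-up procedure described above (repeatedly merge the closest pair of clusters, stop once the smallest distance exceeds a predefined value) can be sketched with a ΔBIC-style distance between single full-covariance Gaussians. This is an illustration under assumptions: the ΔBIC form with penalty weight `lam`, a stopping threshold of 0, and pooling the frames of merged clusters are our choices, and the paper's own BIC-based distance may be defined differently. KL2 or a generalized likelihood ratio could be plugged in instead through the hypothetical `distance` argument.

```python
import numpy as np


def bic_distance(x, y, lam=1.0):
    """Delta-BIC between 'one Gaussian for x and y' and 'two Gaussians' (larger = farther apart)."""
    def logdet(a):
        return np.linalg.slogdet(np.cov(a, rowvar=False, bias=True))[1]
    z = np.vstack([x, y])
    n, d = z.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y)) - lam * penalty


def bottom_up_cluster(segments, distance=bic_distance, threshold=0.0):
    """Agglomerative clustering: merge the closest pair until the closest distance exceeds threshold."""
    clusters = [[i] for i in range(len(segments))]   # segment indices per cluster
    data = [np.asarray(s) for s in segments]          # pooled frames per cluster
    while len(clusters) > 1:
        dist, a, b = min((distance(data[i], data[j]), i, j)
                         for i in range(len(data)) for j in range(i + 1, len(data)))
        if dist > threshold:
            break  # even the closest pair is too far apart: clustering ends
        clusters[a] = clusters[a] + clusters[b]
        data[a] = np.vstack([data[a], data[b]])
        del clusters[b], data[b]  # b > a, so index a stays valid
    return clusters
```

The exhaustive pairwise search is quadratic per merge, which is acceptable for a sketch; a practical system would cache pairwise distances and update only those involving the merged cluster.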
S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Proc. Speech Recognition Workshop, 1998.
....identification. Our speaker clustering algorithm makes use of gender detection: speech segments with different gender classifications are clustered separately. We used bottom-up hierarchical clustering [4]. We developed an efficient distance measure based on the Bayesian Information Criterion (BIC) [7, 8]. An adjacency term is used instead of the BIC threshold [6]. Empirically, clusters having adjacent speech segments are closer in time, and their probability of belonging to the same speaker should be higher. Using this we obtained a cluster purity greater than 97% with a mean number of clusters per ....
S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Proc. Speech Recognition Workshop, 1998.
....step produces segments that contain more than one speaker, then the speaker models will be estimated incorrectly. Therefore, we are investigating a new speaker change detection method. Two approaches commonly used for speaker change detection are energy-based [1][2] and distance-based [3][4][5]. The first approach [1][2] assumes that the probability of a speaker change is higher around silence regions. It uses a speech/silence detector to identify the speaker change locations. The distance-based method searches for the speaker change candidates at the maxima of the distances ....
....label speech (CLD) and the total amount of speech (TS): $1 - \mathrm{CLD}/\mathrm{TS}$. Since overlapped speech regions belong to both speakers, they are not taken into account in the scoring process. 4. SYSTEM DESCRIPTION Most speaker segmentation systems use Mel-frequency cepstral coefficients [1][2][3][4][5]. However, in previous experiments using the development database, Line Spectral Pairs showed around 20% improvement over the Mel-frequency cepstral coefficients. Therefore we use 24 LSP [8] coefficients as features for both systems. They are computed every 10 ms using a 32 ms Hamming window. ....
S. Chen, P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian Information Criterion", DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, 1998.
....television broadcast news, a small part of which has been used in this work. According to our plans, by the end of this year about 100 hours of transcribed material will be available for development and evaluation purposes. 3. SEGMENTATION AND CLUSTERING The Bayesian Information Criterion (BIC) [2] is applied to segment the input audio stream into acoustically homogeneous chunks. Gaussian mixture models are then used to classify segments in terms of acoustic source and channel. Emission probability densities consist of mixtures of .... [Footnote: European Language Resources Association.]
....matrices. Observations are 39-dimensional vectors (see Section 4.1). Six classes are considered for classification: female/male wideband speech, female/male narrowband speech, pure music, and silence plus other non-speech events. Clustering of speech segments is done by a bottom-up scheme [2, 3] that groups segments which are acoustically close with respect to the BIC. As a result, this step should gather segments of the same speaker. To evaluate the segmentation algorithm in detecting the break points, recall and precision are computed with respect to target (manually annotated) ....
Chen, S. S. and Gopalakrishnan, P. S., "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", in Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, 1998.
....as follows: Non-speech passages were eliminated using a Gaussian Mixture Model (GMM) decoder that recognizes speech and non-speech. Subsequently, the passages of speech are divided at changes in speaker or background conditions using the Bayesian Information Criterion (BIC) as described in [Chen 1998]. The segmentation used in the 1997 HUB4 evaluation was based on gender-dependent phone decoders (PHONE DEC.) with additional non-speech units (see [Beyerlein 1998]). Approach / WER (%): PHONE DEC. SNN (1997): 22.6; GMM BIC bottom-up (1998): 21.0; NIST PE ideal cl.: 20.0. Table 2: Word ....
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", in Proc. DARPA Broadcast News Transcription and Understanding Workshop, VA, Feb. 1998.
....Each cluster is assumed to identify a speaker or, more precisely, a speaker in a given acoustic condition. The segmentation procedures can be classified into three approaches: those based on phone decoding [25, 31, 42], distance-based segmentation [29, 40] and methods based on hypothesis testing [12, 43]. Our partitioning approach, which is not based on such a two-step procedure, relies on an audio stream mixture model. Each component audio source, representing a speaker in a particular background and channel condition, is in turn modeled by a mixture of Gaussians. The segment boundaries and ....
....are more clusters than speakers, as a cluster can represent a speaker in a given acoustic environment. The second measure is the cluster purity, defined as the percentage of frames in the given cluster associated with the most represented speaker in the cluster. A similar measure was proposed in [12], but at the segment level. The table shows the weighted average cluster purities for the 4 shows. On average 96% of the data in a cluster comes from a single speaker. When clusters are impure, they tend to include speakers with similar acoustic conditions. The best cluster coverage is a ....
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. DARPA Broadcast News Transcription & Understanding Workshop, pp. 127-132, February 1998.
....been presented. Many of them include an algorithm for segmenting the audio stream based on some distance measure, like Hotelling's $T^2$ test [1], the Kullback-Leibler distance [2, 3], the generalized likelihood ratio [4], the entropy loss [3] and the Bayesian Information Criterion (BIC) [5, 6, 7, 8, 9, 10]. In the BIC, a threshold is usually introduced to tune the algorithm to the particular data under processing. This means that whenever the segmenter is ported to a .... [Footnote: This work was partially financed by the European Commission under the projects CORETEX (IST-1999-11876) and ECHO (IST-1999-11994).]
....Segmenting an audio stream means detecting the time indexes corresponding to changes in the nature of the audio, in order to isolate segments that are homogeneous in terms of bandwidth and speaker. Our technique bases segmentation on a statistical model selection criterion, by applying the BIC [13, 5]. Let $x_1, \dots, x_k, \dots, x_n$ be an ordered sample of data in $\mathbb{R}^d$, and let $k$ be the index of a supposed change. The decision goes through the definition of two different statistical models, a two-segment model $M_k$ which assumes $x_1, \dots, x_k \overset{\text{iid}}{\sim} \mathcal{N}_d(x; \mu_1, \Sigma_1)$ (1) and $x_{k+1}, \dots$ ....
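Comparing the two-segment model $M_k$ with a single-Gaussian model for every admissible $k$ gives a simple change detector: take the $k$ with the largest ΔBIC and accept it if that value is positive. The sketch below assumes full-covariance Gaussians with maximum-likelihood covariances, a penalty weight `lam`, and a hypothetical `min_len` guard so each side has enough frames for a stable covariance estimate. In the code, `x[:k]` holds $x_1, \dots, x_k$ and `x[k:]` holds $x_{k+1}, \dots, x_n$; the function names are illustrative only.

```python
import numpy as np


def ml_logdet(x):
    """log-determinant of the maximum-likelihood covariance of the rows of x."""
    return np.linalg.slogdet(np.cov(x, rowvar=False, bias=True))[1]


def best_change_point(x, lam=1.0, min_len=20):
    """Return (k, Delta-BIC) for the best split of x into x[:k] and x[k:], or (None, value) if <= 0."""
    n, d = x.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    full = 0.5 * n * ml_logdet(x)  # single-Gaussian model for the whole sample
    best_k, best_val = None, -np.inf
    for k in range(min_len, n - min_len + 1):
        # Two-segment model M_k: independent Gaussians before and after index k.
        val = (full - 0.5 * k * ml_logdet(x[:k])
               - 0.5 * (n - k) * ml_logdet(x[k:]) - lam * penalty)
        if val > best_val:
            best_k, best_val = k, val
    return (best_k, best_val) if best_val > 0 else (None, best_val)
```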
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian Information Criterion," in Proc. of the DARPA Broadcast News Transcr. & Underst. Workshop, Lansdowne, VA, 1998.
....distance between two consecutive parts of the speech signal have been investigated in [4, 5, 6]. These segmentation algorithms suffer from a lack of stability since they rely on thresholding distance values. A segmentation algorithm based on the Bayesian Information Criterion (BIC) is presented in [7], but it proves to require long speech segments. In this paper, we propose an algorithm which takes advantage of the two latter types of segmentation techniques. A first pass is performed, which essentially consists of a distance-based segmentation to detect the most likely speaker change points. ....
....general. It results in over-segmentation and hence short segments. A second pass is therefore introduced, which aims at refining this segmentation before performing hierarchical clustering. Consecutive segments will be merged if they satisfy the Bayesian Information Criterion (BIC) detailed in [7]. The BIC is a likelihood criterion penalized by the model complexity. Given $Z = \{z_1, \dots, z_{N_Z}\}$, a sequence of $N_Z$ cepstral acoustic vectors, and $L(Z; M)$, the likelihood of $Z$ for the model $M$, the BIC for the model $M$ is given by $\mathrm{BIC}(M) = \log L(Z; M) - \frac{m}{2} \log N_Z$, where $m$ is the number ....
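The criterion above, $\mathrm{BIC}(M) = \log L - \frac{m}{2}\log N$, can be evaluated directly for the two hypotheses involved in deciding whether to merge consecutive segments: one Gaussian for both, or one Gaussian each. A minimal sketch, assuming full-covariance Gaussians fitted by maximum likelihood (so $m = d + d(d+1)/2$ per Gaussian), natural logarithms, and an optional penalty weight `lam` that equals 1 in the formula as written. The function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(x):
    """Log-likelihood of the rows of x under their own ML full-covariance Gaussian."""
    n, d = x.shape
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False, bias=True)
    _, logdet = np.linalg.slogdet(cov)
    centered = x - mu
    # Per-frame Mahalanobis distance (x_i - mu)^T cov^{-1} (x_i - mu).
    mahal = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
    return -0.5 * np.sum(d * np.log(2 * np.pi) + logdet + mahal)


def n_params(d):
    """Free parameters of one d-dimensional Gaussian: d means + d(d+1)/2 covariance entries."""
    return d + d * (d + 1) // 2


def bic_one_gaussian(z, lam=1.0):
    n, d = z.shape
    return gaussian_loglik(z) - lam * 0.5 * n_params(d) * np.log(n)


def bic_two_gaussians(z1, z2, lam=1.0):
    n, d = len(z1) + len(z2), z1.shape[1]
    return (gaussian_loglik(z1) + gaussian_loglik(z2)
            - lam * 0.5 * 2 * n_params(d) * np.log(n))


def should_merge(z1, z2, lam=1.0):
    """Merge consecutive segments when the single-Gaussian model has the higher BIC."""
    return bic_one_gaussian(np.vstack([z1, z2]), lam) >= bic_two_gaussians(z1, z2, lam)
```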
[Article contains additional citation context not shown here]
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in DARPA speech recognition workshop, 1998.
....technique. Keywords: segmentation, likelihood ratio 1. INTRODUCTION Speaker-based segmentation consists of obtaining speaker-homogeneous segments: resulting segments should be as long as possible and related to a single speaker. This problem has received attention recently in the literature [1, 2, 3, 4, 5] since it can be used as a preliminary step in several indexing applications: news transcription tasks [6, 7], automatic grouping of speech messages [8] or speaker tracking [9, 10]. The segmentation algorithm we describe in this paper is designed to be embedded in a speaker-based indexing system. The ....
....[Footnote: This work was supported by the Centre National d'Etudes des Telecommunications (CNET) under grant no. 98 1B.] The Bayesian Information Criterion is used during a second pass to validate or discard the previously detected change points. This criterion has been used for segmentation by S. Chen in [3], but proves to require long speech segments (3 s). Section 2 details our segmentation technique. The performance of this segmentation algorithm is assessed in Section 3 with criteria described in Section 3.2. Results are discussed in Section 3.3, and a comparison with the algorithm proposed by S. Chen ....
[Article contains additional citation context not shown here]
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in DARPA speech recognition workshop, 1998.
....between multiple transformation classes are learned from the training set and used in testing to incorporate the adaptation data from neighboring classes for the estimation of the transform of a target class. For the Broadcast News task, clustering techniques with various distance metrics are used [4, 5, 6, 7] to find pools of adaptation data that will share transformations. Common to most of these approaches is that Text Independent Gaussian Mixture Models (TIGMMs) are used to characterize the individual adaptation data chunks (i.e. the smallest fragments of adaptation data). The second approach is ....
....found by the MLLR-based clustering algorithm reached 45.9% of this range in comparison to 38.9% using the TIGMM approach. Grouping the messages in 120 clusters using the supervisory information about speaker identity resulted in 43.2% of this range. Using the evaluation criterion reported in [5], cluster purity can be computed as the percentage of messages in a cluster that are from the most frequently represented speaker (dominating speaker) in that cluster. Figure 1 shows this metric for the 120-cluster configuration found by the TIGMM approach, Figure 2 for the MLLR-based approach. ....
S. Chen and P. S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proc. DARPA Speech Recognition Workshop, pp. 127-132, 1998.
....was used to break the input into chunks that are 20 to 30 seconds long. These chunks were chopped into segments 2 to 30 seconds long based on silences produced by a fast word recognizer. These segments were further refined using a speaker change detector. 4.2 Speaker change detection Motivated by [5], we explored both the BIC algorithm described in [5] and a related, but somewhat simpler, method based on Hotelling's $T^2$ test [3]. Assume $C(t)$, $0 \le t \le T$, is a feature sequence, $b$ is a speaker change point, and $\{C(t)\}_{0 \le t < b}$, $\{C(t)\}_{b \le t \le T}$ and $\{C(t)\}_{0 \le t \le T}$ are from Gaussian sources with $\mu_1$, $\mu_2$, ....
....30 seconds long. These chunks were chopped into segments 2 to 30 seconds long based on silences produced by a fast word recognizer. These segments were further refined using a speaker change detector. 4.2 Speaker change detection Motivated by [5], we explored both the BIC algorithm described in [5] and a related, but somewhat simpler, method based on Hotelling's $T^2$ test [3]. Assume $C(t)$, $0 \le t \le T$, is a feature sequence, $b$ is a speaker change point, and $\{C(t)\}_{0 \le t < b}$, $\{C(t)\}_{b \le t \le T}$ and $\{C(t)\}_{0 \le t \le T}$ are from Gaussian sources with $\mu_1$, $\mu_2$ and $\mu$ as means and $\Sigma_1$, $\Sigma_2$ and $\Sigma$ as covariances. In the ....
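Under the common-covariance assumption that Hotelling's $T^2$ test makes, a candidate change point $b$ can be scored by comparing the mean of the frames of $C(t)$ before $b$ with the mean after it; scanning $b$ and keeping the maximum gives a simple detector. The sketch below is an assumption-laden illustration rather than the cited system: it uses the pooled unbiased covariance of the two sides, a hypothetical `min_len` guard at the edges, and no decision threshold. Under the null hypothesis, $\frac{n_1+n_2-d-1}{d(n_1+n_2-2)}T^2$ follows an $F_{d,\,n_1+n_2-d-1}$ distribution, which is one way such a threshold could be set.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic, assuming x and y share one covariance matrix."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled unbiased covariance of the two sides.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def t2_change_point(c, min_len=20):
    """Scan b so that c[:b] and c[b:] are the two sides; return (b, T^2) at the maximum."""
    scores = {b: hotelling_t2(c[:b], c[b:]) for b in range(min_len, len(c) - min_len + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Unlike the BIC comparison, this statistic only looks at a shift in means, which is why it is cheaper: no per-side covariance determinants are needed.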
[Article contains additional citation context not shown here]
S. Chen and P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proceedings of the Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, pp. 127-132, Feb. 1998.
....follows: Non-speech passages were eliminated using a Gaussian Mixture Model (GMM) decoder that recognizes speech and non-speech. Subsequently, the passages of speech are divided at changes in speaker or background conditions using the Bayesian Information Criterion (BIC) as described in [Chen 1998]. The segmentation used in the 1997 HUB4 evaluation was based on gender-dependent phone decoders (PHONE DEC.) with additional non-speech units (see [Beyerlein 1998]). For a comparison of these two segmentations with the ideal NIST PE segmentation, the following quantities were ....
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", in Proc. DARPA Broadcast News Transcription and Understanding Workshop, VA, Feb. 1998.
....information available in the distance between the two clusters to which the segments belong. The interaction between the acoustic change detection and clustering systems gives us a substantial improvement over results previously reported on the 1997 Hub 4 Broadcast News test set that we employed [1][2]. Feedback of clustering information improved the Equal Error Rate (EER) of our acoustic change detection (ACD) system from 26.5% to 18%. 1. INTRODUCTION We have been working with the well-known Broadcast News database [4] of TV and radio recordings. This material contains multiple speakers, a ....
....break points from the hypothesis generator), an actual break point is inserted by comparing the fit of a single multidimensional Gaussian model for the entire segment with separate models for each side of the break. We compare these alternatives using the Bayesian Information Criterion (BIC) [1], a likelihood measurement penalized by the complexity of the assumed model. Given a set of $N$ vectors $X = \{x_i : i = 0, \dots, N-1\}$ that we are trying to represent through a model $M$, the BIC score would be $\mathrm{BIC}(M) = \log L(X; M) - \frac{\lambda}{2}\,\#(M)\,\log N$, where the penalty weight $\lambda$ should ....
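To give a sense of scale for the penalty term, the arithmetic below works through one case. It assumes a full-covariance Gaussian, so that $\#(M) = d + d(d+1)/2$, natural logarithms, and purely illustrative values: $d = 39$ (the feature dimension mentioned in one of the citing papers) and $N = 1000$ frames (hypothetical, e.g. 10 s at an assumed 10 ms frame rate).

```latex
\begin{aligned}
\#(M) &= d + \tfrac{d(d+1)}{2} = 39 + \tfrac{39 \cdot 40}{2} = 819,\\
\tfrac{\lambda}{2}\,\#(M)\,\ln N &= \tfrac{\lambda}{2} \cdot 819 \cdot \ln 1000 \approx 409.5 \cdot 6.9078\,\lambda \approx 2828.7\,\lambda.
\end{aligned}
```

When a two-Gaussian model is compared with a single Gaussian, the two-model hypothesis pays one extra $\#(M)$ in its penalty, so under these assumptions its log-likelihood gain must exceed roughly $2828.7\,\lambda$. This is one way to see why $\lambda$ behaves like a tuning threshold that may need adjusting for new data.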
[Article contains additional citation context not shown here]
Scott Shaobing Chen, P.S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion", 1998 DARPA Broadcast News Transcription & Understanding Workshop.
....N and the test is applied again. This process is repeated until a turn is found; when no turn is found and the window size exceeds a predefined size without a segment boundary, the earliest frames are dropped from consideration but their statistics are merged with those of the new window. See [4] for details. 4. AUDIO-BASED SPEAKER IDENTIFICATION Speaker enrollment is a prelude to audio-based speaker identification. Utterances, represented by mel-cepstral vectors, from each speaker of interest are modeled as a mixture of Gaussian distributions through a clustering process [5]. Let {M_i ....
S. S. Chen and P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering Via the Bayesian Information Criterion," Proc. DARPA Workshop, 1998, pp. 127-132.
....show, as a cluster can represent a speaker in a given acoustic environment. We looked at two measures of cluster homogeneity: the cluster purity, defined as the percentage of frames in the given cluster associated with the most represented speaker in the cluster (a similar measure was proposed in [2], but at the segment level), and the best cluster coverage, which is a measure of the dispersion of a given speaker's data across clusters. The average cluster purity for eval96 test data was 96%. Impure clusters tend to merge data with similar acoustic conditions. The best cluster coverage was ....
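The window-growing search described above can be sketched as a loop around a single-change detector. In this illustration, which is not IBM's implementation, the detector is a ΔBIC scan with full-covariance Gaussians; `n_init`, `n_grow`, `n_max`, `lam` and `min_len` are hypothetical parameter names with arbitrary defaults; and, as a simplification, frames dropped from the start of an over-long window are simply discarded rather than having their statistics merged into the new window as the text describes.

```python
import numpy as np


def _logdet(x):
    return np.linalg.slogdet(np.cov(x, rowvar=False, bias=True))[1]


def detect_change(w, lam=1.0, min_len=20):
    """Best Delta-BIC split index inside window w, or None if no split has Delta-BIC > 0.

    Requires len(w) >= 2 * min_len so that at least one split is admissible.
    """
    n, d = w.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    full = 0.5 * n * _logdet(w)
    scores = [(full - 0.5 * k * _logdet(w[:k]) - 0.5 * (n - k) * _logdet(w[k:]) - lam * penalty, k)
              for k in range(min_len, n - min_len + 1)]
    best_val, best_k = max(scores)
    return best_k if best_val > 0 else None


def window_growing_segmentation(frames, n_init=100, n_grow=50, n_max=400, lam=1.0, min_len=20):
    """Grow the analysis window until a turn is found; slide it forward once it reaches n_max."""
    turns, start, size = [], 0, n_init
    while start + size <= len(frames):
        k = detect_change(frames[start:start + size], lam, min_len)
        if k is not None:
            turns.append(start + k)          # absolute frame index of the turn
            start, size = start + k, n_init  # restart with a small window after the turn
        elif size + n_grow <= n_max:
            size += n_grow                   # no turn yet: grow the window by n_grow frames
        else:
            start += n_grow                  # window at maximum size: drop the earliest frames
    return turns
```

Growing the window lets short windows catch nearby turns cheaply, while the cap `n_max` bounds the cost of each ΔBIC scan.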
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", DARPA Broadcast News Transcription & Understanding Workshop, pp. 127-132, Feb. 1998.
....are more clusters than speakers, as a cluster can represent a speaker in a given acoustic environment. The second measure is the cluster purity, defined as the percentage of frames in the given cluster associated with the most represented speaker in the cluster. A similar measure was proposed in [1], but at the segment level. The table shows the weighted average cluster purities for the 4 shows. On average 96% of the data in a cluster comes from a single speaker. When clusters are impure, they tend to include speakers with similar acoustic conditions. The best cluster coverage is a ....
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", DARPA Broadcast News Transcription & Understanding Workshop, pp. 127-132, Feb. 1998.
....speech) or changes in background conditions occurring while a speaker is speaking. For example, hesitation or volume change may over-segment the speech of a single speaker. The segmentation engine uses the Bayesian Information Criterion (BIC) to partition the frames produced by the front end [8]. The basic problem may be viewed as a two-class classification where the objective is to determine whether $N$ consecutive audio frames constitute a single homogeneous window of frames (or segment) $W$ or two such windows, $W_1$ and $W_2$, with the boundary frame (or turn) occurring at the $i$th frame. In ....
S. S. Chen, et al. "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion." Proc. DARPA Workshop, 1998.
....clusters per file. There are more clusters than speakers, as a cluster can represent a speaker in a given acoustic environment. We define the cluster purity to be the percentage of frames in the given cluster coming from the most represented speaker in the cluster. A similar measure was proposed in [1], but at the segment level. The table shows the weighted average cluster purities for the 4 shows. When clusters are impure, they tend to include speakers with similar acoustic conditions. The best cluster coverage is a measure of the dispersion of a given speaker's data across clusters. We ....
S.S. Chen, P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", DARPA Broadcast News Transcription and Understanding Workshop, pp. 127-132, Feb. 1998.
....is recognized should also be segmented according to speaker identity. Automatic segmentation of an audio stream is frequently introduced as an efficient method to improve the performance of adaptive speech recognizers. The problem of acoustic segmentation has often been addressed over the past few years [7, 8, 11, 5, 6, 9]: it con.... [Footnote: This work was supported by the ESPRIT Long Term Research Project THISL (23495); http://www.dcs.shef.ac.uk/research/groups/spandh/projects/thisl] [Figure 1: Chop and Recluster ....]
....already been proposed to perform automatic segmentation of an audio stream according to speaker identity. These algorithms are generally based on a Chop (splitting) procedure followed by a Recluster (merging) procedure. There are many approaches to detect acoustic speaker changes, as reported in [6]: Decoder-based splitting. The segment boundaries are set according to information provided by a recognizer which first decodes the spoken audio stream (e.g. possible speaker changes are at every silence location). Model-based splitting. If speaker models are trained beforehand for every ....
[Article contains additional citation context not shown here]
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion", Proc. of DARPA Broadcast News Transcription and Understanding Workshop, 1998.
....algorithms based on a distance between two consecutive parts of the speech signal have been investigated in [5, 6, 7]. The problem, then, lies in the choice of a relevant threshold for distance values. A segmentation algorithm based on the Bayesian Information Criterion (BIC) is presented in [8], but it proves to require long speech segments. Our segmentation technique takes advantage of these two types of segmentation techniques. First, a distance-based segmentation, combined with a thresholding process that is as robust as possible, is performed to detect the most likely speaker change points. ....
....one speaker only. 2.1. The Bayesian Information Criterion procedure The first technique for dissimilarity measurement is based on the comparison of two parametric statistical models corresponding to two adjacent windows. This comparison is performed using the BIC computation proposed by Chen in [8]. The BIC is a likelihood criterion penalized by the model complexity. Given $X = \{x_1, \dots, x_{N_X}\}$, a sequence of $N_X$ acoustic vectors, and $L(X; M)$, the likelihood of $X$ for the model $M$, the BIC value is determined by $\mathrm{BIC}(M) = \log L(X; M) - \frac{m}{2} \log N_X$, where $m$ is the number of ....
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in DARPA speech recognition workshop, 1998.
....HMM states were built using the relatively clean data from the F0 and F1 conditions, whereas the Gaussian mixtures were trained on the complete set of training data. We also designed a successful automatic segmentation and clustering algorithm which is based on the Bayesian information criterion [5]. After baseline decoding, we performed iterative MLLR unsupervised adaptation on both means and variances [9] for each cluster. In this paper we present algorithmic improvements we have made this year. Some of the highlights are: Bayesian Information Criterion (BIC) applied to choosing the number ....
....compared with the existing phone set. The results were significantly worse (cf. Table 10), but, as seen in Section 5.4, it helped yield an improved system when mixed with other pre-existing systems using ROVER. 5. 1998 IBM SYSTEM 5.1. Segmentation We first applied the BIC change detection scheme [5] to detect acoustic changes in the data. According to the detected changes, the entire audio stream was chopped into turns. Because some turns were quite long, we further chopped each turn into smaller segments according to the silence information; the silence information was also used to prevent ....
[Article contains additional citation context not shown here]
S. Chen et al., "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. of DARPA Speech Recognition Workshop, Feb. 8-11, Lansdowne, VA, 1998.
....as described in section . The more detailed second-pass decode, which makes use of the adaptation transformation calculated for each cluster, is described in section . 5.1. Segmentation and Music Detection The Bayesian Information Criterion (BIC) is used to detect acoustic changes in the data [3]; the unpartitioned audio stream is divided into segments based on the times at which changes are detected. Once segmented, the data is classified as one of five acoustic conditions, one of which is pure music, by means of a Gaussian mixture classifier [4]. The single model for music segments ....
....to accumulate enough data to robustly perform adaptation, with one adaptation transformation estimated for each cluster. The segments are clustered using a maximum-linkage, bottom-up clustering procedure with a single Gaussian model for each segment and a log-likelihood ratio distance measure [3]. The bottom-up clustering procedure terminates where the BIC criterion reaches its maximum. The real-time factor is approximately . 5.3. First Pass Decode The first-pass decode, whose output serves as the input script for adaptation, is tuned to run at slightly less than 1.8 times real ....
S. Chen et al., "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Proc. of DARPA Speech Recognition Workshop, Feb. 8-11, Lansdowne, VA, 1998.
No context found.
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Proc. Broadcast News Trans. and Under. Workshop, 1998.
No context found.
S. S. Chen and P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering Via the Bayesian Information Criterion," Proc. DARPA Workshop, 1998, pp. 127-132.
No context found.
S.S. Chen and P.S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proceedings of DARPA, Lansdowne, VA, Feb. 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker Environment And Channel Change Detection And Clustering Via The Bayesian Information Criterion," in DARPA Speech Recognition Workshop Proc., 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Speech Recognition Workshop, Lansdowne, VA, Feb. 1998, pp. 127-132.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker environment and channel change detection and clustering via the Bayesian information criterion," in DARPA Speech Recognition Workshop, 1998.
No context found.
S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Tech. Rep., IBM T.J. Watson Research Center, 1998.
No context found.
S. Chen, P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian Information Criterion", Proc. DARPA Speech Recognition Workshop, pp. 127-132 (1998)
No context found.
S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Proc. Broadcast News Trans. and Under. Workshop, 1998.
No context found.
S. Chen, P. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via The Bayesian Information Criterion," Proc. Broadcast News Trans. & Under. Workshop, 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," IBM Technical Journal, 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Tech. Rep., IBM T.J. Watson Research Center, 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Tech. Rep., IBM T.J. Watson Research Center, 1998.
No context found.
S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in Proceedings of Broadcast News Transcription and Understanding Workshop, Feb 1998.
No context found.
S. S. Chen and P. S. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," Tech. Rep., IBM T.J. Watson Research Center, 1998.
CiteSeer.IST - Copyright Penn State and NEC