Results 1–10 of 63
Information Theoretic Approach to Speaker Diarization of Meeting Data
IEEE Trans. on Audio, Speech, and Language Processing, 2009
"... Abstract—A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the par ..."
Cited by 27 (14 self)
Abstract—A speaker diarization system based on an information-theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is introduced arbitrarily, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant to the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen–Shannon divergence, as it arises from the optimization of the IB objective function. We discuss issues related to speaker diarization in this information-theoretic framework, such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. Because this approach is based mainly on non-parametric clustering, it runs significantly faster than the baseline HMM/GMM system, resulting in faster-than-real-time diarization. Index Terms—Information bottleneck (IB), meeting data, speaker diarization.
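The merge criterion the abstract mentions can be made concrete: under the IB objective, the distance between two speech-segment distributions reduces to a weighted Jensen–Shannon divergence. A minimal sketch in Python (the function name is ours, not the paper's):

```python
import numpy as np

def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence (in bits) between two discrete
    distributions p and q, e.g. relevance-variable posteriors of two
    speech segments considered for merging."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = w_p * p + w_q * q  # distribution of the hypothetical merged segment

    def kl(a, b):
        mask = a > 0  # 0 * log(0) = 0 by convention
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return w_p * kl(p, m) + w_q * kl(q, m)

# In agglomerative IB clustering, the pair of segments with the smallest
# weighted JS divergence is merged at each step.
```

Identical distributions give zero cost, and disjoint ones give the maximum (one bit at equal weights), which is what makes the quantity usable as a merge cost.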
Maximum likelihood and the information bottleneck
Advances in Neural Information Processing Systems 15, 2002
"... The information bottleneck (IB) method is an informationtheoretic formulation for clustering problems. Given a joint distribution ¢¤£¦¥¨§�©� � , this method constructs a new variable � that defines partitions over the values of � that are informative about �. Maximum likelihood (ML) of mixture mode ..."
Cited by 25 (4 self)
The information bottleneck (IB) method is an information-theoretic formulation for clustering problems. Given a joint distribution p(X, Y), this method constructs a new variable T that defines partitions over the values of X that are informative about Y. Maximum likelihood (ML) estimation of mixture models is a standard statistical approach to clustering problems. In this paper, we ask: how are the two methods related? We define a simple mapping between the IB problem and the ML problem for the multinomial mixture model. We show that under this mapping the problems are strongly related. In fact, for a uniform input distribution over X or for large sample size, the problems are mathematically equivalent. Specifically, in these cases, every fixed point of the IB functional defines a fixed point of the (log) likelihood and vice versa. Moreover, the values of the functionals at the fixed points are equal under simple transformations. As a result, in these cases, every algorithm that solves one of the problems induces a solution for the other.
Information bottleneck for Gaussian variables
in Advances in Neural Information Processing Systems 16, 2003
"... ∗ Both authors contributed equally The problem of extracting the relevant aspects of data was addressed through the information bottleneck (IB) method, by (soft) clustering one variable while preserving information about another relevance variable. An interesting question addressed in the current ..."
Cited by 23 (2 self)
The problem of extracting the relevant aspects of data was addressed through the information bottleneck (IB) method, by (soft) clustering one variable while preserving information about another, relevance, variable. An interesting question addressed in the current work is the extension of these ideas to obtain continuous representations (embeddings) that preserve relevant information, rather than discrete clusters. We give a formal definition of the general continuous IB problem and obtain an analytic solution for the optimal representation for the important case of multivariate Gaussian variables. The obtained optimal representation is a noisy linear projection onto eigenvectors of the normalized conditional covariance matrix Σ_{x|y} Σ_x^{-1}, which is also the basis obtained in Canonical Correlation Analysis. However, in Gaussian IB, the compression tradeoff parameter uniquely determines the dimension, as well as the scale of each eigenvector. This introduces a novel interpretation in which solutions of different ranks lie on a continuum parametrized by the compression level. Our analysis also provides an analytic expression for the optimal tradeoff curve (the information curve) in terms of the eigenvalue spectrum.
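The closed-form solution summarized above can be sketched numerically: estimate the covariances, eigendecompose Σ_{x|y} Σ_x^{-1}, and keep the directions whose eigenvalues λ_i satisfy λ_i < 1 − 1/β, the condition under which a direction becomes active at compression tradeoff β. The function name and the use of right eigenvectors are our own simplifications, not the paper's exact construction:

```python
import numpy as np

def gaussian_ib_basis(X, Y, beta):
    """Sketch of the Gaussian IB basis: eigenvectors of
    Sigma_{x|y} Sigma_x^{-1} that are 'active' at tradeoff beta.
    X: (n, dx) samples; Y: (n, dy) samples; rows are observations."""
    dx = X.shape[1]
    C = np.cov(np.hstack([X, Y]), rowvar=False)
    Sx, Sxy, Sy = C[:dx, :dx], C[:dx, dx:], C[dx:, dx:]
    # Conditional covariance of X given Y (Schur complement)
    Sx_given_y = Sx - Sxy @ np.linalg.solve(Sy, Sxy.T)
    M = Sx_given_y @ np.linalg.inv(Sx)
    vals, vecs = np.linalg.eig(M)
    vals, vecs = np.real(vals), np.real(vecs)  # M is similar to a symmetric PSD matrix
    order = np.argsort(vals)                   # smallest lambda = most informative direction
    vals, vecs = vals[order], vecs[:, order]
    keep = vals < 1.0 - 1.0 / beta             # active when beta > 1 / (1 - lambda_i)
    return vecs[:, keep]
```

As β grows, the threshold 1 − 1/β rises toward 1 and more eigenvectors become active, which is the continuum of ranks the abstract describes.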
Associative clustering for exploring dependencies between functional genomics data sets
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
"... ..."
(Show Context)
Spiking neurons can learn to solve information bottleneck problems and to extract independent components
Neural Computation, 2009
"... ..."
(Show Context)
PAC-Bayesian Analysis of Co-clustering and Beyond
"... We derive PACBayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as coclustering, matrix trifactorization, graphical models, graph clustering, and pairwise clustering. 1 We begin with the analysis of coclustering, which is a widely used approa ..."
Cited by 14 (7 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a tradeoff between its empirical performance and the mutual information preserved by the cluster variables about row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this tradeoff for discriminative prediction tasks. This algorithm achieved state-of-the-art performance on the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds, regularization ...
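The mutual-information regularizer that appears in these bounds has a simple form in one special case: for a hard clustering with each row deterministically assigned to one cluster and a uniform distribution over rows, I(C; Row) reduces to H(C), the entropy of the cluster-size distribution. A small illustration (the function name is hypothetical):

```python
import math
from collections import Counter

def cluster_information(assignments):
    """I(C; Row) in bits for a hard clustering of n rows under a uniform
    row distribution. Since H(C | Row) = 0 for deterministic assignments,
    I(C; Row) = H(C), the entropy of the cluster-size distribution."""
    n = len(assignments)
    counts = Counter(assignments)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A balanced 2-way split of 4 rows costs 1 bit; a single cluster costs 0,
# so the bound penalizes fine clusterings unless they improve empirical fit.
```

This makes the tradeoff in the abstract tangible: more, smaller clusters raise the information term and must pay for themselves in empirical performance.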
An information theoretic tradeoff between complexity and accuracy
In Proceedings of COLT, 2003
"... ..."
(Show Context)
Information bottleneck for non co-occurrence data
In Advances in Neural Information Processing Systems 19, 2007
"... We present a general modelindependent approach to the analysis of data in cases when these data do not appear in the form of cooccurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level ..."
Cited by 12 (5 self)
We present a general model-independent approach to the analysis of data in cases when the data do not appear in the form of co-occurrences of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; in movie ratings data, the rating Z is a function of viewer X and movie Y. The approach is a consistent extension of the Information Bottleneck method, which has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need for a sample of the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves on the best previously known results.
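Prediction of a missing Z value from cluster structure can be illustrated with a simple co-cluster mean estimator. This is our own minimal sketch, not the paper's exact predictor: once rows and columns are clustered, a missing Z[i, j] is estimated from the observed entries sharing i's row cluster and j's column cluster.

```python
import numpy as np

def cocluster_predict(Z, row_c, col_c, i, j):
    """Predict the missing entry Z[i, j] as the mean of observed entries
    in the co-cluster (row cluster of i, column cluster of j).
    Z: matrix with np.nan marking missing values.
    row_c, col_c: hard cluster assignments for rows and columns."""
    rows = np.asarray(row_c) == row_c[i]
    cols = np.asarray(col_c) == col_c[j]
    block = Z[np.ix_(rows, cols)]
    observed = block[~np.isnan(block)]
    if observed.size == 0:                 # empty co-cluster: fall back
        observed = Z[~np.isnan(Z)]        # to the global observed mean
    return float(observed.mean())
```

For a ratings matrix, this amounts to predicting a viewer's rating of a movie from how similar viewers rated similar movies, which is the collaborative-filtering use case the abstract refers to.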
Cluster ranking with an application to mining mailbox networks
In ICDM ’06: Proceedings of the Sixth International Conference on Data Mining, 2006
"... We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters—the integrated cohesion—which ..."
Cited by 11 (2 self)
We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present CRank, a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, CRank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of CRank and show that it is likely to have high precision and recall. A main component of CRank is a heuristic algorithm for finding sparse vertex separators. At the core of this algorithm is a new connection between the well-known measure of vertex betweenness and multicommodity flow. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. CRank is well suited to mining such networks, since they are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of CRank on the Enron data set, consisting of 130 mailbox networks.
Learning and generalization with the information bottleneck method
2008
"... The Information Bottleneck (IB) method, introduced in [22], is an informationtheoretic framework for extracting relevant components of an ‘input ’ random variable X, with respect to an ‘output ’ random variable Y. This is performed by finding a compressed, nonparametric and modelindependent repres ..."
Cited by 9 (2 self)
The Information Bottleneck (IB) method, introduced in [22], is an information-theoretic framework for extracting relevant components of an ‘input’ random variable X with respect to an ‘output’ random variable Y. This is performed by finding a compressed, non-parametric, and model-independent representation ...