Results 1–10 of 73
Measuring statistical dependence with Hilbert-Schmidt norms
Proceedings of Algorithmic Learning Theory, 2005
"... We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the HilbertSchmidt norm of the crosscovariance operator (we term this a HilbertSchmidt Independence Criterion, or HSIC). Th ..."
Abstract

Cited by 157 (43 self)
 Add to MetaCart
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
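
For readers who want to experiment, here is a minimal sketch of the biased empirical HSIC estimate, HSIC_b = (n-1)^{-2} tr(K H L H), where H is the centering matrix; the Gaussian kernels and the bandwidths are illustrative choices, not prescribed by the abstract.

import numpy as np

def gaussian_gram(x, sigma):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    # biased estimate HSIC_b = (n-1)^{-2} tr(K H L H)
    n = x.shape[0]
    K, L = gaussian_gram(x, sigma_x), gaussian_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(hsic(x, x + 0.1 * rng.normal(size=(200, 1))),   # dependent: larger
      hsic(x, rng.normal(size=(200, 1))))             # independent: near zero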
Data driven image models through continuous joint alignment
PAMI, 2006
"... This paper presents a family of techniques that we call congealing for modeling image classes from data. The idea is to start with a set of images and make them appear as similar as possible by removing variability along the known axes of variation. This technique can be used to eliminate “nuisance ..."
Abstract

Cited by 87 (4 self)
 Add to MetaCart
(Show Context)
This paper presents a family of techniques that we call congealing for modeling image classes from data. The idea is to start with a set of images and make them appear as similar as possible by removing variability along the known axes of variation. This technique can be used to eliminate “nuisance” variables such as affine deformations from handwritten digits or unwanted bias fields from magnetic resonance images. In addition to separating and modeling the latent images—i.e., the images without the nuisance variables—we can model the nuisance variables themselves, leading to factorized generative image models. When nuisance variable distributions are shared between classes, one can share the knowledge learned in one task with another task, leading to efficient learning. We demonstrate this process by building a handwritten digit classifier from just a single example of each class. In addition to applications in handwritten character recognition, we describe in detail the application of bias removal from magnetic resonance images. Unlike previous methods, we use a separate, nonparametric model for the intensity values at each pixel. This allows us to leverage the data from the MR images of different patients to remove bias from each other. Only very weak assumptions are made about the distributions of intensity values in the images. In addition to the digit and MR applications, we discuss a number of other uses of congealing and describe experiments about the robustness and consistency of the method.
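
The congealing idea is easy to prototype. The toy sketch below aligns 1-D signals: each signal is greedily re-shifted to the integer shift that most lowers the summed per-position entropy of the stack. The real method optimizes continuous (e.g., affine) transforms on images; the 8-bin histogram entropy, integer shifts, and helper names here are illustrative simplifications.

import numpy as np

def stack_entropy(stack, bins=8):
    # sum over positions of the entropy of the values at that position
    total = 0.0
    for col in stack.T:
        p, _ = np.histogram(col, bins=bins, range=(0.0, 1.0))
        p = p[p > 0] / p.sum()
        total -= np.sum(p * np.log(p))
    return total

def congeal(signals, max_shift=5, sweeps=3):
    # signals: (num_signals, length) array with values in [0, 1]
    aligned, shifts = signals.copy(), np.zeros(len(signals), dtype=int)
    for _ in range(sweeps):
        for i in range(len(signals)):
            for s in range(-max_shift, max_shift + 1):
                trial = aligned.copy()
                trial[i] = np.roll(signals[i], s)
                if stack_entropy(trial) < stack_entropy(aligned):
                    shifts[i], aligned[i] = s, trial[i]
    return aligned, shifts

rng = np.random.default_rng(0)
base = np.exp(-0.5 * ((np.arange(32) - 16.0) / 2.0) ** 2)
signals = np.stack([np.roll(base, s) for s in rng.integers(-4, 5, size=20)])
aligned, shifts = congeal(signals)   # shifts roughly undo the random offsets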
A class of Rényi information estimators for multidimensional densities
Annals of Statistics, 2008
"... A class of estimators of the Rényi and Tsallis entropies of an unknown distribution f in R m is presented. These estimators are based on the kth nearestneighbor distances computed from a sample of N i.i.d. vectors with distribution f. We show that entropies of any order q, including Shannon’s entro ..."
Abstract

Cited by 63 (3 self)
 Add to MetaCart
(Show Context)
A class of estimators of the Rényi and Tsallis entropies of an unknown distribution f in R^m is presented. These estimators are based on the k-th nearest-neighbor distances computed from a sample of N i.i.d. vectors with distribution f. We show that entropies of any order q, including Shannon’s entropy, can be estimated consistently with minimal assumptions on f. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two distributions using one i.i.d. sample from each.
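
A sketch of a k-th nearest-neighbor estimator in the form the abstract describes, H_q = log((1/N) sum_i [(N-1) C_k V_m rho_{k,i}^m]^{1-q}) / (1-q), with V_m the volume of the unit m-ball and C_k = [Gamma(k)/Gamma(k+1-q)]^{1/(1-q)}; treat the exact constants as an assumption to check against the paper. Requires q != 1 and k + 1 - q > 0.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def renyi_entropy_knn(x, q, k=3):
    # x: (N, m) sample; rho: distance to each point's k-th neighbor
    n, m = x.shape
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]   # index 0 is the point itself
    v_m = np.pi ** (m / 2) / gamma(m / 2 + 1)     # volume of the unit m-ball
    c_k = (gamma(k) / gamma(k + 1 - q)) ** (1 / (1 - q))
    zeta = (n - 1) * c_k * v_m * rho ** m
    return np.log(np.mean(zeta ** (1 - q))) / (1 - q)

x = np.random.default_rng(1).normal(size=(5000, 2))
print(renyi_entropy_knn(x, q=0.9))   # compare with the closed form for a Gaussian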
Inference of non-overlapping camera network topology by measuring statistical dependence
In Proc. IEEE International Conference on Computer Vision, 2005
"... We present an approach for inferring the topology of a camera network by measuring statistical dependence between observations in different cameras. Two cameras are considered connected if objects seen departing in one camera are seen arriving in the other. This is captured by the degree of statisti ..."
Abstract

Cited by 60 (2 self)
 Add to MetaCart
(Show Context)
We present an approach for inferring the topology of a camera network by measuring statistical dependence between observations in different cameras. Two cameras are considered connected if objects seen departing in one camera are seen arriving in the other. This is captured by the degree of statistical dependence between the cameras. The nature of the dependence is characterized by the distribution of observation transformations between cameras, such as departure-to-arrival transition times and color appearance. We show how to measure statistical dependence when the correspondence between observations in different cameras is unknown. This is accomplished by nonparametric estimates of statistical dependence and Bayesian integration over the unknown correspondence. Our approach generalizes previous work, which assumed restricted parametric transition distributions and dealt with unknown correspondence only implicitly. Results are shown on simulated and real data. We also describe a technique for learning the absolute locations of the cameras with Global Positioning System (GPS) side information.
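
As a crude stand-in for the dependence test (the paper's actual method uses nonparametric dependence estimates with Bayesian integration over correspondences), one can pool all arrival-minus-departure delays between two cameras and ask whether their histogram is more peaked than under independence. The helper names, window, and permutation test below are illustrative assumptions.

import numpy as np

def delay_peakedness(departures, arrivals, max_delay=60.0, bins=30):
    # departures, arrivals: 1-D arrays of event times in the two cameras
    d = arrivals[None, :] - departures[:, None]
    d = d[(d > 0) & (d < max_delay)]
    p, _ = np.histogram(d, bins=bins, range=(0.0, max_delay), density=True)
    return p.max()   # a consistent transition time yields a peaked histogram

def cameras_connected(departures, arrivals, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    stat = delay_peakedness(departures, arrivals)
    null = [delay_peakedness(departures,
                             rng.uniform(arrivals.min(), arrivals.max(),
                                         size=arrivals.size))
            for _ in range(n_perm)]
    return stat > np.quantile(null, 0.95)   # reject independence at ~5%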
Kernel methods for measuring independence
Journal of Machine Learning Research, 2005
"... We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prov ..."
Abstract

Cited by 59 (19 self)
 Add to MetaCart
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
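
The constrained covariance has a compact empirical form: COCO = (1/n) sqrt(lambda_max(Kt @ Lt)), with Kt and Lt the doubly centered Gram matrices. The sketch below assumes that form, which follows from maximizing the empirical covariance over unit-norm RKHS functions; Gaussian kernels and bandwidths are illustrative.

import numpy as np

def gram(x, sigma=1.0):
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def coco(x, y, sigma=1.0):
    # empirical constrained covariance: (1/n) sqrt(lambda_max(Kt Lt))
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kt, Lt = H @ gram(x, sigma) @ H, H @ gram(y, sigma) @ H
    lam = np.max(np.real(np.linalg.eigvals(Kt @ Lt)))
    return np.sqrt(max(lam, 0.0)) / n

# coco(x, y) tends to zero for independent x, y when the kernel is universal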
Beyond independent components: trees and clusters
Journal of Machine Learning Research, 2003
"... We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a treestructured graphical model. This treedependent component analysi ..."
Abstract

Cited by 56 (0 self)
 Add to MetaCart
We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. This tree-dependent component analysis (TCA) provides a tractable and flexible approach to weakening the assumption of independence in ICA. In particular, TCA allows the underlying graph to have multiple connected components, and thus the method is able to find “clusters” of components such that components are dependent within a cluster and independent between clusters. Finally, we make use of a notion of graphical models for time series due to Brillinger (1996) to extend these ideas to the temporal setting. In particular, we are able to fit models that incorporate tree-structured dependencies among multiple time series.
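
The tree-fitting step inside TCA can be illustrated with the classical Chow-Liu construction: estimate pairwise mutual information between components, then take a maximum-weight spanning tree. The histogram MI estimate and helper names below are illustrative, and the paper's coupled search over linear transforms is omitted.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def hist_mi(a, b, bins=16):
    # plug-in mutual information from a 2-D histogram
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def chow_liu_tree(x):
    # x: (n_samples, d); returns edges of the max-MI spanning tree
    d = x.shape[1]
    w = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            # negate so a *minimum* spanning tree maximizes MI; the tiny
            # offset keeps near-zero-MI edges from vanishing as "no edge"
            w[i, j] = -(hist_mi(x[:, i], x[:, j]) + 1e-9)
    return list(zip(*minimum_spanning_tree(w).nonzero()))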
Efficient Variant of Algorithm FastICA for Independent Component Analysis Attaining the Cramér-Rao Lower Bound
IEEE Trans. Neural Networks, 2006
"... Abstract—FastICA is one of the most popular algorithms for independent component analysis (ICA), demixing a set of statistically independent sources that have been mixed linearly. A key question is how accurate the method is for finite data samples. We propose an improved version of the FastICA alg ..."
Abstract

Cited by 54 (5 self)
 Add to MetaCart
(Show Context)
FastICA is one of the most popular algorithms for independent component analysis (ICA), demixing a set of statistically independent sources that have been mixed linearly. A key question is how accurate the method is for finite data samples. We propose an improved version of the FastICA algorithm which is asymptotically efficient, i.e., its accuracy given by the residual error variance attains the Cramér–Rao lower bound (CRB). The error is thus as small as possible. This result is rigorously proven under the assumption that the probability distribution of the independent signal components belongs to the class of generalized Gaussian (GG) distributions with parameter α, denoted GG(α), for α > 2. We name the algorithm efficient FastICA (EFICA). Computational complexity of a Matlab implementation of the algorithm is shown to be only slightly (about three times) higher than that of the standard symmetric FastICA. Simulations corroborate these claims and show superior performance of the algorithm compared with the JADE algorithm of Cardoso and Souloumiac and the nonparametric ICA of Boscolo et al. on separating sources with distribution GG(α) with arbitrary α, as well as on sources with bimodal distribution, and a good performance in separating linearly mixed speech signals. Index Terms—Algorithm FastICA, blind deconvolution, blind source separation, Cramér–Rao lower bound (CRB), independent component analysis (ICA).
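
EFICA builds on the symmetric FastICA iteration, which is compact enough to sketch: after whitening, alternate the fixed-point update W <- E[g(WZ) Z^T] - diag(E[g'(WZ)]) W with symmetric orthogonalization. The tanh nonlinearity is one common choice; EFICA's component-adaptive nonlinearities and final refinement step, which yield the CRB-attaining accuracy, are omitted here.

import numpy as np

def symmetric_fastica(x, n_iter=100, seed=0):
    # x: (d, n) zero-mean mixtures; returns estimated sources and demixer
    d, n = x.shape
    e, v = np.linalg.eigh(x @ x.T / n)
    z = (v / np.sqrt(e)) @ v.T @ x                  # whitened data
    w = np.linalg.qr(np.random.default_rng(seed).normal(size=(d, d)))[0]
    for _ in range(n_iter):
        y = w @ z
        g, gp = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w = g @ z.T / n - np.diag(gp.mean(axis=1)) @ w
        u, _, vt = np.linalg.svd(w)                 # W <- (W W^T)^{-1/2} W
        w = u @ vt
    return w @ z, w

s = np.random.default_rng(2).laplace(size=(3, 5000))
x = np.array([[1.0, 0.5, 0.3], [0.2, 1.0, 0.4], [0.3, 0.2, 1.0]]) @ s
y, w = symmetric_fastica(x - x.mean(axis=1, keepdims=True))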
Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?
Journal of the Optical Society of America A, 2006
"... The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multiinformation reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), z ..."
Abstract

Cited by 37 (12 self)
 Add to MetaCart
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included in the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the ‘edge filters’ found with ICA lead only to a surprisingly small improvement in terms of its actual objective.
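
For an invertible linear transform y = Wx the comparison has a convenient form: since h(y) = h(x) + log|det W|, the multi-information reduction is Delta I = sum_i h(x_i) - sum_i h(y_i) + log|det W|, so only marginal entropies need to be estimated. The m-spacing (Vasicek) marginal estimator below is an illustrative choice, not the paper's.

import numpy as np

def vasicek_entropy(s, m=None):
    # m-spacing marginal entropy estimate from sorted samples
    s = np.sort(np.asarray(s, dtype=float))
    n = s.size
    m = m or max(1, int(np.sqrt(n)))
    hi = np.minimum(np.arange(n) + m, n - 1)
    lo = np.maximum(np.arange(n) - m, 0)
    return np.mean(np.log(n / (2 * m) * (s[hi] - s[lo]) + 1e-12))

def multiinfo_reduction(x, w):
    # Delta I = sum_i h(x_i) - sum_i h(y_i) + log|det w|, with y = x w^T
    y = x @ w.T
    hx = sum(vasicek_entropy(x[:, i]) for i in range(x.shape[1]))
    hy = sum(vasicek_entropy(y[:, i]) for i in range(x.shape[1]))
    return hx - hy + np.log(abs(np.linalg.det(w)))

A = np.array([[1.0, 0.6], [0.0, 1.0]])
s = np.random.default_rng(3).laplace(size=(10000, 2))
print(multiinfo_reduction(s @ A.T, np.linalg.inv(A)))   # positive reduction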
Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs
2010
"... We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample drawn from an unknown, absolutely continuous distribution over R d. The estimators are calculated as the sum of pth powers of the Euclidean lengths of the edges ..."
Abstract

Cited by 33 (2 self)
 Add to MetaCart
We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample drawn from an unknown, absolutely continuous distribution over R^d. The estimators are calculated as the sum of p-th powers of the Euclidean lengths of the edges of the ‘generalized nearest-neighbor’ graph of the sample and the empirical copula of the sample, respectively. For the first time, we prove the almost sure consistency of these estimators and upper bounds on their rates of convergence, the latter under the assumption that the density underlying the sample is Lipschitz continuous. Experiments demonstrate their usefulness in independent subspace analysis.
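
The estimator's shape, simplified here to the k-nearest-neighbor graph: with q = 1 - p/d, H_q is estimated as log(L_p / (gamma * N^{1-p/d})) / (1 - q), where L_p is the sum of the p-th powers of the graph's edge lengths. The constant gamma depends only on the graph family, p, and d; calibrating it by Monte Carlo on uniform samples (for which H_q = 0) is a shortcut assumed here, not necessarily the paper's procedure.

import numpy as np
from scipy.spatial import cKDTree

def edge_power_sum(x, k=3, p=1.0):
    # L_p: sum of p-th powers of the k-NN graph edge lengths
    d = cKDTree(x).query(x, k=k + 1)[0][:, 1:]   # drop self-distance
    return np.sum(d ** p)

def renyi_entropy_nn_graph(x, k=3, p=1.0, n_cal=2000, seed=0):
    n, dim = x.shape
    q = 1.0 - p / dim
    # calibrate gamma so the estimate is zero for Uniform([0,1]^dim)
    u = np.random.default_rng(seed).uniform(size=(n_cal, dim))
    g = edge_power_sum(u, k, p) / n_cal ** (1.0 - p / dim)
    return np.log(edge_power_sum(x, k, p) / (g * n ** (1.0 - p / dim))) / (1.0 - q)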
Nonlinear Extraction of Independent Components of Natural Images Using Radial Gaussianization
2009
"... We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transforma ..."
Abstract

Cited by 30 (5 self)
 Add to MetaCart
(Show Context)
We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent non-Gaussian sources. Here, we examine a complementary case, in which the source is non-Gaussian and elliptically symmetric. In this case, no invertible linear transform suffices to decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We then examine this methodology in the context of natural image statistics. We first show that distributions of spatially proximal bandpass filter responses are better described as elliptical than as linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either nearby pairs or blocks of bandpass filter responses is significantly greater than that achieved by ICA. Finally, we show that the RG transformation may be closely approximated by divisive normalization, which has been used to model the nonlinear response properties of visual neurons.
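
A sketch of the RG transform on a sample: whiten so an elliptical density becomes (approximately) spherical, then map each radius through g(r) = F_chi^{-1}(F_r(r)), where F_r is the empirical CDF of the radii and F_chi is the chi CDF with d degrees of freedom, the radial law of a standard d-dimensional Gaussian. The empirical-CDF details are illustrative.

import numpy as np
from scipy.stats import chi

def radial_gaussianize(x):
    # x: (n, d) sample from an (assumed) elliptically symmetric density
    n, d = x.shape
    e, v = np.linalg.eigh(np.cov(x, rowvar=False))
    z = (x - x.mean(axis=0)) @ v / np.sqrt(e)      # whiten to spherical
    r = np.linalg.norm(z, axis=1)
    u = (np.argsort(np.argsort(r)) + 0.5) / n      # empirical CDF of radii
    return z * (chi.ppf(u, df=d) / r)[:, None]     # remap radii, keep angles

g = np.random.default_rng(4).normal(size=(5000, 2))
scale = np.sqrt(np.random.default_rng(5).chisquare(df=3, size=5000) / 3)
x = g / scale[:, None]        # Student-t sample: elliptical, non-Gaussian
z = radial_gaussianize(x)     # z is approximately i.i.d. Gaussian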