Results 1 – 10 of 25
Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
 Journal of Machine Learning Research
, 2004
"... We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a lowdimensional ..."
Abstract

Cited by 162 (34 self)
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
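The operator-theoretic contrast function itself is not reproduced here, but the search it drives can be illustrated. The sketch below scores candidate projection directions of X with a simpler kernel dependence measure (an HSIC-style statistic, used as a stand-in for the paper's conditional-covariance contrast; the data and names are illustrative) and keeps the best-scoring direction.

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    # RBF Gram matrix of a sample (rows are observations).
    v = v.reshape(len(v), -1)
    sq = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def center(K):
    # Double-center a Gram matrix (equivalent to H K H with H = I - 11^T/n).
    return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()

rng = np.random.default_rng(0)
n, d = 300, 5
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)  # Y depends on X only through x_1

Ky = center(rbf_gram(y))

def dependence(x_proj):
    # HSIC-style dependence between a candidate projection of X and Y:
    # trace(H Kx H Ky) / n^2, computed as an elementwise sum.
    return np.sum(rbf_gram(x_proj) * Ky) / n ** 2

# Crude search over the subspace: score random unit directions, keep the strongest.
dirs = rng.standard_normal((200, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
best = dirs[int(np.argmax([dependence(X @ b) for b in dirs]))]
print(np.round(best, 2))  # tends to load mostly on the first coordinate
```

The paper replaces both ingredients with principled versions: a conditional-covariance-operator contrast instead of HSIC, and gradient-based optimization over the Stiefel manifold instead of random search.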
Beyond independent components: trees and clusters
 Journal of Machine Learning Research
, 2003
"... We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a treestructured graphical model. This treedependent component analysi ..."
Abstract

Cited by 56 (0 self)
We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. This tree-dependent component analysis (TCA) provides a tractable and flexible approach to weakening the assumption of independence in ICA. In particular, TCA allows the underlying graph to have multiple connected components, and thus the method is able to find “clusters” of components such that components are dependent within a cluster and independent between clusters. Finally, we make use of a notion of graphical models for time series due to Brillinger (1996) to extend these ideas to the temporal setting. In particular, we are able to fit models that incorporate tree-structured dependencies among multiple time series.
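The tree-fitting step at the heart of such a model is classically a maximum-weight spanning tree over pairwise mutual informations (the Chow-Liu construction). A minimal sketch under a Gaussian approximation of the mutual information, not the paper's full TCA procedure:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_edges(X):
    """Best tree over the columns of X: maximum-weight spanning tree
    on pairwise mutual information (Gaussian approximation)."""
    rho = np.corrcoef(X, rowvar=False)
    mi = -0.5 * np.log(1.0 - np.clip(rho ** 2, 0.0, 1.0 - 1e-12))
    np.fill_diagonal(mi, 0.0)
    # scipy finds a *minimum* spanning tree, so negate the weights.
    mst = minimum_spanning_tree(-mi).tocoo()
    return sorted((min(i, j), max(i, j)) for i, j in zip(mst.row, mst.col))

rng = np.random.default_rng(1)
x0 = rng.standard_normal(1000)
x1 = x0 + 0.3 * rng.standard_normal(1000)  # chain: x0 -> x1 -> x2
x2 = x1 + 0.3 * rng.standard_normal(1000)
print(chow_liu_edges(np.column_stack([x0, x1, x2])))  # [(0, 1), (1, 2)]
```

TCA additionally allows the "forest" case (multiple connected components), which this sketch would need a weight threshold to express.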
Learning metrics via discriminant kernels and multidimensional scaling: Toward expected Euclidean representation
 In: ICML ’03 (Hong Kong)
, 2003
"... Distancebased methods in machine learning and pattern recognition have to rely on a metric distance between points in the input space. Instead of specifying a metric a priori, we seek to learn the metric from data via kernel methods and multidimensional scaling (MDS) techniques. Under the classifi ..."
Abstract

Cited by 23 (2 self)
Distance-based methods in machine learning and pattern recognition have to rely on a metric distance between points in the input space. Instead of specifying a metric a priori, we seek to learn the metric from data via kernel methods and multidimensional scaling (MDS) techniques. Under the classification setting, we define discriminant kernels on the joint space of input and output spaces and present a specific family of discriminant kernels. This family of discriminant kernels is attractive because the induced metrics are Euclidean and Fisher separable, and MDS techniques can be used to find the low-dimensional Euclidean representations (also called feature vectors) of the induced metrics. Since the feature vectors incorporate information from both input points and their corresponding labels and they enjoy Fisher separability, they are appropriate to be used in distance-based classifiers.
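The kernel → metric → embedding pipeline can be sketched with classical MDS applied to the distance induced by any positive-definite kernel matrix. The paper's discriminant kernels on the joint input-output space are not reproduced here; a plain linear kernel serves as a generic stand-in:

```python
import numpy as np

def kernel_to_mds(K, dim=2):
    """Classical MDS coordinates for the metric d(i,j)^2 = K_ii + K_jj - 2 K_ij
    induced by a positive-definite kernel matrix K."""
    n = K.shape[0]
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ d2 @ J                    # double centering
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:dim]        # keep the top eigenpairs
    return V[:, order] * np.sqrt(np.clip(w[order], 0, None))

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 2))
Z = kernel_to_mds(X @ X.T, dim=2)            # linear kernel: recovers the geometry of X
# Pairwise distances of the embedding match those of X (up to rotation).
err = np.abs(np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
             - np.linalg.norm(X[:, None] - X[None, :], axis=-1)).max()
print(err)
```

With a discriminant kernel in place of `X @ X.T`, the resulting coordinates would be the label-aware feature vectors the abstract describes.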
Learning graphical models for stationary time series
 IEEE Transactions on Signal Processing, 52(8):2189–2199
, 2004
"... Probabilistic graphical models can be extended to time series by considering probabilistic dependencies between entire time series. For stationary Gaussian time series, the graphical model semantics can be expressed naturally in the frequency domain, leading to interesting families of structured tim ..."
Abstract

Cited by 19 (0 self)
Probabilistic graphical models can be extended to time series by considering probabilistic dependencies between entire time series. For stationary Gaussian time series, the graphical model semantics can be expressed naturally in the frequency domain, leading to interesting families of structured time series models that are complementary to families defined in the time domain. In this paper, we present an algorithm to learn the structure from data for directed graphical models for stationary Gaussian time series. We describe an algorithm for efficient forecasting for stationary Gaussian time series whose spectral densities factorize in a graphical model. We also explore the relationships between graphical model structure and sparsity, comparing and contrasting the notions of sparsity in the time domain and the frequency domain. Finally, we show how to make use of Mercer kernels in this setting, allowing our ideas to be extended to nonlinear models.
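The frequency-domain semantics can be made concrete: for stationary Gaussian series, a missing edge in the (undirected) graph corresponds to a near-zero entry of the inverse spectral density matrix at every frequency. A rough sketch using Welch cross-spectral estimates on synthetic data; this illustrates the semantics only, not the paper's structure-learning algorithm:

```python
import numpy as np
from scipy.signal import csd

rng = np.random.default_rng(3)
n = 20000
x = rng.standard_normal(n)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(n)  # chain x -> y -> z,
z = np.roll(y, 1) + 0.5 * rng.standard_normal(n)  # so x and z are independent given y
series = [x, y, z]

# Welch estimate of the 3x3 spectral density matrix S(f).
nper = 256
S = np.empty((3, 3, nper // 2 + 1), dtype=complex)
for i in range(3):
    for j in range(3):
        _, S[i, j] = csd(series[i], series[j], nperseg=nper)

# Off-diagonals of S(f)^{-1}, averaged over frequency: small <=> missing edge.
P = np.array([np.linalg.inv(S[:, :, f]) for f in range(S.shape[2])])
strength = np.abs(P).mean(axis=0)
print(np.round(strength, 2))  # the (x, z) entry is the smallest off-diagonal
```

This mirrors Gaussian graphical models in the i.i.d. setting, where zeros of the precision matrix encode conditional independence; here the precision matrix varies with frequency.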
The kernel mutual information
 In IEEE ICASSP
, 2003
"... We introduce a new contrast function, the kemel mutual information (KMIj, to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estima ..."
Abstract

Cited by 17 (4 self)
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan’s kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.
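The KMI itself is not reconstructed here, but a closely related kernel independence statistic from the same family, the biased HSIC estimator (named plainly as a stand-in), shows the general shape of such a contrast function: near zero for independent samples, clearly positive under dependence, even dependence that correlation misses:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC statistic: zero in expectation iff x and y are independent
    (for a characteristic kernel such as the RBF used here)."""
    def gram(v):
        v = v.reshape(len(v), 1)
        K = np.exp(-(v - v.T) ** 2 / (2 * sigma ** 2))
        n = len(v)
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H
    n = len(x)
    return np.trace(gram(x) @ gram(y)) / n ** 2

rng = np.random.default_rng(4)
n = 500
x = rng.standard_normal(n)
y_dep = x ** 2 + 0.1 * rng.standard_normal(n)  # dependent on x, yet uncorrelated with it
y_ind = rng.standard_normal(n)                 # independent of x
print(hsic(x, y_dep), hsic(x, y_ind))          # dependent pair scores higher
```

The KMI and KGV of the abstract play the same role as this statistic but come with the stated bound relationships to the mutual information.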
Probabilistic Distance Measures in Reproducing Kernel Hilbert Space
, 2004
"... Probabilistic distance measures are important quantities in many research areas. For example, the Chernoff distance (or the Bhattacharyya distance as its special example) is often used to bound the Bayes error in a pattern classification task and the KullbackLeibler (KL) distance is a key quantity ..."
Abstract

Cited by 4 (2 self)
Probabilistic distance measures are important quantities in many research areas. For example, the Chernoff distance (or the Bhattacharyya distance as its special example) is often used to bound the Bayes error in a pattern classification task and the Kullback-Leibler (KL) distance is a key quantity in the information theory literature. However, computing these distances is a difficult task and analytic solutions are not available except under some special circumstances. One popular example is the Gaussian density. The Gaussian density employs only up to second-order statistics and hence is rather limited. In this paper, we enhance this capacity through a nonlinear mapping from original data space to reproducing kernel Hilbert space, which is implemented by a kernel embedding. Since this mapping is nonlinear, we present a new approach to study these distances whose feasibility and efficiency are demonstrated using experiments.
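For the Gaussian special case mentioned above, both distances do admit closed forms; a small sketch of those standard formulas in the original data space (no kernel mapping):

```python
import numpy as np

def kl_gauss(m0, S0, m1, S1):
    """KL divergence KL(N(m0, S0) || N(m1, S1)) between multivariate Gaussians."""
    d = len(m0)
    S1inv = np.linalg.inv(S1)
    dm = m1 - m0
    return 0.5 * (np.trace(S1inv @ S0) + dm @ S1inv @ dm - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def bhattacharyya_gauss(m0, S0, m1, S1):
    """Bhattacharyya distance between multivariate Gaussians."""
    S = 0.5 * (S0 + S1)
    dm = m1 - m0
    return (dm @ np.linalg.inv(S) @ dm / 8.0
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S0) * np.linalg.det(S1))))

# Unit-variance Gaussians one mean-shift apart.
m0, S0 = np.zeros(2), np.eye(2)
m1, S1 = np.array([1.0, 0.0]), np.eye(2)
print(kl_gauss(m0, S0, m1, S1))             # 0.5
print(bhattacharyya_gauss(m0, S0, m1, S1))  # 0.125
```

The paper's contribution is, in effect, to evaluate such Gaussian formulas after an implicit nonlinear feature map, recovering non-Gaussian structure while keeping the computation tractable.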
A comprehensive assessment of methods for de-novo reverse-engineering of genome-scale regulatory networks
 Genomics
, 2011
"... Denovo reverseengineering of genomescale regulatory networks is an increasingly important objective for biological and translational research. While many methods have been recently developed for this task, their absolute and relative performance remains poorly understood. The present study condu ..."
Abstract

Cited by 3 (0 self)
De-novo reverse-engineering of genome-scale regulatory networks is an increasingly important objective for biological and translational research. While many methods have been recently developed for this task, their absolute and relative performance remains poorly understood. The present study conducts a rigorous performance assessment of 32 computational methods/variants for de-novo reverse-engineering of genome-scale regulatory networks by benchmarking these methods in 15 high-quality datasets and gold-standards of experimentally verified mechanistic knowledge. The results of this study show that some methods need to be substantially improved upon, while others should be used routinely. Our results also demonstrate that several univariate methods provide a "gatekeeper" performance threshold that should be applied when method developers assess the performance of their novel multivariate algorithms. Finally, the results of this study can be used to show practical utility and to establish guidelines for everyday use of reverse-engineering algorithms, aiming towards creation of automated data-analysis protocols and software systems.
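A typical univariate "gatekeeper" amounts to ranking all gene pairs by a simple association score, such as absolute Spearman correlation. An illustrative baseline with made-up gene names, not one of the 32 benchmarked methods:

```python
import numpy as np
from scipy.stats import spearmanr

def univariate_ranking(expr, gene_names):
    """Rank all gene pairs by absolute Spearman correlation: a simple
    univariate baseline for network reverse-engineering."""
    n_genes = expr.shape[1]
    pairs = []
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            rho, _ = spearmanr(expr[:, i], expr[:, j])
            pairs.append((abs(rho), gene_names[i], gene_names[j]))
    return sorted(pairs, reverse=True)

# Hypothetical expression data: a regulator, its target, and an unrelated gene.
rng = np.random.default_rng(5)
tf = rng.standard_normal(200)
target = 2 * tf + 0.5 * rng.standard_normal(200)
noise = rng.standard_normal(200)
ranking = univariate_ranking(np.column_stack([tf, target, noise]),
                             ["TF", "TARGET", "NOISE"])
print(ranking[0][1:])  # the true regulatory pair ranks first
```

A multivariate method should beat this ranking on the study's gold-standards before its added complexity is considered worthwhile.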
Causal discovery from databases with discrete and continuous variables.
 In Probabilistic Graphical Models,
, 2014
"... ..."
(Show Context)
New d-separation identification results for learning continuous latent variable models
 Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... Learning the structure of graphical models is an important task, but one of considerable difficulty when latent variables are involved. Because conditional independences using hidden variables cannot be directly observed, one has to rely on alternative methods to identify the dseparations that defi ..."
Abstract

Cited by 2 (2 self)
Learning the structure of graphical models is an important task, but one of considerable difficulty when latent variables are involved. Because conditional independences using hidden variables cannot be directly observed, one has to rely on alternative methods to identify the d-separations that define the graphical structure. This paper describes new distribution-free techniques for identifying d-separations in continuous latent variable models when nonlinear dependencies are allowed among hidden variables.
A strategy for making predictions under manipulation
 In JMLR Workshop and Conference Proceedings, Volume 3: WCCI 2008 Causality Challenge, Hong Kong
"... Our submission for the LOCANET challenge relied on the results and procedures of the first causality challenge, from which the local networks were pruned. Details to the approach used for the first causality challenge are available in the paper for that challenge (available at the DSL website) but a ..."
Abstract

Cited by 2 (0 self)
Our submission for the LOCANET challenge relied on the results and procedures of the first causality challenge, from which the local networks were pruned. Details of the approach used for the first causality challenge are available in the paper for that challenge (available at the DSL website), but a general overview of the method and of how its results are used for this task is presented here.

Preprocessing: The preprocessing was tailored to each data set. For the REGED data set each variable was normalized so its mean was zero and its standard deviation was one. For the SIDO data set, the variables were binary and no preprocessing was performed. For the CINA data set, variables that were not binary were treated as continuous and normalized; binary variables were all set to values of zero and one. For the MARTI data set, the preprocessed data by Dr. Guyon available on the challenge website was used.

Causal discovery: Once the initial data sets had been preprocessed, the next step of our procedure was to identify the skeleton structure of the Bayesian network around the target variable recursively using the MMPC algorithm, up to three edges away from the target. This region of interest makes it practical to apply causal algorithms that cannot
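The per-dataset preprocessing rule described above (z-score continuous variables, keep binary variables as 0/1) can be sketched as follows; the function and data are illustrative, not the authors' code:

```python
import numpy as np

def preprocess(X):
    """Z-score continuous columns; recode binary columns onto {0, 1}
    (mirrors the per-dataset normalization described above)."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        if len(vals) == 2:
            # Binary column: map its two levels to 0 and 1, no scaling.
            X[:, j] = (X[:, j] == vals[-1]).astype(float)
        elif len(vals) > 2:
            # Continuous column: zero mean, unit standard deviation.
            X[:, j] = (X[:, j] - X[:, j].mean()) / X[:, j].std()
    return X

rng = np.random.default_rng(6)
data = np.column_stack([10 + 3 * rng.standard_normal(100),  # continuous column
                        rng.integers(0, 2, 100)])           # binary column
out = preprocess(data)
print(out[:, 0].mean(), out[:, 0].std())  # mean ~ 0, std ~ 1; binary column untouched
```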