Results 1–10 of 15
Constrained Bayesian Inference for Low Rank Multitask Learning
Cited by 5 (2 self)
Abstract
We present a novel approach for constrained Bayesian inference. Unlike current methods, our approach does not require convexity of the constraint set. We reduce constrained variational inference to a parametric optimization over the feasible set of densities and propose a general recipe for such problems. We apply the proposed constrained Bayesian inference approach to multitask learning subject to rank constraints on the weight matrix. Further, constrained parameter estimation is applied to recover the sparse conditional independence structure encoded by prior precision matrices. Our approach is motivated by reverse inference for high-dimensional functional neuroimaging, a domain where the high dimensionality and small number of examples require the use of constraints to ensure meaningful and effective models. For this application, we propose a model that jointly learns a weight matrix and the prior inverse covariance structure between different tasks. We present experimental validation showing that the proposed approach outperforms strong baseline models in terms of predictive performance and structure recovery.
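One concrete reason the rank constraint above is non-convex: the set of matrices with rank at most r is not convex, yet the Euclidean projection onto it has a closed form via truncated SVD. The sketch below (plain NumPy; an illustration of that projection, not the paper's inference procedure) shows how a multitask weight matrix can be projected onto the rank-r set:

```python
import numpy as np

def project_rank(W, r):
    """Project W onto the (non-convex) set of matrices with rank <= r
    via truncated SVD -- the Euclidean projection onto that set."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s[r:] = 0.0  # keep only the r largest singular values
    return (U * s) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))   # e.g. 6 tasks, 8 features
W_low = project_rank(W, 2)
print(np.linalg.matrix_rank(W_low))  # 2
```

Projected-gradient-style schemes interleave such a projection with ordinary update steps; the projection is idempotent, so a matrix already in the feasible set is left unchanged.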
The Bigraphical Lasso
Cited by 2 (1 self)
Abstract
The i.i.d. assumption in machine learning is endemic, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix normals based on the Cartesian product of graphs. A prominent product in spectral graph theory, this structure has appealing properties for regression, enhanced sparsity and interpretability. To deal with the parameter explosion we introduce ℓ1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions. We demonstrate the performance of our approach with simulations and an example from the COIL image data set.
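For concreteness, the Cartesian-product (Kronecker-sum) structure on the precision can be sketched in NumPy as follows. The full precision over vec(X) has (np)² entries but only n² + p² free parameters, which is the parameter saving the abstract alludes to (a sketch under one common vec convention; this is not the BiGLasso estimator itself):

```python
import numpy as np

def kronecker_sum(Psi, Theta):
    """Precision over vec(X) for an n x p matrix X whose rows couple
    through Psi (n x n) and columns through Theta (p x p):
    Omega = Psi (+) Theta = Psi (x) I_p + I_n (x) Theta."""
    n, p = Psi.shape[0], Theta.shape[0]
    return np.kron(Psi, np.eye(p)) + np.kron(np.eye(n), Theta)

Psi = np.array([[2.0, -1.0],               # row graph: a 2-node chain
                [-1.0, 2.0]])
Theta = np.array([[3.0, 0.0, -1.0],        # column graph: edge (1,3)
                  [0.0, 3.0, 0.0],
                  [-1.0, 0.0, 3.0]])
Omega = kronecker_sum(Psi, Theta)
print(Omega.shape)  # (6, 6)
```

A useful spectral-graph-theory fact: the eigenvalues of a Kronecker sum are all pairwise sums of the factors' eigenvalues, which is what makes this structure tractable.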
Fast Kronecker Inference in Gaussian Processes with Non-Gaussian Likelihoods
Cited by 2 (0 self)
Abstract
Gaussian processes (GPs) are a flexible class of methods with state-of-the-art performance on spatial statistics applications. However, GPs require O(n^3) computations and O(n^2) storage, and popular GP kernels are typically limited to smoothing and interpolation. To address these difficulties, Kronecker methods have been used to exploit structure in the GP covariance matrix for scalability, while allowing for expressive kernel learning (Wilson et al., 2014). However, fast Kronecker methods have been confined to Gaussian likelihoods. We propose new scalable Kronecker methods for Gaussian processes with non-Gaussian likelihoods, using a Laplace approximation which involves linear conjugate gradients for inference, and a lower bound on the GP marginal likelihood for kernel learning. Our approach has near-linear scaling, requiring O(Dn^((D+1)/D)) operations and O(Dn^(2/D)) storage, for n training data points on a dense D > 1 dimensional grid. Moreover, we introduce a log-Gaussian Cox process, with highly expressive kernels, for modelling spatiotemporal count processes, and apply it to a point pattern (n = 233,088) of a decade of crime events in Chicago. Using our model, we discover spatially varying multiscale seasonal trends and produce highly accurate long-range local-area forecasts.
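The underlying Kronecker trick in the Gaussian-likelihood case can be sketched directly: on a 2-D grid with a product kernel, K = K1 ⊗ K2, so (K + σ²I)⁻¹y reduces to eigendecompositions of the small per-dimension kernels (an illustrative sketch of the baseline trick only; the paper's contribution is extending such methods to non-Gaussian likelihoods via Laplace approximation and conjugate gradients):

```python
import numpy as np

def kron_solve(K1, K2, y, noise):
    """Solve (K1 (x) K2 + noise*I) alpha = y using eigendecompositions
    of the small per-dimension kernels, never forming the full matrix."""
    w1, Q1 = np.linalg.eigh(K1)
    w2, Q2 = np.linalg.eigh(K2)
    n1, n2 = K1.shape[0], K2.shape[0]
    Y = y.reshape(n1, n2)            # row-major vec convention
    T = Q1.T @ Y @ Q2                # apply (Q1 (x) Q2)^T
    T = T / (np.outer(w1, w2) + noise)  # divide by eigenvalues + noise
    return (Q1 @ T @ Q2.T).ravel()   # apply (Q1 (x) Q2)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); K1 = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((3, 3)); K2 = B @ B.T + 3 * np.eye(3)
y = rng.standard_normal(12)
alpha = kron_solve(K1, K2, y, 0.1)
```

Against a dense solve this agrees to machine precision, while only ever decomposing the n1 × n1 and n2 × n2 factors rather than the n1·n2 × n1·n2 product.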
Fast Near-GRID Gaussian Process Regression
Cited by 1 (0 self)
Abstract
Gaussian process regression (GPR) is a powerful nonlinear technique for Bayesian inference and prediction. One drawback is its O(N^3) computational complexity for both prediction and hyperparameter estimation for N input points, which has led to much work on sparse GPR methods. When the covariance function is expressible as a tensor product kernel (TPK) and the inputs form a multidimensional grid, it was shown that the cost of exact GPR can be reduced to a subquadratic function of N. We extend these exact fast algorithms to sparse GPR and remark on a connection to Gaussian process latent variable models (GPLVMs). In practice, the inputs may also violate the multidimensional grid constraints, so we pose and efficiently solve missing- and extra-data problems for both exact and sparse grid GPR. We demonstrate our method on synthetic, text scan, and magnetic resonance imaging (MRI) data reconstructions.
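The grid/TPK structure referenced here is typically exploited through the standard Kronecker matrix-vector product, which never materializes the full N × N kernel (a generic sketch of that primitive; the paper's missing- and extra-data handling is a layer on top of it):

```python
import numpy as np

def kron_mvm(Ks, x):
    """Compute (K_1 (x) K_2 (x) ... (x) K_D) @ x without forming the
    full tensor-product kernel: reshape, multiply by one small factor,
    cyclically transpose, repeat. Cost is sub-quadratic in N = prod(n_d)."""
    N = x.size
    for K in Ks:
        n = K.shape[0]
        X = x.reshape(n, N // n)  # fold out the current dimension
        x = (K @ X).T.ravel()     # apply factor, then cycle axes
    return x

rng = np.random.default_rng(2)
K1 = rng.standard_normal((3, 3))
K2 = rng.standard_normal((4, 4))
x = rng.standard_normal(12)
fast = kron_mvm([K1, K2], x)
```

After one pass over all D factors the cyclic transpositions return the axes to their original order, so the result matches the dense Kronecker product exactly.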
A Multivariate Time-Series Modeling Approach to Severity of Illness Assessment and Forecasting in the ICU with Sparse, Heterogeneous Clinical Data
Cited by 1 (0 self)
Abstract
The ability to determine patient acuity (or severity of illness) has immediate practical use for clinicians. We evaluate the use of multivariate time-series modeling with multi-task Gaussian process (GP) models using noisy, incomplete, sparse, heterogeneous and unevenly sampled clinical data, including both physiological signals and clinical notes. The learned multi-task GP (MTGP) hyperparameters are then used to assess and forecast patient acuity. Experiments were conducted with two real clinical data sets acquired from ICU patients: first, estimating cerebrovascular pressure reactivity, an important indicator of secondary damage for traumatic brain injury patients, by learning the interactions between intracranial pressure and mean arterial blood pressure signals; and second, mortality prediction using clinical progress notes. In both cases, MTGPs provided improved results: an MTGP model provided better results than single-task GP models for signal interpolation and forecasting (0.91 vs 0.69 RMSE), and the use of MTGP hyperparameters as additional classification features improved results (0.812 vs 0.788 AUC).
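A minimal sketch of the multi-task GP covariance structure may help: in the common intrinsic-coregionalization form, a task-similarity matrix Kf is Kronecker-multiplied with a shared kernel over time, so correlated channels borrow strength from each other. This is a hypothetical simplification (the paper's MTGPs additionally learn Kf and per-task noise from irregularly sampled data):

```python
import numpy as np

def mtgp_cov(Kf, times, lengthscale=1.0):
    """Joint covariance of a simple multi-task GP: task-similarity
    matrix Kf (M x M), Kronecker a shared squared-exponential kernel
    over the sample times. Illustrative minimal form only."""
    d = times[:, None] - times[None, :]
    Kt = np.exp(-0.5 * (d / lengthscale) ** 2)  # RBF gram over time
    return np.kron(Kf, Kt)

# e.g. two strongly coupled vital-sign channels on a common time grid
Kf = np.array([[1.0, 0.9],
               [0.9, 1.0]])
K = mtgp_cov(Kf, np.linspace(0.0, 4.0, 5))
```

Samples drawn with this joint covariance exhibit the cross-channel coupling encoded in Kf, which is what lets one signal fill gaps in another during interpolation.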
Fast Laplace Approximation for Gaussian Processes with a Tensor Product Kernel
In Proceedings of the 22nd Benelux Conference on Artificial Intelligence (BNAIC), 2014
Cited by 1 (0 self)
Constrained Relative Entropy Minimization with Applications to Multitask Learning, 2013
Learning the Dependency Structure of Latent Factors
In Advances in Neural Information Processing Systems 25, 2012
Cited by 1 (0 self)
Abstract
In this paper, we study latent factor models with dependency structure in the latent space. We propose a general learning framework which induces sparsity on the undirected graphical model imposed on the vector of latent factors. A novel latent factor model, SLFA, is then proposed as a matrix factorization problem with a special regularization term that encourages collaborative reconstruction. The main novelty of the model is that we can simultaneously learn the lower-dimensional representation of the data and explicitly model the pairwise relationships between latent factors. An online learning algorithm is devised to make the model feasible for large-scale learning problems. Experimental results on two synthetic and two real-world data sets demonstrate that the pairwise relationships and latent factors learned by our model provide a more structured way of exploring high-dimensional data, and the learned representations achieve state-of-the-art classification performance.
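The matrix-factorization view underlying SLFA can be sketched generically: data X ≈ B S with k latent factors, fit by alternating least squares. Here a plain ridge penalty stands in for the paper's structured sparsity term, which couples the rows of S through a learned factor precision matrix, so this is a baseline sketch rather than SLFA itself:

```python
import numpy as np

def factorize(X, k, lam=0.1, iters=50):
    """Alternating least-squares matrix factorization: X (d x n) ~ B @ S
    with k latent factors. The ridge term lam is a stand-in for SLFA's
    structured regularizer on the latent-factor dependencies."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    B = rng.standard_normal((d, k))
    for _ in range(iters):
        # closed-form ridge updates for each factor in turn
        S = np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ X)
        B = np.linalg.solve(S @ S.T + lam * np.eye(k), S @ X.T).T
    return B, S

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 20))  # rank-3 data
B, S = factorize(X, 3, lam=1e-3)
```

On exactly low-rank data with a matching k and a small penalty, the reconstruction B @ S recovers X almost perfectly.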
Multiple Output Regression with Latent Noise, 2016
Abstract
In high-dimensional data, structured noise caused by observed and unobserved factors affecting multiple target variables simultaneously imposes a serious challenge for modeling by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in a latent variable model, in which both the interesting signal and the noise are mediated through the same latent factors. Under this assumption, the signal model then borrows strength from the noise model by encouraging similar effects on correlated targets. We introduce a hyperparameter for the latent signal-to-noise ratio, which turns out to be important for modeling weak signals, and an ordered infinite-dimensional shrinkage prior that resolves the rotational unidentifiability of reduced-rank regression models. Simulations and prediction experiments with metabolite, gene expression, fMRI, and macroeconomic time series data show that our model equals or exceeds state-of-the-art performance and, in particular, outperforms the standard approach of assuming independent noise and signal models.
Multivariate Temporal Symptomatic Characterization of Cardiac Arrest
Abstract
We model the temporal symptomatic characteristics of 171 cardiac arrest patients in intensive care units. The temporal and feature dependencies in the data are captured using a mixture of matrix normal distributions. We found that the cardiac arrest temporal signature is best summarized by the six hours of data prior to cardiac arrest events, and that its statistical descriptions differ significantly from measurements taken over the preceding two days. This matrix normal model classifies these patterns better than logistic regression with lagged features.
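The matrix normal density used in mixture models like this one is usually evaluated with a trace identity rather than the n·p × n·p Kronecker covariance, which keeps the cost at two small solves (a standard-formula sketch, not this paper's mixture fitting code):

```python
import numpy as np

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of the matrix normal MN(M, U, V) for an n x p
    observation X, with row covariance U (n x n) and column covariance
    V (p x p). Uses tr(V^-1 R^T U^-1 R) instead of forming kron(U, V)."""
    n, p = X.shape
    R = X - M
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    return -0.5 * (n * p * np.log(2 * np.pi)
                   + p * logdet_U + n * logdet_V + quad)

rng = np.random.default_rng(4)
n, p = 3, 4
A = rng.standard_normal((n, n)); U = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); V = B @ B.T + p * np.eye(p)
X = rng.standard_normal((n, p))
lp = matrix_normal_logpdf(X, np.zeros((n, p)), U, V)
```

It agrees with the equivalent multivariate normal over vec(X) with covariance kron(U, V), while temporal and feature dependencies stay factored into U and V separately.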