Results 1-10 of 29
Distribution-Free Distribution Regression
"... ‘Distribution regression ’ refers to the situation where a response Y depends on a covariate P where P is a probability distribution. The model is Y = f(P) + µ where f is an unknown regression function and µ is a random error. Typically, we do not observe P directly, but rather, we observe a sample ..."
Abstract

Cited by 9 (3 self)
‘Distribution regression’ refers to the situation where a response Y depends on a covariate P where P is a probability distribution. The model is Y = f(P) + ε where f is an unknown regression function and ε is a random error. Typically, we do not observe P directly, but rather, we observe a sample from P. In this paper we develop theory and methods for distribution-free versions of distribution regression. This means that we do not make strong distributional assumptions about the error term ε and covariate P. We prove that when the effective dimension is small enough (as measured by the doubling dimension), then the excess prediction risk converges to zero with a polynomial rate.
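As an illustrative sketch of the sampled setting described above (not the paper's estimator), one can represent each unobserved covariate P by its sample and run a kernel smoother whose distances are computed between the samples; here an MMD-style distance with a Gaussian kernel is assumed purely for illustration.

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between two samples (Gaussian kernel).
    Used here as an illustrative distance between empirical distributions."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def predict(P_new, samples, ys, h=0.5):
    """Kernel smoother: weight each training response by how close its
    sampled input distribution is to the new sample."""
    d = np.array([mmd2(P_new, S) for S in samples])
    w = np.exp(-d / h)
    return float(w @ ys / w.sum())

rng = np.random.default_rng(0)
# Toy data: the response Y is the mean of the input distribution P,
# and each P is only observed through 50 draws.
samples = [rng.normal(m, 1.0, size=(50, 1)) for m in [0.0, 1.0, 2.0]]
ys = np.array([0.0, 1.0, 2.0])
y_hat = predict(rng.normal(1.0, 1.0, size=(50, 1)), samples, ys)
```

A query sample drawn from the middle distribution gets a prediction near the middle response, since the smoother weights decay with the sample-based distance.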
Distribution to Distribution Regression
"... We analyze ‘Distribution to Distribution regression’ where one is regressing a mapping where both the covariate (inputs) and response (outputs) are distributions. No parameters on the input or output distributions are assumed, nor are any strong assumptions made on the measure from which input distr ..."
Abstract

Cited by 9 (3 self)
We analyze ‘Distribution to Distribution regression’, where one regresses a mapping for which both the covariate (input) and response (output) are distributions. No parameters on the input or output distributions are assumed, nor are any strong assumptions made on the measure from which input distributions are drawn. We develop an estimator and derive an upper bound for the L2 risk; also, we show that when the effective dimension is small enough (as measured by the doubling dimension), then the risk converges to zero with a polynomial rate.
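As a minimal sketch of the setting (not the paper's estimator), the predicted output distribution can be formed as a weighted mixture of the training output samples, with weights based on input-distribution similarity; a 1-D sorted-sample Wasserstein distance is assumed here for illustration.

```python
import numpy as np

def w2_1d(x, y):
    """Squared 2-Wasserstein distance between two 1-D samples of equal size
    (sort-and-compare, the standard closed form in one dimension)."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

def predict_output_sample(P_new, in_samples, out_samples, h=0.5):
    """Illustrative smoother: draw from a weighted mixture of the training
    output samples, weighting by input-distribution closeness."""
    w = np.exp(-np.array([w2_1d(P_new, S) for S in in_samples]) / h)
    w /= w.sum()
    rng = np.random.default_rng(1)
    idx = rng.choice(len(out_samples), size=200, p=w)
    return np.concatenate([rng.choice(out_samples[i], size=1) for i in idx])

rng = np.random.default_rng(0)
# Toy mapping f: the output distribution's mean is twice the input's mean.
in_samples = [rng.normal(m, 1.0, 100) for m in [0.0, 1.0, 2.0]]
out_samples = [rng.normal(2 * m, 0.1, 100) for m in [0.0, 1.0, 2.0]]
draw = predict_output_sample(rng.normal(1.0, 1.0, 100), in_samples, out_samples)
```

For a query sample from the middle input distribution, the mixture concentrates on the middle output distribution, so the drawn sample has mean near 2.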
One-class support measure machines for group anomaly detection
 In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI)
, 2013
"... We propose oneclass support measure machines (OCSMMs) for group anomaly detection. Unlike traditional anomaly detection, OCSMMs aim at recognizing anomalous aggregate behaviors of data points. The OCSMMs generalize wellknown oneclass support vector machines (OCSVMs) to a space of probability ..."
Abstract

Cited by 7 (4 self)
We propose one-class support measure machines (OCSMMs) for group anomaly detection. Unlike traditional anomaly detection, OCSMMs aim at recognizing anomalous aggregate behaviors of data points. The OCSMMs generalize well-known one-class support vector machines (OCSVMs) to a space of probability measures. By formulating the problem as quantile estimation on distributions, we can establish interesting connections to the OCSVMs and variable kernel density estimators (VKDEs) over the input space on which the distributions are defined, bridging the gap between large-margin methods and kernel density estimators. In particular, we show that various types of VKDEs can be considered as solutions to a class of regularization problems studied in this paper. Experiments on Sloan Digital Sky Survey dataset and High Energy Particle Physics dataset demonstrate the benefits of the proposed framework in real-world applications.
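The set-level kernel underlying this line of work can be sketched directly: the inner product between the empirical kernel mean embeddings of two groups is the average base kernel over all cross pairs. The Gaussian base kernel and toy groups below are assumptions for illustration; a one-class SVM with a precomputed kernel could then consume the resulting Gram matrix.

```python
import numpy as np

def mean_embedding_kernel(groups, gamma=1.0):
    """Gram matrix between groups of points: K[i, j] is the RKHS inner
    product of the empirical kernel mean embeddings of groups i and j,
    i.e. the average Gaussian kernel over all cross pairs."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    n = len(groups)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = k(groups[i], groups[j]).mean()
    return K

rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, (30, 2)) for _ in range(4)]
groups.append(rng.normal(5, 1, (30, 2)))  # an anomalous group, shifted far away
K = mean_embedding_kernel(groups)
```

The anomalous group's embedding is nearly orthogonal to the others, which is the signal a one-class method on this Gram matrix would exploit.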
Domain Generalization via Invariant Feature Representation
"... This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose DomainInvariant Component Analysis (DICA), a kernelbased optimization algorithm that learns an invariant transformation by ..."
Abstract

Cited by 6 (1 self)
This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. A learning-theoretic analysis shows that reducing dissimilarity improves the expected generalization ability of classifiers on new domains, motivating the proposed algorithm. Experimental results on synthetic and real-world datasets demonstrate that DICA successfully learns invariant features and improves classifier performance in practice.
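A heavily simplified linear analogue of the idea (not the kernelized DICA algorithm from the paper) can be sketched as a generalized eigenproblem: find a direction that keeps overall variance while suppressing variation of the per-domain means. The toy data and the ridge term eps are assumptions for illustration.

```python
import numpy as np

def invariant_direction(domains, eps=1e-3):
    """Toy linear analogue: maximize total variance per unit of
    between-domain mean variance via T v = lambda (B + eps I) v."""
    X = np.vstack(domains)
    means = np.stack([D.mean(0) for D in domains])
    B = np.cov(means.T)      # between-domain scatter of the means
    T = np.cov(X.T)          # total scatter of the pooled data
    M = np.linalg.solve(B + eps * np.eye(B.shape[0]), T)
    vals, vecs = np.linalg.eig(M)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
# Domains shift only along axis 0; axis 1 carries the stable signal variance.
domains = [rng.normal([s, 0.0], [0.3, 2.0], size=(200, 2)) for s in (-2, 0, 2)]
v = invariant_direction(domains)
```

The recovered direction aligns with axis 1, the component whose distribution is shared across domains.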
Kernel Mean Estimation and Stein Effect
"... A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbertspace embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true ke ..."
Abstract

Cited by 3 (0 self)
A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be improved due to a well-known phenomenon in statistics called Stein’s phenomenon. Our theoretical analysis reveals the existence of a wide class of estimators that are better than the standard one. Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. Empirical evaluations on several applications clearly demonstrate that the proposed estimators outperform the standard kernel mean estimator.
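A minimal sketch of the shrinkage idea (simplified; the paper derives principled shrinkage amounts, whereas lambda here is just a fixed constant): the empirical kernel mean is a weighted sum of kernel sections, and shrinking the weights toward the zero function reduces the RKHS norm.

```python
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_mean_norm2(X, alpha):
    """Squared RKHS norm of sum_i alpha_i k(x_i, .)."""
    return float(alpha @ gauss_kernel(X, X) @ alpha)

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (20, 1))
n = len(X)
alpha_emp = np.full(n, 1.0 / n)       # empirical kernel mean weights
lam = 0.2                              # illustrative shrinkage amount
alpha_shrunk = (1 - lam) * alpha_emp   # shrink toward the zero function
```

The shrunk estimator's squared norm is exactly (1 - lambda)^2 times the empirical one, trading a little bias for reduced variance, which is the mechanism behind the Stein-type improvement.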
Kernel Mean Estimation via Spectral Filtering
"... The problem of estimating the kernel mean in a reproducing kernel Hilbert space (RKHS) is central to kernel methods in that it is used by classical approaches (e.g., when centering a kernel PCA matrix), and it also forms the core inference step of modern kernel methods (e.g., kernelbased nonparame ..."
Abstract

Cited by 2 (1 self)
The problem of estimating the kernel mean in a reproducing kernel Hilbert space (RKHS) is central to kernel methods in that it is used by classical approaches (e.g., when centering a kernel PCA matrix), and it also forms the core inference step of modern kernel methods (e.g., kernel-based nonparametric tests) that rely on embedding probability distributions in RKHSs. Previous work [1] has shown that shrinkage can help in constructing “better” estimators of the kernel mean than the empirical estimator. The present paper studies the consistency and admissibility of the estimators in [1], and proposes a wider class of shrinkage estimators that improve upon the empirical estimator by considering appropriate basis functions. Using the kernel PCA basis, we show that some of these estimators can be constructed using spectral filtering algorithms which are shown to be consistent under some technical assumptions. Our theoretical analysis also reveals a fundamental connection to the kernel-based supervised learning framework. The proposed estimators are simple to implement and perform well in practice.
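The kernel-PCA-basis construction can be sketched as follows (illustrative, not the paper's exact recipe): express the empirical mean weights in the eigenbasis of the Gram matrix and damp each coefficient with a Tikhonov-type filter s/(s + lambda).

```python
import numpy as np

def gauss_kernel(A, gamma=1.0):
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (30, 1))
n = len(X)
K = gauss_kernel(X)
s, U = np.linalg.eigh(K)                  # spectral decomposition of the Gram matrix
one = np.full(n, 1.0 / n)                 # empirical kernel mean weights
coeff = U.T @ one                         # coefficients in the kernel PCA basis
lam = 1.0                                 # illustrative regularization level
filtered = U @ (s / (s + lam) * coeff)    # Tikhonov-damped weights
```

Algebraically the damped weights equal K(K + lambda I)^{-1} applied to the uniform weights, and the filter factors s/(s + lambda) <= 1 guarantee the filtered mean has no larger RKHS norm than the empirical one.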
Kernels on Sample Sets via Nonparametric Divergence Estimates
"... Abstract—Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set f ..."
Abstract

Cited by 2 (2 self)
Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. The projection of the estimated Gram matrix to the cone of symmetric positive semidefinite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments both on real and simulated datasets to demonstrate the advantages of our new approach.
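The PSD-projection step mentioned above is concrete enough to sketch: symmetrize the estimated Gram matrix and clip its negative eigenvalues at zero, which is the Frobenius-norm projection onto the PSD cone. The example matrix below is an assumed toy stand-in for a plug-in divergence-based estimate, which need not be PSD.

```python
import numpy as np

def project_psd(G):
    """Project a symmetric matrix onto the cone of positive semidefinite
    matrices by clipping negative eigenvalues at zero (Frobenius projection)."""
    G = (G + G.T) / 2                     # symmetrize first
    vals, vecs = np.linalg.eigh(G)
    return vecs @ np.diag(np.clip(vals, 0, None)) @ vecs.T

# A plug-in Gram estimate from pairwise divergence estimates can be indefinite:
G = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])
K = project_psd(G)
```

After projection the matrix is a valid kernel matrix, so standard kernel machines (SVMs, kernel ridge regression) can consume it directly.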
The Kendall and Mallows kernels for permutations
 In Proceedings of the 32nd International Conference on Machine Learning (ICML-15)
, 2015
"... Abstract We show that the widely used Kendall tau correlation coefficient, and the related Mallows kernel, are positive definite kernels for permutations. They offer computationally attractive alternatives to more complex kernels on the symmetric group to learn from rankings, or to learn to rank. W ..."
Abstract

Cited by 1 (0 self)
We show that the widely used Kendall tau correlation coefficient, and the related Mallows kernel, are positive definite kernels for permutations. They offer computationally attractive alternatives to more complex kernels on the symmetric group to learn from rankings, or to learn to rank. We show how to extend the Kendall kernel to partial rankings or rankings with uncertainty, and demonstrate promising results on high-dimensional classification problems in biomedical applications.
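The Kendall kernel itself is simple to state: the fraction of concordant pairs minus the fraction of discordant pairs between two rankings. A direct O(n^2) sketch (the paper also discusses faster computation):

```python
import numpy as np
from itertools import combinations

def kendall_kernel(sigma, tau):
    """Kendall tau correlation between two rankings: fraction of concordant
    pairs minus fraction of discordant pairs, a positive definite kernel."""
    n = len(sigma)
    pairs = list(combinations(range(n), 2))
    s = sum(np.sign(sigma[i] - sigma[j]) * np.sign(tau[i] - tau[j])
            for i, j in pairs)
    return s / len(pairs)

a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 4, 3])   # one adjacent swap: 5 concordant pairs, 1 discordant
```

Identical rankings give kernel value 1, fully reversed rankings give -1, and a single adjacent swap among four items gives (5 - 1)/6 = 2/3.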
Fast distribution to real regression
 In AISTATS
, 2014
"... Abstract We study the problem of distribution to real regression, where one aims to regress a mapping f that takes in a distribution input covariate P ∈ I (for a nonparametric family of distributions I) and outputs a realvalued response Y = f (P ) + . This setting was recently studied in ..."
Abstract

Cited by 1 (1 self)
We study the problem of distribution to real regression, where one aims to regress a mapping f that takes in a distribution input covariate P ∈ I (for a nonparametric family of distributions I) and outputs a real-valued response Y = f(P) + ε. This setting was recently studied in ...