Results 1 - 10 of 106
A kernel method for the two-sample-problem
In NIPS, 2006
Abstract

Cited by 117 (27 self)
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in O(m^2) time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
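As an illustration of the statistic described above, the following is a minimal sketch, not the authors' code, that computes an unbiased estimate of the squared distance between RKHS mean embeddings (the squared MMD) in O(m^2) kernel evaluations. Assumptions: samples are tuples of floats, the kernel is a Gaussian RBF with a fixed bandwidth sigma = 1, and the rejection threshold comes from a permutation test. That is a swapped-in technique, not the large deviation bound or asymptotic distribution used in the paper. The names rbf, mmd2_unbiased and permutation_pvalue are hypothetical.

```python
import math
import random


def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel between two real vectors given as tuples (sigma is an assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2_unbiased(X, Y, kernel=rbf):
    """Unbiased estimate of squared MMD; O((m + n)^2) kernel evaluations."""
    m, n = len(X), len(Y)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    # Within-sample terms skip the diagonal i == j to remove the bias.
    kxx = sum(kernel(X[i], X[j]) for i in range(m) for j in range(m) if i != j)
    kyy = sum(kernel(Y[i], Y[j]) for i in range(n) for j in range(n) if i != j)
    # Cross term uses every (x, y) pair.
    kxy = sum(kernel(x, y) for x in X for y in Y)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)


def permutation_pvalue(X, Y, num_perms=200, seed=0, kernel=rbf):
    """Approximate p-value for H0 (same distribution) by random re-splits of the pooled sample."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y, kernel)
    pooled = list(X) + list(Y)
    exceed = 0
    for _ in range(num_perms):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:len(X)], pooled[len(X):], kernel) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (num_perms + 1)
```

For example, with X = [(0.1 * i,) for i in range(20)] and a copy of X shifted by 5.0, permutation_pvalue(X, Y) is small, so the hypothesis of equal distributions would be rejected at the usual levels. The bandwidth strongly affects power; a common heuristic is the median pairwise distance, which is not shown here.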
Graph Kernels
2007
Abstract

Cited by 101 (9 self)
We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semidefinite.
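The speed-up above rests on never forming the n^2 x n^2 product-graph matrix. A minimal sketch of that idea, assuming unlabeled graphs given as 0/1 adjacency matrices (lists of lists), uniform start and stop probabilities, and a decay lam below 1/(rho(A1) rho(A2)) so the geometric series converges: the geometric random walk kernel k = q^T (I - lam W)^{-1} p with W = A2 ⊗ A1 is computed by fixed-point iteration on X = P + lam A1 X A2^T, using the identity (A2 ⊗ A1) vec(X) = vec(A1 X A2^T). Each iteration costs O(n^3). This is plain fixed-point iteration, not the authors' Sylvester-equation solver, and the function names are hypothetical.

```python
def matmul(A, B):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def random_walk_kernel(A1, A2, lam=0.1, tol=1e-12, max_iter=10_000):
    """Geometric random walk kernel via X = P + lam * A1 @ X @ A2^T (vec trick)."""
    n1, n2 = len(A1), len(A2)
    # Uniform starting probabilities on the product graph: vec(P) = p2 kron p1.
    p = 1.0 / (n1 * n2)
    P = [[p] * n2 for _ in range(n1)]
    X = [row[:] for row in P]
    A2T = transpose(A2)
    for _ in range(max_iter):
        # (A2 kron A1) vec(X) == vec(A1 X A2^T): O(n^3) instead of O(n^4) per step.
        AXB = matmul(matmul(A1, X), A2T)
        X_new = [[P[i][j] + lam * AXB[i][j] for j in range(n2)] for i in range(n1)]
        diff = max(abs(X_new[i][j] - X[i][j]) for i in range(n1) for j in range(n2))
        X = X_new
        if diff < tol:
            break
    # Uniform stopping probabilities q have the same entries as P.
    return sum(P[i][j] * X[i][j] for i in range(n1) for j in range(n2))
```

As a sanity check, for two single-edge graphs every entry of X converges to 1 / (4 (1 - lam)), so random_walk_kernel(A, A, lam=0.5) with A = [[0, 1], [1, 0]] gives 0.5.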
Integrating structured biological data by kernel maximum mean discrepancy
In ISMB, 2006
Abstract

Cited by 86 (18 self)
Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
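To illustrate the kernel-trick point above, here is a minimal sketch that applies MMD to sequences rather than vectors. Assumptions: the string kernel is an unnormalized k-mer spectrum kernel with k = 3 (a swapped-in choice, not necessarily the kernel used in the paper), and the estimator is the biased squared MMD, which is zero when the two samples are identical. The names spectrum, spectrum_kernel and mmd2_biased are hypothetical.

```python
from collections import Counter


def spectrum(s, k=3):
    """Counts of every contiguous length-k substring (k-mer) in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of k-mer count vectors: a valid kernel on strings."""
    a, b = spectrum(s, k), spectrum(t, k)
    return sum(c * b[kmer] for kmer, c in a.items())


def mmd2_biased(X, Y, kernel=spectrum_kernel):
    """Biased squared MMD: ||mean embedding of X - mean embedding of Y||^2 in the RKHS."""
    m, n = len(X), len(Y)
    kxx = sum(kernel(a, b) for a in X for b in X) / (m * m)
    kyy = sum(kernel(a, b) for a in Y for b in Y) / (n * n)
    kxy = sum(kernel(a, b) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2.0 * kxy
```

For instance, mmd2_biased(["AAAAAA", "AAAAAT"], ["CGCGCG", "GCGCGC"]) is 20.5, because the two sets share no 3-mers. In practice a normalized string kernel would usually be preferred so that long sequences do not dominate.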
A kernel method for the two sample problem
Advances in Neural Information Processing Systems 19, 2007
Abstract

Cited by 70 (18 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
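The linear time approximation mentioned above can be sketched as follows. This is a minimal version, assuming scalar samples, a Gaussian RBF kernel with sigma = 1, and the common construction that averages h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1) over disjoint consecutive pairs. It needs only O(m) kernel evaluations at the price of higher variance than the quadratic-time estimate. The names rbf and mmd2_linear are hypothetical, and no test threshold is computed here.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel on scalars (sigma is an assumption of this sketch)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_linear(X, Y, kernel=rbf):
    """Linear-time unbiased estimate of squared MMD from disjoint pairs of points."""
    m = (min(len(X), len(Y)) // 2) * 2  # use an even number of points from each sample
    if m < 2:
        raise ValueError("each sample needs at least two points")
    total = 0.0
    for i in range(0, m, 2):
        x1, x2, y1, y2 = X[i], X[i + 1], Y[i], Y[i + 1]
        # One evaluation of the MMD "h" function on the quadruple (x1, y1, x2, y2).
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / (m // 2)
```

When both arguments are the same list the estimate is exactly zero, and for two well-separated samples it approaches 2 with this kernel.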
Multiclass multiple kernel learning
 In ICML. ACM
Abstract

Cited by 62 (4 self)
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real-world datasets.
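The object that MKL optimizes over can be made concrete with a small sketch. It only evaluates a combined kernel for given weights and does not solve the QCQP or SILP. Assumptions: weights are non-negative and normalized onto the simplex (zero weights drop a kernel, which is how sparse coefficients act as kernel selection), and the joint feature map on (input, label) pairs uses one-hot labels, so the joint kernel is k_m(x1, x2) times 1 if y1 == y2 and 0 otherwise. That is one common choice, not necessarily the paper's. All names are hypothetical.

```python
def linear_kernel(x, z):
    """Linear kernel on scalar inputs."""
    return x * z


def constant_kernel(x, z):
    """Constant kernel; contributes a bias-like term to the combination."""
    return 1.0


def mkl_joint_kernel(base_kernels, betas, x1, y1, x2, y2):
    """sum_m beta_m * k_m(x1, x2) * [y1 == y2], with beta normalized onto the simplex."""
    if len(base_kernels) != len(betas):
        raise ValueError("need exactly one weight per base kernel")
    if any(b < 0 for b in betas) or sum(betas) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    if y1 != y2:
        return 0.0  # one-hot label features: different classes are orthogonal
    total = sum(betas)
    # A zero weight removes that base kernel entirely (kernel selection).
    return sum((b / total) * k(x1, x2) for b, k in zip(betas, base_kernels))
```

For example, weights [1.0, 3.0] normalize to [0.25, 0.75], so the joint kernel between (2.0, "a") and (3.0, "a") is 0.25 * 6.0 + 0.75 * 1.0 = 2.25.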
Shortest-Path Kernels on Graphs
In Proceedings of the 2005 International Conference on Data Mining, 2005
Abstract

Cited by 62 (5 self)
Data mining algorithms are facing the challenge of dealing with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available by defining a kernel function on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. As a general problem, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels which are based on paths. As the computation of all paths and longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and are still positive definite. In experiments on classification of graph models of proteins, our shortest-path kernels show significantly higher classification accuracy than walk-based kernels.
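A minimal sketch of the polynomial-time idea above, assuming unlabeled, unweighted, undirected graphs given as 0/1 adjacency matrices: all-pairs shortest-path lengths are computed with Floyd-Warshall in O(n^3), and two graphs are compared by a Dirac kernel on path lengths. Summing that Dirac kernel over all pairs of shortest paths equals the dot product of the two shortest-path-length histograms, which is why the result is positive definite. The labeled variants and the specific edge kernels from the paper are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def floyd_warshall(adj):
    """All-pairs shortest-path lengths for an unweighted graph (math.inf if unreachable)."""
    n = len(adj)
    d = [[0 if i == j else (1 if adj[i][j] else math.inf) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def sp_histogram(adj):
    """Histogram of shortest-path lengths over unordered, connected vertex pairs."""
    d = floyd_warshall(adj)
    n = len(adj)
    return Counter(d[i][j] for i in range(n) for j in range(i + 1, n) if d[i][j] != math.inf)


def shortest_path_kernel(adj1, adj2):
    """Sum of Dirac kernels on lengths over all pairs of shortest paths."""
    h1, h2 = sp_histogram(adj1), sp_histogram(adj2)
    return sum(c * h2[length] for length, c in h1.items())
```

For a triangle (three pairs at distance 1) and a three-vertex path (two pairs at distance 1, one at distance 2) the kernel value is 3 * 2 = 6.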
Efficient graphlet kernels for large graph comparison
2009
Abstract

Cited by 54 (10 self)
State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we propose to compare graphs by counting graphlets, i.e., subgraphs with k nodes where k ∈ {3, 4, 5}. Exhaustive enumeration of all graphlets being prohibitively expensive, we introduce two theoretically grounded speed-up schemes, one based on sampling and the second one specifically designed for bounded degree graphs. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.
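For k = 3 the counting idea above is easy to sketch, because an undirected 3-node graphlet is determined up to isomorphism by its number of edges (0 to 3). This minimal version, assuming 0/1 adjacency matrices with at least three nodes, either enumerates all node triples exactly or draws a fixed number of random triples with a seeded generator. It then takes the dot product of the normalized graphlet frequency vectors. The paper's sampling bounds and its bounded-degree scheme are not reproduced, and num_samples and the function names are hypothetical.

```python
import itertools
import random


def graphlet3_distribution(adj, num_samples=None, seed=0):
    """Normalized frequencies of 3-node graphlets, indexed by edge count 0..3."""
    n = len(adj)
    if n < 3:
        raise ValueError("graph needs at least three nodes")
    if num_samples is None:
        triples = itertools.combinations(range(n), 3)  # exact: all C(n, 3) triples
    else:
        rng = random.Random(seed)
        triples = (rng.sample(range(n), 3) for _ in range(num_samples))  # sampled estimate
    counts = [0, 0, 0, 0]
    total = 0
    for a, b, c in triples:
        edges = int(bool(adj[a][b])) + int(bool(adj[a][c])) + int(bool(adj[b][c]))
        counts[edges] += 1
        total += 1
    return [x / total for x in counts]


def graphlet_kernel(adj1, adj2, num_samples=None, seed=0):
    """Dot product of the two graphs' 3-node graphlet frequency vectors."""
    f1 = graphlet3_distribution(adj1, num_samples, seed)
    f2 = graphlet3_distribution(adj2, num_samples, seed)
    return sum(a * b for a, b in zip(f1, f2))
```

Sampling makes the cost independent of graph size for a fixed num_samples, at the price of a random estimate of the frequency vector.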
Characterizing structural relationships in scenes using graph kernels
In ACM TOG, 2011
Abstract

Cited by 48 (5 self)
Modeling virtual environments is a time-consuming and expensive task that is becoming increasingly popular for both professional and casual artists. The model density and complexity of the scenes representing these virtual environments is rising rapidly. This trend suggests that data-mining a 3D scene corpus to facilitate collaborative content creation could be a very powerful tool enabling more efficient scene design. In this paper, we show how to represent scenes as graphs that encode models and their semantic relationships. We then define a kernel between these relationship graphs that compares common virtual substructures in two graphs and captures the similarity between their corresponding scenes. We apply this framework to several scene modeling problems, such as finding similar scenes, relevance feedback, and context-based model search. We show that incorporating structural relationships allows our method to provide a more relevant set of results when compared against previous approaches to model context search.
Protein-ligand interaction prediction: an improved chemogenomics approach
In Bioinformatics, 2008
Abstract

Cited by 43 (3 self)
Motivation: Predicting interactions between small molecules and proteins is a crucial step to decipher many biological processes, and plays a critical role in drug discovery. When no detailed 3D structure of the protein target is available, ligand-based virtual screening allows the construction of predictive models by learning to discriminate known ligands from non-ligands. However, the accuracy of ligand-based models quickly degrades when the number of known ligands decreases, and in particular the approach is not applicable for orphan receptors with no known ligand. Results: We propose a systematic method to predict ligand-protein interactions, even for targets with no known 3D structure and few or no known ligands. Following the recent chemogenomics trend, we adopt a cross-target view and attempt to screen the chemical space against whole families of proteins simultaneously. The lack of known ligands for a given target can then be compensated by the availability of known ligands for similar targets. We test this strategy on three important classes of drug targets, namely enzymes, G-protein coupled receptors (GPCR) and ion channels, and report dramatic improvements in prediction accuracy over classical ligand-based virtual screening, in particular for targets with few or no known ligands. Availability: All data and algorithms are available as supplementary material. Contact:
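One standard way to realize the cross-target view above is a kernel on (target, ligand) pairs formed as the product of a target kernel and a ligand kernel. A product of two positive semidefinite kernels is again positive semidefinite, so any kernel method can be trained on the pairs. The sketch below is illustrative only. Assumptions: ligands are sets of fingerprint bits compared with the Tanimoto similarity, and targets are compared either with a Dirac kernel (pairs on different targets are orthogonal, which amounts to one independent model per target) or with a hypothetical family kernel that gives similar targets a shared similarity so their known ligands can inform an orphan target. The target names, families and the same_family value are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def dirac_target_kernel(t1, t2):
    """1 for the same target, 0 otherwise: no information shared across targets."""
    return 1.0 if t1 == t2 else 0.0


def make_family_kernel(family, same_family=0.5):
    """Hypothetical target kernel: 1 on the diagonal, same_family within a family, else 0.

    With 0 <= same_family <= 1 each family block is (1 - s) I + s J, which is PSD.
    """
    def kernel(t1, t2):
        if t1 == t2:
            return 1.0
        f1, f2 = family.get(t1), family.get(t2)
        return same_family if f1 is not None and f1 == f2 else 0.0
    return kernel


def pairwise_kernel(k_target, k_ligand, pair1, pair2):
    """Product kernel on (target, ligand) pairs."""
    (t1, l1), (t2, l2) = pair1, pair2
    return k_target(t1, t2) * k_ligand(l1, l2)
```

With families {"T1": "GPCR", "T2": "GPCR"}, the pair ("T1", {1, 2, 3}) and the pair ("T2", {2, 3, 4}) have kernel value 0.5 * 0.5 = 0.25 under the family kernel but 0 under the Dirac kernel.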