Results 1 - 3 of 3
Dual-Tree Fast Exact Max-Kernel Search
2013
Abstract

Cited by 1 (1 self)
The problem of max-kernel search arises everywhere: given a query point p_q, a set of reference objects S_r, and some kernel K, find arg max_{p_r ∈ S_r} K(p_q, p_r). Max-kernel search is ubiquitous and appears in countless domains of science, thanks to the wide applicability of kernels. Example domains include image matching, information retrieval, bioinformatics, similarity search, and collaborative filtering. However, there are no generalized techniques for efficiently solving max-kernel search. This paper presents a single-tree algorithm, single-tree FastMKS, which returns the max-kernel solution for a single query point in provably O(log N) time (where N is the number of reference objects), and a dual-tree algorithm, dual-tree FastMKS, which is useful for max-kernel search with many query points. If the set of query points is of size O(N), the dual-tree algorithm returns a solution in provably O(N) time, which is significantly better than the O(N^2) linear scan solution; these bounds depend on the expansion constant of the data. These algorithms work for abstract objects, as they do not require explicit representation of the points in kernel space. Empirical results for a variety of datasets show up to 5 orders of magnitude speedup in some cases. In addition, we present approximate extensions of the FastMKS algorithms that can achieve further speedups.
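For orientation, the linear scan baseline that the abstract compares against can be sketched as follows. This is not FastMKS and uses no tree; it only evaluates the kernel against every reference object and keeps the largest value. The function names and the choice of a linear (inner-product) kernel are assumptions made for illustration and do not come from the paper.

```python
from typing import Callable, Sequence, Tuple

Point = Sequence[float]


def linear_kernel(a: Point, b: Point) -> float:
    """Inner-product kernel; chosen only as an illustrative assumption."""
    return sum(x * y for x, y in zip(a, b))


def max_kernel_linear_scan(
    query: Point,
    references: Sequence[Point],
    kernel: Callable[[Point, Point], float] = linear_kernel,
) -> Tuple[int, float]:
    """Return (index, kernel value) of the reference maximizing K(query, ref).

    Costs one kernel evaluation per reference object, so O(N) per query
    and O(N^2) when there are O(N) queries.
    """
    if not references:
        raise ValueError("references must be non-empty")
    best_index = 0
    best_value = kernel(query, references[0])
    for i in range(1, len(references)):
        value = kernel(query, references[i])
        if value > best_value:  # ties keep the earliest index
            best_index, best_value = i, value
    return best_index, best_value
```

Because the kernel is passed in as a function, the same scan works for objects that have no explicit vector representation, as long as K can be evaluated on pairs of them.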
Plug-and-Play Dual-Tree Algorithm Runtime Analysis
2015
Abstract
Numerous machine learning algorithms contain pairwise statistical problems at their core; that is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worst-case runtime guarantees have been proven for some of these algorithms. In this paper, we present a problem-independent runtime guarantee for any dual-tree algorithm using the cover tree, separating out the problem-dependent and the problem-independent elements. This allows us to just plug in bounds for the problem-dependent elements to get runtime guarantees for dual-tree algorithms for any pairwise statistical problem without re-deriving the entire proof. We demonstrate this plug-and-play procedure for nearest-neighbor search and approximate kernel density estimation to get improved runtime guarantees. Under mild assumptions, we also present the first linear runtime guarantee for dual-tree based range search.

Keywords: dual-tree algorithms, adaptive runtime analysis, cover tree, expansion constant, nearest neighbor search, kernel density estimation, range search

Dual-tree Algorithms

A surprising number of machine learning algorithms have computational bottlenecks that can be expressed as pairwise statistical problems. By this, we mean computational tasks that can be evaluated directly by iterating over all pairs of input points. Nearest neighbor search is one such problem, since for every query point, we can evaluate its distance to every reference point and keep the closest one.
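The naive pairwise evaluation described for nearest neighbor search can be sketched as below. This is the brute-force computation that dual-tree algorithms are meant to avoid, not a dual-tree method itself. The function name is hypothetical, and Euclidean distance is an assumption; the text does not fix a particular metric.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def all_nearest_neighbors(
    queries: Sequence[Point], references: Sequence[Point]
) -> List[Tuple[int, float]]:
    """For each query, return (index, distance) of its closest reference.

    Iterates over every (query, reference) pair, so the cost is
    len(queries) * len(references) distance evaluations.
    """
    if not references:
        raise ValueError("references must be non-empty")
    results = []
    for q in queries:
        best_index, best_dist = 0, math.dist(q, references[0])
        for i in range(1, len(references)):
            d = math.dist(q, references[i])  # Euclidean distance (assumed metric)
            if d < best_dist:
                best_index, best_dist = i, d
        results.append((best_index, best_dist))
    return results
```

With query and reference sets of size N each, the double loop performs N^2 distance evaluations, which is the quadratic cost that tree-based pruning aims to reduce.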
Faster Cover Trees
Abstract
The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. On standard benchmark datasets, we reduce the number of distance computations by 10-50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.