Results 1–9 of 9
Fast and Scalable Polynomial Kernels via Explicit Feature Maps
Abstract

Cited by 14 (0 self)
Approximation of nonlinear kernels using random feature mapping has been successfully employed in large-scale data analysis applications, accelerating the training of kernel machines. While previous random feature mappings run in O(ndD) time for n training samples in d-dimensional space and D random feature maps, we propose a novel randomized tensor product technique, called Tensor Sketching, for approximating any polynomial kernel in O(n(d + D log D)) time. We also introduce both absolute and relative error bounds for our approximation to guarantee the reliability of our estimation algorithm. Empirically, Tensor Sketching achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach on large-scale real-world datasets.
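The core idea of Tensor Sketching can be illustrated in a few lines: count sketches of a vector are combined by FFT so that the inner product of two sketches estimates the polynomial kernel (x·y)^p. The sketch below is an illustrative reconstruction from the abstract, not the authors' code; all names and the choices p = 2 and D = 4096 are ours.

```python
import numpy as np

def count_sketch(x, h, s, D):
    """Count sketch of x: bin h[i] accumulates s[i] * x[i]."""
    sketch = np.zeros(D)
    np.add.at(sketch, h, s * x)
    return sketch

def tensor_sketch(x, hs, ss, D):
    """Multiply the FFTs of p independent count sketches of x; the
    inverse FFT then sketches the degree-p tensor product of x."""
    fft_prod = np.ones(D, dtype=complex)
    for h, s in zip(hs, ss):
        fft_prod *= np.fft.fft(count_sketch(x, h, s, D))
    return np.real(np.fft.ifft(fft_prod))

rng = np.random.default_rng(0)
d, D, p = 50, 4096, 2                                     # p = 2: quadratic kernel
hs = [rng.integers(0, D, size=d) for _ in range(p)]       # random hash functions
ss = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]  # random sign functions

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = x.dot(y) ** p                                  # exact polynomial kernel
approx = tensor_sketch(x, hs, ss, D).dot(tensor_sketch(y, hs, ss, D))
```

Each sketch costs O(d + D log D) per sample, which is the source of the claimed running time.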
Large-scale image annotation by efficient and robust kernel metric learning
 in: IEEE International Conference on Computer Vision (ICCV)
Abstract

Cited by 4 (0 self)
One of the key challenges in search-based image annotation models is to define an appropriate similarity measure between images. Many kernel distance metric learning (KML) algorithms have been developed in order to capture the nonlinear relationships between visual features and the semantics of the images. One fundamental limitation in applying KML to image annotation is that it requires converting image annotations into binary constraints, leading to significant information loss. In addition, most KML algorithms suffer from high computational cost due to the requirement that the learned matrix be positive semidefinite (PSD). In this paper, we propose a robust kernel metric learning (RKML) algorithm based on regression that is able to directly utilize image annotations. The proposed method is also computationally more efficient because the PSD property is automatically ensured by regression. We provide a theoretical guarantee for the proposed algorithm, and verify its efficiency and effectiveness for image annotation by comparing it to state-of-the-art approaches for both distance metric learning and image annotation.
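The abstract does not spell out the regression step, but the PSD-by-construction idea admits a minimal sketch: if a weight matrix W is obtained by ridge regression of annotation vectors onto features, then M = W Wᵀ is automatically positive semidefinite, with no projection step needed. The function name and toy data below are hypothetical, not the paper's RKML algorithm.

```python
import numpy as np

def regression_metric(X, Y, lam=1.0):
    """Hypothetical RKML-style step: ridge-regress annotations Y onto
    features X; M = W @ W.T is then PSD by construction."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # (d, k)
    return W @ W.T                                            # (d, d)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))                    # toy image features
Y = rng.integers(0, 2, size=(200, 5)).astype(float)   # toy multi-label annotations
M = regression_metric(X, Y)

# distance between two images under the learned metric
diff = X[0] - X[1]
dist = diff @ M @ diff
```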
Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
Abstract

Cited by 2 (1 self)
Kernel k-means is an effective method for data clustering which extends the commonly used k-means algorithm to work on a similarity matrix over complex data structures. It is, however, computationally very complex, as it requires the complete kernel matrix to be calculated and stored. Further, its kernelized nature hinders the parallelization of its computations on modern scalable infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. We then propose two practical methods for low-dimensional embedding that adhere to our definition of this embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark datasets.
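The paper's specific embedding family is not given in this abstract; a common instance of the idea, assumed here purely for illustration, is a Nyström-style embedding: once data are mapped to a low-dimensional space whose inner products approximate the kernel, ordinary (and easily parallelized) k-means applies.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nystrom_embedding(X, landmarks, gamma=0.5):
    """Embed X so that inner products of embeddings approximate the
    RBF kernel; plain (MapReduce-friendly) k-means can then run on Z."""
    W = rbf_kernel(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-12, None))) @ vecs.T
    return rbf_kernel(X, landmarks, gamma) @ W_inv_sqrt

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 4))
landmarks = X[rng.choice(300, size=30, replace=False)]
Z = nystrom_embedding(X, landmarks)   # (300, 30); Z @ Z.T approximates the kernel
```

On the landmark points themselves the approximation is exact, since their embedding is W^{1/2}.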
Large scale online kernel classification
 In IJCAI, 2013
Abstract

Cited by 2 (1 self)
In this work, we present a new framework for large-scale online kernel classification, making kernel methods efficient and scalable for large-scale online learning tasks. Unlike the regular budget kernel online learning scheme, which usually uses different strategies to bound the number of support vectors, our framework explores a functional approximation approach to approximating a kernel function/matrix in order to make the subsequent online learning task efficient and scalable. Specifically, we present two different online kernel machine learning algorithms: (i) the Fourier Online Gradient Descent (FOGD) algorithm that applies random Fourier features for approxi ...
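A minimal sketch of the FOGD idea, reconstructed from the abstract rather than taken from the paper: random Fourier features turn the kernel machine into a linear model, on which plain online gradient descent runs in O(D) per example. The logistic loss, step size, and toy stream below are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, D, gamma = 5, 500, 0.5

# Random Fourier features: z(x) = sqrt(2/D) * cos(W @ x + b), so that
# z(x).z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2).
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

def fogd_step(w, x, y, eta=0.1):
    """One online gradient step (logistic loss) on the linear model in
    RFF space -- our stand-in for the paper's FOGD update."""
    phi = z(x)
    grad = -y * phi / (1.0 + np.exp(y * w.dot(phi)))
    return w - eta * grad

w = np.zeros(D)
for _ in range(200):
    x = rng.standard_normal(d)
    y = 1.0 if x[0] > 0 else -1.0     # toy stream: label = sign of x[0]
    w = fogd_step(w, x, y)
```

No support vectors are ever stored: the model is just the D-dimensional weight vector w.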
Scalable Single Linkage Hierarchical Clustering For Big Data
Abstract
Abstract—Personal computing technologies are everywhere; hence, there is an abundance of staggeringly large data sets: the Library of Congress has stored over 160 terabytes of web data, and it is estimated that Facebook alone logs nearly a petabyte of data per day. Thus, there is a pertinent need for systems by which one can elucidate the similarity and dissimilarity among and between groups in these big data sets. Clustering is one way to find these groups. In this paper, we extend the scalable Visual Assessment of Tendency (sVAT) algorithm to return single-linkage partitions of big data sets. The sVAT algorithm is designed to provide visual evidence of the number of clusters in unloadable (big) data sets. The extension we describe enables sVAT to also efficiently return the data partition indicated by the visual evidence. The computational complexity and storage requirements of sVAT are (usually) significantly less than the O(n²) requirement of the classic single-linkage hierarchical algorithm. We show that sVAT is a scalable instantiation of single-linkage clustering for data sets that contain c compact-separated clusters, where c ≪ n and n is the number of objects. For data sets that do not contain compact-separated clusters, we show that sVAT produces a good approximation of single-linkage partitions. Experimental results are presented for both synthetic and real data sets.
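For contrast with sVAT, the classic O(n²) single-linkage baseline mentioned above can be written compactly via a minimum spanning tree: cutting the c−1 longest MST edges yields the c single-linkage clusters. This is the textbook baseline, not the sVAT algorithm itself.

```python
import numpy as np

def single_linkage_labels(X, c):
    """Classic O(n^2) single-linkage: grow a Euclidean MST with Prim's
    algorithm, cut the c-1 longest edges, label connected components."""
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    dist = np.linalg.norm(X - X[0], axis=1)  # distance to nearest tree node
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, dist)))
        edges.append((parent[j], j, float(dist[j])))
        in_tree[j] = True
        d_new = np.linalg.norm(X - X[j], axis=1)
        closer = ~in_tree & (d_new < dist)
        dist[closer] = d_new[closer]
        parent[closer] = j
    # keep the n-c shortest MST edges; union-find the components they connect
    edges.sort(key=lambda e: e[2])
    root = list(range(n))
    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i
    for a, b, _ in edges[: n - c]:
        root[find(a)] = find(b)
    return np.array([find(i) for i in range(n)])

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((20, 2)),
               rng.standard_normal((20, 2)) + 20.0])  # two well-separated blobs
labels = single_linkage_labels(X, c=2)
```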
Incremental Fuzzy Clustering with Multiple Medoids for Large Data
Abstract
Incremental fuzzy clustering with multiple medoids for large data. IEEE Transactions on Fuzzy Systems, 22(6), 1557–1568.
Approximate Nearest Centroid Embedding for Kernel k-Means
Abstract
This paper proposes an efficient embedding method for scaling kernel k-means on cloud infrastructures. The embedding method allows for approximating the computation of the nearest centroid to each data instance and, accordingly, eliminates the quadratic space and time complexities of the cluster assignment step in the kernel k-means algorithm. We show that the proposed embedding method is effective under memory and computing power constraints, and that it achieves better clustering performance compared to other approximations of the kernel k-means algorithm.
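The abstract's point, that an explicit embedding reduces the assignment step to ordinary nearest-centroid computations, can be sketched as follows. The embedding here is a stand-in (two Gaussian blobs), not the paper's method; with a real kernel embedding the same assignment step applies unchanged.

```python
import numpy as np

def assign_step(Z, labels, k):
    """One cluster-assignment pass in embedding space: centroids are
    plain means, so the pass costs O(n k D) rather than the O(n^2) of
    the kernel-matrix formulation."""
    centroids = np.stack([Z[labels == j].mean(axis=0) for j in range(k)])
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(5)
Z = np.vstack([rng.standard_normal((50, 8)),
               rng.standard_normal((50, 8)) + 6.0])  # stand-in embedding
# seed the two clusters from one point of each blob
labels = (np.linalg.norm(Z - Z[0], axis=1) >
          np.linalg.norm(Z - Z[50], axis=1)).astype(int)
for _ in range(10):
    labels = assign_step(Z, labels, k=2)
```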
Robust Multiple Kernel k-Means Using the ℓ2,1-Norm
Abstract
The k-means algorithm is one of the most widely used methods for data clustering. However, the standard k-means can only be applied in the original feature space. Kernel k-means, which extends k-means into the kernel space, can be used to capture nonlinear structure and identify arbitrarily shaped clusters. Since both the standard k-means and kernel k-means use the squared error to measure the distances between data points and cluster centers, a few outliers will cause large errors and dominate the objective function. Moreover, the performance of a kernel method is largely determined by the choice of kernel, and the most suitable kernel for a particular task is often unknown in advance. In this paper, we first present a robust k-means using the ℓ2,1-norm in the feature space and then extend it to the kernel space. To retain the power of kernel methods, we further propose a novel robust multiple kernel k-means (RMKKM) algorithm that simultaneously finds the best clustering labels, the cluster membership, and the optimal combination of multiple kernels. An alternating iterative scheme is developed to find the optimal value. Extensive experiments demonstrate the effectiveness of the proposed algorithms.
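The ℓ2,1 idea in the single-kernel case can be sketched concretely: replacing squared distances with unsquared ℓ2 distances in the objective leads to an iteratively reweighted centroid update (each cluster center becomes a smoothed geometric median), which damps the influence of outliers. This is our illustrative reading of the robust k-means step, not the paper's RMKKM algorithm; the smoothing constant delta and the deliberate one-point-per-blob initialization are assumptions.

```python
import numpy as np

def l21_kmeans(X, k, init_idx, iters=30, delta=0.1):
    """Sketch of k-means under an l2,1 objective sum_i ||x_i - c_{a_i}||_2
    (distances not squared): alternate nearest-centroid assignment with a
    reweighted mean, i.e. a smoothed geometric median per cluster."""
    C = X[np.asarray(init_idx)].astype(float).copy()
    a = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
        a = d.argmin(axis=1)
        for j in range(k):
            Xj = X[a == j]
            if len(Xj) == 0:
                continue
            # weight 1/||x - c|| (smoothed): distant outliers barely count
            w = 1.0 / np.sqrt(((Xj - C[j]) ** 2).sum(axis=1) + delta ** 2)
            C[j] = (w[:, None] * Xj).sum(axis=0) / w.sum()
    return a, C

rng = np.random.default_rng(6)
X = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + 10.0,
               [[500.0, 500.0]]])            # one gross outlier
a, C = l21_kmeans(X, k=2, init_idx=[0, 50])
# a squared-error mean of the outlier's cluster would be dragged to roughly
# (19.6, 19.6); the reweighted (l2,1) centroid stays near the blob at (10, 10)
```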
Kernel clustering
Abstract
Highlights:
• Kernel competitive learning (KCL) cannot be applied to large-scale data problems.
• We propose a projection-based approximate KCL method for large-scale data.
• We provide theoretical analysis of why the approximation modelling works for KCL.
• A pseudo-parallel approximate computation framework for large-scale KCL is developed.
• Experiments show the effectiveness and efficiency of the proposals.