Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation
"... Abstract In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The pr ..."
Abstract

In recent years, exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity measures, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well known as facility location in the operations research literature. While the problem has a well-developed convex relaxation with approximation and recovery guarantees, its number of variables grows quadratically with the number of samples; state-of-the-art methods can therefore hardly handle more than 10^4 samples (i.e., 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain a closed-form solution for each block subproblem, and exploits a low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches, handling problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm to world-scale facility location, document summarization, and active learning.
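To make the objective concrete, the toy sketch below evaluates the exemplar-clustering (uncapacitated facility location) cost by brute force over exemplar subsets. The function names and the per-exemplar penalty `lam` are our own illustrative choices; this is not the paper's AL-BCD algorithm, which instead solves a convex relaxation precisely because this combinatorial search blows up.

```python
import itertools

def exemplar_objective(D, S, lam):
    """Facility-location cost: each sample pays its dissimilarity to the
    nearest chosen exemplar, plus a penalty lam per exemplar opened."""
    return sum(min(D[i][j] for j in S) for i in range(len(D))) + lam * len(S)

def best_exemplars(D, lam):
    """Brute-force search over all non-empty exemplar subsets (toy scale
    only; the number of subsets is exponential in the sample count)."""
    n = len(D)
    best = None
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            val = exemplar_objective(D, S, lam)
            if best is None or val < best[0]:
                best = (val, S)
    return best

# Toy dissimilarity matrix: two clear clusters {0, 1} and {2, 3}.
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 1],
     [9, 9, 1, 0]]
print(best_exemplars(D, lam=2.0))  # picks one exemplar per cluster
```

With `lam=2.0` the optimum opens one exemplar in each cluster (total cost 2 in assignments plus 4 in penalties); a larger `lam` would collapse everything onto a single exemplar.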
A DualAugmented Block Minimization Framework for Learning with Limited Memory
"... In past few years, several techniques have been proposed for training of linear Support Vector Machine (SVM) in limitedmemory setting, where a dual blockcoordinate descent (dualBCD) method was used to balance cost spent on I/O and computation. In this paper, we consider the more general setting o ..."
Abstract
In the past few years, several techniques have been proposed for training linear Support Vector Machines (SVMs) in the limited-memory setting, where a dual block-coordinate descent (dual-BCD) method is used to balance the cost spent on I/O against that spent on computation. In this paper, we consider the more general setting of regularized Empirical Risk Minimization (ERM) when the data cannot fit into memory. In particular, we generalize the existing block minimization framework, based on strong duality and the Augmented Lagrangian technique, to achieve global convergence for general convex ERM. The block minimization framework is flexible in the sense that, given a solver that works under sufficient memory, one can integrate it with the framework to obtain a solver that is globally convergent under the limited-memory condition. We conduct experiments on L1-regularized classification and regression problems to corroborate our convergence theory and to compare the proposed framework with algorithms adapted from the online and distributed settings; the results show the superiority of the proposed approach on data ten times larger than the memory capacity.
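A minimal sketch of the dual-BCD idea the abstract builds on, under our own simplifying assumptions (not the paper's generalized framework): standard dual coordinate descent for a linear SVM with hinge loss and L2 regularization, where the training data is partitioned into blocks that are touched one at a time, as if each block were loaded from disk while the others stay out of memory.

```python
def dual_bcd_svm(blocks, C=1.0, epochs=20, dim=2):
    """Dual block-coordinate descent for a linear SVM (hinge loss, L2 reg).
    Only one block of (x, y) pairs is processed at a time, mimicking the
    limited-memory setting where blocks are read sequentially from disk."""
    w = [0.0] * dim
    alpha = {}  # one dual variable per sample, keyed by (block, index)
    for _ in range(epochs):
        for b, block in enumerate(blocks):        # "load" one block
            for k, (x, y) in enumerate(block):
                i = (b, k)
                a = alpha.get(i, 0.0)
                # Gradient of the dual objective w.r.t. alpha_i.
                g = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
                q = sum(xj * xj for xj in x)
                a_new = min(max(a - g / q, 0.0), C)  # project onto [0, C]
                d = a_new - a
                if d != 0.0:
                    # Maintain w = sum_i alpha_i * y_i * x_i incrementally.
                    w = [wj + d * y * xj for wj, xj in zip(w, x)]
                    alpha[i] = a_new
    return w

# Two linearly separable blocks of labeled points.
blocks = [[((2.0, 1.0), +1), ((1.5, 2.0), +1)],
          [((-2.0, -1.0), -1), ((-1.0, -2.0), -1)]]
w = dual_bcd_svm(blocks)
print(all(y * sum(wj * xj for wj, xj in zip(w, x)) > 0
          for blk in blocks for x, y in blk))  # all points classified correctly
```

Because the weight vector `w` is the only state shared across blocks, I/O (reading a block) and computation (the inner coordinate updates) can be traded off by how many inner passes are spent per block before moving on.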