Results

**11 - 16**of**16**### Divide-and-Conquer Learning by Anchoring a Conical Hull

, 2014

"... We reduce a broad class of fundamental machine learning problems, usually addressed by EM or sampling, to the problem of finding the k extreme rays spanning the conical hull of a1 data point set. These k “anchors ” lead to a global solution and a more interpretable model that can even outperform EM ..."

Abstract
- Add to MetaCart

We reduce a broad class of fundamental machine learning problems, usually addressed by EM or sampling, to the problem of finding the k extreme rays spanning the conical hull of a1 data point set. These k “anchors ” lead to a global solution and a more interpretable model that can even outperform EM and sampling on generalization error. To find the k anchors, we propose a novel divide-and-conquer learning scheme “DCA ” that distributes the problem to O(k log k) same-type sub-problems on different low-D random hyperplanes, each can be solved independently by any existing solver. For the 2D sub-problem, we instead present a non-iterative solver that only needs to compute an array of cosine values and its max/min entries. DCA also provides a faster subroutine inside other algorithms to check whether a point is covered in a conical hull, and thus improves these algorithms by providing significant speedups. We apply our method to GMM, HMM, LDA, NMF and subspace clustering, then show its competitive performance and scalability over other methods on large datasets.

### Fast Algorithms for Robust PCA via Gradient Descent

"... Abstract We consider the problem of Robust PCA in the fully and partially observed settings. Without corruptions, this is the well-known matrix completion problem. From a statistical standpoint this problem has been recently well-studied, and conditions on when recovery is possible (how many observ ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract We consider the problem of Robust PCA in the fully and partially observed settings. Without corruptions, this is the well-known matrix completion problem. From a statistical standpoint this problem has been recently well-studied, and conditions on when recovery is possible (how many observations do we need, how many corruptions can we tolerate) via polynomial-time algorithms is by now understood. This paper presents and analyzes a non-convex optimization approach that greatly reduces the computational complexity of the above problems, compared to the best available algorithms. In particular, in the fully observed case, with r denoting rank and d dimension, we reduce the complexity from O(r 2 d 2 log(1/ε)) to O(rd 2 log(1/ε)) -a big savings when the rank is big. For the partially observed case, we show the complexity of our algorithm is no more than O(r 4 d log d log(1/ε)). Not only is this the best-known run-time for a provable algorithm under partial observation, but in the setting where r is small compared to d, it also allows for near-linear-in-d run-time that can be exploited in the fully-observed case as well, by simply running our algorithm on a subset of the observations.

### Detection of Planted Solutions for Flat Satisfiability Problems

"... Abstract. We study the detection problem of finding planted solutions in random instances of flat satisfiability problems, a generalization of boolean satisfiability formulas. We describe the properties of random instances of flat satisfiability, as well of the optimal rates of detection of the asso ..."

Abstract
- Add to MetaCart

Abstract. We study the detection problem of finding planted solutions in random instances of flat satisfiability problems, a generalization of boolean satisfiability formulas. We describe the properties of random instances of flat satisfiability, as well of the optimal rates of detection of the associated hypothesis testing problem. We also study the perfor-mance of an algorithmically efficient testing procedure. We introduce a modification of our model, the light planting of solutions, and show that it is as hard as the problem of learning parity with noise. This hints strongly at the difficulty of detecting planted flat satisfiability for a wide class of tests.

### Resource Allocation for Statistical Estimation

, 2014

"... Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource – e.g., a sensing dev ..."

Abstract
- Add to MetaCart

Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource – e.g., a sensing device or computing power – for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization.

### Matrix Completion with Queries

, 2015

"... In many applications, e.g., recommender systems and traffic monitoring, the data comes in the form of a matrix that is only partially observed and low rank. A fundamental data-analysis task for these datasets is matrix completion, where the goal is to accurately infer the entries missing from the ma ..."

Abstract
- Add to MetaCart

(Show Context)
In many applications, e.g., recommender systems and traffic monitoring, the data comes in the form of a matrix that is only partially observed and low rank. A fundamental data-analysis task for these datasets is matrix completion, where the goal is to accurately infer the entries missing from the matrix. Even when the data satisfies the low-rank assumption, classical matrix-completion methods may output completions with significant error – in that the reconstructed matrix differs significantly from the true underlying matrix. Often, this is due to the fact that the information contained in the observed entries is insufficient. In this work, we address this problem by proposing an active version of matrix completion, where queries can be made to the true underlying matrix. Subsequently, we design Order&Extend, which is the first algorithm to unify a matrix-completion approach and a querying strategy into a single algorithm. Order&Extend is able identify and alleviate insufficient in-formation by judiciously querying a small number of add-tional entries. In an extensive experimental evaluation on real-world datasets, we demonstrate that our algorithm is efficient and is able to accurately reconstruct the true matrix while asking only a small number of queries.