Global convergence of stochastic gradient descent for some nonconvex matrix problems. (2015)

by C. De Sa, K. Olukotun, C. Ré
Venue: ICML
Results 1 - 5 of 5

Low-rank Solutions of Linear Matrix Equations via Procrustes Flow

by Stephen Tu, Ross Boczar, Mahdi Soltanolkotabi, Benjamin Recht , 2015
Abstract - Cited by 4 (0 self)
In this paper we study the problem of recovering a low-rank positive semidefinite matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an initial estimate obtained by a thresholding scheme followed by gradient descent on a non-convex objective. We show that as long as the measurements obey a standard restricted isometry property, our algorithm converges to the unknown matrix at a geometric rate. In the case of Gaussian measurements, such convergence occurs for an n×n matrix of rank r when the number of measurements exceeds a constant times nr.
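The gradient-descent-on-a-factorization step the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes symmetric measurement matrices A_i, and uses a plain spectral initialization in place of the paper's thresholding scheme; all names are our own.

```python
import numpy as np

def procrustes_flow_sketch(A_ops, b, n, r, steps=200, lr=1e-3):
    """Sketch of gradient descent on the factorized objective
    f(U) = (1/m) * sum_i (<A_i, U U^T> - b_i)^2, recovering M ~ U U^T.
    A_ops: list of m symmetric n x n measurement matrices (assumption)."""
    m = len(b)
    # Spectral initialization: top-r eigenpairs of (1/m) * sum_i b_i A_i
    # (a stand-in for the paper's thresholding scheme).
    S = sum(bi * Ai for Ai, bi in zip(A_ops, b)) / m
    w, V = np.linalg.eigh(S)                      # ascending eigenvalues
    U = V[:, -r:] * np.sqrt(np.maximum(w[-r:], 0.0))
    for _ in range(steps):
        grad = np.zeros_like(U)
        for Ai, bi in zip(A_ops, b):
            resid = np.vdot(Ai, U @ U.T) - bi     # <A_i, UU^T> - b_i
            grad += 4.0 * resid * (Ai @ U)        # d/dU of the squared residual
        U -= lr * grad / m
    return U @ U.T
```

Note the gradient: for symmetric A_i, d/dU tr(A_i U U^T) = 2 A_i U, giving the factor of 4 per squared residual.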

Citation Context

...g from the same neighborhood of U0 with only Ω(nr) measurements. Moreover, the theory of restricted isometries in our work considerably simplifies the analysis. Finally, we would also like to mention [SOR14] for guarantees using stochastic gradient algorithms. The results of [SOR14] are applicable to a variety of models; focusing on the Gaussian ensemble, the authors require Ω((nr log n)/) samples to r...

Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity

by Ohad Shamir - CoRR, 2015
"... Abstract We study the convergence properties of the VR-PCA algorithm introduced by ..."
Abstract - Cited by 1 (0 self)

Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

by Chi Jin, Sham M. Kakade, Praneeth Netrapalli
Abstract
Abstract Matrix completion, where we wish to recover a low rank matrix by observing a few of its entries, is a widely studied problem in both theory and practice, with wide-ranging applications. Most provable algorithms for this problem so far have been restricted to the offline setting, where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. While existing algorithms are efficient for the offline setting, they can be highly inefficient for the online setting. In this paper, we propose the first provable, efficient online algorithm for matrix completion. Our algorithm starts from an initial estimate of the matrix and then performs non-convex stochastic gradient descent (SGD). After every observation, it performs a fast update involving only one row of two tall matrices, giving near-linear total runtime. Our algorithm can be naturally used in the offline setting as well, where it gives sample complexity and runtime competitive with state-of-the-art algorithms. Our proofs introduce a general framework to show that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for other non-convex problems.
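The per-observation update described above, touching only one row of each factor matrix, can be sketched like this. It is an illustrative sketch under the model M ≈ U V^T, not the paper's exact algorithm; the step size and initialization are placeholders.

```python
import numpy as np

def online_mc_sgd_step(U, V, i, j, m_ij, lr=0.05):
    """One SGD step after observing entry (i, j) with value m_ij.
    Only row i of U and row j of V change, so each update costs O(r).
    (Sketch; the paper's exact update and step size may differ.)"""
    resid = U[i] @ V[j] - m_ij               # prediction error on this entry
    gi, gj = resid * V[j], resid * U[i]      # gradients wrt U[i] and V[j]
    U[i] -= lr * gi
    V[j] -= lr * gj
    return U, V
```

Streaming entries of a low-rank matrix through this update drives the reconstruction error down without ever touching the full factor matrices in a single step.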

Citation Context

...completion: Another variant of online matrix completion studied in the literature is one where observations are made on a column-by-column basis, e.g., [16, 26]. These models can give improved offline performance in terms of space and could potentially work under relaxed regularity conditions. However, they do not tackle the version where only entries (as opposed to columns) are observed. Non-convex optimization: Over the last few years, there has also been a significant amount of work on designing other efficient algorithms for solving non-convex problems. Examples include eigenvector computation [6, 11], sparse coding [20, 1], etc. For general non-convex optimization, an interesting line of recent work is that of [7], which proves that gradient descent with noise can also escape saddle points, but it only provides a polynomial rate without explicit dependence. Later, [17, 21] show that without noise, the set of points from which gradient descent converges to a saddle point has measure zero. However, they do not provide a rate of convergence. Another related piece of work is [10], which proves global convergence, along with rates of convergence, for the special case of computing matrix squarer...

Convergence of Stochastic Gradient Descent for PCA

by Ohad Shamir
Abstract
Abstract We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in
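The incremental SGD update the abstract refers to is, in its simplest Oja-style form, a one-line rule per data point. This sketch assumes a fixed step size (the paper's analysis covers more refined schedules and the eigengap-free regime); all names are illustrative.

```python
import numpy as np

def pca_sgd(stream, d, lr=0.01, seed=0):
    """Oja-style SGD for the top principal direction: on each point x,
    take an ascent step for the variance objective w^T E[x x^T] w and
    renormalize. `stream` yields i.i.d. points x in R^d (assumption)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for x in stream:
        w += lr * x * (x @ w)      # stochastic gradient of the variance
        w /= np.linalg.norm(w)     # project back onto the unit sphere
    return w
```

On data whose covariance has a dominant eigenvector, the iterate aligns with that direction as the stream grows.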

A Geometric Analysis of Phase Retrieval

by Ju Sun , Qing Qu , John Wright
Abstract
Abstract Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of m measurements, y_k = |a_k^* x| for k = 1, ..., m, is it possible to recover x ∈ C^n (i.e., a length-n complex vector)? This generalized phase retrieval (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors a_k are generic (i.i.d. complex Gaussian) and the number of measurements is large enough (m ≥ C n log^3 n), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal x, up to a global phase; and (2) the objective function has negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm.
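The "natural least-squares formulation" mentioned above can be written down directly. The sketch below evaluates the objective and its Wirtinger gradient, assuming measurements y_k = |a_k^* x| stacked so that A has rows a_k^* (the row convention and names are our own, not the paper's).

```python
import numpy as np

def gpr_loss_and_grad(x, A, y):
    """Nonconvex least squares for generalized phase retrieval:
        f(x) = (1/2m) * sum_k (|a_k^* x|^2 - y_k^2)^2.
    A is m x n complex with rows a_k^*; returns f(x) and the Wirtinger
    gradient (1/m) * sum_k r_k (a_k a_k^*) x, with r_k the residual."""
    m = A.shape[0]
    Ax = A @ x                               # the scalars a_k^* x
    resid = np.abs(Ax) ** 2 - y ** 2
    loss = 0.5 * np.mean(resid ** 2)
    grad = (A.conj().T @ (resid * Ax)) / m   # sum_k r_k * (a_k^* x) * a_k / m
    return loss, grad
```

At the true signal the residuals, and hence the loss and gradient, vanish; the paper's result is that for Gaussian a_k this landscape has no other local minima.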

Citation Context

...rieval problem has a natural generalization to recovering low-rank positive semidefinite matrices. Consider the problem of recovering an unknown rank-r matrix M_0 in R^{n×n} from linear measurements of the form z_k = tr(A_k M_0) with symmetric A_k for k = 1, ..., m. One can solve the problem by considering the "factorized" version: recovering X ∈ R^{n×r} (up to right invertible transform) from measurements z_k = tr(X^* A_k X). This is a natural generalization of GPR, as one can write the GPR measurements as y_k^2 = |a_k^* x|^2 = x^* (a_k a_k^*) x. This generalization and related problems have recently been studied in [SRO15, ZL15, TBSR15, CW15]. 1.5 Notations, Organization, and Reproducible Research. Basic notations and facts. Throughout the paper, we define the complex inner product as ⟨a, b⟩ := a^* b for any a, b ∈ C^n. We use CS^{n−1} for the complex unit sphere in C^n. CS^{n−1}(λ) with λ > 0 denotes the centered complex sphere with radius λ in C^n. Similarly, we use CB^n(λ) to denote the centered complex ball of radius λ. We use CN(k) for a standard complex Gaussian vector of length k defined in (1.2). We reserve C and c, and their indexed versions, to denote absolute constants. Their values vary with the context. Let ℜ(z) ∈ R^n and ℑ(z) ∈ R^n de...
