Results 1 - 4 of 4
D4L: Decentralized Dynamic Discriminative Dictionary Learning, 2015
"... We consider discriminative dictionary learning in a distributed online setting, where a network of agents aims to learn a common set of dictionary elements of a feature space and model parameters while sequentially receiving observations. We formulate this problem as a distributed stochastic program ..."
Cited by 6 (1 self)
Abstract
We consider discriminative dictionary learning in a distributed online setting, where a network of agents aims to learn a common set of dictionary elements of a feature space and model parameters while sequentially receiving observations. We formulate this problem as a distributed stochastic program with a nonconvex objective and present a block variant of the Arrow-Hurwicz saddle point algorithm to solve it. Lagrange multipliers penalize the discrepancy between neighboring nodes' models, so only neighboring nodes need to exchange model information. We show that decisions made with this saddle point algorithm asymptotically achieve a first-order stationarity condition on average. The learning rate depends on the signal source, the network, and the discriminative task. We illustrate the algorithm's performance in an online multi-agent setting for a collaborative image classification task and show that its practical performance is comparable to the centralized case. Moreover, in the multi-class setting, the proposed framework empirically allows nodes to make global inferences despite only observing distinct subsets of the feature space. We apply the proposed method to a mobile robotic team performing collaborative navigability assessment in an unknown environment, demonstrating the proposed algorithm's utility in a field setting.
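The paper's exact D4L updates are not reproduced here; as a rough illustration of the block primal-dual (Arrow-Hurwicz-type) iteration the abstract describes, the Python sketch below runs a decentralized stochastic saddle point method on a toy least-squares problem over a ring network. The loss, network, step size, and variable names are placeholder assumptions, not the paper's setup.

```python
import numpy as np

# Toy setup (assumption): n agents on a ring, each holding a local copy x[i]
# of a common parameter.  Lagrange multipliers lam[i, j] penalize the
# disagreement x[i] - x[j] with each neighbor j, so only neighboring
# agents need to exchange their current iterates.
rng = np.random.default_rng(0)
n_agents, dim, eps = 5, 3, 0.05
x_true = rng.normal(size=dim)
x = rng.normal(size=(n_agents, dim))
lam = np.zeros((n_agents, n_agents, dim))
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

def stochastic_grad(xi):
    # Stochastic gradient of a toy least-squares loss at a fresh observation.
    a = rng.normal(size=dim)
    y = a @ x_true + 0.1 * rng.normal()
    return (a @ xi - y) * a

for t in range(2000):
    x_new = x.copy()
    for i in range(n_agents):
        # Primal descent: local stochastic gradient plus consensus penalty.
        penalty = sum(lam[i, j] - lam[j, i] for j in neighbors[i])
        x_new[i] = x[i] - eps * (stochastic_grad(x[i]) + penalty)
    for i in range(n_agents):
        for j in neighbors[i]:
            # Dual ascent on the disagreement between neighboring copies.
            lam[i, j] += eps * (x[i] - x[j])
    x = x_new

print("mean distance to x_true:", np.mean(np.linalg.norm(x - x_true, axis=1)))
```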
Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization, 2015
"... The stochastic gradient (SG) method can quickly solve a problem with a large number of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with multiple (blocks of ..."
Cited by 5 (0 self)
Abstract
The stochastic gradient (SG) method can quickly solve a problem with a large number of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with multiple (blocks of) variables. This paper introduces a method that combines the strengths of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables: a block SG (BSG) method for both convex and nonconvex programs. BSG generalizes SG by updating all the blocks of variables in a Gauss–Seidel fashion (the update of the current block depends on the previously updated blocks), in either a fixed or randomly shuffled order. Although BSG requires slightly more work at each iteration, it typically outperforms SG because of BSG's Gauss–Seidel updates and larger step sizes, the latter of which are determined by the smaller per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and logistic regression, which are convex, and low-rank tensor recovery and bilinear logistic regression, which are nonconvex. On the convex problems, BSG performed significantly better than SG. On the nonconvex problems, BSG significantly outperformed the deterministic BCD method because the latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both SG approximation and block coordinate updates and is especially useful for solving large-scale nonconvex problems.
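The paper's step-size rules and convergence conditions are not repeated here; as a small, self-contained illustration of the Gauss-Seidel block update pattern described above, the Python sketch below applies BSG-style updates to a toy stochastic least-squares problem with two variable blocks. The block partition, sampling scheme, and step sizes are placeholder assumptions, not the paper's recommended choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem (assumption): minimize E_{a,y} (a1^T x1 + a2^T x2 - y)^2
# over two variable blocks x1 and x2, with data generated on the fly.
d1, d2 = 4, 6
x1_true, x2_true = rng.normal(size=d1), rng.normal(size=d2)
x1, x2 = np.zeros(d1), np.zeros(d2)

def sample():
    a1, a2 = rng.normal(size=d1), rng.normal(size=d2)
    y = a1 @ x1_true + a2 @ x2_true + 0.1 * rng.normal()
    return a1, a2, y

for k in range(1, 5001):
    a1, a2, y = sample()              # one stochastic sample per iteration
    step = 0.1 / np.sqrt(k)           # placeholder diminishing step size
    blocks = rng.permutation(2)       # randomly shuffled block order
    for b in blocks:
        # Gauss-Seidel: the residual is recomputed so the partial gradient
        # for this block uses the block(s) updated just before it.
        resid = a1 @ x1 + a2 @ x2 - y
        if b == 0:
            x1 -= step * resid * a1
        else:
            x2 -= step * resid * a2

print("block errors:", np.linalg.norm(x1 - x1_true), np.linalg.norm(x2 - x2_true))
```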
Proximal gradient method for huberized support vector machine
Pattern Analysis and Applications
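Only the title is listed for this entry. Reading it at face value, the huberized hinge loss is a smooth surrogate of the hinge loss, so it can be paired with a proximal (soft-thresholding) step on an l1 penalty; the Python sketch below is a generic proximal gradient iteration under that reading, with synthetic data and placeholder parameters, and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic binary classification data (assumption, for illustration only).
n, d, delta, reg, step = 200, 20, 0.5, 0.05, 0.1
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

def huber_hinge_grad(w):
    # Gradient of the smooth huberized hinge loss, averaged over the data:
    # 0 for margins > 1, quadratic on (1 - delta, 1], linear below 1 - delta.
    u = y * (X @ w)
    dldu = np.where(u > 1, 0.0,
           np.where(u > 1 - delta, -(1 - u) / delta, -1.0))
    return X.T @ (dldu * y) / n

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
for _ in range(500):
    # Proximal gradient step: descend on the smooth loss, prox on the l1 term.
    w = soft_threshold(w - step * huber_hinge_grad(w), step * reg)

print("training accuracy:", np.mean(np.sign(X @ w) == y),
      "nonzeros:", np.count_nonzero(w))
```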
Doubly Random Parallel Stochastic Methods for Large Scale Learning
"... Abstract — We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utiliz ..."
Abstract
We consider learning problems over training sets in which both the number of training examples and the dimension of the feature vectors are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple processors to operate on a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose elements of the training set randomly and independently. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is convex. In particular, we show that: (i) when using decreasing step sizes, RAPSA converges almost surely over the random choice of blocks and functions; (ii) when using constant step sizes, convergence is to a neighborhood of optimality with a rate that is linear in expectation. RAPSA is numerically evaluated on the MNIST digit recognition problem.
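As a rough, single-machine illustration of the doubly random pattern the abstract describes, the Python sketch below simulates several "processors" that each draw a random minibatch of examples and a random block of coordinates and update only that block of a logistic regression model. The processor count, block sizes, batch size, and step-size schedule are placeholder assumptions, and the sequential loop stands in for processors that would run concurrently.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy logistic regression (assumption): feature dimension split into blocks.
n, d, n_blocks, n_procs, batch = 1000, 40, 8, 4, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

block_idx = np.array_split(np.arange(d), n_blocks)   # coordinate blocks
w = np.zeros(d)

for k in range(1, 3001):
    step = 1.0 / np.sqrt(k)                          # diminishing step size
    for _ in range(n_procs):                         # stand-in for parallel processors
        rows = rng.choice(n, size=batch, replace=False)  # random training examples
        blk = block_idx[rng.integers(n_blocks)]          # random coordinate block
        # Stochastic gradient of the logistic loss, restricted to the chosen block.
        p = 1.0 / (1.0 + np.exp(-X[rows] @ w))
        g_blk = X[rows][:, blk].T @ (p - y[rows]) / batch
        w[blk] -= step * g_blk

p_all = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
print("final average logistic loss:",
      -np.mean(y * np.log(p_all) + (1 - y) * np.log(1 - p_all)))
```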