My Weak Consistency is Strong When Bad Things Do Not Come in Threes
"... ABSTRACT It is expensive to maintain strong data consistency during concurrent execution. However, weak consistency levels, which are considered harmful, have been widely applied in analytical jobs. Their success challenges our belief: data consistency, which is believed to be an essential to preci ..."
Abstract
- Add to MetaCart
(Show Context)
It is expensive to maintain strong data consistency during concurrent execution. However, weak consistency levels, which are often considered harmful, have been widely applied in analytical jobs. Their success challenges our belief that data consistency, which is believed to be essential to precise computing, must always be preserved. In this paper, we tackle one of the core questions related to the application of weak consistency: when does weak consistency work well? We propose an effective explanation for the success of weak consistency, which we name "bad things do not come in threes", or BN3. It is based on the observation that the volume of data is far larger than the number of workers. If all workers operate concurrently, the probability that two workers access the same data at the same time is relatively low. Although it is not small enough to be neglected, the chance that three or more workers access the same data at the same time is even lower. Based on the BN3 conjecture, we analyze different consistency levels and show that a weak consistency level in transaction processing is equivalent to snapshot isolation (SI) under reasonable assumptions. Although BN3 is an oversimplification of real scenarios, it explains why weak consistency often achieves results that are accurate enough, and it serves as a quality promise for the future wide application of weak consistency in analytical tasks. We verify our results in experimental studies.
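The BN3 argument above is essentially a birthday-style calculation. As a rough illustration (not taken from the paper), the following Python sketch computes the expected number of 2-way and 3-way collisions when W workers each access one of D data items uniformly at random; the uniform-access model and the particular values of W and D are assumptions chosen only to make the point concrete.

```python
# Illustrative sketch (not from the paper): expected number of simultaneous
# collisions when W workers each touch one of D data items uniformly at random.
from math import comb

def expected_collisions(W: int, D: int):
    # Expected number of worker pairs on the same item: C(W, 2) * (1/D)
    pairs = comb(W, 2) / D
    # Expected number of worker triples on the same item: C(W, 3) * (1/D)**2
    triples = comb(W, 3) / D ** 2
    return pairs, triples

if __name__ == "__main__":
    W, D = 100, 1_000_000                 # placeholder: 100 workers, one million data items
    pairs, triples = expected_collisions(W, D)
    print(f"expected 2-way collisions: {pairs:.6f}")    # ~0.005
    print(f"expected 3-way collisions: {triples:.2e}")  # ~1.6e-07
```

Under these assumed numbers, pairwise collisions are already rare (roughly one in two hundred rounds), and triple collisions are rarer by another four orders of magnitude, which is the gap the BN3 conjecture relies on.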
Distributed Machine Learning via Sufficient Factor Broadcasting
"... Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands ..."
Abstract
- Add to MetaCart
(Show Context)
Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parametrized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency (communication costs are linear in the parameter matrix's dimensions, rather than quadratic) without affecting computational correctness. We present a theoretical convergence analysis of SFB, and empirically corroborate its efficiency on four different matrix-parametrized ML models.
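To make the rank-1 structure concrete, here is a minimal NumPy sketch (my own illustration, not the authors' implementation) of the sufficient-factor idea for a multiclass logistic-style model: the sender transmits only the two factors u and v, and each receiver rebuilds the full update with an outer product. The dimensions, learning rate, and simplified gradient are placeholder assumptions.

```python
# Sufficient-factor sketch: ship two vectors, reconstruct the rank-1 update locally.
import numpy as np

J, D = 1_000, 5_000      # placeholder dimensions (classes x features)

def compute_sufficient_factors(W, x, y):
    """Toy softmax-regression gradient on one sample: delta_W = outer(u, v).
    Simplified stand-in for the paper's family of matrix-parametrized models."""
    logits = W @ x                       # (J,)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    u = p
    u[y] -= 1.0                          # d(cross-entropy)/d(logits)
    v = x                                # gradient factorizes as outer(u, x)
    return u, v

def apply_sufficient_factors(W, u, v, lr=0.1):
    # Receiver side: rebuild the rank-1 update; no J x D matrix is ever transmitted.
    W -= lr * np.outer(u, v)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = np.zeros((J, D))
    x = rng.normal(size=D)
    u, v = compute_sufficient_factors(W, x, y=3)
    W = apply_sufficient_factors(W, u, v)
    print(f"floats per update: {J + D:,} (SFB) vs {J * D:,} (full matrix)")
```

The payload per update is J + D numbers instead of J x D, which is where the linear-versus-quadratic communication saving described in the abstract comes from.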
Random Walk Distributed Dual Averaging Method For Decentralized Consensus Optimization
"... In this paper, we address the problem of distributed learning over a large number of distributed sensors or geographically separated data centers, which suffer from sampling biases across nodes. We propose an algorithm called random walk distributed dual averaging (RW-DDA) method that only requires ..."
Abstract
- Add to MetaCart
(Show Context)
In this paper, we address the problem of distributed learning over a large number of distributed sensors or geographically separated data centers, which suffer from sampling biases across nodes. We propose an algorithm called the random walk distributed dual averaging (RW-DDA) method, which only requires local updates and is fully distributed. Our RW-DDA method is robust to changes in network topology and amenable to asynchronous implementation. The theoretical analysis shows the algorithm has O(1/√t) convergence for non-smooth convex problems. Experimental results show that our algorithm outperforms competing methods in real-world scenarios, i.e., when trained over non-IID data and in the presence of communication link failures.
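As a reading aid, the sketch below implements the generic random-walk dual-averaging idea as I understand it from the abstract: a single dual variable performs a random walk over the network, and the node currently holding it folds in its local subgradient before passing it on to a random neighbor. The ring topology, L1 losses, and 1/√t step sizes are assumptions for illustration, not the paper's exact algorithm or analysis.

```python
# Random-walk dual-averaging sketch (my reconstruction, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim, T = 5, 3, 20_000
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}  # ring graph
targets = rng.normal(size=(n_nodes, dim))    # each node holds its own biased local data

def local_subgradient(i, x):
    # Subgradient of the non-smooth local loss f_i(x) = ||x - targets[i]||_1.
    return np.sign(x - targets[i])

z = np.zeros(dim)          # accumulated dual variable carried by the random walk
x = np.zeros(dim)          # primal iterate
x_avg = np.zeros(dim)      # running average of iterates
node = 0                   # node currently holding the token
for t in range(1, T + 1):
    z += local_subgradient(node, x)
    step = 1.0 / np.sqrt(t)                  # 1/sqrt(t)-style step size
    x = -step * z                            # dual-averaging step with psi(x) = ||x||^2 / 2
    x_avg += (x - x_avg) / t
    node = int(rng.choice(neighbors[node]))  # token moves to a uniformly random neighbor

# The averaged iterate should land near the minimizer of the network-wide
# average L1 loss, i.e., the coordinate-wise median of the local targets.
print("averaged iterate      :", np.round(x_avg, 3))
print("coordinate-wise median:", np.round(np.median(targets, axis=0), 3))
```

Because a random walk on a connected regular graph visits every node with equal long-run frequency, the accumulated subgradients approximate those of the network-wide average loss even though each step touches only one node's (possibly biased) local data.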