Results 11–20 of 33
Composable Incremental and Iterative Data-Parallel Computation with Naiad
Abstract

Cited by 5 (0 self)
We report on the design and implementation of Naiad, a set of declarative data-parallel language extensions and an associated runtime supporting efficient and composable incremental and iterative computation. This combination is enabled by a new computational model we call differential dataflow, in which incremental computation can be performed using a partial, rather than total, order on time. Naiad extends standard batch data-parallel processing models like MapReduce, Hadoop, and Dryad/DryadLINQ to support efficient incremental updates to the inputs in the manner of a stream-processing system, while at the same time enabling arbitrarily nested fixed-point iteration. In this paper, we evaluate a prototype of Naiad that uses shared memory on a single multicore computer. We apply Naiad to various computations, including several graph algorithms, and observe good scaling properties and efficient incremental recomputation.
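At its simplest, the incremental computation this abstract describes can be pictured as maintaining an aggregate under input deltas instead of recomputing from scratch. The sketch below is a deliberately tiny illustration of that idea only; the class and method names are mine, and this is not differential dataflow, which generalizes incremental updates to partially ordered logical times:

```python
from collections import Counter

class IncrementalCount:
    """Maintain per-key counts, updated from deltas rather than recomputed."""
    def __init__(self):
        self.counts = Counter()

    def update(self, additions, deletions=()):
        # apply only the change: cost proportional to |delta|, not |input|
        for k in additions:
            self.counts[k] += 1
        for k in deletions:
            self.counts[k] -= 1

ic = IncrementalCount()
ic.update(["a", "b", "a"])            # initial batch
ic.update(["c"], deletions=["b"])     # later delta: add "c", retract one "b"
```

Each `update` call touches only the changed keys, which is the property that makes streaming updates cheap compared with re-running a batch job over the whole input.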
Accelerate Large-Scale Iterative Computation through Asynchronous Accumulative Updates
Abstract

Cited by 4 (2 self)
A myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, accumulative iterative updates accumulate the intermediate iterative update results. We prove that an accumulative update yields the same result as its corresponding traditional iterative update. Furthermore, accumulative iterative computation can be performed asynchronously and converges much faster. We present a general computation model to describe asynchronous accumulative iterative computation. Based on this computation model, we design and implement a distributed framework, Maiter. We evaluate Maiter on the Amazon EC2 cloud with 100 EC2 instances. Our results show that Maiter achieves as much as a 60x speedup over Hadoop for implementing iterative algorithms.
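The accumulative-update idea can be made concrete with PageRank, a typical target for such frameworks. Below, the traditional form iterates the full result each round, while the accumulative form propagates only deltas and sums them; both converge to the same fixed point. This is an illustrative single-machine sketch, not the paper's distributed implementation:

```python
def pagerank_traditional(out_links, d=0.85, iters=200):
    # r_{k+1} = d * A * r_k + (1 - d) / n
    n = len(out_links)
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for u, targets in out_links.items():
            share = d * r[u] / len(targets)
            for v in targets:
                new[v] += share
        r = new
    return r

def pagerank_accumulative(out_links, d=0.85, iters=200):
    # delta_0 = (1 - d) / n,  delta_{k+1} = d * A * delta_k,  r = sum_k delta_k
    n = len(out_links)
    delta = [(1 - d) / n] * n   # the "change" this round
    r = [0.0] * n               # accumulated result
    for _ in range(iters):
        r = [ri + di for ri, di in zip(r, delta)]
        new_delta = [0.0] * n
        for u, targets in out_links.items():
            share = d * delta[u] / len(targets)
            for v in targets:
                new_delta[v] += share
        delta = new_delta
    return r

graph = {0: [1, 2], 1: [2], 2: [0]}
a = pagerank_traditional(graph)
b = pagerank_accumulative(graph)
# both converge to the same ranks
```

The equivalence holds because the accumulated sum is the geometric series (I - dA)^{-1}(1-d)/n, the same fixed point the traditional iteration approaches.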
Oolong: Asynchronous Distributed Applications Made Easy
Abstract

Cited by 4 (1 self)
We present Oolong, a distributed programming framework designed for sparse asynchronous applications such as distributed web crawling, shortest paths, and connected components. Oolong stores program state in distributed in-memory key-value tables on which user-defined triggers may be set. Triggers can be activated whenever a key-value pair is modified. The event-driven nature of triggers is particularly appropriate for asynchronous computation, where workers can independently process parts of the state towards convergence without any need for global synchronization. Using Oolong, we have implemented solutions for several large-scale asynchronous computation problems, achieving good performance and robust fault tolerance.
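The trigger mechanism can be sketched in a few lines: a key-value table fires user-defined callbacks on modification, and the computation converges when no update produces further work. The class and callback names below are illustrative, not Oolong's actual API; the example runs connected components by propagating minimum labels:

```python
class TriggerTable:
    """Key-value table that fires registered triggers on each modification."""
    def __init__(self):
        self.data = {}
        self.triggers = []
        self.pending = []

    def on_update(self, fn):
        self.triggers.append(fn)

    def put(self, key, value):
        if self.data.get(key) != value:
            self.data[key] = value
            self.pending.append((key, value))

    def run(self):
        # process updates until no trigger produces new work (convergence)
        while self.pending:
            key, value = self.pending.pop()
            for fn in self.triggers:
                fn(self, key, value)

# Connected components: each node's label converges to the smallest
# node id reachable from it.
edges = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
table = TriggerTable()

def propagate(t, node, label):
    for nb in edges.get(node, []):
        if label < t.data.get(nb, float("inf")):
            t.put(nb, label)

table.on_update(propagate)
for n in edges:
    table.put(n, n)   # initial label = own id
table.run()
# table.data -> {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Note there is no iteration barrier: updates fire whenever a value changes, which is the asynchronous style the abstract describes.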
Mammoth Data in the Cloud: Clustering Social Images
Abstract

Cited by 3 (3 self)
Abstract — Social image datasets have grown to dramatic size, with images classified in vector spaces of high dimension (512–2048) and with potentially billions of images and corresponding classification vectors. We study the challenging problem of clustering such sets into millions of clusters using Iterative MapReduce. We introduce a new K-means algorithm in the Map phase which can tackle the challenge of large cluster and dimension size. Further, we stress that the necessary parallelism of such data-intensive problems is dominated by particular collective (reduction) operations, which are common to MPI and MapReduce, and we study different collective implementations, which enable cloud-HPC cluster interoperability. Extensive performance results are presented.
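One K-means step fits naturally into map and reduce phases: Map assigns each point to its nearest centroid, and Reduce (the collective/reduction operation the abstract emphasizes) averages the assigned points into new centroids. This is a toy single-machine sketch of that structure, not the paper's implementation:

```python
def kmeans_step(points, centroids):
    # Map phase: emit (nearest centroid index, point) for each point
    assign = [min(range(len(centroids)),
                  key=lambda c: sum((p - q) ** 2
                                    for p, q in zip(pt, centroids[c])))
              for pt in points]
    # Reduce phase: average the points assigned to each centroid
    new_centroids = []
    for c in range(len(centroids)):
        members = [pt for pt, a in zip(points, assign) if a == c]
        if members:
            dim = len(members[0])
            new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                       for d in range(dim)))
        else:
            new_centroids.append(centroids[c])  # keep an empty cluster's centroid
    return new_centroids

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
cents = kmeans_step(points, [(0.0, 0.0), (4.0, 4.0)])
# cents is approximately [(0.05, 0.0), (5.05, 5.0)]
```

With millions of clusters and high dimension, the Reduce step moves centroids-times-dimensions worth of data every iteration, which is why the choice of collective implementation dominates performance.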
Petuum: A New Platform for Distributed Machine Learning on Big Data
 IEEE Transactions on Big Data
, 2015
Abstract

Cited by 3 (0 self)
How can one build a distributed framework that allows efficient deployment of a wide spectrum of modern advanced machine learning (ML) programs for industrial-scale problems, using Big Models (100s of billions of parameters) on Big Data (terabytes or petabytes)? Contemporary parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized operators relying on graphical representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel
Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation
Abstract

Cited by 2 (0 self)
A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the “changes” between iterations. With DAIC, we can process only the “changes” and avoid processing negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on the Amazon EC2 cloud. The results show that Maiter achieves as much as a 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.
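The asynchronous execution DAIC enables can be sketched with a work queue of pending deltas: a worker folds any node's pending delta into its value and pushes derived deltas to its neighbors, with no barrier between "iterations". The sketch below is illustrative, not Maiter's implementation; it runs PageRank this way, scheduling the largest deltas first and dropping deltas below a tolerance:

```python
import heapq

def daic_pagerank(out_links, d=0.85, tol=1e-12):
    n = len(out_links)
    value = {u: 0.0 for u in out_links}          # accumulated result
    delta = {u: (1 - d) / n for u in out_links}  # pending "changes"
    # biggest-delta-first is a useful heuristic, not required for correctness
    heap = [(-delta[u], u) for u in out_links]
    heapq.heapify(heap)
    while heap:
        _, u = heapq.heappop(heap)
        du, delta[u] = delta[u], 0.0
        if du <= tol:
            continue                 # stale heap entry or negligible update
        value[u] += du               # accumulate the change
        share = d * du / len(out_links[u])
        for v in out_links[u]:       # push derived deltas to neighbors
            delta[v] += share
            heapq.heappush(heap, (-delta[v], v))
    return value

graph = {0: [1, 2], 1: [2], 2: [0]}
pr = daic_pagerank(graph)
```

Because deltas shrink geometrically (each hop multiplies the propagated mass by d), the queue drains, and no global synchronization point is ever needed.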
Fast top-k path-based relevance query on massive graphs
 In ICDE’14
, 2014
Abstract

Cited by 1 (0 self)
Abstract — The task of obtaining the items highly relevant to a given set of query items is a basis for various applications, such as recommendation and prediction. A family of path-based relevance metrics, which quantify item relevance based on the paths in a given item graph, has been shown to be effective in capturing relevance in many applications. Despite their effectiveness, path-based relevance normally requires time-consuming iterative computation. We propose an approach to quickly obtain the top-k most relevant items for a given query item set. Our approach can obtain the top-k items without having to compute converged scores. The approach is designed for a distributed environment, which makes it scale to massive graphs having hundreds of millions of nodes. Our experimental results show that the proposed approach can produce the result 20 to 50 times faster than a previously proposed approach and can scale well with both the size of the input and the number of machines used in the computation.

I. INTRODUCTION

The task of selecting the items that are highly relevant to a given set of items, or a query item set, is a key component in many applications. One well-known example of such applications is a personalized recommendation system, whose goal is to present the items that will interest a user. This can be achieved by selecting the items that are the most relevant to those the user has previously shown interest in. For marketing, it is useful to know the set of people that will most influence a targeted set of customers. Other examples are a personalized search system, which tries to produce results that match both a given search query and a known user preference, and a search query suggestion system, which assists a user in searching by offering queries relevant to the user's past queries. To obtain a set of highly relevant items, a relevance metric is needed so that each item can be assigned a score based on its relevance to the query item set.

There are many ways to quantify item relevance. Among them, there is a family of relevance metrics that define the relevance between items based on the structure of a graph induced from explicit relationships between items. Using the item relationship graph, in which each node is an item and each edge represents a relationship between a pair of items, these relevance metrics consider all the paths connecting items in order to quantify their relevance. We refer to this type of relevance metric as path-based relevance metrics. Scores computed from well-known
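For concreteness, one well-known member of the path-based family is random walk with restart (personalized PageRank), in which an item's score sums contributions from all paths back to the query set, with longer paths weighted geometrically less. The sketch below computes converged scores by plain iteration; it illustrates the metric family only, not this paper's top-k early-termination algorithm:

```python
def rwr_scores(adj, query, alpha=0.15, iters=100):
    """Random walk with restart: restart into `query` with prob. alpha."""
    nodes = list(adj)
    restart = {v: (1.0 / len(query) if v in query else 0.0) for v in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                w = (1 - alpha) * score[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += w      # mass flowing one more hop along a path
        score = nxt
    return score

# Toy item-relationship graph (hypothetical data)
items = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
s = rwr_scores(items, query={"a"})
top = sorted((v for v in s if v != "a"), key=s.get, reverse=True)
```

Here "c" outranks "b" because it is reachable from "a" along more paths (directly and via "d"), which is exactly the path-counting intuition behind this metric family.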
Exploiting iterativeness for parallel ML computations
Abstract

Cited by 1 (1 self)
Many large-scale machine learning (ML) applications use iterative algorithms to converge on parameter values that make the chosen model fit the input data. Often, this approach results in the same sequence of accesses to parameters repeating each iteration. This paper shows that these repeating patterns can and should be exploited to improve the efficiency of the parallel and distributed ML applications that will be a mainstay in cloud computing environments. Focusing on the increasingly popular “parameter server” approach to sharing model parameters among worker threads, we describe and demonstrate how the repeating patterns can be exploited. Examples include replacing dynamic cache and server structures with static pre-serialized structures, informing prefetch and partitioning decisions, and determining which data should be cached at each thread to avoid both contention and slow accesses to memory banks attached to other sockets. Experiments show that such exploitation reduces per-iteration time by 33–98% for three real ML workloads, and that these improvements are robust to variation in the patterns over time.
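The core observation, that the first iteration's access sequence predicts all later ones, can be sketched as follows. The `ParamServer` class and its cost model (counting round trips) are my own illustration, not the paper's system: the worker records which keys it touched in iteration 0, then prefetches them in one batched request per subsequent iteration:

```python
class ParamServer:
    """Toy parameter server; we count round trips as the cost to minimize."""
    def __init__(self, params):
        self.params = dict(params)
        self.requests = 0

    def get(self, keys):
        self.requests += 1
        return {k: self.params[k] for k in keys}

def run_worker(server, access_seq, iters):
    pattern = None
    for it in range(iters):
        if pattern is None:
            # first iteration: fetch on demand, one round trip per key
            seen = []
            for k in access_seq:
                server.get([k])
                seen.append(k)
            pattern = seen           # the same sequence repeats every iteration
        else:
            server.get(pattern)      # prefetch everything in one round trip

server = ParamServer({f"w{i}": 0.0 for i in range(4)})
run_worker(server, ["w0", "w1", "w2", "w3"], iters=5)
# 4 on-demand round trips in iteration 0, then 1 per iteration: 4 + 4 = 8
```

The same recorded pattern can also drive the paper's other optimizations, such as static pre-serialized structures and partitioning decisions, since the set of touched keys is known in advance.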
GraphLab: A Distributed Abstraction for Large Scale Machine Learning
, 2013
Abstract

Cited by 1 (0 self)
Machine Learning methods have found increasing applicability and relevance to the real world, finding applications in a broad range of fields including robotics, data mining, physics, and biology, among many others. However, with the growth of the World Wide Web and with improvements in data collection technology, real-world datasets have been rapidly increasing in size and complexity, necessitating comparable scaling of Machine Learning algorithms. Designing and implementing efficient parallel Machine Learning algorithms is challenging: existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools such as MPI are difficult to use and leave Machine Learning experts repeatedly solving the same design challenges. In this thesis, we trace the development of a framework called GraphLab which aims to provide an expressive and efficient high-level abstraction to satisfy the needs of a broad range of Machine Learning algorithms. We discuss the initial GraphLab design, including details of a shared-memory and a distributed-memory implementation. Next, we discuss the scaling limitations of GraphLab on real-world power-law graphs and how that informed the design of PowerGraph. By placing restrictions on the abstraction, we are able to improve scalability,
Supplementary File of the TPDS manuscript
Abstract
This supplementary file contains the supporting materials of the TPDS manuscript “Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation.” It improves the completeness of the TPDS manuscript.