SociaLite: Datalog Extensions for Efficient Social Network Analysis

by Jiwon Seo, Stephen Guo, Monica S. Lam
Results 1 - 9 of 9

Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis

by Jiwon Seo, Jongsoo Park, Jaeho Shin, Monica S. Lam
Abstract - Cited by 5 (0 self)
Large-scale graph analysis is becoming important with the rise of world-wide social network services. Recently in SociaLite, we proposed extensions to Datalog to efficiently and succinctly implement graph analysis programs on sequential machines. This paper describes novel extensions and optimizations of SociaLite for parallel and distributed execution to support large-scale graph analysis. With distributed SociaLite, programmers simply annotate how data are to be distributed; the necessary communication is then automatically inferred to generate parallel code for clusters of multi-core machines. It optimizes the evaluation of recursive monotone aggregate functions using a delta-stepping technique. In addition, approximate computation is supported in SociaLite, allowing programmers to trade off accuracy for less time and space. We evaluated SociaLite with six core graph algorithms used in many social network analyses. Our experiment with 64 Amazon EC2 8-core instances shows that SociaLite programs performed within a factor of two of ideal weak scaling. Compared to optimized Giraph, an open-source alternative to Pregel, SociaLite programs are 4 to 12 times faster across benchmark algorithms, and 22 times more succinct on average. As a declarative query language, SociaLite, with the help of a compiler that generates efficient parallel and approximate code, can be used easily to create many social apps that operate on large-scale distributed graphs.

Citation Context

... the shortest paths problem was found to run over 30 times slower using LogicBlox [24], a state-of-the-art commercial implementation of Datalog, than a Java implementation of Dijkstra's algorithm [32]. Recently, we proposed SociaLite, Datalog extensions for efficient graph analysis on sequential machines [32]. Through annotations, programmers can specify that relations be represented with nest...
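The delta-stepping technique named in the abstract above is the classic bucket-based shortest-paths algorithm of Meyer and Sanders. A minimal single-threaded Python sketch follows; the function name, graph representation, and bucket bookkeeping are illustrative assumptions, not SociaLite's actual implementation:

```python
from collections import defaultdict
import math

def delta_stepping_sssp(graph, source, delta=1.0):
    """Single-source shortest paths via delta stepping (simplified sketch).
    graph: dict mapping node -> list of (neighbor, weight), weights >= 0.
    Returns a dict of shortest distances from source."""
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)              # bucket index -> nodes to settle

    def relax(v, d):
        if d < dist[v]:
            old = int(dist[v] // delta) if dist[v] < math.inf else None
            if old is not None and v in buckets.get(old, ()):
                buckets[old].discard(v)     # move v to a lower bucket
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0.0)
    while buckets:
        i = min(buckets)                    # smallest non-empty bucket index
        settled = set()
        while buckets.get(i):               # light edges may refill bucket i
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in graph.get(u, []):
                    if w <= delta:          # light edges: relax eagerly
                        relax(v, dist[u] + w)
        for u in settled:                   # heavy edges: relax once per bucket
            for v, w in graph.get(u, []):
                if w > delta:
                    relax(v, dist[u] + w)
        buckets.pop(i, None)
    return dict(dist)
```

With delta set near the maximum edge weight this degenerates toward Bellman-Ford rounds; with a tiny delta it approaches Dijkstra's order, which is the tuning knob that makes the technique attractive for parallel execution.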

NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud

by Abdul Quamar, Amol Deshpande, Jimmy Lin - http://arxiv.org/abs/1405.1499, 2014
Abstract - Cited by 2 (1 self)
There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and so on. These tasks are not well served by the existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly access the state of a single vertex; this results in high communication, scheduling, and memory overheads in executing such tasks using those frameworks. Further, most existing graph processing frameworks typically ignore the challenges in extracting the relevant portion of the graph that an analysis task needs, and loading it

Citation Context

...upport the aforementioned tasks; further these frameworks use serial execution within a partition and the onus of parallelization is left to the user. Other graph processing frameworks like Socialite [29] do not support user-specified computations. Secondly, most of these frameworks ignore the issues in extracting relevant portions of the underlying graph that an analysis task may be specifically inte...

Optimizing recursive queries with monotonic aggregates in DeALS

by Alexandra Shkapsky, Mohan Yang, Carlo Zaniolo - In ICDE 2015, IEEE, 2015
Abstract - Cited by 1 (1 self)
Abstract—The exploding demand for analytics has refocused the attention of data scientists on applications requiring aggregation in recursion. After resisting the efforts of researchers for more than twenty years, this problem is being addressed by innovative systems that are raising logic-oriented data languages to the levels of generality and performance that are needed to support efficiently a broad range of applications. Foremost among these new systems, the Deductive Application Language System (DeALS) achieves superior generality and performance via new constructs and optimization techniques for monotonic aggregates, which are described in the paper. The use of a special class of monotonic aggregates in recursion was made possible by recent theoretical results that proved that they preserve the rigorous least-fixpoint semantics of core Datalog programs. This paper thus describes how DeALS extends their definitions and modifies their syntax to enable a concise expression of applications that, without them, could not be expressed in performance-conducive ways, or could not be expressed at all. Then the paper turns to the performance issue, and introduces novel implementation and optimization techniques that outperform traditional approaches, including semi-naive evaluation. An extensive experimental evaluation was executed comparing DeALS with other systems on large datasets. The results suggest that, unlike other systems, DeALS indeed combines superior generality with superior performance.

Citation Context

... aggregation in recursion. Aggregates in recursive queries are essential in many important applications and are increasingly being applied in areas such as computer networking [1] and social networks [2]. Many significant applications require iterating over counts or probability computations, including machine learning algorithms for Markov chains and hidden Markov models, and data mining algorithms ...
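The semi-naive evaluation that this abstract compares against is the standard Datalog strategy of joining only the newly derived tuples (the delta) with the base relation each round. A minimal Python sketch for transitive closure, with illustrative names and a set-of-tuples encoding assumed for the relations:

```python
def seminaive_tc(edges):
    """Semi-naive evaluation of transitive closure:
        path(X, Y) <- edge(X, Y).
        path(X, Z) <- path(X, Y), edge(Y, Z).
    Instead of rejoining the whole path relation every round (naive
    evaluation), only the delta of newly derived tuples is joined."""
    path = set(edges)
    delta = set(edges)
    while delta:
        # join delta(path) with edge on the shared variable
        derived = {(x, z) for (x, y) in delta
                          for (y2, z) in edges if y == y2}
        delta = derived - path      # keep only genuinely new facts
        path |= delta               # fixpoint reached when delta is empty
    return path
```

Monotonic aggregates in recursion, the paper's actual subject, generalize this loop: a new tuple is "new" not merely when unseen, but when it improves the aggregate value, which is why they require the least-fixpoint semantics the abstract mentions.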

Compiled Plans for In-Memory Path-Counting Queries

by Brandon Myers, Jeremy Hyrkas, Daniel Halperin, Bill Howe, 2013
Abstract - Cited by 1 (0 self)
Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that the performance benefits amount to implementation details, not a fundamental limitation of the relational model. To evaluate this hypothesis, we are exploring code-generation to produce fast in-memory algorithms and data structures for graph patterns that are inaccessible to conventional relational optimizers. In this paper, we present preliminary results for this approach on path-counting queries, which includes triangle counting as a special case. We compile Datalog queries into main-memory pipelined hash-join plans in C++, and show that the resulting programs easily outperform PostgreSQL on real graphs with different degrees of skew. We then produce analogous parallel programs for Grappa, a runtime system for distributed memory architectures. Grappa is a good target for building a parallel query system as its shared memory programming model and communication mechanisms provide productivity and performance when building communication-intensive applications. Our experiments suggest that Grappa programs using hash joins have competitive performance with queries executed on Greenplum, a commercial parallel database. We find preliminary evidence that a code generation approach simplifies the design of a query engine for graph analysis and improves performance over conventional relational databases.

Citation Context

...arallel loop constructs that exploit spatial locality when it exists; these idioms are a natural fit for pipelined query plans. 3 Code Generation for Path-Counting Queries Following Seo, Guo, and Lam [26], we adopt a Datalog syntax for expressing graph queries. In this paper, we show only preliminary results of the efficacy of the code generation approach rather than a full Datalog implementation. We ...
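The pipelined hash-join plan for triangle counting that this abstract describes can be sketched directly in Python. This is an illustrative reconstruction of the general technique, not the authors' generated C++ code: build a hash table on edge sources, stream edges through two join probes, and close the three-cycle.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles via two pipelined hash joins, as a relational plan
    would: join edge(a,b) with edge(b,c), then probe for edge(c,a).
    edges: set of directed (src, dst) pairs; an undirected graph should
    list both directions. Each triangle appears as 6 directed 3-cycles."""
    by_src = defaultdict(list)          # build side: hash edges on source
    for a, b in edges:
        by_src[a].append(b)
    edge_set = set(edges)               # hash table for the final probe
    total = 0
    for a, b in edges:                  # pipeline: stream edge(a, b)
        for c in by_src[b]:             # first join, on b: edge(b, c)
            if (c, a) in edge_set:      # second probe: edge(c, a) closes it
                total += 1
    return total // 6                   # each triangle counted 6 times
```

The point of compiling such plans, per the abstract, is that the intermediate join results never materialize as tables; each tuple flows straight through both probes, which is what a conventional relational optimizer would not emit.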

GRAPHiQL: A Graph Intuitive Query Language for Relational Databases

by Alekh Jindal, Samuel Madden
Abstract
Abstract—Graph analytics is becoming increasingly popular, driving many important business applications from social network analysis to machine learning. Since most graph data is collected in a relational database, it seems natural to attempt to perform graph analytics within the relational environment. However, SQL, the query language for relational databases, makes it difficult to express graph analytics operations. This is because SQL requires programmers to think in terms of tables and joins, rather than the more natural representation of graphs as collections of nodes and edges. As a result, even relatively simple graph operations can require very complex SQL queries. In this paper, we present GRAPHiQL, an intuitive query language for graph analytics, which allows developers to reason in terms of nodes and edges. GRAPHiQL provides key graph constructs such as looping, recursion, and neighborhood operations. At runtime, GRAPHiQL compiles graph programs into efficient SQL queries that can run on any relational database. We demonstrate the applicability of GRAPHiQL on several applications and compare the performance of GRAPHiQL queries with those of Apache Giraph (a popular 'vertex-centric' graph programming language).

Citation Context

... [3] and its extensions [4], [5], [6], GPS [7], Trinity [8], GRACE [9], [10], Pregelix [11]; neighborhood-centric systems, e.g. Giraph++ [12], NScale [13], [14]; datalog-based systems, e.g. Socialite [15], [16], GrDB [17], [18]; SPARQL-based systems, e.g. G-SPARQL [19]; RDF stores, e.g. Jena [20] and AllegroGraph [21]; key-value stores, e.g. Neo4j [22], HypergraphDB [23]; and others such as TAO [24] a...

†Parallel Computing Lab, Intel Labs

by unknown authors
Abstract
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics, and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.

Citation Context

...g should be. Different programming frameworks use different bases such as sparse matrix operations [11, 22], vertex programs from the point of view of a single vertex [8, 21], declarative programming [30] or generic task based parallelization [26]. Further, all of the approaches differ in performance, and different frameworks often perform better on different algorithms. This makes it extremely diffic...


Adobe Systems

by Fredrik Kjolstad, Shoaib Kamil, Jonathan Ragan-Kelley, David I. W. Levin, Shinjiro Sueda, Desai Chen, Etienne Vouga, Danny M. Kaufman, Gurtej Kanwar, Wojciech Matusik, Saman Amarasinghe, 2015
Abstract
Using existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures like meshes. Translating between the linked data structures and linear algebra comes at significant cost, both to the programmer and the machine. High-performance implementations avoid the cost by rephrasing the computation in terms of linked or index data structures, leaving the code complicated and monolithic, often increasing its size by an order of magnitude. In this paper, we present Simit, a new language for physical simulations that lets the programmer view the system both as a linked data structure in the form of a hypergraph, and as a set of global vectors, matrices and tensors

Continuous query processing; Temporal analytics; Dynamic social networks; Incremental computation

by Jayanta Mondal, Amol Deshpande

Abstract

Citation Context

...been shown to be an effective centerpiece in enabling declarative specification in a range of domains including networking [77], data cleaning [21], machine learning [36], and social network analysis [89, 98]. Compared to the above two languages, Datalog seems more amenable to being extended to support a large class of complex aggregate queries (e.g., global queries like PageRank computation, shortest paths,...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University