Results 1 - 4 of 4
Musketeer: all for one, one for all in data processing systems
"... Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is “best ” for a given workflow. Porting workflows between systems is tedious. Hence, users become “locked in”, despite faster or more ef-ficient sy ..."
Abstract - Cited by 2 (0 self)
Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is “best” for a given workflow. Porting workflows between systems is tedious. Hence, users become “locked in”, despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9× by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%–30% of the performance of hand-optimized implementations.
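The sketch below illustrates the decoupling the abstract argues for: workflows are written against an engine-agnostic description, and a separate step decides which back-end executes them. All names here (BackendEngine, WorkflowManagerSketch, the toy cost heuristic) are hypothetical illustrations under that assumption, not Musketeer's actual intermediate representation or code generators.

```java
import java.util.List;

// Engine-agnostic contract: a back-end only needs to know how to translate
// and run a workflow description; the front-end never targets it directly.
interface BackendEngine {
    String name();
    void execute(List<String> workflowSteps);
}

class SparkEngine implements BackendEngine {
    public String name() { return "Spark"; }
    public void execute(List<String> workflowSteps) {
        System.out.println("Translating " + workflowSteps.size() + " steps to Spark jobs");
    }
}

class MapReduceEngine implements BackendEngine {
    public String name() { return "MapReduce"; }
    public void execute(List<String> workflowSteps) {
        System.out.println("Translating " + workflowSteps.size() + " steps to MapReduce jobs");
    }
}

public class WorkflowManagerSketch {
    // Crude placeholder heuristic; a real mapper would use cost models and
    // profiling data to pick the fastest engine for the workflow at hand.
    static BackendEngine choose(List<String> steps, List<BackendEngine> engines) {
        return steps.size() > 3 ? engines.get(0) : engines.get(1);
    }

    public static void main(String[] args) {
        List<String> workflow = List.of("scan", "filter", "join", "aggregate");
        List<BackendEngine> engines = List.of(new SparkEngine(), new MapReduceEngine());
        BackendEngine chosen = choose(workflow, engines);
        System.out.println("Chosen back-end: " + chosen.name());
        chosen.execute(workflow);
    }
}
```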
Towards Elastic Stream Processing: Patterns and Infrastructure
"... Distributed, highly-parallel processing frameworks as Hadoop are deemed to be state-of-the-art for handling big data today. But they burden application developers with the task to manually implement program logic using lowlevel batch processing APIs. Thus, a movement can be observed that high-level ..."
Abstract
Distributed, highly parallel processing frameworks such as Hadoop are considered state-of-the-art for handling big data today, but they burden application developers with the task of manually implementing program logic against low-level batch-processing APIs. In response, high-level languages are being developed that allow dataflows to be modelled declaratively and then automatically optimized and mapped to batch-processing back-ends. However, most of these systems are based on programming models such as MapReduce, which provide elasticity and fault tolerance in a natural manner: intermediate results are materialized, so processes can simply be restarted and scaled by partitioning the input datasets. For continuous query processing on data streams, these concepts cannot be applied directly, since it must be guaranteed that no data is lost when nodes fail. Such long-running queries usually contain operators that maintain state which depends on the data already processed, so they cannot be restarted without losing information; the same issue arises when streaming tasks need to be scaled. Integrating elasticity and fault tolerance in this context is therefore a challenging task, and it is the subject of this paper. We show how common patterns from parallel and distributed algorithms can be applied to tackle these problems and how they are mapped to the Mesos cluster management system.
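As a minimal sketch of the kind of pattern the abstract alludes to, the operator below keeps per-key state and periodically snapshots it, so a restarted (after failure) or migrated (when scaling) instance can resume without reprocessing the whole stream. Class and method names are illustrative, and the in-memory snapshot stands in for the durable storage a real system would use; this is not the paper's actual design.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointedCounterSketch {
    private Map<String, Long> counts = new HashMap<>();
    private Map<String, Long> lastSnapshot = new HashMap<>();
    private long sinceCheckpoint = 0;
    private static final long CHECKPOINT_INTERVAL = 3;

    // Process one record; take a checkpoint every CHECKPOINT_INTERVAL records.
    void onRecord(String key) {
        counts.merge(key, 1L, Long::sum);
        if (++sinceCheckpoint >= CHECKPOINT_INTERVAL) {
            checkpoint();
            sinceCheckpoint = 0;
        }
    }

    // Copying the map stands in for writing a snapshot to durable storage.
    void checkpoint() {
        lastSnapshot = new HashMap<>(counts);
        System.out.println("checkpoint taken: " + lastSnapshot);
    }

    // A restarted or rescaled operator resumes from the snapshot rather than
    // replaying the entire stream from the beginning.
    void restoreFrom(Map<String, Long> snapshot) {
        counts = new HashMap<>(snapshot);
    }

    public static void main(String[] args) {
        CheckpointedCounterSketch op = new CheckpointedCounterSketch();
        for (String k : new String[] {"a", "b", "a", "c", "a", "b"}) {
            op.onRecord(k);
        }
        // Simulate a failure: a fresh instance restores the last snapshot.
        CheckpointedCounterSketch restarted = new CheckpointedCounterSketch();
        restarted.restoreFrom(op.lastSnapshot);
        System.out.println("state after restore: " + restarted.counts);
    }
}
```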
Some structural measures of API usability (DOI: 10.1002/spe.2215)
"... In this age of collaborative software development, the importance of usable APIs is well recognized. There already exists a rich body of literature that addresses issues ranging from how to design usable APIs to assessing qualitatively the usability of a given API. However, there does not yet exist ..."
Abstract
In this age of collaborative software development, the importance of usable APIs is well recognized. There already exists a rich body of literature that addresses issues ranging from how to design usable APIs to assessing the usability of a given API qualitatively. However, there does not yet exist a set of general-purpose metrics that can be pressed into service for a more quantitative assessment of API usability. The goal of this paper is to remedy this shortcoming in the literature. Our work presents a set of formulas that examine API method declarations from the perspective of several commonly held beliefs regarding what makes APIs difficult to use. We validate the numerical characterizations of API usability as produced by our …
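To make the idea of a structural metric over method declarations concrete, here is a toy measure computed purely from an API's declared methods: the average number of parameters per public method, a commonly cited source of API friction. The measure and the class name are our own stand-ins for illustration, not one of the paper's formulas.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ApiMetricSketch {
    // Average parameter count across the public methods a class declares.
    static double averageParameterCount(Class<?> api) {
        int methods = 0, params = 0;
        for (Method m : api.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                methods++;
                params += m.getParameterCount();
            }
        }
        return methods == 0 ? 0.0 : (double) params / methods;
    }

    public static void main(String[] args) {
        // Compare two familiar JDK APIs by this structural measure.
        System.out.printf("String:        %.2f params/method%n",
                averageParameterCount(String.class));
        System.out.printf("StringBuilder: %.2f params/method%n",
                averageParameterCount(StringBuilder.class));
    }
}
```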
The Case for the Holistic Language Runtime System (First International Workshop on Rack-scale Computing, WRSC 2014)
"... We anticipate that, by 2020, the basic unit of warehouse-scale cloud computing will be a rack-sized machine instead of an individual server. At the same time, we expect a shift from commodity hardware to custom SoCs that are specifically designed for the use in warehouse-scale comput-ing. In this pa ..."
Abstract
We anticipate that, by 2020, the basic unit of warehouse-scale cloud computing will be a rack-sized machine instead of an individual server. At the same time, we expect a shift from commodity hardware to custom SoCs that are specifically designed for use in warehouse-scale computing. In this paper, we make the case that the software for such custom rack-scale machines should move away from the model of running managed language workloads in separate language runtimes on top of a traditional operating system, and instead run a distributed language runtime system capable of handling different target languages and frameworks. All applications will execute within this runtime, which performs most traditional OS and cluster-manager functionality, such as resource management, scheduling and isolation.