Optimizing data shuffling in data-parallel computation by understanding user-defined functions
In NSDI (2012)
Cited by 14 (3 self)
Abstract
Map/Reduce-style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as “black boxes”, we propose to analyze those functions to turn them into “gray boxes” that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework that has been incorporated into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% of disk and network I/O for shuffling, and up to 48% of cross-pod network traffic.
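To make the idea of reasoning jointly about data-partition properties and functional properties more concrete, here is a minimal Python sketch. It assumes a dataset is simply a list of partitions holding record keys; the property names (`disjoint`, `sorted`, `one_to_one`, `strictly_increasing`), the rule table, and every function name are hypothetical illustrations, not SUDO's actual vocabulary or implementation. What it shows: if a user-defined function is known to be one-to-one on the partitioning key, data already partitioned so that each key lives in exactly one partition stays that way after the function rewrites the keys, so a later grouping on the rewritten key could skip a shuffle.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

# A dataset split across workers: one list of record keys per partition.
Partitions = List[List[Hashable]]

# Hypothetical rule table, for illustration only (not SUDO's actual rules):
# for each data-partition property, UDF properties that are *sufficient*
# to guarantee the property still holds after the UDF rewrites every key.
PRESERVED_BY: Dict[str, FrozenSet[str]] = {
    # Distinct keys stay distinct under an injective function, so a key that
    # lived in exactly one partition maps to a key that still does.
    "disjoint": frozenset({"one_to_one", "strictly_increasing"}),
    # An increasing function keeps the order of keys inside each partition.
    "sorted": frozenset({"strictly_increasing"}),
}


def property_preserved(partition_prop: str, udf_props: FrozenSet[str]) -> bool:
    """Static check: may the optimizer keep the existing partitioning?"""
    return bool(PRESERVED_BY.get(partition_prop, frozenset()) & udf_props)


def needs_reshuffle(udf_props: FrozenSet[str]) -> bool:
    """Grouping on the rewritten key needs a shuffle unless disjointness survives."""
    return not property_preserved("disjoint", udf_props)


def apply_udf(parts: Partitions, f: Callable[[Hashable], Hashable]) -> Partitions:
    """Rewrite each key with f; records never move between partitions."""
    return [[f(k) for k in part] for part in parts]


def is_disjoint(parts: Partitions) -> bool:
    """Dynamic check: every key value appears in at most one partition."""
    owner: Dict[Hashable, int] = {}
    for idx, part in enumerate(parts):
        for key in part:
            if owner.setdefault(key, idx) != idx:
                return False
    return True


if __name__ == "__main__":
    parts = [[1, 1, 2], [3, 4, 4]]  # each key lives in exactly one partition
    assert is_disjoint(parts)
    # An injective UDF keeps the partitioning valid, so no shuffle is needed.
    assert is_disjoint(apply_udf(parts, lambda k: k * 10))
    assert not needs_reshuffle(frozenset({"one_to_one"}))
    # k % 2 is not injective: key 1 now appears in both partitions.
    assert not is_disjoint(apply_udf(parts, lambda k: k % 2))
    assert needs_reshuffle(frozenset())
```

The design separates a static check (`property_preserved`, which consults only declared function properties, as an optimizer would before running anything) from a dynamic check (`is_disjoint`, which inspects actual data and is included here only to confirm the static rule on small examples).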
Abstract
...TERIM ... CONTRACT AAG29-78-G-~Q46. Approved for public release; distribution unlimited.
Microsoft Bing, Peking University
Abstract
Map/Reduce-style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as “black boxes”, we propose to analyze those functions to turn them into “gray boxes” that expose opportunities to optimize data shuffling. We identify useful functional properties for user-defined functions, and propose SUDO, an optimization framework that reasons about data-partition properties, functional properties, and data shuffling. We have assessed this optimization opportunity on over 10,000 data-parallel programs used in production SCOPE clusters, and designed a framework that has been incorporated into the production system. Experiments with real SCOPE programs on real production data have shown that this optimization can save up to 47% of disk and network I/O for shuffling, and up to 48% of cross-pod network traffic.
META-EVALUATION AS A TOOL FOR PROGRAM UNDERSTANDING
Abstract
Formal program specifications are difficult to write. They are always constructed from an informal precursor. We are exploring the technology required to aid in the construction of the formal specification from the informal version. An informal specification differs from a formal one in that much information that the writer believes the reader can infer from context has been suppressed from the specification. Resolving the suppressed information depends on information contained in other parts of the specification, on knowledge of what makes a specification well-formed, and on the ability to model how the parts of the specification interact with one another. This paper describes the technology used in a running system that embodies theories of program well-formedness and informality resolution, within the context established by symbolically executing the program, to systematically discover the intended meaning of each informal construct within an informal specification.
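The abstract relies on symbolically executing a program to establish context. As a generic illustration of symbolic execution only (not the system the paper describes), here is a minimal Python sketch. The toy statement forms (`("assign", var, expr)` and `("if", cond, then_block, else_block)`), the string representation of symbolic values, and the function names `sym_eval` and `execute` are all hypothetical. Unbound names are treated as free symbolic inputs, and each conditional forks execution into two paths, each carrying its own path condition.

```python
from typing import Dict, List, Tuple, Union

# Toy expression language: an int constant, a variable name, or (op, lhs, rhs).
Expr = Union[int, str, tuple]
Env = Dict[str, str]            # variable -> symbolic value (as text)
State = Tuple[List[str], Env]   # (path conditions, environment)


def sym_eval(expr: Expr, env: Env) -> str:
    """Evaluate expr symbolically; unbound names are treated as free inputs."""
    if isinstance(expr, int):
        return str(expr)
    if isinstance(expr, str):
        return env.get(expr, expr)
    op, lhs, rhs = expr
    return f"({sym_eval(lhs, env)} {op} {sym_eval(rhs, env)})"


def execute(block: list, env: Env, path: List[str]) -> List[State]:
    """Run a block along every path, forking at each conditional.

    No constraint solving is done, so infeasible paths are not pruned.
    """
    states: List[State] = [(path, env)]
    for stmt in block:
        next_states: List[State] = []
        for cur_path, cur_env in states:
            if stmt[0] == "assign":
                _, var, expr = stmt
                new_env = dict(cur_env)  # copy so sibling paths stay independent
                new_env[var] = sym_eval(expr, cur_env)
                next_states.append((cur_path, new_env))
            elif stmt[0] == "if":
                _, cond, then_block, else_block = stmt
                c = sym_eval(cond, cur_env)
                next_states += execute(then_block, cur_env, cur_path + [c])
                next_states += execute(else_block, cur_env, cur_path + [f"not {c}"])
            else:
                raise ValueError(f"unknown statement: {stmt[0]!r}")
        states = next_states
    return states


if __name__ == "__main__":
    program = [
        ("assign", "y", ("+", "x", 1)),
        ("if", (">", "y", 0), [("assign", "z", "y")], [("assign", "z", 0)]),
    ]
    for conds, env in execute(program, {}, []):
        print(conds, env["z"])
```

Running the example yields two states: one whose path condition is `((x + 1) > 0)` with `z` bound to `(x + 1)`, and one with the negated condition where `z` is `0`. A real system would also consult a constraint solver to discard infeasible paths.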