Automatic Optimization for MapReduce Programs
Citations: | 44 - 0 self |
Citations
3437 | MapReduce: Simplified Data Processing on Large Clusters
- Dean, Ghemawat
Citation Context ... manageable administrative overhead: Yahoo has announced one that uses 10,000 cores [29]. Although the original motivation of MapReduce’s designers was scalability for Web-scale bulk-processing tasks =-=[11]-=-, Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title o... |
2750 | R-trees: a dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ...uce [12]. The critical contribution of Manimal is to detect these selections automatically in unmodified developer code. Manimal currently uses a B+Tree, but in the future could also employ an R-Tree =-=[15]-=- or some other indexing technique when appropriate. Projection optimizations modify the on-disk data file to only store bytes that are actually necessary for executing the user’s code. For example, th... |
607 | Pig latin: a not-so-foreign language for data processing
- Olston, Reed, et al.
- 2008
Citation Context ...ate for MapReduce programs that are “program-specific,” with code that is directly related to the user’s end-goals for the program. For tools layered on top of MapReduce, such as the Pig query system =-=[21]-=-, we believe a better approach is for the tools to give Manimal explicit hints about program semantics. We discuss this issue in more depth in Appendix A. Background There has been a recent surge of i... |
350 | Improving MapReduce Performance in Heterogeneous Environments
- Zaharia, Konwinski, et al.
- 2008
Citation Context ...in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database [2, 26], but most have focused on improving MapReduce execution performance =-=[3, 8, 12, 27, 28]-=-. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user to modify their code to expose more program semantics. To the best of our knowledge, M... |
257 | A comparison of approaches to large-scale data analysis
- Pavlo, Paulson, et al.
- 2009
Citation Context ...file data – overlap with traditional relational workloads. However, MapReduce systems lag far behind RDBMSes in their query-processing sophistication and runtime efficiency. For example, Pavlo et al. =-=[22]-=- showed that a MapReduce program can run 2-50x slower than a similar relational query run on an RDBMS, using identical hardware. A recent paper by Anderson and Tucek [5] suggested that Hadoop performe... |
227 | C-Store: A Column-Oriented DBMS
- Stonebraker, Abadi, et al.
- 2005
Citation Context ...em can use an alternate serialized version of the data that stores only the needed fields for a program, thereby reducing the overall number of bytes that must be processed (similar to a column-store =-=[25]-=- or an on-disk binary association table [7]). The goal of Manimal is to ... |
203 | Map-Reduce-Merge: simplified relational data processing on large clusters
- Yang, Dasdan, et al.
Citation Context ...in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database [2, 26], but most have focused on improving MapReduce execution performance =-=[3, 8, 12, 27, 28]-=-. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user to modify their code to expose more program semantics. To the best of our knowledge, M... |
180 | HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
- Abouzeid, Bajda-Pawlikowski, et al.
- 2009
Citation Context ...e depth in Appendix A. Background There has been a recent surge of interest in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database =-=[2, 26]-=-, but most have focused on improving MapReduce execution performance [3, 8, 12, 27, 28]. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user... |
161 | Database architecture optimized for the new bottleneck: Memory access.
- Boncz, Manegold, et al.
- 1999
Citation Context ...of the data that stores only the needed fields for a program, thereby reducing the overall number of bytes that must be processed (similar to a column-store [25] or an on-disk binary association table =-=[7]-=-). The goal of Manimal is to ... |
140 | Integrating Compression and Execution in Column-Oriented Database Systems.
- Abadi, Madden, et al.
- 2006
Citation Context ...everal compression techniques, but in all cases applies the technique to the entire data file. Instead, Manimal enables two semantics-aware forms of compression, both previously used by Abadi, et al. =-=[1]-=-: First, delta-compression efficiently stores runs of numeric values, by only keeping differences between values, instead of the absolute values. Storing just small deltas, when combined with a size-s... |
135 | HaLoop: efficient iterative data processing on large clusters
- Bu, Howe, et al.
- 2010
Citation Context ...in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database [2, 26], but most have focused on improving MapReduce execution performance =-=[3, 8, 12, 27, 28]-=-. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user to modify their code to expose more program semantics. To the best of our knowledge, M... |
99 | Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference Locality.
- Chilimbi
- 2001
Citation Context ... N/A 0 Table 2: Overall performance improvement provided by Manimal, across the Pavlo benchmark tasks. Manimal’s analyzer employs compiler techniques for database-style optimizations. Chilimbi, et al. =-=[10]-=- tried to reorder in-memory data representations to improve cache behavior, a systems-level improvement suggested by program semantics. They did not examine disk-based approaches. Manimal has some qual... |
97 | Quincy: Fair Scheduling for Distributed Computing Clusters
- Isard, Prabhakaran, et al.
- 2009
Citation Context ...mount of recent work on MapReduce [3, 8, 12, 27, 28], though none that takes system’s wholly-automated approach to optimization. Several efforts have explored the problem of scheduling task execution =-=[16, 28]-=-. Afrati and Ullman [3] investigated how to efficiently perform joins using MapReduce. Yang, et al. [27] extended the programming model to Map-Reduce-Merge, allowing the user to express different join... |
91 | Optimizing joins in a map-reduce environment
- Afrati, Ullman
- 2010
Citation Context ...in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database [2, 26], but most have focused on improving MapReduce execution performance =-=[3, 8, 12, 27, 28]-=-. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user to modify their code to expose more program semantics. To the best of our knowledge, M... |
79 | Hadoop++: Making a Yellow Elephant Run Like a Cheetah.
- Dittrich, Quiane-Ruiz, et al.
- 2010
Citation Context ...in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database [2, 26], but most have focused on improving MapReduce execution performance =-=[3, 8, 12, 27, 28]-=-. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user to modify their code to expose more program semantics. To the best of our knowledge, M... |
57 | LINQ: reconciling objects, relations and XML in the .NET framework.
- Meijer, Beckman, et al.
- 2006
Citation Context ...c analysis to automatically apply well-known optimizations to a novel language. There has also been work in integrating data-manipulation primitives with traditional programming languages, as in LINQ =-=[19]-=-. Such tools side-step the need for Manimal’s analyzer but also require the programmer to use a potentially-novel programming language. 6. CONCLUSIONS We have described the Manimal system for optimizin... |
44 | A Complete and Efficient Algebraic Compiler for XQuery.
- Re, Simeon, et al.
- 2006
Citation Context ...he behavior, a systems-level improvement suggested by program semantics. They did not examine disk-based approaches. Manimal has some qualities in common with work optimizing XQuery (e.g., Ré, et al. =-=[24]-=-). In particular, Manimal follows the same general approach of using static analysis to automatically apply well-known optimizations to a novel language. There has also been work in integrating data-m... |
26 | Manimal: Relational Optimization for Data-intensive Programs. In WebDB,
- Cafarella, Re
- 2010
Citation Context ...e system to use data semantics-driven optimizations without requiring any code changes from developers. We previously presented an outline of the Manimal architecture and a single experimental result =-=[9]-=-. This paper substantially expands on that earlier work, with new optimization techniques, a much more detailed technical discussion, more complete discussions of MapReduce workloads, and full experim... |
23 | Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database.
- Yang, Yen, et al.
- 2010
Citation Context ...e depth in Appendix A. Background There has been a recent surge of interest in MapReduce systems. Some projects have applied MapReduce-inspired techniques to building a traditional relational database =-=[2, 26]-=-, but most have focused on improving MapReduce execution performance [3, 8, 12, 27, 28]. However, most of these projects are either low-level system techniques that are semantics-free, or ask the user... |
19 | Efficiency matters!
- Anderson, Tucek
- 2009
Citation Context ...cy. For example, Pavlo et al. [22] showed that a MapReduce program can run 2-50x slower than a similar relational query run on an RDBMS, using identical hardware. A recent paper by Anderson and Tucek =-=[5]-=- suggested that Hadoop performed bulk data processing at a rate of less than 5 megabytes per second per node (and barely more than half a megabyte per second per core) [5]. Thus, MapReduce systems may... |
16 | MRBench: A benchmark for MapReduce framework
- Kim, Jeon, et al.
- 2008
4 | Yahoo! Launches World’s Largest Hadoop Production Application
- Zawodny
Citation Context ...o explicitly-declared metadata. Further, MapReduce systems have shown they can operate at extremely large scale with manageable administrative overhead: Yahoo has announced one that uses 10,000 cores =-=[29]-=-. Although the original motivation of MapReduce’s designers was scalability for Web-scale bulk-processing tasks [11], Permission to copy without fee all or part of this material is granted provided th... |
2 | Apache Hadoop: Best practices and anti-patterns. http://developer.yahoo.com/blogs/hadoop/posts/2010/08/apache_hadoop_best_practices_a
- Murthy
- 2010
1 | Gridmix3: Emulating production I/O workload for Apache Hadoop
- Douglas, Tang
- 2010
1 | Speeding up Hadoop using column-store techniques
- Floratou, Patel, et al.
Citation Context ...educe programs. There have been several recent index-style attempts to improve MapReduce performance, such as Dittrich, et al. [12]’s Hadoop++ system, and the column-oriented work of Floratou, et al. =-=[14]-=-. The former requires explicit support from the programmer, while the latter only requires physical storage reorganization; both could be used as targets for Manimal. ... |