Results 1 - 10 of 171
Pig Latin: A Not-So-Foreign Language for Data Processing
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Cited by 607 (13 self)
-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required
Google’s MapReduce Programming Model — Revisited
"... Google’s MapReduce programming model serves for processing large data sets in a massively parallel manner. We deliver the first rigorous description of the model including its advancement as Google’s domain-specific language Sawzall. To this end, we reverse-engineer the seminal papers on MapReduce a ..."
Cited by 82 (1 self)
MapReduce Framework
, 2010
"... have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines ..."
Behavioral Simulations in MapReduce
"... In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and a ..."
Cited by 5 (4 self)
and automatically scale in parallel environments. In this paper we present BRACE (Big Red Agent-based Computation Engine), which extends the MapReduce framework to process these simulations efficiently across a cluster. We can leverage spatial locality to treat behavioral simulations as iterated spatial joins
Map-Reduce Examples
"... tf-idf weight (term frequency-inverse document frequency) is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word ..."
of the word in the corpus. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. TF-IDF: The term frequency (tf) for a given term t_i within a particular document d_j is defined as follows, where n_i
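The snippet above is cut off before the formula itself, so the following is only a minimal sketch of one common tf-idf formulation, not necessarily the one the slides use. It assumes tf is the count of a term in a document divided by the total number of terms in that document, and idf is the natural log of the number of documents divided by the number of documents containing the term. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def term_frequency(doc_tokens):
    """Assumed definition: count of a term divided by all term counts in the document."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(corpus):
    """Assumed definition: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(corpus):
    """Return one {term: weight} dictionary per document in the corpus."""
    idf = inverse_document_frequency(corpus)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(doc).items()}
        for doc in corpus
    ]


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [["map", "reduce", "map"], ["reduce", "join"]]
scores = tf_idf(corpus)
```

Under these assumptions, a term that appears in every document ("reduce" here) gets weight zero, which matches the stated idea that frequency in the corpus offsets frequency in the document.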
MapReduce on the Cell Broadband Engine Architecture
, 2007
"... In this paper, we propose the evaluation of MapReduce on the Cell processor by way of the Marching Cubes application. We argue that the Cell architecture and the MapReduce parallel programming model complement each other well, and that the Marching Cubes application is a good application through whi ..."
Cited by 7 (0 self)
Disco: Distributed co-clustering with map-reduce. ICDM
, 2008
"... Huge datasets are becoming prevalent; even as researchers, we now routinely have to work with datasets that are up to a few terabytes in size. Interesting real-world applications produce huge volumes of messy data. The mining process involves several steps, starting from pre-processing the raw data ..."
Cited by 53 (1 self)
to estimating the final models. As data become more abundant, scalable and easy-to-use tools for distributed processing are also emerging. Among those, Map-Reduce has been widely embraced by both academia and industry. In database terms, Map-Reduce is a simple yet powerful execution engine, which can
Cogset: A High-Performance MapReduce Engine
, 2011
"... MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mechanism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store ..."
Cited by 2 (1 self)
Soren: Adaptive MapReduce for Programmable
"... Abstract. In recent years the MapReduce programming model has been widely used for developing parallel data-intensive applications. As a result of its popularity, there exist many implementations of the MapReduce model on different parallel architectures including on massively parallel programmable GPUs. A ..."
which is capable of monitoring key characteristics of applications and dynamically executing them efficiently in one of the three variations of the MapReduce engine it implements. Our preliminary results show that our adaptive method can significantly improve performance for many MapReduce applications
Versatile XQuery Processing in MapReduce
"... Abstract. The MapReduce (MR) framework has become a standard tool for performing large batch computations—usually of aggregative nature—in parallel over a cluster of commodity machines. A significant share of typical MR jobs involves standard database-style queries, where it becomes cumbersome to sp ..."