Pig Latin: A Not-So-Foreign Language for Data Processing

by Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
Citations: 606 (13 self)

BibTeX

@MISC{Olston_piglatin:,
    author = {Christopher Olston and Benjamin Reed and Utkarsh Srivastava and Ravi Kumar and Andrew Tomkins},
    title = {Pig Latin: A Not-So-Foreign Language for Data Processing},
    year = {}
}

Abstract

There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative SQL style unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL and the low-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig and that can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and is available for general use.

Keyphrases

pig latin, data processing, not-so-foreign language, map-reduce implementation, general use, new language, declarative style, scalable implementation, map-reduce paradigm, large data set, sweet spot, internet company, data analysis task, productivity gain, procedural style, apache-incubator project, procedural programmer, great deal, procedural map-reduce programming model, novel debugging environment, parallel database product, custom user code, physical plan, compiles pig latin, ad-hoc analysis, sql style, commodity hardware
