Results 1 - 10 of 201,264
MapReduce: Simplified data processing on large clusters.
- In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04)
, 2004
"... MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of ..."
Cited by 3439 (3 self)
Pig Latin: A Not-So-Foreign Language for Data Processing
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Cited by 607 (13 self)
Models and issues in data stream systems
- In PODS
, 2002
"... In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work releva ..."
Cited by 786 (19 self)
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
- IEEE TRANSACTIONS ON COMPUTERS
, 1987
"... Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of pro ..."
Cited by 598 (37 self)
Verbal reports as data
- Psychological Review
, 1980
"... The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mechanisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc.). W ..."
Cited by 513 (3 self)
Gaussian processes for machine learning
, 2003
"... We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters us ..."
Cited by 720 (2 self)
Hierarchical Dirichlet processes.
- Journal of the American Statistical Association
, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Cited by 942 (78 self)
Data Streams: Algorithms and Applications
, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Cited by 533 (22 self)
analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated
The Protein Data Bank
- Nucleic Acids Res
, 2000
"... The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the futur ..."
Cited by 1387 (24 self)
deposited. In the 1980s the number of deposited structures began to increase dramatically. This was due to the improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance (NMR) methods, and changes in the community views about data
Synchronous data flow
, 1987
"... Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case ..."
Cited by 622 (45 self)