Results 1 - 10 of 387
Memory access buffering in multiprocessors
In Proceedings of the 13th Annual International Symposium on Computer Architecture, 1986
"... In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access la-tency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss. W ..."
Abstract
-
Cited by 254 (4 self)
- Add to MetaCart
In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access la-tency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss
Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines
1989
"... Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to be roughly equivalent ways of exploiting instruc ..."
Abstract
-
Cited by 205 (12 self)
- Add to MetaCart
of superpipelining metric is introduced. Our simulations suggest that this metric is already high for many machines. These machines already exploit all of the instruction-level parallelism available in many non-numeric applications, even without parallel instruction issue or higher degrees of pipelining.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
2012
"... While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill ..."
Abstract
-
Cited by 141 (2 self)
- Add to MetaCart
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help
Monsoon: an explicit token-store architecture
In Proc. of the 17th Annual Int. Symp. on Comp. Arch., 1990
"... Dataflow architectures tolerate long unpredictable com-munication delays and support generation and coordi-nation of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mecha-nisms are complex and introduce n ..."
Abstract
-
Cited by 194 (13 self)
- Add to MetaCart
with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dy-namically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von
Tuning Paxos for high-throughput with batching and pipelining
2011
"... Paxos is probably the most popular state machine replication protocol. Two optimizations that can greatly improve its performance are batching and pipelining. Nevertheless, tuning these two optimizations to achieve high-throughput can be challenging, as their effectiveness depends on many parameters ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Paxos is probably the most popular state machine replication protocol. Two optimizations that can greatly improve its performance are batching and pipelining. Nevertheless, tuning these two optimizations to achieve high-throughput can be challenging, as their effectiveness depends on many
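Batching, as named in this abstract, is a generic state machine replication technique: rather than running one consensus instance per client request, the leader accumulates pending requests and proposes them as a single value, trading a little latency for higher throughput. The sketch below illustrates only that generic idea in Python; the class and callback names (Batcher, propose) and the thresholds are hypothetical choices for illustration, not the paper's implementation or its tuning.

```python
import time

class Batcher:
    """Illustrative request batcher for a replication leader (hypothetical API).

    Requests are buffered and handed to `propose` (assumed to start one
    consensus instance for the whole batch) once either the batch-size
    or the max-delay threshold is reached. The thresholds are arbitrary;
    the abstract's point is that tuning them well is the hard part.
    """

    def __init__(self, propose, max_batch=64, max_delay=0.005):
        self.propose = propose        # callback: propose(list_of_requests)
        self.max_batch = max_batch    # flush after this many requests
        self.max_delay = max_delay    # ...or after this many seconds
        self.buffer = []
        self.oldest = None            # arrival time of the oldest buffered request

    def submit(self, request):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(request)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def tick(self):
        """Call periodically; flushes a partial batch that has waited too long."""
        if self.buffer and time.monotonic() - self.oldest >= self.max_delay:
            self.flush()

    def flush(self):
        batch, self.buffer = self.buffer, []
        self.propose(batch)           # one consensus instance per batch

# Usage: proposals are just printed; a real leader would run a consensus instance here.
b = Batcher(propose=lambda batch: print(f"proposing batch of {len(batch)} requests"))
for i in range(200):
    b.submit(f"req-{i}")
time.sleep(0.01)   # let the max-delay threshold expire for the leftover partial batch
b.tick()
```

Pipelining, the other optimization named above, would correspond to the leader starting new consensus instances before earlier ones are decided; it is omitted from this sketch.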
Multiple-banked register file architectures
In International Symposium on Computer Architecture (ISCA-27), 2000
"... Abstract The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more register ports) and the size of the ..."
Abstract
-
Cited by 146 (12 self)
- Add to MetaCart
of the instruction window (which implies more registers), and to use some kind of multithreading. Under this scenario, the register file access time could be a dominant delay and a pipelined implementation would be desirable to allow for high clock rates. However, a multi-stage register file has severe implications
Experience with a Software-Defined Machine Architecture
WRL Research Report 91/10, 1991
"... We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only assembl ..."
Abstract
-
Cited by 57 (7 self)
- Add to MetaCart
We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only
QUICK PIPING: A Fast, High-Level Model for Describing Processor Pipelines
"... Responding to marketplace needs, today’s embedded processors must feature a flexible core that allows easy modification with fast time to market. In this environment, embedded processors are increasingly reliant on flexible support tools. This paper presents one such tool, called Quick Piping, a new ..."
Abstract
- Add to MetaCart
new, high-level formalism for modeling processor pipelines. Quick Piping consists of three primary components that together provide an easy-to-build, reusable processor description: Pipeline graphs—a new high-level formalism for modeling processor pipelines, pipe—a companion domain-specific language
High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms
"... Abstract—We propose, implement, and experimentally evalu-ate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics process ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract—We propose, implement, and experimentally evalu-ate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics
TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline
PLoS One, 2014
"... Genotyping by sequencing (GBS) is a next generation sequencing based method that takes advantage of reduced representation to enable high throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Genotyping by sequencing (GBS) is a next generation sequencing based method that takes advantage of reduced representation to enable high throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol