Results 1 - 10 of 46,582
Microarchitecture optimizations for exploiting memory-level parallelism
- In ISCA-31, 2004
"... The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a pro ..."
Cited by 97 (3 self)
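The MLP idea in the snippet above can be pictured with a small sketch (a hypothetical example, not code from the paper): a pointer chase serializes its cache misses because each address depends on the previous load, while independent indexed accesses let an out-of-order core keep several misses in flight at once.

```c
#include <stddef.h>

/* Dependent chain: each load's address comes from the previous load,
 * so cache misses are serviced one at a time (MLP of 1). */
struct node { struct node *next; long val; };

long chase(struct node *p, int hops) {
    while (hops--)
        p = p->next;          /* serialized misses */
    return p->val;
}

/* Independent accesses: all addresses are known up front, so an
 * out-of-order core can keep many misses in flight at once. */
long sum_indexed(const long *a, const int *idx, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[idx[i]];       /* misses can overlap */
    return s;
}
```

Both functions do the same amount of work per element; the difference is only in how many misses the hardware can overlap.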
Exploiting Memory-Level Parallelism in Reconfigurable Accelerators
"... As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high-level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism between memory accesses can be revealed ..."
Cited by 1 (0 self)
Are We Ready for High Memory-Level Parallelism?
"... Recently-proposed processor microarchitectures that generate high Memory Level Parallelism (MLP) promise substantial performance gains. However, current cache hierarchies have Miss-Handling Architectures (MHAs) that are too limited to support the required MLP — they need to be redesigned ..."
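The MHA limitation described above comes down to the Miss Status Holding Register (MSHR) file: a primary miss can issue only if an MSHR is free, so a small file caps MLP no matter how many misses the core could expose. A minimal sketch (hypothetical structure and sizes, not from the paper):

```c
#include <stdint.h>

#define NUM_MSHRS 8          /* a small, conventional MSHR file */
#define LINE_SHIFT 6         /* 64-byte cache lines */

static uint64_t mshr_line[NUM_MSHRS];
static uint8_t  mshr_valid[NUM_MSHRS];

/* Returns the MSHR index tracking this miss, or -1 if the file is
 * full -- in hardware, a full MSHR file stalls further misses and
 * therefore caps memory-level parallelism at NUM_MSHRS. */
int mshr_allocate(uint64_t addr) {
    uint64_t line = addr >> LINE_SHIFT;
    int free_slot = -1;
    for (int i = 0; i < NUM_MSHRS; i++) {
        if (mshr_valid[i] && mshr_line[i] == line)
            return i;                /* secondary miss: merge, no new slot */
        if (!mshr_valid[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        mshr_valid[free_slot] = 1;
        mshr_line[free_slot] = line;
    }
    return free_slot;                /* -1 when all MSHRs are busy */
}
```

Redesigning the MHA for high MLP amounts to making this allocation scale to many more outstanding lines without the associative lookup becoming the bottleneck.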
Enhancing Memory Level Parallelism via Recovery-Free Value Prediction
- In Proceedings of the 17th International Conference on Supercomputing, 2003
"... The ever-increasing computational power of contemporary microprocessors significantly reduces the execution time spent on arithmetic computations (i.e., computations not involving slow memory operations such as cache misses). Therefore, for memory-intensive workloads, it becomes more important to overlap multiple cache misses than to overlap slow memory operations with other computations. In this paper, we propose a novel technique to parallelize sequential cache misses, thereby increasing memory-level parallelism (MLP). Our idea is based on value prediction, which was proposed originally ..."
Cited by 37 (3 self)
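The recovery-free idea above can be sketched in software terms (a hypothetical illustration, not the paper's hardware mechanism, and using address-stride prediction rather than the paper's load-value prediction): a prediction is consumed only by a prefetch, so a wrong guess fetches a useless cache line but never corrupts architectural state and needs no recovery. `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <stddef.h>

struct node { struct node *next; long val; };

/* Walk a list while prefetching a *predicted* next node. The guess
 * (last observed address stride) feeds only a prefetch, so a wrong
 * prediction costs bandwidth, not correctness: architectural state
 * is computed from real loaded values only. */
long sum_list(struct node *p) {
    long s = 0;
    ptrdiff_t stride = 0;
    const struct node *prev = NULL;
    while (p) {
        if (stride)
            __builtin_prefetch((const char *)p + stride, 0, 1);
        if (prev)
            stride = (const char *)p - (const char *)prev;
        prev = p;
        s += p->val;          /* the real value, never the prediction */
        p = p->next;
    }
    return s;
}
```

The effect is that the predicted node's miss overlaps with the current node's miss, turning a serialized chain into (partially) parallel misses.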
A memory-level parallelism aware fetch policy for SMT processors
- In HPCA, 2007
"... A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by a stalled thread by identifying long-latency loads and preventing the given thread from fetching more instructions — and in some implementations, instructions beyond the long-latency load may even be flushed, which frees allocated resources. This paper proposes an SMT fetch policy that takes into account the available memory-level parallelism (MLP ..."
Cited by 26 (7 self)
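The policy space described above reduces to a toy decision function (a hypothetical model, not the paper's exact policy): a thread stalled on a long-latency load keeps fetching only while it is still predicted to expose more independent misses; an isolated miss justifies stalling the thread, or flushing it when it holds many shared resources.

```c
enum action { FETCH, STALL, FLUSH };

/* outstanding:    long-latency misses already in flight for this thread
 * predicted_mlp:  misses the thread is predicted to expose in its window
 * hoards:         whether the stalled thread holds many shared resources */
enum action fetch_decision(int outstanding, int predicted_mlp, int hoards) {
    if (outstanding == 0)
        return FETCH;                  /* not stalled: fetch normally */
    if (predicted_mlp > outstanding)
        return FETCH;                  /* keep fetching to expose more MLP */
    return hoards ? FLUSH : STALL;     /* isolated miss: free resources */
}
```

The key contrast with earlier long-latency-aware policies is the middle case: throttling a thread that still has exploitable MLP would serialize misses that could have overlapped.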
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
- Proc. 2003 Int’l Conf. Supercomputing (ICS-03), 2003
"... The ever-increasing computational power of contemporary microprocessors significantly reduces the execution time spent on arithmetic computations (i.e., computations not involving slow memory operations such as cache misses). Therefore, for memory-intensive workloads, it becomes more important to overlap multiple cache misses than to overlap slow memory operations with other computations. In this paper, we propose a novel technique to parallelize sequential cache misses, thereby increasing memory-level parallelism (MLP). Our idea is based on value prediction, which was proposed ..."
Cited by 4 (0 self)
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors
"... A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by a stalled thread by identifying long-latency loads and preventing the given thread from fetching more instructions — and in some implementations, instructions beyond the long-latency load may even be flushed, which frees allocated resources. This paper proposes an SMT fetch policy that takes into account the available memory-level parallelism (MLP ..."
Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision
"... SIMD organizations have been shown to allow high throughput for data-parallel applications. They can operate on multiple datapaths under the same instruction sequencer, with the set of operations happening in lockstep sometimes referred to as a warp and a single lane referred to as a thread. However, the ability of SIMD to gather from disparate addresses instead of aligned vectors means that a single long-latency memory access will suspend the entire warp until it completes. This under-utilizes the computation resources and sacrifices memory-level parallelism because threads that hit are not able ..."
Cited by 2 (1 self)
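Warp subdivision as described above can be pictured with lane masks (a hypothetical sketch, not the paper's microarchitecture): on a divergent memory access, lanes that hit split into a warp-split that runs ahead, while lanes that missed are suspended, so the hits both make progress and can issue further misses of their own (more MLP).

```c
#include <stdint.h>

typedef struct {
    uint32_t run_ahead;   /* lanes that hit: keep executing */
    uint32_t suspended;   /* lanes waiting on memory */
} warp_split;

/* Divide a 32-lane warp's active mask on a divergent memory access. */
warp_split subdivide(uint32_t active, uint32_t miss_mask) {
    warp_split w;
    w.suspended = active & miss_mask;
    w.run_ahead = active & ~miss_mask;
    return w;
}

/* When the outstanding misses return, the splits re-merge so the
 * warp regains full SIMD width. */
uint32_t remerge(warp_split w) {
    return w.run_ahead | w.suspended;
}
```

The "dynamic" part is that splitting and re-merging happen at run time per divergent access, trading some SIMD utilization for latency hiding.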
Synthesizing Memory-Level Parallelism Aware Miniature Clones for SPEC CPU2006 and
"... We generate and provide miniature synthetic benchmark clones for modern workloads to solve two pre-silicon design challenges, namely: 1) huge simulation time (weeks to months) when using complete runs of modern workloads like SPEC CPU2006 having trillions of instructions on pre-silicon desi ..."
"...-independent metrics. Our metrics include the Memory Level Parallelism (MLP) of these workloads to estimate the burstiness of accesses to main memory. Secondly, our proposed framework, which uses this characterized information (including MLP) to generate synthetic clones, is explained and evaluated. We provide ..."
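One common way to read the MLP metric mentioned above (a hypothetical formulation; the paper's exact definition may differ): given the interval during which each miss is outstanding, MLP is the average number of misses in flight over the cycles where at least one miss is outstanding.

```c
/* start[i] .. end[i] (end exclusive) are the cycles during which miss i
 * is outstanding. Returns the average number of misses in flight over
 * cycles where at least one miss is outstanding. O(n * cycles): a toy
 * trace-processing sketch, not a production profiler. */
double mlp(const int *start, const int *end, int n) {
    long busy = 0, in_flight = 0;
    int lo = start[0], hi = end[0];
    for (int i = 1; i < n; i++) {
        if (start[i] < lo) lo = start[i];
        if (end[i] > hi) hi = end[i];
    }
    for (int t = lo; t < hi; t++) {
        int cnt = 0;
        for (int i = 0; i < n; i++)
            if (start[i] <= t && t < end[i]) cnt++;
        if (cnt) { busy++; in_flight += cnt; }
    }
    return busy ? (double)in_flight / (double)busy : 0.0;
}
```

Two fully overlapped misses give an MLP of 2; two back-to-back serialized misses give an MLP of 1, which is why the metric captures burstiness of memory traffic.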
Scalable Cache Miss Handling for High Memory-Level Parallelism
"... Recently-proposed processor microarchitectures for high Memory Level Parallelism (MLP) promise substantial performance gains. Unfortunately, current cache hierarchies have Miss-Handling Architectures (MHAs) that are too limited to support the required MLP — they need to be redesigned to support 1-2 ..."