Citations
1404 | Cramming more components onto integrated circuits. Electronics, 38:114–117
- Moore
- 1965
Citation Context: ...s associated with developing hardware for exascale computation. 4.1 Evolution of Moore’s Law The exponential increase in computing capability has been enabled by two technological trends: Moore’s law [31] and Dennard scaling [18]. Moore’s law refers to the observation by Gordon Moore that the number of transistors on a microprocessor essentially doubles every 18–24 months. Shown in Figure 4-1 is the nu...
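To make the doubling arithmetic in the excerpt concrete, here is a minimal sketch in Python; the 18- and 24-month periods come from the excerpt, while the 1971 starting point of 2,300 transistors (Intel 4004) is an assumption added only for illustration.

```python
# Illustrative only: project transistor counts under Moore's-law doubling.
# The 18- and 24-month doubling periods are the ones quoted in the excerpt;
# the 2,300-transistor / 1971 starting point is an assumed example value.
def transistors(start_count, start_year, year, doubling_months):
    """Projected transistor count assuming a fixed doubling period."""
    months = (year - start_year) * 12
    return start_count * 2 ** (months / doubling_months)

for period_months in (18, 24):
    print(period_months, round(transistors(2300, 1971, 2000, period_months)))
```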
726 | The design and implementation of FFTW3
- Frigo, Johnson
Citation Context: ... the FFT, and an auto-tuner that selects the fastest implementation on the target architecture. With the exception of a few vendor implementations, FFTW is the fastest available FFT on many platforms [20]. There are of course tradeoffs in such a flexible, robust design. Designing a code base with a domain-specific compiler and auto-tuner requires planning, as it is difficult to graft onto a legacy cod...
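As a toy illustration of the auto-tuning idea described above, the sketch below times the FFT routines available in NumPy and SciPy and keeps the fastest for a given size. It only mimics the spirit of FFTW's planner; it is not FFTW's actual mechanism, which plans over codelets, radices, and SIMD variants.

```python
# Toy "auto-tuner": time candidate FFT implementations and keep the fastest
# for the given problem size. Illustrative only; not how FFTW plans.
import timeit
import numpy as np
import scipy.fft

def pick_fastest_fft(n, candidates, repeats=20):
    x = np.random.rand(n) + 1j * np.random.rand(n)
    best_name, best_time = None, float("inf")
    for name, fft in candidates.items():
        t = timeit.timeit(lambda: fft(x), number=repeats)
        if t < best_time:
            best_name, best_time = name, t
    return best_name, best_time

name, t = pick_fastest_fft(1 << 16, {"numpy": np.fft.fft, "scipy": scipy.fft.fft})
print(f"fastest: {name} ({t:.4f} s for 20 transforms)")
```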
487 | The landscape of parallel computing research: A view from Berkeley
- Asanovic, Bodik, et al.
- 2006
Citation Context: ...data-rate capabilities and other metrics of performance for truly scaled photonic circuits. 6.3 Computation and Communication Patterns of DOE/NNSA Applications It has been observed by Phil Colella [4] that the scientific applications of interest to both NNSA and DOE possess common communication and computation patterns. Colella noted there were seven such patterns and named them “the seven dwarfs...
328 | The homogeneous chaos
- Wiener
- 1938
Citation Context: ...ncertainty quantification (UQ) that appears often in DOE reports and projections going under the name of Wiener-Hermite expansions [28] or polynomial chaos [56]. The basic idea, due to Wiener in 1938 [52] is this: if we have a field u(x, t) satisfying some partial differential equation which contains uncertain elements such as transport coefficients or an uncertain driving force which depends on a flu...
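For readers who want the expansion spelled out, a standard form of the Wiener-Hermite (polynomial chaos) representation is sketched below in our own notation; the report's exact conventions may differ.

```latex
% Wiener-Hermite (polynomial chaos) expansion of an uncertain field u(x,t)
% in a Gaussian random variable \xi; the H_n are Hermite polynomials,
% orthogonal under the Gaussian weight e^{-\xi^2}. Truncating at order N
% yields a deterministic system for the coefficients u_n(x,t).
u(x,t;\xi) = \sum_{n=0}^{\infty} u_n(x,t)\, H_n(\xi),
\qquad
\int_{-\infty}^{\infty} H_m(\xi)\, H_n(\xi)\, e^{-\xi^2}\, d\xi = 0 \quad (m \neq n),
\qquad
u_n(x,t) = \frac{\int u(x,t;\xi)\, H_n(\xi)\, e^{-\xi^2}\, d\xi}
                {\int H_n(\xi)^2\, e^{-\xi^2}\, d\xi}.
```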
302 | The piecewise parabolic method (PPM) for gas-dynamical simulations
- Colella, Woodward
- 1984
Citation Context: ...s code used at Sandia to perform simulations of high strain rate mechanics, and • sPPM – a simplified benchmark code that solves gas dynamics problems in 3D by means of the Piecewise Parabolic Method [11]. The methodology used in this study is to extract an instruction stream of about four billion instructions from each of these codes. Care was taken to ensure that the instructions were associated wit...
240 | Failure trends in a large disk drive population
- Pinheiro, Weber, et al.
- 2007
Citation Context: ... it has been shown that hard disk drives that experience scan errors are more likely to fail [43]. For example, drives are 39 times more likely to fail in the 60 days following their first scan error [37]. The amount of redundancy required to reduce the probability of data loss to an acceptable level needs to be estimated. This will dictate the number of additional hard disk drives that must be used f...
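The redundancy estimate mentioned in the excerpt can be roughed out with an independent-failure model such as the sketch below; the annualized failure rate, group size, and rebuild window are placeholder values for illustration, not figures from the report or from [37].

```python
# Rough sketch of the redundancy estimate discussed above: probability that a
# RAID group loses data during one rebuild window, assuming independent,
# exponentially distributed disk failures. AFR and rebuild time are
# illustrative placeholders, not measurements.
import math

def p_data_loss_per_rebuild(n_disks, extra_failures_tolerated, afr, rebuild_hours):
    """P(more than `extra_failures_tolerated` surviving disks fail during rebuild)."""
    lam = afr / (365 * 24)                 # per-disk failure rate, per hour
    p_fail = 1 - math.exp(-lam * rebuild_hours)
    survivors = n_disks - 1                # one disk already failed; rest are exposed
    p_ok = sum(math.comb(survivors, k) * p_fail**k * (1 - p_fail)**(survivors - k)
               for k in range(extra_failures_tolerated + 1))
    return 1 - p_ok

# e.g. 10-disk group, single parity (no extra failure tolerated during rebuild),
# 4% AFR, 24-hour rebuild:
print(p_data_loss_per_rebuild(10, 0, 0.04, 24))
```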
206 | A large-scale study of failures in high-performance computing systems
- Schroeder, Gibson
- 2006
Citation Context: ...section, we examine some of the salient issues associated with the resilience of HPC systems. Recent reports show that hardware errors are the dominant cause of system failures in modern HPC machines [27, 42]. As an example, Figure 4-20 shows a breakdown of root causes for system failures (both soft and hard) seen in the LANL HPC systems observed by [42] over a period of a few years. For these systems, ha...
167 | Massive arrays of idle disks for storage archives
- Colarelli, Grunwald
Citation Context: ...ter every use, since there is a large spike in power required to accelerate the platter to operating speed. In recent years, there has been work on what are called Massive Arrays of Idle Disks (MAID) [10], with the goal of powering down as many disk drives as possible. These file system organizations have been shown to save significant amounts of power, but their applicability is highly dependent on t...
159 | Roofline: an insightful visual performance model for multicore architectures
- Williams, Waterman, et al.
Citation Context: ... issues is presented in the next section. 4.4 The Roof-Line Model A useful framework for understanding the performance of scientific applications on a given hardware platform is the “roof-line” model [54]. The essence of this idea is that applications with sufficiently high byte to flop ratios, as discussed above, will be limited in the floating point rate they achieve by the memory bandwidth of the s...
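A minimal numerical form of the roof-line model described above is sketched here: attainable performance is capped either by the peak floating point rate or by memory bandwidth times arithmetic intensity. The peak rate and bandwidth below are arbitrary example numbers, not parameters of any machine discussed in the report.

```python
# Minimal roofline: attainable GFLOP/s = min(peak, bandwidth * flops-per-byte).
# Example peak and bandwidth values are assumptions for illustration only.
def roofline(arith_intensity_flops_per_byte, peak_gflops, bandwidth_gbs):
    return min(peak_gflops, bandwidth_gbs * arith_intensity_flops_per_byte)

for ai in (0.1, 0.5, 2.0, 8.0):
    print(ai, roofline(ai, peak_gflops=100.0, bandwidth_gbs=25.0))
```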
126 | DRAM errors in the wild: a large-scale field study
- Schroeder, Pinheiro, et al.
- 2009
Citation Context: ...cale with the quantity of silicon. Some reports (see, e.g., the Computer Failure Data Repository, http://cfdr.usenix.org) indicate that the failure rate per DIMM has not increased across generations [41]. Adopting the [27] rule of thumb that the system failure rate scales with the socket count, we can predict future system-level failure rate trends. For clarity of exposition, we will explore two hypo...
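The socket-count extrapolation mentioned above can be illustrated with a small sketch; the baseline socket counts and per-socket failure rate are assumptions chosen for illustration, not values taken from [27] or [41].

```python
# Sketch of the extrapolation described above: if the per-socket failure rate
# stays roughly constant across generations, system MTBF shrinks in proportion
# to socket count. All numbers here are hypothetical.
def system_mtbf_hours(sockets, per_socket_failures_per_year):
    failures_per_hour = sockets * per_socket_failures_per_year / (365 * 24)
    return 1.0 / failures_per_hour

baseline = system_mtbf_hours(20_000, 0.25)    # hypothetical current system
exascale = system_mtbf_hours(200_000, 0.25)   # hypothetical 10x socket count
print(f"{baseline:.1f} h -> {exascale:.1f} h")
```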
119 | ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems
- Kogge
- 2008
Citation Context: ...el of capability will be very challenging for a number of reasons detailed in this report, and noted in many previous studies. [Figure 3-1: Evolution of peak performance over the period 1993–2009 [27]] DOE and the NNSA tasked JASON to study the possibility of developing an exaflop computational capability. We quote below from the study charge as communicated by DOE/NNSA to JASON: “This study will a...
112 | HPL - a Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers
- Petitet, Whaley, et al.
- 2008
Citation Context: ...bit a significant percentage of floating point intensity, the majority do not. The strong floating point peaks originate from codes such as HPL which implements the high performance LINPACK benchmark [36]. This code performs an LU decomposition of a random matrix, and is tuned to minimize memory references while maximizing floating point throughput. Another view of this distribution of memory vs. floa...
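A toy version of what the excerpt describes, factoring a random dense matrix and reporting a floating point rate with the usual (2/3)n^3 LU flop count, is sketched below. It is not HPL itself, which is blocked, distributed, and heavily tuned to minimize memory traffic.

```python
# Toy LINPACK-style measurement: LU-factor a random matrix and report GFLOP/s
# using the standard (2/3)*n^3 flop count. Illustrative only; not HPL.
import time
import numpy as np
from scipy.linalg import lu_factor

n = 2000
a = np.random.rand(n, n)
t0 = time.perf_counter()
lu, piv = lu_factor(a)          # LAPACK getrf under the hood
elapsed = time.perf_counter() - t0
print(f"{(2 / 3) * n**3 / elapsed / 1e9:.2f} GFLOP/s")
```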
94 | Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos
- Xiu, Karniadakis
Citation Context: ...eling. There is a thread of estimating uncertainty quantification (UQ) that appears often in DOE reports and projections going under the name of Wiener-Hermite expansions [28] or polynomial chaos [56]. The basic idea, due to Wiener in 1938 [52] is this: if we have a field u(x, t) satisfying some partial differential equation which contains uncertain elements such as transport coefficients or an un...
77 | Reliability mechanisms for very large storage systems
- Xin, Miller, et al.
Citation Context: ...fter which “magically” a new disk array replaces the old one with all the data that could be rescued from its predecessor’s catastrophe (with the holes discussed previously). The new model is simpler [55] [Figure A-1: Markov models for RAID 5 reliability: (a) RAID 5 with failure state; (b) RAID 5 ergodic model] and we can calculate equilibrium probabilities for b...
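To make the Markov model of Figure A-1 concrete, the sketch below computes the mean time to data loss as the expected time to absorption of a chain with two transient states (all disks up; one disk failed and rebuilding), per-disk failure rate λ, and repair rate ρ. The disk MTTF and repair time used are illustrative values only, not numbers from [55].

```python
# RAID 5 Markov model sketch: states 0 (all disks up), 1 (one failed, rebuilding),
# plus an absorbing data-loss state. Mean time to data loss is the expected time
# to absorption, found by solving the first-step equations Q t = -1 on the
# transient states. Disk MTTF and repair time below are example values.
import numpy as np

def raid5_mttdl(n_disks, disk_mttf_hours, repair_hours):
    lam = 1.0 / disk_mttf_hours
    rho = 1.0 / repair_hours
    # Generator restricted to the transient states {0, 1}:
    #   state 0: leaves at rate n*lam (to state 1)
    #   state 1: returns to 0 at rate rho, absorbs at rate (n-1)*lam
    q = np.array([[-n_disks * lam,  n_disks * lam],
                  [rho,            -(rho + (n_disks - 1) * lam)]])
    t = np.linalg.solve(q, -np.ones(2))
    return t[0]   # starting from the all-disks-working state

print(f"{raid5_mttdl(8, 1.0e6, 24.0):.3e} hours")
```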
75 | Terrestrial cosmic rays
- Ziegler
- 1996
Citation Context: ...xpected, cosmic ray-induced neutrons are the dominant source of transient errors, there are avenues to be explored. The cosmic ray flux varies by a factor of ∼ 2 from the Earth’s equator to the poles [58]. Locating a compute cluster in Miami instead of Albuquerque, although perhaps an unpopular decision, would reduce the transient error rate by approximately one third. The cosmic ray flux also varies ...
65 | Anton, a special-purpose machine for molecular dynamics simulation
- Shaw, Deneroff, et al.
- 2007
Citation Context: ...s. This has been attempted several times for specific applications. The most recent approach has been the development of a special purpose machine for computation of protein folding by D. Shaw et al. [44]. Such approaches have produced impressive results for specific target problems. However, since these architectures are not general purpose, they are not viewed as a credible path to exascale computin...
63 | The future of microprocessors
- Olukotun, Hammond
- 2005
Citation Context: ...as done to keep power levels low enough for cooling purposes. The final curve shows instruction level parallelism; because clock speeds have flattened, instruction level parallelism has also flattened [34]. 4.2 Evolution of Memory Size and Memory Bandwidth While the number of cores on each processor is increasing, the amount of total memory relative to the available computational capability is decreasi...
53 | GPUs and the Future of Parallel Computing
- Keckler, Dally, et al.
- 2011
Citation Context: ...or rapid access, but has low density relative to DRAM. [Figure 4-18: Energy costs for computational operations. Vertical axis labels denote picojoules. All costs are for operations on a 64-bit word [26].] Other operations that require communication over the processor must factor in the cost of signaling plus the cost of performing the operation. For example, to communicate a 64-bit word over a distanc...
33 | The Bleak Future of NAND Flash Memory
- Grupp, Davis, et al.
- 2012
Citation Context: ...when it comes to solid state memories such as NAND Flash. While the density of flash memories, particularly NAND flash, continues to increase as feature sizes decrease, there is some cause for concern [22]. As the density increases, the performance, energy efficiency, number of cycles, and data retention time all rapidly decrease. Flash memory operates by storing bits in memory cells made from floating...
22 | Thermal stability of recorded information at high densities
- Charap, Lu, et al.
- 1997
Citation Context: ...cording is also expected to be short-lived [16]. Increases in areal density require harder magnetic materials in order to avoid thermal instability. Charap’s recognition of the super-paramagnetic effect [8] was the cause of the most recent large technological shift from longitudinal to orthogonal recording. The next large step in recording density is expected to result from Heat-Assisted Magnetic Record...
22 | Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-µm to 90-nm generation
- Hazucha, Karnik, et al.
Citation Context: ... of months, the error rate should scale with the quantity of silicon in a chip. The scaling with voltage is more complex, but it has been shown that the failure rate increases with decreasing voltage [23]. Since the area of processor chips is now roughly constant with time, and core voltages appear to be leveling off, the transient failure rate for CPUs should be approximately constant across generati...
21 | Predicting the Number of Fatal Soft Errors in Los Alamos National Laboratory’s ASC Q Supercomputer
- Michalak
- 2005
Citation Context: ...n HPC-class machines. Although attempts have been made to assess the extant data, raw failure data is sparse. In addition, when failures are recorded, the root cause is often not known (although see [30] for an exception to this rule). At the component level, however, some trends can be discerned. Transient failures in CPUs are thought to be caused primarily by cosmic ray-induced neutrons which, when...
20 | Historical Notes on the Fast Fourier Transform
- Cooley, Lewis, et al.
- 1967
Citation Context: ...st Fourier transform (FFT) was understood some time ago (possibly by Gauss), it was only the invention of the digital computer which made it a revolutionary advance as pointed out by Cooley and Tukey [13]. Although our discussion of hardware so far has been kept rather simplified, it should still be clear that the scaling trends do not favor scientific applications if the flops to byte ratios are i...
19 | Architectural exploration of chip-scale photonic interconnection network designs using physical-layer analysis
- Chan, Hendry, et al.
- 2010
Citation Context: ... interconnects, it is not at all clear how well these simulations might describe the further scale up of density and complexity that will be required to mediate the memory wall for exascale computers [6]. In addition, there are a variety of 3D geometries being considered for the integration of the photonic layer with the microprocessor layer. Thus, although photonic interconnects have already played ...
15 | Towards Ultra-High Resolution Models of Climate and Weather
- Wehner, Oliker, et al.
- 2008
Citation Context: ...ressed as surface altitude) is resolved at scales of 200 and 25 km respectively. [Figure 5-1: Surface altitude of the California coast shown at resolutions of 200 km (left), 25 km (middle), and 1 km (right). From [51].] Modeling at this scale neglects or resolves poorly important moisture transport processes associated with the dynamics of clouds, considered today one of the key uncertainties in clima...
13 | Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems
- Chung, Lee, et al.
- 2012
Citation Context: ... be potentially resolved by placing some NVRAM with fast I/O on each node so one doesn’t have to perform slower disk I/O. Alternatively, it could be handled with a more efficient checkpointing scheme [9]. Modular redundancy at varying levels (e.g., double, triple) has been used effectively, although it incurs a significant penalty in efficiency if used globally. There are some programming tools that ...
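A common way to reason about the checkpointing cost mentioned above is Young's approximation for the optimal checkpoint interval. The sketch below uses that standard rule of thumb; it is not taken from [9], and the checkpoint times and MTBF are placeholder numbers.

```python
# Young's approximation: optimal checkpoint interval ~ sqrt(2 * C * MTBF),
# where C is the time to write one checkpoint. Standard resilience rule of
# thumb, used here only to illustrate why faster (e.g. NVRAM) checkpoints
# reduce overhead. All numbers are examples.
import math

def optimal_checkpoint_interval(checkpoint_seconds, mtbf_seconds):
    return math.sqrt(2.0 * checkpoint_seconds * mtbf_seconds)

def checkpoint_overhead_fraction(checkpoint_seconds, mtbf_seconds):
    tau = optimal_checkpoint_interval(checkpoint_seconds, mtbf_seconds)
    # time spent writing checkpoints plus expected rework after a failure
    return checkpoint_seconds / tau + tau / (2.0 * mtbf_seconds)

for ckpt in (60, 600, 1800):                 # fast NVRAM vs. slow disk I/O
    print(ckpt, f"{checkpoint_overhead_fraction(ckpt, mtbf_seconds=6 * 3600):.1%}")
```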
12 | Disk drive vintage and its effect on reliability
- Shah, Elerath
- 2004
Citation Context: ...detect these latent errors before they can cause harm. They may be harbingers of trouble yet to come, since it has been shown that hard disk drives that experience scan errors are more likely to fail [43]. For example, drives are 39 times more likely to fail in the 60 days following their first scan error [37]. The amount of redundancy required to reduce the probability of data loss to an acceptable l...
12 | Modeling the cosmic-ray-induced soft-error rate in integrated circuits: an overview
- Srinivasan
- 1996
Citation Context: ...elding (e.g., concrete, polyethylene) could also be placed around a compute cluster to reduce neutron flux exposure. Vendors typically perform simulations and experiments to characterize hardware errors [45]. DOE could attempt to influence vendors early in the design phase to produce more error-resistant hardware. This could be realized by adding ECC to chip components or, in some cases, modifying the ch...
9 | MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems
- Aji, Dinan, et al.
- 2012
Citation Context: ...h, but then quickly dropped. The above example shows that interpreting failure data often requires interaction with the people who run the systems and collect the data. The public release of the data [1] includes a complete FAQ of all questions that we asked LANL in the process of our work. We also study, using the repair time information in the LANL data, how much each root cause contributes to the ...
9 | The implications of working set analysis on supercomputing memory hierarchy design
- Murphy, Rodrigues, et al.
- 2005
Citation Context: ...equested from main memory, and if no other productive work can be performed while this access takes place, the processor will stall. An interesting study of this issue was undertaken by Murphy et al. [33], who examined the implications of the working set size on the design of supercomputer memory hierarchies. In particular, they compared the working set sizes of the applications in the Standard Pe...
9 | Nuclear physics of cosmic ray interaction with semiconductor materials: particle-induced soft errors from a physicist’s perspective. IBM Journal of Research and Development
- Tang
- 1996
Citation Context: ...s electron-hole pairs. If those electron-hole pairs are near a p–n junction, the electric field will separate them, leading to a change in voltage. If the voltage is above threshold, an error results [47]. Since the cosmic ray-induced neutron flux is approximately constant over time-scales of months, the error rate should scale with the quantity of silicon in a chip. The scaling with voltage is more c...
8 | Integrating task and data parallelism with the collective communication archetype
- Chandy, Manohar, et al.
- 1994
Citation Context: ... the data flow and so can develop not only class libraries but whole template codes for important operations. Chandy et al. put forth a similar idea in their discussion of programming “archetypes” [7]. We discuss optimization approaches using hardware briefly below. Software optimizations are discussed in Section 7. 6.4 Optimizing Hardware for Computational Patterns One approach to optimizing perf...
8 | Intermittent faults in VLSI circuits
- Constantinescu
- 2007
Citation Context: ... there is a selection effect at work, since ECC-enabled RAM can detect and report errors, while CPUs are generally not designed to do so. Hardware errors can be broadly categorized into three classes [12]: transient, intermittent, and permanent. Transient errors are those that are due to the environment. At sea level, neutrons induced by cosmic rays are thought to be the primary culprit, but alpha par...
8 | Sur une propriété de la loi de Gauss
- Marcinkiewicz
- 1938
Citation Context: ...) · u(q, t)〉 = (k/2π)² |u1(k)|² + 2k²/(2π)⁵ ∫ d³q |u2(k, q − k)|² + · · · and this is a very good attribute. However, truncating the series at a finite order runs afoul of the Marcinkiewicz theorem [29], which says that the independent correlations of 〈u(x, t) u(x′, t) . . . u(x⁽ⁿ⁾, t⁽ⁿ⁾)〉 can be truncated at n = 2 when only second order correlations 〈u(x, t) u(x′, t′)〉 are independent, so the process i...
5 | Data management and layout for shingled magnetic recording
- Amer, Holliday, et al.
- 2011
Citation Context: ...magnetic media since the field required to write the materials with a smaller domain size is difficult to contain and could affect adjacent domains. A proposal to do what is called shingled recording [2] is expected to yield an increase in density by a factor of 2–4, but at the cost of making the hard disk drive no longer a simple random access device. The expected density gain from shingled recordin...
5 | Liszt: a domain specific language for building portable mesh-based PDE solvers
- DeVito, Joubert, et al.
- 2011
Citation Context: ...opment), their use in the scientific world appears rare. One particularly compelling DSL for scientific applications (and one that was briefed to us) is the Liszt language for solving mesh-based PDEs [19]. Liszt is a mature language with a proven compiler that generates efficient code for shared-memory multiprocessors, compute clusters, and GPUs. In Liszt, the application programmer specifies the ...
3 | Technologies for exascale systems
- Coteus, Knickerbocker, et al.
- 2011
Citation Context: ...bers (VCSEL/MMs). These VCSEL transceivers provide over 100,000 optical ports per system and facilitate the achievement of high aggregate bandwidth communications between racks and the central switch [15]. Extensions to exascale computing will necessitate more comprehensive intercommunications (racks, boards, modules, chips), a massive increase in the number of parallel optical links (≈ 100 million) w...
3 | Highly-efficient thermally-tuned resonant optical filters
- Cunningham, Shubin, Raj, et al.
- 2010
Citation Context: ...y per device component (e.g. modulator) must be understood and accounted for. Finally, there may be some energy costs to stabilizing the performance or tuning the response of optical components (e.g. [17]). Ultimately, the choice of laser (optical source), its type (VCSEL, DFB), its efficiency and the efficiency of coupling on-chip will be a large determinant of the energy/bit for the optical links,...
3 | The Future of Computing Performance: Game Over or Next Level? The National Academies Press
- Fuller, Millett (Eds.); Committee on Sustaining Growth in Computing Performance, National Research Council
- 2011
Citation Context: ...r way to processor capability, and so it was possible to provision one byte or more of available memory for each flop of processing capability. [Figure 4-5: A view of Moore’s law showing evolution of the number of cores on a processor (bottom curve) [21].] At around the same time as the transition to multi-core architecture, the ratio of available bytes to flops began decreasing. This is shown in Fig...
3 | Wiener-Hermite Expansion Applied to Decaying Isotropic Turbulence Using a Renormalized Time-Dependent Base
- Hogge, Meecham
- 1978
Citation Context: ...ith the specific nature of the noise in the original differential equations. If that noise is Gaussian then ρ(ξ) = exp(−ξ²) and the orthogonal functions are Hermite polynomials. In papers by Meecham [25], the orthogonality of the polynomials appears quite attractive as it guarantees that the energy, quadratic in the uₙ(ξ, t), is a sum of explicitly positive definite terms E(k) = k²/(2π)⁵ ∫ d³q 〈u(k...
3 | High performance computing and the implications of multi-core architectures
- Turek
- 2007
Citation Context: ... [Figure 4-7: Evolution of memory cost as a function of time. Also shown is the evolution of floating point cost for comparison. It is seen that floating point costs have decreased far more rapidly [48].] ...realistically, a memory size of 50 petabytes or so is envisioned. One might argue that improvements in technology will lead to higher memory densities and so the 4GB DRAM of today would evolve into ...
2 | Self-adjusting two-failure tolerant disk arrays
- Schwarz, et al.
- 2010
Citation Context: ...rd disk drives. For example, we can carve the hard disk drive up into disklets, and through careful placement assure that no two disklets in the same RAID group are placed on the same hard disk drive [14]. A disklet layout is defined by an (almost) n-regular graph. We represent the assignment of disklets to disks by coloring the element (vertex or edge) with a color representing the disk. Not every co...
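The placement constraint in the excerpt, that no two disklets of the same RAID group share a disk, can be checked with a few lines of code. The data structures and names below are invented for illustration and are not from [14].

```python
# Sketch of the disklet placement constraint described above: each disklet is
# assigned to a physical disk ("colored"), and a layout is valid only if no two
# disklets in the same RAID group land on the same disk. Names are hypothetical.
def layout_is_valid(raid_groups, disk_of):
    """raid_groups: iterable of disklet-id tuples; disk_of: disklet-id -> disk id."""
    for group in raid_groups:
        disks = [disk_of[d] for d in group]
        if len(set(disks)) != len(disks):
            return False
    return True

groups = [("a0", "a1", "a2"), ("b0", "b1", "b2")]
placement = {"a0": 1, "a1": 2, "a2": 3, "b0": 1, "b1": 3, "b2": 2}
print(layout_is_valid(groups, placement))   # True: no group reuses a disk
```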
2 | 25-Gb/s 6.5-pJ/bit 90-nm CMOS-Driven Multimode Optical Link
- Schow, Rylyakov, et al.
- 2011
Citation Context: ...n) with high reliability, low power dissipation and low cost. Recently, a 25 Gb/s, 6.5 pJ/bit optical link was reported, utilizing 10 Gb/s class VCSELs, with electronics built into a 90 nm CMOS process [40]. These researchers anticipate building on future developments in faster VCSELs, mandated by industry-wide development for 100Gb Ethernet applications. The feasibility of on-chip photonic integration ...
1 | Hardware issues. Presentation to JASON
- Ang
- 2012
Citation Context: ...e with 1000× the memory capacity of Roadrunner (even though such a system may be infeasible for other reasons, as described in Section 6). Recent trends show memory density doubling every three years [3]. By 2020, memory density will have increased by a factor of 2^(12/3) = 16. Requiring 1000× the memory capacity of Roadrunner will require the number of ... [Figure 4-21: Socket count as a function of time.]
1 | Combustion. Presentation to JASON
- Bell
- 2012
Citation Context: ...r framework is used at the ExaCT combustion co-design center to aid in auto-tuning, where they speed the design cycle by allowing quick exploration of the possibilities of a given hardware design [5]. Auto-tuning was also crucial in the Green Flash design process [51] discussed in Section 6.4. The auto-tuning process can be made easier through the use of a domain-specific language or compiler, whi...
1 | 2012–2016 capital equipment and technology report for the hard disk drive industry
- Coughlin, Grochowski
- 2012
Citation Context: ...pacity of hard disk drives ranges from less than 300 GB (for high-end “Enterprise Class” storage) to 3 TB (for consumer-grade storage), with a relatively slow growth in areal density of 20–25% per year [16]. Previously, the storage industry had enjoyed some years of more than 100% growth in areal density per year, which accompanied the introduction of the giant magnetoresistance (GMR) read head, but sub...
1 | Design of ion-implanted MOSFET’s with very small physical dimensions
- Dennard, et al.
- 1974
Citation Context: ...ing hardware for exascale computation. 4.1 Evolution of Moore’s Law The exponential increase in computing capability has been enabled by two technological trends: Moore’s law [31] and Dennard scaling [18]. Moore’s law refers to the observation by Gordon Moore that the number of transistors on a microprocessor essentially doubles every 18–24 months. Shown in Figure 4-1 is the number of transistors assoc...
1 | The future of MPI. Presentation to JASON
- Heroux
- 2012
Citation Context: ...ely, although it incurs a significant penalty in efficiency if used globally. There are some programming tools that allow the user to specify reliability levels for different parts of the computation [24]. It is important to continue work on software-based mitigation measures to deal with intermittent system failures. However, it is also important that a systematic investigation be undertaken to un...
1 | Application of the Wiener-Hermite expansion to turbulence of moderate Reynolds number
- Lee, Meecham, et al.
- 1982
Citation Context: ...prediction or climate modeling. There is a thread of estimating uncertainty quantification (UQ) that appears often in DOE reports and projections going under the name of Wiener-Hermite expansions [28] or polynomial chaos [56]. The basic idea, due to Wiener in 1938 [52] is this: if we have a field u(x, t) satisfying some partial differential equation which contains uncertain elements such as transp...
1 | Power issues. Presentation to JASON
- Murphy
- 2012
Citation Context: ...ycle time (blue curve) is plotted over time and compared with DRAM access time (green dashed curve). The ratio of the two is also shown (red dashed curve). Note that over time the ratio has increased [32]. ...problem the “memory wall” or the “von Neumann bottleneck”, since it was von Neumann who first developed the idea of an independent processor that retrieved its instructions and data from a separate ...
1 | A silicon photonic microring link case study for high-bandwidth density low-power chip I/O. Draft in preparation
- Ophir, Mines
- 2008
Citation Context: ...d switches) with low loss. Some of these concepts are shown in Figure 6-3. Typical silicon waveguide cross-sectional areas are 0.1 micron² and the bending radii of waveguides are as small as 1 micron [35]. This scaling down in size of critical optical components makes possible higher density integration of photonic switches, links and modulators, but it is also subject to issues of crosstalk. The trad...
1 | Challenges and solutions for future main memory
- 2009
Citation Context: ...current discussion. The trend for memory bandwidth improvement, however, has been the subject of much less focus. Over the past decade, memory bandwidth to CPU has doubled approximately every 3 years [38], and current indications are that the growth rate will decrease absent new developments. Making the optimistic assumption that memory bandwidth trends will continue (at a doubling every 3 years), and...
1 | Dawn/Sequoia experience. Presentation to JASON
- Still
- 2012
Citation Context: ...dship. The blue curve indicates projected peak capability. The red region derates this due to memory limitations. The green curve further derates this performance based on memory bandwidth projections [46]. are now well into the exabyte range. The implication of the discussion above is that some calculations of interest to the DOE/NNSA mission will not be achievable without a larger memory capacity th...
1 | Programming tools. Presentation to JASON
- Vetter
- 2012
Citation Context: ...y 10% of integer instructions are used for actual integer computation. A much more detailed look is provided by the (very busy) graph shown in Figure 4-11 as developed by J. Vetter and his colleagues [49]. In this Figure, the resource attributes of a number of ap... [Figure 4-11: Instruction mix for DOE/NNSA applications [49]. Figure 4-12: Histograms of instruction mix for DOE/NNSA applications [49].]
1 | The Green Flash project. Presentation to JASON
- Wehner
- 2012
Citation Context: ... are not viewed as a credible path to exascale computing. We briefly describe an approach that is intermediate between general purpose and special purpose computers. We were briefed by Michael Wehner [50] on the Green Flash project at Berkeley. In this work the climate modeling application discussed in Section 5.1 was used as a target application and a reference hardware design was constructed. The cl...
1 | JEDEC server memory roadmap
- Williams, et al.
- 2011
Citation Context: ...is will be $5B. It is anticipated that while memory costs will decrease, extrapolations using the JEDEC memory roadmap still indicate a cost of perhaps $1 per gigabyte or more in the 2020 time-frame [53]. As a result, even in 2020 an exabyte of memory will cost on the order of $1B. Typical budgets for DOE/NNSA high end computer systems are on the order of $100–200M, and so using current memory techno...
1 | Exascale programming model challenges. Presentation to JASON
- Yelick
- 2012
Citation Context: ...apable of a peak floating point rate of 4 teraflop/sec. This corresponds to an efficiency of only 1.5%! Yelick and her colleagues at Berkeley have applied the roof-line model to DOE/NNSA applications [57]. In Figure 4-16, we show the roof-line curve for the Intel Xeon 550, a commercial processor used in workstations and servers. The Xeon processor has a peak speed of about 256 gigaflops per second usi...