The Landscape of Parallel Computing Research: A View from Berkeley
- Technical Report, UC Berkeley, 2006
GARNET: a detailed on-chip network model inside a full-system simulator
- in Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2009
Cited by 14 (5 self)
Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip network that connects different processing cores has become a critical part of the design. Transistor miniaturization has led to high global wire delay, and interconnect power comparable to transistor power. CMP design proposals can no longer ignore the interaction between the memory hierarchy and the interconnection network that connects various elements. This necessitates a detailed and accurate
Computer Science Education in the 21st Century
- Communications of the ACM, 2006
Cited by 8 (0 self)
Whereas in the past we created obstacles to reduce the number of CS majors, today we must recruit students to have the workforce we need to meet the challenges and opportunities of information technology in this century. We should take advantage of the reduced pressures from the dip in enrollments to revamp our curriculum. First, let’s start with the state of the world in 2006 rather than 1976. Second, let’s create courses that we would love to take if we were students, and that we would love to teach if given the chance. Such enthusiasm would be attractive and contagious. Rather than wax on philosophically, I’ll confine myself to four concrete suggestions: two technological upgrades and two examples of courses that I would love to take and teach. All leverage technology not available when our curriculum was first created. Technological upgrade for 21st-century CS: use tools and libraries. There is a huge disconnect between the experience of most professors, who have never worked as professional programmers and often write software for a 30-year-old environment, and the way in which cutting-edge software is written today. For example, although many professors use a more recent programming language like
BEE3: Revitalizing Computer Architecture Research
- 2009
Cited by 7 (2 self)
In recent years, advances in computer architecture have slowed dramatically with most simulation results demonstrating only incremental architectural innovation. This is further exacerbated by increased processor and system complexity spurred by a seemingly unlimited number of transistors at the computer architect’s disposal. Computer architects produce a myopic view of their systems through the lens of slow, highly-detailed software simulation or fast, coarse-grained software simulation, with fidelity always in question. By leveraging silicon technology scaling in Field Programmable Gate Arrays (FPGAs), hardware can be used to accelerate simulation, emulation, or prototyping of systems. Furthermore, because the base components are reconfigurable, the same system can be used for a variety of research projects, amortizing the cost, both in dollars and in learning time. In this paper, we present the third generation of the Berkeley Emulation Engine or BEE3 system. We demonstrate a new collaboration methodology between academia and industry and compare the industrial and academic system design process. The BEE3 is a production multi-FPGA system with up to 64 GB of DRAM and several I/O subsystems that can be used to enable faster, larger, and higher-fidelity computer architecture or other systems research. Using a widely available hardware platform also facilitates a software community that can generate and share software modules, thereby enabling rapid system development for computer architecture research.
ProtoFlex: FPGA-accelerated Hybrid Functional Simulation
- 2007
Cited by 3 (1 self)
PROTOFLEX is an FPGA-accelerated hybrid simulation/emulation platform designed to support large-scale multiprocessor hardware and software research. Unlike prior attempts at FPGA multiprocessor system emulators, PROTOFLEX emulates full-system fidelity, i.e., runs stock commercial operating systems with I/O support. This is accomplished without undue effort by leveraging a hybrid emulation technique called transplanting. Our transplant technology uses FPGAs to accelerate only common-case behaviors while relegating infrequent, complex behaviors (e.g., I/O devices) to software simulation. By working in concert with existing full-system simulators, transplanting avoids the costly and unnecessary construction of the entire target system in FPGA. We report preliminary findings from a working hybrid PROTOFLEX emulator of an UltraSPARC workstation running Solaris 8. We have also started developing a novel multiprocessor emulation approach that interleaves the execution of many (10s to 100s) processor contexts onto a shared emulation engine. This approach decouples the scale and complexity of the FPGA host from the simulated system size but nevertheless enables us to scale the desired emulation performance by the number of emulation engines used. Together, the transplant and interleaving techniques will enable us to develop full-system FPGA emulators of up to thousands of processors without an overwhelming development effort.
ATLAS: A Chip-Multiprocessor with Transactional Memory Support
Cited by 2 (1 self)
Chip-multiprocessors are quickly becoming popular in embedded systems. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development for such systems. Transactional Memory (TM) promises to simplify concurrency management in multithreaded applications by allowing programmers to specify coarse-grain parallel tasks, while achieving performance comparable to fine-grain lock-based applications. This paper presents ATLAS, the first prototype of a CMP with hardware support for transactional memory. ATLAS includes 8 embedded PowerPC cores that access coherent shared memory in a transactional manner. The data cache for each core is modified to support the speculative buffering and conflict detection necessary for transactional execution. We have mapped ATLAS to the BEE2 multi-FPGA board to create a full-system prototype that operates at 100 MHz, boots Linux, and provides significant performance and ease-of-use benefits for a range of parallel applications. Overall, the ATLAS prototype provides an excellent framework for further research on the software and hardware techniques necessary to deliver on the potential of transactional memory.
An Infrastructure for HW/SW Partitioning and Synthesis of Architectural Simulators
Many researchers are interested in using FPGAs to accelerate architectural simulation. Partitioning of the simulator between hardware and software is an important problem which has not been explored because of the enormous effort required to develop different RTL and communication infrastructure for each potential partition. We are developing a hybrid HW/SW simulation infrastructure which will provide tools for partitioning architectural simulators and synthesizing RTL for the hardware portions. This infrastructure will allow the community to explore and understand the partitioning problem and will eventually lead to automated partitioning algorithms.
Building and Using the ATLAS Transactional Memory System
- in Proceedings of the Workshop on Architecture Research using FPGA Platforms, held at HPCA12, 2006
this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government
Empirical Performance Assessment Using Soft-Core Processors on Reconfigurable Hardware
- Saint Louis University
Simulation has been the de facto standard method for performance evaluation of newly proposed ideas in computer architecture for many years. While simulation allows for theoretically arbitrary fidelity (at least to the level of cycle accuracy) as well as the ability to monitor the architecture without perturbing the execution itself, it suffers from low effective fidelity and long execution times. We (and others) have advocated the use of empirical experimentation on reconfigurable hardware for computer architecture performance assessment. In this paper, we describe an empirical performance assessment subsystem implemented in reconfigurable hardware and illustrate its use. Results are presented that demonstrate the need for the types of performance assessment that reconfigurable hardware can provide.

