Results 1 - 10 of 65
Implementation and performance of Munin
- In Proceedings of the 13th ACM Symposium on Operating Systems Principles, 1991
"... Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, sha ..."
Cited by 587 (22 self)
to keep memory consistent. Munin's multiprotocol release consistency is implemented in software using a delayed update queue that buffers and merges pending outgoing writes. A sixteen-processor prototype of Munin is currently operational. We evaluate its implementation and describe the execution
Evaluation of Release Consistent Software Distributed Shared Memory on Emerging Network Technology
"... We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidat ..."
Cited by 467 (43 self)
independent of the protocol used. Medium-grained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, the eager update. Fine-grained applications
Digital Beam Former Architecture for Sixteen Elements Planar Phased Array Radar
"... Beam forming is a signal processing technique used in antenna arrays for directional signal transmission or reception. Phased array radar is very important in modern radar development, and multiple digital beams forming technology is the most significant technology in phased array radar. D ..."
Digital multiple beam forming on each antenna element of a large phased array radar is impossible in processor-based digital processing units, because it needs simultaneous processing of many A/D channels. This paper describes an architecture for a digital beam former developed for a 16-element phased array
Design and Analysis of a Many-Core Processor Architecture for Multimedia Applications
"... We present a design of many-core processor architecture with superior cost-effectiveness to fulfill the rapidly increasing demand of high-speed embedded multimedia applications. The prototype platform consists of sixteen processor cores and a 4-by-4 mesh-based duplex network interconnection ..."
The Hardware Architecture and Linear Expansion of Tandem NonStop Systems
- Proc. 12th Int. Conf. Computer Architecture, 1985
"... The Tandem NonStop TXP is a commercially available multiple processor system that delivers mainframe class performance for transaction processing applications. Several sixteen-processor systems may be configured in a ring structure using fiber optics. This structure allows from two to over two hundr ..."
Cited by 5 (2 self)
Scientific Computing Research Environments for the Mathematical Sciences
, 2001
"... This report describes the research projects and accomplishments made possible through the availability of the sixteen processor SGI Origin 2000, purchased in parts with the funds from NSF SCREMS grant NSF 98-72009. To date the SGI Origin 2000 has served as the main computing facility in many inte ..."
An algorithm-by-blocks for SuperMatrix band Cholesky factorization
- In VECPAR ’08: Proceedings of the Eighth International Meeting on High Performance Computing for Computational Science
"... We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. Th ..."
Cited by 3 (3 self)
The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the programmer. Experimental results for the Cholesky factorization of band matrices on two parallel platforms with sixteen processors demonstrate the scalability of the solution.
Utilizing Memory Bandwidth in DSP Embedded Processors
, 2001
"... This paper presents a network flow approach to solving the register binding and allocation problem for multiword memory access DSP processors. In recently announced DSP processors, such as Star*core, sixteen bit instructions which simultaneously access four words from memory are supported. A pol ..."
Cited by 1 (0 self)
Optical logic array processor using shadowgrams
- J. Opt. Soc. Am., 1983
"... On the basis of a lensless shadow-casting technique, a new, simple method of optically implementing digital logic gates has been developed. These gates are capable of performing a complete set of logical operations on a large array of binary variables in parallel, i.e., the pattern logics. A light-e ..."
Cited by 4 (1 self)
A light-emitting diode (LED) array is used as an incoherent light source in the lensless shadow-casting system. Sixteen possible functions of two binary variables are simply realizable with these gates in parallel by controlling the switching modes of the LED's. Experimental results demonstrate the feasibility
A Comparative Evaluation of Parallel Garbage Collector Implementations
, 2001
"... While uniprocessor garbage collection is relatively well understood, experience with collectors for large multiprocessor servers is limited and it is unknown which techniques best scale with large memories and large numbers of processors. In order to explore these issues we designed a modular gar ..."
Cited by 32 (4 self)
as how little memory they can run them in. All of our collectors scale linearly up to sixteen processors. The least memory is usually required by the hybrid mark-sweep collector that uses a copying collector for its nursery, although sometimes the non-generational mark-sweep collector requires less