77 citations found.
G. Bell. Ultracomputers: A teraflop before its time. Comm. of the ACM, 35(8):26--47, August 1992.


This paper is cited in the following contexts:


Scalable Resource Management in High Performance.. - Frachtenberg, Petrini.. (2001)

....packet, as long as the multicast set is physically contiguous. For a multicast packet to be successfully delivered, a positive acknowledgment must be received from all the recipients of the multicast group. The Elite switches combine the acknowledgments, as pioneered by the NYU Ultracomputer [4] [24], returning a single one to the source. Acknowledgments are combined so that the worst ack wins (a network error wins over an unsuccessful transaction, which in turn wins over a successful one), returning a positive ack only when all the partners in the collective communication ....

G. Bell. Ultracomputers: a Teraflop before its time. Communications of the ACM, 35(8):27--47, 1992.
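The "worst ack wins" rule in the Frachtenberg and Petrini context above amounts to a max-reduction over a ranked set of outcomes. Below is a minimal Python sketch of that rule, not the Elite switch implementation. The excerpt supplies only the ordering (network error over unsuccessful transaction over success). The `Ack` enum, its numeric values, and the pairwise level-by-level combining in `combine_tree` are hypothetical stand-ins for how switches in a tree might merge acks on the way back to the source.

```python
from enum import IntEnum


class Ack(IntEnum):
    """Per-recipient acknowledgment; a larger value is a worse outcome.

    Hypothetical names and values; only the ordering comes from the excerpt.
    """
    SUCCESS = 0
    UNSUCCESSFUL = 1
    NETWORK_ERROR = 2


def combine_acks(acks):
    """Reduce the acks of a multicast group to one: the worst ack wins."""
    acks = list(acks)
    if not acks:
        raise ValueError("a multicast group needs at least one recipient")
    return max(acks)


def combine_tree(acks):
    """Combine acks pairwise, level by level (hypothetical switch tree)."""
    level = list(acks)
    if not level:
        raise ValueError("a multicast group needs at least one recipient")
    while len(level) > 1:
        # Each hypothetical switch merges the acks of (up to) two children.
        level = [combine_acks(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    group = [Ack.SUCCESS, Ack.SUCCESS, Ack.UNSUCCESSFUL, Ack.SUCCESS]
    print(combine_tree(group).name)  # UNSUCCESSFUL
```

Because the combining step is associative, the pairwise tree returns the same result as a flat `max`, and the source sees a positive ack only when every recipient succeeded.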


The Homotopy Method Applied to the Symmetric Eigenproblem - Oettli (1995)   (1 citation)

....mid 1970s was capable of 130 Mflops (million floating point operations per second). The eight-processor Cray Y-MP 8 of the late 1980s was already capable of 2.8 Gflops peak performance [58, 37]. And now the race is on to increase both memories and speeds of computers to the teraflop (Tflops) level [12]. This goal is unlikely to be reached without highly parallel computers. This development is driven not least by the wish to solve very large scientific and engineering problems in order to enable technological progress. Some of these grand challenges of the 1990s are, for instance, more precise ....

G. Bell. Ultracomputers -- a teraflop before its time. Comm. ACM, 35:27--47, 1992.


A Model for Synchronized Distributed Memory Machines - James Carrig Jr

....the scalability of the algorithm on the target machine. The practicality of using computational models has led to the development and use of many different models for algorithm design. PRAM models have been used extensively to simulate shared memory machines [6, 9] Many people, including Bell [2] and Dongarra, Duff, Sorensen, and van der Vorst [5] have noted that the trend in parallel computing is to build distributed memory machines which emulate shared memory operation. Although these machines may look like shared memory machines to the user, Cypher and Sanz [4] note that they do not ....

Bell, G. Ultracomputers: a teraflop before its time. Communications of the ACM 35, 8 (August 1992), 27--47.


Using Smalltalk for Wait-Free Implementation of.. - Scratchley (1993)   (1 citation)

....want to operate on an object simultaneously. The solution to consistency problems must be chosen very carefully for highly concurrent objects, where the number of such processes can be on the order of a thousand or more. With major computer manufacturers soon to deliver massively parallel computers [Cahur92] [Bell92], such questions concerning concurrent objects become very important. The outline of the paper is this: in section 2 I will discuss the conventional approach to prevent concurrent object operations from mutually interfering and the drawbacks of the approach. In section 3 I will outline a ....

G. Bell, "Ultracomputers: a Teraflop Before its Time," Communications of the ACM, Volume 35, Number 8, August 1992, pp. 26 - 47.


An Integrated Software Development Model for.. - Parashar, Hariri.. (1993)

....magnitude and diversity would require a general, cost-effective, scalable, yet powerful computing model which will be able to efficiently support its varied computational and communication requirements. It is this realization that has spurred intense research in heterogeneous computing environments [2, 3, 4, 5, 6, 1, 7, 8]. We believe that the future of parallel computing lies in the integration of the plethora of specialized architectures into a single Heterogeneous High Performance Computing (HHPC) environment that allows them to cooperate in solving complex problems (Figure 1). The HHPC environment will ....

....Software development in any Parallel Distributed environment is a non-trivial process and requires a thorough understanding of the application and the architecture. This is apparent from the fact that applications currently achieve only a fraction of the available peak performance [7, 1]. The percentage of the peak performance achieved by standard parallel benchmarks on current parallel distributed systems is ....

Gordon Bell, "Ultracomputers: A Teraflop Before Its Time", Communications of the ACM, vol. 35, pp. 27--47, August 1992.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)

....pins in the processor. However, cache-bus-connected designs are difficult to implement and offer very limited support for message-passing primitives. As a result, very few network interfaces connect through the cache bus. One notable example is the architecture of the Kendall Square Research KSR-1 [33]. The design is based on a principle called COMA, meaning Cache Only Memory Access (also known as ALLCACHE(TM)). In essence, all memory is cache and there is no main memory per se. Memory-bus-connected network interfaces. In most scalable parallel computers the network interface is located at ....

....memory model is popular for scientific computing systems. There are many examples of multicomputers that use this communication model, including Cray Research T3D and T3E [19, 20, 21], Fujitsu AP1000 [48, 49], Stanford DASH [45], Tera MTA-1 [46], and Kendall Square Research KSR-1 and KSR-2 [33]. Under the remote memory communication model (also known as remote load/store, put/get, shared memory, or non-uniform memory access), processors access remote memory locations directly using load and store operations. Remote memory is actually two communication models: a remote load model and a ....

Gordon Bell. Ultracomputers: a Teraflop Before Its Time. Communications of the ACM 35(8), August 1992, pp. 26-47.


The Performance Impact of False Subpage Sharing in KSR1 - Cukic, Bastani (1995)   (1 citation)

....performance and a familiar programming environment. Unfortunately, it is difficult to achieve both of these objectives simultaneously. Shared memory machines with a single global memory, often known as Uniform Memory Access (UMA) multiprocessors, have failed to scale beyond a few dozen processors [1]. Newer machines, such as the KSR1, have a global logical memory space, but the memory modules are physically distributed around the system. As a result, the time to access the memory ....

....depends on the position of the memory module and the accessing processor, creating what is called a Non-Uniform Memory Access (NUMA) multiprocessor [2]. In a 1992 article [1], Gordon Bell stated: KSR machine is most likely the blueprint for future scalable massively parallel computers. The logical gap between the decentralized hardware and the shared memory programming paradigm is solved through the ALLCACHE(TM) memory scheme which dynamically binds ....

G. Bell, "Ultracomputers: a Teraflop Before Its Time," Communications of the ACM, Vol. 35, No. 8, August 1992.


Architecture and Performance of the Mether Network Shared.. - Shaffer, Minnich, Smith

....Distributed Programming Paradigms. A distributed system [38] [15] is a group of computers cooperating with each other to achieve some goal. These computers are autonomous, in that each computer has an independent flow of control. We assume there is no physical sharing of memory among computers [21] [3]. Processes running on different computers have distinct address spaces. They communicate by sending and receiving data encapsulated as messages. Message passing primitives are then used by applications to communicate with cooperating computers. One necessary characteristic of cooperation is some ....

Gordon Bell. Ultracomputers: A teraflop before its time. Communications of the ACM, 35(8):27--47, August 1992.


Dump: Competitive Distributed Paging - Awerbuch, Bartal, Fiat (1993)   (5 citations)

....of the local caches. In the context of parallel multiprocessors, this setting is referred to as Cache Only Memory Architecture (COMA) [HALH91], and it has been employed in a number of recently developed parallel machines, such as the Data Diffusion Machine [HALH91] and the new KSR1 machine [Bel92], where it is referred to as the AllCache Engine. This setting also corresponds to that of a homogeneous distributed file server, comprised of a collection of diskless workstations. Previous results. Some systems work in a related setting has been reported in [LH86]. In the theory community, this ....

Gordon Bell. Ultracomputers: A Teraflop before its time. Comm. of the ACM, 35(8), 1992.


System Software and Software Systems: Concepts And Methodology - .. - Rus, Rus

....concepts of scalable computer and computing continuum are introduced and illustrated with the diagram of the Encore Multimax and the Encore computing continuum as defined by [9]. A good tutorial providing further reading about high performance computer systems and scalable machines is provided in [10]. Chapter 3: Process and resource representation. 3.1 Processes and resources. In order to organize the collection of programs of a computer system as a software system, we first review the hardware system from the viewpoint of the services it provides to ....

....of the active process. An example is provided by the simulation of the context switch operation for the Mix machine. This simulation is expressed by the C function called SwitchMixContext, which in turn uses the generic function SwitchContext. SwitchMixContext() { int i; struct MixPCB RPCB, PCB[10], ToPCB; struct MixProcess MixPDS; struct MixProcess MixHead, MixTail; MixPDS = MakeList(MixHead, MixTail); for (i = 0; i < 10; i++) /* Create 10 MixPCB's */ { ToPCB = MakeProc(PCB[i]); Append(MixPDS, ToPCB); } RPCB = init(MixPDS); /* Initialize RPCB */ for (i = 0; i < 100; i++) ....

C. Gordon Bell. Ultracomputers: a teraflop before its time. Communications of the ACM, 35(8):27--47, August 1992.


Techniques in Computational Stochastic Dynamic Programming - Hanson (1996)   (1 citation)

....massively parallel processors, machine performance has gone from megaflops (millions of floating point operations per second) to gigaflops (billions of floating point operations per second) and is heading towards the ultracomputing goal of teraflops (trillions of floating point operations per second) [7]. Supercomputers have major differences in architecture. However, each compiler uses some variant of Fortran 90 [71, 72], so many code optimizations are portable from one machine to the next. Vectorization can be viewed as a basic form of parallelism implemented by pipelining and so shares ....

....thrust in the future will be implementation on a wide range of architectures to maintain portability and avoid over-reliance on machines currently under development that will not survive the high performance computing environment. Getting access to the current generation of ultracomputers [7], such as the Cray C90, CM-5 and Intel Paragon, is essential for solving large scale computing problems. The largest problem that we have computed is 6 states with 16 nodes per state, using about 60MW of double precision memory with a total of 1M nodes (i.e. one million discrete states). A dedicated ....

G. Bell, "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM 35 (8), pp. 26-47 (1992).


Locality And Loop Scheduling On Numa Multiprocessors - Hui Li Sudarsan (1993)   (9 citations)

....Section 3 describes data locality and why it is an important factor that cannot be neglected in loop scheduling algorithms. In particular, we argue that data locality is important even in systems with hardware-based cache coherence and in cache only memory architectures (COMA) such as the KSR [1]. The locality based dynamic scheduling (LDS) algorithm we propose in this paper is described in Section 4, and compared against the affinity scheduling algorithm developed at the University of Rochester, the only other loop scheduling algorithm we are aware of that also takes memory access ....

Gordon Bell. Ultracomputers: A teraflop before its time. CACM, 35(8):27--47, August 1992.


On The Implementation And Effectiveness Of Autoscheduling For.. - Moreira (1995)   (16 citations)

....communicate by passing messages explicitly. For this reason, private memory multiprocessors are commonly known as message passing architectures. Some authors also reserve the name multiprocessor for shared address space MIMD computers, and use the term multicomputer for private memory computers [9]. It is worth noting that shared memory and message passing are architectural features because, following the definition by Lorin [10], they define a machine's programming model and rules for program correctness. Shared memory and message passing can also be organizational features, that is, ....

....and obtain the address trace. We then run the address trace through the network simulation and obtain the following results:

Latency[0] = 0
Latency[1] = 0
Latency[2] = 0
Latency[3] = 0
Latency[4] = 0
Latency[5] = 0
Latency[6] = 0
Latency[7] = 100215
Latency[8] = 103453
Latency[9] = 70088
Latency[10] = 41828
Latency[11] = 23233
Latency[12] = 11318
Latency[13] = 5380
Latency[14] = 2374
Latency[15] = 906
Latency[16] = 396
Latency[17] = 86
Latency[18] = 46
Latency[19] = 3673
Total = 362996
Average = 8.718440

We note that the ....


G. Bell, "Ultracomputers: A teraflop before its time," Communications of the ACM, vol. 35, August 1992.
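The Total and Average in the Moreira (1995) simulation output above are consistent with reading each Latency[i] value as the number of accesses whose latency was i, and the Average as the count-weighted mean. The excerpt does not state this. The Python check below is a sketch under that assumption, and it also treats the last bucket as exactly 19 rather than "19 or more". With the counts listed for Latency[0] through Latency[19], it reproduces both reported figures.

```python
# Counts from the simulation output quoted above. Index i is assumed to be
# the latency value of bucket i (an interpretation, not stated in the source).
counts = [0, 0, 0, 0, 0, 0, 0,
          100215, 103453, 70088, 41828, 23233, 11318,
          5380, 2374, 906, 396, 86, 46, 3673]


def histogram_stats(counts):
    """Return (total events, count-weighted mean latency) for a histogram."""
    total = sum(counts)
    weighted = sum(latency * n for latency, n in enumerate(counts))
    return total, weighted / total


if __name__ == "__main__":
    total, average = histogram_stats(counts)
    print(f"Total = {total} Average = {average:.6f}")
    # Total = 362996 Average = 8.718440
```

The weighted sum is 3164759, and 3164759 / 362996 rounds to 8.718440. This supports the reading that the printed Average is the mean latency over all simulated accesses.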


Analysis of A Scalable, All-Optical Interconnection Network For.. - Jones (1998)

....complexity (such as a low node degree, thus low cost and ease of implementation), relatively small diameter for such a large number of PNs, a high degree of scalability and expandability, and most importantly, efficient support for both local and remote communications. Recent studies [9, 10] have shown that efficient implementation of local communications (spatial locality) is a fundamental requirement for interconnection networks, since PNs engage in data transfers more frequently with nearby neighbors than with more distant PNs. We should note that the diameter of a network remains ....

....considered. System performance is analyzed in Chapter 8 in terms of scalability, message delay and throughput, node complexity, OPB and BER. Chapter 9 discusses dynamic channel allocation (DCA), and the conclusions are presented in Chapter 10. Chapter 3: General Description. It has been shown [3, 9, 10, 28] that a PN engages in data transfer more frequently with nearby neighbors (local communication) than with more distant nodes (remote communications). In many applications, nearby neighbors are the only destinations for interprocessor communications. In image processing, for example, ....


G. Bell, "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM, vol. 35, pp. 27--47, August 1992.


Hierarchical Optical Ring INterconnection (HORN): A Scalable.. - Louri, Gupta

....routing, diameter, link complexity, fault tolerance, and an example of a multiple access protocol. 2.1 Definition of HORN. It has been shown that a PE engages in data transfer more frequently with nearby neighbors (local communication) than with more distant nodes (remote communications) [2, 18]. Therefore, the interconnection topology must be designed so that it can efficiently support local data transfers (spatial locality). This emphasis has led us to consider a hierarchical interconnection network topology in which the lower level network supports local communications very ....

Gordon Bell, "Ultracomputers: A teraflop before its time," Communications of the ACM, v. 35, n. 8, pp. 27-47, August 1992.


A Scalable Optical Hypercube-based Interconnection Network for.. - Louri et al. (1994)   (2 citations)

....per second) supercomputers, combined with the launching of the High Performance Computing and Communication (HPCC) initiative, is putting major emphasis on exploiting massive parallelism with greater than one thousand processing elements networked to form massively parallel computers (Ultracomputers) [1, 2]. A key element, and deciding factor in terms of performance and cost of these computers, is the interconnection network [3]. The interconnection network for massively parallel computers must not only be adequate in terms of communication bandwidth, latency, and connectivity but it must also be ....

....network for massively parallel computers must not only be adequate in terms of communication bandwidth, latency, and connectivity but it must also be modular and scalable. Scalability of a network consists of two aspects: size scalability and generation scalability (or time scalability) [2]. Size scalability refers to the property that the size of the network (e.g. the number of communicating nodes) can be increased with nominal change in the existing configuration. Also, the increase in system size is expected to result in an increase in performance comparable to the increasing ....

G. Bell, "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM, vol. 35, pp. 27--47, Aug 1992.


An Efficient 3D Optical Implementation of Binary de Bruijn.. - Louri, Sung

....mesh network also suffers from a major limitation, which is its large diameter ($N^{1/2}$ for an N-node network) along with its limited connectivity. The recent quest for massively parallel computing systems is placing a major emphasis on scalable networks with small diameters and bounded node degrees [14]. As an alternative to the hypercube and the mesh topologies, the de Bruijn topology [15, 16] has recently been receiving much attention. Its properties and applications have been studied by several researchers [2, 17, 18, 19, 20]. Its topological properties show that the de Bruijn network is a good ....

G. Bell, "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM, vol. 35, pp. 27--47, Aug 1992.
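The diameter contrast in the Louri and Sung context above can be checked numerically. The Python sketch below uses breadth-first search to compute the diameter of two networks. The first is a two-dimensional mesh without wraparound, whose diameter is 2(sqrt(N) - 1), on the order of N^(1/2). The second is the directed binary de Bruijn graph, in which node x links to 2x mod N and 2x+1 mod N, whose diameter is log2 N with out-degree 2. Both graph constructions and the helper names are assumptions chosen for illustration, not taken from the cited paper.

```python
from collections import deque
from itertools import product


def eccentricity(adj, src):
    """Largest BFS hop count from src to any node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Maximum eccentricity over all nodes (graph assumed strongly connected)."""
    return max(eccentricity(adj, s) for s in adj)


def mesh(side):
    """Hypothetical side x side 2D mesh without wraparound; N = side**2."""
    adj = {}
    for r, c in product(range(side), repeat=2):
        adj[(r, c)] = [(r + dr, c + dc)
                       for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if 0 <= r + dr < side and 0 <= c + dc < side]
    return adj


def de_bruijn(n):
    """Directed binary de Bruijn graph on N = 2**n nodes (shift-in-a-bit)."""
    size = 1 << n
    return {x: [(2 * x) % size, (2 * x + 1) % size] for x in range(size)}


if __name__ == "__main__":
    # N = 256 nodes in both networks.
    print(diameter(mesh(16)), diameter(de_bruijn(8)))  # 30 8
```

At N = 256 the mesh needs up to 30 hops while the de Bruijn graph needs at most 8, and quadrupling N adds only 2 to the de Bruijn diameter but roughly doubles the mesh diameter.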


Optimal Parallel Sorting in Multi-Level Storage - Alok Aggarwal Ibm (1994)   (21 citations)

....amount of data at each processor (and thus a negligible memory hierarchy at each processor), such applications are probably closer to the exception than the rule. The coming generation of tera computers can be expected to consist of thousands of processors, each with its own multi-gigabyte storage [6]. Thus, an extension of sequential multi-level storage models to the parallel domain would seem to be well motivated. In fact, Vitter and Shriver [20], Nodine and Vitter [17], and Vitter and Nodine [19] have proposed just such a series of extensions, and have examined the complexity of sorting in ....

G. Bell. Ultracomputers: A teraflop before its time. Communications of the ACM, 35(8):26--47, 1992.


High Performance Computing: Crays, Clusters, and Centers. What .. - Gordon Bell, Jim Gray (2001)   (3 citations)  Self-citation (Bell)

....of the ACM. Copyright may be transferred without further notice and the publisher may then post the accepted version. A version of this article appears at http://research.microsoft.com/pubs. High Performance Computing: Crays, Clusters, and Centers. What Next? Gordon Bell and Jim Gray, {GBell, Gray}@Microsoft.com, Bay Area Research Center, Microsoft Research, August 2001. Abstract: After 50 years of building high performance scientific computers, two major architectures exist: 1) clusters of Cray-style vector supercomputers; 2) clusters of scalar uni- and multi-processors. ....

Bell, G., "Ultracomputers: A Teraflop Before Its Time", Communications of the ACM, Vol. 35, No. 8, August 1992, pp 27-45.


Computing with Faulty Shared Objects - Afek, Greenberg, Merritt.. (1995)   (17 citations)  Self-citation (Bell)

....system functions and hardware remove from the user the burden of coordinating the accesses to the shared data. One such system has been proposed, implemented and analyzed in [LH89]. The KSR, DASH, and Alewife machines are a few examples of machines that use such distributed shared memory [Bel92]. Another, more software-oriented, approach is implemented in the Linda system [CG89]. In Linda, an abstract tuple space is shared (instead of cache lines or pages) and operations are available to insert and delete tuples. Obviously, the choice of one of these methods of implementing sharing ....

G. Bell. Ultracomputers: A teraflop before its time. CACM, 35(8):27--47, August 1992.


Increasing Perfect Nests in Scientific Programs - Tarek Abdelrahman..

No context found.

G. Bell. Ultracomputers: A teraflop before its time. Comm. of the ACM, 35(8):26--47, August 1992.


Objects Shared by Byzantine Processes - Malkhi, Merritt, Reiter, Taubenfeld (2003)   (4 citations)

No context found.

G. Bell. Ultracomputers: A teraflop before its time. Communications of the ACM 35(8): 27--47, 1992


Software---Practice And Experience, Vol. 24(8).. -..

No context found.

G. Bell, `Ultracomputers, a teraflop before its time', Communications of the ACM, 35, (8), 27--47 (1992).


The Queue-Read Queue-Write PRAM Model: Accounting for.. - Gibbons et al. (1996)   (6 citations)

No context found.

G. Bell. Ultracomputers: A teraflop before its time. Communications of the ACM, 35(8):26--47, 1992.


Models and Resource Metrics for Parallel and Distributed.. - Li, Mills, Reif (1989)   (12 citations)

No context found.

G. Bell, "Ultracomputers: A teraflop before its time," Communications of the ACM, vol. 35, no. 8, pp. 26--47, 1992.



CiteSeer.IST - Copyright Penn State and NEC