Hiroaki Ishihata, Takeshi Horie, Satoshi Inano, Toshiyuki Shimizu, and Sadayuki Kato. An architecture of highly parallel computer AP1000. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pages 13--16, May 9--10, 1991. Available from ftp://fcapwide.fujitsu.co.jp/ap1000/english/rim/rim 91.ps.Z.


This paper is cited in the following contexts:
The Impact of Message Traffic on Multicomputer Memory Hierarchy.. - Pakin (1995)   (2 citations)

....memory to minimize cache pollution, and short messages can be transmitted using the cache to minimize latency. Two examples of architectures using a cache and memory network interface are the AP1000 and Alewife. The AP1000 computer shares with the CM-5 the goal of low latency communication [24]. It, too, logically connects the network interface and the cache, but in a substantially different manner from the CM-5 (Figure 2.4). Primarily, the AP1000 logically connects the network interface to both the cache and primary memory. This organization alleviates cache pollution. The T9000 can be ....

Hiroaki Ishihata, Takeshi Horie, Satoshi Inano, Toshiyuki Shimizu, and Sadayuki Kato. An architecture of highly parallel computer AP1000. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pages 13--16, May 1991. Available from ftp://fcapwide.fujitsu.co.jp/ap1000/english/rim/rim 91.ps.Z.


Performance of the Hough transform on a distributed.. - Underhill..

....We have implemented four versions of the parallel Hough Transform on the AP1000 multiprocessor using paraML. In this section, we describe the architecture of the Fujitsu AP1000 multiprocessor system and functionalities of paraML which are of interest to this work. 2.1. The AP1000 The AP1000 [16], originally known as CAP (cellular array processor) was developed by Fujitsu as a research machine and is thus not yet in commercial production. The AP1000 (see Fig. 1) is a powerful, highly parallel, distributed memory, scalable computer. It can contain between 16 and 1024 processing elements, ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, S. Kato, An architecture of highly parallel computer AP1000, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, May 1991, pp. 13--16.


ABCL/onEM-4: A New Software/Hardware Architecture for.. - Yasugi, Matsuoka..

....followed by a request message send to the created object and a reply reception from the object, for the clock speed of 12.5 MHz. Compared to our other implementation work of an OOCP language on a more conventional multicomputer without provisions for concurrent OO style computing (Fujitsu AP1000[9], a 512 node multicomputer based on SPARC chips) we have been able to achieve an order of magnitude improvement in inter node message passing latency (approximately 35 microseconds vs. a few microseconds). Even compared to the Cosmos JMachine [5] which is highly optimized for concurrent OO computation, ....

....has proven to be extremely well suited for concurrent OO computation when combined with our software architectural technologies. As we have indicated, compared to our other implementation work of ABCL on a more conventional multicomputer based on conventional SPARC chips (Fujitsu AP1000[9]) we have been able to achieve an order of magnitude improvement in inter node message passing speed (approximately 35 microseconds vs. a few microseconds). We have also been able to surpass another concurrent OO software hardware architecture based on microcodes by two orders of magnitude. The principle ....

Hiroaki Ishihata, Takeshi Horie, Satoshi Inano, Toshiyuki Shimizu, and Sadayuki Kato. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pages 13--16, May 1991.


Music Understanding At The Beat Level Real-time Beat Tracking.. - Goto, Muraoka (1997)   (11 citations)

....perform a computationally intensive task such as processing and understanding complex audio signals in real time, parallel processing provides a practical and realizable solution. BTS has been implemented on a distributed memory parallel computer, the Fujitsu AP1000 that consists of 64 cells [Ishihata et al. 1991]. We apply four kinds of parallelizing techniques to simultaneously execute the heterogeneous processes described in the last section [Goto and Muraoka, 1995]. 6 Experiments and Results We tested BTS on 42 popular songs in the rock and pop music genres. The input was a monaural audio signal ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers, Signal Processing, pages 13--16, 1991.


Beat Tracking based on Multiple-agent Architecture A Real-time .. - Goto, Muraoka (1996)   (1 citation)

....and feasible solution to the problem of performing a computationally intensive task, such as processing and understanding complex audio signals, in real time. Our system has been implemented on a distributed memory parallel computer, the Fujitsu AP1000 that consists of 64 processing elements(Ishihata et al. 1991). A different element or group of elements is assigned to each module, such as FFT, the onset time finder, the onset time vectorizer, the agent, the higher level checker, and the manager. These modules run concurrently and communicate with others by passing messages between processing elements. We ....

Ishihata, H.; Horie, T.; Inano, S.; Shimizu, T.; and Kato, S. 1991. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers, Signal Processing, 13--16.


Thal: An Actor System For Efficient And Scalable Concurrent.. - Kim (1997)   (8 citations)

....39, 51, 52, 94] they have grown in performance by a factor of almost 2 every year. The performance improvement in off the shelf microprocessors together with the availability of different kinds of low latency high bandwidth interconnects has caused stock hardware parallel machines to proliferate [65, 122, 134, 66, 60, 32, 77, 31, 110, 8]. Such machines offer a vast amount of computation capability to an extent that we have never dreamed of before. Indeed, it has been a challenge from the beginning of the parallel computing era to develop a general purpose programming system which allows users to enjoy the dramatically increased ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An Architecture of Highly Parallel Computer AP1000. In Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pages 13--16, May 1991.


A General Framework For Compiling Fine-Grain Threads In Concurrent .. - Oyama (1996)

....matching usual parallel machines[7] As communication latency gets smaller and smaller, intra node performance will be necessarily more highlighted. The system is implemented on a workstation and supports only pseudo parallelism now. In the future, it will be developed on the multicomputer AP1000[8]. After the next chapter introduces related work, we suggests requirement of concurrent programming language in chapter 3. Chapter 4 outlines our system organization and chapter 5 and 6 explain the details of the surface language and the intermediate language, respectively. After discussing the ....

Ishihata, H., T. Horie, S. Inano, T. Shimizu, S. Kato, An Architecture of Highly Parallel Computer AP1000, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 13-16, May 1991.


Compiler Assisted Distributed Memory Parallelization of an.. - Pommerell, Rühl (1993)   (1 citation)

....simulators. We have parallelized PILS using the compiler Oxygen [7] a parallelizing Fortran compiler developed as part of the K2 project [8] The performance of the generated parallel program is evaluated on two DMPP systems, namely the CM5 from Thinking Machines [9] and the AP1000 from Fujitsu [10]. The paper is organized as follows. After a more precise definition of the project goal, we will introduce the hardware platforms used, and the two software packages PILS and Oxygen. We will then discuss additions to PILS that enable an efficient parallelization with Oxygen, and conclude with a ....

....tree's lowest level. As shown in Fig. 1 this is done by recursively decomposing the two dimensional network required by Oxygen. 3.1.2 Fujitsu AP1000 The AP1000 is not a commercial system, but an experimental multiprocessor with 64 to 1024 PEs. A detailed description of the system can be found in [10]. Each PE consists of a 25 MHz SPARC with FPU, 16 Mbytes DRAM (organized in four interleaved banks) and 128 Kbytes direct mapped cache memory. An additional message controller (MSC) and a routing controller (RSC) manage interprocessor communication. Three communication networks are available: the ....

H. Ishihata et al. An architecture of highly parallel computer AP1000. In Pacific Rim Conference on Communications, Computers and Signal Processing, pages 13--16. IEEE, May 1991.


A Compilation Framework for Languages with Dynamic Thread.. - Oyama, Taura, Yonezawa (1996)

....Schematic, which we have designed and implemented, is concurrent object-oriented extension to Scheme [18]. Schematic encourages aggressive dynamic fine grain thread creations and accomplishes high efficiency of them. Schematic currently works on both workstation and parallel computer AP1000 [11]. Its global garbage collection technique [6] is also being researched. We explain the features of Schematic in section 2, and the structure of our compiler in section 3. We describe the intermediate language Venezia in section 4 and show its execution model in section 5. Section 6 states code ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An Architecture of Highly Parallel Computer AP1000. In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pages 13--16, May 1991.


A Concurrent Object-Oriented Programming Language System for.. - Yasugi (1994)   (1 citation)

....lost here; and extensive flow analysis performed by the compiler will only serve as a minor solution. 7.4.3 Comparison to Implementation on Stock Multicomputer There is another implementation work of ABCL, ABCL/onAP1000[24, 35] on a multicomputer based on conventional SPARC chips (Fujitsu AP1000[13]). In ABCL/onAP1000, an active message[25] is used for a remote message send, which is implemented by preparing a collection of message handlers and placing the first instruction address in the message when sending the message. Each handler code is specialized for the individual message format, ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pages 13--16, May 1991.


ABCL/f: A Future-Based Polymorphic Typed Concurrent.. - Taura, MATSUOKA.. (1994)   (5 citations)

....(irregular) parallel algorithms on distributed memory machines heavily rely on customized data allocation specified by programmers. A prototype compiler of our language ABCL/f has been implemented on a distributed memory massively parallel processor (Fujitsu Laboratories AP1000 with 1024 nodes [3]) and the preliminary performance measurements are highly encouraging. In fact, an intra node method invocation of ABCL/f is as fast as that of a sequential C [10, 13]. The details of the language design and efficient implementation scheme on MPPs are found in [9]. Also it should be noted that ....

....remains on the node permanently. A remote object creation is done by invoking the constructor function at the remote node. 7. Implementation Status and Preliminary Performance Results 7.1. Implementation Status. A prototype implementation of ABCL/f on a distributed memory multicomputer AP1000 [3] has been nearly completed. AP1000 is a distributed memory multicomputer developed by the Fujitsu Laboratories Ltd. which comprises 32--1024 Sparc chips operating at 25 MHz clock cycle. Each node has a 16MB local (non shared) memory. Instead of developing a direct translation scheme from ABCL/f to ....

Hiroaki Ishihata, Takeshi Horie, Satoshi Inano, Toshiyuki Shimizu, and Sadayuki Kato, An architecture of highly parallel computer AP1000, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 1991, pp. 13--16.


Optimisations for the memory hierarchy of a Singular Value .. - Czezowski, Strazdins (1994)

....processor, columns are exchanged between them according to Brent Luk ordering. 3 AP1000 Architecture The AP1000 (Array Processor 1000) is highly parallel computer with a torus topology, distributed memory (MIMD) and the main design goal to attain low latency and high throughput communication [9]. The system can have from 64 to 1024 processors (128 at the ANU) and has three independent networks: torus network (T net), broadcast network (B net), and synchronisation network (S net). High throughput communication is achieved by using independently those three communication networks. The T net ....

H. Ishihata, T. Horie, et al. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, pages 13--16. IEEE, 1991.


Prototyping Parallel LAPACK using Block-Cyclic Distributed BLAS - Strazdins (1994)   (1 citation)

....However, scalar based cells have typically a much lower bandwidth, compensated for by at least one level of cache. Large caches, however, requires w to be of the order of 100 for near peak performance, and hence require that w r for optimal performance [1, 8, 3]. The Fujitsu AP1000 [4] is a scalar based distributed memory processor with a relatively high communication to computation ratio and low communication startup overhead. It can be configured into a P × Q (toroidal) configuration, provided PQ does not exceed the total number of cells. The AP1000 cells are SPARC 1s ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An Architecture of Highly Parallel Computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, pages 13--16, May 1991.


A High Performance, Portable Distributed BLAS Implementation - Strazdins

....shape and size. These communication suboperations are performed inside the Distributed Matrix Communication module of Figure 1. This module assumes the underlying communication primitives effectively buffer the message in a system area upon receipt, e.g. in the ring buffers of the AP1000 or AP [4]. Where the communication requires large messages, partitioning of the messages occur so that the maximum size of unreceived messages can be kept within a constant defined in the Machine Parameters module (see Figure 1) 3 Portability Issues in Communication The ApLib BLACS Interface Module of ....

....as well as indexing calculations. It should be noted on a SPARC 1, an integer multiply or divide operation can take up to 50 cycles; the computation of oe(A; i; j) alone can involve about 20 such operations, amounting to overhead that is comparable to the AP1000 communication startup time [4]. These overheads are not parallelizable in any way, and thus impact on algorithm scalability. Their extent can be exposed by a careful performance modelling of the matrix factorization computation, which does not take into account any O(N) overheads except communication startup: the discrepancy ....

H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. An Architecture of Highly Parallel Computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, pages 13--16, May 1991.


AP1000+: Architectural Support of PUT/GET.. - Hayashi, Doi.. (1994)   Self-citation (Ishihata Horie Shimizu)

....PUT/GET operations as the low level communications for our parallelizing compilers. 1.4 New Highly Parallel Computer AP1000+ After studying the mechanisms required by parallelizing compilers, we developed the AP1000+, a new distributed memory parallel computer. The AP1000+ is an enhanced AP1000 [10, 20] that supports the architecture required for parallelizing compilers. The AP1000+ uses PUT/GET for basic data transfer and supports other mechanisms required for parallelizing compilers. 2 Communication Mechanisms Required for Parallelizing Compilers 2.1 VPP Fortran and HPF ....

....value is incremented. The flag update mechanism, therefore, must be realized by fetch and increment. The mechanism must support barrier synchronization and global reduction both for all nodes and for specific groups of nodes. 4 AP1000+ Architecture The AP1000+, an enhancement of the AP1000 [10, 20], is a distributed memory highly parallel computer that supports the communication mechanisms required by parallelizing compilers. Figure 4 shows the AP1000+ system configuration and Figure 5 shows the processing element (cell) configuration. The AP1000+ system consists of 4 to 1024 processing ....

Ishihata, H., Horie, T., Inano, S., Shimizu, T., and Kato, S. An architecture of highly parallel computer AP1000. In IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing (May 1991), pp. 13--16.


Unresponsiveness-Tolerant Collective Communication - Pakin (2001)

No context found.

Hiroaki Ishihata, Takeshi Horie, Satoshi Inano, Toshiyuki Shimizu, and Sadayuki Kato. An architecture of highly parallel computer AP1000. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pages 13--16, May 9--10, 1991. Available from ftp://fcapwide.fujitsu.co.jp/ap1000/english/rim/rim 91.ps.Z.


CiteSeer.IST - Copyright Penn State and NEC