Results 1 - 10 of 15
A Java Commodity Grid Kit
, 2000
Cited by 52 (8 self)
In this paper, we explain why CoG Kits are important, describe the design and implementation of a Java CoG Kit, and use examples to illustrate how CoG Kits can enable new approaches to application development based on the integrated use of commodity and Grid technologies.
Automatic Parallelization for Graphics Processing Units
Cited by 15 (0 self)
Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw performance. However, because they are difficult to program, GPUs are used only for a narrow class of special-purpose applications; the raw processing power made available by GPUs is unused most of the time. This paper presents an extension to a Java JIT compiler that executes suitable code on the GPU instead of the CPU. Both static and dynamic features are used to decide whether it is feasible and beneficial to off-load a piece of code to the GPU. The paper presents a cost model that balances the speedup available from the GPU against the cost of transferring input and output data between main memory and GPU memory. The cost model is parameterized so that it can be applied to different hardware combinations. The paper also presents ways to overcome several obstacles to parallelization inherent in the design of the Java bytecode language: unstructured control flow, the lack of multi-dimensional arrays, the precise exception semantics, and the proliferation of indirect references.
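The cost model is described only at a high level in the abstract. As a hedged illustration of the idea (the constants and the method name `shouldOffload` are invented for this sketch, not taken from the paper), a minimal version might compare estimated CPU time against estimated GPU time plus transfer cost:

```java
// Sketch of a GPU off-loading cost model: off-load only when the estimated
// GPU time plus data-transfer time beats the estimated CPU time.
// All parameter values here are illustrative, not the paper's.
public class OffloadCostModel {
    // Hardware-dependent constants (hypothetical values).
    static final double CPU_NS_PER_OP = 1.0;         // ns per operation on the CPU
    static final double GPU_NS_PER_OP = 0.05;        // ns per operation on the GPU
    static final double TRANSFER_NS_PER_BYTE = 0.25; // PCIe transfer cost

    /** Returns true if running on the GPU is predicted to be faster. */
    static boolean shouldOffload(long ops, long bytesIn, long bytesOut) {
        double cpuTime = ops * CPU_NS_PER_OP;
        double gpuTime = ops * GPU_NS_PER_OP
                       + (bytesIn + bytesOut) * TRANSFER_NS_PER_BYTE;
        return gpuTime < cpuTime;
    }

    public static void main(String[] args) {
        // A large compute-heavy kernel: transfer cost is amortized.
        System.out.println(shouldOffload(1_000_000_000L, 1 << 20, 1 << 20)); // true
        // A tiny kernel dominated by transfer cost: keep on the CPU.
        System.out.println(shouldOffload(1_000L, 1 << 20, 1 << 20)); // false
    }
}
```

Parameterizing the constants per hardware combination, as the abstract describes, would mean measuring these coefficients on each CPU/GPU pair rather than hard-coding them.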
A Device Level Communication Library for the HPJava Programming Language
- In the IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS), 2003
Cited by 6 (4 self)
Two characteristic run-time communication libraries of HPJava are developed: an application-level library and a device-level library. A high-level communication API, Adlib, is developed as the application-level communication library. This library supports collective operations on distributed arrays. The mpjdev API is the device-level communication library underlying HPJava; it is developed to perform the actual communication between processes.
Automatic Translation of Fortran to JVM Bytecode
- In Concurrency: Practice and Experience
, 2003
Cited by 5 (0 self)
This paper reports on the design of a Fortran-to-Java translator whose target language is the instruction set of the Java Virtual Machine. The goal of the translator is to generate Java implementations of legacy Fortran numerical codes in a consistent and reliable fashion. The benefits of directly generating bytecode are twofold. First, compared with generating Java source code, it provides a much more straightforward and efficient mechanism for translating Fortran GOTO statements. Second, it provides a framework for pursuing various compiler optimizations, which could be beneficial not only to our project, but to the Java community as a whole. Copyright © 2003 John Wiley & Sons, Ltd. KEY WORDS: Fortran; Java; JVM; bytecode; numerical libraries
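The GOTO difficulty alluded to above can be made concrete: JVM bytecode has an unconditional `goto` instruction, while Java source code does not, so a translator targeting Java source must restructure arbitrary jumps, for example into a switch-based state machine. A minimal sketch (the Fortran fragment and all names are invented for illustration, not taken from the paper):

```java
// Why generating bytecode directly is simpler for Fortran GOTO:
// bytecode can jump anywhere, but Java source must encode labels as data.
// Hypothetical Fortran:   10 X = X + 1
//                            IF (X .LT. N) GOTO 10
public class GotoAsStateMachine {
    static int countUp(int x, int n) {
        int label = 10;               // current Fortran statement label
        while (true) {
            switch (label) {
                case 10:
                    x = x + 1;
                    if (x < n) { label = 10; break; } // GOTO 10
                    return x;
                default:
                    return x;         // fell off the end of the program
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countUp(0, 5)); // prints 5
    }
}
```

A bytecode-generating translator can instead emit one conditional branch back to the loop head, avoiding this encoding entirely.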
Effective Enhancement of Loop Versioning in Java
Cited by 4 (0 self)
Run-time exception checking is required by the Java Language Specification (JLS). Though it provides higher software reliability, that mechanism negatively affects the performance of Java programs, especially computationally intensive ones. This paper pursues loop versioning, a simple program transformation that often helps to avoid the checking overhead. Based on the Java Memory Model precisely defined in the JLS, this work proposes a set of sufficient conditions for the applicability of loop versioning. Scalable intra- and interprocedural analyses that efficiently check fulfilment of the conditions are also described. Implemented in Excelsior JET, an ahead-of-time compiler for Java, the developed technique results in significant performance improvements on some computational benchmarks.
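The loop-versioning transformation can be sketched in Java source as follows. This is only an illustrative analogue: a Java program cannot itself disable the JVM's checks (the real transformation happens inside the compiler), and the applicability conditions used by Excelsior JET are more involved than this single guard.

```java
// Sketch of loop versioning: one test hoisted outside the loop; if it
// passes, every per-iteration bounds check inside the "fast" version is
// provably redundant and a compiler may remove it. Otherwise the original
// "slow" version runs, preserving the exception behavior the JLS requires.
public class LoopVersioning {
    static long sum(int[] a, int lo, int hi) {
        long s = 0;
        if (a != null && lo >= 0 && hi <= a.length) {
            // Fast version: the guard implies 0 <= i < a.length for every
            // i in [lo, hi), so no access here can throw.
            for (int i = lo; i < hi; i++) s += a[i];
        } else {
            // Slow version: identical loop; checks (and exceptions) as per JLS.
            for (int i = lo; i < hi; i++) s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4}, 0, 4)); // prints 10
    }
}
```

The code duplication is the price of the transformation, which is why the paper's analyses matter: versioning is only worthwhile where the guard is likely to pass.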
Using MPI with C# and the Common Language Infrastructure
, 2002
Cited by 4 (0 self)
We describe two interfaces for using the Message Passing Interface (MPI) with the C# programming language and the Common Language Infrastructure (CLI). The first interface provides CLI bindings that closely match the original MPI library specification. The second presents a fully object-oriented interface to MPI and exploits modern language features of C#. The interfaces described here use the P/Invoke feature of the CLI to dispatch to a native implementation of MPI (in our case, LAM/MPI). Performance results using the Shared Source CLI demonstrate that only a small performance overhead is incurred.
X10: an object-oriented approach to non-uniform cluster computing
, 2005
Cited by 4 (0 self)
It is now well established that the device scaling predicted by Moore’s Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes. We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places;
Data Structures in Java for Matrix Computations
, 2002
Cited by 2 (0 self)
In this paper it is shown how to utilize Java arrays for matrix computations. We discuss the disadvantages of Java arrays when used as two-dimensional arrays for dense matrix computation, and how to improve the performance. We show how to create an efficient dynamic data structure for sparse matrix computation using Java's native arrays. We construct a data structure for large sparse matrices that is unique to Java. This data structure is shown to be more dynamic and efficient than the traditional storage schemes for large sparse matrices. Numerical results show that this new data structure, called Java Sparse Array (JSA), is competitive with the traditional Compressed Row Storage scheme (CRS) on matrix computation routines. Java gives flexibility without losing efficiency. Compared to other object-oriented data structures, it is shown that JSA has the same flexibility.
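As a rough illustration of the JSA layout described above (a minimal re-implementation sketch, not the authors' code), each row carries its own value array and column-index array, so a single row can be replaced or grown without copying the whole matrix, unlike CRS's flat global arrays:

```java
// Minimal sketch of the Java Sparse Array (JSA) layout: one value array and
// one column-index array per row, held in top-level arrays of rows.
public class JavaSparseArray {
    final double[][] value;  // value[i] = nonzero values of row i
    final int[][] index;     // index[i] = column index of each value in row i

    JavaSparseArray(double[][] value, int[][] index) {
        this.value = value;
        this.index = index;
    }

    /** y = A * x (sparse matrix times dense vector). */
    double[] times(double[] x) {
        double[] y = new double[value.length];
        for (int i = 0; i < value.length; i++)
            for (int k = 0; k < value[i].length; k++)
                y[i] += value[i][k] * x[index[i][k]];
        return y;
    }

    public static void main(String[] args) {
        // 2x2 diagonal matrix [[2, 0], [0, 3]] stored row by row.
        JavaSparseArray a = new JavaSparseArray(
            new double[][]{{2.0}, {3.0}},
            new int[][]{{0}, {1}});
        double[] y = a.times(new double[]{1.0, 1.0});
        System.out.println(y[0] + " " + y[1]); // prints 2.0 3.0
    }
}
```

In CRS the same matrix would be three flat arrays (values, column indices, row pointers); inserting a nonzero then shifts every later entry, which is the rigidity JSA avoids.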
Collective communication for
"... This paper addresses functionality and implementation of a HPJava version of the Adlib collective communication library for data parallel programming. We begin by illustrating typical use of the library, through an example multigrid application. Then we describe implementation issues for the high-le ..."
This paper addresses the functionality and implementation of an HPJava version of the Adlib collective communication library for data-parallel programming. We begin by illustrating typical use of the library through an example multigrid application. Then we describe implementation issues for the high-level library. At a software engineering level, we illustrate how the primitives of the HPJava language assist in writing library methods whose implementation can be largely independent of the distribution format of the argument arrays. We also describe a low-level API called mpjdev, which handles the basic communication underlying the Adlib implementation. Finally we present some benchmark results, and some conclusions. Copyright © 2005 John Wiley & Sons, Ltd. KEY WORDS: HPspmd programming model; HPJava; Adlib; Java
ProActive: Using a Java Middleware for HPC Design, Implementation and Benchmarks
"... Abstract-Although Java is among the most used programming languages, its use for HPC applications is still marginal. This article reports on the design, implementation and benchmarking of a Java version of the NAS Parallel Benchmarks translated from their original Fortran / MPI implementation. We h ..."
Although Java is among the most used programming languages, its use for HPC applications is still marginal. This article reports on the design, implementation and benchmarking of a Java version of the NAS Parallel Benchmarks translated from their original Fortran/MPI implementation. We have based our version on ProActive, an open-source middleware designed for parallel and distributed computing. This paper gives a description of the ProActive middleware principles and how we have implemented the NAS Parallel Benchmarks on this Java library. We also give some basic rules for writing HPC code in Java. Finally, we have compared the overall performance between the legacy and the Java ProActive versions. We show that the performance varies with the type of computation but also with the Java Virtual Machine, no single one providing the best performance in all experiments. We also show that the performance of the Java version is close to the Fortran one on computation-intensive benchmarks. However, on some communication-intensive benchmarks, the Java version exhibits scalability issues, even when using a high-performance socket implementation (JFS).