Results 1–9 of 9
Parallel Lattice Boltzmann Flow Simulation on a Low-Cost PlayStation 3 Cluster
International Journal of Computer Science, 2008
Cited by 9 (0 self)
Abstract. A parallel Lattice Boltzmann Method (pLBM), based on hierarchical spatial decomposition, is designed to perform large-scale flow simulations. The algorithm uses a critical-section-free dual representation in order to expose maximal concurrency and data locality. The performance of two emerging multicore platforms, the PlayStation 3 (Cell Broadband Engine) and graphics processors programmed with the Compute Unified Device Architecture (CUDA), is tested using the pLBM, which is implemented with multithreaded and message-passing programming. The results show that pLBM achieves good speedups: 11.02× for Cell over a traditional Xeon cluster and 8.76× for a CUDA graphics processing unit (GPU) over a Sempron central processing unit (CPU). The results provide some insights into application design on future many-core platforms.
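The stream-and-collide structure that pLBM decomposes across cores can be illustrated with a toy, single-threaded D1Q3 lattice Boltzmann step for pure diffusion. This is a minimal sketch, not the paper's 3D solver: the weights and relaxation time are standard textbook values, and the lattice size is arbitrary.

```python
# Toy D1Q3 lattice Boltzmann sketch (illustrative; not the paper's code).
# Each cell holds three particle populations: rest, moving right, moving left.

N, TAU = 16, 1.0
W = [2/3, 1/6, 1/6]                         # textbook D1Q3 weights
f = [[w for w in W] for _ in range(N)]      # uniform density 1.0 everywhere
f[N // 2] = [w * 2.0 for w in W]            # a density bump in the middle

def step(f):
    # Collide: relax each population toward its equilibrium w_k * rho.
    post = []
    for cell in f:
        rho = sum(cell)
        post.append([c - (c - w * rho) / TAU for c, w in zip(cell, W)])
    # Stream: shift the moving populations to neighbour cells (periodic).
    new = [[0.0] * 3 for _ in range(N)]
    for x in range(N):
        new[x][0] = post[x][0]
        new[(x + 1) % N][1] = post[x][1]
        new[(x - 1) % N][2] = post[x][2]
    return new

for _ in range(10):
    f = step(f)
print(round(sum(sum(cell) for cell in f), 6))  # total mass is conserved: 17.0
```

The stream step only touches nearest neighbours, which is why spatial decomposition (as in pLBM) parallelizes well: each subdomain exchanges only a thin halo of boundary cells per step.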
Performance loss between concept and keyboard
Abstract. Standards bodies and commercial software vendors have defined parallel constructs to harness the parallelism in computations. Using the task-graph model of parallel program execution, we show how common programming constructs that impose series-parallel task dependencies can lead to unbounded slowdown compared to the inherent parallelism in the algorithm. We describe various ways in which this slowdown can be avoided.

Inexpensive multicore processors have brought parallelism to the desktop computer. We focus on a specific structure, imposed by some constructs, on the dependencies between program tasks. This structure is called series-parallel and can be most easily expressed as the set of task graphs generated by the grammar

P ::= seq(P, P) | par(P, P) | a

where a is a task or activity that represents some amount of program code (possibly as small as a single arithmetic operation) to be executed on one processor. Series-parallel task graphs are not only easy to express, but also have a modular structure that can be exploited for efficient scheduling. We can represent the tasks in a parallel program, and the dependencies between tasks, in a task graph (also known as an activity network). A task graph is a directed graph where each node is labelled with a distinct activity name, and associated with each activity is a positive real number, its duration, describing how long the activity will take to execute. Arcs in the task graph capture dependencies between tasks (also known as precedence constraints). The left graph in the figure is series-parallel and can be written as seq
(Footnote: Via the MATLAB Parallel Computing Toolbox.)
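The grammar above can be sketched directly in code. The encoding below is a hypothetical illustration (names like `Act`, `Seq`, `Par` are mine, not from the paper): `work` is the one-processor running time and `span` the critical-path length, the two quantities whose ratio bounds the achievable parallelism.

```python
from dataclasses import dataclass
from typing import Union

# Encoding of the grammar P ::= seq(P, P) | par(P, P) | a,
# where a leaf activity carries a positive duration.

@dataclass
class Act:
    duration: float

@dataclass
class Seq:
    left: "Node"
    right: "Node"

@dataclass
class Par:
    left: "Node"
    right: "Node"

Node = Union[Act, Seq, Par]

def work(p: Node) -> float:
    """Total execution time on one processor (sum of all durations)."""
    if isinstance(p, Act):
        return p.duration
    return work(p.left) + work(p.right)

def span(p: Node) -> float:
    """Critical-path length: seq adds durations, par takes the slower branch."""
    if isinstance(p, Act):
        return p.duration
    if isinstance(p, Seq):
        return span(p.left) + span(p.right)
    return max(span(p.left), span(p.right))

# par(seq(a, b), seq(c, d)) with unit durations: work 4, span 2,
# so at most a 2x speedup for this series-parallel shape.
g = Par(Seq(Act(1), Act(1)), Seq(Act(1), Act(1)))
print(work(g), span(g))  # 4 2
```

The slowdown the paper studies arises when a series-parallel shape forces a larger span than the algorithm's underlying dependency graph actually requires.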
Experimental Demonstration of End-to-End Message Passing for HPC Systems through a Hybrid Optical Switch
, 2011
Associate Editor: XXXXXXX
Motivation: Many algorithms used in the analysis of high-dimensional data require significant processing time due to the sheer number of values compared. We describe the results of the parallelization of two algorithms central to the functionality of the network analysis tool BioLayout Express 3D: the calculation of the correlation (Pearson, Spearman rank) coefficient matrices used to define relationships in large datasets, such as between gene expression profiles in microarray analyses, and the Fruchterman-Reingold graph layout algorithm used in the visualization of the resulting networks. Results: Initially, the Java 1.6 and ANSI C99 languages were used to provide multithreaded implementations of these algorithms that run on all available CPUs. Secondly, the OpenCL C language was used as part of the OpenCL API to harness the processing power of GPUs. Both approaches have been implemented in a platform- and hardware-independent manner. We discuss the issues associated with the parallelization of these very different algorithms and provide detailed comparisons of the results, where we have achieved speedups of more than 60× compared to non-parallel implementations. Availability: The code is publicly available and utilized within the
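The correlation-matrix computation the abstract refers to can be sketched as follows. This is a minimal serial illustration, not the BioLayout Express 3D code: the all-pairs loop is the O(n²) hot spot that the paper maps onto many CPU threads or OpenCL work-items, one (i, j) cell per unit of work.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(profiles):
    # Each (i, j) cell is independent of every other, which is what makes
    # this computation embarrassingly parallel across threads or GPU cores.
    n = len(profiles)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(profiles[i], profiles[j])
            m[i][j] = m[j][i] = r
    return m

rows = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
m = correlation_matrix(rows)
print(round(m[0][1], 3), round(m[0][2], 3))  # 1.0 -1.0
```

Profile 1 is a scaled copy of profile 0 (correlation 1.0) and profile 2 is its reverse (correlation -1.0), so the matrix behaves as expected on this toy input.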
Design and Implementation of Rough Set Algorithms on FPGA: A Survey
Abstract. Rough set theory, developed by Z. Pawlak, is a powerful soft-computing tool for extracting meaningful patterns from vague, imprecise, inconsistent, and large chunks of data. It classifies a given knowledge base approximately into suitable decision classes, removing irrelevant and redundant data with an attribute-reduction algorithm. Conventional rough set information processing, such as discovering data dependencies, data reduction, and approximate set classification, involves software running on a general-purpose processor. Over the last decade, researchers have started exploring the feasibility of implementing these algorithms on FPGAs. Algorithms implemented on a conventional processor using standard software routines offer high flexibility, but their performance deteriorates when handling larger real-time databases. With the tremendous growth in FPGA technology, a new area of research has opened up: FPGAs offer a promising solution in terms of speed, power, and cost, and researchers have demonstrated the benefits of mapping rough set algorithms onto FPGAs. In this paper, a survey of hardware implementations of rough set algorithms by various researchers is presented.
Keywords: rough set theory; discernibility matrix; reduct; core; FPGA; classification
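The discernibility matrix listed in the keywords, the data structure on which attribute-reduction algorithms operate, can be sketched in a few lines. This is a toy illustration, not from the survey; the attribute names are invented for the example.

```python
# Toy discernibility matrix (illustrative): for each pair of objects in
# different decision classes, record the condition attributes whose
# values differ. A reduct must hit every non-empty cell.

def discernibility_matrix(objects, decisions, attributes):
    cells = {}
    n = len(objects)
    for i in range(n):
        for j in range(i + 1, n):
            if decisions[i] != decisions[j]:
                cells[(i, j)] = {a for a in attributes
                                 if objects[i][a] != objects[j][a]}
    return cells

objs = [{"temp": "high", "cough": "yes"},
        {"temp": "high", "cough": "no"},
        {"temp": "low",  "cough": "no"}]
dec = ["flu", "flu", "healthy"]
cells = discernibility_matrix(objs, dec, ["temp", "cough"])
print(cells)  # only pairs with different decisions appear
```

The pairwise, bit-settable structure of these cells is one reason the computation maps well onto FPGA logic, where each cell can be evaluated with simple comparators in parallel.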
, 2011
Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms
OSA / IPR/PS 2010: Experimental Demonstration of Optically-Connected SDRAM
Abstract: A four-channel, 2.5-Gb/s, all-optical WDM link is established between SDRAM and an emulated CPU. Data integrity and error-free performance are verified with a sequence of SDRAM write and read operations. © 2010 Optical Society of America