Design and Evaluation of a Smart Disk Cluster for DSS Commercial Workloads (2001)

by Gokhan Memik, Mahmut T. Kandemir, Alok Choudhary

Exploiting Programmable Network Interfaces for Parallel Query Execution

by V. Santhosh Kumar, M. J. Thazhuthaveetil, R. Govindarajan - in Workstation Clusters. TR-HPC-10/2005, LHPC, SERC, IISc, 2005
Cited by 2 (2 self)
Workstation clusters equipped with high-performance interconnects that have programmable network processors offer interesting opportunities to enhance the performance of parallel applications run on them. In this paper, we propose schemes where certain application-level processing in parallel database query execution is performed on the network processor. We evaluate the performance of TPC-H queries executing on a high-end cluster where all tuple processing is done on the host processor, using a timed Petri net model, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose four schemes where certain tuple processing activity is offloaded to the network processor. The first two schemes offload the tuple splitting activity (the computation to identify the node on which to process the tuples), resulting in an execution time speedup of 1.09 relative to the base scheme, but with the I/O bus becoming the bottleneck resource. In the third scheme, in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups of up to 1.16, but with high host processor utilization. Our fourth scheme, where the network processor also performs part of the join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilization. Further, we observe that the proposed schemes perform equally well even in a scaled architecture, i.e., when the number of processors is increased from 2 to 64.

Improving Transaction Processing using a Hierarchical Computing Server

by Juan Rubio, Madhavi Valluri, Lizy John, 2002
Cited by 2 (0 self)
Transaction processing workloads impose heavy demands on the memory and storage subsystems and often result in large amounts of traffic on the I/O and memory buses. In this paper, we design a hierarchical computing system that utilizes processing elements distributed across the storage hierarchy, with the objective of reducing the data movement between the storage subsystem and the processing units. The computing elements can be commodity processing elements or specialized devices similar to active memory modules and active disk devices emerging from other research groups. Database queries are partitioned across the different layers of the hierarchy depending on the affinity of code to a particular layer. Commands percolate down into the lower layers of the hierarchy and partially processed information flows up into the higher layers, where, if necessary, subsequent operations are performed. All layers actively participate in the processing of the transaction by doing tasks for which they are particularly suited. We evaluate the effectiveness of the hierarchical computing model using the SimOS full system simulator. On TPC-H queries, hierarchical computing systems with 4 and 8 memory modules can yield speedups in the range of 1.18x to 1.5x when compared with equivalent shared-memory multiprocessor systems. Likewise, on a hierarchical system with 2 memory modules and 4 disk modules, we see speedups between 1.07x and 1.27x with respect to an 8-way SMP system, even though the hierarchical system has one fewer processor. In comparison to uniprocessors, the speedups are between 2.9x and 3.9x for the 4-way hierarchical system, up to 7.2x for the 8-way hierarchical system, and up to 6.8x for a hierarchical computing system with 2 memory modules and 4 disk modules. We also implement the hierarchical computing paradigm on SMP hardware; however, bus contention is still seen to limit performance.

Acceleration of a Content-Based Image-Retrieval Application on the RDISK Cluster

by Auguste Noumsi, Steven Derrien, Patrice Quinton
Cited by 2 (0 self)
Because of the growing use of multimedia content over the Internet, Content-Based Image Retrieval (CBIR) has recently received a lot of interest. While accurate search techniques based on local image descriptors exist, they suffer from very long execution times. We propose to accelerate CBIR on the RDISK machine, a cluster of FPGA-enhanced hard drives that follows the philosophy of smart disks. Our platform combines coarse- and fine-grain parallelism thanks to the concurrent use of the cluster nodes and of a programmable logic device. The implementation of the CBIR application on this mixed hardware/software platform follows a strict methodology that was validated on a realistic data set (an image database of more than 30,000 images). This methodology allows us to adapt the original algorithm to suit a hardware implementation, and to select the values of some key design parameters to maximize global performance. Our preliminary results indicate that speed-ups between 120 and 200 could be obtained for a cluster of 32 nodes compared with a software implementation running on a standard desktop PC.