A Performance Prediction Framework for Data Intensive Applications on Large Scale Parallel Machines (1998)

by M Uysal, T Kurc, A Sussman, J Saltz

Results 1 - 10 of 31

Adaptive Performance Prediction for Distributed Data-Intensive Applications

by Marcio Faerman, Alan Su, Richard Wolski, Francine Berman, 1999
Abstract - Cited by 43 (4 self)
The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving application performance. In this paper, we introduce a performance prediction method, ARM (Adaptive Regression Modeling), to determine data transfer times for network-bound distributed data-intensive applications. We demonstrate the effectiveness of the ARM method on two distributed data applications, SARA (Synthetic Aperture Radar Atlas) and SRB (Storage Resource Broker), and discuss how it can be used for application scheduling. Our experiments demonstrate that applying the ARM method to these applications predicted data transfer times in wide-area multi-user grid environments with accuracy of 88% or better. 1 Introduction Ensembles of distributed computational, storage, and other resources, also known as computational grids [12, 14], are...

Performance Modelling of Parallel and Distributed Computing Using PACE

by J. Cao, D.J. Kerbyson, E. Papaefstathiou, G.R. Nudd
Abstract - Cited by 32 (18 self)
There is a wide range of performance models being developed for the performance evaluation of parallel and distributed systems. A performance modelling approach described in this paper is based on a layered framework of the PACE methodology. With an initial implementation system, the model described by a performance specification language, CHIPS, can provide a capability for rapid calculation of relevant performance information without sacrificing accuracy of predictions. An example of the performance evaluation of an ASCI kernel application, Sweep3D, is used to illustrate the approach. The validation results on different parallel and distributed architectures with different problem sizes show that a reasonable accuracy (approximately 10% error at most) can be obtained, that cross-platform comparisons can be easily undertaken, and that the evaluation time is rapid (typically less than 2s).

Citation Context

...uler using expected performance as an aid. Performance predictions are generated from structural models, consisting of components that represent the performance activities of the application. • CHAOS [12]. A part of this work is concerned with the performance prediction of large-scale data intensive applications on large-scale parallel machines. It includes a simulation-based framework to predict the ...

Querying Very Large Multi-dimensional Datasets in ADR

by Tahsin Kurc, Chialin Chang, Renato Ferreira, Alan Sussman, Joel Saltz, 1999
Abstract - Cited by 30 (10 self)
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space, and access to data items is described by range queries. The basic processing involves mapping input data items to output data items, and some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on distributed-memory parallel architectures with multiple disks attached to each node. In this paper we address efficient execution of range queries on distributed memory parallel machines within the ADR framework. We present three potential strategies, and evaluate them under different application scenarios and machine co...

Database support for data-driven scientific applications in the grid

by Sivaramakrishnan Narayanan, Tahsin Kurc, Umit Catalyurek, Joel Saltz - Parallel Processing Letters, 2003
Abstract - Cited by 22 (8 self)
In this paper we describe a services oriented software system to provide basic database support for efficient execution of applications that make use of scientific datasets in the Grid. This system supports two core operations: efficient selection of the data of interest from distributed databases and efficient transfer of data from storage nodes to compute nodes for processing. We present its overall architecture and main components and describe preliminary experimental results.

Citation Context

... the associated grid point. An application emulator preserves the important data, computation, and communication characteristics of the application, but allows the input and output sizes to be scaled [85]. For the experiments, we generated a dataset corresponding to 60 days of AVHRR satellite data. The total size of the dataset is 60GB. The dataset was divided into about 55000 data chunks, each of whi...

Optimizing Retrieval and Processing of Multi-dimensional Scientific Datasets

by Chialin Chang, Tahsin Kurc, Alan Sussman, Joel Saltz, 2000
Abstract - Cited by 21 (11 self)
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models to predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output da...

Processing Large-Scale Multidimensional Data in Parallel and Distributed Environments

by Michael Beynon, Chialin Chang, Umit Catalyurek, Tahsin Kurc, Alan Sussman, Henrique Andrade, Renato Ferreira, Joel Saltz, 2002
Abstract - Cited by 19 (13 self)
Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.

A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O

by Gaurav Khanna, Nagavijayalakshmi Vydyanathan, Tahsin Kurc, Umit Catalyurek, Pete Wyckoff, Joel Saltz - In Proceedings of the 5th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), 2005
Abstract - Cited by 13 (8 self)
This paper proposes a novel, hypergraph partitioning based strategy to schedule multiple data analysis tasks with batch-shared I/O behavior. This strategy formulates the sharing of files among tasks as a hypergraph to minimize the I/O overheads due to transferring of the same set of files multiple times and employs a dynamic scheme for file transfers to reduce contention on the storage system. We experimentally evaluate the proposed approach using application emulators from two application domains: analysis of remotely sensed data and biomedical imaging.

Meandre: Semantic-driven data-intensive flows in the clouds

by Xavier Llorà, Bernie Ács, Loretta S. Auvil, Boris Capitanu, Michael E. Welge, David E. Goldberg - In Proceedings of the 4th IEEE International Conference on e-Science, 2008
Abstract - Cited by 12 (6 self)
Data-intensive flow computing allows efficient processing of large volumes of data otherwise unapproachable. This paper introduces a new semantic-driven data-intensive flow infrastructure which: (1) provides a robust and transparent scalable solution from a laptop to large-scale clusters, (2) creates a unified solution for batch and interactive tasks in high-performance computing environments, and (3) encourages reusing and sharing components. Banking on virtualization and cloud computing techniques, the Meandre infrastructure is able to create and dispose of Meandre clusters on demand, being transparent to the final user. This paper also presents a prototype of such clustered infrastructure and some results obtained using it.

Citation Context

...oreover, the growth of the internet is pushing researchers from all disciplines to deal with volumes of information where the only viable way of processing it is to utilize data-intensive frameworks (Uysal, Kurc, Sussman, & Saltz, 1998; Beynon, Kurc, Sussman, & Saltz, 2000; Foster, 2003; Mattmann, Crichton, Medvidovic, & Hughes, 2006). However, the current frameworks have a large entry toll for non tech-... [Footnote 1: http://hadoop.apache.org/]

Speed vs. Accuracy in Simulation for I/O-Intensive Applications

by Hyeonsang Eom, Jeffrey K. Hollingsworth - IPDPS, 2000
Abstract - Cited by 11 (1 self)
This paper presents a family of simulators that have been developed for data-intensive applications, and a methodology to select the most efficient one based on a user-supplied requirement for accuracy. The methodology consists of a series of tests that select an appropriate simulation based on the attributes of the application. In addition, each simulator provides two estimates of application execution time: one for the minimum expected time and the other for the maximum. We present the results of applying the strategy to existing applications and show that we can accurately simulate applications tens to hundreds of times faster than application execution time.

Citation Context

...cess patterns of the applications in a controlled manner. They efficiently provide a representation of the application’s dynamic behavior. More information about application emulators can be found in [20]. Our hardware simulators perform discrete-event simulation by processing the events of WFG produced by applications or emulators. The simulators maintain one global simulation clock and a resource cl...

Modelling of ASCI High Performance Applications Using PACE

by Junwei Cao, Darren J. Kerbyson, Efstathios Papaefstathiou, Graham R. Nudd - Proceedings of 15th Annual UK Performance Engineering Workshop, 1999
Abstract - Cited by 9 (5 self)
There is a wide range of models being developed for the performance evaluation of parallel and distributed systems. This has become an important area of research especially with the development of dynamic processing capabilities promised with Computational GRIDs [3]. A performance modelling approach described in this paper is based on a layered framework of the PACE methodology. In this system, the model described by a Performance Specification Language (PSL) provides the capability for rapid calculation of relevant performance information without sacrificing accuracy of predictions. An example of the performance evaluation of an ASCI kernel application, Sweep3D, is used to illustrate the approach. The validation of the model is shown for a cross-platform analysis on two parallel and distributed architectures with different problem sizes. Results show that a reasonable accuracy (approximately 10% error at most) can be obtained with a rapid evaluation time (typically less than 2s).