Results 1 - 10 of 13
Uniform Distributed Cache Service for Grid Computing ∗
Cited by 7 (1 self)
The uniform cache system suggested in this paper illustrates an approach to high-level grid data access. It aims to optimize the use of grid data resources and increase access performance. The system is based on the grid caching concept, in which applications share and reuse semantic information about data (metadata and data content). The generated information is analyzed and used to manage the system dynamically; the uniform cache system thus applies collaborative caching techniques to data management in the grid.
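The collaborative-caching idea above can be sketched as follows. This is a minimal illustration, not the paper's actual system: the `CacheNode` class, its peer list, and the three-step lookup (local cache, peer caches, grid source) are all assumptions for the sake of example.

```python
class CacheNode:
    """One cache node in a collaborative grid cache (illustrative sketch)."""

    def __init__(self, name, peers=None):
        self.name = name
        self.peers = peers if peers is not None else []
        self.store = {}     # key -> cached data content
        self.metadata = {}  # key -> where the content was found (shared info)

    def get(self, key, fetch_from_source):
        # 1. Local hit: reuse our own cached content.
        if key in self.store:
            return self.store[key], "local"
        # 2. Collaborative hit: reuse a peer's cached copy instead of
        #    going back to the (remote, expensive) grid data source.
        for peer in self.peers:
            if key in peer.store:
                self.store[key] = peer.store[key]
                self.metadata[key] = peer.name
                return self.store[key], "peer"
        # 3. Miss everywhere: fetch from the grid source and cache it,
        #    so later requests (ours or a peer's) can reuse it.
        data = fetch_from_source(key)
        self.store[key] = data
        self.metadata[key] = "source"
        return data, "source"
```

The recorded `metadata` is the kind of shared semantic information the system could analyze to manage caches dynamically.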
Performance engineering in data Grids
- Concurrency & Computation: Practice & Experience
, 2005
Cited by 5 (0 self)
The vision of Grid computing is to facilitate world-wide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects this vision is becoming reality, and Grid systems are attracting an ever-increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as Data Grids). These techniques are applicable at different layers of a Grid architecture, and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues related to Data Grids and how they implement the techniques presented.
Distributed Generation of NASA Earth Science Data Products
, 2005
Cited by 1 (1 self)
The objective of this work is the development of grid-based approaches through which NASA data centers can become active participants in serving data users by transforming archived data into the specific form needed by the user. This approach involves generating custom data products from data stored in multiple NASA data centers. We describe a prototype developed to explore how grid technology can facilitate this multi-center product generation. Our initial example of a custom data product is phenomena-based subsetting. This example involves production of a subset of a large collection of data based on the subset’s association with some phenomena, such as a mesoscale convective system (severe storm) or a hurricane. We demonstrate that this subsetting can be performed on data located at a single data center or at multiple data centers. We also describe a system that performed customized data product generation using a combination of commodity processors deployed at a NASA data center, grid technology to access these processors, and data mining software that intelligently selects where to perform processing based on data location and availability of compute
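Phenomena-based subsetting, as described above, amounts to filtering data granules by their association with an event. A minimal sketch, assuming each granule and event carries a simple time interval; real NASA holdings would involve archive queries and spatial extents as well:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Two closed intervals overlap iff each starts before the other ends."""
    return a_start <= b_end and b_start <= a_end

def phenomena_subset(granules, events):
    """Keep only the granules whose time range intersects some phenomenon
    (e.g. a mesoscale convective system or a hurricane track segment)."""
    return [g for g in granules
            if any(overlaps(g["start"], g["end"], e["start"], e["end"])
                   for e in events)]
```

In the multi-center setting, each data center could run this filter locally and ship back only the matching subset, which is the point of performing the selection where the data lives.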
A novel data grid coherence protocol using pipeline-based aggressive copy method
- in GPC, 2007
Cited by 1 (0 self)
Abstract. Grid systems are well known for their high-performance computing and large data storage on inexpensive devices. They can be categorized into two major types: computational grids and data grids. Data grids serve data-intensive applications, in which replication is used to reduce access latency; it can also improve data availability, load balancing, and fault tolerance. When there are many replicas, however, coherence problems may arise while they are being updated. In this paper, based on the aggressive-copy method, we develop an algorithm using the pipeline concept so that data transfer tasks can be performed simultaneously. This novel Pipeline-based Aggressive Copy method accelerates updates and decreases users' waiting times. We used the Globus Toolkit for our framework. Compared with existing schemes, our preliminary simulation results show notable improvement in overall completion time.
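The benefit of pipelining replica updates can be illustrated with a toy timing model. The chunked chain topology and uniform per-chunk link time below are assumptions for illustration, not the paper's actual protocol:

```python
def sequential_time(n_replicas, n_chunks, t_chunk):
    """Master sends the whole file to each replica, one replica at a time."""
    return n_replicas * n_chunks * t_chunk

def pipelined_time(n_replicas, n_chunks, t_chunk):
    """Replicas form a chain and forward each chunk as soon as it arrives,
    so transfers toward different replicas overlap in time.  The last
    replica finishes after the pipeline fills (n_replicas - 1 hops) plus
    the time to stream all n_chunks through one link."""
    return (n_chunks + n_replicas - 1) * t_chunk
```

With 4 replicas and 10 chunks of 1 time unit each, the sequential push takes 40 units while the pipelined chain takes 13, which is the kind of completion-time gain the abstract reports qualitatively.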
File-based replica management
Data replication is one of the best known strategies to achieve high levels of availability and fault tolerance, as well as minimal access times for large, distributed user communities using a world-wide Data Grid. In certain scientific application domains, the data volume can reach the order of several petabytes; in these domains, data replication and access optimization play an important role in the manageability and usability of the Grid. In this paper, we present the design and implementation of a replica management Grid middleware that was developed within the EDG project [European DataGrid Project (EDG), http://www.eu-egee.org] and is designed to be extensible so that user communities can adjust its detailed behavior according to their QoS requirements. © 2004 Elsevier B.V. All rights reserved.
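A replica management layer of the kind described above typically maps logical file names to physical replicas and lets the user community plug in its own selection policy. A minimal sketch; the `ReplicaCatalog` class and its methods are invented for illustration, not the EDG middleware's actual API:

```python
class ReplicaCatalog:
    """Maps a logical file name (LFN) to its physical replicas (PFNs)."""

    def __init__(self):
        self.replicas = {}  # LFN -> list of physical locations

    def register(self, lfn, pfn):
        self.replicas.setdefault(lfn, []).append(pfn)

    def list_replicas(self, lfn):
        return list(self.replicas.get(lfn, []))

    def best_replica(self, lfn, cost):
        # The access-cost function is supplied by the caller, so user
        # communities can adjust selection to their own QoS requirements
        # (latency, bandwidth, site load, ...).
        pfns = self.replicas.get(lfn)
        if not pfns:
            raise KeyError("no replica registered for " + lfn)
        return min(pfns, key=cost)
```

The pluggable `cost` callable mirrors the extensibility goal stated in the abstract: the catalogue stays generic while the policy varies per community.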
Partner(s): CERN Lead Partner: EDG
, 2003
Data replication is one of the best known strategies to achieve high levels of availability and fault tolerance, as well as minimal access times for large, distributed user communities using a world-wide Data Grid. In certain scientific application domains the data volume can reach the order of several petabytes; in these domains data replication and access optimization play an important role in the manageability and usability of the Grid.
Chameleon: A Resource Scheduler in A Data Grid Environment*
Grid computing is moving in two directions. The Computational Grid focuses on reducing the execution time of applications that require a great number of processing cycles. The Data Grid provides a way to solve large-scale data management problems. Data-intensive applications such as High Energy Physics and Bioinformatics require both Computational and Data Grid features. Job scheduling in Grids has mostly been discussed from the perspective of the Computational Grid; scheduling on Data Grids has only recently become a focus of Grid computing activities. In a Data Grid environment, an effective scheduling mechanism that considers both computational and data storage resources must be provided for large-scale data-intensive applications. In this paper, we describe a new scheduling model that considers both the amount of computational resources and data availability in a Data Grid environment. We implemented a scheduler, called Chameleon, based on the proposed application scheduling model. Chameleon shows performance improvements for data-intensive applications that require both a large number of processors and data replication mechanisms. The results achieved with Chameleon are presented.
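A scheduler that weighs both computation and data placement, as described above, can be sketched with a simple completion-time model. The cost terms and site attributes below are assumptions for illustration, not Chameleon's actual model:

```python
def pick_site(sites, job_cycles, file_size):
    """Choose the site with the lowest estimated completion time,
    accounting for both compute speed and data movement."""
    def completion_time(site):
        compute = job_cycles / site["cpu_rate"]
        # A site that already holds a replica of the input pays no
        # transfer cost; otherwise the file must be staged in.
        transfer = 0.0 if site["has_data"] else file_size / site["bandwidth"]
        return compute + transfer
    return min(sites, key=completion_time)
```

The point of the model is that a slower site holding the data can beat a faster site that would first have to stage a large file in, which is why a Data Grid scheduler cannot look at CPU availability alone.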
Thesis Committee Member: Young-Bae Ko (seal)
, 2003
The aim of grid computing is to aggregate Internet-wide distributed resources so that high-performance applications such as High Energy Physics and Bioinformatics can be executed with guaranteed speed and reliability. Two important topics newly emerging in the grid
OptorSim -- A Grid Simulator for . . .
- International Journal of High Performance Computing Applications
, 2003
Computational Grids process large, computationally intensive problems on small data sets. In contrast, Data Grids process large computational problems that in turn require evaluating, mining and producing large amounts of data. Replication, creating geographically disparate identical copies of data, is regarded as one of the major optimisation techniques for reducing data access costs. In this paper,
A GA-Based Replica Placement Mechanism for Data Grid
Abstract—A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, replication services are needed. Data replication is one of the methods used to improve the performance of data access in distributed systems, by placing multiple copies of data files at distributed sites. A replica placement mechanism is the process of identifying where to place copies of replicated data files in a Grid system. Choosing the best location is not an easy task: current works find the best location based on the number of requests and the read cost of a certain file, which consumes large bandwidth and increases computation time. The authors propose a GA-Based Replica Placement Mechanism (DBRPM) that finds the best locations to store replicas based on five criteria, namely, 1) Read Cost, 2)
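A genetic algorithm for replica placement, in the spirit of the abstract above, can be sketched as follows. The cost model (distance-weighted reads plus a storage penalty) and all GA parameters are assumptions for illustration, not the paper's DBRPM:

```python
import random

def placement_cost(placement, requests, dist, storage_cost):
    """Cost of a 0/1 placement vector: each site reads from its nearest
    replica, and each stored replica pays a fixed storage penalty."""
    if not any(placement):
        return float("inf")  # no replica anywhere is an invalid placement
    read = sum(r * min(dist[i][j] for j, has in enumerate(placement) if has)
               for i, r in enumerate(requests))
    return read + storage_cost * sum(placement)

def ga_place(requests, dist, storage_cost, pop=20, gens=40, seed=1):
    """Evolve a population of placement vectors toward low cost."""
    rng = random.Random(seed)
    n = len(requests)
    cost = lambda p: placement_cost(p, requests, dist, storage_cost)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        survivors = population[:pop // 2]      # elitist selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(n)               # point mutation
            child[i] ^= 1
            children.append(tuple(child))
        population = survivors + children
    return min(population, key=cost)
```

Because the fitness function is pluggable, the two criteria shown (read cost, storage cost) could be extended to the five criteria the abstract enumerates.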