The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
- JOURNAL OF NETWORK AND COMPUTER APPLICATIONS
, 1999
Cited by 349 (39 self)
In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the Data Grid. We describe two basic services that we believe are fundamental to the design of a data grid, namely, storage systems and metadata management. Next, we explain how these services can be used to develop higher-level services for replica management and replica selection. We conclude by describing our initial implementation of data grid functionality.
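The abstract names replica selection as a higher-level service but does not give a policy here. The sketch below is a minimal illustration that assumes a hypothetical catalog recording a fixed startup latency and a bandwidth estimate for each replica. It picks the copy with the lowest estimated transfer time. The `Replica` fields and the cost model are illustrative assumptions, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str               # hypothetical site name
    latency_s: float        # assumed fixed startup cost per request (seconds)
    bandwidth_mb_s: float   # assumed sustained transfer rate (MB/s)


def estimated_time(replica: Replica, size_mb: float) -> float:
    """Estimated transfer time: startup cost plus size divided by bandwidth."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def select_replica(replicas: list, size_mb: float) -> Replica:
    """Return the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas registered for this file")
    return min(replicas, key=lambda r: estimated_time(r, size_mb))


if __name__ == "__main__":
    catalog = [
        Replica("site-a", latency_s=0.5, bandwidth_mb_s=10.0),
        Replica("site-b", latency_s=30.0, bandwidth_mb_s=100.0),
    ]
    print(select_replica(catalog, 100.0).site)     # site-a
    print(select_replica(catalog, 10_000.0).site)  # site-b
```

The crossover shows why a fixed preference is not enough. Under this model, the low-latency site wins for small files and the high-bandwidth site wins for large ones.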
OLAP and Statistical Databases: Similarities and Differences
, 1997
Cited by 59 (1 self)
During the 1980's there was a lot of activity in the area of Statistical Databases, focusing mostly on socio-economic type applications, such as census data, national production and consumption patterns, etc. In the 1990's the area of On-Line Analytical Processing (OLAP) was introduced for the analysis of transaction-based business data, such as retail store transactions. Both areas deal with the representation and support of data in a multi-dimensional space. Much of the OLAP literature does not refer to the Statistical Database literature, perhaps because the connection between analyzing business data and socio-economic data is not obvious. Furthermore, there are papers published in one area or the other whose results can be applied in both application areas. In this paper, we compare the work done in these two areas. We discuss concepts used in the conceptual modeling of the data and operations over them, efficient physical organization and access methods, as well as privacy issues. ...
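One operation both areas share is summarizing a measure over a subset of dimensions. OLAP calls this a "roll-up"; statistical databases call the result a summary table. The sketch below is a minimal in-memory illustration that assumes a hypothetical table of (region, year, product, sales) records. It sums the measure over every dimension the caller does not keep. It is not taken from the paper.

```python
from collections import defaultdict

# Hypothetical fact records: (region, year, product, sales)
DIMENSIONS = ("region", "year", "product")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for *coords, measure in facts:
        key = tuple(coords[i] for i in idx)
        totals[key] += measure
    return dict(totals)


if __name__ == "__main__":
    facts = [
        ("north", 1996, "tea", 10.0),
        ("north", 1997, "tea", 12.0),
        ("south", 1996, "coffee", 7.0),
        ("south", 1996, "tea", 3.0),
    ]
    print(roll_up(facts, ["region"]))  # {('north',): 22.0, ('south',): 10.0}
    print(roll_up(facts, ["year"]))    # {(1996,): 20.0, (1997,): 12.0}
```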
Storage of multidimensional arrays based on arbitrary tiling
- In Proc. of the 15th International Conference on Data Engineering, ICDE99
, 1999
Cited by 20 (6 self)
Storage management of multidimensional arrays aims at supporting the array model needed by applications and ensuring fast execution of access operations. Current approaches to store multidimensional arrays rely on partitioning data into chunks (equally sized subarrays). Regular partitioning, however, does not adapt to access patterns, leading to suboptimal access performance. In this paper, we propose a storage approach for multidimensional discrete data (MDD) based on multidimensional arbitrary tiling. Tiling is arbitrary in that any partitioning into disjoint multidimensional intervals as well as incomplete coverage of n-D space and gradual growth of MDDs are supported. The proposed approach allows the storage structure to be configured according to user access patterns through tunable tiling strategies. We describe four strategies and respective tiling algorithms and present performance measurements which show their effectiveness in reducing disk access and post-processing times for range queries.
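The following simplified model, assumed here for illustration, shows why regular partitioning can fail to adapt to access patterns. Tiles are aligned at the origin, every tile a range query intersects is read in full, and edge tiles are treated as full size. The sketch counts tiles and cells read for regular, equal-shape tiles. It does not implement the paper's arbitrary-tiling strategies.

```python
import math


def tiles_touched(query_lo, query_hi, tile_shape):
    """Count origin-aligned regular tiles intersecting an n-D range query.

    query_lo, query_hi: inclusive cell coordinates, one entry per dimension.
    tile_shape: tile extent along each dimension.
    """
    count = 1
    for lo, hi, extent in zip(query_lo, query_hi, tile_shape):
        # Tile indices along this axis run from lo // extent to hi // extent.
        count *= hi // extent - lo // extent + 1
    return count


def cells_read(query_lo, query_hi, tile_shape):
    """Cells fetched if every intersected tile is read in full."""
    return tiles_touched(query_lo, query_hi, tile_shape) * math.prod(tile_shape)


if __name__ == "__main__":
    # One full row of a 1000 x 1000 array: 1,000 useful cells.
    lo, hi = (0, 0), (0, 999)
    print(cells_read(lo, hi, (100, 100)))  # 100000 (10 square tiles)
    print(cells_read(lo, hi, (10, 1000)))  # 10000 (1 row-shaped tile)
```

Both tile shapes hold 10,000 cells, yet for a row-wise access pattern the row-shaped tiles read ten times less data. Matching the storage structure to access patterns is the motivation the abstract gives for tunable tiling.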
The RasDaMan Approach to Multidimensional Database Management
, 1997
Cited by 8 (3 self)
Multidimensional discrete data (MDD), i.e., arrays of arbitrary size, dimension, and base type, are receiving growing attention among the database community. MDD occur in a variety of application fields, e.g., technical/scientific areas such as medical imaging, geographic information systems, climate research, scientific simulations, and business-oriented applications like OLAP and data mining. In all these application fields the data managed can be modeled as MDD. RasDaMan (Raster Data Management in Databases) is a basic research project sponsored by the European Community where industrial and research partners collaborate to develop comprehensive MDD database technology. In the approach adopted, the logical and physical levels are strictly separated. A data definition language for multidimensional arrays together with a declarative, optimizable query language allow for powerful associative retrieval. A streamlined storage manager for huge arrays enables fast, efficient access to MD...
Hierarchical Storage Support and Management for Large-Scale Multidimensional Array Database Management Systems
- 13th International Conference on Database and Expert Systems Applications (DEXA), Aix-en-Provence
, 2002
Cited by 6 (3 self)
Large-scale scientific experiments or simulation programs often generate large amounts of multidimensional data. Data volume may reach hundreds of terabytes (up to petabytes). At present and in the near future, the only practicable way to store such large volumes of multidimensional data is to use tertiary storage systems. But commercial (multidimensional) database systems are optimized for primary and secondary memory access, so they support storing and retrieving multidimensional array data on tertiary storage only insufficiently. This paper aims to combine the advantages of both techniques: storing large amounts of data on tertiary storage media and optimizing data access for retrieval with multidimensional database management systems. We introduce concepts for efficient hierarchical storage support and management for large-scale multidimensional array database management systems and their integration into the commercial array database management system RasDaMan.
Tertiary Storage: Current Status and Future Trends
- Computer Science Department, University of California, Santa Barbara
, 1996
Cited by 5 (2 self)
This report summarizes the current state of the art in tertiary storage systems. We begin with a comprehensive discussion of magnetic tape and optical storage technologies. This is followed by a classification of commercial products based on their performance characteristics. Our analysis of product data indicates that, in contrast to disk technology, tertiary storage products vary significantly in data transfer rates as well as other performance figures. We then summarize efforts in the areas of operating systems, databases and advanced applications to integrate tertiary storage. We point out that different assumptions about the underlying technology result in entirely different algorithms and system designs. We conclude the report with a speculation on future trends.
1 Introduction
With the recent improvements in network and processor speeds, several data-intensive applications have become much more feasible than ever before. Examples of such applications include digit...
Tertiary Storage Organization for Large Multidimensional Datasets
- IN 8TH NASA GODDARD CONFERENCE ON MASS STORAGE SYSTEMS AND TECHNOLOGIES
, 2000
Determining the Optimal File Size on Tertiary Storage Systems Based on the Distribution of Query Sizes
- PROC. 10TH INT. CONF. ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT
, 1998
Cited by 4 (0 self)
In tertiary storage systems, the data is stored on multiple tape volumes where each tape is further divided into files. Since in many such systems the minimum unit of data transfer is a file, it is an important problem to match file sizes with the access patterns to the data. In general, if the file size is large relative to the query size, a large amount of irrelevant data is transferred, whereas small file sizes incur an overhead penalty associated with reading each new file. In this work, we analyze the relationship between file sizes and query response times and provide a methodology to compute the optimal file size given information about the distribution of query sizes. Exact closed-form solutions for the cost function are given for two common distributions.
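A deliberately simplified cost model makes this trade-off concrete. The model is an assumption for illustration, not the paper's:

- each file read costs a fixed overhead o (locate and seek);
- data moves at bandwidth b;
- a query reads a contiguous range of size q whose start is uniformly random relative to file boundaries;
- every touched file is transferred whole.

A query then touches 1 + q/F files on average and moves F + q units of data. For a mean query size mu, the expected time is o*(1 + mu/F) + (F + mu)/b, which is minimized at F* = sqrt(o*mu*b). In this model only the mean query size matters. The paper's closed forms for two specific distributions may rest on different assumptions.

```python
import math


def expected_time(file_size, mean_query, overhead_s, bandwidth):
    """Expected response time under the simplified model described above.

    A query of size q touches 1 + q/F files on average and moves F + q
    units of data, so only the mean query size enters the expectation.
    """
    files_touched = 1.0 + mean_query / file_size
    data_moved = file_size + mean_query
    return overhead_s * files_touched + data_moved / bandwidth


def optimal_file_size(mean_query, overhead_s, bandwidth):
    """Minimizer of expected_time: setting d/dF to zero gives sqrt(o * mu * b)."""
    return math.sqrt(overhead_s * mean_query * bandwidth)


if __name__ == "__main__":
    # Illustrative numbers only: 100 MB mean query, 20 s per-file overhead, 10 MB/s.
    f_star = optimal_file_size(100.0, 20.0, 10.0)
    print(round(f_star, 1))  # 141.4 (MB)
    # Sanity check: a coarse grid search lands next to the closed form.
    best = min(range(1, 2001), key=lambda f: expected_time(f, 100.0, 20.0, 10.0))
    print(best)  # 141
```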
Optimizing Tertiary Storage Organization and Access for Spatio-Temporal Datasets
- IN FOURTH NASA GODDARD CONFERENCE ON MASS STORAGE SYSTEMS AND TECHNOLOGIES, COLLEGE PARK
, 1995
Cited by 3 (0 self)
We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into "clusters" based on analysis of data access patterns and storage device characteristics. The goal ...
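The abstract does not spell out the partitioning algorithm. The sketch below illustrates the underlying idea with a hypothetical query log that records which variables each query requests. Variables that are always requested together go into the same cluster, so a query never reads a cluster holding variables it does not need. The grouping rule (identical query signatures) is a common heuristic chosen for illustration, not the paper's technique, and it ignores storage device characteristics.

```python
from collections import defaultdict


def cluster_by_access(queries):
    """Group variables that are always requested together.

    queries: list of sets of variable names (a hypothetical access log).
    Variables requested by exactly the same queries share a cluster.
    """
    signature = defaultdict(set)
    for qid, variables in enumerate(queries):
        for v in variables:
            signature[v].add(qid)
    clusters = defaultdict(list)
    for v, qids in signature.items():
        clusters[frozenset(qids)].append(v)
    # Sort for a deterministic result.
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    log = [{"temp", "pressure"}, {"temp", "pressure", "wind"}, {"humidity"}]
    print(cluster_by_access(log))  # [['humidity'], ['pressure', 'temp'], ['wind']]
```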
Scheduling Queries on Tape-Resident Data
- In Proceedings of the European Conference on Parallel Computing
, 2000
Cited by 3 (0 self)
Tertiary storage systems are used when secondary storage cannot satisfy the data storage requirements and/or when tertiary storage is the more cost-effective option. New application domains require on-demand retrieval of data from these devices. This paper investigates issues in optimizing I/O time for a query whose data resides on automated tertiary storage containing multiple storage devices.
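The abstract does not state the scheduling algorithm. The generic sketch below, which is not the paper's method, shows why request order matters on tape. It uses an idealized model:

- a single tape treated as a linear address space, not the multiple devices the paper considers;
- cost measured only as head repositioning distance;
- no mount, rewind, or serpentine-track effects.

Under those assumptions, the sweep order visits the nearer end first and then reverses. This order minimizes total travel, because every request must be reached and the shorter side is the only stretch covered twice.

```python
def travel(head, order):
    """Total head movement to visit tape positions in the given order."""
    dist = 0
    for pos in order:
        dist += abs(pos - head)
        head = pos
    return dist


def sweep_order(head, requests):
    """Serve the side of the head with the nearer extreme first, then reverse."""
    below = sorted((p for p in requests if p < head), reverse=True)
    above = sorted(p for p in requests if p >= head)
    if not below:
        return above
    if not above:
        return below
    # The first stretch is covered twice, so pick the shorter one.
    if head - below[-1] <= above[-1] - head:
        return below + above
    return above + below


if __name__ == "__main__":
    requests = [90, 10, 60, 30]  # hypothetical block positions, arrival order
    print(travel(50, requests))                    # 200 (first-come, first-served)
    print(travel(50, sweep_order(50, requests)))   # 120
```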

