From discrepancy to declustering: Near optimal multidimensional declustering strategies for range queries (2002)

by C Chen, C T Cheng
Venue: In Proceedings of ACM PODS

Results 1 - 10 of 22

Automating Layout of Relational Databases

by Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, Vivek Narasayya - In Proceedings of 19th International Conference on Data Engineering, 2003
Cited by 16 (1 self)
The choice of database layout, i.e., how database objects such as tables and indexes are assigned to disk drives can significantly impact the I/O performance of the system. Today, DBAs typically rely on fully striping objects across all available disk drives as the basic mechanism for optimizing I/O performance. While full striping maximizes I/O parallelism, when query execution involves co-access of two or more large objects, e.g., a merge join of two tables, the above strategy may be suboptimal due to the increased number of random I/O accesses on each disk drive. In this paper, we propose a framework for automating the choice of database layout for a given database that also takes into account the effects of co-accessed objects in the workload faced by the system. We formulate the above as an optimization problem and present an efficient solution to the problem that judiciously takes into account the trade-off between I/O parallelism and random I/O accesses. Our experiments on Microsoft SQL Server show the superior I/O performance of our techniques compared to the traditional approach of fully striping each database object across all disk drives.

Citation Context

...riate for OLTP workloads. They propose a scheme in which only the parity data is striped and the database objects themselves are not necessarily striped across all disks. Another area of related work [5,13] is studying how to decluster a single table across a set of disks for the case of grid-queries. The goal of these papers is to maximize I/O parallelism for the above restricted class of queries. Unli...

Efficient parallel processing of range queries through replicated declustering

by Hakan Ferhatosmanoglu, Ali Saman Tosun, Guadalupe Canahuate, Aravind Ramachandran - Journal of Distributed and Parallel Databases
Cited by 4 (3 self)
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets.

Analysis and comparison of replicated declustering schemes

by Ali Saman Tosun - IEEE Transactions on Parallel and Distributed Systems, 2007
Cited by 4 (2 self)
Declustering distributes data among parallel disks to reduce the retrieval cost using I/O parallelism. Many schemes were proposed for the single-copy declustering of spatial data. Recently, declustering using replication gained a lot of interest and several schemes with different properties were proposed. An in-depth comparison of major schemes is necessary to understand replicated declustering better. In this paper, we analyze the proposed schemes, tune some of the parameters, and compare them for different query types and under different loads. We propose a three-step retrieval algorithm for the compared schemes. For arbitrary queries, the dependent and partitioned allocation schemes perform poorly; others perform close to each other. For range queries, they perform similarly with the exception of smaller queries in which random duplicate allocation (RDA) performs poorly and dependent allocation performs well. For connected queries, partitioned allocation performs poorly and dependent allocation performs well under a light load.
Index terms: declustering, parallel I/O, spatial range query, Latin square.

Citation Context

...-Optimal Declustering [5], General Multidimensional Data Allocation [27], cyclic allocation schemes [36], [37], Golden Ratio Sequences [7], Hierarchical Declustering [6], and Discrepancy Declustering [9]. Using declustering and replication, approaches including Complete Coloring [20] have optimal performance and Square Root Colors Disk Modulo [20] has one more than optimal. Some declustering techniqu...

Multidimensional declustering schemes using golden ratio and kronecker sequences

by Chung-min Chen, Randeep Bhatia, Rakesh K. Sinha - In IEEE Trans. on Knowledge and Data Engineering, 2003
Cited by 2 (0 self)
Abstract not found

Citation Context

...e for 8 queries of all possible shapes. Our simulation (presented in Sections 6 and 7) shows that GRS outperforms GeMDA both in terms of worst case as well as average case performance. Very recently, [9] described a scheme with guaranteed worst case performance for any dimensions. The scheme, however, is defined only when M is a prime number and takes exponential time to construct. Its performance in ...

Efficient retrieval of replicated data

by Ali Saman Tosun, 2006
Cited by 2 (2 self)
Declustering is a common technique used to reduce query response times. Data is declustered over multiple disks and query retrieval can be parallelized. Most of the research on declustering is targeted at spatial range queries and investigates schemes with low additive error. Recently, declustering using replication has been proposed to reduce the additive overhead. Replication significantly reduces retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval mechanism for arbitrary queries based on design theory. Using the proposed c-copy replicated declustering scheme, (c − 1)k² + ck buckets can be retrieved using at most k disk accesses. The retrieval algorithm is very efficient and is asymptotically optimal with Θ(|Q|) complexity for a query Q. In addition to the deterministic worst-case bound and efficient retrieval, the proposed algorithm handles nonuniform data and high dimensions, supports incremental declustering, and has good fault tolerance. Experimental results show the feasibility of the algorithm.

Replicated Parallel I/O without Additional Scheduling Costs

by Mikhail J. Atallah, Keith Frikken, 2003
Cited by 2 (0 self)
A common technique for improving performance in a database is to decluster the database among multiple disks so that data retrieval can be parallelized. In this paper we focus on answering range queries in a multidimensional database (such as a GIS), where each of its dimensions is divided uniformly to obtain tiles which are placed on different disks; there has been a significant amount of research for this problem (a subset of which is [1,2,3,4,5,6,7,8,9,11,12,13,14,15]). A declustering scheme would be optimal if any range query could be answered by doing no more than ⌈(# of tiles inside the range) / (# of disks)⌉ retrievals from any one disk. However, it was shown in [1] that this is not achievable in many cases even for two dimensions, and therefore much of the research in this area has focused on developing schemes that perform close to optimal. Recently, the idea of using replication (i.e., placing records on more than one disk) to increase performance has been introduced [7,12,13,15]. If replication is used, a retrieval schedule (i.e., which disk to retrieve each tile from) must be computed whenever a query is being processed. In this paper we introduce a class of replicated schemes where the retrieval schedule can be computed in time O(# of tiles inside the query’s range), which is asymptotically equivalent to query retrieval for the non-replicated case. Furthermore, this class of schemes has a strong performance advantage over non-replicated schemes, and several schemes are introduced that are either optimal or are optimal plus a constant additive factor. Also presented in this paper is a strictly optimal scheme for any number of colors that requires the lowest known level of replication of any such scheme.
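
The optimality bound stated in the abstract above can be made concrete with a small sketch. It assumes a two-dimensional tile grid and uses the classic disk modulo allocation, (i + j) mod M, purely as an illustration; it is not one of the replicated schemes the paper proposes, and all function names here are hypothetical.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Illustrative single-copy allocation: tile (i, j) goes to disk (i + j) mod M."""
    return (i + j) % num_disks


def retrieval_cost(rows, cols, num_disks, allocate=disk_modulo):
    """Number of tiles the busiest disk must serve for the range query rows x cols."""
    load = Counter(allocate(i, j, num_disks) for i in rows for j in cols)
    return max(load.values(), default=0)


def optimal_cost(num_tiles, num_disks):
    """Lower bound from the abstract: ceil(tiles in range / number of disks)."""
    return ceil(num_tiles / num_disks)


def additive_error(rows, cols, num_disks, allocate=disk_modulo):
    """How many retrievals above the lower bound the allocation needs for this query."""
    num_tiles = len(rows) * len(cols)
    return retrieval_cost(rows, cols, num_disks, allocate) - optimal_cost(num_tiles, num_disks)


if __name__ == "__main__":
    # A 2 x 2 query on 4 disks: two tiles share a disk, so the bound of 1 is missed by 1.
    print(additive_error(range(0, 2), range(0, 2), 4))
    # A 1 x 4 query on 4 disks: every tile lands on a different disk.
    print(additive_error(range(0, 1), range(0, 4), 4))
```

Under these assumptions, a 2 x 2 query on 4 disks needs 2 retrievals from the busiest disk against a bound of 1, while a 1 x 4 query meets the bound exactly. This is the kind of gap that motivates the near-optimal and replicated schemes listed on this page.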

Cropping-Resilient Segmented Multiple Watermarking (Extended Abstract)

by Keith Frikken, Mikhail J. Atallah, 2003
Cited by 2 (0 self)
Watermarking is a frequently used tool for digital rights management. An example of this is using watermarks to place ownership information into an object. There are many instances where placing multiple watermarks into the same object is desired. One mechanism that has been proposed for doing this is segmenting the data into a grid and placing watermarks into different regions of the grid. This is particularly suited for images and geographic information systems (GIS) databases as they already consist of a fine granularity grid (of pixels, geographic regions, etc.); a grid cell for watermarking is an aggregation of the original fine granularity cells. An attacker may be interested in only a subset of the watermarked data, and it is crucial that the watermarks survive in the subset selected by the attacker. In the kind of data mentioned above (images, GIS, etc.) such an attack typically consists of cropping, e.g. selecting a geographic region between two latitudes and longitudes (in the GIS case) or a rectangular region of pixels (in an image). The contribution of this paper is a set of schemes and their analysis for multiple watermark placement that maximizes resilience to the above mentioned cropping attack. This involves the definition of various performance metrics and their use in evaluating and comparing various placement schemes.

Multi-site retrieval of declustered data

by A S Tosun - In Proceedings of 28th International Conference on Distributed Computing Systems (ICDCS)
Cited by 2 (1 self)
Abstract not found

A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries

by Randeep Bhatia, Rakesh K. Sinha, Chung-min Chen
Cited by 1 (0 self)
Abstract not found

Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays

by Nihat Altiparmak, Ali Şaman Tosun
Declustering techniques reduce query response times through parallel I/O by distributing data among parallel disks. Recently, replication-based approaches were proposed to further reduce the response time. Efficient retrieval of replicated data from multiple disks is a challenging problem. Existing retrieval techniques are designed for storage arrays with identical disks, having no initial load or network delay. In this article, we consider the generalized retrieval problem of replicated data where the disks in the system might be heterogeneous, the disks may have initial load, and the storage arrays might be located on different sites. We first formulate the generalized retrieval problem using a Linear Programming (LP) model and solve it with mixed integer programming techniques. Next, the generalized retrieval problem is formulated as a more efficient maximum flow problem. We prove that the retrieval schedule returned by the maximum flow technique yields the optimal response time and this result matches the LP solution. We also propose a low-complexity online algorithm for the generalized retrieval problem by not guaranteeing the optimality of the result. Performance of proposed and state of the art retrieval strategies are investigated using various ...

Citation Context

...ata Allocation [Hua and Young 1997], cyclic allocation schemes [Prabhakar et al. 1998a, 1998b], Golden Ratio Sequences [Chen et al. 2000], Hierarchical [Bhatia et al. 2000], Discrepancy declustering [Chen and Cheng 2002], and Threshold-Based Declustering [Tosun 2005a, 2005c, 2007b]. Some declustering techniques utilize information about query distribution [Ghandeharizadeh and DeWitt 1990a, 1990b]. Use of combinatori...
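
The abstract of this entry mentions casting replicated retrieval as a maximum flow problem. A minimal sketch of that idea follows, restricted to the basic case of identical disks with no initial load or network delay; it is not the generalized algorithm of the article. The network layout (source to bucket with capacity 1, bucket to each replica disk with capacity 1, disk to sink with capacity k) and every name below are assumptions made for illustration.

```python
from collections import deque
from math import ceil


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts residual graph (mutates capacity)."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:  # smallest residual capacity on the path
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push flow and update reverse edges
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] = capacity[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck


def build_network(replicas, num_disks, per_disk_limit):
    """replicas maps each requested bucket to the list of disks holding a copy."""
    cap = {"s": {}, "t": {}}
    for d in range(num_disks):
        cap[("disk", d)] = {"t": per_disk_limit}
    for bucket, disks in replicas.items():
        cap["s"][("bucket", bucket)] = 1
        cap[("bucket", bucket)] = {("disk", d): 1 for d in disks}
    return cap


def min_disk_accesses(replicas, num_disks):
    """Smallest k such that every bucket can be fetched with at most k reads per disk.

    Assumes every bucket has at least one replica on a disk in range(num_disks).
    """
    n = len(replicas)
    for k in range(ceil(n / num_disks), n + 1):
        if max_flow(build_network(replicas, num_disks, k), "s", "t") == n:
            return k
    return None


if __name__ == "__main__":
    # Buckets 2 and 3 exist only on disk 0, so buckets 0 and 1 should be read from disk 1.
    print(min_disk_accesses({0: [0, 1], 1: [0, 1], 2: [0], 3: [0]}, 2))
```

The linear search over k keeps the sketch short; a binary search over k would be a natural refinement. Handling heterogeneous disks, initial load, or multiple sites would require different capacities and is beyond this sketch.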
