10 citations found.
S. Seshadri, J.F. Naughton, Sampling issues in parallel database systems, in: Proceedings of the EDBT '92, 1992, pp. 328--343.

This paper is cited in the following contexts:
Random Sampling from Databases - Olken (1993)   (37 citations)

....the sum of sizes of the reachability sets of the sample elements times the inverse of the sampling proportion. 2. 12.5 Parallel Sampling. Seshadri and Naughton [NS90, Ses92] discuss the use of stratified and clustered sampling for parallel sampling on a multiprocessor to estimate selectivities [SN91]. They show that simple random sampling is asymptotically inefficient in a parallel environment (as the number of processors grows), as it leads to heavily skewed workloads: everyone waits for the processor with the largest number of samples. They cite a theorem of Gonnet that the limit of ....

S. Seshadri and Jeffrey Naughton. Sampling Issues in Parallel Database Systems. In Proceedings of the Conference on Extending Database Technology, pages 328--343. Springer-Verlag, March 1991.


Random Sampling from Databases - A Survey - Olken, Rotem (1994)   (10 citations)

....of the sample elements times the inverse of the sampling proportion. 9. 5 Parallel Sampling. Seshadri and Naughton (in Naughton and Seshadri (1990) and Seshadri (1992)) discuss the use of stratified and clustered sampling for parallel sampling on a multiprocessor to estimate selectivities (see Seshadri and Naughton (1991)). They show that simple random sampling is asymptotically inefficient in a parallel environment (as the number of processors grows), as it leads to heavily skewed workloads: everyone waits for the processor with the largest number of samples. They cite a theorem of Gonnet that the limit of ....

Seshadri, S. and Naughton, J. (1991). Sampling issues in parallel database systems, Proceedings of the Conference on Extending Database Technology, Springer-Verlag, pp. 328--343.


Efficiently Following Object References for Large Object.. - Kenneth Ross (1995)

....obtain their R_2 object identifiers. One would partition the sample, and expect that, for a sufficiently large sample, the partitioning elements of the sample are close to the partitioning elements of the full extension of R_1. An analysis of this kind of sampling approach is provided in [DWNS91, SN91]. We need to balance the I/O cost of the initial sampling step with the probability that skew will make one of the partitions in our algorithm too big to fit in main memory in Step 3. Whether or not we perform sampling, we assume a skew factor σ determines the ratio of the size of the largest ....

S. Seshadri and J. F. Naughton. Sampling issues in parallel database systems. (manuscript), 1991.


Overlapping Computations, Communications and I/O in Parallel.. - Clement, Quinn (1994)

....algorithm takes a sample of the data items on the local disk of the processor. Instead of a simple random sample of size ps on a p-processor system, a sample consisting of p simple random samples of size s is taken, one from each processor. This technique is known as stratified random sampling [16]. Let α, for skew, denote the ratio of the maximum number of records sorted by a processor over the average number of records sorted by a processor. Then with N records to be sorted on p processors, the maximum number of records to be sorted by one processor is αN/p. Blelloch et al. [3] proved ....

....processors, 96% of the file will have been read in the sampling phase. This strategy is clearly not well suited to machines with large numbers of processors. Page-level stratified sampling (PLSS) has been suggested by Seshadri as a more efficient method of determining pivot values for a data set [16]. By using all of the records in the disk sector or page, equivalent confidence levels in skew values can be obtained using fewer disk I/O operations. Seshadri proves that using all of the keys in a sector as sample values will always result in lower skew values than tuple-level stratified ....

[Article contains additional citation context not shown here]

Seshadri, S. and Naughton, J. F. Sampling issues in parallel database systems. In Proceedings of the 3rd International Conference on Extending Database Technology, pages 328--343, March 1992.
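
The stratified sampling scheme and the skew ratio described in the Clement, Quinn context above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the cited authors' algorithm: the function names are hypothetical, each processor's data is modeled as an in-memory list, and the rule for picking pivots (evenly spaced order statistics of the combined sample) is an assumption.

```python
import random
from bisect import bisect_right

def stratified_sample(partitions, s, rng):
    """Draw a simple random sample of size s from each processor's local data."""
    # Assumes every partition holds at least s keys.
    return [key for part in partitions for key in rng.sample(part, s)]

def choose_pivots(sample, p):
    """Pick p - 1 evenly spaced keys of the sorted sample (assumed pivot rule)."""
    ordered = sorted(sample)
    step = len(ordered) / p
    return [ordered[int(i * step)] for i in range(1, p)]

def skew(partitions, pivots):
    """Ratio of the largest range bucket to the average bucket size."""
    p = len(pivots) + 1
    counts = [0] * p
    total = 0
    for part in partitions:
        for key in part:
            # bisect_right maps a key to the index of its range bucket.
            counts[bisect_right(pivots, key)] += 1
            total += 1
    return max(counts) / (total / p)
```

With p processors, `stratified_sample` returns p * s keys in total, and `skew` returns a value between 1 (perfect balance) and p (all records in one bucket).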


Bulletin of the Technical Committee on Data Engineering.. - Society, IEEE (1997)   (1 citation)

....by partitioning the data set into mutually disjoint strata and then obtaining a SRS from each stratum. The selected records then form a stratified sample. In a shared-nothing parallel DBMS, for example, a stratum might correspond to the records stored at a specified processing node; see [SN92] for a discussion of why, in parallel DBMSs, stratified sampling usually is preferable to simple random sampling. Other types of samples abound [Coc77, DC72, SSW92, Sud76]; we focus on simple, cluster, and stratified samples since these are the most common types of reduced data sets found in ....

S. Seshadri and J. F. Naughton. Sampling issues in parallel database systems. In Advances in Database Technology - EDBT '92, 3rd Intl. Conf. Extending Database Technology, Lecture Notes in Computer Science, pages 328--343. Springer-Verlag, 1992.


Parallel Sorting on a Shared-Nothing Architecture using .. - DeWitt, Naughton.. (1992)   (50 citations)  Self-citation (Naughton)

....at any site to the average number of records at a site. If we have k processors, and N records in the file to be sorted, then the average number of records per processor is N/k. Then by definition of s, the maximum number of records at any site is sN/k. By Theorem 7.1 from Seshadri and Naughton [SN91], if we take a total of kn samples, then the probability p that any processor contains more than sN/k keys is at most $p = k e^{-(1 - 1/s)^2 s n / 2}$. Solving this for n gives $n = \frac{2 \ln(k/p)}{(1 - 1/s)^2 s}$ (1) as the number of samples required to guarantee a skew of at most s with ....

S. Seshadri and Jeffrey F. Naughton. Sampling issues in parallel database systems. Submitted for publication, June 1991.
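
The bound quoted in the DeWitt, Naughton context above can be evaluated numerically. The sketch below assumes the bound reads p = k * exp(-(1 - 1/s)^2 * s * n / 2), with k processors, n samples per processor, and target skew s > 1. The function names are hypothetical and do not come from the cited papers.

```python
import math

def skew_probability_bound(k: int, n: int, s: float) -> float:
    """Upper bound on the probability that some processor exceeds skew s."""
    return k * math.exp(-((1 - 1 / s) ** 2) * s * n / 2)

def samples_per_processor(k: int, p: float, s: float) -> int:
    """Smallest integer n for which the bound above is at most p."""
    if k < 1 or not (0 < p < 1) or s <= 1:
        raise ValueError("need k >= 1, 0 < p < 1, and s > 1")
    # Solve p = k * exp(-(1 - 1/s)^2 * s * n / 2) for n, then round up.
    n = 2 * math.log(k / p) / ((1 - 1 / s) ** 2 * s)
    return math.ceil(n)
```

Because the required n grows as 1 / (1 - 1/s)^2, tightening the skew target toward 1 raises the sample count sharply, while adding processors raises it only logarithmically through ln(k/p).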


Practical Skew Handling in Parallel Joins - DeWitt, Naughton, Schneider.. (1992)   (49 citations)  Self-citation (Seshadri Naughton)

....DBMS algorithms for evaluating nonequijoins [DNS91a] and for parallel external sorting [DNS91b]. A theoretical investigation of the performance of sampling-based range splitting appears in [SN92]. [Table 1: Example relations R(K1,A) and S(K2,B).] In a two-relation join, say R ⋈ S, the question arises whether an algorithm should attempt to balance the number of R tuples per node, or the number of S tuples per node, or the sum of the R and S tuples per node. The answer is not always clear, but a useful general observation is that an ....

....with each stratum consisting of the set of tuples initially residing at a processor. Within each processor, the sampling was performed using page-level extent map sampling. Extent map sampling is described in Section 3. Issues involving stratified sampling and page-level sampling are discussed in [SN92]. We now describe the skew handling algorithms. 1. Hybrid hash. This is just the basic parallel hybrid hash algorithm (with no modifications for skew handling). A description of this algorithm and some alternatives appears in [SD89]. 2. Simple range partitioning. At the top level, this ....

S. Seshadri and Jeffrey F. Naughton. Sampling issues in parallel database systems. In Proceedings of the EDBT Conference, Vienna, Austria, March 1992.


Parallel Bulk-Loading of Spatial Data - Apostolos Papadopoulos Yannis

No context found.

S. Seshadri, J.F. Naughton, Sampling issues in parallel database systems, in: Proceedings of the EDBT '92, 1992, pp. 328--343.

