40 citations found.
P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In Proc. of SIGMOD Conf., pages 341--350, 1992.


This paper is cited in the following contexts:


Query Estimation By Adaptive Sampling - Wu, Agrawal, Abbadi (2002)   (2 citations)

....the cdf of the attribute X. Note that, with respect to attribute X, the set of tuples in relation R and the fdf and cdf of X are equivalent; i.e. they carry the same information and one can be derived from the other. The first category of the estimation techniques is tuple sampling (e.g. [HS92, HS95, GM98, HHW97, HH99, LNS90]). The tuple sampling technique summarizes a relation R by taking uniform samples from the tuples in R. As shown in Figure 1(a), the summarized version of relation R is the sample set r. Intuitively, when a query is posed to the estimator, the estimator logically ....

Peter J. Haas and Arun N. Swami. Sequential sampling procedures for query size estimation. In Proceedings of 1992 ACM SIGMOD international conference on Management of data, pages 341--350, 1992.
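The tuple-sampling idea in the excerpt above (summarize R by a uniform sample r, evaluate the query on r, scale the result back up to R) can be sketched as follows. This is a minimal illustration that assumes a single numeric attribute and a range predicate; the helper name `estimate_range_count` is hypothetical and is not taken from any of the cited papers.

```python
import random


def estimate_range_count(relation, lo, hi, sample_size, seed=0):
    """Hypothetical helper: estimate |{t in R : lo <= t <= hi}| from a sample."""
    rng = random.Random(seed)
    # Uniform sample r of the tuples in R (drawn without replacement).
    sample = rng.sample(relation, sample_size)
    # Run the range query against the summary r instead of against R.
    hits = sum(1 for x in sample if lo <= x <= hi)
    # Scale the observed fraction back up to the size of R.
    return hits / sample_size * len(relation)
```

For example, `estimate_range_count(list(range(10_000)), 0, 2_499, 500)` returns a value near 2500 whose exact size depends on which 500 tuples were drawn; larger samples narrow that spread.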


Adaptive Index Structures - Tao, Papadias (2002)

....information about data and query distributions. The use of histograms is crucial for effective query optimization, and has received considerable research attention. Existing approaches can be classified into two categories depending on whether they take into account only the data distribution [HS92, IP95, GM98, APR99, WAA01], or also consider the query patterns [CR94, GLR00, BCG01, WAA02]. Although our framework can be used with any histogram, for the sake of simplicity and generality, we adopt the equi-length method (in fact more sophisticated histograms lead to even better performance). Specifically, the data ....

Haas, P., Swami, A. Sequential Sampling Procedures for Query Size Estimation. ACM SIGMOD, 1992.
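The equi-length (equi-width) histogram adopted in the excerpt above can be illustrated with a short sketch. It assumes a single numeric attribute and the usual uniform-spread assumption inside each bucket; `build_equi_width` and `estimate_range` are hypothetical names, and this is not the histogram code used by Tao and Papadias.

```python
def build_equi_width(values, num_buckets):
    """Hypothetical helper: split [min, max] into equal-length buckets and count."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets if hi > lo else 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that the maximum value falls into the last bucket.
        i = min(int((v - lo) / width), num_buckets - 1)
        counts[i] += 1
    return lo, width, counts


def estimate_range(hist, a, b):
    """Estimate how many values fall in [a, b), assuming each bucket's
    values are spread uniformly over its interval."""
    lo, width, counts = hist
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total
```

Only `num_buckets` counts are stored, so an estimate costs one pass over the buckets regardless of relation size; the price is error whenever values are skewed within a bucket.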


Random Sampling from Databases - Olken (1993)   (37 citations)

....scan sampling algorithms may be more efficient due to reduced seek time of sequential vs. random disk reads. While such efficiencies may be insignificant for hashed files, they are potentially significant (e.g. a factor of 3-4) for B-tree files. In a subsequent paper, Haas and Swami [HS92a, HS92b] developed improved stopping rules for sequential sampling of selectivity estimation. Haas and Swami first observed that Lipton et al. were using a priori bounds for the mean and variance of the population in their stopping rule. Haas and Swami therefore suggested estimating the mean and variance for ....

Peter J. Haas and Arun N. Swami. Sequential Sampling Procedures for Query Size Estimation. In ACM SIGMOD International Conference on the Management of Data, pages 341--350, June 1992.
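The point made in the Olken excerpt, that the stopping rule should estimate the population mean and variance from the sample instead of relying on a priori bounds, can be illustrated with a generic CLT-based sequential sampler. This is a sketch of the idea only, not the procedure published in [HS92]; the function name, the relative-error criterion, and the `min_samples`/`max_samples` safeguards are assumptions.

```python
import math
import random
from statistics import NormalDist


def sequential_count_estimate(relation, predicate, rel_error=0.1, confidence=0.95,
                              min_samples=30, max_samples=100_000, seed=0):
    """Hypothetical sketch: sample tuples (with replacement) until the estimated
    confidence-interval half-width is within rel_error of the estimated mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    n, total, total_sq, mean = 0, 0.0, 0.0, 0.0
    while n < max_samples:
        # Observe one random tuple: 1.0 if it satisfies the predicate, else 0.0.
        x = 1.0 if predicate(rng.choice(relation)) else 0.0
        n += 1
        total += x
        total_sq += x * x
        mean = total / n
        if n >= max(min_samples, 2):
            # Sample variance estimated from the data seen so far,
            # rather than an a priori bound on the population variance.
            var = max(total_sq - n * mean * mean, 0.0) / (n - 1)
            half_width = z * math.sqrt(var / n)
            if mean > 0.0 and half_width <= rel_error * mean:
                break
    # Scale the estimated selectivity to a result-size estimate.
    return mean * len(relation), n
```

The sample size is decided at run time: for a selectivity near 0.5 and `rel_error=0.1`, the loop typically stops after a few hundred draws, while very selective predicates need many more draws before the relative-error target is met (or `max_samples` is hit).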


Random Sampling from Databases - Olken (1993)   (37 citations)

....Sequential scan sampling algorithms may be more efficient due to reduced seek time of sequential vs. random disk reads. While such efficiencies may be insignificant for hashed files, they are potentially significant (e.g. a factor of 3-4) for B-tree files. In a subsequent paper, Haas and Swami [HS92a, HS92b] developed improved stopping rules for sequential sampling of selectivity estimation. Haas and Swami first observed that Lipton et al. were using a priori bounds for the mean and variance of the population in their stopping rule. Haas and Swami therefore suggested estimating the mean and ....

Peter J. Haas and Arun N. Swami. Sequential Sampling Procedures for Query Size Estimation. Technical Report RJ 8558, IBM Almaden, January 1992.


Semantic Cardinality Estimation for Mapping and.. - Nodine, Cherniack..

....taken over some logical underlying domain. Statistical sampling and related techniques are frequently proposed for approximating selectivity and projectivity where the uniform distribution assumption is violated. Such approaches include Hou et al. [HOD91], Lipton et al. [LNS90], Haas and Swami [HS92], and Haas et al. [HNSS95]. Histogram techniques [PC84] are also used to improve selectivity estimates. As an alternative to sampling, Sun et al. propose using a regression model to approximate the underlying distribution of the data [SLRD93]. Initial results combining statistical sampling ....

Peter J. Haas and Arun N. Swami. Sequential sampling procedures for query size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 341--350, 1992.


An Efficient Approach for Approximating Multi-dimensional.. - Domeniconi, Gunopulos (2001)

....kernel density estimators to efficiently address the multi-dimensional range query selectivity problem. We used Scott's rule for setting the bandwidths. We presented an experimental study that shows performance improvements over traditional techniques for density estimation, including sampling (Haas & Swami, 1992), multi-dimensional histograms (Poosala & Ioannidis, 1997) and wavelets (Vitter et al., 1998). The main advantage of kernel density estimators is that the estimator can be computed very efficiently in one dataset pass, during which we both sample the dataset and approximate the standard deviation ....

Haas, P. J., & Swami, A. N. (1992). Sequential Sampling Procedures for Query Size Estimation. Proc. of the ACM SIGMOD Intern. Conf. on Management of Data.
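A one-dimensional version of the kernel-density approach in the excerpt above can be sketched as follows: keep a random sample, set the bandwidth with Scott's rule, and estimate range selectivity by integrating each Gaussian kernel over the query range. Assumptions: a single attribute, a Gaussian kernel, and Scott's factor in the form n^(-1/(d+4)) with d = 1 (the form SciPy's `gaussian_kde` uses). The cited work targets multi-dimensional queries and computes the sample and standard deviation in one pass; this sketch does them as separate steps for clarity, and the helper names are hypothetical.

```python
import math
import random
import statistics


def _normal_cdf(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def build_kde(values, sample_size, seed=0):
    """Hypothetical helper: sample the data and pick a bandwidth by Scott's rule."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    sigma = statistics.stdev(sample)  # needs at least two sampled points
    # Scott's rule for d = 1: h = sigma * n^(-1/5); guard against sigma == 0.
    h = max(sigma * len(sample) ** (-1.0 / 5.0), 1e-9)
    return sample, h


def kde_range_selectivity(kde, a, b):
    """Fraction of the estimated density mass lying in [a, b]."""
    sample, h = kde
    mass = sum(_normal_cdf((b - x) / h) - _normal_cdf((a - x) / h) for x in sample)
    return mass / len(sample)
```

Multiplying the returned selectivity by the relation size gives a result-size estimate. Note that kernels centered near the edges of the domain spill some mass outside it, so a query covering the whole domain returns slightly less than 1.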


Query Optimization - Ioannidis (1996)   (17 citations)

....mathematical distribution or a polynomial. Although requiring very little overhead, these approaches are typically inaccurate because most often real data does not follow any mathematical function. On the other hand, those based on sampling primarily operate at run time [OR86, LNS90, HS92, HS95] and compute their estimates by collecting and possibly processing random samples of the data. Although producing highly accurate estimates, sampling is quite expensive and, therefore, its practicality in query optimization is questionable, especially since optimizers need query result size ....

P. Haas and A. Swami. Sequential sampling procedures for query size estimation. In Proc. of the 1992 ACM-SIGMOD Conference on the Management of Data, pages 341--350, San Diego, CA, June 1992.


Approximating Multi-Dimensional Aggregate Range Queries .. - Gunopulos, Kollios.. (2000)   (13 citations)

.... estimators) and most of the existing techniques for estimating the selectivity of multidimensional range queries for real attributes (wavelet transform [35], multi-dimensional histogram MHIST [27], one-dimensional estimation techniques with the attribute independence assumption, and sampling [13]). We include the attribute independence assumption in our study as a baseline comparison. The experimental results show that we can efficiently build selectivity estimators for multi-dimensional datasets with real attributes. Although the accuracy of all the techniques drops rapidly with the ....

P.J. Haas, A.N. Swami. Sequential Sampling Procedures for Query Size Estimation. Proc. of the 1992 ACM SIGMOD, pp. 341-350, June 1992.


The Golden Estimator: Efficient Range Query Estimation - Wu, Agrawal, Abbadi (2000)

....the resulting size of a query. In this paper, we are particularly interested in estimating the size of selection or range queries that are defined over a single attribute of a relational table. Random samples of tuples from the base relation of a database can be used for selectivity estimation [HS92, HS95, GM98, HHW97, HH99, LNS90]. The AQUA system [GPA+98, GM98, AGPR99] uses random samples of tuples for general-purpose query result estimation. The idea is to create a down-sized copy of the original relation and run the queries against the down-sized copy, which is significantly smaller ....

Peter J. Haas and Arun N. Swami. Sequential sampling procedures for query size estimation. In Proceedings of 1992 ACM SIGMOD international conference on Management of data, pages 341--350, 1992.


AQUA: System and Techniques for Approximate Query.. - Gibbons, Poosala.. (1998)   (7 citations)

.... Other works on incremental maintenance of approximate synopses include [FM83, FM85, WVZT90, HNSS95, AMS96, GMP97b, GP97]. Finally, there has been considerable work on sampling-based estimation algorithms for use within a query optimizer (e.g. [HÖT88, HÖT89, LN89, LN90, LNS90, HÖD91, HS92, LS92, LNSS93, HNSS93, HNS94, LN95, HNSS95, GGMS96]). None of this previous work uses the new techniques described in this paper. 9 Conclusions. This paper describes the Aqua system, for fast, highly accurate approximate query answers. It is well known that join operators seriously degrade ....

P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 1--11, June 1992.


Aqua Project White Paper - Gibbons, Matias, Poosala (1997)   (2 citations)

....operations with smaller overheads while reporting an approximate min in response to findmin and deletemin operations. These data structures have linear space footprints. The design of sampling-based estimation algorithms is a popular area of research [HÖT88, HÖT89, LN89, LN90, LNS90, HÖD91, HS92, LS92, LNSS93, HNSS93, HNS94, LN95, HNSS95, GGMS96]. Results in [LNS90, HÖD91, HS92, HNS94] and elsewhere demonstrate the practicality of estimation procedures based on sampling by showing that the time taken to compute the estimate is a small fraction of the time taken to compute the actual ....

....and deletemin operations. These data structures have linear space footprints. The design of sampling-based estimation algorithms is a popular area of research [HÖT88, HÖT89, LN89, LN90, LNS90, HÖD91, HS92, LS92, LNSS93, HNSS93, HNS94, LN95, HNSS95, GGMS96]. Results in [LNS90, HÖD91, HS92, HNS94] and elsewhere demonstrate the practicality of estimation procedures based on sampling by showing that the time taken to compute the estimate is a small fraction of the time taken to compute the actual query. Studies of the relative merits of various types of histograms in estimating ....

P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 1--11, June 1992.


Approximating Multi-Dimensional Aggregate Range.. - Gunopulos.. (2000)   (13 citations)

....the density estimator for attributes with finite discrete domains. They include computing multi-dimensional histograms [25, 1, 18, 2], using the wavelet transformation [33, 21], SVD [25, 2] or the discrete cosine transform [20] on the data, using kernel estimators [3], and sampling [23, 19, 11]. Density estimator techniques attempt to define a function that approximates the data distribution. Since we must be able to derive the approximate solution to a query quickly, the description of the function must be kept in memory. Further, we may have to answer queries on many datasets, so the ....

.... estimators) and most of the existing techniques for estimating the selectivity of multidimensional range queries for real attributes (wavelet transform [33], multi-dimensional histogram MHIST [25], one-dimensional estimation techniques with the attribute independence assumption, and sampling [11]). We include the attribute independence assumption in our study as a baseline comparison. The experimental results show that we can efficiently build selectivity estimators for multi-dimensional datasets with real attributes. Although the accuracy of all the techniques drops rapidly with the ....

P.J. Haas, A.N. Swami. Sequential Sampling Procedures for Query Size Estimation. In Proc. of the 1992 ACM SIGMOD Intern. Conf. on Management of Data, June 1992.


Optimal Histograms with Quality Guarantees - Jagadish, Poosala, Koudas.. (1998)   (35 citations)

....[GES85]. In the database community, the problem has been studied in the field of query optimization and more specifically in the context of selectivity estimation for relational operators. Several techniques have been proposed [MCS88], including histograms [Koo80, SC84, Ioa93, IP95], sampling [OR86, LNS90, HS92], and parametric techniques. Histograms are the most commonly used form of statistics in practice (e.g. they are used in DB2, Oracle, and Microsoft SQL Server) because they incur almost no run-time overhead and are effective even with a very small amount of storage space. Several types of ....

P. Haas and A. Swami. Sequential Sampling Procedures for Query Size Estimation. Proceedings of ACM SIGMOD, San Diego, CA, pages 341--


Managing Memory to Meet Multiclass Workload Response Time Goals - Brown, Carey, Livny (1993)   (22 citations)

....system state (a recently changed resident volume). The time in this state is set to a number of transaction completions that provides statistical significance. We currently set it to 50 in all cases, but this length could also be dynamically determined for each class using sampling techniques [Haas 91]. If response time goals are being met at the end of 50 completions, then the class is moved to steady state; otherwise new target residencies are set, statistics are reset, and the class moves to transition up or transition down. Steady State: A class enters steady state when its response ....

P. Haas, A. Swami, "Sequential Sampling Procedures for Query Size Estimation," Proc. ACM SIGMOD '92 Conf., San Diego, CA, June 1992.


Data Engineering - December Vol No

....delivers sufficient accuracy for a few nested operators above the retrieval nodes. Sampling techniques for a variety of stored data structures are described in [OlRo89, OlRo90, Ant93, OlRo93]. Algorithms and stop rules for sampling estimation of joins and selects are presented in [LiNS90, HaSw92]. The more operators are involved in a subquery subject to direct sampling estimation, the larger certainty areas can be potentially uncovered. However, restrictions on sample sizes aiming to keep estimation cost lower than execution cost limit the number of nested operators to a few for any ....

P. Haas and A. Swami, "Sequential Sampling Procedures for Query Size Estimation," Proceedings of the ACM SIGMOD Conference, (June 1992).


Selectivity Estimation for Joins Using Systematic Sampling - Banchong Harangsri John

....Good estimates for the cost of database operations are thus critical to the effective operation of query optimisers and ultimately of the database systems that rely on them. This paper proposes a novel sampling based method to improve such cost estimation for the join operation. Most previous work [7, 9, 6, 3, 4] on sampling based methods has focused on simple random sampling (SRS) whereby each unit (tuple) in the population (relation) of interest has an equal chance to be selected in the sample. Simple random sampling can be performed under two distinct regimes. The first is with replacement; that is, ....

....it is simple to implement. The second scheme does not allow replacement; any unit (tuple) already selected can not be selected again. This scheme which we call SRSWOR requires a more sophisticated data structure to do the sampling. The simple random sampling methods proposed in the literature [9, 6, 3] differ from one another primarily in their stopping conditions, i.e. when to stop sampling. Systematic sampling was first proposed by [12] in the context of multidatabase systems; this work made no assumptions about the sortedness of the underlying relations. In this paper, we suggest that a ....

P. J. Haas and A. N. Swami. Sequential Sampling Procedures for Query Size Estimation. In ACM SIGMOD Conference on the Management of Data, pages 341--350, 1992.
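The two simple-random-sampling regimes contrasted in the Harangsri excerpt can be sketched side by side. With replacement, each draw is independent and needs no memory; without replacement (SRSWOR), the sampler has to remember what it already picked. Here that extra data structure is a set, filled by Floyd's algorithm; that algorithm is our illustrative choice, not necessarily what the cited papers use, and the function names are hypothetical.

```python
import random


def srswr(relation, n, rng):
    """Hypothetical helper: simple random sampling WITH replacement.
    Every draw is independent, so the same tuple may appear more than once."""
    return [relation[rng.randrange(len(relation))] for _ in range(n)]


def srswor(relation, n, rng):
    """Hypothetical helper: simple random sampling WITHOUT replacement
    (Floyd's algorithm). A set records positions already chosen."""
    N = len(relation)
    if n > N:
        raise ValueError("sample size exceeds relation size")
    chosen = set()
    for j in range(N - n, N):
        t = rng.randint(0, j)  # uniform over positions 0..j inclusive
        # If t was already taken, take j instead; j cannot have been chosen yet.
        chosen.add(j if t in chosen else t)
    # Return the sampled tuples in storage order.
    return [relation[i] for i in sorted(chosen)]
```

Floyd's algorithm performs exactly `n` random draws and keeps only `n` positions in memory, so its cost does not depend on how often earlier picks would have been repeated.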


Optimization Of Parallel Execution For Multi-Join Queries - Chen, Yu, Wu (1995)   (2 citations)

....assumption is not essential but will simplify our presentation. Also, all tuples are assumed to have the same size. In the presence of certain database characteristics and data skew, we only have to modify the formula for estimating the cardinalities of resulting relations from joins accordingly [20, 23] when applying our join sequence scheduling and processor allocation schemes. Results on the effect of data skew can be found in [27, 51]. 3 Determining the Execution Sequence of Joins. In this section, we shall propose and evaluate various join sequence heuristics. Specifically, we focus on ....

P. Haas and A. Swami. Sequential Sampling Procedures for Query Size Estimation. Proceedings of ACM SIGMOD, pages 341--350, June 1992.


Query Size Estimation using Systematic Sampling - Banchong Harangsri (1996)

....thus critical to the effective operation of query optimisers and ultimately of the database systems that rely on them. This paper proposes a novel method to improve such cost estimation. There has been a considerable amount of work on the issue of selectivity estimation over one and a half decades [22, 6, 7, 19, 13, 11, 17, 18, 16, 8, 23, 5]. This work can be classified into four categories [23, 5], namely parametric, histogram, curve fitting and sampling. Let us briefly describe each of them; the reader can find more details in the references given above. Parametric: The parametric methods [22, 6, 7] are ones which depend upon ....

P. Haas and A. Swami. Sequential Sampling Procedures for Query Size Estimation. In ACM SIGMOD Conference on the Management of Data, pages 341--350, 1992.


Query Size Estimation using Machine Learning - Banchong Harangsri (1996)   (1 citation)

....thus critical to the effective operation of query optimisers and ultimately of the database systems that rely on them. This paper proposes a novel method to improve such cost estimation. There has been a considerable amount of work on the issue of selectivity estimation over one and a half decades [19, 5, 6, 16, 11, 9, 14, 15, 13, 7, 20, 4]. This previous work can be classified into four categories [20, 4], namely non-parametric, parametric, sampling and curve fitting. Let us briefly describe each of them; the reader can find more details in the references given above. The non-parametric method is table- or histogram-based [16, 15] ....

....The method will give accurate query size estimates if the actual data distribution follows the a priori assumption. In reality, data distributions in real databases may not fit well with the assumptions and, consequently, the quality of the size estimates could be unreliable. The sampling method [13, 7] has recently received considerable interest. The accuracy of this method depends upon the size of samples; the higher the sample size, the better the estimation. Given complex queries which consist of several selection and join operations, the method may require a nontrivial amount of time to do ....

P. Haas and A. Swami. Sequential Sampling Procedures for Query Size Estimation. In ACM SIGMOD Conference on the Management of Data, pages 341--350, 1992.


Pattern Discovery In Sequence Databases: Algorithms And.. - Chirn (1997)   (1 citation)

....in general. 2.3.1 Pruning Unlikely Candidates. We would like to compare only the most likely candidate patterns with the entire set. The main question from an optimization point of view is which candidates to compare. Our strategy is as follows. We use simple random sampling without replacement [28, 38, 51, 64] to select sample sequences from the set. Consider a candidate pattern P. Let D (a, respectively) denote the number of sequences in the entire set D (the sample A, respectively) that contain P within the allowed number of distance. Let N be the database size and n the sample size; F = D/N and f = ....

P. J. Haas and A. N. Swami, "Sequential sampling procedures for query size estimation," in Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, San Diego, CA, pp. 341--350, June 1992.
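In the Chirn excerpt, the sample fraction f = a/n is used to estimate the population fraction F = D/N under simple random sampling without replacement. A standard textbook way to attach a normal-approximation confidence interval to f is sketched below, with the finite population correction (1 - n/N) that sampling without replacement allows. The helper name and the default z = 1.96 (roughly 95% confidence) are assumptions; the excerpt does not say how the cited work uses f beyond this definition.

```python
import math


def estimate_fraction(a, n, N, z=1.96):
    """Hypothetical helper: point estimate f = a/n of F = D/N under SRSWOR,
    with a normal-approximation interval using the finite population correction."""
    f = a / n
    fpc = 1.0 - n / N  # shrinks to 0 when the whole population is sampled
    se = math.sqrt(fpc * f * (1.0 - f) / (n - 1)) if n > 1 else float("inf")
    # Clip the interval to the valid range of a fraction.
    return f, max(0.0, f - z * se), min(1.0, f + z * se)
```

For instance, with a = 30 matching sequences in a sample of n = 100 drawn from N = 1000, the point estimate is 0.3 and the interval is about 0.3 plus or minus 0.086; sampling all N sequences collapses the interval to the exact fraction.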


One-Pass Wavelet Synopses for Maximum-Error Metrics - Karras, Mamoulis (2005)

No context found.

P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In Proc. of SIGMOD Conf., pages 341--350, 1992.


Data Mining Techniques for Geospatial Applications - Gunopulos

No context found.

P.J. Haas, A.N. Swami. Sequential Sampling Procedures for Query Size Estimation. In Proc. of the 1992.


An Integrated Method for Estimating Selectivities in a.. - Zhu (1993)   (2 citations)

No context found.

P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In Proceedings of VLDB, pages 341--350, 1992.


Random Sampling from Databases - A Survey - Olken, Rotem (1994)   (10 citations)

No context found.

Haas, P. J. and Swami, A. N. (1992b). Sequential sampling procedures for query size estimation, ACM SIGMOD International Conference on the Management of Data, pp. 341--350.


Random Sampling from Databases - A Survey - Olken, Rotem (1994)   (10 citations)

No context found.

Haas, P. J. and Swami, A. N. (1992a). Sequential sampling procedures for query size estimation, Technical Report RJ 8558, IBM Almaden.



CiteSeer.IST - Copyright Penn State and NEC