7 citations found.
Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


This paper is cited in the following contexts:
Query Execution Techniques for Caching Expensive Methods - Hellerstein, Naughton (1996)   (17 citations)

....of either partitioning (for hashing) or merging (for sorting) [DKO+84, Knu73]. Our implementation of Hybrid Cache in Illustra includes recursive partitioning to handle this situation. The number of input values v can be estimated by using stored statistics [SAC+79] or via sampling [HOT88, HNSS95]. Unfortunately, this estimation is subject to error, so we must consider how the algorithm will behave if estimates are imperfect. If v is estimated too high, h will be underestimated, i.e., memory will be underutilized during the first two phases of Hybrid Cache by wasting memory on ....
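The excerpt notes that the number of distinct input values v can be estimated by sampling and that the estimate may miss in either direction. As a hedged illustration only (not the estimator of [HOT88] or [HNSS95]), the sketch below applies the GEE ("guaranteed-error") distinct-value estimator to a uniform sample drawn without replacement: values seen exactly once are scaled up by sqrt(N/n), since they hint at values the sample missed, while repeated values are counted as-is. The helper name `gee_distinct_estimate` and the synthetic column are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def gee_distinct_estimate(sample, population_size):
    """Hypothetical helper: estimate the number of distinct values v in a
    column of `population_size` rows from a uniform random sample drawn
    without replacement, using the GEE estimator
        v_hat = sqrt(N / n) * f1 + sum_{j >= 2} f_j,
    where f_j is the number of values that occur exactly j times in the sample."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # Frequency of frequencies: j -> f_j.
    freq_of_freq = Counter(Counter(sample).values())
    f1 = freq_of_freq.get(1, 0)
    repeated = sum(f for j, f in freq_of_freq.items() if j >= 2)
    # Only singletons are scaled up; values seen twice or more count once each.
    return sqrt(population_size / n) * f1 + repeated


if __name__ == "__main__":
    random.seed(0)
    # Synthetic column: 100,000 rows containing exactly 500 distinct values.
    column = [i % 500 for i in range(100_000)]
    sample = random.sample(column, 2_000)
    estimate = gee_distinct_estimate(sample, len(column))
    print(f"true v = 500, estimated v = {estimate:.0f}")
```

Running the demo with different seeds shows the estimate drifting away from the true v, which is the situation the excerpt's error analysis addresses: an overestimate of v leads Hybrid Cache to underutilize memory.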

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


Practical Selectivity Estimation through Adaptive Sampling - Lipton, Schneider (1990)   (98 citations)

....subset of a query answer without computing the full answer. This problem is complementary to size estimation, since an algorithm for size estimation does not imply an algorithm for constructing a random sample, and vice versa. The most closely related work is that of Hou, Ozsoyoglu, and Taneja [HOT88, HOT89]. In that work, the emphasis is on the estimation of aggregate queries in real-time environments, rather than on query size estimation. The papers present data relating the number of samples to accuracy, but do not explicitly address timing. A comparison in Sections 4 and 5 ....

....to the tuple size. The next subsection discusses two ways to deal with large c values.

Table 4: Time vs. tuple size for selectivity tests (k = 3.0, s = 0.10)

    tup. size    est.    comp.    est./comp.
    200 bytes    1.4      9.0     0.156
    600 bytes    1.4     26.0     0.054

Comparison with Hou et al.
Hou, Ozsoyoglu, and Taneja [HOT88, HOT89] describe another algorithm for estimating selectivities through random sampling. While similar, there are a number of differences between their approach and ours. First, they sample without replacement, whereas we sample with replacement. For the large population sizes we are considering, ....
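Lipton and Schneider contrast their with-replacement sampling with the without-replacement sampling of [HOT88, HOT89] and suggest the distinction matters little for large populations. The hedged sketch below is a generic sample-proportion estimator, not either paper's algorithm (`estimate_selectivity` is a hypothetical name). It shows why: the two schemes differ only by the finite population correction (1 - n/N) in the standard error, and that factor is close to 1 when n is much smaller than N.

```python
import random
from math import sqrt


def estimate_selectivity(rows, predicate, n, with_replacement=True, rng=random):
    """Hypothetical helper: estimate the fraction of `rows` satisfying
    `predicate` from a random sample of n rows (n >= 2).

    Returns (p_hat, std_error). The variance estimate p(1-p)/(n-1) is
    multiplied by the finite population correction (1 - n/N) only when
    sampling without replacement."""
    N = len(rows)
    if n < 2:
        raise ValueError("need at least two sampled rows")
    if with_replacement:
        sample = rng.choices(rows, k=n)   # a row may be drawn more than once
    else:
        sample = rng.sample(rows, n)      # each row at most once; requires n <= N
    p_hat = sum(1 for r in sample if predicate(r)) / n
    var = p_hat * (1 - p_hat) / (n - 1)
    if not with_replacement:
        var *= 1 - n / N
    return p_hat, sqrt(var)


def every_tenth(r):
    """Synthetic predicate with true selectivity 0.1 on range(N)."""
    return r % 10 == 0


if __name__ == "__main__":
    rng = random.Random(7)
    rows = list(range(1_000_000))  # synthetic relation
    for wr in (True, False):
        p, se = estimate_selectivity(rows, every_tenth, 1_000, wr, rng)
        print(f"with_replacement={wr}: p_hat={p:.3f}, se={se:.4f}")
```

With n = 1,000 and N = 1,000,000 the correction factor is 0.999, so the two standard errors are practically identical; the gap only becomes visible as n approaches N.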

[Article contains additional citation context not shown here]

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical estimators for relational algebra expressions. In Proceedings of the Seventh ACM Symposium on Principles of Database Systems, pages 276--287, Austin, Texas, March 1988.


Predicate Migration: Optimizing Queries with Expensive.. - Hellerstein, Stonebraker (1993)   (80 citations)

....card(input(p)) and make the assumption that selectivities of different predicates are independent. Typically these estimations are based on default values and system statistics [SAC+79], although recent work suggests that accurate and inexpensive sampling techniques can be used [LNSS93, HOT88].
2.1 Cost of User Defined Functions in POSTGRES
In an extensible system such as POSTGRES, arbitrary user-defined functions may be introduced into both restriction and join predicates. These functions may be written in a general programming language such as C, or in the database query language, ....
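The excerpt describes the standard output-size estimate: the input cardinality multiplied by the selectivities of the predicates, with the selectivities treated as independent. A minimal sketch, assuming each selectivity is already known (from defaults, system statistics, or sampling); `estimate_output_cardinality` is a hypothetical name. The demo uses deliberately correlated columns to show what the independence assumption can cost.

```python
from math import prod


def estimate_output_cardinality(input_card, selectivities):
    """Hypothetical helper: estimated output size of a conjunction of
    predicates, assuming their selectivities are independent, i.e.
    card(output) = card(input) * s_1 * s_2 * ... * s_k."""
    for s in selectivities:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"selectivity out of range: {s}")
    return input_card * prod(selectivities)


if __name__ == "__main__":
    # Synthetic relation whose columns a and b are perfectly correlated.
    rows = [(i, i) for i in range(100)]
    sel_a = sum(1 for a, _ in rows if a < 50) / len(rows)     # 0.5
    sel_b = sum(1 for _, b in rows if b < 50) / len(rows)     # 0.5
    true_out = sum(1 for a, b in rows if a < 50 and b < 50)   # 50
    est_out = estimate_output_cardinality(len(rows), [sel_a, sel_b])  # 25.0
    print(f"independence estimate {est_out}, actual {true_out}")
```

Because a and b always agree, the second predicate filters nothing once the first has been applied, so the product of selectivities underestimates the output by a factor of two.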

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


Query Execution Techniques for Caching Expensive Methods - Hellerstein, Naughton (1996)   (17 citations)

....of either partitioning (for hashing) or merging (for sorting) [DKO+84, Knu73]. Our implementation of Hybrid Cache in Illustra includes recursive partitioning to handle this situation. The number of input values v can be estimated by using stored statistics [SAC+79] or via sampling [HOT88, HNSS95]. Unfortunately, this estimation is subject to error, so we must consider how the algorithm will behave if estimates are imperfect. If v is estimated too high, h will be too small, i.e., memory will be underutilized during the first two phases of Hybrid Cache. If v is estimated too low, ....

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


Predicate Migration: Optimizing Queries with Expensive Predicates - Hellerstein (1992)   (80 citations)

....card(input(p)) and make the assumption that selectivities of different predicates are independent. Typically these estimations are based on default values and system statistics [SAC+79], although recent work suggests that accurate and inexpensive sampling techniques can be used [LNSS93, HOT88].
2.1 Cost of User Defined Functions in POSTGRES
In an extensible system such as POSTGRES, arbitrary user-defined functions may be introduced into both restriction and join predicates. These functions may be written in a general programming language such as C, or in the database query language, ....

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


The Case for Online Aggregation: New Challenges in User.. - Hellerstein

....More than Sampling
The concept of trading accuracy for efficiency in a database system is not a new one: a large body of work on database sampling has been devoted to this problem. The sampling work closest in spirit to this paper focuses on returning approximate answers to aggregate queries [HOT88, HOT89] and other relational queries [OR86, Olk93, etc.]. Online aggregation differs from traditional database sampling in a number of ways, particularly in its interface, but also in its architecture and statistical methods. In this section we focus on the interface distinctions between sampling ....
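As the excerpt frames it, online aggregation differs from one-shot sampling mainly in its interface: the user watches a running estimate and its confidence interval tighten while the query is still scanning. A minimal sketch, assuming the rows can be read in random order and that a CLT-based interval is adequate; it is not the estimator machinery of the cited paper, and `online_average` is a hypothetical name. Running mean and variance are maintained with Welford's update, and the (1 - n/N) factor makes the interval collapse to zero once every row has been seen.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000, rng=random):
    """Hypothetical sketch of online aggregation for AVG: scan the rows in
    random order and periodically yield (rows_seen, running_mean, half_width),
    where running_mean +/- half_width is a CLT-based interval (z=1.96 for ~95%)."""
    order = list(values)
    rng.shuffle(order)
    N = len(order)
    n, mean, m2 = 0, 0.0, 0.0  # m2: running sum of squared deviations (Welford)
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= 2 and (n % report_every == 0 or n == N):
            var = max(m2, 0.0) / (n - 1)
            # Finite population correction: sampling here is without replacement.
            half_width = z * math.sqrt(var / n * (1 - n / N))
            yield n, mean, half_width


if __name__ == "__main__":
    rng = random.Random(3)
    salaries = [rng.gauss(50_000, 15_000) for _ in range(20_000)]  # synthetic data
    for n, mean, hw in online_average(salaries, report_every=5_000, rng=rng):
        print(f"{n:>6} rows: AVG = {mean:,.0f} +/- {hw:,.0f}")
```

The user could stop the scan as soon as the printed interval is narrow enough, which is the interface distinction the excerpt emphasizes.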

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 276--287, Austin, March 1988.


Random Sampling from Databases - Olken (1993)   (37 citations)

....are attempting to estimate some aggregate property of a set of records, such as the total number of records which satisfy some predicate. Thus random sampling is typically used to support statistical analysis of a dataset, either to estimate parameters of interest [HOT88, HOT89, HO91] or for hypothesis testing. See [Coc77] for a classic treatment of the statistical methodology. Applications include scientific investigations such as high energy particle physics experiments. Other applications include quality control and public policy analyses. For example, one ....

....and where the cost in time or money to fully evaluate the query may be excessive. In his dissertation [Mor80], Morgenstein discussed the issue of estimation procedures for various aggregate queries (e.g., COUNT). Sampling procedures were only briefly discussed. More recently, Hou et al. [HOT88, HO91] have discussed the construction of statistical estimators for arbitrary relational expressions for COUNT aggregates. They also envision the application of their methods to real-time applications [HOT89]. Sampling may also be used to estimate the database parameters used by the query ....
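The excerpt summarizes [HOT88, HO91] as constructing statistical estimators of COUNT for relational expressions. As a hedged, single-operator illustration (a standard inverse-probability scale-up, not the construction in those papers), the sketch below estimates the size of one equijoin from independent samples of both inputs drawn without replacement. Every (t, u) pair lands in the sampled cross product with probability (n_r/|r|)(n_s/|s|), so scaling the matching sample pairs by the inverse of that probability gives an unbiased estimate. All names and data are hypothetical.

```python
import random
from collections import Counter
from operator import itemgetter


def estimate_join_count(r, s, n_r, n_s, key_r, key_s, rng=random):
    """Hypothetical helper: estimate COUNT(*) of the equijoin of r and s
    (key_r(t) == key_s(u)) from independent samples without replacement."""
    sample_r = rng.sample(r, n_r)
    sample_s = rng.sample(s, n_s)
    s_keys = Counter(key_s(u) for u in sample_s)       # join-key histogram of sample_s
    matches = sum(s_keys[key_r(t)] for t in sample_r)  # matching pairs in the sample
    # Scale by the inverse of the per-pair inclusion probability.
    return matches * (len(r) * len(s)) / (n_r * n_s)


if __name__ == "__main__":
    rng = random.Random(5)
    key = itemgetter(1)                          # join on the second field
    r = [(i, i % 100) for i in range(20_000)]    # synthetic relations
    s = [(j, j % 250) for j in range(5_000)]
    s_counts = Counter(key(u) for u in s)
    exact = sum(s_counts[key(t)] for t in r)     # exact join size, for comparison
    for frac in (0.01, 0.05, 0.2):
        est = estimate_join_count(r, s, int(frac * len(r)), int(frac * len(s)),
                                  key, key, rng)
        print(f"sample fraction {frac:.2f}: estimate {est:,.0f} (exact {exact:,})")
```

Larger sampling fractions tighten the estimate at the cost of more I/O, the same accuracy-versus-cost trade-off the surrounding excerpts discuss.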

[Article contains additional citation context not shown here]

Wen-Chi Hou, Gultekin Ozsoyoglu, and Baldeo K. Taneja. Statistical Estimators for Relational Algebra Expressions. In Proceedings of the Seventh ACM Symposium on Principles of Database Systems, pages 276--287, March 1988.


CiteSeer.IST - Copyright Penn State and NEC