F. Olken and D. Rotem. Random sampling from databases: A survey. Statistics & Computing, 5(1):25--42, Mar. 1995.
....in non-Euclidean similarity search. If updates are performed online, we may choose to allow the centroids to diverge from their true values using ACCURATE. 5.2. Sampling Sampling scans are beginning to appear in commercial database systems. Sampling has many financial and scientific applications [OLKE95]. It can also be used to improve response time for decision support queries that do not need completely accurate answers [INFO97a]. Sampling access methods support some type of randomized probing operation. Augmented trees can be used, as can some variation of acceptance/rejection (A/R) sampling, ....
F. Olken and D. Rotem, "Random Sampling from Databases: A Survey," Statistics and Computing 5,1 (Mar. 1995), 25-42.
....the previous work in this area is discussed more thoroughly in Section 3 and Appendix B. The methods described in this paper make the I/O budget explicit and then make the resulting imprecision explicit as well. 6.4. Sampling Database sampling takes many forms and has many applications; see [OLKE95] and [BARB97, Sec. 9] for recent surveys. However, we have already described the most relevant index-assisted sampling literature in Section 3. 6.5. Tree condensation Tree condensation always reduces the mean root-to-leaf path length and eliminates any dead ends in the condensed region. ....
F. Olken and D. Rotem, "Random Sampling from Databases: A Survey," Statistics & Computing 5, 1 (Mar. 1995), 25-42.
....operations, storing these objects at higher levels of the index structure, as exploited in [5], is a good idea. This property is also utilized in this paper, but without the need for sampling, which is sometimes unreliable due to the skew in the distributions of the objects and/or object sizes [6]. Our technique, called D-tree, uses uniform node sizes, but still can maintain a maximum fanout at each tree node. In addition, we want to improve on another aspect of Segment R-tree, namely, its sensitivity to the insertion order of the data items. Depending on the insertion order, optimal data ....
Frank Olken and Doron Rotem. Random sampling from databases: A survey. Statistics and Computing, 5:25--42, 1995.
....us methods for establishing a sample size to achieve predetermined accuracy of the estimates. It is then possible to supplement our estimates with confidence intervals. For a more detailed discussion of sampling from databases, the reader is referred to the literature on the topic (see, for example, [12] for a good survey). Note that two different populations must be sampled. To estimate soundness we sample the given (stored) view, whereas to estimate completeness, we sample the ideal view. To establish both soundness and completeness it is necessary to have access to the ideal database. For ....
F. Olken and D. Rotem. Random sampling from databases---a survey. Statistics and Computing, 5(1), 1995.
....for establishing a sample size needed to achieve a predetermined accuracy of the estimates. It is then possible to supplement our estimates with confidence intervals. For a more detailed discussion of sampling from databases, the reader is referred to the literature on the topic (see, for example, [Olken, Rotem, 1995] for a good survey). Note that two different populations must be sampled. To estimate soundness we sample the stored database, whereas to estimate completeness we sample the true database. To establish both soundness and completeness it is necessary to have access to the true database. For ....
F. Olken and D. Rotem. Random sampling from databases--- a survey. Statistics and Computing, 5(1), 1995.
....other aggregates must be computed from the base data. Space and time constraints can be prohibitive for precomputing all results, while computing aggregates from scratch results in long response times. In this case, an attractive alternative is the use of sampling techniques to answer the queries [24]. Using sampling, only a small sample of the available data is read and used to estimate the result of the query. This can produce very fast response times, while maintaining a relatively high degree of accuracy for the result. It has been shown that summarizability is equivalent to the ....
F. Olken, D. Rotem. Random Sampling from Databases - A Survey. Statistics & Computing, 5(1):25-- 42, March 1995.
....used in query size estimation. This may assist the user by providing approximate answers. Such information can be used for statistical analyses of databases, where approximate answers would suffice. It may also be used to estimate selectivities or intermediate result sizes for query optimization [8]. The application of sampling for mining association rules was suggested in [7]. In this paper we extend their suggestion by evaluating the effectiveness of sampling in practice. 3.1 Binomial Distribution A Bernoulli trial is an experiment with only two outcomes, namely, success, which occurs ....
F. Olken and D. Rotem. Random sampling from databases - a survey. In Draft. ICS Div, Lawrence Berkeley Lab., Mar. 1994.
....management system that support statistical (and OLAP) data. There was quite a bit of work in this area, including random sampling from relational databases, from hash files, from B-trees, from spatial databases, etc. The authors of many of these papers provided a good summary of this work in [OR95]. It seems that except for special operations where efficiency can improve greatly, data management systems and statistical packages will continue their independent existence. Therefore, clean interfaces between them are the key to future integration of these technologies. 5.7 Support for ....
Olken, F. and Rotem, D., "Random Sampling from Databases - A Survey," (invited paper), Statistics & Computing, March 1995, vol.5, no.1, pages 25-42. (many more references at http://www.lbl.gov/~olken/sampling.html)
No context found.
F. Olken and D. Rotem. Random sampling from databases: A survey. Statistics & Computing, 5(1):25--42, Mar. 1995.
No context found.
F. Olken and D. Rotem. Random sampling from databases - a survey. In Statistics and Computing (invited paper), pages 25--42, 1995.