M. Astrahan, M. Schkolnick, and K.-Y. Whang. Approximating the number of unique values of an attribute without sorting. Information Systems, 12(1):11--15, 1987.

This paper is cited in the following contexts:
Random Sampling from Databases - Olken (1993)   (37 citations)

....the estimation of the size of projections via sampling. Approaches to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting [WVZT90, FM85, Fla85, FM83, ASW87]. 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this |x_i|. Estimate the size of the projection as the estimated average contribution of each element of the base ....

M. Astrahan, M. Schkolnick, and K.-Y. Whang. Approximating the number of unique values of an attribute without sorting. Information Systems, 12(1):11--15, 1987.


Online Prediction Algorithms for Databases and Operating Systems - Krishnan (1995)   (7 citations)

....quite different. Query optimizers have cost models that estimate the access cost as a function of the predicted number of qualifying rows and find the cheaper alternative. Models already exist in current day relational database management systems (RDBMSs) to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. With the popularity of textual data being stored in RDBMS, it has become important to predict the selectivity accurately even for alphanumeric fields. A particularly problematic predicate used against alphanumeric fields is the like predicate [Iye]. For example, consider the inventory of a ....

....phase must be minimal. In Chapter 11 we present our techniques for predicting selectivity for the like predicate; i.e. techniques for estimating alphanumeric selectivity. 10.1 Background and Related Work Models already exist in current day RDBMSs to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. Typically, in the preprocessing phase, a few numbers that capture the distribution of data are accumulated and stored in the catalog. In the earlier example dealing with salaries, the RDBMS would perform an analysis of the data in the salary field of the database, and select a small set of ....

[Article contains additional citation context not shown here]

M. M. Astrahan, M. Schkolnick, and K.-Y. Whang, "Approximating the Number of Unique Values of an Attribute Without Sorting," Information Systems 12 (1987), 11--15.


RainForest - A Framework for Fast Decision Tree.. - Gehrke, Ramakrishnan.. (1998)   (26 citations)

....I. If parent of node splits on I we know the size of I's AVC set at node exactly). Even though this approach usually overestimates the sizes of AVC groups, it worked very well in practice. There are algorithms for the estimation of the number of distinct values of an attribute [ASW87, HNSS95]; we intend to explore their use in future research. 5 Experimental results In the machine learning and statistics literature, the two main performance measures for classification tree algorithms are: (i) The quality of the rules of the resulting tree, and (ii) The decision tree ....

M.M. Astrahan, M. Schkolnick, and K.-Y. Whang. Approximating the number of unique values of an attribute without sorting. Information Systems, 12(1):11--15, 1987.


Random Sampling from Databases - A Survey - Olken, Rotem (1994)   (10 citations)

....on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting (see Whang, Vander Zanden and Taylor (1990), Flajolet and Martin (1985), Flajolet (1985), Flajolet and Martin (1983), and Astrahan, Schkolnick and Whang (1987)). 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this |x_i|. Estimate the size of the projection as the estimated average contribution of each element of the base ....

Astrahan, M., Schkolnick, M. and Whang, K.-Y. (1987). Approximating the number of unique values of an attribute without sorting, Information Systems 12, 11--15.


RainForest - A Framework for Fast Decision Tree.. - Gehrke, Ramakrishnan.. (1998)   (26 citations)

....attribute a. If parent p of node n splits on a we know the size of a's AVC set at node n exactly). Even though this approach usually overestimates the sizes of AVC groups, it worked very well in practice. There are algorithms for the estimation of the number of distinct values of an attribute [ASW87, HNSS95]; we intend to explore their use in future research. 5 Experimental results In the machine learning and statistics literature, the two main performance measures for classification tree algorithms are: (i) The quality of the rules of the resulting tree, and (ii) The decision tree ....

M.M. Astrahan, M. Schkolnick, and K.-Y. Whang. Approximating the number of unique values of an attribute without sorting. Information Systems, 12(1):11--15, 1987.


Estimating Alphanumeric Selectivity in the Presence of.. - Krishnan, Vitter, Iyer (1996)   (11 citations)

....on accurate cost estimation of various query reorderings [BGI]. Estimating predicate selectivity, or the fraction of rows in a database that satisfy a selection predicate, is key to determining the optimal join order. Previous work has concentrated on estimating selectivity for numeric fields [ASW, HaSa, IoP, LNS, SAC, WVT]. With the popularity of textual data being stored in databases, it has become important to estimate selectivity accurately for alphanumeric fields. A particularly problematic predicate used against alphanumeric fields is the SQL like predicate [Dat]. Techniques used for estimating numeric ....

....consulted to estimate selectivity; the processing in the query optimization phase must be minimal. Further, the space available in the metadata descriptors for any one column of the database is limited. Models already exist in current day relational DBMS to estimate selectivity for numeric fields [ASW, HaSa, IoP, LNS, SAC, WVT]. Typically, in the runstats phase, a few numbers that capture the distribution of data are accumulated and stored in the metadata, as histograms, for example. The problem of estimating alphanumeric selectivity is a natural extension to the problem of estimating numeric selectivity: the like ....

M. M. Astrahan, M. Schkolnick, and K. Y. Whang, "Approximating the Number of Unique Values of an Attribute Without Sorting," Inf. Sys. 12 (1987), 11--15.


On Adaptive Sampling - Flajolet (1990)   (1 citation)

....We analyze the storage-accuracy trade-off of an adaptive sampling algorithm due to Wegman that makes it possible to evaluate probabilistically the number of distinct elements in a large file stored on disk. 1 Introduction A problem that naturally arises in query optimization of data base systems [1] is to estimate the number of distinct elements (also called cardinality) of a large collection of data with unpredictable replications. The trivial solution that consists in building a list of distinct elements is usually too much resource consuming both in terms of storage and processing time ....

....in terms of processing time and of conceptual simplicity. It is also totally free of nonlinearities when estimating the cardinalities of small files, a feature that may prove useful in several applications. In contrast, Probabilistic Counting is only asymptotically unbiased. Astrahan et al. [1] report on their experience with implementing Probabilistic Counting and Adaptive Sampling in the context of IBM's database System R. In terms of processing time, these probabilistic algorithms typically outperform standard sorting methods by a factor of about 8. In terms of storage consumptions, ....

M. M. Astrahan, M. Schkolnick, and K-Y. Whang. Approximating the number of unique values of an attribute without sorting. Information Sciences, 12:11--15, 1987.


Histogram-Based Estimation Techniques in Database Systems - Poosala (1997)   (2 citations)

No context found.

Morton M. Astrahan, Mario Schkolnick, and Kyu-Young Whang. Approximating the number of unique values of an attribute without sorting. Information Systems, 12(1):11--15, 1987.
