J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....a position at random from $\{1, \ldots, n\}$. After each new insertion, we would like to replace each sample point by the next point independently with probability , without incurring the $\Theta(s)$ time per insert. For this, we use the following skipping approach employed in reservoir sampling [Vit85]. Considering each i as its own reservoir sample of size 1, the skipping approach computes the next random position that would succeed in replacing the current point. That is, if Pos[i] n, then this position is replaced by position n + 1 with probability , by position n + 2 with probability ....
....error with high probability if the random sample has size at least $cn^2/B$, for a constant $c > 3$ determined by the desired accuracy and confidence. Note that random samples of each relation can be maintained incrementally with small overheads as new data is inserted into or deleted from the relation [Vit85, GMP97], and hence one can track join sizes in $O(n^2/B)$ memory words using this approach. 4.2 Lower bounds on signature schemes for join size estimation. We prove that, to within constant factors on the signature size, the simple sampling algorithm in the previous subsection cannot be improved ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....directly on the stream, that is, in one pass over the data in the fixed order of arrival; this requirement renders conventional approximate query processing tools inapplicable in a data stream setting. Note that, even though random sample data summaries can be easily constructed in a single pass [23], it is well known that such summaries typically give very poor result estimates for queries involving one or more joins [1, 6, 2]. Our Contributions. In this paper, we tackle the hard technical problems involved in the approximate processing of complex (possibly multi-join) aggregate ....
J.S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1), 1985.
....Adaptive compression algorithms attune themselves gradually to changes in the redundancies within a file by modifying parameters used by the algorithm, such as the dictionary, during execution. For example, adaptive alphabetic distribution based algorithms such as dynamic Huffman encoding [Vit89] maintain a tree structure to minimize the encoded length of the most frequently occurring characters. This property can be made to change continuously as a file is processed. An adaptive textual substitution algorithm is Lempel-Ziv compression, a title which refers to two distinct variants of a ....
....coding. We should note that in earlier implementations of the heterogeneous compressor we used a dynamic Huffman algorithm instead of arithmetic coding. We changed our implementation when we found that the Witten-Neal-Cleary algorithm [WNC87] outperformed our implementation of dynamic Huffman coding [Vit89, Toa90] in both space savings and execution time. Run length encoding (RLE) algorithms compress data by replacing contiguous occurrences of a single unit symbol (either bit or byte) by an efficiently coded count of these runs, usually a single occurrence of the symbol and the number of occurrences. We have ....
Jeffrey S. Vitter. Dynamic Huffman Coding. ACM Transactions on Mathematical Software, June 1989.
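The run-length encoding idea mentioned in the excerpt above (replace each run of a repeated symbol by one occurrence of the symbol plus a count) can be illustrated with a minimal Python sketch. This is not the cited compressor's implementation; the function names `rle_encode` and `rle_decode` and the list-of-pairs representation are assumptions made purely for illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of equal symbols into a (symbol, run_length) pair."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


if __name__ == "__main__":
    encoded = rle_encode("aaabccdd")
    print(encoded)
    print(rle_decode(encoded))
```

A practical coder would also pack the counts compactly (the excerpt speaks of an "efficiently coded count"); this sketch leaves them as plain integers.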
....and ensures that the input data set fits in main memory. With an appropriate sample size, the quality of the clustering is not sacrificed. On the contrary, random sampling aids clustering by filtering outliers. Efficient algorithms for selecting random samples from a database can be found in [Vit85], and we do not discuss them here. Note that the salient feature of our clustering method is not sampling but the clustering algorithm that utilizes links instead of distances. Handling Outliers: In our clustering scheme, outliers can be handled fairly effectively. The first pruning occurs when ....
Jeff Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....the samples in one pass through the relation and to maintain the samples without accessing the stored relation. Constructing using a data cube. Biased samples of a target size for each group can be constructed for all groups in one pass through the relation, using independent reservoir samplings [Vit85] for each group. Given a data cube of the counts of each group in all possible groupings, the target sizes are known, and any of our biased samples can be constructed in one pass. However, in the absence of a data cube, we can still construct our biased samples in one pass, by applying the ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....into the data set. Since the number of sample points provided by a concise sample depends on the data distribution, the problem of maintaining a concise sample as new data arrives is more difficult than with traditional samples. For traditional samples, the reservoir sampling algorithm of Vitter [Vit85] can be used to maintain a sample in the presence of insertions of new data (see Section 5.1 for details). However, this algorithm relies heavily on a priori knowledge of the target sample size (which, for traditional samples, equals the footprint divided by log n). With concise samples, the ....
....algorithm requiring m D I/O operations for its sampling into an algorithm that potentially requires no I/O operations. Maintaining backing samples. A uniform random sample of a target size m can be maintained under insertions to the data set using Vitter's reservoir sampling technique [Vit85]. The algorithm proceeds by inserting the first m items into a reservoir. Then a random number of new items are skipped, and the next item replaces a randomly selected item in the reservoir. Another random number of items are then skipped, and so forth. The distribution function of the length ....
J. S. Vitter, Random sampling with a reservoir, ACM Transactions on Mathematical Software 11 (1985), no. 1, 37--57.
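The skip-based maintenance described in the excerpt above (fill the reservoir with the first m items, skip a random number of items, then let the next item replace a random reservoir slot) can be sketched in Python as follows. This is a hedged illustration, not code from the cited paper: the function name `reservoir_sample_with_skips` is an assumption, and the skip length is drawn by a simple sequential search over the probability that each successive item is not selected, which is only one straightforward way to realize the skip distribution.

```python
import random


def reservoir_sample_with_skips(stream, m, rng=None):
    """Return a uniform random sample of m items from an iterable, in one pass.

    Assumption for this sketch: the skip length is found by sequential search
    on the probability that the next items are all rejected.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    rng = rng or random.Random()
    it = iter(stream)

    # Phase 1: the first m items go straight into the reservoir.
    reservoir = []
    for item in it:
        reservoir.append(item)
        if len(reservoir) == m:
            break
    if len(reservoir) < m:
        return reservoir  # stream had fewer than m items

    t = m  # number of items consumed so far
    while True:
        # Phase 2: draw how many items to skip before the next replacement.
        u = 1.0 - rng.random()        # uniform in (0, 1]
        skip = 0
        t += 1
        not_chosen = (t - m) / t      # P(item t is rejected)
        while not_chosen > u:
            skip += 1
            t += 1
            not_chosen *= (t - m) / t
        try:
            for _ in range(skip):
                next(it)              # skipped items are never examined
            item = next(it)           # item number t enters the reservoir
        except StopIteration:
            return reservoir
        reservoir[rng.randrange(m)] = item


if __name__ == "__main__":
    print(reservoir_sample_with_skips(range(1000), 10, random.Random(0)))
```

The benefit of skipping is that only one random skip is drawn per replacement rather than one coin flip per arriving item.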
....changes. This is very useful in Aqua because it enables us to implement various novel operators needed in our research very easily. For example, we implemented a sample operator, which samples its input stream and outputs randomly chosen tuples, either a fixed number (using reservoir sampling [Vit85]) or a desired fraction of the input stream. Clearly, this is very beneficial in the study of sampling techniques in Aqua. The query processing engine is used for producing both approximate answers and exact answers. 3.2 Aqua synopses and their maintenance. The basic Aqua query processing ....
....a group of its size. Since each group is its own uniform random sample, we have considerable flexibility in deciding sample rates (e.g. we need not be fair). If we wish to maintain a constant total sample size n, divided evenly among the (unknown number of) groups, we perform reservoir sampling [Vit85] on each group such that if we have observed g groups, we maintain a target sample size of n/g for each group. When a new group appears, we decrease the target sample size and (lazily) evict random tuples from each existing group. If the number of groups becomes large, we may wish to keep track ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
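The per-group scheme in the excerpt above (one reservoir per group, a target of n/g tuples per group when g groups have been observed, and lazy eviction when a new group appears) can be sketched roughly as below. The class name `GroupedReservoir`, the integer division, and the floor of one tuple per group are assumptions of this sketch rather than details given in the excerpt; the per-item step is the basic replace-with-probability reservoir update, not a skip-based variant.

```python
import random


class GroupedReservoir:
    """Keep roughly total_size sampled tuples, split evenly across groups."""

    def __init__(self, total_size, rng=None):
        if total_size <= 0:
            raise ValueError("total_size must be positive")
        self.total_size = total_size
        self.rng = rng or random.Random()
        self.groups = {}  # group key -> [tuples seen, reservoir list]

    def insert(self, group, item):
        state = self.groups.setdefault(group, [0, []])
        # Assumed rounding rule: integer share, but never below one tuple.
        target = max(1, self.total_size // len(self.groups))
        reservoir = state[1]
        # Lazy eviction: shrink only when this group is touched again.
        while len(reservoir) > target:
            reservoir.pop(self.rng.randrange(len(reservoir)))
        state[0] += 1
        if len(reservoir) < target:
            reservoir.append(item)
        else:
            # Keep the new tuple with probability target / seen.
            j = self.rng.randrange(state[0])
            if j < target:
                reservoir[j] = item

    def sample(self, group):
        return list(self.groups[group][1])


if __name__ == "__main__":
    s = GroupedReservoir(6, random.Random(0))
    for i in range(100):
        s.insert("a", i)
    s.insert("b", "x")
    s.insert("a", 100)
    print(s.sample("a"), s.sample("b"))
```

Because eviction is lazy, a group that receives no further tuples may temporarily hold more than its current share.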
....Huffman coding was first conceived independently by Faller and Gallager [1, 2]. Knuth [3] proposed an efficient implementation to generate such codes. The resulting algorithm is referred to as the FGK algorithm. Another version of adaptive Huffman coding is the algorithm described by Vitter [12, 13]. Both algorithms require O(l) time for each encoding and decoding operation, where l is the current length of the codeword assigned to the symbol to be transmitted. An important issue related to coding schemes is their compression loss. A simple measure ε of the compression loss due to a ....
Vitter, J. S., Algorithm 673: Dynamic Huffman coding, ACM Transactions on Mathematical Software, vol 15, no 2(1989), 158-167.
....accessing R very infrequently (R is accessed only when an update sequence deletes about half the tuples in R). The algorithm is presented next. Let S be a backing sample of target size n maintained for a relation R. We first consider insertions to R. We use Vitter's reservoir sampling technique [Vit85]. The algorithm proceeds by inserting the first n tuples into a reservoir. Then a random number of new tuples are skipped, and the next tuple replaces a randomly selected tuple in the reservoir. Another random number of tuples are then skipped, and so forth. The distribution function of the ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....with high probability if the random sample has size at least $cn^2/B$, for a constant $c > 3$ determined by the desired accuracy and confidence. Note that random samples of each relation can be maintained incrementally with small overheads as new data is inserted into or deleted from the relation [Vit85, GMP97b], and hence one can track join sizes in limited storage using this approach. 4.2 Lower bounds on signature schemes for join size estimation. We prove that, to within constant factors on the signature size, the simple sampling algorithm in the previous subsection cannot be improved ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....brought down to N O(1) registers ([9]). There are numerous variations to the basic problem. In the basic version we have as an input the list of symbols with their frequencies. However, we cannot always construct such a list. In this case we use dynamic Huffman coding (cf. [13, 14]). An even more general version of the problem arises when we do not have the list of symbols either (cf. [7]). Yet another version of the problem is to construct Huffman codes that must be shorter than some predefined constant (cf. [8]). The solution presented in this paper can be applied to all ....
....programming, which takes O(l 02 log l 0 ) time. In the worst, and very pathological, case, l can be as large as N. The described procedure is used for constructing and mutating a static Huffman tree. When considering a dynamic Huffman tree we use the structure presented by Vitter in [13, 14] and mutate typedef struct { tCodeLength shiftBits; tNumNodes blacks; tNumNodes grays; tCodeLength upper; tCodeLength lower; tNumNodes firstOnLevel; } tIndx; static const tIndx indx[ static const tCodeLength startAt= tNumNodes Decode ( { tCodeLength code= 0; bool goUp= false; ....
J.S. Vitter. Algorithm 673, Dynamic Huffman coding. ACM Transactions on Mathematical Software, 15(2):158--167, June 1989.
....data is inserted into the data warehouse. Since the number of sample points provided by a concise sample depends on the data distribution, the problem of maintaining a concise sample as new data arrives is more difficult than with traditional samples. The reservoir sampling algorithm of Vitter [Vit85] that can be used to maintain a traditional sample in the presence of insertions of new data (see [GMP97b] for extensions to handle deletions) relies heavily on the fact that we know in advance the sample size (which, for traditional samples, equals the footprint size). With concise samples, ....
....on the desired decrease in the footprint. Note that instead of flipping a coin for each insert into the data warehouse, we can flip a coin that determines how many such inserts can be skipped before the next insert that must be placed in the sample (as in Vitter's reservoir sampling Algorithm X [Vit85]): the probability of skipping over exactly i elements is $(1 - 1/\tau)^i \cdot (1/\tau)$. As $\tau$ gets large, this results in a significant savings in the number of coin flips and hence the update time. Likewise, since the probability of evicting a sample point is typically small (i.e. 0 = ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
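The coin-flip skipping idea in the last excerpt above can be sketched in Python under one explicit assumption: each insert is kept independently with probability 1/tau, so the number of inserts skipped before the next kept one is geometric, with probability (1 - 1/tau)^i * (1/tau) of skipping exactly i. The names `tau`, `draw_skip` and `sample_with_skips` are hypothetical and chosen for this illustration only; the excerpt does not give code.

```python
import math
import random


def draw_skip(tau, rng):
    """Draw how many inserts to skip: P(skip = i) = (1 - 1/tau)**i * (1/tau)."""
    if tau <= 1.0:
        return 0                      # every insert is kept
    u = 1.0 - rng.random()            # uniform in (0, 1], avoids log(0)
    # Inversion: P(skip >= i) = (1 - 1/tau)**i.
    return int(math.log(u) / math.log(1.0 - 1.0 / tau))


def sample_with_skips(stream, tau, rng=None):
    """Keep each item with probability 1/tau, using one draw per kept item."""
    rng = rng or random.Random()
    kept = []
    to_skip = draw_skip(tau, rng)
    for item in stream:
        if to_skip > 0:
            to_skip -= 1              # skipped without flipping a coin
            continue
        kept.append(item)
        to_skip = draw_skip(tau, rng)
    return kept


if __name__ == "__main__":
    picked = sample_with_skips(range(100000), 100.0, random.Random(7))
    print(len(picked))
```

With a large threshold, one random draw replaces on the order of tau individual coin flips, which is the savings the excerpt refers to.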
....improvements in execution times for CURE can be realized. Also, random sampling can improve the quality of clustering since it has the desirable effect of filtering outliers. Efficient algorithms for drawing a sample randomly from data in a file in one pass and using constant space are proposed in [Vit85]. As a result, we do not discuss sampling in any further detail, and assume that we employ one of the well known algorithms for generating the random sample. Also, our experience has been that generally, the overhead of generating a random sample is very small compared to the time for performing ....
Jeff Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, 1985.
....T for the sample. The sample data distribution is then used as an estimate of the real data distribution. To obtain the random sample in a single linear pass, the method of choice is the skip-based method [24] when the number of tuples T is known beforehand or the reservoir sampling variant [23] when T is unknown. A running up-to-date sample can be kept using a backing sample approach [4]. We do not consider in this paper the issues dealing with sample size and the errors caused by sampling. Our experiments confirm that wavelet-based histograms that use random sampling as a ....
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37--57, March 1985.