P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In SIGMOD 1996.
....of Suffix Tree. The suffix tree structure has been used extensively in many areas besides its original indexing function in substring matching. It has proven to be a very successful model for capturing significant patterns in sequences. Substring selectivity estimation in a text database [13, 14, 16] is one of its successful applications. The objective is to obtain a good estimate of the selectivity of a given substring matching query. A (pruned) suffix tree is built for the entire database, where each node is associated with a counter that records the number of occurrences of the corresponding substring. The ....
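The counted, pruned suffix tree described in this excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation: a dictionary keyed by substring stands in for the tree nodes, and we assume each count is the number of database strings that contain the substring (the excerpt speaks of occurrences; both variants appear in this line of work). The function names and the three-name database are hypothetical.

```python
from collections import defaultdict


def build_count_suffix_trie(strings):
    """Map each substring to the number of database strings that contain it.

    A dict keyed by substring stands in for the trie nodes; a real suffix
    tree shares common prefixes and is far more compact.
    """
    counts = defaultdict(int)
    for s in strings:
        seen = set()
        # Every substring of s is a prefix of one of its suffixes s[i:].
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                seen.add(s[i:j])
        for sub in seen:
            counts[sub] += 1  # each string counts at most once per substring
    return dict(counts)


def prune(counts, p):
    """Pruned count suffix tree: keep only substrings whose count exceeds p."""
    return {sub: c for sub, c in counts.items() if c > p}


db = ["jones", "johnson", "jackson"]  # hypothetical toy database
counts = build_count_suffix_trie(db)
pst = prune(counts, p=1)
```

Because every prefix and suffix of a retained substring has a count at least as large, pruning by a threshold keeps the retained set closed under prefixes and suffixes, so the result is still a (much smaller) suffix trie.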
P. Krishnan, J. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. Proc. of ACM SIGMOD, pp. 282-293, 1996.
....nodes using a combination of subtree promotion and leaf node deletion. The semantics of the operator under the projected model will be precisely defined in Section 5.4.2.1. Selectivity estimation for strings has been studied in the literature: one-dimensional string estimation [KVI96] and its evaluation [JNS99], multi-dimensional substring estimation [JKN99], etc. In general, selectivity estimation for XML queries is more complicated because, in a tree-shaped query, one needs to take the correlation between the paths of the query into consideration. Therefore, an array ....
P. Krishnan, J. S. Vitter, and B. R. Iyer. "Estimating Alphanumeric Selectivity in the Presence of Wildcards". In ACM SIGMOD, Montreal, Quebec, Canada, Jun. 1996.
....One technique for filtering trees out faster is to use selectivity estimation. In [69], McHugh and Widom describe Lorel's cost-based query optimizer, which maintains statistics about subpaths of length k and uses them to infer selectivity estimates for longer path queries. Krishnan et al. [61] described similar techniques for processing query strings containing wildcards, i.e., estimating the number of strings in a database that contain a given query string with wildcards. Other relevant work can be found in [26, 52, 54, 99]. Chen et al. [25] generalized the selectivity estimation ....
P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 282-293, 1996.
....tables. Section 5 presents an experimental evaluation of the proposed techniques. Section 6 contains concluding remarks. 2. Related Work. Estimating the selectivity of XML path expressions is related to estimating the selectivity of substring predicates, which has been addressed in several papers [KVI96, JNS99, JKNS99, CKKM00]. These papers all use variants of the pruned suffix tree data structure. A suffix tree is a trie that stores all the strings in a database and all their suffixes. A pruned suffix tree is a suffix tree in which nodes corresponding to low-frequency strings are pruned so that ....
P. Krishnan, Jeffrey Scott Vitter, and Bala Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 282--293, Montreal, Canada, June 1996.
....their techniques do not allow them to accurately estimate the selectivity of twig queries, which are very natural in Lorel. Using our techniques, one could accurately estimate the selectivity of Lorel twig queries, potentially resulting in substantially better physical plans being generated. In [12], the problem of substring selectivity estimation was introduced. An approach based on pruned suffix trees was presented wherein queries are parsed via a greedy strategy into substrings present in the pruned suffix tree, and the selectivities of these substrings are multiplied based on the ....
....is a generalization of the problem of obtaining subpath count estimates when the data and query are both single-path trees. The analogy between subpaths and substrings leads us to look for a solution to our problem based on the approach known for estimating substring selectivity. The approach in [11, 12] is to keep statistics about frequently occurring substrings in a summary data structure, the pruned suffix tree (PST). A query string is parsed into pieces contained in the PST, and the estimate for the entire query is synthesized from the counts of the parsed components. The natural ....
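The parse-and-synthesize strategy in these excerpts (greedy parsing into PST pieces, then multiplying their selectivities) can be sketched as follows. This is a simplified reading, not the published algorithm: the query is split left to right into the longest pieces stored in the PST, and the piece selectivities C/N are multiplied under an independence assumption. Giving a piece that is missing from the PST the count p (the prune threshold) is an assumption of this sketch, and the function names and toy PST are hypothetical.

```python
def greedy_parse(query, pst):
    """Split query left to right into the longest pieces present in the PST."""
    pieces, i = [], 0
    while i < len(query):
        j = len(query)
        # Shrink the candidate until it is in the PST or is one character long.
        while j > i + 1 and query[i:j] not in pst:
            j -= 1
        pieces.append(query[i:j])
        i = j
    return pieces


def estimate_independent(query, pst, n_strings, p):
    """Estimated match count: N times the product of piece selectivities C/N."""
    selectivity = 1.0
    for piece in greedy_parse(query, pst):
        # Assumption: a piece pruned from the PST gets count p,
        # which is an upper bound on its true count.
        count = pst.get(piece, p)
        selectivity *= count / n_strings
    return selectivity * n_strings


# Hypothetical PST for the names jones, johnson, jackson, pruned with p = 1.
pst = {"j": 3, "o": 3, "n": 3, "s": 3, "jo": 2, "on": 3, "so": 2, "son": 2}
pieces = greedy_parse("json", pst)
estimate = estimate_independent("json", pst, n_strings=3, p=1)
```

Here "json" parses into "j" and "son", and the independence assumption multiplies 3/3 by 2/3 before scaling back by N.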
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 282-- 293, 1996.
....set, and on whether it supports answers that progressively improve the approximation with time. The list of work on data structures that could be considered synopsis data structures is extensive. We have described a few of these works in the paper; here we mention several others. Krishnan et al. [KVI96] proposed and studied the use of a compact suffix-tree-based structure for estimating the selectivity of an alphanumeric predicate with wildcards. Manber [Man94] considered the use of concise signatures to find similarities among files. Broder et al. [BCFM98] studied the use of (approximate) ....
P. Krishnan, J. S. Vitter, and B. Iyer, Estimating alphanumeric selectivity in the presence of wildcards, Proc. ACM SIGMOD International Conf. on Management of Data, June 1996, pp. 282--293.
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 282--293, June 1996.
....pruned count suffix trees directly. We then present two techniques to obtain good estimates for a given multi-dimensional substring matching query, using a pruned count suffix tree. The first one, called GNO (for Greedy Non-Overlap), generalizes the greedy parsing suggested by Krishnan et al. [9] for one-dimensional substring selectivity estimation. The second one, called MO (for Maximal Overlap), uses all maximal multi-dimensional substrings of the query for estimation; these multi-dimensional substrings help to capture the correlation that may exist between strings. Supported in part ....
....of the query as a product of the selectivity of each individual dimension can be grossly inaccurate. In this paper, we study the problem of multidimensional substring selectivity estimation, and make the following contributions: • We propose a novel generalization of 1-D count suffix trees [9], referred to as a k-D count suffix tree, as the basic data structure for solving the problem (Section 2). Our trees can handle not only substring matches, but also prefix, suffix and exact matches. • Given the enormous size of these trees for large databases and for multiple dimensions, it is ....
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 282--293, 1996.
....query, called MO (for Maximal Overlap), that estimates the selectivity of a query based on all maximal substrings of the query in the pruned count suffix tree. We show that MO is provably better than the (independence-based) substring selectivity estimation technique proposed by Krishnan et al. [6], called KVI, under the natural assumption that strings exhibit the so-called short-memory property. We complement our analysis with an experiment, using a real AT&T data set, that demonstrates that MO is substantially superior to KVI in the quality of the estimate. Finally, we develop and ....
....which is a trie that satisfies the following property: whenever a string α is stored in the trie, then all suffixes of α are stored in the trie as well. Given a substring query σ, one can locate all the desired matches using the suffix tree. To estimate substring selectivity, Krishnan et al. [6] considered two variations of the suffix tree: (i) a count suffix tree, which maintains a count, C_α, for each substring α in the tree; and (ii) a pruned count suffix tree, which retains only those substrings α (and their counts) for which C_α exceeds some prune threshold p. The substring ....
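The MO estimate mentioned in these excerpts can be sketched under our reading of it: find the maximal substrings α1, ..., αm of σ retained in the pruned count suffix tree, ordered by start position, and estimate the count as C(α1) · Π_{i≥2} C(αi) / C(βi), where βi is the overlap of αi-1 and αi (an empty overlap contributes N). This is a hedged illustration, not the cited authors' code; the function names and toy PST are hypothetical, and the sketch simply raises an error if a single character of the query was pruned.

```python
def maximal_substrings(query, pst):
    """(start, end) spans of the maximal substrings of query kept in the PST."""
    spans, prev_end = [], 0
    for i in range(len(query)):
        # Extend to the longest substring starting at i that the PST retains;
        # a pruned count tree is prefix-closed, so growing by one char works.
        j = i
        while j < len(query) and query[i:j + 1] in pst:
            j += 1
        if j == i:
            raise ValueError(f"character {query[i]!r} was pruned from the PST")
        # Keep it only if it is not contained in the previous maximal span.
        if j > prev_end:
            spans.append((i, j))
            prev_end = j
    return spans


def estimate_mo(query, pst, n_strings):
    """MO-style estimate: C(a1) * product of C(a_i) / C(overlap(a_{i-1}, a_i))."""
    spans = maximal_substrings(query, pst)
    start, end = spans[0]
    estimate = float(pst[query[start:end]])
    for (_, prev_end), (start, end) in zip(spans, spans[1:]):
        # The overlap is a substring of a retained string, so it is retained too.
        overlap = query[start:prev_end]  # empty when the spans only touch
        denominator = pst[overlap] if overlap else n_strings
        estimate *= pst[query[start:end]] / denominator
    return estimate


# Hypothetical PST for the names jones, johnson, jackson, pruned with p = 1.
pst = {"j": 3, "o": 3, "n": 3, "s": 3, "jo": 2, "on": 3, "so": 2, "son": 2}
spans = maximal_substrings("jon", pst)
estimate = estimate_mo("jon", pst, n_strings=3)
```

For "jon" the maximal pieces "jo" and "on" overlap in "o", so the second piece enters as the conditional term C(on)/C(o) rather than being treated as independent of the first.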
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 282--293, 1996.
P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In SIGMOD 1996.
....of the tag-tag pairs exactly if there is enough memory. Otherwise, the pruning or aggregation techniques in [1] can be used. For the counts of tag-value pairs, we use the bucketing technique described next. Storing the counts of the tag-value pairs efficiently is similar to the problem addressed in [12]; however, we adopt a simpler approach, because we found empirically that most of the probability mass is concentrated in a very small number of tag-value pairs. This suggests an approach similar to [17] for storing the counts of tag-value pairs. 1. For the top k tag-value pairs with the largest ....
P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In H. V. Jagadish and I. S. Mumick, editors, SIGMOD 1996.
....of the tag-tag pairs exactly if there is enough memory. Otherwise, the pruning or aggregation techniques in [1] can be used. For the counts of tag-value pairs, we use the bucketing technique described next. Storing the counts of the tag-value pairs efficiently is similar to the problem addressed in [12]; however, we adopt a simpler approach, because we found empirically that most of the probability mass is concentrated in a very small number of tag-value pairs. This suggests an approach similar to [17] for storing the counts of tag-value pairs. 1. For the top k tag-value pairs with the largest ....
P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In H. V. Jagadish and I. S. Mumick, editors, SIGMOD 1996.
....[6, 11]. Almost all previous work dealt with the estimation of numeric selectivity, i.e., the predicate P contains only numerical variables. The more general problem of estimating alphanumeric selectivity has attracted attention only very recently, due to the work of Krishnan, Vitter and Iyer [4, 5]. Since alphanumeric data are quite different from numeric data, many known techniques for estimating numeric selectivity are no longer suitable for the more general case. In [4, 5], Krishnan, Vitter and Iyer proposed to use suffix trees to estimate alphanumeric selectivity for the case when only ....
....alphanumeric selectivity has attracted attention only very recently, due to the work of Krishnan, Vitter and Iyer [4, 5]. Since alphanumeric data are quite different from numeric data, many known techniques for estimating numeric selectivity are no longer suitable for the more general case. In [4, 5], Krishnan, Vitter and Iyer proposed to use suffix trees to estimate alphanumeric selectivity for the case when only one individual column is involved (we call this one-column estimation). In practice, a query usually involves many columns. A survey of DB2 (relational database system from IBM) ....
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating Alphanumeric Selectivity in the Presence of Wildcards, Proceedings of SIGMOD 1996, 282--293.
Krishnan P., Vitter J., Iyer B.: Estimating Alphanumeric Selectivity in the Presence of Wildcards. SIGMOD Conf. (1996) 282-293
P. Krishnan, J. S. Vitter, and B. R. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 282--293, 1996.
CiteSeer.IST - Copyright Penn State and NEC