S. D. Lee, David W. Cheung, and Ben Kao. Is sampling useful in data mining? a case in the maintenance of discovered association rules. Data Mining and Knowledge Discovery, 2(3):233--262, 1998.
....algorithm. Second, we are currently studying how to improve the whole process of incremental mining. According to experiments we would like to find out some measures which can suggest to us when Ise should be applied to find out the new frequent sequences in the updated database. In other context [9], such an approach is proposed and is based on a sampling technique in order to estimate the difference between the old and new association rules. We are currently investigating if when analysing the data distribution of the original database we could find out other measures. ....
S.D. Lee, D.W. Cheung, and B. Kao. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rule. Data Mining and Knowledge Discovery, 2(3):233--262, 1998.
....the population and the sample, the Chernoff bound [3] gives $|s| \ge \frac{1}{2\epsilon^2} \ln \frac{2}{\delta}$ (2.6) such that $\Pr(e(X,s) > \epsilon) \le \delta$ for a certain error bound $\epsilon$. Based on this result, we can get the sample size $|s|$ with a reasonable error bound. The other example can be taken from S.D. Lee et al. [36], which is applied in data maintenance. Given a database with $|D|$ transactions, of which $\sigma_x$ contain itemset X, the probability that a transaction selected from that database contains X is $p_x = \sigma_x / |D|$. Let m be the number of sample transactions ....
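The Chernoff-style bound quoted in this excerpt can be turned into a sample-size calculation. The sketch below assumes the bound reads $|s| \ge \frac{1}{2\epsilon^2} \ln \frac{2}{\delta}$, which is a reconstruction from the damaged text; the function name and the example values of epsilon and delta are illustrative and do not come from the cited work.

```python
import math


def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Smallest integer |s| satisfying |s| >= ln(2/delta) / (2 * epsilon^2).

    epsilon: tolerated absolute error in the estimated support (assumed form).
    delta:   tolerated probability of exceeding that error.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Illustrative values only: 1% error with 1% failure probability.
    print(chernoff_sample_size(epsilon=0.01, delta=0.01))
```

Under this form of the bound the required sample size does not depend on the database size, only on the two tolerances.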
....distribution. The width of the confidence interval is $Z_{\alpha/2} \sqrt{\sigma_x (|D| - \sigma_x) / m}$. By their assumption, this width is no more than $2|D|\, Z_{\alpha/2} \sqrt{s(1-s)/m}$ (2.8), where s is the support threshold. So, if we want the width of the interval to be less than $|D| \times s / 5$, as quoted from [36], the following inequality can be established: $2|D|\, Z_{\alpha/2} \sqrt{s(1-s)/m} \le |D| \times s / 5$ (2.9). Thus m, the sample size, can be found. From this size, we can get the sample from the new database, which is part of the DELI algorithm. What can we do after getting the sample size? There are ....
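Solving the inequality in this excerpt for m gives a closed form. The sketch below assumes the inequality reads $2|D|\, Z_{\alpha/2} \sqrt{s(1-s)/m} \le |D| \times s / 5$, a reconstruction from the damaged text in which the divisor 5 in particular is uncertain. Dividing both sides by $|D|$ and squaring yields $m \ge (2 \cdot 5 \cdot Z_{\alpha/2})^2 (1-s)/s$. The function name, the `divisor` parameter, and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_interval_width(s: float, alpha: float = 0.05,
                                   divisor: float = 5.0) -> int:
    """Smallest integer m with 2 * z * sqrt(s * (1 - s) / m) <= s / divisor.

    s:       support threshold as a fraction of the database.
    alpha:   significance level; z is the (1 - alpha/2) normal quantile.
    divisor: assumed value 5 from the reconstructed inequality.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if divisor <= 0.0:
        raise ValueError("divisor must be positive")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # |D| cancels on both sides, so the result is independent of database size.
    return math.ceil((2.0 * divisor * z) ** 2 * (1.0 - s) / s)


if __name__ == "__main__":
    # Illustrative values only: 2% support threshold, 95% confidence.
    print(sample_size_for_interval_width(s=0.02, alpha=0.05))
```

Because $|D|$ cancels, lower support thresholds are what drive the sample size up under this reading.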
S.D. Lee, David W. Cheung, and Ben Kao. Is sampling useful in data mining? a case in the maintenance of discovered association rules. Technical report, Department of Computer Science, the University of Hong Kong, 1996.
....[15] • Dynamic nature of some large databases will affect discovered knowledge. For example, new data may be added from time to time and as a result, some existing knowledge would become invalid. The existing knowledge must be updated and the whole database must be used to obtain new knowledge [22]. • Association rule algorithms require multiple passes over the whole database, and subsequently, database size is by far the most influential factor [26]. 4 Approaches for Scaling up data mining As discussed above, the need for scaling up data mining is a natural consequence of the meager ....
....loci, all of the alleles in the base collection were captured. 5.2 Sampling for mining association rules Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. The work of Lee et al. [22] and Zaki et al. [29] clearly shows that sampling of transactions from the large database is an effective method for finding association rules. Zaki et al. [29] used random sampling to find association rules. They empirically tested on three large databases and convincingly showed the usefulness of ....
S.D. Lee, D.W. Cheung, and B. Kao. Is sampling useful in data mining? a case in the maintenance of discovered association rules. Data Mining and Knowledge Discovery, 2(3):233--262, 1998.
No context found.
S. D. Lee, David W. Cheung, and Ben Kao. Is sampling useful in data mining? a case in the maintenance of discovered association rules. Data Mining and Knowledge Discovery, 2(3):233--262, 1998.
No context found.
Lee, S.D., Cheung, D.W., Kao, B.: Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules, Data Mining and Knowledge Discovery, 1998, pp. 233-262.