C. Glymour, D. Madigan, D. Pregibon, and P. Smyth. Statistical Inference and Data Mining. Communications of the ACM, 39(11):35--41, 1996.
....dealt with in the proceedings of the International Conference on Knowledge Discovery and Data Mining series (the two most recent proceedings being [12] and [1]) and the journal Data Mining and Knowledge Discovery. Papers discussing the relationship between statistics and data analysis include [8], [4] and [10] 5. ....
Glymour C., Madigan D., Pregibon D., and Smyth P. (1996) Statistical inference and data mining. Communications of the ACM, 39, 35-41.
....(OLAP) [GBLP96, HRU96, LS97, Sho97, CD97]. The real challenge of this sort of data is caused by the rather intricate semantics of summary values, which classical database systems do not handle. The relationships between OLAP and data mining have also been investigated by several authors [GMPS96, Han97, Han98]. The research on statistical databases has been mainly concerned with conceptual modeling. The focus is on the macro data obtained by grouping and aggregating the original micro data. In SDBs, it is generally assumed that the micro data are not available, both for efficiency reasons (the ....
C. Glymour, D. Madigan, D. Pregibon, and P. Smyth. Statistical inference and data mining. Communications of the ACM, 39(11):35-- 41, 1996.
....$= 0 \;\; \forall u \in T$, hence equivalently $\phi(S) = \frac{1}{2} \sum_{u,v \in S} w_u w_v \phi_{u,v}$. We will also assume throughout that $\phi$ is nonnegative. For general references in the field of clustering see [10, 38, 33, 69, 27, 36, 53, 54, 2, 7]; for discussions of a variety of interesting methods and application areas see [24, 68, 65, 58, 60, 67, 48, 26, 46]. A key role in our method is played by a random sampling process which, given $T$, picks a very small weighted collection of points. We show that for a range of cost functions, the cost of this collection is with high probability close to that of the original collection $T$. Moreover in the case $\phi$ ....
C. Glymour, D. Madigan, D. Pregibon, and P. Smyth. Statistical inference and data mining. Communications of the ACM, 39(11), November 1996.
....in particular statistical methods, for the evaluation of hypotheses in that space. Moreover, that similarity goes even further, covering also the main kinds of statistical methods employed for the evaluation, namely statistical hypotheses testing, most often in the context of contingency tables [5, 9, 31, 32, 55, 56]. Scope. GUHA relates, in particular, to mining association rules. Indeed, if $A = \{A_1, \dots, A_m\}$ is the set of binary attributes in a database of size $k$, and if $X, Y \subset A$, $X \cap Y = \emptyset$, then the association rule $X \Rightarrow Y$ is significant in the database (according to [1, 2, 27, 28, 40, 47, 54]) if ....
Glymour, C., Madigan, D., Pregibon, D., and Smyth, P. Statistical inference and data mining. Communications of the ACM 39 (1996), 35--41.
....missing, biased or not applicable? Is the system able to reason with noisy data, or must the data be cleaned? 2.2.2 Consistency Does the system discover inconsistency? Is it able to reason with inconsistency? Is the system trying to hide uncertainty or is it actively using and revealing it [4]? Will the system discover latent attributes? 2.2.3 Prior Knowledge Do we have any prior knowledge of the system that is to be analyzed? Is it in the form of a data dictionary or domain knowledge? Is it extracted automatically? Is the method using metadata to eliminate relations between data that ....
Clark Glymour et al. Statistical inference and data mining. Communications of the ACM, 39(11), 1996.
....a bank for accounting purposes). Thus, issues such as experimental design (the construction of an experiment to collect data to test a specific hypothesis) are not typically within the vocabulary or tool set of a data miner. For other general discussions on statistical aspects of data mining see [EP96, GMPS96, GMPS97, HPS97, Han98, Lam00, Smy00]. 3. A Reductionist View of Data Mining Let us consider a very high level view of data mining and try to reduce a generic data mining algorithm into its component parts. The particular reductionist viewpoint proposed here is not necessarily unique, but it nonetheless does provide some insight ....
Glymour C., Madigan D., Pregibon D., Smyth P. (1996) Statistical inference and data mining, Communications of the ACM, 39(11), 35--41.
No context found.
C. Glymour, D. Madigan, D. Pregibon, and P. Smyth. Statistical Inference and Data Mining. Communications of the ACM, 39(11):35--41, 1996.
No context found.
Glymour C., Madigan D., Pregibon D., Smyth P., "Statistical Inference and Data Mining", in CACM v39 (11), 1996, pp. 35-41.
No context found.
C. Glymour et al., "Statistical Inference and Data Mining," Comm. ACM, Vol. 39, No. 11, Nov. 1996, pp. 35--41.