Hosking J, Pednault E, Sudan M (1997) A statistical perspective on data mining. Future Gener Comput Syst 13(2):117-134
....a bank for accounting purposes). Thus, issues such as experimental design (the construction of an experiment to collect data to test a specific hypothesis) are not typically within the vocabulary or tool set of a data miner. For other general discussions on statistical aspects of data mining, see [EP96, GMPS96, GMPS97, HPS97, Han98, Lam00, Smy00]. 3. A Reductionist View of Data Mining. Let us consider a very high-level view of data mining and try to reduce a generic data mining algorithm into its component parts. The particular reductionist viewpoint proposed here is not necessarily unique, but it nonetheless does provide some insight ....
Hosking, J. R. M., Pednault, E. P. D., and Sudan, M. (1997) A statistical perspective on data mining, Future Generation Computer Systems, 13, 117-134.
.... and NSD CLARANS (non-spatial dominant algorithm) (Ester et al. 1995). In statistical methods, the focus is on exploiting statistical approaches (probability distributions, hypothesis testing, model estimation and scoring) for performing the mining task of extracting discriminators from a data set (Hosking et al. 1997). Statistical techniques applied for extracting categories are based on supervised and unsupervised learning, cluster analysis, and related methods. We are particularly interested in unsupervised methods that can be used to uncover unknown spatiotemporal patterns in large data sets. One potentially ....
Hosking, J.R.M., Pednault, E.P.D. and Sudan, M., 1997, A statistical perspective on data mining. Future Generation Computer Systems, 13, 117-134.
....the same assumption. The ability to ascertain whether the model is truly applicable to new data seems to be beyond the scope of current research efforts. Section 3.1 below is a brief summary of the description of these theoretical developments originally written from the statistical perspective in [11] by Hosking, Pednault and Sudan. The Support Vector Machine, discussed in section 3.2, is gaining increasing attention from researchers as a promising and practical predictive modeling approach based on the statistical learning theory. For further details on these theoretical advances, see ....
....Hosking, Pednault and Sudan. The Support Vector Machine, discussed in section 3.2, is gaining increasing attention from researchers as a promising and practical predictive modeling approach based on the statistical learning theory. For further details on these theoretical advances, see [11-14]). 3.1 Computational and statistical learning theory. A model generation process can be viewed as selecting a best possible model from a given family of models, i.e. functions that map input feature space to the target variable. The models summarize data, examples of input-output combinations, ....
Hosking J.R.M., Pednault E.P.D. & Sudan M., "A Statistical Perspective on Data Mining", Future Generation Computer Systems: Special issue on Data Mining, Vol. 13, Nos. 2-3, pp. 117-134, 1997.
....so does the true error rate, until one reaches a minimum. Beyond that, as the cost complexity decreases, the true error starts increasing again. Obviously, one chooses the subtree corresponding to the minimum true error rate as the final pruned version. This is similar to the process described in [13]. In contrast to the cost complexity pruning of CART, in which the true error rate of a tree and its subtrees is predicted from a separate set of examples that are distinct from the training examples, C4.5 uses a significance test that compares a parent node to its children. Starting with a fully ....
J. Hosking, E. Pednault, and M. Sudan. A Statistical Perspective on Data Mining. 1997. in this issue.
....in predictive modeling products and services. ProbE might best be described as an extensible, embeddable, and scalable segmentation-based modeling engine. The design of ProbE has been motivated by recent advances in integrating statistics and learning techniques with data management [6, 8, 11]. ProbE's application programming interfaces (APIs) are particularly well suited for implementing segmentation-based modeling techniques, wherein data records are partitioned into segments and separate predictive models are developed for each segment. At the time the ATM SE solution was ....
Hosking, J., E. Pednault, and M. Sudan, A Statistical Perspective on Data Mining. Future Generation Computer Systems, 1997. 13(2-3): p. 117-134.
.... Figure 2: Example of sensitivity analysis of a rule. These challenges have motivated our own research (Hosking et al. 1997; Pednault 1998) and have led to the development of the ProbE (Probabilistic Estimation) predictive modeling class library. This C library embodies several innovations that address the challenges posed by insurance data. The algorithms are able to construct rigorous rule-based models of ....
J. Hosking, E. Pednault, and M. Sudan. A Statistical Perspective on Data Mining. Future Generation Computer Systems, November 1997.
No context found.