S.M. Weiss, R.S. Galen, and P.V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47-71, 1990.
....interested in how the accuracy is affected by partitioning these small data sets into N disjoint subsets, learning decision trees on the subsets, generating rules from the decision trees and then merging the rules into a final set of rules. Our experiments were done using 10 fold cross validation [Weiss et al. 1990]. For an individual data set and a given number of disjoint subsets, N, 10 partitions of the data were made, each consisting of 90% of the train data with a unique 10% held back for testing. From each fold, N disjoint subsets are created. C4.5 is applied to each of the N subsets and rules are ....
S.M. Weiss, R.S. Galen, and P.V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47-71, 1990.
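The partitioning scheme described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the function name `ten_fold_partitions`, the shuffling seed, and the round-robin way of dealing indices into folds and subsets are all assumptions. Learning trees with C4.5 and merging the rules are left out.

```python
import random


def ten_fold_partitions(n_examples, n_subsets, n_folds=10, seed=0):
    """Yield (train_subsets, test_indices) once per fold.

    Each fold holds back a unique 1/n_folds share of the examples for
    testing; the remaining training indices are dealt into n_subsets
    disjoint subsets. (Hypothetical helper, for illustration only.)
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Split the training part of this fold into N disjoint subsets.
        subsets = [train[s::n_subsets] for s in range(n_subsets)]
        yield subsets, test
```

A learner would then be trained on each of the N subsets of a fold and the merged rules evaluated on that fold's held-back test indices.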
....produced accurate predictions for 78.08% of infants using the features listed in Appendix II. The predictions for the remaining 21.92% of patients involved probability. The testing was done using a tenfold cross validation scheme proposed in [27] that is widely used in data mining studies, e.g. [28]. The data set was broken into ten partitions, each consisting of 90% of the training data and 10% test examples. Rules were extracted from each of the ten training partitions and applied to each of the ten training sets. The ultimate success of accurate predictions is in the appropriate ....
....1. IF (Wire SNRT = 42) AND (HFP BDG = HFP) THEN (Post Op Arrhy1 = IART) [1, 50.00%, 100.00%] [11] Rule 2. IF (HFP BDG = BDG) AND (Age at Fontan = 1.75) THEN (Post Op Arrhy1 = IART) [1, 50.00%, 100.00%] [21] Rule 3. IF (Age at Fontan = 1.6275) THEN (Post Op Arrhy1 = None) [33, 54.10%, 100.00%] [4, 5, 6, 8, 9, 10, 17, 22, 23, 25, 26, 28, 29, 31, 33, 34, 35, 36, 38, 40, 41, 43, 44, 45, 49, 53, 54, 55, 58, 59, 61, 62, 63] Rule 4. IF (PreCategory = None) THEN (Post Op Arrhy1 = None) [29, 29, 47.54%, 100.00%] [2, 3, 6, 8, 10, 12, 14, 15, 19, 22, 23, 24, 26, 27, 30, 35, 36, 37, 42, 45, 46, 47, 50, 52, 53, 56, 59, 60, 63] Rule 5. IF (Wire TSACT = 106) THEN (Post Op Arrhy1 = None) [30, 49.18%, 100.00%] [1, 2, ....
S. Weiss, R. Galen, and P. Tadepalli, "Maximizing the predictive value of production rules," Artif. Intell., vol. 45, pp. 47--71, 1990.
....], [Thrun et al. 1991], [Kirkwood et al. 1989], [Tsaptsinos et al. 1990], [Spikovska and Reid, 1990], [Atlas et al. 1991], and [Gorman and Sejnowski, 1988]. The comparisons by [Weiss and Kapouleas, 1989; Weiss and Kulikowski, 1991] found that symbolic algorithms (PVM [Weiss et al. 1990] and CART [Breiman et al. 1984]) were more accurate (on most datasets) than back propagation or a number of statistical algorithms (e.g. linear and logistic discriminants). In similar trials involving ID3 [Quinlan, 1986], a perceptron, and back propagation [Mooney et al. 1989, Shavlik ....
S. M. Weiss, R. S. Galen, and P. V. Tadepalli. 1990. Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47--71.
....pruning, using an information theory based measure of interestingness. The Brute programs (Riddle, Segal, & Etzioni, 1994; Segal & Etzioni, 1994) perform (highly optimized) gat rule space searches using Laplace corrected confidence and complexity bounds to define interestingness. Weiss, et al. (Weiss, Galen, & Tadepalli, 1990) describe a gat rule learning search program (PVM) and several pruning heuristics for maximizing predictive value. The RL programs (Clearwater & Provost, 1990; Provost & Buchanan, 1995; Fawcett & Provost, 1997) perform gat rule space search, and have been used with a variety of interestingness ....
Weiss, S. M., Galen, R. S., & Tadepalli, P. V. (1990). Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47-71.
....learning algorithms that provide satisfactory solutions for real-world learning problems. There are a lot of experimental results regarding the performance of various heuristic learning algorithms on a number of benchmark datasets for real world classification problems ([BN], [Ho], [Ma], [WK90], [WGT], [WK91]) 3 Department of Computer Science, Princeton University, 35 Olden St. Princeton, NJ 08540, USA, e-mail: dpd@cs.princeton.edu. The research work of this author was supported by NSF Grant CCR93 01254 and by DIMACS, an NSF Science and Technology Center. y Department of Computer ....
S.M. Weiss, R. Galen and P.V. Tadepalli, Maximizing the predictive value of production rules. Art. Int. 45 (1990), 47-71.
....very simple hypothesis, for example all possible hypotheses that test on a single attribute and our goal is to select one of them. Simple hypotheses have been studied before by several researchers and it has been reported that in some cases they can achieve surprisingly high accuracy (see, e.g. [128, 65, 66, 14]). Moreover, with the new discovery of voting methods like boosting [49], bagging [25], or error correcting output codes [34], several of these hypotheses can be combined in a way that the overall precision becomes extremely high. Another situation where this problem appears is the following. ....
Sholom M. Weiss, Robert S. Galen, and Prasad V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45(1-2):47--71, 1990.
.... to the search based data mining program MetaDENDRAL (Buchanan, Smith, White, Gritter, Feigenbaum, Lederberg, and Djerassi 1976; Buchanan and Feigenbaum 1978). Examples of MetaDENDRAL style rule learning include the Brute programs (Riddle, Segal, and Etzioni 1994; Segal and Etzioni 1994a), PVM (Weiss, Galen, and Tadepalli 1990), ITRULE (Smyth and Goodman 1992), the RL programs (Clearwater and Provost 1990; Provost and Buchanan 1995; Fawcett and Provost 1997), SE trees (Rymon 1993), and even Schlimmer's determination learning algorithm (Schlimmer 1993). These programs view rule learning as an explicit search of the rule ....
Weiss, S. M., R. S. Galen, and P. V. Tadepalli (1990). Maximizing the predictive value of production rules. Artificial Intelligence 45, 47--71.
....We also assume that the size of the data available is huge, and thus, it is very inefficient to use all examples in the dataset. Simple hypotheses have been studied before by several researchers and it has been reported that in some cases they can achieve surprisingly high accuracy (see, e.g. [9, 5, 1]). Moreover, with the new discovery of voting methods like boosting [4], bagging [2], or error correcting output codes [3], several of these hypotheses can be combined in a way that the overall precision becomes extremely high. Perhaps the paper by Holte [5] best exemplifies our problem. In that ....
S.M. Weiss, R.S. Galen, and P.V. Tadepalli. Maximizing the Predictive Value of Production Rules. Artificial Intelligence, 45, 47--71, 1990.
....of learning algorithms that provide satisfactory solutions for real world learning problems. There are a lot of experimental results regarding the performance of various heuristic learning algorithms on a number of benchmark datasets for real world classification problems ([Min], [WK90], [WGT], [WK91], [BN], [Ho]), but the set of learning algorithms that are examined in these applications is virtually disjoint from the set of learning algorithms that are traditionally considered in computational learning theory. Haussler in [Hau] provided an important link between theoretical and ....
....A minimizes the empirical error for the point set shown (circles are labeled 1, and squares are labeled 0; A fails to predict the filled point). Simple hypotheses classes of this type have turned out to be quite interesting from the point of view of applied machine learning. Weiss et al. ([WK90], [WGT], [WK91]) have shown through experiments that for many of the standard benchmark datasets a short rule that depends on only two of the attributes, and which is a boolean combination of expressions of the form a j c or a j = c , provides the best available prediction rule. For example it is ....
S.M. Weiss, R. Galen and P.V. Tadepalli, Maximizing the predictive value of production rules. Art. Int. 45 (1990), 47-71.
....machine learning concrete datasets from quite diverse application domains are viewed as prototypes for real world classification problems. The performance of many practical learning algorithms on these datasets is described in a number of interesting comparative studies (see e.g. [Min89], [WGT90], [WK90], [WK91], [BN92], [Hol93]). Each dataset is a list of items (typically between a few dozen and several thousand), each item consisting of n attribute values (typically n 40) and an associated classification. The attributes might be continuous, ranging over R, or categorical, ranging over ....
....algorithm. C4.5's hypothesis class includes decision trees of arbitrary depth, but with only binary splits on continuous attributes. Of the fifteen datasets used in the experiments nine (BC,CH,G2,HD,HE,IR,LA,SE,SO) have already been used in [Hol93] and six are new. AP, the appendicitis dataset in [WGT90], was kindly supplied by S. Weiss of Rutgers University. The other five new datasets were obtained from the UCI repository. Table 1 gives the datasets' main characteristics. Size is the total number of examples in the dataset. Classes is the number of classes in the dataset. Baseline ....
S. M. Weiss, R. Galen, P. V. Tadepalli, Maximizing the predictive value of production rules, Art. Int., vol. 45, 1990, 47-71.
....rectangle. Here we look at the minimizing disagreement problem when the class of hypotheses is the set R of axis aligned, but otherwise arbitrary, rectangles (Fig. 3). Simple hypotheses classes of this type have turned out to be quite interesting in applied machine learning. Weiss et al. ([WK90], [WGT], [WK91]) have shown through experiments that for many of the standard benchmark datasets a short rule that depends on only two of the attributes, and which is a boolean combination of expressions of the form a j c or a j = c , provides the best available prediction rule. For example it is ....
S.M. Weiss, R. Galen and P.V. Tadepalli, Maximizing the predictive value of production rules. Art. Int. 45 (1990), 47-71.
....categories [7, 19]. We have found several business applications where categorizing text would aid its use, routing, or analysis. These texts often reside in large databases supporting boolean queries [29, pages 231-236], a restricted version of propositional logic. Because decision rules [27, 34] can be converted into this form (unlike probabilistic models requiring arithmetic), they make a good choice for the final classifier. Another important advantage is that they can be more comprehensible to humans than decision trees [4]. Our databases contain hundreds of thousands of unlabeled ....
Sholom M. Weiss, Robert S. Galen, and Prasad V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45(1--2):47--71, September 1990.
....as an efficiency optimization, not as a means of improving the classification accuracy of the algorithm. CARPER does not share Gold digger's iterative structure or the max accuracy function. One of the first systems to apply massive search to learning is Weiss, Galen, and Tadepalli's PVM system (Weiss et al. 1990). PVM uses a beam search and pruning heuristics to do a semi-exhaustive search for short classification rules. PVM differs from Brute in that PVM is designed to find classification rules rather than predictive rules and because its search is not complete. Since PVM is interested in classification ....
Weiss, S. M., Galen, R. S., and Tadepalli, P. V., 1990. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47--71.
....regressor, C shows an example of a split, D is an example of a discrete regressor and E is a leaf. 37.2.2 Comparison to CART The system is similar in concept to recursive partitioning systems such as CART [4] and uses predictive maximization techniques similar to those used in the PVM system [5]. The main differences between the CART system and the TSIR system are that in CART Tree Structured Interpretable Regression 389 D C B A If: No parents had heart disease subtract 5 One parent had heart disease add 5 Both parents had heart disease add 10 For each exercise activity per week ....
Sholom M. Weiss, Robert S. Galen, and Prasad V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47--71, 1990.
....each of which lists one of these types of example case. Rule application outcome display Most evaluation of classification rules developed by machine learning systems concentrates on predictive accuracy. However, this is but one of the useful measures by which such a system might be assessed [19]. Different measures of the quality of a classification system will be more appropriate for different contexts. In many contexts predictive accuracy will not be the relevant criterion. If we are to provide useful tools for presenting aggregate summaries of classifier performance on a body of ....
S. M. Weiss, R. S. Galen and P. Tadepalli, Maximizing the predictive value of production rules, Artificial Intelligence 45 (1990) 47-71.
....We also assume that the size of the data available is huge, and thus, it is very inefficient to use all examples in the dataset. Simple hypotheses have been studied before by several researchers and it has been reported that in some cases they can achieve surprisingly high accuracy (see, e.g. [9, 5, 1]). Moreover, with the new discovery of voting methods like boosting [4], bagging [2], or error correcting output codes [3], several of these hypotheses can be combined in a way that the overall precision becomes extremely high. Perhaps the paper by Holte [5] best exemplifies our problem. In that ....
S.M. Weiss, R.S. Galen, and P.V. Tadepalli. Maximizing the Predictive Value of Production Rules. Artificial Intelligence, 45, 47--71, 1990.
.... 1987; Pearl, 1978). However, these amount to no more than proofs that there are few simple hypotheses and that the probability that one of any such small selection of hypotheses will fit the data is low (Dietterich, 1994). A further issue is that the objectives of machine learning may vary greatly (Weiss, Galen, & Tadepalli, 1990). It seems surprising that a single secondary learning bias should always be appropriate irrespective of whether one is seeking to maximise sensitivity, specificity, positive predictive value, negative predictive value or accuracy. It would appear more likely that a plausible framework for ....
Weiss, S. M., Galen, R. S., & Tadepalli, P. (1990). Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47-71.
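The five objectives named in the excerpt above have standard definitions in terms of a binary confusion matrix. The sketch below is only an illustration of those textbook definitions; the function name `rule_metrics` and the choice to return NaN for an empty denominator are assumptions, not something taken from the cited or citing papers.

```python
def rule_metrics(tp, fp, tn, fn):
    """Return the five measures from true/false positive/negative counts.

    Hypothetical helper for illustration; an undefined ratio (zero
    denominator) is reported as NaN rather than raising an error.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        # Share of actual positives that the rule flags.
        "sensitivity": ratio(tp, tp + fn),
        # Share of actual negatives that the rule leaves unflagged.
        "specificity": ratio(tn, tn + fp),
        # Share of flagged cases that are actually positive.
        "positive_predictive_value": ratio(tp, tp + fp),
        # Share of unflagged cases that are actually negative.
        "negative_predictive_value": ratio(tn, tn + fn),
        # Share of all cases decided correctly.
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

Because the same confusion matrix can score well on one measure and poorly on another, a learner tuned for one of these objectives need not be the best choice for the others.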
....in the given databases is complete. There are various sources of noise including missing values in real world databases. To produce acceptable results for realistic applications, noise handling facilities are often essential in induction algorithms (Kibler & Aha, 1989; Schlimmer & Granger, 1989; Weiss, Galen, & Tadepalli, 1990; Wu, Krisár, & Måhlén, 1995). • Numerical data and symbolic data are equally important in practical applications. Existing data mining algorithms can generally be divided into two groups: numerical methods, including statistical methods and neural networks, which are good at processing ....
Weiss, S., Galen, R., & Tadepalli, P. (1990). Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47--71.
....search of the space of conjunctive rules. The time complexity of the restricted BruteDL is asymptotically faster than k-DL by a factor of n. Furthermore, the unrestricted BruteDL is more general because it works for noisy domains, probabilistic concepts, and concepts not in k-DL. PVM [Weiss et al. 90] does a massive search of the space of classifiers. PVM's search is not exhaustive because it uses several heuristics to reduce the search space. Even with heuristics, the doubly exponential search space explored by PVM limits it to considering classifiers that are significantly smaller than ....
Weiss, S. M., Galen, R. S., and Tadepalli, P. V. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47--71, September 1990.
....any improvement can be made before adding a new test. The following steps are taken to form the single best rule: (a) Make the single best swap from among all possible rule component swaps, including deleting a component. (b) If no swap is found, add the single best component to the rule. As in [Weiss et al. 1990], best is evaluated as predictive value, i.e. percentage correct decisions by the rule. For equal predictive values, maximum case coverage is a secondary criterion. Swapping and component addition terminate when 100% predictive value is reached. The process of generating the single best rule ....
S. Weiss, R. Galen, and P. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47--71, 1990.
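Steps (a) and (b) in the excerpt above can be sketched as a small local search. This is a rough illustration under several assumptions that the excerpt does not state: rule components are equality tests `(attribute, value)`, cases are dictionaries, predictive value is the share of covered cases that belong to the target class, a move is accepted only if it strictly improves (predictive value, coverage), and a step cap guards the loop. All function names are hypothetical.

```python
def covers(rule, case):
    """True if the case satisfies every (attribute, value) test in the rule."""
    return all(case.get(attr) == value for attr, value in rule)


def score(rule, cases, labels, target):
    """Return (predictive value, coverage); ties on the first are broken by the second."""
    covered = [lab for case, lab in zip(cases, labels) if covers(rule, case)]
    if not covered:
        return (0.0, 0)
    correct = sum(1 for lab in covered if lab == target)
    return (correct / len(covered), len(covered))


def best_single_rule(cases, labels, target, max_steps=100):
    """Greedy swap-then-add search for one conjunctive rule (illustrative only)."""
    def evaluate(rule):
        return score(rule, cases, labels, target)

    components = sorted({(a, v) for case in cases for a, v in case.items()}, key=repr)
    rule, current = [], evaluate([])
    for _ in range(max_steps):
        if current[0] == 1.0:  # stop at 100% predictive value
            break
        # (a) All single swaps: replace one component, or delete it.
        neighbours = []
        for i in range(len(rule)):
            rest = rule[:i] + rule[i + 1:]
            if rest:
                neighbours.append(rest)
            neighbours.extend(rest[:i] + [c] + rest[i:] for c in components if c not in rule)
        best = max(neighbours, key=evaluate, default=None)
        if best is None or evaluate(best) <= current:
            # (b) No improving swap: try adding the single best component.
            additions = [rule + [c] for c in components if c not in rule]
            best = max(additions, key=evaluate, default=None)
            if best is None or evaluate(best) <= current:
                break  # neither a swap nor an addition improves the rule
        rule, current = best, evaluate(best)
    return rule, current
```

Comparing the `(predictive value, coverage)` tuples lexicographically is one simple way to make case coverage the secondary criterion.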
No context found.
S. M. Weiss, R. S. Galen, and P. V. Tadepalli. Maximizing the predictive value of production rules. Artificial Intelligence, 45:47--71, September 1990.