Kohavi R, Sommerfield D. Data mining using MLC++: a machine learning library in C++. IEEE Computer Society Press; 1996; p. 234-45.
....kinds of algorithms and/or hybrid methods to a given data set. The most common are toolbox architectures where several algorithms are collected into a package, from which the most suitable algorithm for the target problem is somehow chosen. An example is the Machine Learning Library (MLC++) [1], which is designed to provide researchers with a wide variety of tools that can accelerate algorithm development, increase software reliability, provide comparisons, and display information visually. This is achieved through a library of C++ classes and functions implementing the most common ....
Kohavi, R., Sommerfield D., Dougherty J., "Data Mining using MLC++, a Machine Learning Library in C++", International Journal of Artificial Intelligence Tools, Vol. 6, No. 4, pp. 537-566, 1997.
....each machine. NCV handles the specific effort involved in partitioning the data for cross-validation, partitioning it out to the individual machines, and aggregating and reporting results. This environment is certainly not the first general-purpose machine learning programming environment. MLC++ [7] and Weka [5] are two well-known examples. NCV takes inspiration from these other approaches in providing a straightforward and simple approach for coding up algorithms to run in parallel. While we believe that NCV is quite a useful environment in and of itself, we hope that our architecture might ....
Ron Kohavi, Dan Sommerfield, and James Dougherty. Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 234-245. IEEE Computer Society Press, 1996.
....two of the most popular existing non-commercial learning environments and explains why they do not fully meet our requirements. These two freely available data mining environments are WEKA 1 (Waikato Environment for Knowledge Analysis) [8], developed at University of Waikato, NZ, and MLC++ 2 [4], first developed at Stanford University, CA, USA, and then extended by Silicon Graphics, Inc. (SGI), CA, USA. Weka is a collection of machine learning algorithms implemented in Java. WEKA supports a large number of learning schemes for classification and regression (numeric prediction) like ....
Ron Kohavi, Dan Sommerfield, and James Dougherty. Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 234--245, Los Alamitos, CA, USA, 1996. IEEE Computer Society Press. http://www.sgi.com/tech/mlc/.
....process within the search of rules [18] but, at the time being, MVC does not include this option. Here, experts have done this work by studying the frequency distributions of the attributes and using their knowledge of the domain. Let us note that any segmentation method could be used [9]. After the beginning of this work, physicians noticed the origin of the unknown values: the explanation was a mistake during the moving of the file between two software applications (as a matter of fact, all numbers with a comma have been lost and replaced by a missing value). Nevertheless, we ....
R. Kohavi, D. Sommerfield and J. Dougherty. Data mining using MLC++: A machine learning library in C++, in "Tools with Artificial Intelligence", IEEE Computer Society Press, http://www.sgi.com/Technology/mlc, 1996.
....comparisons between different distance functions for dynamic integration presented in [14] we decided to use the Heterogeneous Euclidean Overlap Metric, which produced good test results earlier. The test environment was implemented within the MLC++ framework (the machine learning library in C++) [10]. Table 2 presents accuracy values for Bagging with different voting methods. The first column includes the name of the corresponding data set. The second column tells how many base classifiers were produced to the committee. The next three columns include the average of the minimum accuracies of ....
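The Heterogeneous Euclidean Overlap Metric named in the excerpt above can be sketched as follows. This is a minimal illustration of the commonly published definition (overlap distance for nominal attributes, range-normalised difference for numeric ones, distance 1 when a value is missing); it is not the citing paper's or MLC++'s code, and the function name `heom` and the `numeric_ranges` parameter are assumptions made for this example.

```python
import math

def heom(a, b, numeric_ranges):
    """Heterogeneous Euclidean Overlap Metric between two records.

    a, b: equal-length sequences of attribute values (None marks a missing value).
    numeric_ranges: hypothetical helper input mapping the index of each numeric
    attribute to its (max - min) range; other attributes are treated as nominal.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            d = 1.0  # a missing value counts as maximally distant
        elif i in numeric_ranges:
            rng = numeric_ranges[i]
            d = abs(x - y) / rng if rng else 0.0  # range-normalised difference
        else:
            d = 0.0 if x == y else 1.0  # overlap metric for nominal values
        total += d * d
    return math.sqrt(total)
```

For example, `heom(("red", 1.0), ("blue", 3.0), {1: 4.0})` combines a nominal mismatch (1) with a numeric difference of 2/4 and returns the square root of 1.25.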
Kohavi, R., Sommerfield, D., Dougherty, J.: Data Mining Using MLC++: A Machine Learning Library in C++. Tools with Artificial Intelligence, IEEE CS Press (1996) 234-245.
....discovered relationships. For example, visualization of results of naive Bayesian classification may help to identify which are important factors that speak for and against diagnosis [175] and a 3-D visualization of a decision tree may assist in tree exploration and increase its transparency [78]. At present, though, not many state-of-the-art data analysis techniques by themselves include a visual interface and support visualization of results, although this may change in the near future with the introduction of integrated data analysis methods. Bayesian networks [124, 164] are an increasingly ....
Kohavi, R., Sommerfield, D. and Dougherty, J., "Data mining using MLC++, a machine learning library in C++," International Journal of Artificial Intelligence Tools, 6(4): 537--566 (1997).
....the comparisons between different distance functions for dynamic integration presented in [16] it was decided to use the Heterogeneous Euclidean Overlap Metric, which produced good test results. The test environment was implemented within the MLC++ framework (the machine learning library in C++) [11]. In the experiments, committee size equal to 5 is set in Bagging and AdaBoost (exactly 5 classifiers are generated and integrated both in Bagging and AdaBoost). Table: Accuracy values for Bagging and AdaBoost decision committees integrated with different approaches. Decision committee: C4.5 ....
Kohavi, R., Sommerfield, D., Dougherty, J.: Data Mining Using MLC++: A Machine Learning Library in C++. Tools with Artificial Intelligence, IEEE CS Press (1996) 234-245.
....of attributes of the data records. The most popular algorithm for DT construction is perhaps Quinlan's ID3 [24] and its improvements, as grouped in the set of programs C4.5 [26]. The more recent DT construction algorithms deal with noise and prune trees according to information-theoretic criteria [26, 20, 16]. Association Rules (ARs) are collected from the decision tree in a very straightforward manner. Every path on a decision tree is a set of attribute values that results in a goal value. This is the form of an association rule. The support and confidence are calculated from the weights of the ....
Ron Kohavi, Dan Sommerfield, and James Dougherty. Data mining using MLC++, a machine learning library in C++. Intern. Jou. Art. Intell. Tools, 6:537--566, 1997.
....The LINUS algorithm can be viewed as a toolkit of learning techniques. These include the decision tree induction system ASSISTANT [5] and two rule induction systems: an ancestor of AQ15, named NEWGEM [36], and CN2 [7, 6]. Recently, LINUS has been upgraded also with an interface to MLC++ [22]. 3.1.1 The algorithm. The LINUS learning algorithm is outlined in Figure 1. Notice that the algorithm is given a type signature t, defining the types of arguments of foreground and background predicates. In general, possible types include predefined atomic types such as integers, reals and ....
R. Kohavi, D. Sommerfield, and J. Dougherty. Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence. IEEE Computer Society Press, 1996. http://www.sgi.com/Technology/mlc.
....are presented in Figure 3. We measure statistical significance of the difference of averages by using the paired differences t-test based on train/test cross-validation splits. In our experiments, we use the C4.5 inductive learning algorithm taken from the machine learning library in C++ (MLC++) [6]. The experiments are implemented within the MLC++ framework. In all the experiments, we use only the Euclidean distance metrics (the standard squared distance metrics) for finding the nearest neighbors. We vary the number of equi-sized subsets of the training data from 2 to 32 ensuring that the ....
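The paired differences t-test mentioned in the excerpt above can be illustrated with a short sketch. It computes only the t statistic from per-split accuracies of two methods evaluated on the same train/test splits; the function name and the sample inputs are assumptions for illustration, and the excerpt does not say how the citing authors implemented the test.

```python
import math

def paired_t_statistic(acc_a, acc_b):
    """t statistic for the mean of per-split accuracy differences.

    acc_a, acc_b: accuracies of two methods on the same splits, in the same order.
    Requires at least two splits and non-identical differences (non-zero variance).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The statistic is then compared against a Student t distribution with n - 1 degrees of freedom, where n is the number of splits.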
Kohavi, R., Sommerfield, D., Dougherty, J.: Data Mining Using MLC++: A Machine Learning Library in C++. Tools with Artificial Intelligence, IEEE CS Press (1996) 234-245
....is used to perform the last classification step. 2.2 Multi classifier construction. We present, at this point, the methodology used in the construction of the multi-classifier. (Here should be Figure 2.) For each single classifier, we have learned the model using the MLC++ library (Kohavi et al. [21]). In order to obtain a training datafile to be used for the Bayesian network structure learning process, we execute a Leaving One Out sequence for each single classifier in which, for each case i in the database we learn the model using the remaining n - 1 cases (all the cases except the i ....
.... Aha et al. [1]. IB4 stores only misclassified instances; it keeps a classification performance record for each saved instance and removes some of the saved instances that are believed to be noisy instances using a significance test. • IB, an inducer developed in the MLC++ project (Kohavi et al. [21]) and based on the works of Aha [2] and Wettschereck [43]. IB is very similar to IB4, which has all the functions of IB3 with an additional attribute weight learning capability. The attribute weights are increased for attributes with similar values for correct classifications or for attributes ....
R. Kohavi, D. Sommerfield and J. Dougherty (1997): "Data mining using MLC++, a Machine Learning Library in C++", International Journal of Artificial Intelligence Tools 6 (4), 537-566. [http://www.sgi.com/Technology/mlc/]
.... , significance at the 0.1 confidence level. Tables 4 and 5 show the average (and its standard deviation) number of features selected by different approaches. Experiments are executed on an SGI Origin 200 computer using the implementation of the Naive Bayes algorithm that appears in the MLC++ [35] software. Regarding these tables, the following observations can be made: • all FSS algorithms help NB to reduce the number of features needed to induce the final models. This dimensionality reduction is coupled with considerable accuracy improvements in all datasets except Anneal, where ....
R. Kohavi, D. Sommerfield and J. Dougherty, `Data mining using MLC++, a Machine Learning Library in C++', International Journal of Artificial Intelligence Tools, 6, 537-566, 1997.
....the parameter Problem Id. 2. Keep one batch for validation and train using all other batches. Repeat this step until each batch has been used for testing once. 3. Get final evaluation from validation results over the different runs. We have implemented this approach using the SAS system and MLC++ [6]. The term LOBO (leave one batch out) has been used to describe a similar evaluation method in [7]. We have been using this method to evaluate the performance of a variety of learning algorithms for different aircraft components. 6.1 Evaluation function. In the previous section, we explained a ....
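Steps 2 and 3 of the excerpt above (hold each batch out once, then combine the validation results) can be sketched as follows. The `train` and `evaluate` callables are hypothetical placeholders supplied by the caller, and averaging the per-batch scores is an assumption: the excerpt does not say how the final evaluation is combined, and the citing authors used SAS and MLC++ rather than Python.

```python
def leave_one_batch_out(batches, train, evaluate):
    """Hold each batch out once, train on the rest, and average the scores.

    batches: dict mapping a batch id to its list of examples.
    train(examples) -> model and evaluate(model, examples) -> float are
    caller-supplied (hypothetical) callables.
    """
    scores = {}
    for held_out, validation in batches.items():
        # Training data is every example from every batch except the held-out one.
        training = [ex for bid, exs in batches.items() if bid != held_out for ex in exs]
        model = train(training)
        scores[held_out] = evaluate(model, validation)
    # Assumption: the final evaluation is the plain mean of the per-batch scores.
    final = sum(scores.values()) / len(scores)
    return final, scores
```

Splitting by batch rather than by individual example keeps all examples from one batch on the same side of the train/validation split.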
....of these parameters could have significantly improved the results. For each of the 16 components, we have been using three different data mining techniques to learn the desired models: decision tree with C4.5 [9], Instance Based (one nearest neighbor), and Naive Bayes, both implemented in MLC++ [6]. The evaluation has been performed using the methods described in Section 6. We used the same reward function for all components. The selected reward function is very similar to the one presented in Figure 4. Table 1 presents the results obtained from this large-scale experimentation. The first ....
R. Kohavi, D. Sommerfield, and J. Dougherty. Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 234--245. IEEE Computer Society Press, 1996. Received the best paper award.
....descriptions of a concept from a set of training examples. Practical data mining tools generally employ a number of inductive learning algorithms. For example, Silicon Graphics' data mining and visualization product MineSet uses MLC++ as a base for the induction and classification algorithms (Kohavi, Sommerfield, and Dougherty, 1996). This work focuses on establishing causal relationships to class labels from values of attributes via a heuristic search that starts with values of individual attributes and continues to consider the double, triple, or further combinations of attribute values in sequence until the example set is ....
Kohavi, R., D. Sommerfield, and J. Dougherty. 1996. Data Mining Using MLC++: A Machine Learning Library in C++. Tools with AI: 234-245.
....consisted of 200 chest radiograph reports. Five different machine learning algorithms, representing different approaches to inductive learning, were used to generate classifiers: MC4, CN2, naive Bayes, IB, and decision tables. The algorithms were interfaced by the MLC++ machine learning library [8]. We studied the effect of each knowledge source (UMLS, NLP) on machine learning performance. There were four different report structures studied with machine learning algorithms: raw text, raw text with UMLS, NLP output, NLP output with UMLS. With raw text, reports were converted to document ....
Kohavi R, Sommerfield D, Dougherty J. Data mining using MLC++: a machine learning library in C++. In: Tools with Artificial Intelligence; IEEE Computer Society Press; 1996.
.... planning and measuring (the measuring methods will be plugged in later) and a data auditing system have been implemented [Sac99, HW00]. On the part of commercial software tools, the Microsoft Repository [Ber97], the rule processor Ilog Rules [Ilo00], and the data mining class library MLC++ [KSD96] are being integrated into the DQMS. Furthermore, the ETL tool Integrity [Val00] with its data migration and scrubbing algorithms is to be integrated as well. Evaluation by means of a real-world application. Both the concepts of the DQMS and their implementation will be evaluated by means of a ....
Kohavi, R., Sommerfield, D., Dougherty, J.: Data Mining using MLC++ -- A Machine Learning Library in C++. Tools with AI 1996, pp. 234--245, 1996.
....at least 250 instances, but for which the accuracy for decision trees was less than 95% (because the ROC curves are difficult to read at very high accuracies). For each domain, we induced classifiers for the minority class (for Road we chose the class Grass). We selected several inducers from MLC++ (Kohavi et al. 1997): a decision tree learner (MC4), Naive Bayes with discretization (NB), k-nearest neighbor for several k values (IBk), and Bagged MC4 (Breiman, 1996). MC4 is similar to C4.5 (Quinlan, 1993); probabilistic predictions are made by using a Laplace correction at the leaves. NB discretizes the data ....
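A Laplace correction at a decision tree leaf, as mentioned in the excerpt above, is often written as (n_c + 1) / (N + k), where n_c is the count of class c at the leaf, N the total count, and k the number of classes. The sketch below assumes this common add-one form; the excerpt does not give the exact formula used by MC4, and the function name is hypothetical.

```python
def laplace_leaf_probabilities(class_counts):
    """Laplace-corrected class probabilities at a leaf: (n_c + 1) / (N + k).

    class_counts: dict mapping every class label (including classes with a
    zero count at this leaf) to the number of training instances at the leaf.
    """
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: (n + 1) / (total + k) for c, n in class_counts.items()}
```

A leaf holding three positive and no negative instances then predicts 0.8 rather than 1.0 for the positive class, which avoids extreme probability estimates from small leaves.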
R. Kohavi, D. Sommerfield, and J. Dougherty. (1997) Data mining using MLC++: A machine learning library in C++. International Journal on Artificial Intelligence Tools, 6(4):537-566. Available: http://www.sgi.com/Technology/mlc.
....and data mining. For example, the Statlog project compared 22 different classification algorithms (including decision tree algorithms, rule algorithms, and neural networks) on 23 datasets [9]. MLC++ has several algorithms and a large comparison was done with 22 algorithms on eight datasets [5]. Lim et al. compared 33 classification algorithms in terms of prediction accuracy, complexity, and training time on 32 datasets [7]. This paper has three main contributions. Firstly, we provide the first objective evaluation and comparison of several well-known association rule algorithms on ....
Kohavi, R., Sommerfield, D., and Dougherty, J. Data mining using MLC++: A machine learning library in C++, International Journal on Artificial Intelligence Tools 6(4), 537-566. http://www.sgi.com/Technology/mlc.
....new algorithms using artificial datasets provided by IBM Almaden. To the authors' best knowledge, there has been no large third-party benchmark in the area of association rule discovery, while several such comparisons have been performed in other related areas of machine learning and data mining [3][6]. This paper has three main contributions. Firstly, we provide the first objective evaluation and comparison of several well-known association rule algorithms on real-world e-commerce and retail datasets. We are also donating one of these e-commerce datasets for use in the research community. ....
Kohavi, R., Sommerfield, D., and Dougherty, J. Data mining using MLC++: A machine learning library in C++, International Journal on Artificial Intelligence Tools 6(4), 537-566. http://www.sgi.com/Technology/mlc.
.... performance of previous classifiers (as in boosting methods) and those that do not (as in Bagging). Algorithms that do not adaptively change the distribution include option decision tree algorithms that construct decision trees with multiple options at some nodes (Buntine 1992b, Buntine 1992a, Kohavi & Kunz 1997); averaging path sets, fanned sets, and extended fanned sets as alternatives to pruning (Oliver & Hand 1995); voting trees using different splitting criteria and human intervention (Kwok & Carter 1990); and error-correcting output codes (Dietterich & Bakiri 1991, Kong & Dietterich 1995). Wolpert ....
....We used four base inducers for our experiments; these came from two families of algorithms: decision trees and Naive Bayes. 3.1. The Decision Tree Inducers. The basic decision tree inducer we used, called MC4 (MLC++ C4.5), is a Top-Down Decision Tree (TDDT) induction algorithm implemented in MLC++ (Kohavi, Sommerfield & Dougherty 1997). The algorithm is similar to C4.5 (Quinlan 1993) with the exception that unknowns are regarded as a separate value. The algorithm grows the decision tree following the standard methodology of choosing the best attribute according to the evaluation criterion (gain ratio). After the tree is grown, ....
[Article contains additional citation context not shown here]
Kohavi, R., Sommerfield, D. & Dougherty, J. (1997), `Data mining using MLC++: A machine learning library in C++', International Journal on Artificial Intelligence Tools 6(4), 537--566.
....first wrapper paper. This chapter incorporates a number of corrections and improvements suggested by readers of our earlier publications on the wrapper method, especially Pat Langley, Nick Littlestone, Nils Nilsson, and Peter Turney. Dan Sommerfield implemented large parts of the wrapper in MLC++ (Kohavi et al. 1996), which was used for all of the experiments. George John's work was supported under a National Science Foundation Graduate Research Fellowship. Most of the research for this paper was completed while the authors were at Stanford University. ....
Kohavi, R., Sommerfield, D., and Dougherty, J. (1996). Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 234--245. IEEE Computer Society Press. Received the best paper award. http://www.sgi.com/Technology/mlc.
....by Bayes rule: $= p(X = x \mid Y = y) \cdot p(Y = y) / p(x)$; since $p(x)$ is the same for all label values: $\propto p(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) \cdot p(Y = y)$; by independence: $= \prod_{i=1}^{n} p(X_i = x_i \mid Y = y) \cdot p(Y = y)$. The version of Naive Bayes we use in our experiments was implemented in MLC++ (Kohavi, Sommerfield & Dougherty 1996). The probabilities for nominal features are estimated from data using maximum likelihood estimation. Continuous features are discretized using a minimum description length procedure described in Dougherty, Kohavi & Sahami (1995) and were thereafter treated as multi-valued nominals. Unknown ....
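The Naive Bayes rule in the excerpt above, with maximum likelihood estimates for nominal features, can be illustrated by the sketch below. It is not the MLC++ implementation: the function names are hypothetical, and discretization of continuous features and handling of unknown values, both mentioned in the excerpt, are left out.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Maximum-likelihood estimates of p(Y=y) and p(X_i=x_i | Y=y), nominal features only."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for row, y in zip(rows, labels):
        for i, x in enumerate(row):
            value_counts[(i, y)][x] += 1
    n = len(labels)
    priors = {y: c / n for y, c in label_counts.items()}

    def likelihood(i, x, y):
        # Relative frequency of value x for feature i among rows labelled y.
        return value_counts[(i, y)][x] / label_counts[y]

    return priors, likelihood

def predict(priors, likelihood, row):
    """Return the label maximising p(Y=y) * prod_i p(X_i=x_i | Y=y)."""
    def score(y):
        s = priors[y]
        for i, x in enumerate(row):
            s *= likelihood(i, x, y)
        return s
    return max(priors, key=score)
```

Because the estimates are plain relative frequencies, a feature value never seen with a label gives that label a score of zero; practical implementations usually smooth the counts to avoid this.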
....like to thank our anonymous reviewers. One reviewer formulated Example 3, which is much better than our original example. Pat Langley, Nick Littlestone, Nils Nilsson, and Peter Turney gave helpful feedback on the ideas and presentation. Dan Sommerfield implemented large parts of the wrapper in MLC++ (Kohavi et al. 1996), and all of the experiments were done using MLC++. George John's work was supported under a National Science Foundation Graduate Research Fellowship. ....
Kohavi, R., Sommerfield, D. & Dougherty, J. (1996), Data mining using MLC++: A machine learning library in C++, in Tools with Artificial Intelligence, IEEE Computer Society Press, p. To Appear. http://www.sgi.com/Technology/mlc.
No context found.
Kohavi R, Sommerfield D. Data mining using MLC++: a machine learning library in C++. IEEE Computer Society Press; 1996; p. 234-45.
No context found.
Kohavi R, Sommerfield D, Dougherty J. Data mining using MLC++: a machine learning library in C++. Tools with Artificial Intelligence, 234-245. 1996.
No context found.
Kohavi, R., Sommerfield, D., and Dougherty, J. Data Mining using MLC++: A Machine Learning Library in C++. Tools with AI '96, 234-245, November 1996.