8 citations found.
Kohavi R, Sommerfield D, & Dougherty J (1997) Data mining using MLC++, a machine learning library in C++. International Journal on Artificial Intelligence Tools, 6(4): 537--566.

This paper is cited in the following contexts:
Multivariate Discretization for Set Mining - Bay (2000)   (1 citation)

....and that MVD generates intervals that are meaningful while still being adaptive to the underlying interactions between variables. For our experiments we again compared MVD with Fayyad and Irani's recursive minimum entropy approach with the MDL stopping criterion (ME-MDL). We used the MLC++ (Kohavi et al., 1997) implementation of this discretizer. Past work has shown that ME-MDL is one of the best methods for classification (Dougherty et al., 1995; Kohavi & Sahami, 1996). We also compared our execution times with Apriori to give an indication of how much time discretization takes relative to the ....

Kohavi R, Sommerfield D, & Dougherty J (1997) Data mining using MLC++, a machine learning library in C++. Journal of Artificial Intelligence Tools, 6: 537--566.


Parcel: Feature Subset Selection in Variable Cost Domains - Scott, Niranjan, Prager (1998)   (18 citations)

....2514 1257 3413 Tree 6000 3000 3392 Table 2.1: The data sets for the seven classification tasks examined in this report; the number of cases in each data set are given. In experiments using the naive Bayes classifier, discrete data sets are required. The discretize software, see Kohavi et al. [60], was used to discretise continuous data. This implements entropy-based discretisation, described in Dougherty et al. [32]. Only default settings were used. For each classification task, the hold-out data set was discretised using parameters estimated with the training and validation data alone, ....

R. Kohavi, D. Sommerfield, and J. Dougherty. Data mining using MLC++, a machine learning library in C++. International Journal of Artificial Intelligence Tools, 6(4):537--566, 1997.


Characterizing Model Performance in the Feature Space - Bay, Pazzani

....between two models). For example, in the Adult Census domain (Blake & Merz, 1998) the goal is to predict if the salary of a person is greater or less than $50,000 from demographic variables. When we evaluate models produced for this task, we would like to obtain rules such as: Classifier MC4 (Kohavi et al., 1997) is 21% less accurate than average on people who are between 45 and 55 years of age, are high school graduates, and are married. This represents 115 misclassified instances. Alternatively, if we are comparing two classifiers on this task: MC4 and naive Bayes are 9% less likely to agree than ....

Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using MLC++, a machine learning library in C++. Journal of Artificial Intelligence Tools, 6, 537--566.


Comprehensible Knowledge Discovery: Gaining Insight from Data - Pazzani

....work provides no guidance on selecting among several models of the data of similar complexity. Some work on increasing the understandability of learned models concerns the construction of tools for visualizing or interactively exploring the results of learning (e.g., the MineSet Tree Visualizer; Kohavi, Sommerfield, & Dougherty, 1996). While these tools provide an excellent means of identifying and exploring what was learned, they do not provide insight unless the underlying learned models make sense to experts. In the remainder of this paper, we show how existing knowledge may be provided to learning systems so that they are ....

Kohavi, R., Sommerfield D., & Dougherty J., (1996). Data Mining using MLC++, a Machine Learning Library in C++. IEEE Tools With Artificial Intelligence.


An Experimental Comparison of Three Methods for Constructing.. - Dietterich (1998)   (130 citations)

....are diverse and yet accurate. If this can be achieved, then highly accurate classification decisions can be obtained by voting the decisions of the individual classifiers in the ensemble. Many authors have demonstrated significant performance improvements through ensemble methods (Breiman, 1996b; Kohavi & Kunz, 1997; Bauer & Kohavi, 1999; Maclin & Opitz, 1997). Two of the most popular techniques for constructing ensembles are bootstrap aggregation ("bagging"; Breiman, 1996a) and the Adaboost family of algorithms ("boosting"; Freund & Schapire, 1996). Both of these methods operate by taking a base learning ....

....than any of the ensemble methods. Figure 1 summarizes the observed differences between randomized C4.5 and bagged C4.5. Figure 2 does the same for randomized C4.5 versus boosted C4.5. These plots are sometimes called "Kohavi plots," because they were introduced by Ronny Kohavi in the MLC++ system (Kohavi, Sommerfield, & Dougherty, 1997). Each point plots the difference in the performance of the two algorithms scaled according to the performance of C4.5 alone. For example, in the sonar task, C4.5 (unpruned) gives an error rate of 0.3257; Randomized C4.5 has an error rate of 0.2018, and Bagged C4.5 has an error rate of 0.2752. ....

Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using MLC++, a machine learning library in C++. International Journal on Artificial Intelligence Tools, 6 (4), 537--566.


E4 - Machine Learning - Domingos

....and reliable learning component for whatever system they are building. This requires developing libraries of standard machine learning components and of ways of putting them together. Despite a number of early developments in this direction (Gilks, Thomas, & Spiegelhalter, 1994; Buntine, 1994; Kohavi, Sommerfield, & Dougherty, 1996), for the most part it is still not clear how best to do this. Deciding what representations and techniques to use is still a black art. The designer's personal preferences and a long trial-and-error process are often what determines the outcome. Many imprecise intuitions and rules of thumb ....

Kohavi, R., Sommerfield, D., & Dougherty, J. (1996). Data mining using MLC++, a machine learning library in C++. International Journal on Artificial Intelligence Tools, 6, 537-566.


Beyond Concise and Colorful: Learning Intelligible Rules - Pazzani, Mani, Shankle (1997)   (16 citations)

No context found.

Kohavi, R., Sommerfield, D., and Dougherty J., 1995. Data Mining using MLC++, a Machine Learning Library in C++. IEEE Tools With Artificial Intelligence.
