Correlation-based feature selection for discrete and numeric class machine learning, 2000
Cited by 88 (1 self)
Algorithms for feature selection fall into two broad categories: wrappers use the learning algorithm itself to evaluate the usefulness of features, while filters evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. Experiments using the new method as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees show it to be an effective feature selector: it reduces the dimensionality of the data by more than sixty percent in most cases without negatively affecting accuracy. Also, decision and model trees built from the preprocessed data are often significantly smaller.
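A minimal sketch of a correlation-based filter, assuming numeric attributes and a numeric class with Pearson correlation as the correlation measure (how the paper handles discrete attributes is not shown). It scores a subset of k features with the usual CFS merit, k * r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature inter-correlation, and grows the subset by greedy forward search. The function names are illustrative, not the paper's.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def merit(subset: List[int], features: List[Sequence[float]], target: Sequence[float]) -> float:
    """CFS-style merit: rewards class correlation, penalises redundancy among features."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features: List[Sequence[float]], target: Sequence[float]) -> List[int]:
    """Greedy forward search: add the feature that most improves merit; stop when none does."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(features)))
    while remaining:
        score, idx = max((merit(selected + [i], features, target), i) for i in remaining)
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected
```

Given two identical, perfectly predictive columns, the search keeps only one copy: the duplicate adds no class correlation but maximal redundancy, so the merit does not increase.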
Experiments in Predicting Biodegradability
- Applied Artificial Intelligence, 1999
Cited by 22 (8 self)
We present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). The activity we want to predict is the biodegradability of chemical compounds in water. In particular, the target variable is the half-life in water for aerobic aqueous biodegradation. Structural descriptions of chemicals in terms of atoms and bonds are derived from the chemicals' SMILES encodings. Definitions of substructures are used as background knowledge. Predicting biodegradability is essentially a regression problem, but we also consider a discretized version of the target variable. We thus employ a number of relational classification and regression methods on the relational representation and compare these to propositional methods applied to different propositionalisations of the problem. Some expert comments on the induced theories are also given.
Corpus-based discourse understanding in spoken dialogue systems
- In Proc. Assoc. for Computational Linguistics (ACL). W. Lewis Johnson, Paola Rizzo, Wauter Bosma, Sander Kole, Mattijs Ghijsen, and Herwin van Welbergen, 2003
Cited by 19 (1 self)
This paper describes a method for creating an evaluation measure for discourse understanding in spoken dialogue systems. No well-established measure has yet been proposed for evaluating discourse understanding, which has made it necessary to evaluate it only on the basis of the system’s total performance. Such evaluations, however, are greatly influenced by task domains and dialogue strategies. To find a measure that enables good estimation of system performance only from discourse understanding results, we enumerated possible discourse-understanding-related metrics and calculated their correlation with the system’s total performance through dialogue experiments.
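The metric-selection step described above (enumerate candidate discourse-understanding metrics, then measure how well each correlates with total system performance across dialogue experiments) can be sketched as a ranking by Pearson correlation. This is an illustration only: the metric names and per-dialogue scores below are hypothetical, and the paper may use a different correlation measure or combine several metrics.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def rank_metrics(metrics: Dict[str, Sequence[float]],
                 performance: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank candidate metrics by the absolute value of their correlation with total performance."""
    scored = [(name, pearson(values, performance)) for name, values in metrics.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-dialogue scores from four dialogue experiments.
metrics = {
    "slot_accuracy": [0.9, 0.7, 0.8, 0.5],
    "turn_count": [3, 5, 4, 5],
}
performance = [0.95, 0.6, 0.8, 0.4]
ranking = rank_metrics(metrics, performance)
```

Ranking by absolute correlation keeps metrics that are strongly but negatively related to performance (such as a turn count) in contention, since they can be just as informative.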
Towards self-configuring hardware for distributed computer systems
- In The Second International Conference on Autonomic Computing, 2005
Cited by 18 (3 self)
High-end servers that can be partitioned into logical subsystems and repartitioned on the fly are now becoming available. This development raises the possibility of reconfiguring distributed systems online to optimize for dynamically changing workloads. This paper presents the initial steps towards a system that can learn to alter its current configuration in reaction to the current workload. In particular, the advantages of shifting CPU and memory resources online are considered. Investigation of a publicly available multi-machine, multi-process distributed system (the online transaction processing benchmark TPC-W) indicates that there is a real performance benefit to reconfiguration in reaction to workload changes. A learning framework is presented that does not require any instrumentation of the middleware, nor any special instrumentation of the operating system; rather, it learns to identify preferable configurations, as well as their quantitative performance effects, from system behavior as reported by standard monitoring tools. Initial results using the WEKA machine learning package suggest that automatic adaptive configuration can provide measurable performance benefits over any fixed configuration.
Is Combining Classifiers Better than Selecting the Best One?
- Machine Learning, 2004
Cited by 13 (0 self)
We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross-validation. We then propose a new method for stacking that uses multi-response model trees at the meta-level, and show that it clearly outperforms existing stacking approaches and selecting the best classifier by cross-validation.
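A rough sketch of stacking with a multi-response meta-learner. Each training instance's meta-level features are the class-probability vectors that the base classifiers predicted for it on held-out folds (supplied directly here). One regression model per class is fitted to the 0/1 indicator of that class, and the predicted class is the one whose model responds most strongly. To stay short, the sketch substitutes ridge-regularised linear regression for the paper's model trees, so it corresponds to multi-response linear regression rather than the proposed method. The ridge constant, the toy data, and all function names are assumptions.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(rows: List[Sequence[float]], targets: List[float], lam: float = 1e-3) -> List[float]:
    """Weights [bias, w1, ...]; the ridge term (also on the bias, for simplicity) is needed because
    each base classifier's probabilities sum to 1, duplicating the bias column."""
    xs = [[1.0] + list(r) for r in rows]
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * t for x, t in zip(xs, targets)) for i in range(d)]
    return solve(xtx, xty)


def fit_mlr_stacker(meta_rows, labels, classes) -> Dict[str, List[float]]:
    """One linear model per class, regressing that class's 0/1 indicator on the meta-level features."""
    return {c: ridge_fit(meta_rows, [1.0 if y == c else 0.0 for y in labels]) for c in classes}


def predict(models: Dict[str, List[float]], meta_row: Sequence[float]) -> str:
    """Predict the class whose regression model gives the largest response."""
    x = [1.0] + list(meta_row)
    return max(models, key=lambda c: sum(w * v for w, v in zip(models[c], x)))


# Toy meta-level rows [P_A(a), P_A(b), P_B(a), P_B(b)] from two hypothetical base
# classifiers: A is usually right, B is usually wrong.
meta = [[0.9, 0.1, 0.2, 0.8], [0.8, 0.2, 0.3, 0.7], [0.7, 0.3, 0.1, 0.9],
        [0.1, 0.9, 0.8, 0.2], [0.2, 0.8, 0.7, 0.3], [0.3, 0.7, 0.9, 0.1]]
labels = ["a", "a", "a", "b", "b", "b"]
models = fit_mlr_stacker(meta, labels, ["a", "b"])
```

The meta-learner learns to trust A and to read B's output inversely, which a simple "select the best base classifier" rule cannot exploit.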
Naive Bayes for regression
- Machine Learning, 2000
Cited by 12 (0 self)
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces “model trees” (decision trees with linear regression functions at the leaves). Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
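The core idea can be sketched as follows. Because p(x_i | y) = p(x_i, y) / p(y), the naive Bayes posterior for m attributes is proportional to p(y)^(1 - m) * prod_i p(x_i, y). The sketch estimates p(y) and each p(x_i, y) with Gaussian kernel density estimators, evaluates the posterior on a grid of candidate target values, and returns the posterior mean. Fixed bandwidths `hx` and `hy`, the grid, and the use of the mean (rather than, say, the median for an absolute-error measure) are simplifying assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Sequence


def gauss(u: float) -> float:
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nb_regress(train_x: List[Sequence[float]], train_y: Sequence[float], query: Sequence[float],
               hx: float = 0.5, hy: float = 0.5, grid_size: int = 200) -> float:
    """Posterior-mean prediction from naive Bayes with kernel density estimates."""
    n, m = len(train_y), len(query)
    lo, hi = min(train_y) - 3 * hy, max(train_y) + 3 * hy
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    log_post = []
    for y in grid:
        # Kernel weight of each training target at this candidate y.
        ky = [gauss((y - yj) / hy) / hy for yj in train_y]
        p_y = sum(ky) / n
        score = (1 - m) * math.log(max(p_y, 1e-300))
        for i in range(m):
            # Joint density p(x_i, y) from a product kernel over (x_i, y).
            p_xy = sum(k * gauss((query[i] - xj[i]) / hx) / hx for k, xj in zip(ky, train_x)) / n
            score += math.log(max(p_xy, 1e-300))  # floor avoids log(0) far from the data
        log_post.append(score)
    top = max(log_post)
    weights = [math.exp(s - top) for s in log_post]  # unnormalised posterior, rescaled for stability
    return sum(w * y for w, y in zip(weights, grid)) / sum(weights)
```

For example, with training pairs x = y = 0, 1, ..., 9 on a single attribute, a query at x = 5 yields a prediction close to 5. The multiplied per-attribute factors are exactly where the independence assumption enters, which is the term the abstract identifies as the culprit.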
Extraction of Rules from Artificial Neural Networks for Nonlinear Regression, 2002
Cited by 10 (0 self)
Neural networks have been successfully applied to a variety of problems, including classification and function approximation. They are especially useful as function approximators because they do not require prior knowledge of the input data distribution and they have been shown to be universal approximators. In many applications, it is desirable to extract knowledge that can explain how the problems are solved by the networks. Most existing approaches have focused on extracting symbolic rules for classification. Few methods have been devised to extract rules from trained neural networks for regression. This article presents an approach for extracting rules from trained neural networks for regression. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. Extensive experimental results on 32 benchmark data sets demonstrate the effectiveness of the proposed approach in generating accurate regression rules.
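To make the rule form concrete, here is a minimal sketch of how such a rule set (each rule a subregion of input space paired with a linear function of the relevant attributes) might be represented and applied. Axis-aligned interval conditions and first-match ordering are assumptions, since the abstract does not specify the shape of the subregions. The coefficients in the example are invented for illustration, not extracted from any network.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Rule:
    # Subregion: attribute index -> (lower, upper) inclusive bounds; unlisted attributes are unconstrained.
    bounds: Dict[int, Tuple[float, float]]
    # Linear function over the relevant attributes: intercept + sum(coef * x[i]).
    intercept: float
    coefs: Dict[int, float]

    def covers(self, x: Sequence[float]) -> bool:
        return all(lo <= x[i] <= hi for i, (lo, hi) in self.bounds.items())

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(c * x[i] for i, c in self.coefs.items())


def apply_rules(rules: List[Rule], x: Sequence[float]) -> Optional[float]:
    """Prediction of the first rule whose subregion contains x, or None if no rule covers it."""
    for rule in rules:
        if rule.covers(x):
            return rule.predict(x)
    return None


# Hypothetical two-rule set: each rule uses a different relevant attribute.
rules = [
    Rule({0: (0.0, 1.0)}, 0.5, {0: 2.0}),
    Rule({0: (1.0, 5.0)}, 2.5, {1: -1.0}),
]
```

Here `apply_rules(rules, [0.5, 3.0])` uses the first rule (0.5 + 2.0 * 0.5), while an input outside every subregion returns `None`, leaving the choice of a default prediction to the caller.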
Generating Rule Sets from Model Trees
- In Proc. of the 12th Australian Joint Conf. on Artificial Intelligence
Cited by 9 (0 self)
Model trees (decision trees with linear models at the leaf nodes) have recently emerged as an accurate method for numeric prediction that produces understandable models. However, it is known that decision lists (ordered sets of If-Then rules) have the potential to be more compact and therefore more understandable than their tree counterparts. We present an algorithm for inducing simple, accurate decision lists from model trees. Model trees are built repeatedly and the best rule is selected at each iteration. This method produces rule sets that are as accurate as, but smaller than, the model tree constructed from the entire dataset. Experimental results for various heuristics that attempt to find a compromise between rule accuracy and rule coverage are reported. We show that our method produces rule sets that are comparable in accuracy to, and smaller than, those of the commercial state-of-the-art rule learning system Cubist.
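The separate-and-conquer loop described above (build a model tree, keep its best leaf as a rule, remove the instances it covers, repeat) can be sketched as follows. Assumptions: the "model tree" is a toy tree with at most one split on a single numeric attribute and a least-squares line in each leaf, and the "best rule" is the leaf with the lowest mean squared error on the instances it covers, ties going to larger coverage. The paper's own tree learner and its accuracy/coverage heuristics are not reproduced here, and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)
Line = Tuple[float, float]   # (a, b) for y = a + b * x
INF = float("inf")


def fit_line(pts: List[Point]) -> Line:
    """Ordinary least-squares line through the points; flat line if x is constant."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    b = sum((x - mx) * (y - my) for x, y in pts) / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def sse(pts: List[Point], line: Line) -> float:
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in pts)


def toy_model_tree(pts: List[Point], min_leaf: int = 3):
    """At most one split on x; each leaf (lo, hi, line, covered) covers lo < x <= hi."""
    whole = fit_line(pts)
    leaves, best_err = [(-INF, INF, whole, pts)], sse(pts, whole)
    xs = sorted({x for x, _ in pts})
    for left_x, right_x in zip(xs, xs[1:]):
        t = (left_x + right_x) / 2
        left = [p for p in pts if p[0] <= t]
        right = [p for p in pts if p[0] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        ll, rl = fit_line(left), fit_line(right)
        err = sse(left, ll) + sse(right, rl)
        if err < best_err:
            best_err = err
            leaves = [(-INF, t, ll, left), (t, INF, rl, right)]
    return leaves


def induce_decision_list(pts: List[Point], min_leaf: int = 3):
    """Repeatedly build a tree, keep its best leaf as a rule, and drop the points it covers."""
    rules, remaining = [], list(pts)
    while remaining:
        leaves = toy_model_tree(remaining, min_leaf)
        # Assumed heuristic: lowest mean squared error on covered points, then larger coverage.
        lo, hi, line, _ = min(leaves, key=lambda lf: (sse(lf[3], lf[2]) / len(lf[3]), -len(lf[3])))
        rules.append((lo, hi, line))
        remaining = [p for p in remaining if not (lo < p[0] <= hi)]
    return rules


def predict(rules, x: float) -> float:
    """The first rule whose interval contains x makes the prediction."""
    for lo, hi, (a, b) in rules:
        if lo < x <= hi:
            return a + b * x
    raise ValueError("no rule covers x")


# Piecewise-linear data: y = 2x for x <= 4, y = 20 - x for x >= 5.
data = [(float(x), float(2 * x if x <= 4 else 20 - x)) for x in range(10)]
rules = induce_decision_list(data)
```

On this data the first tree splits at x = 4.5 and its left leaf becomes the first rule. The second tree, built only on the uncovered points, yields a catch-all default rule, so the final list always ends with a rule covering every input.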
Developing innovative applications in agriculture using data mining, 2001
Cited by 8 (0 self)
... comprehensive suite of facilities for applying data mining techniques to large data sets. This paper discusses a process model for analyzing data, and describes the support that WEKA provides for this model. The domain model ‘learned’ by the data mining algorithm can then be readily incorporated into a software application. This WEKA-based analysis and application construction process is illustrated through a case study in the agricultural domain: mushroom grading.
Inducing polynomial equations for regression
- In
, 2004
Cited by 6 (2 self)
Both equation discovery and regression methods aim at inducing models of numerical data. While equation discovery methods are usually evaluated in terms of the comprehensibility of the induced model, the evaluation of regression methods emphasizes their predictive accuracy. In this paper, we present Ciper, an efficient method for the discovery of polynomial equations, and empirically evaluate its predictive performance on standard regression tasks. The evaluation shows that polynomials compare favorably to linear and piecewise regression models induced by existing state-of-the-art regression methods, in terms of degree of fit and complexity.
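Ciper's own search strategy and heuristic are not described in this abstract. As a rough illustration of inducing a polynomial equation from numerical data, the sketch below substitutes a plain greedy forward search: it repeatedly adds the monomial (up to a chosen total degree) that most reduces the least-squares error, and stops when no addition helps appreciably. The function names, the stopping tolerance, and the `max_terms` limit are assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple

Exps = Tuple[int, ...]  # exponent of each variable in one monomial, e.g. (1, 1) is x1*x2


def monomials(n_vars: int, max_degree: int) -> List[Exps]:
    """All monomials of total degree 1..max_degree (the intercept is always included separately)."""
    return [e for e in product(range(max_degree + 1), repeat=n_vars) if 0 < sum(e) <= max_degree]


def term_value(x: Sequence[float], exps: Exps) -> float:
    v = 1.0
    for xi, e in zip(x, exps):
        v *= xi ** e
    return v


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting; assumes the design has full column rank."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(xs, ys, terms: List[Exps]):
    """Least-squares fit of y = c0 + sum(c_k * term_k(x)) via normal equations; returns (coefs, SSE)."""
    rows = [[1.0] + [term_value(x, t) for t in terms] for x in xs]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    coefs = solve(ata, aty)
    err = sum((y - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, y in zip(rows, ys))
    return coefs, err


def greedy_polynomial(xs, ys, max_degree: int = 2, max_terms: int = 4, tol: float = 1e-9):
    """Add the monomial that most reduces SSE; stop when the gain becomes negligible."""
    terms: List[Exps] = []
    best_err = fit(xs, ys, terms)[1]
    candidates = monomials(len(xs[0]), max_degree)
    while len(terms) < max_terms:
        scored = [(fit(xs, ys, terms + [t])[1], t) for t in candidates if t not in terms]
        if not scored:
            break
        err, t = min(scored)
        if best_err - err <= tol * max(best_err, 1.0):
            break
        terms.append(t)
        best_err = err
    return terms, fit(xs, ys, terms)[0]


# Example: recover y = 1 + 2*x1 + 3*x1*x2 from a 4 x 4 grid of points.
xs = [(float(a), float(b)) for a in range(4) for b in range(4)]
ys = [1 + 2 * a + 3 * a * b for a, b in xs]
terms, coefs = greedy_polynomial(xs, ys)
```

On this grid the search first picks x1*x2, then x1, and stops because no further monomial reduces the error; the returned coefficients are the intercept followed by one coefficient per selected term.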

