Results

**1 - 6** of **6**

### From 'Tree' Based Bayesian Networks To Mutual Information Classifiers: Deriving a Singly Connected Network Classifier Using an Information Theory Based Technique

University of Stirling, 2005

This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that the copyright rests with the author and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author or the University (as may be appropriate).

To my Father William Thomas

For reasoning under uncertainty the Bayesian network has become the representation of choice. However, except where models are considered 'simple', the tasks of construction and inference are provably NP-hard. For modelling larger 'real' world problems this computational complexity has been addressed by methods that approximate the model. The Naive Bayes classifier, which makes strong assumptions of independence among features, is a common approach, whilst the class of trees is another, less extreme example. In this thesis we propose the use of an information theory based technique as a mechanism for inference in Singly Connected Networks. We call this a Mutual Information Measure classifier, as it corresponds to the restricted class of trees built from mutual information. We show that the new approach provides for both an efficient and localised method of
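The abstract above refers to trees built from mutual information. As a rough illustration only (not the thesis's own classifier, whose details are not given here), the pairwise quantity such trees are usually built from can be estimated from counts. The function name `mutual_information` and the toy sequences are assumptions made for this sketch.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Two identical binary features share one full bit of information;
# two features that vary independently share none.
same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In a tree-structured model, pairs of variables with high mutual information are the natural candidates for edges, which is why the estimate is computed pairwise.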

### Data mining of Bayesian networks . . .

This paper describes a novel data mining algorithm that employs cooperative coevolution and a hybrid approach to discover Bayesian networks from data. A Bayesian network is a graphical knowledge representation tool. However, learning Bayesian networks from data is a difficult problem. There are two different approaches to the network learning problem. The first one uses dependency analysis, while the second approach searches for good network structures according to a metric. Unfortunately, the two approaches both have their own drawbacks. Thus, we propose a novel algorithm that combines the characteristics of these approaches to improve learning effectiveness and efficiency. The new learning algorithm consists of the Conditional Independence (CI) test and the search phases. In the CI test phase, dependency analysis is conducted to reduce the size of the search space. In the search phase, good Bayesian networks are generated by a cooperative coevolution genetic algorithm. We conduct a number of experiments and compare the new algorithm with our previous algorithm, Minimum Description Length and Evolutionary Programming (MDLEP), which uses evolutionary programming for network learning. The results illustrate that the new algorithm has better performance. We apply the algorithm to a large real-world data set and compare the performance of the discovered Bayesian networks with that of backpropagation neural networks and logistic regression models. This study illustrates that the algorithm is a promising alternative to other data mining algorithms.
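The CI test phase described above prunes the search space by checking whether two variables are independent given a third. The abstract does not say which test statistic is used, so the following is only a sketch under the assumption that conditional mutual information with a fixed cut-off stands in for the test; the function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z), in bits, of three discrete sequences."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz = Counter(zip(xs, zs))
    yz = Counter(zip(ys, zs))
    z = Counter(zs)
    cmi = 0.0
    for (x, y, zv), c in xyz.items():
        # p(x,y,z) * log2( p(x,y|z) / (p(x|z) * p(y|z)) ), written with raw counts
        cmi += (c / n) * math.log2(c * z[zv] / (xz[(x, zv)] * yz[(y, zv)]))
    return cmi

def looks_independent(xs, ys, zs, threshold=0.01):
    """Treat X and Y as conditionally independent given Z when the CMI is small.

    The threshold is an assumed illustration value, not taken from the paper.
    """
    return conditional_mutual_information(xs, ys, zs) < threshold

# X and Y are both copies of Z, so once Z is known they carry no extra information.
z_vals = [0, 0, 1, 1]
explained_by_z = looks_independent(z_vals, z_vals, z_vals)
```

An edge between two variables that pass such a test can be excluded before the search phase starts, which is how dependency analysis shrinks the space of candidate structures.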

### Bayesian network data imputation with application to survival tree analysis

Retrospective clinical datasets are often characterized by a relatively small sample size and much missing data. In this case, a common way of handling the missingness consists in discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missingness allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods to deal with complete data later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation-maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
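To make the idea of imputing from a dependency model concrete, here is a deliberately simplified sketch. It assumes a single known edge from one variable to another and estimates the conditional distribution from complete cases only; the paper instead learns the whole network from incomplete data with structural EM, which is not shown. The function `impute_child` and the column names `stage` and `response` are hypothetical.

```python
from collections import Counter, defaultdict

def impute_child(rows, parent, child):
    """Fill missing `child` values (None) with the most frequent value seen for the same `parent` value."""
    counts = defaultdict(Counter)
    for row in rows:
        # Only complete cases contribute to the conditional counts.
        if row[parent] is not None and row[child] is not None:
            counts[row[parent]][row[child]] += 1
    completed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row[child] is None and new_row[parent] in counts:
            new_row[child] = counts[new_row[parent]].most_common(1)[0][0]
        completed.append(new_row)
    return completed

patients = [
    {"stage": "I", "response": "good"},
    {"stage": "I", "response": "good"},
    {"stage": "II", "response": "poor"},
    {"stage": "I", "response": None},   # missing covariate to be imputed
]
completed = impute_child(patients, "stage", "response")
```

Unlike discarding the fourth patient, this keeps the sample size intact, which is the motivation given in the abstract for imputing rather than deleting incomplete records.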

### Decision

Given the explosive growth of data collected from the current business environment, data mining can potentially discover new knowledge to improve managerial decision making. We propose a novel data mining approach that employs evolutionary programming to discover knowledge represented in Bayesian networks and apply the approach to marketing data. There are two different approaches to the network learning problem. The first one uses dependency analysis, while the second approach searches for good network structures according to a metric. Unfortunately, the two approaches both have their own drawbacks. Thus, we propose a novel hybrid of the two approaches. With this new idea, we endeavor to improve upon our previous work, MDLEP, which uses evolutionary programming for network learning. We also introduce a new operator to further enhance the search efficiency. We conduct a number of experiments and compare the hybrid approach with MDLEP. The empirical results illustrate that the approach improves over MDLEP.
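The metric-based approach mentioned above scores candidate structures and searches for a good one. The abstract does not state the exact metric, so the following sketch assumes one common minimum description length form: the negative log-likelihood of the data in bits plus a penalty of half of log2(N) per free parameter. The function `mdl_score` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def mdl_score(data, parents):
    """Description length (bits) of a discrete network structure; lower is better.

    data: list of dicts mapping variable name -> value.
    parents: dict mapping variable name -> tuple of parent names.
    """
    n = len(data)
    total = 0.0
    for var, pa in parents.items():
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        # Data part: negative log-likelihood under maximum-likelihood parameters.
        nll = -sum(c * math.log2(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
        # Model part: (states - 1) free parameters per parent configuration.
        n_configs = 1
        for p in pa:
            n_configs *= len({row[p] for row in data})
        k = (len({row[var] for row in data}) - 1) * n_configs
        total += nll + 0.5 * k * math.log2(n)
    return total

# B always equals A, so the structure with the edge A -> B compresses the data better.
rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
with_edge = mdl_score(rows, {"A": (), "B": ("A",)})
without_edge = mdl_score(rows, {"A": (), "B": ()})
```

A search procedure such as evolutionary programming would repeatedly mutate the `parents` mapping and keep the structures with the lowest score.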

### CEC IEEE Data Mining Using Parallel Multi-Objective Evolutionary Algorithms on Graphics Hardware

An important and challenging data mining application in marketing is to learn models for predicting potential customers who contribute large profit to a company under resource constraints. In this paper, we first formulate this learning problem as a constrained optimization problem and then convert it to an unconstrained Multi-objective Optimization Problem (MOP). A parallel Multi-Objective Evolutionary Algorithm (MOEA) on consumer-level graphics hardware is used to handle the MOP. We perform experiments on a real-life direct marketing problem to compare the proposed method with the parallel Hybrid Genetic Algorithm, the DMAX approach, and a sequential MOEA. It is observed that the proposed method is much more effective and efficient than the other approaches.
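Multi-objective evolutionary algorithms generally compare candidate solutions by Pareto dominance rather than by a single score. The abstract does not describe how its objectives are defined, so the sketch below only illustrates the dominance test and the resulting non-dominated set, assuming every objective is to be minimised; the function names and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Each tuple is (objective 1, objective 2), both minimised.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Here `(3, 3)` is dropped because `(2, 2)` is better in both objectives, while the remaining three points represent different trade-offs that a single-objective formulation would have to rank arbitrarily.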

### Bayesian network data imputation with application to survival tree analysis