Results 1–10 of 37
Not so naive Bayes: Aggregating one-dependence estimators
Machine Learning, 2005
"... Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and superparent TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach ..."
Abstract

Cited by 94 (11 self)
Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and super-parent TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of classifiers. In extensive experiments this technique delivers prediction accuracy comparable to LBR and super-parent TAN, with substantially improved computational efficiency at test time relative to the former and at training time relative to the latter. The new algorithm is shown to have low variance and is suited to incremental learning.
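The averaging scheme this abstract describes can be sketched for categorical data. The class name, the Laplace smoothing, and the scoring layout below are illustrative assumptions, not the authors' AODE implementation:

```python
from collections import Counter

class AODESketch:
    """Toy averaged one-dependence estimator for categorical data.
    Each attribute takes a turn as the 'super-parent'; the final score
    averages the resulting one-dependence models."""

    def fit(self, X, y):
        self.n, self.n_attrs = len(y), len(X[0])
        self.classes = sorted(set(y))
        self.values = [sorted({row[i] for row in X}) for i in range(self.n_attrs)]
        self.joint = Counter()   # counts of (class, i, x_i)
        self.pair = Counter()    # counts of (class, i, x_i, j, x_j), i != j
        for row, c in zip(X, y):
            for i, vi in enumerate(row):
                self.joint[(c, i, vi)] += 1
                for j, vj in enumerate(row):
                    if j != i:
                        self.pair[(c, i, vi, j, vj)] += 1
        return self

    def predict(self, row):
        def score(c):
            total = 0.0
            for i, vi in enumerate(row):        # attribute i as super-parent
                # P(c, x_i), Laplace-smoothed (smoothing scheme is an assumption)
                p = (self.joint[(c, i, vi)] + 1.0) / (
                    self.n + len(self.classes) * len(self.values[i]))
                for j, vj in enumerate(row):
                    if j != i:                  # P(x_j | c, x_i)
                        p *= (self.pair[(c, i, vi, j, vj)] + 1.0) / (
                            self.joint[(c, i, vi)] + len(self.values[j]))
                total += p
            return total / self.n_attrs          # average over all super-parents
        return max(self.classes, key=score)
```

Because every one-dependence model is built from simple counts, training is a single pass over the data, which is what gives the averaged ensemble its low training cost and its suitability for incremental learning.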
On Why Discretization Works for Naive-Bayes Classifiers
In Proceedings of the 16th Australian Joint Conference on Artificial Intelligence (AI), 2003
"... We investigate why discretization is effective in naiveBayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naiveBayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functio ..."
Abstract

Cited by 21 (4 self)
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed.
A Clustering Comparison Measure Using Density Profiles and its Application to the Discovery of Alternate Clusterings
Data Mining and Knowledge Discovery
"... Data clustering is a fundamental and very popular method of data analysis. Its subjective nature, however, means that different clustering algorithms or different parameter settings can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measure ..."
Abstract

Cited by 11 (2 self)
Data clustering is a fundamental and very popular method of data analysis. Its subjective nature, however, means that different clustering algorithms or different parameter settings can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes generate unintuitive results. They also cannot be applied to compare clusterings which contain different data points, an activity which is important for scenarios such as data stream analysis. In this paper, we introduce a new clustering similarity measure, known as ADCO, which aims to address some limitations of existing measures by allowing greater flexibility of comparison via the use of density profiles to characterize a clustering. In particular, it adopts a 'data mining style' philosophy to clustering comparison, whereby two clusterings are considered to be more similar if they are likely to give rise to similar types of prediction models. Furthermore, we show that this new measure can be applied as a highly effective objective function within a new algorithm, known as MAXIMUS, for generating alternate clusterings.
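The density-profile idea can be illustrated with a deliberately simplified one-dimensional sketch: each clustering is represented by per-cluster histograms over fixed bins, and two clusterings score as similar when their histograms align under some relabeling of clusters. The normalization and the brute-force permutation search are assumptions for illustration; ADCO's actual definition over attribute-region combinations differs in detail:

```python
from itertools import permutations

def density_profile(points, labels, k, bins):
    """profile[c][b] = number of points of cluster c falling in bin b
    (1-D simplification of a density profile)."""
    prof = [[0] * len(bins) for _ in range(k)]
    for x, c in zip(points, labels):
        for b, (lo, hi) in enumerate(bins):
            if lo <= x < hi:
                prof[c][b] += 1
    return prof

def profile_similarity(points, labels_a, labels_b, k, bins):
    """ADCO-style score: dot product of density profiles, maximized over
    cluster relabelings and normalized by the larger self-similarity.
    Brute-force over permutations, so only suitable for small k."""
    pa = density_profile(points, labels_a, k, bins)
    pb = density_profile(points, labels_b, k, bins)
    def dot(p, q, perm):
        return sum(p[c][b] * q[perm[c]][b]
                   for c in range(k) for b in range(len(bins)))
    best = max(dot(pa, pb, perm) for perm in permutations(range(k)))
    norm = max(dot(pa, pa, tuple(range(k))), dot(pb, pb, tuple(range(k))))
    return best / norm
```

Because the comparison is between histograms rather than between point-to-cluster assignments, the two clusterings need not contain the same data points, which is the property the abstract highlights for data-stream scenarios.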
Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes
International Journal of Approximate Reasoning, 2006
"... Most of the Bayesian networkbased classifiers are usually only able to handle discrete variables. However, most realworld domains involve continuous variables. A common practice to deal with continuous variables is to discretize them, with a subsequent loss of information. This work shows how disc ..."
Abstract

Cited by 10 (0 self)
Most Bayesian network-based classifiers are only able to handle discrete variables. However, most real-world domains involve continuous variables. A common practice when dealing with continuous variables is to discretize them, with a subsequent loss of information. This work shows how discrete classifier induction algorithms can be adapted to the conditional Gaussian network paradigm to deal with continuous variables without discretizing them. In addition, three novel classifier induction algorithms and two new propositions about mutual information are introduced. The classifier induction algorithms presented are ordered and grouped according to their structural complexity: naive Bayes, tree-augmented naive Bayes, k-dependence Bayesian classifiers and semi-naive Bayes. All the classifier induction algorithms are empirically evaluated using predictive accuracy, and they are compared to linear discriminant analysis as a classic statistical benchmark classifier for continuous data. The accuracies for a set of state-of-the-art classifiers are also included in order to justify the use of linear discriminant analysis as the benchmark algorithm. In order to better understand the behavior of the conditional Gaussian network-based classifiers, the results include a bias-variance decomposition of the expected misclassification rate. The study suggests that semi-naive Bayes structure-based classifiers and, especially, the novel wrapper condensed semi-naive Bayes backward, outperform the rest of the presented classifiers. They also obtain quite competitive results compared to the state-of-the-art algorithms included.
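The simplest member of the family this abstract builds on is naive Bayes with per-class Gaussian densities, which handles a continuous attribute directly instead of discretizing it. A minimal sketch (not the paper's induction algorithms, which also learn richer structures):

```python
from math import pi, exp, sqrt

class GaussianNBSketch:
    """Minimal Gaussian naive Bayes: each continuous attribute is modeled
    by a per-class normal density, so no discretization (and no loss of
    information) is needed."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, yy in zip(X, y) if yy == c]
            self.prior[c] = len(rows) / len(X)
            self.stats[c] = []
            for j in range(len(X[0])):
                col = [r[j] for r in rows]
                mu = sum(col) / len(col)
                # fall back to a tiny variance if the column is constant
                var = sum((v - mu) ** 2 for v in col) / len(col) or 1e-9
                self.stats[c].append((mu, var))
        return self

    def predict(self, x):
        def density(v, mu, var):
            return exp(-(v - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)
        def score(c):
            p = self.prior[c]
            for v, (mu, var) in zip(x, self.stats[c]):
                p *= density(v, mu, var)
            return p
        return max(self.classes, key=score)
```

The paper's richer structures (tree-augmented, k-dependence, semi-naive) replace the independent per-attribute Gaussians with conditional Gaussians that depend on other attributes, but the fit/score pattern is the same.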
Estimating Bias and Variance from Data
2004
"... The biasvariance decomposition of error provides useful insights into the error performance of a classifier as it is applied to di#erent types of learning task. Most notably, it has been used to explain the extraordinary e#ectiveness of ensemble learning techniques. It is important that the researc ..."
Abstract

Cited by 7 (0 self)
The bias-variance decomposition of error provides useful insights into the error performance of a classifier as it is applied to different types of learning task. Most notably, it has been used to explain the extraordinary effectiveness of ensemble learning techniques. It is important that the research community have effective tools for assessing such explanations. To this end, techniques have been developed for estimating bias and variance from data. The most widely deployed of these uses repeated subsampling with a holdout set. We argue, with empirical support, that this approach has serious limitations. First, it provides very little flexibility in the types of distributions of training sets that may be studied. It requires that the training sets be relatively small and that the degree of variation between training sets be very circumscribed. Second, the approach leads to bias and variance estimates that have high statistical variance and hence low reliability. We develop an alternative method that is based on cross-validation. We show that this method allows far greater flexibility in the types of distribution that are examined and that the estimates derived are much more stable. Finally, we show that changing the distributions of training sets from which bias and variance estimates are drawn can substantially alter the bias and variance estimates that are derived.
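A cross-validation-based estimator of this kind can be sketched as follows. Each repeat of k-fold cross-validation gives every instance exactly one prediction from a model trained without it; collecting these predictions across repeats lets us estimate bias and variance per instance. Defining the "main prediction" as the modal vote is an assumption of this sketch, and the paper's exact decomposition may differ:

```python
import random
from collections import Counter

def bias_variance_cv(train_fn, X, y, folds=3, repeats=10, seed=0):
    """Estimate 0/1-loss bias and variance via repeated cross-validation.
    train_fn(Xtr, ytr) must return a callable model: model(x) -> label."""
    rng = random.Random(seed)
    preds = [[] for _ in X]              # predictions collected per instance
    idx = list(range(len(X)))
    for _ in range(repeats):
        rng.shuffle(idx)                 # a fresh fold assignment each repeat
        for f in range(folds):
            test = set(idx[f::folds])
            train = [i for i in idx if i not in test]
            model = train_fn([X[i] for i in train], [y[i] for i in train])
            for i in test:
                preds[i].append(model(X[i]))
    bias = variance = 0.0
    for i, p in enumerate(preds):
        main = Counter(p).most_common(1)[0][0]       # modal ("main") prediction
        bias += main != y[i]                         # main prediction is wrong
        variance += sum(q != main for q in p) / len(p)  # disagreement with it
    return bias / len(X), variance / len(X)
```

Compared with holdout subsampling, the training sets here are large (a (folds-1)/folds fraction of the data), and varying `folds` varies the training-set distribution being studied, which is the flexibility the abstract argues for.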
Information-Theoretic Inference of Gene Networks Using Backward Elimination
"... Abstract — Unraveling transcriptional regulatory networks is essential for understanding and predicting cellular responses in different developmental and environmental contexts. Informationtheoretic methods of network inference have been shown to produce highquality reconstructions because of thei ..."
Abstract

Cited by 6 (1 self)
Unraveling transcriptional regulatory networks is essential for understanding and predicting cellular responses in different developmental and environmental contexts. Information-theoretic methods of network inference have been shown to produce high-quality reconstructions because of their ability to infer both linear and nonlinear dependencies between regulators and targets. MRNET infers a network by using a forward selection strategy to identify a maximally independent set of neighbors for every variable. However, a known limitation of algorithms based on forward selection is that the quality of the selected subset strongly depends on the first variable selected. In this paper, we present MRNETB, an improved version of MRNET that overcomes this limitation by using a backward selection strategy followed by sequential replacement. Our new variable selection procedure can be implemented with the same computational cost as the forward selection strategy. MRNETB was benchmarked against MRNET and two other information-theoretic algorithms, CLR and ARACNE. Our benchmark comprised 15 datasets generated from two regulatory network simulators, 10 of which are from the DREAM4 challenge, which was recently used to compare over 30 network inference methods. To assess the stability of our results, each method was implemented with two estimators of mutual information. Our results show that MRNETB performs significantly better than MRNET, irrespective of the mutual information estimation method. MRNETB also performs comparably to CLR and significantly better than ARACNE, indicating that our new variable selection strategy can successfully infer high-quality networks.
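The backward-elimination idea can be sketched on discrete data: start with all candidate regulators of a target and repeatedly drop the one with the worst relevance-minus-redundancy score. This is an illustrative variant of max-relevance/min-redundancy selection, not the published MRNETB algorithm (which also includes the sequential-replacement step and different estimators):

```python
from collections import Counter
from math import log

def mutual_info(xs, ys):
    """Plug-in mutual information estimate (in nats) for discrete data."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def backward_select(target, candidates, k):
    """Keep k candidates by backward elimination: at each step remove the
    candidate whose relevance to the target, minus its average redundancy
    with the remaining candidates, is lowest."""
    selected = list(range(len(candidates)))
    while len(selected) > k:
        def score(i):
            rel = mutual_info(candidates[i], target)
            red = sum(mutual_info(candidates[i], candidates[j])
                      for j in selected if j != i) / (len(selected) - 1)
            return rel - red
        selected.remove(min(selected, key=score))
    return selected
```

Because elimination starts from the full candidate set, no early choice of a single "first variable" can lock the search into a poor subset, which is the forward-selection weakness the abstract targets.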
Incremental Discretization for Naïve-Bayes Classifiers
"... Abstract. NaïveBayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB’s incremental learning in face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from ..."
Abstract

Cited by 4 (1 self)
Naïve-Bayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB's incremental learning in the face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from temperature readings to share prices. In this paper, we present a novel incremental discretization method for NB, incremental flexible frequency discretization (IFFD). IFFD discretizes the values of a quantitative attribute into a sequence of intervals of flexible sizes. It allows online insertion and splitting operations on intervals. Theoretical analysis and experimental tests are conducted to compare IFFD with alternative methods. Empirical evidence suggests that IFFD is efficient and effective. NB coupled with IFFD achieves a balance between high learning efficiency and high classification accuracy in the context of incremental learning.
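The insert-and-split behavior described here can be sketched as follows. Values arrive one at a time; each interval keeps its observed values, and an interval splits at its median once it holds too many. The split rule, the `max_size` parameter, and the class name are assumptions of this sketch; IFFD's exact policy differs in detail:

```python
import bisect

class FlexibleFreqDiscretizer:
    """Incremental discretization into intervals of flexible sizes,
    supporting online insertion and interval splitting."""

    def __init__(self, max_size=30):
        self.max_size = max_size
        self.cuts = []          # sorted cut points; values >= cut go right
        self.buckets = [[]]     # sorted observed values per interval

    def insert(self, v):
        k = bisect.bisect_right(self.cuts, v)
        bisect.insort(self.buckets[k], v)
        if len(self.buckets[k]) > self.max_size:
            self._split(k)

    def _split(self, k):
        vals = self.buckets[k]
        mid = vals[len(vals) // 2]
        lo = [x for x in vals if x < mid]
        hi = [x for x in vals if x >= mid]
        if lo and hi:                        # only split when both halves are non-empty
            self.cuts.insert(k, mid)         # median becomes a new cut point
            self.buckets[k:k + 1] = [lo, hi]

    def interval_of(self, v):
        return bisect.bisect_right(self.cuts, v)
```

An incremental NB would keep per-class counts per interval, updating (and redistributing on a split) as values stream in, which is what lets discretization keep pace with incremental learning.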
A Case Study in Feature Invention for Breast Cancer Diagnosis Using X-Ray Scatter Images
"... Xray mammography is the current method for screening for breast cancer, and like any technique, has its limitations. Several groups have reported di#erences in the Xray scattering patterns of normal and tumour tissue from the breast. This gives rise to the hope that Xray scatter analysis techniqu ..."
Abstract

Cited by 2 (0 self)
X-ray mammography is the current method for screening for breast cancer and, like any technique, it has its limitations. Several groups have reported differences in the X-ray scattering patterns of normal and tumour tissue from the breast. This gives rise to the hope that X-ray scatter analysis techniques may lead to a more accurate and cost-effective method of diagnosing breast cancer which lends itself to automation. This is a particularly challenging exercise due to the inherent complexity of the information content in X-ray scatter patterns from complex heterogeneous tissue samples. We use a simple naive Bayes classifier, coupled with Equal Frequency Discretization (EFD), as our classification system. High-level features are extracted from the low-level pixel data. This paper reports some preliminary results in the ongoing development of this classification method, which can distinguish between the diffraction patterns of normal and cancerous tissue, with particular emphasis on the invention of features for classification.
Maximally Bijective Discretization for Data-driven Modeling of Complex Systems
"... Phasespace discretization is a necessary step for study of continuous dynamical systems using a languagetheoretic approach. It is also critical for many machine learning techniques, e.g., probabilistic graphical models (Bayesian Networks, Markov models). This paper proposes a novel discretization ..."
Abstract

Cited by 2 (2 self)
Phase-space discretization is a necessary step for the study of continuous dynamical systems using a language-theoretic approach. It is also critical for many machine learning techniques, e.g., probabilistic graphical models (Bayesian networks, Markov models). This paper proposes a novel discretization method, Maximally Bijective Discretization, which finds a discretization of the dependent variables, given a discretization of the independent variables, such that the correspondence between input and output variables in the continuous domain is preserved in the discrete domain for the given dynamical system.
A Probabilistic Approach to Mining Geospatial Knowledge from Social Annotations
"... Usergenerated content, such as photos and videos, is often annotated by users with freetext labels, called tags. Increasingly, such content is also georeferenced, i.e., it is associated with geographic coordinates. The implicit relationships between tags and their locations can tell us much about ..."
Abstract

Cited by 1 (0 self)
User-generated content, such as photos and videos, is often annotated by users with free-text labels, called tags. Increasingly, such content is also geo-referenced, i.e., it is associated with geographic coordinates. The implicit relationships between tags and their locations can tell us much about how people conceptualize places and the relations between them. However, extracting such knowledge from social annotations presents many challenges, since annotations are often ambiguous, noisy, uncertain and spatially inhomogeneous. We introduce a probabilistic framework for modeling geo-referenced annotations and a method for learning model parameters from data. The framework is flexible and general, and can be used in a variety of applications that mine geospatial knowledge from user-generated content. Specifically, we study three problems: extracting place semantics, predicting the locations of photos, and learning part-of relations between places. We show that our method performs well compared to state-of-the-art approaches developed for the first two problems, and offers a novel solution to the problem of learning relations between places.