Results 1 - 10 of 18
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
"... We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criter ..."
Abstract
-
Cited by 39 (6 self)
- Add to MetaCart
(Show Context)
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criteria based on mutual information?”. To answer this, we adopt a different strategy than is usual in the feature selection literature: instead of trying to define a criterion, we derive one directly from a clearly specified objective function, the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature ‘relevancy’ and ‘redundancy’, our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information-based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
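Since the abstract singles out JMI, a minimal sketch of greedy forward selection under the JMI score may help fix ideas. This assumes discrete (integer-coded) features; the helper names and the use of scikit-learn's mutual_info_score are illustrative choices, not the authors' implementation.

```python
# Sketch of JMI forward selection: each candidate feature Xc is scored by the
# sum over already-selected features Xs of I((Xc, Xs); Y), which balances the
# relevancy and redundancy terms discussed above. Assumes discrete features.
import numpy as np
from sklearn.metrics import mutual_info_score

def joint_mi(xi, xj, y):
    """I((Xi, Xj); Y): mutual information between the paired features and Y."""
    paired = [f"{a}|{b}" for a, b in zip(xi, xj)]
    return mutual_info_score(y, paired)

def jmi_forward_selection(X, y, k):
    n_features = X.shape[1]
    # Seed with the single feature most informative about y.
    selected = [int(np.argmax([mutual_info_score(y, X[:, f])
                               for f in range(n_features)]))]
    while len(selected) < k:
        remaining = [f for f in range(n_features) if f not in selected]
        scores = [sum(joint_mi(X[:, f], X[:, s], y) for s in selected)
                  for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```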
Turn on, Tune in, Drop out: Anticipating student dropouts in Massive Open Online Courses
- NIPS Data-Driven Education Workshop
, 2013
"... In this paper, we explore student dropout behavior in Massive Open Online Cours-es(MOOC). We use as a case study a recent Coursera class from which we develop a survival model that allows us to measure the influence of factors extracted from that data on student dropout rate. Specifically we explore ..."
Abstract
-
Cited by 17 (10 self)
- Add to MetaCart
In this paper, we explore student dropout behavior in Massive Open Online Courses (MOOCs). We use as a case study a recent Coursera class, from which we develop a survival model that allows us to measure the influence of factors extracted from the data on the student dropout rate. Specifically, we explore factors related to student behavior and social positioning within discussion forums using standard social network analytic techniques. The analysis reveals several significant predictors of dropout.
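The survival-model idea is concrete enough to sketch. Below is a minimal Cox proportional-hazards example in the spirit of the abstract; the synthetic data, column names, and the lifelines dependency are my assumptions, not details from the paper.

```python
# Toy Cox proportional-hazards model: relate forum-behavior covariates to the
# week a student was last active. All data here is synthetic for illustration.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
posts = rng.poisson(2.0, n)              # hypothetical forum posts per week
central = rng.random(n)                  # hypothetical forum network centrality
# Simulate longer survival for active, central students; course runs 14 weeks.
week = np.clip(rng.exponential(4 + 2 * posts + 5 * central), 1, 14).round()
df = pd.DataFrame({
    "week_last_active": week,
    "dropped": (week < 14).astype(int),  # 0 = censored (active at course end)
    "posts_per_week": posts,
    "forum_centrality": central,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="week_last_active", event_col="dropped")
cph.print_summary()  # hazard ratios < 1 indicate factors that delay dropout
```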
Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
"... We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposa ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve accuracy at least as good as that of the best compared classifiers, using significantly fewer computational resources.
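The interaction information the abstract invokes is a standard three-way quantity; under one common sign convention (a textbook definition, not necessarily the paper's notation):

```latex
% Interaction information among features X_i, X_j and class C: the change in
% the features' mutual information once the class is known.
I(X_i; X_j; C) = I(X_i; X_j \mid C) - I(X_i; X_j)
```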
Searching for Interacting Features in Subset Selection
, 2007
"... The evolving and adapting capabilities of robust intelligence are best manifested in its ability to learn. Machine learning enables computer systems to learn, and improve performance. Feature selection facilitates machine learning (e.g., classification) by aiming to remove irrelevant features. Featu ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
The evolving and adapting capabilities of robust intelligence are best manifested in its ability to learn. Machine learning enables computer systems to learn and improve performance. Feature selection facilitates machine learning (e.g., classification) by aiming to remove irrelevant features. Feature (attribute) interaction presents a challenge to feature subset selection for classification: a feature by itself might have little correlation with the target concept, but when combined with some other features it can be strongly correlated with the target concept. Thus, the unintentional removal of these features may result in poor classification performance. Handling feature interactions in general is computationally intractable, yet their presence in a wide range of real-world applications demands practical solutions that can reduce high-dimensional data while preserving feature interactions. In this paper, we take up the challenge to design a special data structure for feature quality evaluation, and to employ an information-theoretic feature ranking mechanism to efficiently handle feature interaction in subset selection. We conduct experiments to evaluate our approach by comparing with representative methods, perform a lesion study to examine the critical components of the proposed algorithm, and investigate related issues such as data structure, ranking, time complexity, and scalability in the search for interacting features.
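The interaction problem described above is easy to demonstrate: in an XOR-like concept each feature alone looks irrelevant, while the pair determines the class. This generic illustration is mine, not the paper's algorithm.

```python
# XOR interaction: individual mutual information is ~0, joint MI is ~log 2.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 10_000)
x2 = rng.integers(0, 2, 10_000)
y = x1 ^ x2                               # class is the XOR of the two features

print(mutual_info_score(y, x1))           # ~0: x1 alone looks irrelevant
print(mutual_info_score(y, x2))           # ~0: x2 alone looks irrelevant
print(mutual_info_score(y, 2 * x1 + x2))  # ~0.69 nats: the pair is decisive
```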
Redundancy in Systems Which Entertain a Model of Themselves: Interaction Information and the Self-Organization of Anticipation
"... entropy ..."
(Show Context)
“Structuration” by intellectual organization: The configuration of knowledge in relations among structural components in networks of science
- Scientometrics
, 2011
"... ..."
(Show Context)
Parallel Feature Selection inspired by Group Testing
"... This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in parallel. Each test corresponds to a subset of features, for which a scoring function may be applied to measure the relevance of the features in a classification task. We develop a general theory providing sufficient conditions under which true features are guaranteed to be correctly identified. Superior performance of our method is demonstrated on a challenging relation extraction task from a very large data set that has both redundant features and a sample size on the order of millions. We present comprehensive comparisons with state-of-the-art feature selection methods on a range of data sets, for which our method exhibits competitive performance in terms of running time and accuracy. Moreover, it also yields substantial speedups when used as a preprocessing step for most other existing methods.
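The test-and-aggregate structure described above is straightforward to sketch. The scoring function (cross-validated accuracy of a simple classifier) and the aggregation rule below are illustrative choices, not the paper's exact construction.

```python
# Group-testing-style selection: many randomized tests, each scoring a random
# feature subset; tests are independent, so they can run in parallel. Each
# feature is ranked by the average score of the tests it participated in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def group_testing_selection(X, y, n_tests=200, subset_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    subset_size = max(1, int(subset_frac * n_features))
    score_sum = np.zeros(n_features)
    counts = np.zeros(n_features)
    for _ in range(n_tests):          # each iteration is an independent test
        subset = rng.choice(n_features, size=subset_size, replace=False)
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, subset], y, cv=3).mean()
        score_sum[subset] += score
        counts[subset] += 1
    return np.argsort(-score_sum / np.maximum(counts, 1))  # best feature first
```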
Context-dependent feature analysis with random forests
"... Abstract In many cases, feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that reveal to be relevant only in some specific circumsta ..."
Abstract
- Add to MetaCart
(Show Context)
Feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The use and relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.
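A rough way to see what context-dependence means in practice: fit forests separately within each value of a context variable and compare importances. This naive split is my simplification for intuition; the paper extends the variable-importance framework itself rather than partitioning the data.

```python
# Per-context random forest importances: large differences across contexts
# flag variables whose relevance is context-dependent.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def importances_by_context(X, y, context):
    """context: one discrete context label per sample."""
    out = {}
    for c in np.unique(context):
        mask = context == c
        forest = RandomForestClassifier(n_estimators=200, random_state=0)
        forest.fit(X[mask], y[mask])
        out[c] = forest.feature_importances_
    return out
```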
Multivariate Mutual Information Measures for Discovering Biological Networks
"... Abstract—Most studies on biological networks until now focus only on pairwise interactions/relationships. However interactions/relationships participated by more than two molecules are popular in biology. In this paper, we introduce multivariate mutual information measures to reconstruct multivariat ..."
Abstract
- Add to MetaCart
(Show Context)
Most studies of biological networks to date focus only on pairwise interactions and relationships. However, interactions involving more than two molecules are common in biology. In this paper, we introduce multivariate mutual information measures to reconstruct multivariate interactions and relationships in biological networks.
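As background for the measures the abstract refers to, one standard multivariate generalization of mutual information is total correlation (a textbook definition; the paper's specific measures may differ):

```latex
% Total correlation: how far the joint distribution of X_1, ..., X_n is from
% the product of its marginals; zero iff the variables are mutually independent.
C(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n)
```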