Machine learning based on attribute interactions. (2005)

by A Jakulin
Results 1 - 10 of 18

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

by Gavin Brown, Adam Pocock, Ming-jie Zhao, Mikel Luján, Isabelle Guyon
"... We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criter ..."
Abstract - Cited by 39 (6 self) - Add to MetaCart
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criteria based on mutual information?”. To answer this, we adopt a different strategy than is usual in the feature selection literature—instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature ‘relevancy’ and ‘redundancy’, our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
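
The JMI criterion singled out above scores a candidate feature by its joint mutual information with each already-selected feature. Below is a minimal greedy sketch, assuming discrete (integer-coded) features; the function names and the use of scikit-learn's mutual_info_score are illustrative, not from the paper:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def jmi_score(X, y, candidate, selected):
    """JMI: sum over selected features f of I((X_candidate, X_f); y)."""
    if not selected:
        return mutual_info_score(X[:, candidate], y)
    total = 0.0
    for f in selected:
        # Encode the pair (X_candidate, X_f) as one discrete variable.
        joint = X[:, candidate] * (X[:, f].max() + 1) + X[:, f]
        total += mutual_info_score(joint, y)
    return total

def greedy_jmi(X, y, k):
    """Greedily select k features by the JMI criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda c: jmi_score(X, y, c, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The pairwise joint terms are what let JMI balance relevancy against redundancy, which is the tradeoff the study credits for its strong small-sample behaviour.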

Citation Context

... procedures. The most stable was the univariate mutual information, followed closely by JMI (Yang and Moody, 1999; Meyer et al., 2008); while among the least stable are MIFS (Battiti, 1994) and ICAP (Jakulin, 2005). As visualised by multi-dimensional scaling in Figure 5, several criteria appear to return quite similar sets, while there are some outliers. How do criteria behave in limited and extreme small-samp...

Turn on, Tune in, Drop out: Anticipating student dropouts in Massive Open Online Courses

by Diyi Yang, Tanmay Sinha, David Adamson, Carolyn Penstein Rose - NIPS Data-Driven Education Workshop, 2013
"... In this paper, we explore student dropout behavior in Massive Open Online Cours-es(MOOC). We use as a case study a recent Coursera class from which we develop a survival model that allows us to measure the influence of factors extracted from that data on student dropout rate. Specifically we explore ..."
Abstract - Cited by 17 (10 self) - Add to MetaCart
In this paper, we explore student dropout behavior in Massive Open Online Courses (MOOC). We use as a case study a recent Coursera class from which we develop a survival model that allows us to measure the influence of factors extracted from that data on student dropout rate. Specifically we explore factors related to student behavior and social positioning within discussion forums using standard social network analytic techniques. The analysis reveals several significant predictors of dropout.
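
The abstract does not say which tooling was used; as a hedged illustration, a Cox proportional-hazards survival model over forum-derived covariates might look like the following (the lifelines library, the toy data, and all column names are assumptions, not the paper's setup):

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-student records: weeks active before dropout, a dropout
# indicator, and forum-derived covariates (posting rate, network centrality).
df = pd.DataFrame({
    "weeks_active":      [3, 8, 6, 5, 8],
    "dropped_out":       [1, 0, 1, 1, 0],   # 0 = censored (finished the course)
    "posts_per_week":    [0.5, 4.2, 1.1, 0.9, 3.0],
    "degree_centrality": [0.01, 0.20, 0.05, 0.03, 0.15],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks_active", event_col="dropped_out")
cph.print_summary()  # hazard ratios indicate each factor's influence on dropout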

Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood

by Alexandra M. Carvalho, Teemu Roos, Arlindo L. Oliveira, Petri Myllymäki, Russell Greiner
"... We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposa ..."
Abstract - Cited by 9 (0 self) - Add to MetaCart
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that ˆfCLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources.
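
For reference, the interaction information that gives the score its discriminative interpretation can be written, in one common sign convention (the one Jakulin uses), as:

```latex
% Interaction information among attributes X, Y and the class C:
% positive values indicate synergy, negative values redundancy.
I(X; Y; C) = I(X; Y \mid C) - I(X; Y)
```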

Searching for Interacting Features in Subset Selection

by Zheng Zhao, Huan Liu , 2007
"... The evolving and adapting capabilities of robust intelligence are best manifested in its ability to learn. Machine learning enables computer systems to learn, and improve performance. Feature selection facilitates machine learning (e.g., classification) by aiming to remove irrelevant features. Featu ..."
Abstract - Cited by 9 (0 self) - Add to MetaCart
The evolving and adapting capabilities of robust intelligence are best manifested in its ability to learn. Machine learning enables computer systems to learn, and improve performance. Feature selection facilitates machine learning (e.g., classification) by aiming to remove irrelevant features. Feature (attribute) interaction presents a challenge to feature subset selection for classification. This is because a feature by itself might have little correlation with the target concept, but when it is combined with some other features, they can be strongly correlated with the target concept. Thus, the unintentional removal of these features may result in poor classification performance. It is computationally intractable to handle feature interactions in general. However, the presence of feature interaction in a wide range of real-world applications demands practical solutions that can reduce high-dimensional data while perpetuating feature interactions. In this paper, we take up the challenge to design a special data structure for feature quality evaluation, and to employ an information-theoretic feature ranking mechanism to efficiently handle feature interaction in subset selection. We conduct experiments to evaluate our approach by comparing with some representative methods, perform a lesion study to examine the critical components of the proposed algorithm to gain insights, and investigate related issues such as data structure, ranking, time complexity, and scalability in search of interacting features.
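
The situation described here, where a feature is uninformative alone but decisive in combination, is exactly the XOR case; a small self-contained check (illustrative only, not the paper's algorithm):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 10_000)
x2 = rng.integers(0, 2, 10_000)
y = x1 ^ x2  # the target is the XOR of the two features

print(mutual_info_score(x1, y))           # ~0: x1 alone looks irrelevant
print(mutual_info_score(x2, y))           # ~0: x2 alone looks irrelevant
print(mutual_info_score(2 * x1 + x2, y))  # ~log 2: jointly they determine y
```

A univariate filter would discard both features here, which is the failure mode the paper's interaction-aware ranking is designed to avoid.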

Citation Context

...work and available for research purposes upon request. The experiments are conducted in the WEKA environment. Twenty seven (27) benchmark data sets are used in which 23 data sets were investigated in [35] and are available from the UCI ML Repository [36]. Among the 4 additional benchmark data sets, ‘musk2’, ‘USCensus90’, and ‘internet-ads’ are from the UCI ML Repository, and the ‘45×4026+2C’ data is f...

Redundancy in Systems Which Entertain a Model of Themselves: Interaction Information and the Self-Organization of Anticipation

by Loet Leydesdorff
"... entropy ..."
Abstract - Cited by 7 (6 self) - Add to MetaCart
Abstract not found

Citation Context

...idered as Shannon-type information. Watanabe [21] had made the same argument, but this was forgotten in the blossoming literature using Q-measures for the measurement of “configurational information” [22]. Yeung [16] used the deviant symbol μ* to indicate that mutual information in three or more dimensions is not Shannon-type information. However, Krippendorff specified in detail why the reasoning amo...

“Structuration” by intellectual organization: The configuration of knowledge in relations among structural components in networks of science

by L Leydesdorff - Scientometrics , 2011
"... ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Abstract not found

Citation Context

...teraction among three or more dimensions. Configurational information has the seemingly attractive property of indicating synergy in the information transfer in terms of negative and positive values (Jakulin, 2005). However, this information is not a Shannon-measure and therefore has remained difficult to interpret (Watanabe, 1960; Yeung, 2008, at p. 59). Garner & McGill (1956, at p. 225) noted that a negative...

Parallel Feature Selection inspired by Group Testing

by Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Ngo, Xuanlong Nguyen, Venu Govindaraju
"... This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in parallel. Each test corresponds to a subset of features, for which a scoring function may be applied to measure the relevance of the features in a classification task. We develop a general theory providing sufficient conditions under which true features are guaranteed to be correctly identified. Superior performance of our method is demonstrated on a challenging relation extraction task from a very large data set that has both redundant features and sample size in the order of millions. We present comprehensive comparisons with state-of-the-art feature selection methods on a range of data sets, for which our method exhibits competitive performance in terms of running time and accuracy. Moreover, it also yields substantial speedup when used as a pre-processing step for most other existing methods.
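
A schematic rendering of the randomized-test idea follows; the scoring function and the averaging aggregation are placeholders, not the paper's exact procedure or guarantees:

```python
import numpy as np

def parallel_group_test(X, y, score_fn, n_tests=200, subset_frac=0.1, seed=0):
    """Score random feature subsets; credit each feature with the average
    score of the subsets it appeared in (a sketch of the group-testing idea)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subset_frac * d))
    totals, counts = np.zeros(d), np.zeros(d)
    for _ in range(n_tests):  # each test is independent -> trivially parallel
        subset = rng.choice(d, size=k, replace=False)
        s = score_fn(X[:, subset], y)  # relevance of this feature group
        totals[subset] += s
        counts[subset] += 1
    return totals / np.maximum(counts, 1)  # per-feature aggregated score
```

Because the tests do not communicate, they can be distributed across workers, which is what gives the method its scalability.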

Exploring the Potential of Schemes in Building NLP Tools for Arabic Language

by Mohamed Achraf Ben Mohamed, Souheyl Mallat, Mohamed Amine Nahdi, Mounir Zrigui
"... ..."
Abstract - Add to MetaCart
Abstract not found

Context-dependent feature analysis with random forests

by Antonio Sutera, Gilles Louppe, Vân Anh Huynh-Thu, Louis Wehenkel, Pierre Geurts
"... Abstract In many cases, feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that reveal to be relevant only in some specific circumsta ..."
Abstract - Add to MetaCart
In many cases, feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in some specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and the relevance of our framework for highlighting context-dependent variables is illustrated on both artificial and real datasets.
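
A crude way to probe such context dependence with off-the-shelf tools, not the authors' extended importance framework, is to fit a forest per context value and compare impurity importances (all names here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def importances_by_context(X, y, context):
    """Fit a forest within each value of the context variable and collect
    feature importances; large differences across contexts flag variables
    whose relevance is context-dependent (a rough proxy only)."""
    out = {}
    for c in np.unique(context):
        mask = context == c
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X[mask], y[mask])
        out[c] = rf.feature_importances_
    return out
```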

Citation Context

... X1 cannot be characterized globally since we have simultaneously:
I(Y; X1 | X2 = 0, Xc = xc) > I(Y; X1 | X2 = 0)
I(Y; X1 | X2 = 1, Xc = xc) < I(Y; X1 | X2 = 1)
for both xc = 0 and xc = 1. X2 is however context-complementary as the knowledge of Xc always increases the information it contains about Y. Related works. Several authors have studied interactions between variables in the context of supervised learning. They have come up with various interaction definitions and measures, e.g., based on multivariate mutual information (McGill, 1954; Jakulin and Bratko, 2003), conditional mutual information (Jakulin, 2005; Van de Cruys, 2011), or variants thereof (Brown, 2009; Brown et al., 2012). There are several differences between these definitions and ours. In our case, the context variable has a special status and as a consequence, our definition is inherently asymmetric, while most existing variable interaction measures are symmetric. In addition, we are interested in detecting any information difference occurring in a given context (i.e., for a specific value of Xc) and for any conditioning subset B, while most interaction analyses are interested in average and/or unconditional effects. For example, (J...

Multivariate Mutual Information Measures for Discovering Biological Networks

by Tho Hoan Pham, Tu Bao Ho, Quynh Diep Nguyen, Dang Hung Tran, Van Hoang Nguyen
"... Abstract—Most studies on biological networks until now focus only on pairwise interactions/relationships. However interactions/relationships participated by more than two molecules are popular in biology. In this paper, we introduce multivariate mutual information measures to reconstruct multivariat ..."
Abstract - Add to MetaCart
Most studies on biological networks until now focus only on pairwise interactions/relationships. However, interactions/relationships involving more than two molecules are common in biology. In this paper, we introduce multivariate mutual information measures to reconstruct multivariate interactions/relationships in biological networks.
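
One standard measure of this kind is the total correlation (multi-information), which generalizes mutual information to n variables:

```latex
% Total correlation: the overall dependence among X_1, ..., X_n;
% it is zero exactly when the variables are mutually independent.
C(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n)
```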

Citation Context

...ual information and interaction information [18], [19], [20], [21]. While the first one is clearly understood and interpreted, there has been a lot of controversy on the interpretation of the second [22], [17]. Furthermore, and most importantly, there has been no research work that systematically identifies different types of dependences existing among multiple variables and provides appropriate measures ...
