Results 1-10 of 85
Selection of relevant features and examples in machine learning
Artificial Intelligence, 1997

Cited by 590 (2 self)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
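The survey compares many feature-selection methods within a general framework. As one concrete illustration (not a method taken from the paper), here is a minimal sketch of greedy forward selection in a wrapper style. It assumes a caller-supplied `score` function, which is hypothetical: in practice it might be the cross-validated accuracy of some learner trained on the chosen subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedy forward selection (wrapper style).

    Repeatedly adds the feature whose inclusion gives the highest score,
    stopping when no remaining feature improves the score or when
    max_features features have been chosen.
    """
    selected: List[str] = []
    best = score(frozenset())
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        top_score, top_feature = max(
            (score(frozenset(selected + [f])), f) for f in candidates
        )
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best = top_score
    return selected


# Hypothetical score: reward relevant features, lightly penalize irrelevant ones.
relevant = {"x1", "x3"}
toy_score = lambda s: len(s & relevant) - 0.1 * len(s - relevant)
print(forward_select(["x1", "x2", "x3", "x4"], toy_score, max_features=4))
```

Each round evaluates one candidate subset per remaining feature, so the search is cheap compared with trying all subsets, at the cost of possibly missing features that are only useful in combination.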
An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution
1994

Cited by 173 (13 self)
We present a membership-query algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this algorithm for learning DNF over certain nonuniform distributions and for learning a class of geometric concepts that generalizes DNF. Furthermore, we show that DNF is weakly learnable with respect to uniform from noisy examples. Our strong learning algorithm utilizes one of Freund's boosting techniques and relies on the fact that boosting does not require a completely distribution-independent weak learner. The boosted weak learner is a nonuniform extension of a parity-finding algorithm discovered by Goldreich and Levin.
1 Introduction
Consider the following 20-questions-like game between two players, Bob and Alice. Bob has a Disjunctive Normal Form (DNF) expression f in mind. Alice is allo...
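The class TOP mentioned in this abstract (a majority vote over parity functions) can be made concrete with a small evaluator. This is a sketch only: inputs are assumed to be bit vectors in {0,1}^n, each parity returns +1 or -1, and ties in the vote are broken toward +1, a convention chosen here rather than taken from the paper. The function names are hypothetical, and the code illustrates the hypothesis representation, not the learning algorithm.

```python
from typing import FrozenSet, Sequence


def parity(S: FrozenSet[int], x: Sequence[int]) -> int:
    """chi_S(x) for x in {0,1}^n: +1 if an even number of the bits x_i (i in S) are 1, else -1."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def top_eval(parities: Sequence[FrozenSet[int]], x: Sequence[int]) -> int:
    """Evaluate a TOP representation: the majority vote of the listed parities.

    Parities may repeat (repetition acts as a weight). Ties are broken
    toward +1, a convention assumed for this sketch.
    """
    total = sum(parity(S, x) for S in parities)
    return 1 if total >= 0 else -1


# Three parities on n = 2 bits; this vote outputs -1 exactly when x_0 OR x_1 is 1.
ps = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, top_eval(ps, x))
```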
Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis
In Proceedings of the Twenty-Sixth Annual Symposium on Theory of Computing, 1994

Cited by 131 (24 self)
We present new results on the well-studied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the first positive result known for learning general DNF in polynomial time in a nontrivial model. Our results should be contrasted with those of Kharitonov [12], who proved that $\mathrm{AC}^0$ is not efficiently learnable in this model based on cryptographic assumptions. We also present efficient learning algorithms in various models for the read-k and SAT-k subclasses of DNF. We then turn our attention to the recently introduced statistical query model of learning [9]. This model is a restricted version of the popular Probably Approximately Correct (PAC) model, and practically every PAC learning algorithm falls into the statistical query model [9]. We prove that DNF and decision trees are not even weakly learnable in polynomial time in this model. This result is information-theoretic and therefore does not rely on any unproven assumptions, and demonstrates that no straightforward modification of the existing algorithms for learning various restricted forms of DNF and decision trees will solve the general problem. These lower bounds are a corollary of a more general characterization of the complexity of statistical query learning in terms of the number of uncorrelated functions in the concept class. The underlying tool for all of our results is the Fourier analysis of the concept class to be learned.
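The Fourier coefficient of a function $f:\{0,1\}^n \to \{-1,+1\}$ at an index set $S$ is $\hat f(S) = \mathbf{E}_x[f(x)\chi_S(x)]$ under the uniform distribution, where $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$. The brute-force sketch below (function name hypothetical) computes every coefficient by enumerating all $2^n$ inputs, so it is only feasible for small $n$; the Kushilevitz-Mansour algorithm cited in the abstract instead uses membership queries to locate the large coefficients without full enumeration.

```python
import itertools
from typing import Callable, Dict, FrozenSet, Tuple


def fourier_coefficients(
    f: Callable[[Tuple[int, ...]], int], n: int
) -> Dict[FrozenSet[int], float]:
    """Exact Fourier coefficients of f: {0,1}^n -> {-1,+1} under the uniform distribution.

    hat f(S) = average over all x of f(x) * chi_S(x), where
    chi_S(x) = (-1)^(sum of x_i for i in S). Enumerates all 2^n inputs.
    """
    points = list(itertools.product((0, 1), repeat=n))
    coeffs: Dict[FrozenSet[int], float] = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            total = 0
            for x in points:
                chi = -1 if sum(x[i] for i in S) % 2 else 1
                total += f(x) * chi
            coeffs[frozenset(S)] = total / len(points)
    return coeffs


# AND of two bits, encoded as -1 for "true" and +1 for "false".
and2 = lambda x: -1 if x[0] and x[1] else 1
c = fourier_coefficients(and2, 2)
print(c)
# Parseval: for a +/-1-valued f, the squared coefficients sum to 1.
print(sum(v * v for v in c.values()))
```

For this example every coefficient has magnitude 1/2, so each parity is correlated with the target; a weak learner only needs to find one such correlated parity.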
On the Boosting Ability of Top-Down Decision Tree Learning Algorithms
In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, 1995

Cited by 99 (6 self)
We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion function $G$ used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error $1/2 - \gamma$ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size $(1/\epsilon)^{O(1/(\gamma^2 \epsilon^2))}$ and $(1/\epsilon)^{O(\log(1/\epsilon)/\gamma^2)}$ (respectively) suffice to drive the error below $\epsilon$. Thus, small constant advantage over...
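A top-down learner grows the tree by choosing, at each leaf, the split that most decreases the weighted splitting criterion $G$. The sketch below shows that computation for a Gini-style criterion (associated with CART) and binary entropy (associated with C4.5). The scaling $G(q) = 4q(1-q)$, chosen so that both criteria equal 1 at $q = 1/2$, and the helper names are assumptions for illustration, not code from the paper or from either package.

```python
import math
from typing import Callable


def gini(q: float) -> float:
    """Gini-style criterion 4q(1-q), scaled so that gini(0.5) == 1."""
    return 4.0 * q * (1.0 - q)


def entropy(q: float) -> float:
    """Binary entropy in bits: 0 at q in {0, 1}, 1 at q == 0.5."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def split_gain(
    G: Callable[[float], float],
    pos_left: int, n_left: int,
    pos_right: int, n_right: int,
) -> float:
    """Decrease in the weighted criterion G when one leaf is split in two.

    q is the fraction of positive examples at a node; both children are
    assumed non-empty.
    """
    n = n_left + n_right
    parent = G((pos_left + pos_right) / n)
    children = (n_left / n) * G(pos_left / n_left) + (n_right / n) * G(pos_right / n_right)
    return parent - children


# A split sending 3 of 4 positives left and 1 of 4 right, scored by both criteria.
print(split_gain(gini, 3, 4, 1, 4), split_gain(entropy, 3, 4, 1, 4))
```

The same split earns different gains under the two criteria, which is the kind of dependence on $G$ that the bounds above quantify.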
Oracles and Queries that are Sufficient for Exact Learning
Journal of Computer and System Sciences, 1996

Cited by 85 (5 self)
We show that the class of all circuits is exactly learnable in randomized expected polynomial time using weak subset and weak superset queries. This is a consequence of the following result, which we consider to be of independent interest: circuits are exactly learnable in randomized expected polynomial time with equivalence queries and the aid of an NP oracle. We also show that circuits are exactly learnable in deterministic polynomial time with equivalence queries and a $\Sigma_3$ oracle. The hypothesis class for the above learning algorithms is the class of circuits of larger (but polynomially related) size. Also, the algorithms can be adapted to learn the class of DNF formulas with hypothesis class consisting of depth-3 formulas (by the work of Angluin [A90], this is optimal in the sense that the hypothesis class cannot be reduced to DNF formulas, i.e. depth-2 formulas).
Extracting Comprehensible Models from Trained Neural Networks
1996

Cited by 83 (3 self)
To Mom, Dad, and Susan, for their support and encouragement.
Default-reasoning with models

Cited by 83 (20 self)
Reasoning with model-based representations is an intuitive paradigm, which has been shown to be theoretically sound and to possess some computational advantages over reasoning with formula-based representations of knowledge. In this paper we present more evidence of the utility of such representations. In real-life situations, one normally fills in a lot of missing "context" information when answering queries. We model this situation by augmenting the available knowledge about the world with context-specific information; we show that reasoning with model-based representations can be done efficiently in the presence of varying context information. We then consider the task of default reasoning. We show that default reasoning is a generalization of reasoning within context, in which the reasoner has many "context" rules, which may be conflicting. We characterize the cases in which model-based reasoning supports efficient default reasoning and develop algorithms that efficiently handle fragments of Reiter's default logic. In particular, this includes cases in which performing the default reasoning task with the traditional, formula-based, representation is intractable. Further, we argue that these results support an incremental view of reasoning in a natural way.
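The basic idea behind model-based reasoning is that a query is answered by checking it against models of the knowledge base rather than by theorem proving over formulas. The sketch below is a deliberate simplification: it assumes the knowledge base is given as an explicit list of all its models (which can be exponentially large), and it treats context information as a filter on those models. The representations and algorithms studied in the paper are more compact than this; the function names here are hypothetical.

```python
from typing import Callable, Dict, Iterable

Model = Dict[str, bool]  # truth assignment to named variables


def entails(models: Iterable[Model], query: Callable[[Model], bool]) -> bool:
    """KB |= query iff the query holds in every model of KB."""
    return all(query(m) for m in models)


def entails_in_context(
    models: Iterable[Model],
    context: Callable[[Model], bool],
    query: Callable[[Model], bool],
) -> bool:
    """Add context information by keeping only the models consistent with it."""
    return all(query(m) for m in models if context(m))


# Models of the knowledge base (a OR b), listed explicitly.
kb_models = [
    {"a": True, "b": False},
    {"a": False, "b": True},
    {"a": True, "b": True},
]
print(entails(kb_models, lambda m: m["a"] or m["b"]))   # True
print(entails(kb_models, lambda m: m["a"]))             # False
print(entails_in_context(kb_models, lambda m: not m["b"], lambda m: m["a"]))  # True
```

If the context rules out every model, `entails_in_context` returns True vacuously, mirroring the fact that an inconsistent knowledge base entails everything.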
Identification of Gene Regulatory Networks by Strategic Gene Disruptions and Gene Overexpressions
Proc. Ninth ACM-SIAM Symp. Discrete Algorithms (SODA'98), 1998
Learning to reason
Journal of the ACM, 1994

Cited by 70 (26 self)
Abstract. We introduce a new framework for the study of reasoning. The Learning (in order) to Reason approach developed here views learning as an integral part of the inference process, and suggests that learning and reasoning should be studied together. The Learning to Reason framework combines the interfaces to the world used by known learning models with the reasoning task and a performance criterion suitable for it. In this framework, the intelligent agent is given access to its favorite learning interface, and is also given a grace period in which it can interact with this interface and construct a representation KB of the world W. The reasoning performance is measured only after this period, when the agent is presented with queries α from some query language, relevant to the world, and has to answer whether W implies α. The approach is meant to overcome the main computational difficulties in the traditional treatment of reasoning, which stem from its separation from the “world”. Since the agent interacts with the world when constructing its knowledge representation, it can choose a representation that is useful for the task at hand. Moreover, we can now make explicit the dependence of the reasoning performance on the environment the agent interacts with. We show how previous results from learning theory and reasoning fit into this framework and...
Splitters and near-optimal derandomization

Cited by 64 (1 self)
We present a fairly general method for finding deterministic constructions obeying what we call k-restrictions; this yields structures of size not much larger than the probabilistic bound. The structures constructed by our method include (n, k)-universal sets (a collection of binary vectors of length n such that for any subset of size k of the indices, all $2^k$ configurations appear) and families of perfect hash functions. The near-optimal constructions of these objects imply the very efficient derandomization of algorithms in learning, of fixed-subgraph finding algorithms, and of near-optimal threshold formulae. In addition, they derandomize the reduction showing the hardness of approximation of set cover. They also yield deterministic constructions for a local-coloring protocol, and for exhaustive testing of circuits.
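The definition of an (n, k)-universal set given in this abstract can be checked directly. The sketch below is only a brute-force verifier (function name hypothetical), not the paper's splitter-based construction: it tries all $\binom{n}{k}$ index subsets, and the interesting problem the paper addresses is producing a small set that passes this check deterministically.

```python
import itertools
from typing import Sequence, Tuple


def is_universal(vectors: Sequence[Tuple[int, ...]], n: int, k: int) -> bool:
    """Check the (n, k)-universal property of a set of binary vectors of length n.

    For every k-subset of the indices, projecting the vectors onto those
    indices must produce all 2**k binary patterns.
    """
    for idx in itertools.combinations(range(n), k):
        seen = {tuple(v[i] for i in idx) for v in vectors}
        if len(seen) < 2 ** k:
            return False
    return True


# The even-weight vectors of length 3 are (3, 2)-universal but not (3, 3)-universal.
even = [v for v in itertools.product((0, 1), repeat=3) if sum(v) % 2 == 0]
print(is_universal(even, 3, 2))  # True
print(is_universal(even, 3, 3))  # False
```

The full cube $\{0,1\}^n$ is trivially (n, k)-universal for every k but has $2^n$ vectors; the four even-weight vectors already cover every pair of positions, a small instance of the size savings such constructions aim for.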