
## Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm (1988)

Venue: Machine Learning

Citations: 773 (5 self)

### Citations

4844 |
Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...ber of mistakes grows linearly with the number of irrelevant attributes. This is in keeping with theoretical bounds from the perceptron convergence theorem (Hampson & Volper, 1986; Duda & Hart, 1973; Nilsson, 1965). We know of no evidence that any other standard perceptron algorithm does better. In contrast, we will prove that the number of mistakes that our algorithm makes grows only logarithmi... |
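The contrast drawn in this passage, linear mistake growth for standard perceptron updates versus logarithmic growth in the number of irrelevant attributes, comes from Winnow's multiplicative weight updates. Below is a minimal sketch of a Winnow2-style learner for a monotone disjunction; the parameter choices (promotion/demotion factor alpha = 2, threshold theta = n) and the demo target are illustrative assumptions, not the paper's exact settings.

```python
import random

def winnow(examples, n, alpha=2.0):
    """Winnow2-style multiplicative update (sketch).
    Predict 1 iff sum(w_i * x_i) >= theta, with theta = n.
    On a false negative, multiply the weights of active attributes
    by alpha; on a false positive, divide them by alpha."""
    w = [1.0] * n
    theta = float(n)
    mistakes = 0
    for x, y in examples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
        if pred != y:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w, mistakes

# Hypothetical target: the disjunction x0 OR x1 among n = 100
# attributes, so 98 attributes are irrelevant.
random.seed(0)
n = 100
examples = []
for _ in range(500):
    x = [random.randint(0, 1) for _ in range(n)]
    y = 1 if (x[0] or x[1]) else 0
    examples.append((x, y))

w, mistakes = winnow(examples, n)
print(mistakes)  # small relative to the 98 irrelevant attributes
```

Only the weights of attributes active on a mistaken trial change, and relevant attributes are never demoted here (an example with a relevant attribute on is always positive), which is what keeps the mistake count growing with log n rather than n.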

1985 | A theory of the learnable.
- Valiant
- 1984
Citation Context ... not knowing which few will prove useful. For another example, consider an environment in which the learner builds new concepts as Boolean functions of old concepts (Banerji, 1985; Valiant, 1984). Here the learner may need to sift through a large library of available concepts to find the suitable ones to use in expressing each new concept. In a special case of this situation, one may design ... |

965 |
Estimation of dependences based on empirical data
- Vapnik
- 1982
Citation Context ...ive a lower bound for opt(C) in terms of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3 A set S ⊆ X is shattered by a target c... |
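The shattering notion behind Definition 3 can be checked mechanically for small finite classes. A brute-force sketch follows; the class of intervals over {1, ..., 5} is a hypothetical example chosen for illustration, not one from the paper.

```python
from itertools import combinations

def shatters(C, S):
    """True if concept class C realizes every subset of S.
    Each concept in C is represented as the set of points it labels 1."""
    S = frozenset(S)
    labelings = {frozenset(c) & S for c in C}
    return len(labelings) == 2 ** len(S)

def vc_dimension(C, X):
    """Largest |S| with S a subset of X shattered by C (brute force)."""
    best = 0
    points = list(X)
    for r in range(1, len(points) + 1):
        if any(shatters(C, S) for S in combinations(points, r)):
            best = r
    return best

# Hypothetical class: intervals [a, b] over the points {1, ..., 5}
# (a > b gives the empty concept).
X = range(1, 6)
C = [frozenset(x for x in X if a <= x <= b) for a in X for b in X]
print(vc_dimension(C, X))  # 2: intervals shatter pairs but no triple
```

Intervals realize all four labelings of a pair like {2, 4}, but no interval can label {2, 4} positive while leaving 3 negative, so no triple is shattered.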

755 | Queries and concept learning. - Angluin - 1988 |

750 |
Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations.
- Rumelhart, McClelland
- 1986
Citation Context ...g an algorithm for learning k-DNF. Our main result is an algorithm that deals efficiently with large numbers of irrelevant attributes. If desired, it can be implemented within a neural net framework (Rumelhart & McClelland, 1986) as a simple linear-threshold algorithm. The method learns certain classes of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions... |

727 | Learnability and the Vapnik-Chervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1989
Citation Context ...und for opt(C) in terms of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3 A set S ⊆ X is shattered by a target class C if for every U... |

653 | Generalization as search. - Mitchell - 1982 |

339 | Inductive Inference: Theory and Methods - Angluin, Smith - 1983 |

231 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ... algorithm, the number of mistakes grows linearly with the number of irrelevant attributes. This is in keeping with theoretical bounds from the perceptron convergence theorem (Hampson & Volper, 1986; Duda & Hart, 1973; Nilsson, 1965). We know of no evidence that any other standard perceptron algorithm does better. In contrast, we will prove that the number of mistakes that our algorithm makes grows only logarithmi... |

216 |
Threshold Logic and Its Applications.
- Muroga
- 1971
Citation Context ...e r-of-k threshold functions are contained in F({0,1}^n, δ). There exist other classes of linearly-separable Boolean functions for which 1/δ grows exponentially with n when the instance space is {0,1}^n (Muroga, 1971; Hampson & Volper, 1986). One example of a set of functions with exponentially small δ consists of ... as n varies. For such functions, the mistake bound that we will derive grows exponentially with n. ... |

181 | On the learnability of boolean formulae.
- Kearns, Li, et al.
- 1987
Citation Context ...es of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of Boolean functions that are not line... |
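The parenthetical definition of r-of-k threshold functions makes their linear separability easy to verify directly: put weight 1 on each of the k designated variables, weight 0 elsewhere, and use threshold r. A small sketch, where the choice of n = 6 and the designated variable set are illustrative assumptions:

```python
from itertools import product

def r_of_k(x, designated, r):
    """r-of-k threshold function: true iff at least r of the
    designated variables are on."""
    return 1 if sum(x[i] for i in designated) >= r else 0

def linear_threshold(x, weights, theta):
    """One-layer linear-threshold unit: true iff w . x >= theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

# Hypothetical instance: a 2-of-3 function over variables {0, 2, 5}
# among n = 6 variables, expressed as a single threshold unit.
n, designated, r = 6, [0, 2, 5], 2
weights = [1 if i in designated else 0 for i in range(n)]

# The two formulations agree on every input in {0,1}^6.
agree = all(r_of_k(x, designated, r) == linear_threshold(x, weights, r)
            for x in product([0, 1], repeat=n))
print(agree)  # True
```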

176 | Learning Machines,
- Nilsson
- 1965
Citation Context ...ws linearly with the number of irrelevant attributes. This is in keeping with theoretical bounds from the perceptron convergence theorem (Hampson & Volper, 1986; Duda & Hart, 1973; Nilsson, 1965). We know of no evidence that any other standard perceptron algorithm does better. In contrast, we will prove that the number of mistakes that our algorithm makes grows only logarithmically with the ... |

120 |
Learning disjunctions of conjunctions.
- Valiant
- 1985
Citation Context ...ibrary will just be Boolean functions themselves. For example, consider k-DNF, the class of Boolean functions that can be represented in disjunctive normal form with no more than k literals per term (Valiant, 1985). If one has available intermediate concepts that include all conjunctions of no more than k literals, then any k-DNF function can be represented as a simple disjunction of these concepts. We will re... |
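The reduction described in this passage, rewriting a k-DNF function as a plain disjunction over conjunction features, can be illustrated concretely. The sketch below uses a hypothetical 2-DNF target of my own choosing; the literal encoding as (index, required value) pairs is an assumption for illustration, not the paper's notation.

```python
from itertools import product

def term(x, literals):
    """Conjunction feature: 1 iff every (index, value) literal holds."""
    return 1 if all(x[i] == v for i, v in literals) else 0

# Hypothetical 2-DNF target over n = 3 variables:
# (x0 AND NOT x1) OR x2, i.e. two terms of at most 2 literals each.
terms = [((0, 1), (1, 0)),   # x0 AND NOT x1
         ((2, 1),)]          # x2

def as_disjunction(x):
    """The k-DNF evaluated as a simple disjunction of conjunction
    features, the form a disjunction learner could work with."""
    return 1 if any(term(x, t) for t in terms) else 0

def target(x):
    return 1 if (x[0] and not x[1]) or x[2] else 0

# The disjunction over conjunction features matches the 2-DNF
# on all 2^3 inputs.
match = all(as_disjunction(x) == target(x)
            for x in product([0, 1], repeat=3))
print(match)  # True
```

Expanding the instance into one indicator per conjunction of at most k literals is exactly the preprocessing that lets a disjunction learner handle k-DNF, at the cost of O(n^k) features.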

60 |
Predicting {0, 1} functions on randomly drawn points.
- Haussler, Littlestone, et al.
- 1994
Citation Context ...s of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3 A set S ⊆ X is shattered by a target class C if for every U ⊆ S there exists a function f ∈ C such t... |

49 | On the prediction of general recursive functions. - Barzdin, Frievald - 1972 |

37 |
Recent results on boolean concept learning
- Kearns, Li, et al.
- 1987
Citation Context ...es of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of Boolean functions that are not line... |

35 |
Linear function neurons: structure and training
- Hampson, Volper
- 1986
Citation Context ...hod learns certain classes of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of ... |

10 | Quantifying the inductive bias in concept learning - Haussler - 1986 |

2 | Space Efficient Learning Algorithms. Unpublished manuscript - Haussler - 1986 |

1 |
The logic of learning: A basis for pattern recognition and for improvement of performance
- Banerji
- 1985
Citation Context ... consideration, not knowing which few will prove useful. For another example, consider an environment in which the learner builds new concepts as Boolean functions of old concepts (Banerji, 1985; Valiant, 1984). Here the learner may need to sift through a large library of available concepts to find the suitable ones to use in expressing each new concept. In a special case of this situation, ... |

1 | The programmer's guide to the Connection Machine (Technical Report) - Slade - 1987 |

1 | Space efficient learning algorithms. Unpublished manuscript - Haussler - 1985 |