T. Bylander, Learning linear threshold functions in the presence of classification noise, Proceedings of the Workshop on Computational Learning Theory, 1994.
....problem being solved is identical to that considered by Spielman and Teng, except that we replace the objective function max c^T x by a constraint c^T x ≥ c_0. In addition to simplicity, the perceptron algorithm has other beneficial features, such as resilience to certain types of random noise [4, 5, 6]. Specifically, we prove the following result, where all probability statements are with respect to the random Gaussian perturbation of variance σ². Note that each iteration of the perceptron algorithm takes O(md) time, just like the simplex algorithm. Theorem 1.1. Perceptron Smoothed ....
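The remark that each perceptron iteration costs O(md) time can be made concrete with a minimal sketch of the classical perceptron on m labeled examples in d dimensions, written here with NumPy. This is the textbook algorithm, not the smoothed-analysis variant of the quoted paper; the function name `perceptron`, the `max_updates` cap, and the synthetic target `w_true` are illustrative assumptions.

```python
import numpy as np

def perceptron(X, y, max_updates=10_000):
    """Classical perceptron: look for w with y[i] * (w . X[i]) > 0 for every i.

    X is an (m, d) array of examples and y an (m,) array of labels in {-1, +1}.
    Each iteration scans all m rows for a violated constraint, which costs
    O(m * d) arithmetic operations, and then makes one additive update.
    """
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(max_updates):
        margins = y * (X @ w)                  # one dot product per row: O(m * d)
        violated = np.flatnonzero(margins <= 0)
        if violated.size == 0:
            return w                           # every constraint is satisfied
        i = violated[0]
        w = w + y[i] * X[i]                    # perceptron update on one violated row
    return None                                # gave up; data may not be separable


# Usage: separable data with margin at least 0.1 around a hypothetical target.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
X = X[np.abs(X @ w_true) / np.linalg.norm(w_true) > 0.1]
y = np.sign(X @ w_true)
w = perceptron(X, y)
```

Filtering to a margin of 0.1 keeps the number of updates bounded by the usual (R/γ)² mistake bound, so the cap of 10,000 updates is not reached on this data.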
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual Workshop on Computational Learning Theory, pages 340-347. ACM Press, New York, NY, 1994.
.... easier to cope with, and malicious noise at the other, have been investigated, and methods have been devised for adapting learning algorithms so that they become provably error-resilient [12]. In particular, learning algorithms for linear separators are known that are provably resilient to classification noise [4, 5]. Experimental experience with noisy data is also broad. Most data sets arising from complex real-world situations, such as medical diagnosis, are best viewed as noisy in the sense that the information contained in the specification of each case appears insufficient to make the diagnosis with ....
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proc. 7th ACM Conference on Computational Learning Theory, pages 340--347, 1994.
.... finding a vector w that minimizes the number of misclassified points is NP-hard, variants on the Perceptron Algorithm typically do well in practice [Gal90, Ama94]. In fact, it is possible to provide guarantees for variations on the Perceptron Algorithm in the presence of inconsistent data (e.g., see [Byl93, Byl94, Kea93]) under models in which the inconsistency is produced by a sufficiently benign process, such as the random classification noise model discussed below. In this paper, we present a version of the Perceptron Algorithm that maintains its properties of noise tolerance, while providing ....
....in the case of zero noise and describe how the Perceptron Algorithm can be modified and combined with the procedure from the Outlier Removal Lemma to produce a polynomial time PAC learning algorithm. Finally, we describe how the algorithm can be adjusted to the noisy case using known techniques [Byl94, Kea93, AD94]. 1.2 Notation, definitions, and preliminaries. In this paper, we consider the problem of learning linear threshold functions in the PAC model in the presence of random classification noise [KV94]. The problem can be stated as follows. We are given access to examples (points) drawn from some ....
[Article contains additional citation context not shown here]
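The random classification noise model referred to above flips each correct label independently with a fixed probability η < 1/2. The sketch below generates such data under assumed Gaussian inputs and a hypothetical target `w_true`, then checks a standard observation: because the flips are independent of x, the label-weighted mean E[ℓ·x] of the noisy data is exactly (1 − 2η) times that of the clean data, so noise shrinks it without rotating it. This only illustrates the noise model; it is not the algorithm of any of the cited papers, and the helper name `noisy_examples` is our own.

```python
import numpy as np

def noisy_examples(w_true, n, eta, rng):
    """Draw n Gaussian examples labeled by sign(w_true . x), then flip each
    label independently with probability eta (random classification noise)."""
    X = rng.normal(size=(n, w_true.size))
    clean = np.sign(X @ w_true)
    flips = rng.random(n) < eta
    noisy = np.where(flips, -clean, clean)
    return X, clean, noisy


rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])                 # hypothetical target halfspace
eta = 0.2
X, clean, noisy = noisy_examples(w_true, 200_000, eta, rng)

# Flips are independent of x, so E[noisy * x] = (1 - 2 * eta) * E[clean * x]:
# the noise shrinks the label-weighted mean but does not rotate it.
clean_mean = (clean[:, None] * X).mean(axis=0)
noisy_mean = (noisy[:, None] * X).mean(axis=0)
ratio = noisy_mean / clean_mean
print(ratio)                                   # both entries should be near 1 - 2 * eta = 0.6
```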
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual Workshop on Computational Learning Theory, pages 340-347. ACM Press, New York, NY, 1994.
....xj. Using lemma 3, and theorems 2, 3, we have the main result of this section. Theorem 4 An robust half space in R n can be PAC learned using O( log 1 2 ) examples in O( n 2 ) time. The Perceptron Algorithm is known to be tolerant to various types of classification noise [4, 2, 6]. It is a straightforward consequence that these properties continue to hold for our algorithm. In the concluding section we discuss straightforward bounds for agnostic learning. 4.2 Intersections of half spaces The next problem we consider is learning an intersection of m half spaces in R n , ....
T. Bylander, "Learning linear threshold functions in the presence of classification noise," Proc. 7th Workshop on Computational Learning Theory, 1994.
.... a vector w that minimizes the number of misclassified points is NP-hard, variants on the Perceptron Algorithm typically do well in practice [Gal90, Ama94]. In fact, it is possible to provide guarantees for variations on the Perceptron Algorithm in the presence of inconsistent data (e.g., see [Byl93, Byl94, Kea93]) under models in which the inconsistency is produced by a sufficiently benign process, such as the random classification noise model discussed below. In this paper, we present a version of the Perceptron Algorithm that maintains its properties of noise tolerance, while providing ....
....in the case of zero noise and describe how the Perceptron Algorithm can be modified and combined with the procedure from the Outlier Removal Lemma to produce a polynomial time PAC learning algorithm. Finally, we describe how the algorithm can be adjusted to the noisy case using known techniques [Byl94, Kea93]. 1.2 Notation, definitions, and preliminaries. In this paper, we consider the problem of learning linear threshold functions in the PAC model in the presence of random classification noise [KV94]. The problem can be stated as follows. We are given access to examples (points) drawn from some ....
[Article contains additional citation context not shown here]
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual Workshop on Computational Learning Theory, pages 340--347. ACM Press, New York, NY, 1994.
.... a vector w that minimizes the number of misclassified points is NP-hard, variants on the Perceptron Algorithm typically do well in practice [Gal90, Ama94]. In fact, it is possible to provide guarantees for variations on the Perceptron Algorithm in the presence of inconsistent data (e.g., see [Byl93, Byl94, Kea93]) under models in which the inconsistency is produced by a sufficiently benign process, such as the random classification noise model discussed below. In this paper, we present a version of the Perceptron Algorithm that maintains its properties of noise tolerance, while providing ....
....in the case of zero noise and describe how the Perceptron Algorithm can be modified and combined with the procedure from the Outlier Removal Lemma to produce a polynomial time PAC learning algorithm. Finally, we describe how the algorithm can be adjusted to the noisy case using known techniques [Byl94, Kea93, AD94]. 1.2 Notation, definitions, and preliminaries. In this paper, we consider the problem of learning linear threshold functions in the PAC model in the presence of random classification noise [KV94]. The problem can be stated as follows. We are given access to examples (points) drawn from some ....
[Article contains additional citation context not shown here]
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual Workshop on Computational Learning Theory, pages 340--347. ACM Press, New York, NY, 1994.
....is also a least squares linear probability approximation. If the conditional densities are multivariate normal with equal covariance matrices, then a least squares linear approximation on a set of examples will approximate the target LTF. The third special case is classification noise. Bylander [Byl94] shows that LTFs are polynomially learnable in the presence of classification noise if there is a sufficient separation between the examples and the target hyperplane and if there is no hyperplane that is very close on average to the examples. Blum et al. [BFKV96] improve this result by ....
....compared with the following algorithms. The LMS algorithm is an online algorithm for least squares approximation [WS85]; the examples were sampled with replacement, and a learning rate of 0.01 was used. The Perceptron Noise (PN) algorithm is an algorithm for learning LTFs with classification noise [Byl94]. The Ratchet (Rat) algorithm is the perceptron algorithm with a heuristic for determining the best weights [Gal90]. 100 epochs were used for all the above algorithms. C4.5 is a decision tree learning algorithm [Qui93]; it is commonly used by the machine learning community for experimental ....
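The LMS baseline described above (online least squares, examples sampled with replacement, learning rate 0.01, 100 epochs) can be sketched as the standard Widrow-Hoff rule. This is an illustrative reconstruction under stated assumptions, not the code used in the quoted experiments: the function name `lms`, the zero initialization, the ±1 targets, and thresholding the learned linear function at 0 to obtain an LTF are all our own choices.

```python
import numpy as np

def lms(X, y, rate=0.01, epochs=100, rng=None):
    """Widrow-Hoff LMS: online stochastic gradient descent on squared error.

    Each epoch draws m examples with replacement and, for each one, moves w
    against the gradient of 0.5 * (w . x - y)^2 with a fixed learning rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.integers(0, m, size=m):   # sample with replacement
            error = X[i] @ w - y[i]            # residual of the linear prediction
            w -= rate * error * X[i]           # gradient step on squared error
    return w


# Usage: fit +/-1 labels from a hypothetical LTF, then classify by sign(w . x).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
y = np.sign(X @ np.array([1.0, -1.0]))
w = lms(X, y, rng=rng)
accuracy = np.mean(np.sign(X @ w) == y)
```

With symmetric Gaussian inputs the least squares solution points in the direction of the target weight vector, which is the kind of special case where a least squares linear approximation recovers the target LTF; on other distributions it need not.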
T. Bylander. Learning linear-threshold functions in the presence of classification noise. In Proc. Seventh Annual ACM Conf. on Computational
....neural networks. For concept learning in which some linear threshold function is a perfect classifier, mistake bounds are known for the Perceptron algorithm [22, 25] and the Winnow and Weighted Majority algorithms [18, 19, 21]. There are also results for these algorithms for various types of noise [3, 4, 5, 6, 10, 20]. However, these previous results do not characterize the behavior of these algorithms over any sequence of examples. This paper shows that minimizing the absolute loss characterizes the online behavior of two algorithms for learning linear threshold functions: the Perceptron algorithm and the ....
T. Bylander. Learning linear-threshold functions in the presence of classification noise. In Proc. Seventh Annual ACM Conf. on Computational Learning Theory, pages 340--347, 1994.
.... threshold function is a perfect classifier, mistake bounds are known for the perceptron algorithm (Rosenblatt 1962; Minsky & Papert 1969) and the Winnow and Weighted Majority algorithms (Littlestone 1988; 1989; Littlestone & Warmuth 1994). There are also results for various types of noise, e.g., (Bylander 1994). However, these results do not characterize the behavior of these algorithms over any sequence of examples. This paper shows that minimizing the absolute loss characterizes the online behavior of the perceptron algorithm and an exponentiated update algorithm (related to Weighted Majority) over ....
Bylander, T. 1994. Learning linear-threshold functions in the presence of classification noise. In Proc. Seventh Annual ACM Conf. on Computational Learning Theory, 340-347.
....12 and 17, for classification and monotonic noise, respectively, which is far from being low order. The analysis leads to sizeable constants as well. However, this is very much a worst-case analysis. Empirical studies have shown that similar algorithms perform very well on commonly used datasets [4, 5]. Additional empirical studies would be useful. Other issues to address include the following. As mentioned previously, Blum et al. [2] and Cohen [6] have shown how a small separation parameter can be addressed for classification noise; whether their analyses can be extended to monotonic ....
T. Bylander. Learning linear-threshold functions in the presence of classification noise. In Proc. Seventh Annual ACM Conf. on Computational Learning Theory, pages 340--347, 1994.
No context found.
T. Bylander, Learning linear threshold functions in the presence of classification noise, Proceedings of the Workshop on Computational Learning Theory, 1994.
No context found.
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual Workshop on Computational Learning Theory, pages 340-347. ACM Press, New York, NY, 1994.
No context found.
T. Bylander. Learning linear threshold functions in the presence of classification noise. In Proc. 7th ACM Conference on Computational Learning Theory, pages 340--347, 1994.
No context found.
T. Bylander. Learning Linear Threshold Functions in the Presence of Classification Noise. In Proceedings of the 7th Annual ACM Conference on Computational Learning Theory, 340-347, 1994.
CiteSeer.IST - Copyright Penn State and NEC