22 citations found.
A.B.J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, pages 615--622, 1962.

This paper is cited in the following contexts:
Lower Bounds on Identification Criteria for Perceptron-like.. - Schmitt (1996)   (1 citation)  (Correct)

....any set of training examples taken from a Boolean threshold function leads after finitely many correction steps to a weight vector that produces no further errors on the training set. An endless number of proofs with subtle differences has been published. [Lewis II, 1966; Minsky and Papert, 1988; Novikoff, 1962; Parberry, 1994; Rosenblatt, 1962] are just to mention a few. Using the weight complexity of Boolean threshold functions, [Maass, 1994, Theorem 8.2] has derived an explicit upper bound in terms of the number of weights: at most $(n+1) \cdot 2^{(n+1)\log(n+1)}$ corrections can occur. ....

A. Novikoff. On convergence proofs for Perceptrons. In Symposium on Mathematical Theory of Automata, pages 615--622. Polytechnic Institute of Brooklyn, April 1962.


Efficiency versus Convergence of Boolean Kernels for.. - Khardon, Roth, Servedio (2001)   (Correct)

....predictions using a linear function in their feature space. Despite their limited expressiveness, they have been applied successfully in recent years to several large scale real world classification problems. The SNoW system [7, 2] for example, has successfully applied variations of Perceptron [6] and Winnow [4] to problems in natural language processing. The system first extracts Boolean features from examples (given as text) and then runs learning algorithms over restricted conjunctions of these basic features. There are several ways to enhance the set of features after the initial ....

....$w \cdot x \ge 0$. If the prediction is 1 and the label is $-1$ (false positive prediction) then the vector $w$ is set to $w - x$, while if the prediction is $-1$ and the label is 1 (false negative) then $w$ is set to $w + x$. No change is made if the prediction is correct. The famous Perceptron Convergence Theorem [6] bounds the number of mistakes which the Perceptron algorithm can make: Theorem 1 Let $\langle x_1, y_1 \rangle, \ldots, \langle x_t, y_t \rangle$ be a sequence of labeled examples with $x_i \in \mathbb{R}^N$, $\|x_i\| \le R$ and $y_i \in \{-1, 1\}$ for all $i$. Let $u \in \mathbb{R}^N$, $\xi > 0$ be such that $y_i (u \cdot x_i) \ge \xi$ for all $i$. Then Perceptron makes ....

A. Novikoff. On convergence proofs for perceptrons. In Proceeding of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615--622, 1963.


Efficiency versus Convergence of Boolean Kernels for.. - Roni Khardon Tufts   (Correct)

....$w \cdot x \ge 0$. If the prediction is 1 and the label is $-1$ (false positive prediction) then the vector $w$ is set to $w - x$, while if the prediction is $-1$ and the label is 1 (false negative) then $w$ is set to $w + x$. No change is made if the prediction is correct. The famous Perceptron Convergence Theorem [6] bounds the number of mistakes which the Perceptron algorithm can make: Theorem 1 Let $\langle x_1, y_1 \rangle, \ldots, \langle x_t, y_t \rangle$ be a sequence of labeled examples with $x_i \in \mathbb{R}^N$, $\|x_i\| \le R$ and $y_i \in \{-1, 1\}$ for all $i$. Let $u \in \mathbb{R}^N$, $\xi > 0$ be such that $y_i (u \cdot x_i) \ge \xi$ for all $i$. Then ....

A. Novikoff. On convergence proofs for perceptrons. In Proceeding of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615--622, 1963.


Parameter Estimation for Statistical Parsing Models: Theory and.. - Collins (2001)   (4 citations)  (Correct)

....is useful to define the maximum achievable margin $\gamma$ on a separable training set as $\gamma = \max_{\Theta \in \mathbb{R}^n} \gamma_\Theta = \max_{\Theta \in \mathbb{R}^n} \min_{i,\, y \in G(x_i),\, y \ne y_i} \frac{\Phi(x_i, y_i) \cdot \Theta - \Phi(x_i, y) \cdot \Theta}{\|\Theta\|}$. The following theorem can then be stated: Theorem 4 (Simple modification of theorem from [Block 1962; Novikoff 1962]; see also [Freund and Schapire 1999]) Let $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a sequence of examples such that $\forall i, \forall y \in G(x_i)$, $\|\Phi(x_i, y_i) - \Phi(x_i, y)\| \le R$. Assume the sequence is separable, and take $\gamma$ to be the maximum achievable margin on the sequence. Then the number of ....

....$_i, y_i) - \Phi(x_i, y)\| \le R$. Assume the sequence is separable, and take $\gamma$ to be the maximum achievable margin on the sequence. Then the number of mistakes made by the perceptron algorithm on this sequence is at most $(R/\gamma)^2$. Proof: Simple modification of the proof by [Block 1962; Novikoff 1962]; see also [Freund and Schapire 1999]. This theorem implies that if the training sample in figure 2 is separable, and we iterate the algorithm repeatedly over the training sample (i.e. $T \to \infty$) then the algorithm converges to a parameter setting that classifies the training set with zero ....

Novikoff, A. B. J. 1962. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, Vol XII, 615--622.


Estimating the Optimal Margins of Embeddings in Euclidean .. - Forster, Schmitt, Simon (2001)   (1 citation)  (Correct)

....has the following matrix: $\text{HALF-INTERVALS}_n = \begin{pmatrix} 1 & -1 & \cdots & -1 \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & -1 \\ 1 & \cdots & \cdots & 1 \end{pmatrix} \in \{-1, 1\}^{n \times n}$. As observed by Ben-David [1] we can use Novikoff's Theorem [10] to get an upper bound on the margins of embeddings of this matrix in half spaces: If we have an embedding with margin $\gamma$, then it follows from Novikoff's Theorem that the concept class can be learned with at most $\gamma^{-2}$ EQ queries. The learning complexity of $\text{HALF-INTERVALS}_n$ with arbitrary ....

Novikoff, A. B. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12, 615--622. Polytechnic Institute of Brooklyn.


Efficiency versus Convergence of Boolean Kernels for.. - Khardon, Roth, Servedio   (Correct)

....$w \cdot x \ge 0$. If the prediction is 1 and the label is $-1$ (false positive prediction) then the vector $w$ is set to $w - x$, while if the prediction is $-1$ and the label is 1 (false negative) then $w$ is set to $w + x$. No change is made if the prediction is correct. The famous Perceptron Convergence Theorem [5] bounds the number of mistakes which the Perceptron algorithm can make: Theorem 1 Let $\langle x_1, y_1 \rangle, \ldots, \langle x_t, y_t \rangle$ be a sequence of labeled examples with $x_i \in \mathbb{R}^N$, $\|x_i\| \le R$ and $y_i \in \{-1, 1\}$ for all $i$. Let $u \in \mathbb{R}^N$, $\xi > 0$ be such that $y_i (u \cdot x_i) \ge \xi$ for all $i$. Then ....

A. Novikoff. On convergence proofs for perceptrons. In Proceeding of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615--622, 1963.


Comparison Between the Regression Depth Method and the .. - Christmann, Fischer.. (2000)   (Correct)

....estimate of does not exist, if the data is completely separable, i.e. $n_{co} = 0$. • The opposite holds true when training a single linear threshold function using the Perceptron (Rosenblatt, 1962) algorithm. The Perceptron algorithm is guaranteed to converge only for data sets with $n_{co} = 0$ (Novikoff, 1962). • And finally, when assessing the quality of linear models according to the empirical risk minimization principle (Vapnik, 1998) $n_{co}$ is a parameter in bounds on the prediction error. Unfortunately, the problem of determining the exact minimum number of misclassifications $n_{co}$ based on an ....

Novikoff, A. (1962). On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata, Vol XII, pp. 615-622.


Smooth Boosting and Linear Threshold Learning with Malicious Noise - Servedio   (Correct)

....threshold $\theta$; i.e. $f(x) = \mathrm{sign}(u \cdot x - \theta)$, but our definition incurs no real loss of generality since such a threshold can be simulated by adding an extra variable. 2.1 Perceptron. The Perceptron algorithm for learning linear threshold functions was introduced nearly forty years ago [8, 27, 28] and continues to be the subject of active theoretical research [7, 9, 16, 23, 30, 31]. The algorithm works online and is remarkably simple: it maintains a prediction vector $v \in \mathbb{R}^n$ which is initially set to zero. Given an example $x \in \mathbb{R}^n$, the algorithm predicts $\hat{y} = \mathrm{sign}(v \cdot x) \in$ ....

....$\hat{y} = \mathrm{sign}(v \cdot x) \in \{-1, 1\}$ and then receives the true label $y \in \{-1, 1\}$. If the prediction is correct then $v$ is left unchanged, but if it is incorrect then $v$ is updated to $v = v + y x$ before the next example is processed. The well known Perceptron Convergence Theorem [8, 27] bounds the total number of prediction mistakes: Theorem 1 Let $\langle x_1, y_1 \rangle, \ldots, \langle x_s, y_s \rangle$ be a sequence of labeled examples with $x_j \in B(R)$ for all $j$. Suppose that $\xi > 0$ and $u \in \mathbb{R}^n$, $\|u\| = 1$ is such that $y_j (u \cdot x_j) \ge \xi$ for all $j$. Then the Perceptron algorithm ....

A. Novikoff. On convergence proofs on perceptrons, in "Proc. Symposium on Mathematical Theory of Automata" (XII) (1962), 615-622.
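
The update rule quoted in the Servedio excerpt above (start from the zero vector, predict with the sign of the inner product, add y times x after a mistake) can be sketched in a few lines of Python. This is an illustrative sketch only, not code from any of the cited papers: the function name `perceptron`, the `max_epochs` cap, the convention that a score of exactly 0 predicts +1, and the toy sample are all assumptions made here. Cycling over the sample until a pass makes no mistake follows the batch usage described in the Freund and Schapire excerpt.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def perceptron(
    examples: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_epochs: int = 100,  # hypothetical safety cap, not part of the algorithm
) -> Tuple[List[float], int]:
    """Cycle over the sample until one full pass makes no mistake.

    Returns the final prediction vector and the total number of mistakes.
    Labels are assumed to be +1 or -1.
    """
    v = [0.0] * len(examples[0])  # prediction vector starts at zero
    mistakes = 0
    for _ in range(max_epochs):
        mistakes_this_pass = 0
        for x, y in zip(examples, labels):
            # Assumed tie-break: a score of exactly 0 predicts +1.
            prediction = 1 if dot(v, x) >= 0 else -1
            if prediction != y:
                # Mistake-driven update: v <- v + y * x.
                v = [v_k + y * x_k for v_k, x_k in zip(v, x)]
                mistakes_this_pass += 1
        mistakes += mistakes_this_pass
        if mistakes_this_pass == 0:
            break  # consistent with every example in the sample
    return v, mistakes


# Toy separable sample (made up for illustration): the label is the sign of
# the first coordinate.
X = [(2, 1), (1, -1), (-1, 1), (-2, -1)]
Y = [1, 1, -1, -1]
v, mistakes = perceptron(X, Y)
print(v, mistakes)
```

On this toy sample the unit vector u = (1, 0) separates the data with margin 1 and the largest example norm is the square root of 5, so the (R/γ)² bound quoted in the Collins excerpt allows at most 5 mistakes in total, however many passes are made.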


Large Scale Bayes Point Machines - Herbrich, Graepel (2001)   (2 citations)  (Correct)

....$\ldots, m\}$, if $y_{\Pi(i)} \langle \phi(x_{\Pi(i)}), w_t \rangle_{\mathcal{K}} \le 0$ then $w_{t+1} = w_t + y_{\Pi(i)} \phi(x_{\Pi(i)})$ and $t \leftarrow t + 1$. 3. Stop, if for all $i \in \{1, \ldots, m\}$, $y_{\Pi(i)} \langle \phi(x_{\Pi(i)}), w_t \rangle_{\mathcal{K}} > 0$. A classical theorem due to Novikoff [7] guarantees the convergence of this procedure and furthermore provides an upper bound on the number $t$ of mistakes needed until convergence. More precisely, if there exists a classifier $w_{SVM}$ with margin $\gamma_Z(w_{SVM}) = \min_{(x_i, y_i) \in Z} \frac{y_i \langle \phi(x_i), w_{SVM} \rangle_{\mathcal{K}}}{\|w_{SVM}\|_{\mathcal{K}}}$, then the number of ....

A. Novikoff. On convergence proofs for perceptrons. In Report at the Symposium on Mathematical Theory of Automata, pages 24--26, Politechnical Institute Brooklyn, 1962.


From Margin To Sparsity - Graepel, Herbrich (2001)   (Correct)

....in feature space. As will be seen in Theorem 2 the normalisation can only lead to a decrease in the upper bound on the number of steps until convergence. Other variants of this algorithm have been presented elsewhere (see [2, 3]). 3 An Improvement of Novikoff's Theorem. In the early 60's Novikoff [9] was able to give an upper bound on the number of mistakes made by the classical perceptron learning procedure. Two years later, this bound was generalised to feature spaces using Mercer kernels by Aizerman et al. [1]. Given a training set Z, the quantity determining the upper bound is the ....

A. Novikoff. On convergence proofs for perceptrons. In Report at the Symposium on Mathematical Theory of Automata, pages 24--26, Politechnical Institute Brooklyn, 1962.


Worst-Case Analysis of the Perceptron and Exponentiated Update.. - Bylander (1998)   (3 citations)  (Correct)

....m 2U 2 E X 2 E ln n implies Abs Loss(EU(s; 1= U E X 2 E ) S 0 ) m, which implies 0 1 Loss(EU(s; 1= U E X 2 E ) S) m. Proof: Using Theorem 3, Abs Loss(u; S) 0, j = 1= U E X 2 E ) and m 2U 2 E X 2 E ln n: Abs Loss(EU(s; j; U E ) S 0 ) 2 Block [2], Novikoff [23], and Papert [24] are generally credited with providing the first proofs of this mistake bound. 12 Abs Loss(u; S 0 ) U E ln n j jmU E X 2 E 2 U 2 E X 2 E ln n m 2 m Because every subsequence of length m has an absolute loss less than m, then Observation 4 implies ....

A. B. J. Novikoff. On convergence proofs for perceptrons. In Proc. Symposium on the Mathematical Theory of the Automata, volume XII, pages 615--622, 1962.


Generalisation Error Bounds for Sparse Linear Classifiers - Graepel, Herbrich.. (2000)   (Correct)

....algorithms. The sparsity allows us to avoid the double sample argument of the Basic Lemma [14]. In addition, we present a PAC Bayesian theorem [9] about the average generalisation error over a subset of version space. Finally, we reinterpret Novikoff's well known perceptron convergence theorem [12] as a sparsity guarantee for the classifier found by the well known perceptron learning algorithm: the mere existence of large margin classifiers implies the existence of sparse consistent classifiers. By combining the perceptron mistake bound with a compression bound that originated from the ....

....$= 0$. 2. For all $i \in \{1, \ldots, m\}$, if $y_{\Pi(i)} f_{\alpha}(x_{\Pi(i)}) \le 0$ then $(\alpha_{t+1})_{\Pi(i)} \leftarrow (\alpha_t)_{\Pi(i)} + 1$ (5.1) and $t \leftarrow t + 1$. 3. Stop, if there is no $i \in \{1, \ldots, m\}$ such that $y_{\Pi(i)} f_{\alpha}(x_{\Pi(i)}) \le 0$. In the early 60's Novikoff and Aizerman et al. [12, 1] were able to give an upper bound on the number $t$ of mistakes made by this learning procedure. Given a training set $Z$, the quantity determining the upper bound is the maximally achievable margin $\max_{\alpha} \gamma_Z(\alpha)$ on the training sample $Z = (X, Y)$ normalised by the total extent of the ....

A. Novikoff. On convergence proofs for perceptrons. In Report at the Symposium on Mathematical Theory of Automata, pages 24--26, Politechnical Institute Brooklyn, 1962.


Large Margin Classification Using the Perceptron Algorithm - Freund, Schapire (1998)   (50 citations)  (Correct)

....is used for learning from a batch of training examples is to run the algorithm repeatedly through the training set until it finds a prediction vector which is correct on all of the training set. This prediction rule is then used for predicting the labels on the test set. Block [3], Novikoff [15], and Minsky and Papert [14] have shown that if the data are linearly separable, then the perceptron algorithm will make a finite number of mistakes, and therefore, if repeatedly cycled through the training set, will converge to a vector which correctly classifies all of the examples. Moreover, the ....

....inseparable case. Second, we review an analysis of the leave-one-out conversion of an online learning algorithm to a batch learning algorithm. 3.1 The online perceptron algorithm in the separable case. Our analysis is based on the following well known result first proved by Block [3] and Novikoff [15]. The significance of this result is that the number of mistakes does not depend on the dimension of the instances. This gives reason to believe that the perceptron algorithm might perform well in high dimensional spaces. Theorem 1 (Block, Novikoff) Let $\langle (x_1, y_1), \ldots, (x_m, y_m) \rangle$ be a ....

A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615--622, 1962.


Identification Criteria and Lower Bounds for Perceptron-like.. - Schmitt (1998)   (5 citations)  (Correct)

....theorem guarantees that any set of training examples taken from a Boolean threshold function leads after finitely many correction steps to a weight vector that produces no further errors on the training set. A considerable number of proofs with subtle differences has been published (e.g. Novikoff 1962; Rosenblatt 1962; Lewis II 1966; Minsky and Papert 1988; Parberry 1994). Maass (1994) has derived an explicit upper bound in terms of the number of weights: At most $(n+1)^2 \cdot 2^{(n+1)\log(n+1)}$ corrections may occur. Concerning lower bounds one easily concludes that the number of ....

Novikoff, A. 1962. On convergence proofs for Perceptrons. In Symposium on Mathematical Theory of Automata, pp. 615--622. Polytechnic Institute of Brooklyn.


Loss Functions and Structured Domains for Support Vector Machines - Portera (2005)   (Correct)

No context found.

A.B.J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, pages 615--622, 1962.


Journal of Machine Learning Research 7 (2006) 551--585.. - Koby Crammer Crammer   (Correct)

No context found.

A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615--622, 1962.


Unknown -   (Correct)

No context found.

Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Proc. Symposium on the Mathematical Theory of Automata.


Journal of Machine Learning Research 6 (2005) 1579--1619.. - With Online And (2005)   (Correct)

No context found.

A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615--622. Polytechnic Institute of Brooklyn, 1962.


Fast Binary Feature Selection with Conditional Mutual Information - Fleuret (2004)   (1 citation)  (Correct)

No context found.

A. Novikoff. On convergence proofs for perceptrons. In In Symposium on Mathematical Theory of Automata, pages 615--622, 1962.


Online Learning with Kernels - Kivinen, Smola, Williamson (2002)   (5 citations)  (Correct)

No context found.

A. B. J. Novikoff, "On convergence proofs on perceptrons," in Proceedings of the Symposium on the Mathematical Theory of Automata, vol. 12. Polytechnic Institute of Brooklyn, 1962, pp. 615--622.


Online Classification on a Budget - Crammer, Kandola, Singer (2003)   (2 citations)  (Correct)

No context found.

A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615--622, 1962.


Learning Multivalued Multithreshold Functions - Department   (Correct)

No context found.

A. B. Novikoff. On convergence proofs on perceptrons. In Symposium on the Mathematical Theory of Automata, 12. Polytechnic Institute of Brooklyn, 1962: 615-622.

CiteSeer.IST - Copyright Penn State and NEC