
## Altitude training: Strong bounds for single-layer dropout (2014)

Venue: Advances in Neural Information Processing Systems

Citations: 1 (1 self)

### Citations

4346 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context ...only two topics—one for each class—we get the naive Bayes model. If Θ is the (K − 1)-dimensional simplex where θ(τ) is a τ-mixture over K basis vectors, we get the K-topic latent Dirichlet allocation [16]. Note that although our generalization result relies on a generative model, the actual learning algorithm is agnostic to it. Our analysis shows that dropout can take advantage of a generative struct...
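
The two-topic special case mentioned in the snippet above — one word distribution per class — is exactly the multinomial naive Bayes generative model. A minimal sketch of that model, with illustrative vocabulary size, document length, and a uniform class prior (all assumptions for illustration, not taken from the paper):

```python
# Sketch: naive Bayes as a two-topic generative model for word counts.
# Vocabulary size, document length, and sample counts are illustrative.
import numpy as np

rng = np.random.default_rng(3)
V, doc_len = 50, 100                        # vocabulary size, words per doc

topic = rng.dirichlet(np.ones(V), size=2)   # one word distribution per class

def sample_doc(label):
    # A document is a bag of words drawn from its class's topic.
    return rng.multinomial(doc_len, topic[label])

def nb_classify(counts):
    # Pick the class maximizing the multinomial log-likelihood
    # (uniform class prior assumed for simplicity).
    loglik = counts @ np.log(topic).T
    return int(np.argmax(loglik))

docs = [(sample_doc(c), c) for c in (0, 1) for _ in range(50)]
acc = np.mean([nb_classify(x) == c for x, c in docs])
print(acc)
```

With the true topics known to the classifier, accuracy on freshly sampled documents is near perfect, which is the "strong generative assumption" the surrounding discussion refers to.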

3264 | An introduction to probability theory and its applications, Volume II - Feller - 1988

1008 | ImageNet Classification with Deep Convolutional Neural Networks
- Krizhevsky, Sutskever, et al.
- 2012
Citation Context ...bias in high dimensions. 1 Introduction Dropout training [1] is an increasingly popular method for regularizing learning algorithms. Dropout is most commonly used for regularizing deep neural networks [2, 3, 4, 5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6,...

616 | A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts
- Pang, Lee
- 2004
Citation Context ...[Figure 3 plot data: error rate vs. fraction of data used for Log.Reg., Naive Bayes, Dropout−0.8, Dropout−0.5, Dropout−0.2] (a) Polarity 2.0 dataset [17]. (b) IMDB dataset [18]. Figure 3: Experime...

519 | On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes
- Ng, Jordan
- 2002
Citation Context ...not affected by dropout under the Poisson topic model. Bias is thus negligible when the Bayes boundary is close to linear. It is instructive to compare our generalization bound to that of Ng and Jordan [10], who showed that the naive Bayes classifier exploits a strong generative assumption—conditional independence of the features given the label—to achieve an excess risk of O_P(√((log d)/n)). However, i...

156 | Statistical behavior and consistency of classification methods based on convex risk minimization
- Zhang
- 2004
Citation Context ...discussing this solution is more difficult since the ERM solution does not have a simple characterization. The relationship between the 0-1 loss and convex surrogates has been studied by, e.g., [14, 15]. The score criterion for logistic regression is 0 = Σ_{i=1}^n (y^(i) − p̂_i) x^(i), where p̂_i = (1 + e^{−ŵ·x^(i)})^{−1} are the fitted probabilities. Note that easily-classified examples (where p̂_i is close t...
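
The score criterion quoted above — the gradient of the logistic log-likelihood vanishing at the fit — can be checked numerically. This sketch fits logistic regression by plain gradient ascent on synthetic data; the data dimensions, step size, and iteration count are illustrative assumptions, not anything from the paper:

```python
# Sketch: verify the logistic-regression score criterion
#   0 = sum_i (y_i - p_hat_i) * x_i
# at the fitted weights, using synthetic data and gradient ascent.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w = np.zeros(d)
for _ in range(20000):
    p_hat = 1 / (1 + np.exp(-X @ w))
    score = X.T @ (y - p_hat)       # score vector: sum_i (y_i - p_hat_i) x_i
    w += 0.5 * score / n            # gradient ascent on the log-likelihood

# At the maximizer the score vanishes, so every component is near zero.
residual = np.abs(X.T @ (y - 1 / (1 + np.exp(-X @ w)))).max()
print(residual)
```

The near-zero residual illustrates the point made in the snippet: easily classified examples (p̂_i near y_i) contribute almost nothing to the score.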

88 | Learning word vectors for sentiment analysis.
- Maas, Daly, et al.
- 2011
Citation Context ...Polarity 2.0 dataset [17]. [Figure 3 plot data: error rate vs. fraction of data used for Log.Reg., Naive Bayes, Dropout−0.8, Dropout−0.5, Dropout−0.2] (b) IMDB dataset [18]. Figure 3: Experiments on sentiment classification. More dropout is better relative to logistic regression for small datasets and gradually worsens with more training data. We train a model on traini...

81 | Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
- Hinton, Srivastava, et al.
- 2012
Citation Context ...corrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions. 1 Introduction Dropout training [1] is an increasingly popular method for regularizing learning algorithms. Dropout is most commonly used for regularizing deep neural networks [2, 3, 4, 5], but it has also been found to improve the per...

79 | Nightmare at test time: Robust learning by feature deletion
- Globerson, Roweis
- 2006
Citation Context ...independence assumptions of the topic model, we are able to improve the exponent in the rate of convergence of the empirical risk minimizer. It is also possible to analyze dropout as an adaptive regularizer [6, 9, 13]: in comparison with L2 regularization, dropout favors the use of rare features and encourages confident predictions. If we believe that good document classification should produce confident predictio...

79 | Classification with hybrid generative/discriminative models.
- Raina, Shen, et al.
- 2003
Citation Context ...assumption to cut their generalization error. Our analysis presents an interesting contrast to other work that directly combines generative and discriminative modeling by optimizing a hybrid likelihood [20, 21, 22, 23, 24, 25]. Our approach is more guarded in that we only let the generative assumption speak through pseudo-examples. Conclusion We have presented a theoretical analysis that explains how dropout training can b...

76 | Principled hybrids of generative and discriminative models.
- Lasserre, Bishop, et al.
- 2006
Citation Context ...assumption to cut their generalization error. Our analysis presents an interesting contrast to other work that directly combines generative and discriminative modeling by optimizing a hybrid likelihood [20, 21, 22, 23, 24, 25]. Our approach is more guarded in that we only let the generative assumption speak through pseudo-examples. Conclusion We have presented a theoretical analysis that explains how dropout training can b...

64 | The trade-off between generative and discriminative classifiers.
- Bouchard, Triggs
- 2004
Citation Context ...assumption to cut their generalization error. Our analysis presents an interesting contrast to other work that directly combines generative and discriminative modeling by optimizing a hybrid likelihood [20, 21, 22, 23, 24, 25]. Our approach is more guarded in that we only let the generative assumption speak through pseudo-examples. Conclusion We have presented a theoretical analysis that explains how dropout training can b...

63 | Regularization of neural networks using dropconnect.
- Wan, Zeiler, et al.
- 2013
Citation Context ...bias in high dimensions. 1 Introduction Dropout training [1] is an increasingly popular method for regularizing learning algorithms. Dropout is most commonly used for regularizing deep neural networks [2, 3, 4, 5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6,...

55 | An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators
- Liang, Jordan
- 2008

46 | Multi-conditional learning: Generative/discriminative training for clustering and classification
- McCallum, Pal, et al.
- 2006

38 | Baselines and bigrams: Simple, good sentiment and topic classification - Wang, Manning - 2012

32 | Dropout training as adaptive regularization.
- Wager, Wang, et al.
- 2013
Citation Context ...5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6, 7, 8]. For single-layer linear models, learning with dropout is equivalent to using “blankout noise” [9]. The goal of this paper is to gain a better theoretical understanding of why dropout regularization ...

23 | Fast dropout training.
- Wang, Manning
- 2013
Citation Context ...5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6, 7, 8]. For single-layer linear models, learning with dropout is equivalent to using “blankout noise” [9]. The goal of this paper is to gain a better theoretical understanding of why dropout regularization ...

15 | Learning with marginalized corrupted features.
- Maaten, Chen, et al.
- 2013
Citation Context ...models for natural language tasks such as document classification and named entity recognition [6, 7, 8]. For single-layer linear models, learning with dropout is equivalent to using “blankout noise” [9]. The goal of this paper is to gain a better theoretical understanding of why dropout regularization works well for natural language tasks. We focus on the task of document classification using linear...
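
The "blankout noise" view mentioned above can be sketched for a single-layer logistic model: each stochastic-gradient step zeroes every feature independently with probability δ and rescales the survivors by 1/(1 − δ), so the noised feature vector is unbiased for the original. All data shapes, the dropout rate, and the learning rate below are illustrative assumptions, not the authors' setup:

```python
# Sketch: SGD for logistic regression under blankout (dropout) noise.
# Each step drops features w.p. delta and rescales survivors by 1/(1-delta).
import numpy as np

rng = np.random.default_rng(1)
n, d, delta = 500, 20, 0.5
X = rng.poisson(1.0, size=(n, d)).astype(float)    # word-count-like features
w_true = rng.normal(size=d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w = np.zeros(d)
for step in range(2000):
    i = rng.integers(n)
    mask = (rng.random(d) >= delta) / (1 - delta)  # blankout: E[mask] = 1
    x_tilde = X[i] * mask                          # unbiased noised features
    p = 1 / (1 + np.exp(-x_tilde @ w))
    w += 0.05 * (y[i] - p) * x_tilde               # SGD step on noised example

train_acc = np.mean(((X @ w) > 0) == (y == 1))
print(train_acc)
```

Because the mask is mean-one, predictions at test time use the clean features with no rescaling, which is the usual single-layer dropout recipe.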

12 | Bias-variance tradeoff in hybrid generative-discriminative models
- Bouchard
- 2007

12 | Introduction to Statistical Learning Theory - Bousquet, Boucheron, Lugosi

11 | Maxout networks
- Goodfellow, Warde-Farley, et al.
- 2013
Citation Context ...bias in high dimensions. 1 Introduction Dropout training [1] is an increasingly popular method for regularizing learning algorithms. Dropout is most commonly used for regularizing deep neural networks [2, 3, 4, 5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6,...

10 | The dropout learning algorithm
- Baldi, Sadowski
- 2014
Citation Context ...framework to prove a generalization bound for dropout that decays as 1 − δ. Moreover, provided that δ is not too close to 1, dropout behaves similarly to an adaptive L2 regularizer with parameter δ/(1 − δ) [6, 12], and at least in linear regression such L2 regularization improves generalization error by a constant factor. In contrast, by leveraging the conditional independence assumptions of the topic model, w...
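
For linear regression the δ/(1 − δ) correspondence in the snippet above is exact: with mean-one blankout noise, the expected squared loss equals the clean loss plus an L2-style penalty with coefficient δ/(1 − δ). A Monte Carlo check of that identity, with all numbers chosen for illustration:

```python
# Sketch: for linear regression under blankout noise,
#   E[(y - w·x_tilde)^2] = (y - w·x)^2 + delta/(1-delta) * sum_j w_j^2 x_j^2.
# Verified here by Monte Carlo on one example; all values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, delta = 10, 0.3
x = rng.normal(size=d)
w = rng.normal(size=d)
y = 1.5

# Monte Carlo estimate of the dropout-averaged squared loss.
masks = (rng.random((200000, d)) >= delta) / (1 - delta)  # mean-one masks
mc = np.mean((y - (masks * x) @ w) ** 2)

closed = (y - x @ w) ** 2 + delta / (1 - delta) * np.sum(w**2 * x**2)
print(mc, closed)
```

The cross term vanishes because the mask is mean-one, so only the variance of the noised prediction survives, and that variance is exactly the δ/(1 − δ)-weighted penalty.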

9 | Feature noising for log-linear structured prediction.
- Wang, Wang, et al.
- 2013
Citation Context ...5], but it has also been found to improve the performance of logistic regression and other single-layer models for natural language tasks such as document classification and named entity recognition [6, 7, 8]. For single-layer linear models, learning with dropout is equivalent to using “blankout noise” [9]. The goal of this paper is to gain a better theoretical understanding of why dropout regularization ...

8 | Adaptive dropout for training deep neural networks
- Ba, Frey
- 2013

6 | A PAC-Bayesian tutorial with a dropout bound. arXiv:1307.2118
- McAllester
- 2013
Citation Context ...sion and naive Bayes, allowing us to tune the bias-variance tradeoff. Other perspectives on dropout In the general setting, dropout only improves generalization by a multiplicative factor. McAllester [11] used the PAC-Bayes framework to prove a generalization bound for dropout that decays as 1 − δ. Moreover, provided that δ is not too close to 1, dropout behaves similarly to an adaptive L2 regularizer...