
## The Manifold Tangent Classifier

Citations: 32 (10 self)

### Citations

2461 | A global geometric framework for nonlinear dimensionality reduction - Tenenbaum, de Silva, et al. - 2000

2414 | Nonlinear dimensionality reduction by locally linear embedding - Roweis, Saul - 2000

970 | A fast learning algorithm for deep belief nets - Hinton, Osindero, et al. - 2006

Citation Context: "...the successful unsupervised pretraining approach for learning deep architectures, which has been shown to significantly improve supervised performance even without using additional unlabeled examples (Hinton et al., 2006; Bengio, 2009; Erhan et al., 2010). 2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-..."

394 | Greedy layer-wise training of deep networks - Bengio, Lamblin, et al. - 2007

336 | Learning deep architectures for AI, Foundations and Trends - Bengio - 2009

Citation Context: "...unsupervised pretraining approach for learning deep architectures, which has been shown to significantly improve supervised performance even without using additional unlabeled examples (Hinton et al., 2006; Bengio, 2009; Erhan et al., 2010). 2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-man..."

272 | Efficient pattern recognition using a new transformation distance - Simard, LeCun, et al. - 1994

Citation Context: "...the introduction. 4.1 CAE-based tangent distance. One way of achieving this is to use a nearest neighbor classifier with a similarity criterion defined as the shortest distance between two hyperplanes (Simard et al., 1993). The tangents extracted at each point will allow us to shrink the distance between two samples when they can approximate each other by a linear combination of their local tangents. Following Simar..."
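The tangent distance described in this snippet reduces to a linear least-squares problem: minimize the distance between the two affine subspaces spanned by each point's tangent vectors. A minimal sketch under that reading (the function name and toy tangent bases below are illustrative, not from the paper):

```python
import numpy as np

def tangent_distance(x, y, Tx, Ty):
    """Two-sided tangent distance: squared distance between the affine
    subspaces x + Tx @ a and y + Ty @ b (tangent vectors as columns)."""
    # min over (a, b) of ||(x - y) + Tx a - Ty b||^2 is linear least squares
    A = np.hstack([Tx, -Ty])
    coef, *_ = np.linalg.lstsq(A, y - x, rcond=None)
    residual = A @ coef - (y - x)
    return float(residual @ residual)

# Toy check: the subspaces differ only along the third axis, so distance = 1
x, y = np.array([0.0, 0.0, 1.0]), np.zeros(3)
Tx = np.array([[1.0], [0.0], [0.0]])   # tangent of x: the e1 direction
Ty = np.array([[0.0], [1.0], [0.0]])   # tangent of y: the e2 direction
d = tangent_distance(x, y, Tx, Ty)
```

When one sample can be reached from the other by moving along its tangents, the distance shrinks to zero, which is exactly the "shrinking" effect the snippet mentions.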

228 | Deep Boltzmann machines - Salakhutdinov, Hinton - 2009

Citation Context: "...The MTC uses the same stack of CAEs trained with tangent propagation using 15 tangents. The prior state of the art for the permutation-invariant version of the task was set by the Deep Boltzmann Machines (Salakhutdinov and Hinton, 2009) at 0.95%. Using our approach, we reach 0.81% error on the test set. Remarkably, the MTC also outperforms the basic Convolutional Neural Network (CNN) even though the CNN exploits prior knowledge abo..."

219 | Efficient learning of sparse representations with an energy-based model - Ranzato, Poultney, et al. - 2006

Citation Context: "...viewed as a technique for dimensionality reduction, where a narrow bottleneck (i.e. dh < d) was in effect acting as a capacity control mechanism. By contrast, recent successes (Bengio et al., 2007; Ranzato et al., 2007a; Kavukcuoglu et al., 2009; Vincent et al., 2010; Rifai et al., 2011a) tend to rely on rich, oftentimes over-complete representations (dh > d), so that more sophisticated forms of regularization are ..."

206 | Charting a manifold - Brand - 2003

Citation Context: "...of parameters for each, or derived mostly from the set of training examples in every neighborhood), as most explicitly seen in Manifold Parzen Windows (Vincent and Bengio, 2003) and manifold Charting (Brand, 2003). See Bengio and Monperrus (2005) for a critique of local non-parametric manifold algorithms: they might require a number of training examples which grows exponentially with manifold dimension and cu..."
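The neighborhood-based parametrization these local methods rely on can be illustrated by estimating a tangent basis at a training point via local PCA of its nearest neighbors. This is a generic sketch of that idea, not the actual Manifold Parzen or Charting estimator; the names are illustrative:

```python
import numpy as np

def local_tangents(X, i, k=4, dim=1):
    """Estimate a dim-dimensional tangent basis at X[i] as the top
    principal directions of its k nearest neighbors (local PCA)."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    nbrs = X[np.argsort(d2)[1:k + 1]]        # k nearest neighbors, self excluded
    centered = nbrs - nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:dim]                           # rows = estimated tangent directions

# Points sampled along the line y = 2x: the tangent should align with (1, 2)
X = np.array([[t, 2.0 * t] for t in range(6)])
t_hat = local_tangents(X, 2)[0]
```

The critique quoted above applies directly: each estimate uses only a local neighborhood, so curved or sparsely sampled regions need many more examples.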

194 | Unsupervised learning of invariant feature hierarchies with applications to object recognition - Ranzato, Huang, et al. - 2007

Citation Context: "...viewed as a technique for dimensionality reduction, where a narrow bottleneck (i.e. dh < d) was in effect acting as a capacity control mechanism. By contrast, recent successes (Bengio et al., 2007; Ranzato et al., 2007a; Kavukcuoglu et al., 2009; Vincent et al., 2010; Rifai et al., 2011a) tend to rely on rich, oftentimes over-complete representations (dh > d), so that more sophisticated forms of regularization are ..."

155 | Why does unsupervised pre-training help deep learning? - Erhan - 2010

Citation Context: "...pretraining approach for learning deep architectures, which has been shown to significantly improve supervised performance even without using additional unlabeled examples (Hinton et al., 2006; Bengio, 2009; Erhan et al., 2010). 2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower..."

143 | Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion - Vincent, Larochelle, et al. - 2010

Citation Context: "...reduction, where a narrow bottleneck (i.e. dh < d) was in effect acting as a capacity control mechanism. By contrast, recent successes (Bengio et al., 2007; Ranzato et al., 2007a; Kavukcuoglu et al., 2009; Vincent et al., 2010; Rifai et al., 2011a) tend to rely on rich, oftentimes over-complete representations (dh > d), so that more sophisticated forms of regularization are required to pressure the auto-encoder to extract ..."

119 | Learning invariant features through topographic filter maps - Kavukcuoglu, Ranzato, et al. - 2009

Citation Context: "...for dimensionality reduction, where a narrow bottleneck (i.e. dh < d) was in effect acting as a capacity control mechanism. By contrast, recent successes (Bengio et al., 2007; Ranzato et al., 2007a; Kavukcuoglu et al., 2009; Vincent et al., 2010; Rifai et al., 2011a) tend to rely on rich, oftentimes over-complete representations (dh > d), so that more sophisticated forms of regularization are required to pressure the au..."

112 | Learning a nonlinear embedding by preserving class neighbourhood structure - Salakhutdinov, Hinton - 2007

Citation Context: "...Semi-supervised classification error on the MNIST test set with 100, 600, 1000 and 3000 labeled training examples. We compare our method with results from (Weston et al., 2008; Ranzato et al., 2007b; Salakhutdinov and Hinton, 2007)."

| # labeled | NN | SVM | CNN | TSVM | DBN-rNCA | EmbedNN | CAE | MTC |
|---|---|---|---|---|---|---|---|---|
| 100 | 25.81 | 23.44 | 22.98 | 16.81 | - | 16.86 | 13.47 | 12.03 |
| 600 | 11.44 | 8.85 | 7.68 | 6.16 | 8.7 | 5.97 | 6.3 | 5.13 |
| 1000 | 10.7 | 7.77 | 6.45 | 5.38 | - | 5.73 | 4.77 | 3.64 |
| 3000 | 6.04 | 4.21 | 3.35 | 3.4... | | | | |

78 | Tangent Prop - A formalism for specifying selected invariances in an adaptive network - Simard, Victorri, et al. - 1992

Citation Context: "...ns or scalings usually do not change an image's class). Supervised classification algorithms that have been devised to efficiently exploit tangent directions given as domain-specific prior knowledge (Simard et al., 1992, 1993) can readily be used instead with our learned tangent spaces. In particular, we will show record-breaking improvements by using TangentProp for fine-tuning CAE-pretrained deep neural networks..."
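The TangentProp regularizer referenced here penalizes the directional derivative of the model's output along each known tangent direction, so that moving along a tangent (e.g. a small rotation) leaves the output unchanged. A finite-difference sketch of that penalty (function names and the toy f are illustrative):

```python
import numpy as np

def tangent_prop_penalty(f, x, tangents, eps=1e-5):
    """Sum over tangent vectors t of the squared directional derivative
    of f at x along t, estimated by central finite differences."""
    penalty = 0.0
    for t in tangents:
        u = t / np.linalg.norm(t)                        # unit tangent direction
        dfdu = (f(x + eps * u) - f(x - eps * u)) / (2 * eps)
        penalty += float(np.sum(np.asarray(dfdu) ** 2))
    return penalty

# f ignores its first coordinate, so the penalty along e1 is ~0,
# while along e2 the derivative is 2*x[1] = 2, giving penalty 4
f = lambda x: x[1] ** 2
x0 = np.array([0.0, 1.0])
p_invariant = tangent_prop_penalty(f, x0, [np.array([1.0, 0.0])])
p_sensitive = tangent_prop_penalty(f, x0, [np.array([0.0, 1.0])])
```

In training one would add this penalty (computed with exact gradients rather than finite differences) to the supervised loss, which is the fine-tuning use the snippet describes.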

77 | Contractive Auto-Encoders: Explicit Invariance during Feature Extraction - Rifai, Vincent, et al. - 2011

Citation Context: "...points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input space. The recently proposed Contractive Auto-Encoder (CAE) algorithm (Rifai et al., 2011a), based on the idea of encouraging the learned representation to be robust to small variations of the input, was shown to be very effective for unsupervised feature learning. Its successful applicat..."

76 | Principled hybrids of generative and discriminative models - Lasserre, Bishop, et al. - 2006

Citation Context: "...hypothesis, according to which learning aspects of the input distribution p(x) can improve models of the conditional distribution of the supervised target p(y|x), i.e., p(x) and p(y|x) share something (Lasserre et al., 2006). This hypothesis underlies not only the strict semi-supervised setting where one has many more unlabeled examples at his disposal than labeled ones, but also the successful unsupervised pretraining ..."

67 | Deep learning via semi-supervised embedding - Weston, Ratle, et al. - 2008

Citation Context: "...r at the training example, this amounts to making that minimization more robust, i.e., extend it to the neighborhood of the training examples. Also related is the Semi-Supervised Embedding algorithm (Weston et al., 2008). In addition to minimizing a supervised prediction error, it encourages each layer of representation of a deep architecture to be invariant when the training example is changed from x to a near neig..."

65 | Measuring invariances in deep networks - Goodfellow, Le, et al. - 2009

Citation Context: "...required to pressure the auto-encoder to extract relevant features and avoid trivial solutions. Several successful techniques aim at sparse representations (Ranzato et al., 2007a; Kavukcuoglu et al., 2009; Goodfellow et al., 2009). Alternatively, denoising auto-encoders (Vincent et al., 2010) change the objective from mere reconstruction to that of denoising. 2.2 First order and higher order contractive auto-encoders. More rec..."

42 | Improved Local Coordinate Coding Using Local Tangents - Yu, Zhang - 2010

Citation Context: "...modeled implicitly by the CAE's objective function (that is not based on pairs of points). More recently, the Local Coordinate Coding (LCC) algorithm (Yu et al., 2009) and its Local Tangent LCC variant (Yu and Zhang, 2010) were proposed to build a local chart around each training example (with a local low-dimensional coordinate system around it) and use it to define a representation for each input x: the responsibil..."

40 | Manifold Parzen Windows - Vincent, Bengio - 2002

Citation Context: "...around each training point (with a separate set of parameters for each, or derived mostly from the set of training examples in every neighborhood), as most explicitly seen in Manifold Parzen Windows (Vincent and Bengio, 2003) and manifold Charting (Brand, 2003). See Bengio and Monperrus (2005) for a critique of local non-parametric manifold algorithms: they might require a number of training examples which grows exponent..."

28 | Non-Local Manifold Tangent Learning - Bengio, Monperrus - 2004

24 | Non-local manifold Parzen windows - Bengio, Larochelle, et al. - 2006

Citation Context: "...examples which grows exponentially with manifold dimension and curvature (more crooks and valleys in the manifold will require more examples). One attempt to generalize the manifold shape non-locally (Bengio et al., 2006) is based on explicitly predicting the tangent plane associated to any given point x, as a parametrized function of x. Note that these algorithms all explicitly exploit training set neighborhoods (se..."

23 | Algorithms for manifold learning - Cayton - 2005

Citation Context: "...(unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower dimensionality (Cayton, 2005; Narayanan and Mitter, 2010). 3. The manifold hypothesis for classification, according to which points of different classes are likely to concentrate along different sub-manifolds, separated by low d..."

16 | Sample complexity of testing the manifold hypothesis - Narayanan, Mitter - 2010

Citation Context: "...manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower dimensionality (Cayton, 2005; Narayanan and Mitter, 2010). 3. The manifold hypothesis for classification, according to which points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input ..."

14 | Higher Order Contractive Auto-Encoder - Rifai, Mesnil, et al. - 2011

Citation Context: "...points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input space. The recently proposed Contractive Auto-Encoder (CAE) algorithm (Rifai et al., 2011a), based on the idea of encouraging the learned representation to be robust to small variations of the input, was shown to be very effective for unsupervised feature learning. Its successful applicat..."

11 | Object recognition with gradient-based learning - LeCun, Haffner, et al. - 1999

Citation Context: "...test set with the full training set."

| K-NN | NN | SVM | DBN | CAE | DBM | CNN | MTC |
|---|---|---|---|---|---|---|---|
| 3.09% | 1.60% | 1.40% | 1.17% | 1.04% | 0.95% | 0.95% | 0.81% |

"Table 3 shows our results on the full MNIST dataset with some results taken from (LeCun et al., 1999; Hinton et al., 2006). The CAE in this figure is a two-layer deep network with 2000 units per layer pretrained with the CAE+H objective. The MTC uses the same stack of CAEs trained with tangent propa..."

5 | Improving generalisation performance using double backpropagation - Drucker, LeCun - 1992

Citation Context: "...simultaneously to provide a good initialization of deep network layers and a coherent non-local predictor of tangent spaces. TangentProp is itself closely related to the Double Backpropagation algorithm (Drucker and LeCun, 1992), in which one instead adds a penalty that is the sum of squared derivatives of the prediction error (with respect to the network input). Whereas TangentProp attempts to make the output insensitive t..."
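The Double Backpropagation penalty described in this snippet, the squared norm of the gradient of the prediction error with respect to the input, can be written analytically for a single linear unit with squared error. A toy sketch (names are illustrative, not from the paper):

```python
import numpy as np

def double_backprop_penalty(w, x, y):
    """For E = 0.5 * (w.x - y)^2, the input gradient is dE/dx = (w.x - y) * w,
    and the Double Backpropagation penalty is ||dE/dx||^2."""
    err = w @ x - y
    grad_x = err * w
    return float(grad_x @ grad_x)

# err = 3, so dE/dx = (3, 6) and the penalty is 9 + 36 = 45
p = double_backprop_penalty(np.array([1.0, 2.0]), np.array([1.0, 1.0]), 0.0)
```

Note the contrast the snippet draws: this penalizes sensitivity of the error in every input direction, whereas TangentProp penalizes sensitivity of the output only along the chosen tangent directions.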

5 | Application of distributed SVM architectures in classifying forest data cover types - Trebar, Steele