Results 1–10 of 162
Good Practice in Large-Scale Learning for Image Classification
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI), 2013
Abstract

Cited by 53 (6 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state of the art on a large subset of 10K classes and 9M images of ImageNet from 16.7% Top-1 accuracy to 19.1%.
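The one-vs-rest SGD recipe the abstract describes can be sketched in a few lines. This is a hypothetical minimal implementation (the names `sgd_ovr_svm` and `predict` are ours, not the paper's): one L2-regularized hinge-loss classifier is trained per class with plain stochastic gradient descent.

```python
import numpy as np

def sgd_ovr_svm(X, y, n_classes, lam=1e-4, epochs=5, lr=0.01, seed=0):
    """Train one linear SVM per class (one-vs-rest) with plain SGD
    on the L2-regularized hinge loss. Minimal sketch; a bias can be
    modeled by appending a constant-1 feature to X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)      # binary labels for class c
        for _ in range(epochs):
            for i in rng.permutation(n):      # one example at a time
                margin = t[i] * (W[c] @ X[i])
                grad = lam * W[c]             # regularizer gradient
                if margin < 1.0:              # hinge loss is active
                    grad = grad - t[i] * X[i]
                W[c] -= lr * grad
    return W

def predict(W, X):
    return np.argmax(X @ W.T, axis=1)         # highest per-class score wins
```

Real large-scale setups add a learning-rate schedule plus the positive/negative rebalancing and early stopping the abstract discusses; this sketch only shows the core loop.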
RECENT ADVANCES IN DEEP LEARNING FOR SPEECH RESEARCH AT MICROSOFT
Abstract

Cited by 23 (10 self)
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light on the basic capabilities and limitations of the current deep learning technology. We organize this overview along the feature-domain and model-domain dimensions according to the conventional approach to analyzing speech systems. Selected experimental results, including speech recognition and related applications such as spoken dialogue and language modeling, are presented to demonstrate and analyze the strengths and weaknesses of the techniques described in the paper. Potential improvements of these techniques and future research directions are discussed. Index Terms — deep learning, neural network, multilingual, speech recognition, spectral features, convolution, dialogue
DeepWalk: Online learning of social representations. arXiv preprint arXiv:1403.6652, 2014
Abstract

Cited by 19 (2 self)
We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
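The truncated-random-walk corpus DeepWalk builds can be sketched as follows (a minimal stand-in, assuming the graph is given as an adjacency dict; the real pipeline then feeds these walks to a skip-gram/word2vec model to learn the embeddings):

```python
import random

def truncated_random_walks(adj, num_walks=10, walk_len=6, seed=0):
    """Generate a DeepWalk-style corpus: each walk is a 'sentence' of
    node ids.  adj maps node -> list of neighbors."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)               # random start order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:             # dead end: truncate walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

The resulting list of walks plays the role of a text corpus: each walk is treated exactly like a sentence of words when handed to a word2vec-style trainer.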
MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
Abstract

Cited by 17 (5 self)
We investigate multilingual modeling in the context of a deep neural network (DNN) – hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods. By viewing neural networks as a cascade of feature extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature extractors, will be transferable between languages. As a corollary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm these hypotheses on the GlobalPhone corpus using seven languages from three different language families: Germanic, Romance, and Slavic. The experiments demonstrate substantial improvements over a monolingual DNN-HMM hybrid baseline, and hint at avenues of further exploration. Index Terms — Speech recognition, deep learning, neural networks, multilingual modeling
What regularized autoencoders learn from the data generating distribution, 2012
Abstract

Cited by 16 (7 self)
What do autoencoders learn about the underlying data generating distribution? Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the autoencoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the autoencoder: they show what the autoencoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising autoencoder training criterion with small corruption noise, but with contraction applied to the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
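The core score claim — that (r(x) − x)/σ² approximates the score ∂ log p(x)/∂x for small corruption noise σ — can be checked numerically in a toy 1-D Gaussian case, where the Bayes-optimal denoiser is linear and the true score is −x. This is a sketch under those assumptions, not the paper's actual setup:

```python
import numpy as np

# For clean data x0 ~ N(0, 1) the score is d/dx log p(x) = -x.
# Corrupt with Gaussian noise of std sigma, fit the least-squares
# linear denoiser r(x) = a*x ~= E[x0 | x], and check that
# (r(x) - x) / sigma**2 recovers the score in the small-sigma regime.
rng = np.random.default_rng(0)
sigma = 0.1
x0 = rng.normal(size=200_000)                 # clean samples
x = x0 + sigma * rng.normal(size=x0.size)     # corrupted inputs
a = (x * x0).sum() / (x * x).sum()            # least-squares slope
score_est = (a * x - x) / sigma**2            # (r(x) - x) / sigma^2
# Analytically the posterior-mean slope is 1/(1 + sigma^2), so
# score_est ~= -x / (1 + sigma^2) ~= -x, the true score.
```

The same relation underlies the paper's result: the optimal denoising reconstruction shifts each input toward higher density, and the shift divided by the noise variance estimates the score.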
HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTI-CHANNEL LARGE VOCABULARY SPEECH RECOGNITION
Abstract

Cited by 13 (2 self)
We investigate the application of deep neural network (DNN) – hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4–6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
PCANet: A simple deep learning baseline for image classification? arXiv preprint arXiv:1404.3606, 2014
Abstract

Cited by 11 (0 self)
In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets.
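The first PCANet stage — learning a filter bank as the top principal components of mean-removed image patches — can be sketched as below. This is a hypothetical minimal version; the full PCANet cascades two such stages and adds the binary hashing and block histograms:

```python
import numpy as np

def pca_filters(images, k=7, num_filters=4):
    """Sketch of one PCANet stage: collect all k-by-k patches from
    grayscale images, remove each patch's mean, and take the leading
    eigenvectors of the patch covariance as convolution filters."""
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())   # patch-mean removal
    P = np.asarray(patches)                    # (num_patches, k*k)
    cov = P.T @ P / len(P)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues ascending
    top = vecs[:, ::-1][:, :num_filters]       # leading eigenvectors
    return top.T.reshape(num_filters, k, k)    # one k-by-k filter per row
```

Each returned filter would then be convolved with the input images, and the filter responses feed the next stage of the cascade.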
Multi-source deep learning for human pose estimation. In CVPR, 2014
Abstract

Cited by 10 (5 self)
Visual appearance score, appearance mixture type, and deformation are three important information sources for human pose estimation. This paper proposes to build a multi-source deep model in order to extract non-linear representations from these different aspects of information sources. With the deep model, the global, high-order human body articulation patterns in these information sources are extracted for pose estimation. The task of estimating body locations and the task of human detection are jointly learned using a unified deep model. The proposed approach can be viewed as a post-processing of pose estimation results and can flexibly integrate with existing methods by taking their information sources as input. By extracting the non-linear representation from multiple information sources, the deep model outperforms the state of the art by up to 8.6 percent on three public benchmark datasets.
Provable bounds for learning some deep representations. arXiv:1310.6343, 2013
Abstract

Cited by 8 (1 self)
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an n-node multilayer network that has degree at most n^γ for some γ < 1, and each edge has a random edge weight in [−1, 1]. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural nets with random edge weights.
Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images
Abstract

Cited by 7 (0 self)
Saliency prediction typically relies on hand-crafted (multiscale) features that are combined in different ways to form a “master” saliency map, which encodes local image conspicuity. Recent improvements to the state of the art on standard benchmarks such as MIT1003 have been achieved mostly by incrementally adding more and more hand-tuned features (such as car or face detectors) to existing models [18, 4, 22, 34]. In contrast, we here follow an entirely automatic, data-driven approach that performs a large-scale search for optimal features. We identify those instances of a richly parameterized bio-inspired model family (hierarchical neuromorphic networks) that successfully predict image saliency. Because of the high dimensionality of this parameter space, we use automated hyperparameter optimization to efficiently guide the search. The optimal blend of such multilayer features combined with a simple linear classifier achieves excellent performance on several image saliency benchmarks. Our models outperform the state of the art on MIT1003, on which features and classifiers are learned. Without additional training, these models generalize well to two other image saliency data sets, Toronto and NUSEF, despite their different image content. Finally, our algorithm scores best of all the 23 models evaluated to date on the MIT300 saliency challenge [16], which uses a hidden test set to facilitate an unbiased comparison.
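As a stand-in for the automated hyperparameter optimization the abstract mentions (the paper's actual optimizer is more sophisticated than this), a minimal random-search loop over a discrete configuration space looks like:

```python
import random

def random_search(score_fn, space, n_trials=50, seed=0):
    """Hypothetical minimal hyperparameter search: sample configurations
    at random from `space` (dict of name -> list of choices) and keep
    the best-scoring one according to score_fn."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        s = score_fn(cfg)                 # e.g. validation score of the model
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score
```

In the paper's setting, `score_fn` would train a hierarchical feature model with the given parameters and return its saliency-prediction score on a validation set.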