Results 1-10 of 173
Good Practice in Large-Scale Learning for Image Classification
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI), 2013
Cited by 53 (6 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state-of-the-art on a large subset of 10K classes and 9M images of ImageNet from 16.7% Top-1 accuracy to 19.1%.
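To make the one-vs-rest setup described above more concrete, here is a minimal NumPy sketch of a linear hinge-loss SVM trained with stochastic gradient descent, one binary classifier per class. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `rho` parameter (a weight on negative examples standing in for the positive/negative rebalancing), and the fixed learning rate are all hypothetical choices. Capping `epochs` is the simplest way to imitate early stopping here.

```python
import numpy as np

def sgd_binary_svm(X, y, rho=1.0, lam=1e-4, lr=0.1, epochs=5, seed=0):
    """Train one linear hinge-loss SVM with plain SGD.

    y holds labels in {-1, +1}. rho (hypothetical name) scales the loss on
    negative examples; in practice it would be chosen by cross-validation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):  # fewer epochs acts like early stopping
        for i in rng.permutation(n):
            weight = 1.0 if y[i] > 0 else rho
            margin = y[i] * (X[i] @ w + b)
            w *= 1.0 - lr * lam  # shrinkage from the L2 regularizer
            if margin < 1.0:  # hinge loss is active for this sample
                w += lr * weight * y[i] * X[i]
                b += lr * weight * y[i]
    return w, b

def train_one_vs_rest(X, labels, n_classes, **kwargs):
    """Fit one binary SVM per class against all other classes."""
    models = [
        sgd_binary_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
        for c in range(n_classes)
    ]
    W = np.stack([w for w, _ in models])
    b = np.array([bias for _, bias in models])
    return W, b

def predict(W, b, X):
    """Assign each row of X to the class with the highest score."""
    return np.argmax(X @ W.T + b, axis=1)
```

In a setting like the one the abstract describes, `rho` and the number of epochs would be tuned on held-out data rather than fixed in advance.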
Learning sentiment-specific word embedding for Twitter sentiment classification.
In ACL, 2014
Cited by 25 (1 self)
We present a method that learns word embedding for Twitter sentiment classification in this paper. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors. We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. sentences or tweets) in their loss functions. To obtain large scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with hand-crafted features in the top-performed system; (2) the performance is further improved by concatenating SSWE with existing feature set.
RECENT ADVANCES IN DEEP LEARNING FOR SPEECH RESEARCH AT MICROSOFT
Cited by 23 (10 self)
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light on the basic capabilities and limitations of the current deep learning technology. We organize this overview along the feature-domain and model-domain dimensions according to the conventional approach to analyzing speech systems. Selected experimental results, including speech recognition and related applications such as spoken dialogue and language modeling, are presented to demonstrate and analyze the strengths and weaknesses of the techniques described in the paper. Potential improvement of these techniques and future research directions are discussed. Index Terms: deep learning, neural network, multilingual, speech recognition, spectral features, convolution, dialogue
DeepWalk: Online learning of social representations. arXiv preprint arXiv:1403.6652, 2014
Cited by 19 (2 self)
We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
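The idea of treating truncated random walks as sentences can be sketched in a few lines of Python. This is an illustration only: the adjacency-list representation and the names `truncated_random_walks`, `walks_per_node`, and `walk_length` are assumptions made here, not the paper's code. Each walk is a list of vertex identifiers that could then be handed to a word-embedding model in place of a sentence.

```python
import random

def truncated_random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform random walks of bounded length from every vertex.

    adj maps each vertex to a list of its neighbours. A walk stops early
    if it reaches a vertex with no neighbours.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start vertices in a fresh random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

Because each walk depends only on local neighbourhoods, walks from different start vertices can be generated independently, which is consistent with the abstract's remark that the method is easy to parallelize.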
MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS
Cited by 17 (5 self)
We investigate multilingual modeling in the context of a deep neural network (DNN) – hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods. By viewing neural networks as a cascade of feature extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature extractors, will be transferable between languages. As a corollary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm these hypotheses on the GlobalPhone corpus using seven languages from three different language families: Germanic, Romance, and Slavic. The experiments demonstrate substantial improvements over a monolingual DNN-HMM hybrid baseline, and hint at avenues of further exploration. Index Terms: Speech recognition, deep learning, neural networks, multilingual modeling
What regularized autoencoders learn from the data generating distribution
2012
Cited by 17 (7 self)
What do autoencoders learn about the underlying data generating distribution? Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the autoencoder captures the score (derivative of the log-density with respect to the input). It contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the autoencoder: they show what the autoencoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising autoencoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
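The statement that the autoencoder "captures the score" can be written compactly. The notation below is an assumption introduced here for illustration: $r^{*}_{\sigma}$ denotes the optimal reconstruction function, $\sigma$ the small corruption noise level, and $p$ the data-generating density. The relation is a hedged summary of the small-noise limit, not a quotation of the paper's theorem.

```latex
r^{*}_{\sigma}(x) = x + \sigma^{2}\,\frac{\partial \log p(x)}{\partial x} + o(\sigma^{2})
\quad \text{as } \sigma \to 0,
\qquad\text{so}\qquad
\frac{\partial \log p(x)}{\partial x} \approx \frac{r^{*}_{\sigma}(x) - x}{\sigma^{2}} .
```

Read this way, the reconstruction displacement $r(x) - x$ points toward regions of higher density, which is why it can be used to drive a sampling procedure without computing a partition function.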
HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION
Cited by 13 (2 self)
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4–6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
Multi-source deep learning for human pose estimation
In CVPR, 2014
Cited by 10 (5 self)
Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation. This paper proposes to build a multi-source deep model in order to extract nonlinear representation from these different aspects of information sources. With the deep model, the global, high-order human body articulation patterns in these information sources are extracted for pose estimation. The task for estimating body locations and the task for human detection are jointly learned using a unified deep model. The proposed approach can be viewed as a post-processing of pose estimation results and can flexibly integrate with existing methods by taking their information sources as input. By extracting the nonlinear representation from multiple information sources, the deep model outperforms state-of-the-art by up to 8.6 percent on three public benchmark datasets.
PCANet: A simple deep learning baseline for image classification? arXiv preprint arXiv:1404.3606, 2014
Cited by 10 (0 self)
In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) block-wise histograms. In the proposed architecture, the PCA is employed to learn multi-stage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets
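The three components named above (PCA filters, binary hashing, block-wise histograms) can be illustrated with a simplified single-stage NumPy sketch. This is not the authors' implementation: it uses one PCA stage instead of a cascade, skips image padding, uses non-overlapping blocks, and all function and parameter names are hypothetical.

```python
import numpy as np

def extract_patches(img, k):
    """Return every k x k patch of a 2-D image as a row, with its mean removed."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    patches = windows.reshape(-1, k * k)
    return patches - patches.mean(axis=1, keepdims=True)

def learn_pca_filters(images, k, n_filters):
    """Learn filters as the leading eigenvectors of the patch covariance."""
    P = np.vstack([extract_patches(im, k) for im in images])
    cov = P.T @ P / len(P)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :n_filters].T  # shape (n_filters, k * k)

def hash_and_histogram(img, filters, k, block):
    """Binarize filter responses, pack them into integer codes, pool by block."""
    H, W = img.shape
    out_h, out_w = H - k + 1, W - k + 1
    responses = extract_patches(img, k) @ filters.T  # one column per filter
    bits = (responses > 0).astype(np.int64)          # binary hashing
    weights = 2 ** np.arange(filters.shape[0])
    codes = (bits @ weights).reshape(out_h, out_w)   # one integer per pixel
    n_bins = 2 ** filters.shape[0]
    hists = []
    for r in range(0, out_h - block + 1, block):
        for c in range(0, out_w - block + 1, block):
            blk = codes[r:r + block, c:c + block].ravel()
            hists.append(np.bincount(blk, minlength=n_bins))
    return np.concatenate(hists)  # block-wise histogram feature vector
```

The resulting feature vector would typically be passed to a simple classifier. Swapping `learn_pca_filters` for random filters gives a rough analogue of the RandNet variant mentioned in the abstract.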
Provable bounds for learning some deep representations.
arXiv:1310.6343, 2013
Cited by 8 (1 self)
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an n-node multilayer network that has degree at most n^γ for some γ < 1 and each edge has a random edge weight in [−1, 1]. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural nets with random edge weights.