Results 1-10 of 44
Representation learning: A review and new perspectives.
 of IEEE Conf. Comp. Vision Pattern Recog. (CVPR),
, 2005
Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850
, 2013
Cited by 56 (2 self)
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
On the difficulty of training recurrent neural networks
Cited by 42 (6 self)
There are two widely known issues with properly training recurrent neural networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
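The gradient norm clipping idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `clip_gradient_norm`, the flat-list gradient, and the threshold value are assumptions chosen for illustration. The sketch rescales a gradient whose L2 norm exceeds a threshold so that the norm equals the threshold while the direction is unchanged.

```python
import math


def clip_gradient_norm(grad, threshold):
    """Return a copy of `grad` whose L2 norm is at most `threshold`.

    `grad` is a flat list of floats (an illustrative stand-in for the
    full parameter gradient); `threshold` is a hypothetical hyperparameter.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # Rescale so the norm equals the threshold; direction is preserved.
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients already within the threshold pass through unchanged.
    return list(grad)


clipped = clip_gradient_norm([3.0, 4.0], threshold=1.0)    # norm 5 is rescaled
unchanged = clip_gradient_norm([0.3, 0.4], threshold=1.0)  # norm 0.5 is kept
```

Deep learning libraries typically ship an equivalent utility, so a hand-written version like this is mainly useful for seeing what the rescaling does.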
How to construct deep recurrent neural networks
, 2014
Cited by 17 (3 self)
In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper: (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental result supports our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.
Advances in optimizing recurrent networks
 In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Training Recurrent Neural Networks
, 2013
Cited by 14 (0 self)
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
High-dimensional sequence transduction
 in ICASSP
, 2013
Cited by 9 (3 self)
We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate. Index Terms: Sequence transduction, restricted Boltzmann machine, recurrent neural network, polyphonic transcription
Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation
, 2014
Cited by 8 (6 self)
This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
An Empirical Exploration of Recurrent Network Architectures
, 2015
Cited by 7 (0 self)
The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.
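One way to see why a forget-gate bias of 1 might matter is a small arithmetic sketch. It rests on simplifying assumptions that are not taken from the paper: the forget gate's pre-activation is treated as zero apart from the bias, and the cell state is only multiplied by the gate at each step, with no new input. The helper names `sigmoid` and `retained_fraction` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def retained_fraction(forget_bias, steps, pre_activation=0.0):
    """Fraction of the initial cell state left after `steps` time steps.

    Assumes (for illustration only) a constant gate pre-activation and
    no new input written into the cell.
    """
    gate = sigmoid(pre_activation + forget_bias)
    return gate ** steps


# With zero bias the gate starts near 0.5, so memory halves every step;
# with a bias of 1 the gate starts near 0.73 and memory decays more slowly.
with_zero_bias = retained_fraction(forget_bias=0.0, steps=20)
with_unit_bias = retained_fraction(forget_bias=1.0, steps=20)
```

Under these assumptions the unit-bias cell keeps several orders of magnitude more of its initial state after 20 steps, which is one intuition for why the initialization could help with longer dependencies.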
Training Energy-Based Models for Time-Series Imputation
Cited by 5 (0 self)
Imputing missing values in high-dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets.