CiteSeerX

Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392, 2012

by Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent
Results 1 - 10 of 44 citing documents

Representation learning: A review and new perspectives.

by Yoshua Bengio, Aaron Courville, Pascal Vincent - IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), 2013
"... Abstract-The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can b ..."
Abstract - Cited by 173 (4 self) - Add to MetaCart
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

Citation Context

...e on four major benchmarks by about 30 percent (e.g., from 27.4 to 18.5 percent on RT03S) compared to state-of-the-art models based on Gaussian mixtures for the acoustic modeling and trained on the same amount of data (309 hours of speech). The relative improvement in error rate obtained by Dahl et al. [55] on a smaller large-vocabulary speech recognition benchmark (Bing mobile business search dataset, with 40 hours of speech) is between 16 and 23 percent. Representation-learning algorithms have also been applied to music, substantially beating the state of the art in polyphonic transcription [34], with relative error improvement between 5 and 30 percent on a standard benchmark of four datasets. Deep learning also helped to win MIREX (music information retrieval) competitions, for example, in 2011 on audio tagging [81]. 2.2 Object Recognition: The beginnings of deep learning in 2006 focused on the MNIST digit image classification problem [94], [23], breaking the supremacy of SVMs (1.4 percent error) on this dataset. The latest records are still held by deep networks: Ciresan et al. [46] currently claim the title of state of the art for the unconstrained version of the task (e.g., using...

Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850

by Alex Graves , 2013
"... This paper shows how Long Short-term Memory recurrent neural net-works can be used to generate complex sequences with long-range struc-ture, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwrit-ing (where the data are ..."
Abstract - Cited by 56 (2 self) - Add to MetaCart
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
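
The generation procedure described here, predicting one data point at a time and feeding each sample back in as the next input, can be sketched in a few lines. The following is a minimal NumPy illustration with a single tanh recurrent layer and a categorical output; it is not Graves's stacked-LSTM implementation, and all sizes and parameter names are assumptions.

    # Minimal sketch of step-by-step sequence generation with an RNN
    # (illustrative only; the paper uses stacked LSTM layers).
    import numpy as np

    rng = np.random.default_rng(0)
    V, H = 50, 32                      # vocabulary size and hidden size (assumed)
    Wxh = rng.normal(0, 0.1, (H, V))   # input-to-hidden weights
    Whh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden weights
    Why = rng.normal(0, 0.1, (V, H))   # hidden-to-output weights

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def generate(length, start_symbol=0):
        h = np.zeros(H)
        x = np.eye(V)[start_symbol]            # one-hot encoding of the seed symbol
        seq = [start_symbol]
        for _ in range(length):
            h = np.tanh(Wxh @ x + Whh @ h)     # update the hidden state
            p = softmax(Why @ h)               # predictive distribution over the next symbol
            s = int(rng.choice(V, p=p))        # sample one data point
            seq.append(s)
            x = np.eye(V)[s]                   # feed the sample back as the next input
        return seq

    print(generate(20))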

Citation Context

...ve handwriting in a wide variety of styles. 1 Introduction: Recurrent neural networks (RNNs) are a rich class of dynamic models that have been used to generate sequences in domains as diverse as music [6, 4], text [30] and motion capture data [29]. RNNs can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Assuming the predictions are ...

On the difficulty of training recurrent neural networks

by Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
"... There are two widely known issues with properly training recurrent neural networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geo ..."
Abstract - Cited by 42 (6 self) - Add to MetaCart
There are two widely known issues with properly training recurrent neural networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
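
The gradient norm clipping strategy mentioned in the abstract amounts to rescaling the gradient whenever its norm exceeds a threshold, which keeps the update direction but bounds its size. A minimal sketch under that reading (the threshold value is an arbitrary assumption):

    # Sketch of gradient norm clipping over a list of parameter gradients.
    import numpy as np

    def clip_gradient(grads, threshold=1.0):
        """grads: list of NumPy arrays, one per parameter."""
        total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
        if total_norm > threshold:
            scale = threshold / total_norm
            grads = [g * scale for g in grads]   # rescale, preserving the direction
        return grads

    # usage: grads = clip_gradient(grads, threshold=5.0) just before the parameter update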

How to construct deep recurrent neural networks

by Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio , 2014
"... In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find th ..."
Abstract - Cited by 17 (3 self) - Add to MetaCart
In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper; (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental result supports our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.
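
The three points of depth named in the abstract can be made concrete with a small sketch: one extra nonlinear layer in each of the input-to-hidden function, the hidden-to-hidden transition, and the hidden-to-output function. The sizes and the use of a single extra layer at each point are assumptions for illustration only, not the paper's exact architectures.

    # Sketch of one time step of a deep RNN in the sense described above.
    import numpy as np

    rng = np.random.default_rng(1)
    D, H, K = 20, 30, 10                                       # input, hidden, output sizes (assumed)
    W_in1, W_in2 = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H))
    W_tr1, W_tr2 = rng.normal(0, 0.1, (H, H)), rng.normal(0, 0.1, (H, H))
    W_out1, W_out2 = rng.normal(0, 0.1, (H, H)), rng.normal(0, 0.1, (K, H))

    def step(x_t, h_prev):
        inp = np.tanh(W_in2 @ np.tanh(W_in1 @ x_t))            # (1) deep input-to-hidden function
        h_t = np.tanh(W_tr2 @ np.tanh(W_tr1 @ h_prev) + inp)   # (2) deep hidden-to-hidden transition
        y_t = W_out2 @ np.tanh(W_out1 @ h_t)                   # (3) deep hidden-to-output function
        return h_t, y_t

    h_t, y_t = step(rng.normal(size=D), np.zeros(H))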

Advances in optimizing recurrent networks

by Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu - In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013
"... ar ..."
Abstract - Cited by 16 (5 self) - Add to MetaCart
Abstract not found

Citation Context

...ctions are multivariate, another approach is to capture the high-order dependencies between the output variables using a powerful output probability model such as a Restricted Boltzmann Machine (RBM) [24, 25] or a deterministic variant of it called NADE [26, 25]. In the experiments performed here, we have experimented with a NADE output model for the music data. 3.4. Sparser Gradients via Sparse Output Re...
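
For context, the NADE output model mentioned in this excerpt factorizes each high-dimensional output vector into a chain of simple conditionals over its dimensions. A compact statement of the standard NADE form for a binary vector v (notation mine) is

    P(v) = \prod_{d=1}^{D} P(v_d \mid v_{<d}), \qquad
    P(v_d = 1 \mid v_{<d}) = \sigma\left(b_d + V_{d,\cdot}\, h_d\right), \qquad
    h_d = \sigma\left(c + W_{\cdot,<d}\, v_{<d}\right),

where \sigma is the logistic sigmoid; in the models discussed above, the biases of this per-time-step estimator are made functions of the RNN's hidden state.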

Training Recurrent Neural Networks

by Ilya Sutskever , 2013
"... Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging probl ..."
Abstract - Cited by 14 (0 self) - Add to MetaCart
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
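
The thesis abstract does not spell out the initialization scheme here, but one common reading of such schemes is an echo-state-network-inspired initialization that sparsifies the recurrent weight matrix and rescales it to a chosen spectral radius. The sketch below illustrates only that general idea; the density and target radius are arbitrary assumptions, not values taken from the thesis.

    # Illustrative sketch: sparse random recurrent weights rescaled to a target
    # spectral radius (ESN-style); values below are arbitrary assumptions.
    import numpy as np

    def init_recurrent(n_hidden, target_radius=1.1, density=0.2, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, 1.0, (n_hidden, n_hidden))
        mask = rng.random((n_hidden, n_hidden)) < density    # keep only a fraction of connections
        W = W * mask
        radius = max(abs(np.linalg.eigvals(W)))              # current spectral radius
        return W * (target_radius / radius)                  # rescale to the target radius

    W_hh = init_recurrent(100)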

High-dimensional sequence transduction

by Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent - in ICASSP, 2013
"... We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn real-istic output distributions give ..."
Abstract - Cited by 9 (3 self) - Add to MetaCart
We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate. Index Terms: sequence transduction, restricted Boltzmann machine, recurrent neural network, polyphonic transcription.

Citation Context

...re introduced in [5]. In a recently developed probabilistic model called the RNN-RBM, a series of distribution estimators (one at each time step) are conditioned on the deterministic output of an RNN [6, 7]. In this work, we introduce an input/output extension of the RNN-RBM that can learn to map input sequences to output sequences, whereas the original RNN-RBM only learns the output sequence distributi...
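
The conditioning structure described in this excerpt can be summarized by the factorization used in the RNN-RBM family of models: the joint distribution over the output sequence is a product of per-time-step conditional estimators whose parameters depend on the deterministic hidden state of an RNN. In roughly the paper's notation (reproduced from memory, so treat the symbols as a sketch), with v_t the output vector and u_t the RNN state,

    P(v_1, \dots, v_T) = \prod_{t=1}^{T} P(v_t \mid v_1, \dots, v_{t-1}),

    b_v^{(t)} = b_v + W_{uv}\, u_{t-1}, \qquad
    b_h^{(t)} = b_h + W_{uh}\, u_{t-1}, \qquad
    u_t = \tanh\left(b_u + W_{uu}\, u_{t-1} + W_{vu}\, v_t\right),

where each conditional P(v_t | .) is an RBM (or NADE) over v_t with the time-dependent biases b_v^{(t)} and b_h^{(t)}.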

Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation

by F. Weninger, J. Le Roux, J. R. Hershey, B. Schuller, 2014
"... This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruc ..."
Abstract - Cited by 8 (6 self) - Add to MetaCart
This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
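
The discriminative criterion "corresponding to optimal source reconstruction from time-frequency masks" can be read as a signal-approximation objective: the network outputs a mask, and the loss compares the masked mixture with the clean target rather than scoring the mask itself. The sketch below is a generic NumPy illustration under that reading; the variable names, shapes, and Mel-domain framing are assumptions.

    # Sketch of a mask-based source-reconstruction (signal approximation) loss:
    # the error is measured on mask * mixture versus the clean target features.
    import numpy as np

    def signal_approximation_loss(pred_mask, mixture_feat, target_feat):
        """All arguments: arrays of shape (frames, mel_bins)."""
        est_source = pred_mask * mixture_feat        # reconstruct the source from the mask
        return float(np.mean((est_source - target_feat) ** 2))

    # usage with random placeholders standing in for network outputs and features
    rng = np.random.default_rng(0)
    T, F = 100, 40                                   # frames and Mel bins (assumed)
    loss = signal_approximation_loss(rng.random((T, F)), rng.random((T, F)), rng.random((T, F)))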

An Empirical Exploration of Recurrent Network Architectures

by Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever , 2015
"... The Recurrent Neural Network (RNN) is an ex-tremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s archi-tecture appears to be ad-ho ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM's architecture appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the GRU.
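
The concrete recommendation here, initializing the LSTM's forget gate bias to 1 so the cell starts out remembering by default, is easy to show in code. The sketch below uses a hand-rolled NumPy LSTM step rather than any particular framework API, so the parameter layout and sizes are assumptions.

    # Sketch of an LSTM step with the forget-gate bias initialized to 1
    # (gates stacked as [input, forget, cell candidate, output]).
    import numpy as np

    rng = np.random.default_rng(0)
    D, H = 10, 20                                    # input and hidden sizes (assumed)
    W = rng.normal(0, 0.1, (4 * H, D + H))           # all gate weights in one matrix
    b = np.zeros(4 * H)
    b[H:2 * H] = 1.0                                 # forget-gate bias set to 1

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev):
        z = W @ np.concatenate([x_t, h_prev]) + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c_t = f * c_prev + i * g                     # forget gate scales the previous cell state
        h_t = o * np.tanh(c_t)
        return h_t, c_t

    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H))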

Training Energy-Based Models for Time-Series Imputation

by Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen, Yoshua Bengio
"... Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimizat ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets.
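
The inference step mentioned here, truncated gradient descent on the energy with respect to the missing values, can be illustrated independently of the paper's specific architectures. The sketch below uses a simple quadratic smoothness energy as a stand-in; the energy function, step size, and iteration count are all assumptions rather than the paper's models.

    # Sketch of imputation by truncated gradient descent on an energy function:
    # only missing entries are updated, observed entries stay fixed.
    import numpy as np

    def impute(x, missing_mask, steps=20, lr=0.2):
        """x: 1-D series (arbitrary values at missing positions); missing_mask: boolean array."""
        x = x.copy()
        for _ in range(steps):                                    # truncated: a fixed, small number of steps
            grad = np.zeros_like(x)
            grad[1:-1] = 2 * (2 * x[1:-1] - x[:-2] - x[2:])       # gradient of the squared-difference energy
            x[missing_mask] -= lr * grad[missing_mask]            # descend only on the missing values
        return x

    series = np.array([0.0, 1.0, 0.0, 0.0, 4.0, 5.0])
    mask = np.array([False, False, True, True, False, False])
    print(impute(series, mask))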

Citation Context

...at define the energy are in a separate layer and the visible variables are not independent given the hidden variables. It is also similar to the Recurrent Neural Network Restricted Boltzmann Machine (Boulanger-Lewandowski et al., 2012) that will be used as a baseline in our experiments. 4. The Discriminative Temporal Boltzmann Machine: The third model is inspired by the work on Deep Boltzmann M...
