Results 11–20 of 125
Learning sum-product networks with direct and indirect interactions.
In Proceedings of the Thirty-First International Conference on Machine Learning, 2014
Cited by 13 (2 self)
Sum-product networks (SPNs) are a deep probabilistic representation that allows for efficient, exact inference. SPNs generalize many other tractable models, including thin junction trees, latent tree models, and many types of mixtures. Previous work on learning SPN structure has mainly focused on using top-down or bottom-up clustering to find mixtures, which capture variable interactions indirectly through implicit latent variables. In contrast, most work on learning graphical models, thin junction trees, and arithmetic circuits has focused on finding direct interactions among variables. In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches. In experiments on 20 benchmark datasets, we find that the combination of direct and indirect interactions leads to significantly better accuracy than several state-of-the-art algorithms for learning SPNs and other tractable models.
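To make "efficient, exact inference" concrete, here is a minimal sketch of bottom-up SPN evaluation: leaves are univariate distributions, product nodes multiply children over disjoint scopes, and sum nodes take convex combinations. The structure and variable names below are illustrative, not taken from the paper.

```python
# Minimal sum-product network evaluation in one bottom-up pass.
class Leaf:
    """Bernoulli leaf over a single binary variable."""
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def value(self, x):
        return self.p_true if x[self.var] else 1.0 - self.p_true

class Product:
    """Product node: children must cover disjoint sets of variables."""
    def __init__(self, children):
        self.children = children
    def value(self, x):
        v = 1.0
        for c in self.children:
            v *= c.value(x)
        return v

class Sum:
    """Sum node: a mixture with weights summing to one."""
    def __init__(self, weighted_children):  # [(weight, child), ...]
        self.weighted_children = weighted_children
    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# A mixture of two product distributions over binary variables A and B.
spn = Sum([
    (0.6, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.4, Product([Leaf("A", 0.1), Leaf("B", 0.7)])),
])

# Exact joint probability, computed in time linear in network size.
p = spn.value({"A": True, "B": False})  # 0.6*0.9*0.8 + 0.4*0.1*0.3 = 0.444
```

Because every node is evaluated once, inference cost is linear in the number of edges, which is the tractability property the abstract refers to.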
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
In NIPS, 2014
New types of deep neural network learning for speech recognition and related applications: An overview
In Proc. Int. Conf. Acoust., Speech, Signal Process., 2013
Cited by 11 (4 self)
In this paper, we provide an overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors. We also describe the historical context in which acoustic models based on deep neural networks have been developed. The technical overview of the papers presented in our special session is organized into five ways of improving deep learning methods: (1) better optimization; (2) better types of neural activation function and better network architectures; (3) better ways to determine the myriad hyperparameters of deep neural networks; (4) more appropriate ways to preprocess speech for deep neural networks; and (5) ways of leveraging multiple languages or dialects that are more easily achieved with deep neural networks than with Gaussian mixture models. Index Terms — deep neural network, convolutional neural network, recurrent neural network, optimization, spectrogram features, multi-task, multilingual, speech recognition, music processing
Distributed optimization of deeply nested systems
2012
Cited by 11 (5 self)
In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in a deep artificial neural network, an object recognition cascade in computer vision or a speech front-end processing pipeline. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult non-convex numerical optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.
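The reformulation described in this abstract can be written compactly for a two-layer system. The notation below is ours, introduced only to illustrate the idea of trading a nested objective for a constrained one:

```latex
% Nested objective for a two-layer system f_2(f_1(.)):
\min_{W_1, W_2} \; \sum_{n} \bigl\| y_n - f_2\bigl(f_1(x_n; W_1); W_2\bigr) \bigr\|^2

% MAC introduces auxiliary coordinates z_n for the hidden-layer outputs,
% removing the nesting at the price of equality constraints:
\min_{W_1, W_2, \{z_n\}} \; \sum_{n} \| y_n - f_2(z_n; W_2) \|^2
\quad \text{s.t.} \quad z_n = f_1(x_n; W_1) \;\; \forall n

% Quadratic-penalty version, solved by alternating over (W_1, W_2)
% and {z_n}, driving the penalty parameter \mu \to \infty:
\min_{W_1, W_2, \{z_n\}} \; \sum_{n} \| y_n - f_2(z_n; W_2) \|^2
  + \mu \sum_{n} \| z_n - f_1(x_n; W_1) \|^2
```

With the coordinates fixed, each layer's parameters decouple into an independent single-layer fit, which is what makes the method trivially parallelizable across layers and data points.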
Detection and trajectory-level exclusion in multiple object tracking
In CVPR
Cited by 11 (2 self)
When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion-move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
Learning natural coding conventions
In Symposium on the Foundations of Software Engineering (FSE), 2014
Cited by 11 (4 self)
Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
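The core statistical idea here, scoring candidate identifier names by how "natural" the surrounding token sequence looks under an n-gram language model trained on the codebase itself, can be sketched in a few lines. The toy corpus, bigram order, and additive smoothing below are illustrative simplifications, not the NATURALIZE implementation:

```python
# Sketch: rank candidate identifier names by the smoothed bigram
# log-probability of the token context they produce. Illustrative only.
import math
from collections import Counter

def train_bigrams(token_lists):
    """Count bigram and left-unigram occurrences over tokenized code."""
    bigrams, unigrams = Counter(), Counter()
    for toks in token_lists:
        for a, b in zip(toks, toks[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def score(tokens, bigrams, unigrams, alpha=1.0):
    """Additive-smoothed log-probability of a token sequence."""
    vocab = len(unigrams) + 1
    s = 0.0
    for a, b in zip(tokens, tokens[1:]):
        s += math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
    return s

# Tiny training "codebase": the project overwhelmingly uses `i` as
# its loop variable, with one `idx` outlier.
corpus = [["for", "i", "in", "range", "(", "n", ")", ":"]] * 5 + \
         [["for", "idx", "in", "range", "(", "n", ")", ":"]]
bg, ug = train_bigrams(corpus)

# Rank candidate names for a new loop variable by naturalness in context.
candidates = ["i", "idx", "counter"]
best = max(candidates, key=lambda c: score(["for", c, "in", "range"], bg, ug))
# best == "i": the model prefers the codebase's dominant convention
```

The same scoring machinery generalizes to formatting decisions by treating whitespace and punctuation as tokens, which is the direction the abstract describes.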
High-dimensional Gaussian process bandits.
In Advances in Neural Information Processing Systems (NIPS), 2013
Cited by 10 (0 self)
Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducing Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. We carefully calibrate the exploration-exploitation tradeoff by allocating the sampling budget to subspace estimation and function optimization, and obtain the first sub-exponential cumulative regret bounds and convergence rates for Bayesian optimization in high dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
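A minimal sketch of the Gaussian Process Upper Confidence sampling step the abstract refers to: fit a GP to the observations so far, then pick the candidate maximizing mean plus a confidence-width bonus. The subspace-learning stage of the algorithm is omitted, and the RBF kernel, length-scale, and beta value are illustrative choices, not the paper's:

```python
# GP-UCB acquisition sketch on a toy 1-D objective (illustrative only).
import numpy as np

def rbf(a, b, ls=0.3):
    """RBF kernel matrix between 1-D point sets a (n,) and b (m,)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean and std of a zero-mean GP at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    # diag(Ks.T @ Kinv @ Ks), clipped to avoid tiny negative variances
    var = np.clip(1.0 - np.einsum("ij,ik,kj->j", Ks, Kinv, Ks), 0.0, None)
    return mu, np.sqrt(var)

def ucb_next(X, y, candidates, beta=2.0):
    """Pick the candidate maximizing the upper confidence bound."""
    mu, sigma = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + beta * sigma)]

# One optimization step: three observations of a toy objective,
# then propose the next evaluation point on a grid of candidates.
f = lambda x: -(x - 0.7) ** 2
X = np.array([0.1, 0.5, 0.9])
y = f(X)
x_next = ucb_next(X, y, np.linspace(0.0, 1.0, 101))
```

The exploration-exploitation calibration the abstract mentions shows up here as the choice of beta: larger values weight the posterior uncertainty (exploration) more heavily against the posterior mean (exploitation).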
Towards an empirical foundation for assessing Bayesian optimization of hyperparameters
In NIPS Workshop on Bayesian Optimization in Theory and Practice, 2013
Cited by 10 (3 self)
Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyperparameter optimization and use it to compare Spearmint, TPE, and SMAC, three recent Bayesian optimization methods for hyperparameter optimization.
High-dimensional sequence transduction
In ICASSP, 2013
Cited by 9 (3 self)
We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input, and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate. Index Terms — sequence transduction, restricted Boltzmann machine, recurrent neural network, polyphonic transcription
FBK-UEdin participation in the WMT13 Quality Estimation shared task.
In Proceedings of the Eighth Workshop on Statistical Machine Translation, 2013
Cited by 8 (5 self)
In this paper we present the approach and system setup of the joint participation of Fondazione Bruno Kessler and the University of Edinburgh in the WMT 2013 Quality Estimation shared task. Our submissions focused on tasks whose aim was predicting sentence-level Human-mediated Translation Edit Rate and sentence-level post-editing time (Tasks 1.1 and 1.3, respectively). We designed features built on resources such as automatic word alignment, n-best candidate translation lists, back-translations, and word posterior probabilities. Our models consistently outperform the baselines on both tasks and performed particularly well on Task 1.3, ranking first among seven participants.