Training Products of Experts by Minimizing Contrastive Divergence (2002)

by Geoffrey E. Hinton
Citations: 850 (75 self)

BibTeX

@MISC{Hinton02trainingproducts,
    author = {Geoffrey E. Hinton},
    title = {Training Products of Experts by Minimizing Contrastive Divergence},
    year = {2002}
}


Abstract

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual “expert” models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called “contrastive divergence” whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
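
The abstract notes that the intractable renormalization term rules out straightforward maximum-likelihood training, while the derivatives of contrastive divergence can be approximated from a short reconstruction step. As a concrete illustration, the sketch below shows one-step contrastive divergence (CD-1) for a binary restricted Boltzmann machine, one common product-of-experts instance; the function and parameter names (cd1_update, W, b_v, b_h, lr) are illustrative assumptions and not code from the paper.

# Minimal CD-1 sketch for a binary RBM (illustrative, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.01):
    """One CD-1 step on a batch of binary visible vectors v0 (shape: batch x n_v)."""
    # Positive phase: infer hidden units directly from the data
    # (conditional independence makes this a single matrix product).
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # One Gibbs step: reconstruct visibles, then re-infer hiddens.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)

    # Approximate contrastive-divergence gradient:
    # data statistics minus one-step reconstruction statistics,
    # with no partition function needed anywhere.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / batch
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# Toy usage: 6 visible units, 4 hidden "experts", random binary data.
n_v, n_h = 6, 4
W = 0.01 * rng.standard_normal((n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
data = (rng.random((32, n_v)) < 0.5).astype(float)
for _ in range(100):
    W, b_v, b_h = cd1_update(data, W, b_v, b_h)

The point of the sketch is that both the positive and negative statistics come from cheap local computations; a single Gibbs step from the data replaces the intractable expectation under the model.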

Keyphrases

contrastive divergence, javier movellan training product, combination rule, several type, latent variable, probability distribution, interesting candidate, different expert, perceptual system, combined model, rapid inference, individual expert model, renormalization term, different objective function, multiple latent-variable model
