Results 1  10
of
77
Latent dirichlet allocation
 Journal of Machine Learning Research
, 2003
"... We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a threelevel hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, ..."
Abstract

Cited by 4365 (92 self)
 Add to MetaCart
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a threelevel hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model. 1.
Hierarchical topic models and the nested Chinese restaurant process
 Advances in Neural Information Processing Systems
, 2004
"... We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested ..."
Abstract

Cited by 287 (32 self)
 Add to MetaCart
(Show Context)
We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
The LargeScale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
 Cognitive Science
"... We present statistical analyses of the largescale structure of three types of semantic networks: word associations, WordNet, and Roget's thesaurus. We show that they have a smallworld structure, characterized by sparse connectivity, short average pathlengths between words, and strong local ..."
Abstract

Cited by 209 (2 self)
 Add to MetaCart
We present statistical analyses of the largescale structure of three types of semantic networks: word associations, WordNet, and Roget's thesaurus. We show that they have a smallworld structure, characterized by sparse connectivity, short average pathlengths between words, and strong local clustering. In addition, the distributions of the number of connections follow power laws that indicate a scalefree pattern of connectivity, with most nodes having relatively few connections joined together through a small number of hubs with many connections. These regularities have also been found in certain other complex natural networks, such as the world wide web, but they are not consistent with many conventional models of semantic organization, based on inheritance hierarchies, arbitrarily structured networks, or highdimensional vector spaces. We propose that these structures reflect the mechanisms by which semantic networks grow. We describe a simple model for semantic growth, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node. This model generates appropriate smallworld statistics and powerlaw connectivity distributions, and also suggests one possible mechanistic basis for the effects of learning history variables (ageofacquisition, usage frequency) on behavioral performance in semantic processing tasks.
Topics in semantic representation
 Psychological Review
, 2007
"... Accounts of language processing have suggested that it requires retrieving concepts from memory in response to an ongoing stream of information. This can be facilitated by inferring the gist of a sentence, conversation, or document, and using that computational problem underlying the extraction and ..."
Abstract

Cited by 183 (15 self)
 Add to MetaCart
(Show Context)
Accounts of language processing have suggested that it requires retrieving concepts from memory in response to an ongoing stream of information. This can be facilitated by inferring the gist of a sentence, conversation, or document, and using that computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference. This leads us to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics. The topic model performs well in predicting word association and the effects of semantic association and ambiguity on a variety of language processing and memory tasks. It also provides a foundation for developing more richly structured statistical models of language, as the generative process assumed in the topic model can easily be extended to incorporate other kinds of semantic and syntactic structure. Many aspects of perception and cognition can be understood by considering the computational problem that is addressed by a particular human capacity (Andersion, 1990; Marr, 1982). Perceptual capacities such as identifying shape from shading (Freeman, 1994), motion perception
Representing word meaning and order information in a composite holographic lexicon
 Psychological Review
, 2007
"... The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic repr ..."
Abstract

Cited by 123 (14 self)
 Add to MetaCart
(Show Context)
The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic representations for words. The structure of the resulting lexicon can account for empirical data from classic experiments studying semantic typicality, categorization, priming, and semantic constraint in sentence completions. Furthermore, order information can be retrieved from the holographic representations, allowing the model to account for limited word transitions without the need for builtin transition rules. The model demonstrates that a broad range of psychological data can be accounted for directly from the structure of lexical representations learned in this way, without the need for complexity to be built into either the processing mechanisms or the representations. The holographic representations are an appropriate knowledge representation to be used by higher order models of language comprehension, relieving the complexity required at the higher level.
Integrating experiential and distributional data to learn semantic representations
 Psychological Review
, 2009
"... The authors identify 2 major types of statistical data from which semantic representations can be learned. These are denoted as experiential data and distributional data. Experiential data are derived by way of experience with the physical world and comprise the sensorymotor data obtained through s ..."
Abstract

Cited by 68 (4 self)
 Add to MetaCart
The authors identify 2 major types of statistical data from which semantic representations can be learned. These are denoted as experiential data and distributional data. Experiential data are derived by way of experience with the physical world and comprise the sensorymotor data obtained through sense receptors. Distributional data, by contrast, describe the statistical distribution of words across spoken and written language. The authors claim that experiential and distributional data represent distinct data types and that each is a nontrivial source of semantic information. Their theoretical proposal is that human semantic representations are derived from an optimal statistical combination of these 2 data types. Using a Bayesian probabilistic model, they demonstrate how word meanings can be learned by treating experiential and distributional data as a single joint distribution and learning the statistical structure that underlies it. The semantic representations that are learned in this manner are measurably more realistic—as verified by comparison to a set of humanbased measures of semantic representation—than those available from either data type individually or from both sources independently. This is not a result of merely using quantitatively more data, but rather it is because experiential and distributional data are qualitatively distinct, yet intercorrelated, types of data. The semantic representations that are learned are based on statistical structures that exist both within and between the experiential and distributional data types.
Collapsed variational inference for HDP
 In Advances in Neural Information Processing Systems
"... Abstract A wide variety of Dirichletmultinomial 'topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy op ..."
Abstract

Cited by 57 (1 self)
 Add to MetaCart
(Show Context)
Abstract A wide variety of Dirichletmultinomial 'topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and sidestepping of issues with topicidentifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.
Bayesian models of cognition
"... For over 200 years, philosophers and mathematicians have been using probability theory to describe human cognition. While the theory of probabilities was first developed as a means of analyzing games of chance, it quickly took on a larger and deeper significance as a formal account of how rational a ..."
Abstract

Cited by 54 (2 self)
 Add to MetaCart
For over 200 years, philosophers and mathematicians have been using probability theory to describe human cognition. While the theory of probabilities was first developed as a means of analyzing games of chance, it quickly took on a larger and deeper significance as a formal account of how rational agents should reason in situations of uncertainty
Bayesian word sense induction
 In EACL 2009
, 2009
"... Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim i ..."
Abstract

Cited by 50 (0 self)
 Add to MetaCart
(Show Context)
Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word’s contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical cooccurrences and to systematically assess their utility on the sense induction task. The proposed approach yields improvements over stateoftheart systems on a benchmark dataset. 1
Discrete Component Analysis
 Subspace, Latent Structure and Feature Selection Techniques
, 2006
"... This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, nonnegative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational appr ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
(Show Context)
This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, nonnegative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational approximation, Gibbs sampling, and RaoBlackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and for the Reuters21578 newswire collection.