A note on topical n-grams (2005)

by Xuerui Wang, Andrew McCallum
Venue: University of Massachusetts
Citations: 27 - 1 self

BibTeX

@TECHREPORT{Wang05anote,
    author = {Xuerui Wang and Andrew McCallum},
    title = {A note on topical n-grams},
    institution = {University of Massachusetts},
    year = {2005}
}

Abstract

Most popular topic models (such as Latent Dirichlet Allocation) rest on an underlying assumption: bag of words. However, text is a sequence of discrete word tokens, and without considering the order of words (in other words, the nearby context in which a word is located), the meaning of language cannot be accurately captured by word co-occurrences alone. In this sense, collocations of words (phrases) have to be considered. However, like individual words, phrases sometimes show polysemy as well, depending on the context. More noticeably, a composition of two (or more) words is a phrase in some contexts, but not in others. In this paper, we propose a new probabilistic generative model that automatically determines unigram words and phrases based on context and simultaneously associates them with a mixture of topics, and show very interesting results on large text corpora.
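
The bag-of-words limitation the abstract describes can be illustrated with a minimal sketch. This is not the authors' model; it only shows that two token sequences with the same words in a different order are indistinguishable under unigram counts, while adjacent-pair (bigram) counts differ. The example sentences and the helper names `unigram_counts` and `bigram_counts` are assumptions made for illustration.

```python
from collections import Counter


def unigram_counts(tokens):
    """Bag-of-words view: word frequencies, with order discarded."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Order-aware view: frequencies of adjacent word pairs."""
    return Counter(zip(tokens, tokens[1:]))


# Two hypothetical sentences built from exactly the same words.
a = "the white house issued a statement".split()
b = "the house issued a white statement".split()

# Under bag of words the two sentences are identical.
same_bag = unigram_counts(a) == unigram_counts(b)

# Once adjacency is considered, they differ: only the first sentence
# contains the candidate phrase ("white", "house").
same_bigrams = bigram_counts(a) == bigram_counts(b)
phrase_in_a = ("white", "house") in bigram_counts(a)
phrase_in_b = ("white", "house") in bigram_counts(b)

print(same_bag, same_bigrams, phrase_in_a, phrase_in_b)
```

Plain bigram counting treats every adjacent pair as a candidate phrase. The model summarized in the abstract instead decides from context whether a word sequence forms a phrase and assigns topics at the same time, which this sketch does not attempt.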

Keyphrases

topical n-grams, large text corpus, latent dirichlet allocation, nearby context, language cannot, interesting result, new probabilistic generative model, underlying assumption, unigram word, word co-occurrence, discrete word token, individual word, accurate meaning, popular topic model
