
## Dirichlet-Hawkes processes with applications to clustering continuous-time document streams (2015)

Venue: In Proc. of KDD

Citations: 2 - 0 self

### Citations

1358 | Sequential Monte Carlo methods in practice
- Doucet, Freitas, et al.
- 2001
Citation Context ...inference algorithm alternates between two subroutines. The first subroutine samples the latent cluster membership (and perhaps the missing time) for the current document $d_n$ by Sequential Monte Carlo [10, 11]; and then, the second subroutine updates the learned triggering kernels of the respective cluster on the fly. Sampling the cluster label. Let $s_{1:n}$ and $t_{1:n}$ be the latent cluster indicator variables a... |

727 | Incorporating nonlocal information into information extraction systems by Gibbs sampling
- Finkel, Grenager, et al.
Citation Context ...model on a set of 1,000,000 mainstream news articles extracted from the Spinn3r dataset from 01/01 to 02/15 in 2011. Experimental Setup. We apply the Named Entity Recognizer from the Stanford NER system [15] and remove common stop-words and tokens which are neither verbs, nouns, nor adjectives. The vocabulary of both words and named entities is pruned to a total of 100,000 terms. We formulate the trigger... |

673 | Topic models
- Blei, Lafferty
Citation Context ...83411. Keywords: Dirichlet Process, Hawkes Process, Document Modeling. 1. INTRODUCTION. Online news articles, blogs and tweets tend to form clusters around real life events and stories on certain topics [2, 3, 9, 22, 7, 24]. Such data are generated by myriads of online media sites in real-time and in large volumes. It is a critically important task to effectively organize these articles according to their contents such ... |

636 | Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics
- Antoniak
Citation Context ...s, especially Hawkes Processes [17], are the mathematical tools for modeling recurrent patterns and continuous-time nature of real world events. 2.1 Bayesian Nonparametrics. The Dirichlet process (DP) [5] is one of the most basic Bayesian nonparametric processes, parameterized by a concentration parameter $\alpha > 0$ and a base distribution $G_0(\theta)$ over a given space $\theta \in \Theta$. A sample $G \sim \mathrm{DP}(\alpha, G_0)$ drawn from a... |

347 | Rao-Blackwellised particle filtering for dynamic Bayesian networks
- Doucet, Freitas, et al.
Citation Context ...inference algorithm alternates between two subroutines. The first subroutine samples the latent cluster membership (and perhaps the missing time) for the current document $d_n$ by Sequential Monte Carlo [10, 11]; and then, the second subroutine updates the learned triggering kernels of the respective cluster on the fly. Sampling the cluster label. Let $s_{1:n}$ and $t_{1:n}$ be the latent cluster indicator variables a... |

246 | Topics over time: a non-Markov continuous-time model of topical trends
- Wang, McCallum
- 2006
Citation Context ...83411. Keywords: Dirichlet Process, Hawkes Process, Document Modeling. 1. INTRODUCTION. Online news articles, blogs and tweets tend to form clusters around real life events and stories on certain topics [2, 3, 9, 22, 7, 24]. Such data are generated by myriads of online media sites in real-time and in large volumes. It is a critically important task to effectively organize these articles according to their contents such ... |

201 | An Introduction to the Theory of Point Processes. Volume I and II, 2nd edition
- Daley, Vere-Jones
- 2003
Citation Context ...kernels. Then, we collect the samples from all partitions to see whether this new sequence is a valid sample from the Hawkes Process with the intensity $\lambda(t \mid \mathcal{H}_T)$. Results. By the Time Changing Theorem [8], the intensity integrals $\int_{t_{i-1}}^{t_i} \lambda(\tau)\,d\tau$ from the sampled sequence should conform to the unit-rate exponential distribution. Figure 4(d) presents the quantiles of the intensity integrals against the... |
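
The time-change check quoted in this context can be illustrated with a small sketch. It is not the authors' code: it assumes a univariate Hawkes process with an exponential triggering kernel $\alpha\beta e^{-\beta\Delta}$ (chosen only because its intensity integral has a simple closed form), simulates it by thinning, and computes the intensity integrals between consecutive events, which should behave like unit-rate exponential draws. The function names and the parameters `mu`, `alpha`, `beta` are hypothetical.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on [0, horizon) by thinning.

    Assumed intensity: mu + sum_j alpha * beta * exp(-beta * (t - t_j)).
    """
    events = []
    t = 0.0
    excite = 0.0  # self-exciting part of the intensity at the current time
    while True:
        bound = mu + excite  # valid upper bound: intensity only decays until the next event
        wait = rng.expovariate(bound)
        t += wait
        excite *= math.exp(-beta * wait)
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= mu + excite:
            events.append(t)
            excite += alpha * beta


def compensator_increments(events, mu, alpha, beta):
    """Integrals of the intensity between consecutive events (first one starts at 0)."""
    increments = []
    prev = 0.0
    memory = 0.0  # sum_j exp(-beta * (prev - t_j)) over events up to prev
    for t in events:
        dt = t - prev
        decay = math.exp(-beta * dt)
        increments.append(mu * dt + alpha * memory * (1.0 - decay))
        memory = memory * decay + 1.0
        prev = t
    return increments


if __name__ == "__main__":
    rng = random.Random(0)
    ev = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=2000.0, rng=rng)
    incs = compensator_increments(ev, mu=1.0, alpha=0.5, beta=1.0)
    # Under the time-change theorem the sample mean should be close to 1.
    print(len(ev), sum(incs) / len(incs))
```

A quantile-quantile plot of `incs` against the unit-rate exponential distribution would be the graphical counterpart of the comparison the context describes.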

168 | Spectra of Some Self-Exciting and Mutually Exciting Point Processes
- Hawkes
- 1971
Citation Context ...ly of models which allow the model complexity (e.g., number of latent clusters, number of latent factors) to grow as more data are observed [18]. Temporal point processes, especially Hawkes Processes [17], are the mathematical tools for modeling recurrent patterns and continuous-time nature of real world events. 2.1 Bayesian Nonparametrics. The Dirichlet process (DP) [5] is one of the most basic Bayesi... |

148 | A hierarchical Bayesian language model based on Pitman-Yor processes
- Teh
- 2006
Citation Context ...nd Hawkes processes has implications beyond clustering document streams. We will show that our construction can be generalized to other Nonparametric Bayesian models, such as the Pitman-Yor processes [23] and the Indian Buffet processes [16]. • We propose an efficient online inference algorithm which can scale up to millions of news articles with near constant processing time per document and moderate memory consumptions. • We conduct la... |

60 | The Indian buffet process: An introduction and review
- Griffiths, Ghahramani
Citation Context ...beyond clustering document streams. We will show that our construction can be generalized to other Nonparametric Bayesian models, such as the Pitman-Yor processes [23] and the Indian Buffet processes [16]. • We propose an efficient online inference algorithm which can scale up to millions of news articles with near constant processing time per document and moderate memory consumptions. • We conduct la... |

56 | Distance dependent Chinese restaurant processes
- Blei, Frazier
Citation Context ...er, one of the main deficiencies of the RCRP and related models [9] is that they require an explicit division of the event stream into unit episodes. Although this was ameliorated in the DD-CRP model [6] simply by defining a continuous weighting function, it does not address the issue that the actual counts of events are nonuniform over time. Artificially discretizing the time line into bins introduc... |

50 | Dynamic non-parametric mixture models and the recurrent Chinese restaurant process: with applications to evolutionary clustering
- Ahmed, Xing
Citation Context ..., to track their popularity and to predict the future trends. Such problem of modeling time-dependent topic-clusters has been attempted by [2, 3], where the Recurrent Chinese Restaurant Process (RCRP) [4] has been proposed to model ... |

46 | Survival and Event History Analysis: A Process Point of View, Statistics for Biology and Health
- Aalen, Borgan, et al.
- 2008
Citation Context ...ity for the occurrence of a new event given the history $\mathcal{T}$: $\lambda(t)\,dt = P\{\text{event in } [t, t+dt) \mid \mathcal{T}\}$. (5) The functional form of the intensity $\lambda(t)$ is often designed to capture the phenomena of interests [1]. For instance, in a homogeneous Poisson process, the intensity is assumed to be independent of the history $\mathcal{T}$ and constant over time, i.e., $\lambda(t) = \lambda_0 > 0$. In an inhomogeneous Poisson process, the inte... |

41 | Bayesian Nonparametrics.
- Hjort, Holmes, et al.
- 2010
Citation Context ...cs, especially Chinese Restaurant Processes, are a rich family of models which allow the model complexity (e.g., number of latent clusters, number of latent factors) to grow as more data are observed [18]. Temporal point processes, especially Hawkes Processes [17], are the mathematical tools for modeling recurrent patterns and continuous-time nature of real world events. 2.1 Bayesian Nonparametrics. Th... |

25 | Unified analysis of streaming news - Ahmed, Ho, et al. - 2011 |

19 | Learning networks of heterogeneous influence
- Du, Song, et al.
- 2012
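Citation Context ...θ $\alpha_\theta^\top g_\theta - \Lambda_0$ $\prod_{\theta_i=\theta} \sum_{t_j<t_i,\,\theta_j=\theta} \alpha_\theta^\top k(\Delta_{ij})$, (15) where $g_\theta^l = \sum_{t_i<T,\,\theta_i=\theta} \int_{t_i}^{T} \kappa(\tau_l, t - t_i)\,dt$ and $\Lambda_0 = \int_0^T \lambda_0\,dt$. This can be done efficiently for many kernels, such as the Gaussian RBF kernel [12, 13], Rayleigh kernel [1], etc. Here, we choose the Gaussian RBF kernel $\kappa(\tau_l, \Delta) = \exp\bigl(-(\Delta - \tau_l)^2 / (2\sigma_l^2)\bigr) / \sqrt{2\pi\sigma_l^2}$, so the integral $g_\theta^l$ has the analytic form: $\sum_{t_i<T,\,\theta_i=\theta} \frac{1}{2}\bigl(\operatorname{erfc}\bigl(-\tau_l/\sqrt{2\sigma_l^2}\bigr) - {}$er... |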
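
The closed form for the Gaussian RBF kernel integral mentioned in this context can be checked numerically. The sketch below is illustrative only: the excerpt is cut off after the first `erfc` term, so the second term used here is derived from the kernel definition (integrating a Gaussian density with mean $\tau_l$ and variance $\sigma_l^2$ over $[0, T - t_i]$), not copied from the paper. All function and parameter names are hypothetical.

```python
import math


def rbf_kernel(tau, delta, sigma):
    """Gaussian RBF triggering kernel centred at tau with bandwidth sigma."""
    return math.exp(-(delta - tau) ** 2 / (2.0 * sigma ** 2)) / math.sqrt(
        2.0 * math.pi * sigma ** 2
    )


def rbf_integral(tau, sigma, span):
    """Closed-form integral of rbf_kernel(tau, d, sigma) for d in [0, span].

    span plays the role of T - t_i for a single event.
    """
    scale = math.sqrt(2.0 * sigma ** 2)
    return 0.5 * (math.erfc(-tau / scale) - math.erfc((span - tau) / scale))


def rbf_integral_numeric(tau, sigma, span, steps=20000):
    """Trapezoidal approximation of the same integral, used only as a check."""
    h = span / steps
    total = 0.5 * (rbf_kernel(tau, 0.0, sigma) + rbf_kernel(tau, span, sigma))
    total += sum(rbf_kernel(tau, k * h, sigma) for k in range(1, steps))
    return total * h


def g_component(event_times, horizon, tau, sigma):
    """Sum of per-event integrals for events before the horizon (one cluster, one kernel)."""
    return sum(rbf_integral(tau, sigma, horizon - t) for t in event_times if t < horizon)
```

For example, `rbf_integral(2.0, 1.0, 5.0)` and `rbf_integral_numeric(2.0, 1.0, 5.0)` should agree to several decimal places, and `g_component` sums the closed form over the events of a single cluster.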

18 | Learning social infectivity in sparse low-rank networks using multidimensional Hawkes processes.
- Zhou, Zha, et al.
- 2013
Citation Context ...e events in later intervals, the Hawkes process in general is more expressive than a Poisson process. Hawkes process is particularly good for modeling repeated activities, such as social interactions [23], search behaviors [19], or infectious diseases that do not convey immunity. Given a time $t' > t$, we can also characterize the conditional probability that no event happens during $[t, t')$ and the cond... |

11 | Uncover topic-sensitive information diffusion networks. In:
- Du, Song, et al.
- 2013
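Citation Context ...θ $\alpha_\theta^\top g_\theta - \Lambda_0$ $\prod_{\theta_i=\theta} \sum_{t_j<t_i,\,\theta_j=\theta} \alpha_\theta^\top k(\Delta_{ij})$, (15) where $g_\theta^l = \sum_{t_i<T,\,\theta_i=\theta} \int_{t_i}^{T} \kappa(\tau_l, t - t_i)\,dt$ and $\Lambda_0 = \int_0^T \lambda_0\,dt$. This can be done efficiently for many kernels, such as the Gaussian RBF kernel [12, 13], Rayleigh kernel [1], etc. Here, we choose the Gaussian RBF kernel $\kappa(\tau_l, \Delta) = \exp\bigl(-(\Delta - \tau_l)^2 / (2\sigma_l^2)\bigr) / \sqrt{2\pi\sigma_l^2}$, so the integral $g_\theta^l$ has the analytic form: $\sum_{t_i<T,\,\theta_i=\theta} \frac{1}{2}\bigl(\operatorname{erfc}\bigl(-\tau_l/\sqrt{2\sigma_l^2}\bigr) - {}$er... |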

10 | Shaping social activity by incentivizing users.
- Farajtabar, Du, et al.
- 2014
Citation Context ...e events in later intervals, the Hawkes process in general is more expressive than a Poisson process. Hawkes process is particularly good for modeling repeated activities, such as social interactions [14], search behaviors [21], or infectious diseases that do not convey immunity. Given a time $t' > t$, we can also characterize the conditional probability that no event happens during $[t, t')$ and the cond... |

8 | Nifty: A system for large scale information flow tracking and clustering
- Suen, Huang, et al.
Citation Context ...83411. Keywords: Dirichlet Process, Hawkes Process, Document Modeling. 1. INTRODUCTION. Online news articles, blogs and tweets tend to form clusters around real life events and stories on certain topics [2, 3, 9, 22, 7, 24]. Such data are generated by myriads of online media sites in real-time and in large volumes. It is a critically important task to effectively organize these articles according to their contents such ... |

6 | On doubly stochastic Poisson processes.
- Kingman
- 1964
Citation Context ...ntensity by a certain amount. Since the intensity function depends on the history $\mathcal{T}$ up to time $t$, the Hawkes process is essentially a conditional Poisson process (or doubly stochastic Poisson process [19]) in the sense that conditioned on the history $\mathcal{T}$, the Hawkes process is a Poisson process formed by the superposition of a background homogeneous Poisson process with the intensity $\gamma_0$ and a set of in... |

5 | Poisson processes, volume 3
- Kingman
- 1992
Citation Context ...kes processes (conditional Poisson processes), one for each distinctive value of $\theta_d$ and with intensity $\lambda_{\theta_d}(t)$. Thus the overall event intensity is the sum of the intensities from individual processes [20] $\bar{\lambda}(t) = \lambda_0 + \sum_{d=1}^{D} \lambda_{\theta_d}(t)$, where $D$ is the total number of distinctive values $\{\theta_i\}$ in the DHP up to time $t$. Therefore, the Dirichlet-Hawkes process can capture the following four desirable properties... |

4 | Identifying and labeling search tasks via query-based Hawkes processes
- Li, Deng, et al.
- 2014
Citation Context ...vals, the Hawkes process in general is more expressive than a Poisson process. Hawkes process is particularly good for modeling repeated activities, such as social interactions [14], search behaviors [21], or infectious diseases that do not convey immunity. Given a time $t' > t$, we can also characterize the conditional probability that no event happens during $[t, t')$ and the conditional density that an... |

3 | R.: An Indexed Model
- Ahmed, Appel, et al.

1 | Recurrent Chinese restaurant process with a duration-based discount for event identification from Twitter
- Diao, Jiang
- 2014