### Citations

952 | A stochastic approximation method
- Robbins, Monro
- 1951
Citation Context ...nce. To the best of our knowledge, (Foulds et al., 2013) is the only work that proposes convergence-assured CVB inference so far. This model is based on the Robbins and Monro stochastic approximation (Robbins and Monro, 1951) and is only valid for LDA-CVB0. More precisely, the solution presented in (Foulds et al., 2013) is a MAP solution, leveraging the fact that the MAP solution is very similar to the CVB0 solution in t... |

895 | The link prediction problem in social networks
- Liben-Nowell, Kleinberg
- 2003
Citation Context ...ographic citations between scientific articles, is useful in many ways. Many statistical models for relational data have been presented in the literature (Clauset et al., 2008; Erosheva et al., 2004; Liben-Nowell and Kleinberg, 2003; Zhu et al., 2009). Among them, the infinite relational model (IRM) proposed by Kemp et al. (2006) achieves simultaneous bi-clustering on the row and column dimensions of a given pairwise relational ... |

435 | Ferguson distributions via Pólya urn schemes
- Blackwell, MacQueen
- 1973
Citation Context ...listic generative model that uses DP for the prior of mixture proportions. We can implement DPM by using either a Stick-Breaking Process (SBP) (Sethuraman, 1994) or a Chinese restaurant process (CRP) (Blackwell and MacQueen, 1973), which is a marginalized form of SBP. CRP is employed for the (collapsed) Gibbs sampler, and SBP is employed for (collapsed) variational Bayes solutions typically. First, let us start by explaining ... |

367 | Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review
- Cowles, Carlin
- 1996
Citation Context ...ro et al., 2012; Mørup et al., 2010; Albers et al., 2013) rely on (collapsed) Gibbs samplers. However, the automatic convergence detection of stochastic sampling-based Gibbs is difficult to achieve (Cowles and Carlin, 1996). This is not preferable for non-expert users to employ IRM in practical uses. Further, (Albers et al., 2013) reported that the naive implementation of (collapsed) Gibbs is very slow in mixing for IR... |

267 | A variational Bayesian framework for graphical models
- Attias
- 2000
Citation Context ...ssume variational posteriors of hidden variables of the model where parameters are marginalized out beforehand. In Eq. (25), parameters Φ are not marginalized (collapsed) out in ordinary VB inference (Attias, 2000; Bishop, 2006). Thus, we need to compute the variational posteriors of the parameters as well. The variational posteriors of the parameters impact the inference results, and this may increase the dan... |

243 | Hierarchical structure and the prediction of missing links in networks. Nature 453 - Clauset, Moore, et al. - 2008 |

229 | Learning systems of concepts with an infinite relational model.
- Kemp, Tenenbaum, et al.
- 2006
Citation Context ... than exact collapsed Gibbs, in data modeling (Kurihara et al., 2007; Teh et al., 2007; Asuncion et al., 2009), link predictions, and neighborhood search (Sato et al., 2012). Most IRM papers to date (Kemp et al., 2006; Ishiguro et al., 2012; Mørup et al., 2010; Albers et al., 2013) rely on (collapsed) Gibbs samplers. However, the automatic convergence detection of stochastic sampling-based Gibbs is difficult to a... |

207 | The Enron corpus: a new dataset for email classification research
- Klimt, Yang
- 2004
Citation Context ...rs of clusters of these datasets were: N1 = 100, N2 = 200, K1 = 4, K2 = 5 (synth 1), and N1 = 1000, N2 = 1500, K1 = 7, K2 = 6 (synth 2). The first real-world relational dataset is the Enron e-mail dataset (Klimt and Yang, 2004). This is a famous relational dataset used in many studies (Tang et al., 2008; Fu et al., 2009; Ishiguro et al., 2010, 2012). We extracted monthly e-mail transactions for 2001. The dataset contained ... |

136 | A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation
- Teh, Newman, et al.
- 2007
Citation Context ...stimation than the original Gibbs samplers. Recently, collapsed variational Bayes (CVB) solutions have been intensively studied, especially for topic models such as latent Dirichlet allocation (LDA) (Teh et al., 2007; Asuncion et al., 2009; Sato and Nakagawa, 2012) and HDP-LDA (Sato et al., 2012). The original paper (Teh et al., 2007) examined a 2nd-order Taylor approximation of the variational expectation. A sim... |

128 | The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies - Blei, Griffiths, et al. - 2010 |

105 | Nonparametric latent feature models for link prediction
- Miller, Griffiths, et al.
- 2009
Citation Context ...Stochastic Blockmodel (MMSB) (Airoldi et al., 2008) is a finite-cluster model that allows the nodes to have multiple cluster assignment, and change the clusters edge by edge. (Miller et al., 2009) employs the Indian Buffet Process (IBP, see (Griffiths and Ghahramani, 2011) for a review) to handle countably infinite binary factors for each node. The Infinite Latent Attribute Model (Palla et al... |

57 | Structured topic models for language
- Wallach
- 2008
Citation Context ...of the VB lower bound. It is easy to obtain update rules for hyperparameters by taking derivatives of the lower bound. Employing the fixed-point methods presented in (Iwata et al., 2012; Minka, 2000; Wallach, 2008), we have the following update rules for hyperparameters. $\alpha_1 = \frac{K_1}{\sum_{k=1}^{K_1} \left[ \psi(\hat{\alpha}_{1,k} + \hat{\beta}_{1,k}) - \psi(\hat{\beta}_{1,k}) \right]}$ (58), $\alpha_2 = \frac{K_2}{\sum_{l=1}^{K_2} \left[ \psi(\hat{\alpha}_{2,l} + \hat{\beta}_{2,l}) - \psi(\hat{\beta}_{2,l}) \right]}$ (59), ãk,l = ak,l ψ ( ak,l +... |

56 | Collapsed Variational Inference for HDP - Teh, Kurihara, et al. - 2008 |

40 | Overlapping community detection at scale: A nonnegative matrix factorization approach
- Yang, Leskovec
- 2013
Citation Context ...he edge structure and node attributes to find communities within large networks. The model is called CESNA, consisting of a soft-max-based binary node attribute model and an affiliated network model (Yang and Leskovec, 2013). These recent works make the model scalable against very large networks consisting of millions of nodes. However, none of these works consider the cross-domain T1 × T2 relational observations that a... |

39 | Dynamic mixed membership blockmodel for evolving networks - Fu, Song, et al. - 2009 |

22 | A constructive definition of the Dirichlet process prior. - Sethuraman - 1994 |

21 | Dynamic infinite relational model for time-varying relational data analysis
- Ishiguro, Iwata, et al.
- 2010
Citation Context ... 6 (synth 2). The first real-world relational dataset is the Enron e-mail dataset (Klimt and Yang, 2004). This is a famous relational dataset used in many studies (Tang et al., 2008; Fu et al., 2009; Ishiguro et al., 2010, 2012). We extracted monthly e-mail transactions for 2001. The dataset contained N = N1 = N2 = 151 company members of Enron. $x_{i,j} = 1\,(0)$ if there is (not) an e-mail sent from member i to member j. O... |

20 | Community detection in networks with node attributes
- Yang, McAuley, et al.
- 2013
Citation Context ...th a simpler probabilistic generative model to achieve a scalable algorithm for large networks, which limits the cardinality of the triangle “cluster assignments” variety in the likelihood function. (Yang et al., 2013) employ the edge structure and node attributes to find communities within large networks. The model is called CESNA, consisting of a soft-max-based binary node attribute model and an affiliated netwo... |

14 | Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation. In KDD
- Foulds, Boyles, et al.
- 2013
Citation Context ...Ishiguro et al. (2012) X - - - Noise filtering extension This paper X X X X Fully covers inferences. This problem, interestingly, has not been much discussed in the literature. The sole exception is (Foulds et al., 2013), which uses online stochastic learning valid for LDA. However, this is a tricky and problematic issue for practitioners who are not familiar with but want to try state-of-the-art machine learning tec... |

14 | Estimating a Dirichlet Distribution. http://research.microsoft.com/enus/um/people/minka/papers/dirichlet - Minka - 2000 |

8 | Practical collapsed variational Bayes inference for hierarchical Dirichlet process
- Sato, Kurihara, et al.
- 2012
Citation Context ...es (CVB) solutions have been intensively studied, especially for topic models such as latent Dirichlet allocation (LDA) (Teh et al., 2007; Asuncion et al., 2009; Sato and Nakagawa, 2012) and HDP-LDA (Sato et al., 2012). The original paper (Teh et al., 2007) examined a 2nd-order Taylor approximation of the variational expectation. A simpler 0th-order approximated CVB (CVB0) solution also has been developed; it is an... |

7 | Rethinking collapsed variational Bayes inference for LDA.
- Sato, Nakagawa
- 2012
Citation Context ...s. Recently, collapsed variational Bayes (CVB) solutions have been intensively studied, especially for topic models such as latent Dirichlet allocation (LDA) (Teh et al., 2007; Asuncion et al., 2009; Sato and Nakagawa, 2012) and HDP-LDA (Sato et al., 2012). The original paper (Teh et al., 2007) examined a 2nd-order Taylor approximation of the variational expectation. A simpler 0th-order approximated CVB (CVB0) solution a... |

6 | Mixed membership stochastic blockmodels - Airoldi, Blei, et al. - 2008 |

6 | On smoothing and inference for topic models - Asuncion, Welling, et al. - 2009 |

5 | Community evolution in dynamic multi-mode networks - Tang, Liu, et al. - 2008 |

4 | The Indian buffet process: An introduction and review - Griffiths, Ghahramani - 2011 |

4 | On triangular versus edge representations — towards scalable modeling of networks.
- Ho, Yin, et al.
- 2012
Citation Context ...ted to the single-domain case: that is, T × T → {1, 0}. (Ho et al., 2011) introduces a nested Chinese Restaurant Process (nCRP) (Blei et al., 2010) to incorporate multiscale membership for the MMSB. (Ho et al., 2012) proposed a bag of triangular representations of a network. The representation is based on the triplet of nodes. Possible connections among three nodes are (i) all three nodes are connected in a circ... |

4 | Collapsed variational Dirichlet process mixture models - Kurihara, Welling, Teh - 2007 |

4 | Collapsed variational Bayesian inference for Hidden Markov Models - Wang, Blunsom - 2013 |

2 | Sequential Modeling of Topic Dynamics with Multiple Timescales - Iwata, Yamada, et al. - 2012 |

2 | Parallel Markov chain Monte Carlo for nonparametric mixture models - Williamson, Dubey, Xing - 2013 |

1 | Large Scale Inference in the Infinite Relational Model: Gibbs Sampling is not Enough - Albers, Moth, et al. - 2013 |

1 | Non-parametric Co-clustering of Large Scale Sparse Bipartite Networks on the GPU - Hansen, Mørup, Hansen - 2011 |

1 | Multiscale Community Blockmodel for Network Exploration - Ho, Parikh, Xing - 2011 |

1 | An Infinite Latent Attribute Model for Network Data - Palla, Knowles, Ghahramani - 2012 |

1 | Collapsed Variational Bayesian Inference for PCFGs - Wang, Blunsom |

1 | A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks - Yin, Ho, Xing - 2013 |