
## WASP: Scalable Bayes via barycenters of subset posteriors

### Citations

955 | Distributed optimization and statistical learning via the alternating direction method of multipliers
- Boyd, Parikh, et al.
Citation Context: ...ce and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38. Copyright 2015 by the authors. subset-specific optimization problems. One widely used and well understood framework is ADMM [4; 6]. In Bayesian statistics, one is faced with the more challenging problem of approximating a posterior measure for the unknown parameters instead of just obtaining a single point estimate of these para...

207 | Optimal Transport: Old and New
- Villani
- 2009
Citation Context: ...the Euclidean distance. Assumption (b) can typically be satisfied if the subset size and the number of unknown parameters are not too large. Due to the geometric properties of the Wasserstein metric [21], the Wasserstein barycenter (WB) of the subset posteriors has highly appealing statistical and computational properties. We formulate a linear program (LP) to estimate the (atomic) WB of atomic appro...
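The Wasserstein metric invoked above has a particularly simple form for the atomic measures WASP works with: in one dimension, the W2 distance between two uniform atomic measures with the same number of atoms is attained by matching sorted atoms. A minimal illustrative sketch (not the paper's LP formulation; `w2_1d` is a hypothetical helper):

```python
# Illustrative sketch: squared 2-Wasserstein distance between two
# 1-D atomic measures with n equally weighted atoms each.
# In 1-D the optimal coupling pairs sorted atoms, so
# W2^2 = (1/n) * sum_i (x_(i) - y_(i))^2.

def w2_1d(xs, ys):
    """Squared W2 between uniform atomic measures on the real line."""
    assert len(xs) == len(ys), "equal atom counts assumed"
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / n

# A measure shifted by 2 has W2^2 = 4.
print(w2_1d([0.0, 1.0], [2.0, 3.0]))  # -> 4.0
```

This closed form only holds on the real line; in higher dimensions the coupling must be optimized, which is what motivates the LP formulation described in the surrounding contexts.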

99 | Stochastic variational inference
- Hoffman, Blei, et al.
- 2013
Citation Context: ...that is closest to the true posterior while restricting the search to a parametric family. While these methods have restrictive distributional assumptions, they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications ...

85 | An Architecture for Parallel Topic Models
- Smola, Narayanamurthy
Citation Context: ...they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications that use simple parametric models. The third group obtains subset posteriors using some sampling algorithm and combines them by using kernel densi...

36 | Optimal maps for the multidimensional Monge-Kantorovich problem
- Gangbo, Świȩch
- 1998
Citation Context: ...$\inf_{\nu \in P_2(\Theta)} \sum_{k=1}^K \lambda_k W_2^2(\nu_k, \nu)$; (7) see Figure 1. Agueh and Carlier [2] also showed that $\overline{\nu}_{K,\lambda}$ in (7) can be obtained as a solution to an LP problem posed as a multimarginal optimal transportation problem [11]. We only present their main result that relates $\overline{\nu}_{K,\lambda}$ and $\nu_{1:K}$. Recall that if $\sigma$ is a Borel map $\mathbb{R}^D \to \mathbb{R}^D$, then the push-forward of $\mu$ through $\sigma$ is the measure $\sigma\#\mu$; see Section 2.1. If $T_{1k}$ represents WA...
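The push-forward used in the context above is concrete for atomic measures: applying a map to each atom while keeping its weight. A toy sketch (the helper `pushforward` is hypothetical, not from the paper):

```python
# Sketch of the push-forward sigma#mu for an atomic measure
# mu = sum_i w_i * delta_{x_i}: applying a Borel map sigma to each
# atom gives sigma#mu = sum_i w_i * delta_{sigma(x_i)}.

def pushforward(atoms, weights, sigma):
    """Push an atomic measure (atoms, weights) through the map sigma."""
    return [sigma(x) for x in atoms], list(weights)

atoms, w = pushforward([0.0, 1.0, 2.0], [0.2, 0.3, 0.5], lambda x: 2 * x + 1)
print(atoms, w)  # -> [1.0, 3.0, 5.0] [0.2, 0.3, 0.5]
```

Note that only the atom locations move; the weights, and hence the total mass, are unchanged, which is exactly what makes push-forwards of subset posteriors easy to manipulate.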

30 | Bayesian posterior sampling via stochastic gradient fisher scoring
- Ahn, Korattikara, et al.
Citation Context: ...that is closest to the true posterior while restricting the search to a parametric family. While these methods have restrictive distributional assumptions, they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications ...

29 | Nonparametric Bayes Modeling of Multivariate Categorical Data
- Dunson, Xing
- 2009
Citation Context: ...r et al. [17]. Following their approach, we use a Dirichlet Process mixture of product multinomial distributions, probabilistic parafac (pparafac), to model multivariate dependence in these data; see [10] for details about the model. The details of the generative model and Gibbs sampler are found in Appendix D of Minsker et al. [17]. Our interest lies in comparing the final marginals obtained usin...

26 | Barycenters in the Wasserstein space
- Agueh, Carlier
Citation Context: ...if $x_1, \ldots, x_K \equiv x_{1:K} \in \mathbb{R}^D$, then their EB $\bar{x}_{K,\lambda} = \sum_{k=1}^K \lambda_k x_k$ for $\lambda \in \Delta^K$ is such that $\sum_{k=1}^K \lambda_k \| x_k - \bar{x}_{K,\lambda} \|_2^2 = \inf_{y \in \mathbb{R}^D} \sum_{k=1}^K \lambda_k \| x_k - y \|_2^2$; (6) see Figure 1. Generalizing (6) to $P_2(\Theta)$, Agueh and Carlier [2] showed that if $\nu_1, \ldots, \nu_K \equiv \nu_{1:K} \in P_2(\Theta)$, then their WB $\overline{\nu}_{K,\lambda}$ for $\lambda \in \Delta^K$ is such that $\sum_{k=1}^K \lambda_k W_2^2(\nu_k, \overline{\nu}_{K,\lambda}) = \inf_{\nu \in P_2(\Theta)} \sum_{k=1}^K \lambda_k W_2^2(\nu_k, \nu)$; (7) see Figure 1. Agueh and Carlier [2] also s...
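The Euclidean barycenter (EB) property in equation (6) is easy to check numerically: the weighted average really does minimize the weighted sum of squared distances. A small sketch under the assumptions of (6) (function names are illustrative, not from the paper):

```python
# Sketch of equation (6): the EB xbar = sum_k lam_k * x_k minimizes
# y |-> sum_k lam_k * ||x_k - y||^2, checked against random candidates.
import random

def barycenter(points, lam):
    """Weighted Euclidean barycenter of points in R^d for weights lam."""
    d = len(points[0])
    return [sum(l * p[i] for l, p in zip(lam, points)) for i in range(d)]

def objective(points, lam, y):
    """Weighted sum of squared Euclidean distances from y to the points."""
    return sum(l * sum((pi - yi) ** 2 for pi, yi in zip(p, y))
               for l, p in zip(lam, points))

points = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
lam = [0.5, 0.25, 0.25]          # weights on the simplex Delta_K
xbar = barycenter(points, lam)   # -> [1.0, 1.0]

# No random candidate y should beat the barycenter.
rng = random.Random(0)
best = objective(points, lam, xbar)
assert all(objective(points, lam, [rng.uniform(-5, 5) for _ in range(2)]) >= best
           for _ in range(1000))
print(xbar, best)
```

Equation (7) replaces the squared Euclidean distance with the squared W2 distance; the minimizer then no longer has a closed form, which is why the paper resorts to an LP.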

26 | Streaming variational Bayes
- Broderick, Boyd, et al.
- 2013
Citation Context: ...that is closest to the true posterior while restricting the search to a parametric family. While these methods have restrictive distributional assumptions, they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications ...

24 | Austerity in MCMC land: Cutting the Metropolis-Hastings budget, arXiv preprint arXiv:1304.5299
- Korattikara, Chen, et al.
- 2013

18 | Asymptotically exact, embarrassingly parallel MCMC
- Neiswanger, Wang, et al.
- 2014
Citation Context: ...h is ideal for large-scale applications that use simple parametric models. The third group obtains subset posteriors using some sampling algorithm and combines them by using kernel density estimation [18], Weierstrass transform [23], or minimizing a loss defined on the reproducing kernel Hilbert space (RKHS) embedding of the subset posteriors [16]. These methods are flexible in that they are not restr...

14 | A framework for evaluating approximation methods for Gaussian process regression
- Chalupka, Williams, et al.
Citation Context: ...e scale GP regression. Exact inference for GP regression involves matrix inversion of size equal to the data set. This becomes infeasible when the size of the data set reaches O(10^4). Chalupka et al. [7] compared several low rank matrix approximations to avoid matrix inversion in massive data GP computation. Such approximations can be avoided by using WASP for combining GP regression on data subsets ...

14 | Bayes and big data: The consensus Monte Carlo algorithm, Bayes 250
- Scott, Blocker, et al.
- 2013

11 | Convergence rates of posterior distributions. The Annals of Statistics 28
- Ghosal, Ghosh, van der Vaart
- 2000
Citation Context: ...n the context of WASP are in the metric space $(P_2(\Theta), W_2)$. We now recall some basic concepts from nonparametric Bayes theory. Most of these concepts and definitions are based on fundamental results of [12]. Let $\mathcal{C}$ be the Borel $\sigma$-field on $\Theta$ and $\Pi_n$ be a (prior) probability measure on $(\Theta, \mathcal{C})$. Suppose that we observe random variables $(X_1, \ldots, X_n) \equiv X^{(n)}$ that are independent and identically distributed a...

10 | Parallel MCMC via Weierstrass sampler. 2013. Available at arXiv:1312.4605
- Wang, Dunson
Citation Context: ...pplications that use simple parametric models. The third group obtains subset posteriors using some sampling algorithm and combines them by using kernel density estimation [18], Weierstrass transform [23], or minimizing a loss defined on the reproducing kernel Hilbert space (RKHS) embedding of the subset posteriors [16]. These methods are flexible in that they are not restricted to a parametric class ...

9 | Sinkhorn distances: Lightspeed computation of optimal transport
- Cuturi
- 2013
Citation Context: ...Exploiting the sparsity of the LP, we efficiently estimate the WB using standard software, such as Gurobi [13]. The WASP framework is inspired by recent developments in optimal transport problems [8; 9] and scalable Bayes methods [16]. Minsker et al. [16] proposed to use the geometric median of subset posteriors, calculated using an RKHS embedding that required choice of a kernel and corresponding ba...
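The optimal transport developments cited here [8; 9] center on entropy-regularized transport, which replaces the exact LP with cheap matrix-scaling iterations. A toy pure-Python sketch in the spirit of Sinkhorn iterations (Cuturi, 2013), not the paper's sparse-LP/Gurobi approach; all names here are illustrative:

```python
# Toy entropy-regularized optimal transport via Sinkhorn-style
# matrix scaling: alternately rescale rows and columns of
# K = exp(-C/eps) so the plan's marginals match a and b.
import math

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Approximate transport plan between histograms a, b for cost matrix C."""
    K = [[math.exp(-c / eps) for c in row] for row in C]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.5]
b = [0.5, 0.5]
C = [[0.0, 1.0], [1.0, 0.0]]  # diagonal matching is cheapest
P = sinkhorn(a, b, C)
print([sum(row) for row in P])  # row marginals, approximately equal to a
```

For small regularization `eps`, the plan `P` concentrates its mass on the cheap diagonal entries while its row and column sums reproduce `a` and `b`, which is the property the barycenter algorithms in [8; 9] build on.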

6 | Fast computation of Wasserstein barycenters
- Cuturi, Doucet
- 2014

4 | Convex Optimization for Big Data: Scalable, Randomized, and Parallel Algorithms for Big Data Analytics
- Cevher, Becker, et al.
Citation Context: ...ce and Statistics (AISTATS) 2015, San Diego, CA, USA. JMLR: W&CP volume 38. Copyright 2015 by the authors. subset-specific optimization problems. One widely used and well understood framework is ADMM [4; 6]. In Bayesian statistics, one is faced with the more challenging problem of approximating a posterior measure for the unknown parameters instead of just obtaining a single point estimate of these para...

4 | Scalable and robust Bayesian inference via the median posterior
- Minsker, Srivastava, et al.
- 2014
Citation Context: ...m and combines them by using kernel density estimation [18], Weierstrass transform [23], or minimizing a loss defined on the reproducing kernel Hilbert space (RKHS) embedding of the subset posteriors [16]. These methods are flexible in that they are not restricted to a parametric class of models; however, the results can vary significantly depending on the choice of kernels without a principled approa...

2 | Online variational inference for the hierarchical Dirichlet process
- Wang, Paisley, Blei
- 2011
Citation Context: ...they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications that use simple parametric models. The third group obtains subset posteriors using some sampling algorithm and combines them by using kernel densi...

1 | Distributed delayed stochastic optimization
- Agarwal, Duchi
- 2012
Citation Context: ...they can be made computationally efficient [24; 3; 5; 15; 14; 19]. The second group exploits the analytic form of posterior and uses computer architecture to improve the sampling time and convergence [20; 22; 1]. This approach is ideal for large-scale applications that use simple parametric models. The third group obtains subset posteriors using some sampling algorithm and combines them by using kernel densi...

1 | Gurobi Optimizer Reference Manual Version 6.0.0
- Gurobi Optimization, Inc.
- 2014
Citation Context: ...inear program (LP) to estimate the (atomic) WB of atomic approximations of subset posteriors. Exploiting the sparsity of the LP, we efficiently estimate the WB using standard software, such as Gurobi [13]. The WASP framework is inspired by recent developments in optimal transport problems [8; 9] and scalable Bayes methods [16]. Minsker et al. [16] proposed to use the geometric median of subset poste...

1 | Robust and scalable Bayes via a median of subset posterior measures. arXiv preprint arXiv:1403.2660
- Minsker, Srivastava, et al.
- 2014
Citation Context: ...s across columns. We now compare WASP's performance with that of M-Posterior using the General Social Survey (GSS) data set from 2008–2010 for about 4100 responders that were used by Minsker et al. [17]. Following their approach, we use a Dirichlet Process mixture of product multinomial distributions, probabilistic parafac (pparafac), to model multivariate dependence in these data; see [10] for deta...

1 | Bayesian learning via stochastic gradient Langevin dynamics
- Welling, Teh
- 2011