## Differentially private synthesization of multi-dimensional data using copula functions (2014)

Venue: | In EDBT |

Citations: | 4 - 4 self |

### Citations

1249 |
An Introduction to Copulas, in
- Nelsen
- 1999
Citation Context ...al dimension, and 2) the dependence among the dimensions. Copula functions have been shown to be effective for modeling high-dimensional joint distributions based on continuous marginal distributions [31, 34, 4, 24]. They are particularly attractive due to several reasons. First, when we have more margins’ (Marginal distribution is shortened as margin in the paper) information than the joint distribution of all ... |

745 |
Multivariate models and dependence concepts
- Joe
- 1997
Citation Context ...the Sklar’s theorem [36] stating that copulas are functions connecting multivariate distributions to their one-dimension marginal distributions. An axiomatic definition of copulas can be found in Joe [20] and Nelsen [31]. Copula functions have been widely applied in statistics and finance in recent years (e.g. [34]). 3. PRELIMINARIES Consider an original dataset D that contains a data vector (X1, X2, ... |

618 | Calibrating noise to sensitivity in private data analysis
- Dwork, McSherry, et al.
- 2006
Citation Context ...Proceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0. A common mechanism to achieve differential privacy is the Laplace mechanism [16], which injects calibrated noise to a statistical measure determined by the privacy budget ε and the sensitivity of the statistical measure influenced by the inclusion and exclusion of a record in the... |

617 | Differential privacy
- Dwork
- 2006
Citation Context ...le dimensional histograms. The technique of [33] is proposed especially for two dimensional data. We discuss and compare the methods for multi-dimensional histograms below. The method by Dwork et al. [13] publishes a DP histogram by adding independent Laplace random noise to the count of each histogram bin. While the method works well for low-dimensional data, it becomes problematic for high dimension... |

342 |
On default correlation: a copula function approach
- Li
- 2000
Citation Context ...al dimension, and 2) the dependence among the dimensions. Copula functions have been shown to be effective for modeling high-dimensional joint distributions based on continuous marginal distributions [31, 34, 4, 24]. They are particularly attractive due to several reasons. First, when we have more margins’ (Marginal distribution is shortened as margin in the paper) information than the joint distribution of all ... |

244 | Differential privacy: A survey of results
- Dwork
- 2008
Citation Context ...t DPCopula generates highly accurate synthetic multidimensional data with significantly better utility than state-of-the-art techniques. 1. INTRODUCTION Privacy preserving data analysis and publishing [14, 15, 3] has received considerable attention in recent years as a promising approach for sharing information while preserving data privacy. Differential privacy [14, 15, 22] has recently emerged as one of the... |

217 | A learning theory approach to non-interactive database privacy
- Blum, Ligett, et al.
Citation Context ... [Figure 1: Dataset vs. histogram illustration; panel (b): Marginal Histogram for Age] A growing number of works started addressing non-interactive data release with differential privacy (e.g. [5, 27, 39, 19, 12, 41, 9, 10]). Given an original dataset, the goal is to publish a DP statistical summary such as marginal or multi-dimensional histograms that can be used to answer predicate queries or to generate DP synthetic ... |

203 | Privacy-preserving data publishing: A survey of recent developments
- Fung, Wang, et al.
Citation Context ...t DPCopula generates highly accurate synthetic multidimensional data with significantly better utility than state-of-the-art techniques. 1. INTRODUCTION Privacy preserving data analysis and publishing [14, 15, 3] has received considerable attention in recent years as a promising approach for sharing information while preserving data privacy. Differential privacy [14, 15, 22] has recently emerged as one of the... |

203 | Mechanism design via differential privacy
- McSherry, Talwar
Citation Context ...measure influenced by the inclusion and exclusion of a record in the dataset. A lower privacy parameter requires larger noise to be added and provides a higher level of privacy. Many mechanisms (e.g. [14, 18, 29]) have been proposed for achieving differential privacy for a single computation or a given analytical task and programming platforms have been implemented for supporting interactive differentially pr... |

132 | A firm foundation for private data analysis
- Dwork
Citation Context ...t DPCopula generates highly accurate synthetic multidimensional data with significantly better utility than state-of-the-art techniques. 1. INTRODUCTION Privacy preserving data analysis and publishing [14, 15, 3] has received considerable attention in recent years as a promising approach for sharing information while preserving data privacy. Differential privacy [14, 15, 22] has recently emerged as one of the... |

111 | The t copula and related copulas
- Demarta, McNeil
Citation Context ...nk correlation is a well-accepted rank correlation measure of concordance for bivariate random vectors. The definition of Kendall’s τ is given as follows: Definition 3.5 (Kendall’s τ rank correlation [11]). The population version of Kendall’s τ rank correlation has the form: $\rho_\tau(X_j, X_k) = E\big(\operatorname{sign}\big((x_{i_1,j} - x_{i_2,j})(x_{i_1,k} - x_{i_2,k})\big)\big)$ where $(x_{i_1,j}, x_{i_1,k})$ and $(x_{i_2,j}, x_{i_2,k})$ are two different independent pair... |

96 | Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS - Barak, Chaudhuri, et al. - 2007 |

94 |
An Introduction to Credit Risk Modeling
- Bluhm, Overbeck, et al.
- 2003
Citation Context ...al dimension, and 2) the dependence among the dimensions. Copula functions have been shown to be effective for modeling high-dimensional joint distributions based on continuous marginal distributions [31, 34, 4, 24]. They are particularly attractive due to several reasons. First, when we have more margins’ (Marginal distribution is shortened as margin in the paper) information than the joint distribution of all ... |

93 | Differential privacy via wavelet transforms
- Xiao, Wang, et al.
- 2010
Citation Context ... [Figure 1: Dataset vs. histogram illustration; panel (b): Marginal Histogram for Age] A growing number of works started addressing non-interactive data release with differential privacy (e.g. [5, 27, 39, 19, 12, 41, 9, 10]). Given an original dataset, the goal is to publish a DP statistical summary such as marginal or multi-dimensional histograms that can be used to answer predicate queries or to generate DP synthetic ... |

88 | Optimizing linear counting queries under differential privacy
- Li, Hay, et al.
- 2010
Citation Context ...rted addressing effective query answering in the interactive setting with differential privacy given a query workload or batch queries by considering the correlations between queries or query history [38, 8, 43, 23, 42]. [Figure 1(a): Dataset, a table with columns id, Age, Hours/week, Edu, … and rows (1, 50, 13, 13), (2, 38, 40, 9), (3, 53, 40, 7), (4, 28, 40, 13), …; followed by a histogram with axes Age and Count] ... |

86 | Privacy: Theory meets practice on the map
- Machanavajjhala, Abowd, et al.
- 2008
Citation Context ... [Figure 1: Dataset vs. histogram illustration; panel (b): Marginal Histogram for Age] A growing number of works started addressing non-interactive data release with differential privacy (e.g. [5, 27, 39, 19, 12, 41, 9, 10]). Given an original dataset, the goal is to publish a DP statistical summary such as marginal or multi-dimensional histograms that can be used to answer predicate queries or to generate DP synthetic ... |

73 | No free lunch in data privacy
- Kifer, Machanavajjhala
- 2011
Citation Context ...erving data analysis and publishing [14, 15, 3] has received considerable attention in recent years as a promising approach for sharing information while preserving data privacy. Differential privacy [14, 15, 22] has recently emerged as one of the strongest privacy guarantees for statistical data release. A statistical aggregation or computation is DP if the outcome is formally indistinguishable when run wit... |

61 |
Copulas for finance: a reading guide and some applications. Unpublished
- Durrleman, Nikeghbali, et al.
- 2000
Citation Context ...pulas and testing the goodness-of-fit for the best copula as our future work. Formally, we give the Gaussian copula and Gaussian dependence definitions as follows: Definition 3.4 (The Gaussian Copula [6]). Deducing via Sklar’s theorem, a multivariate Gaussian density can be written as the product of two components: the Gaussian dependence and margins, denoted as $\Phi_{P}(x) = \frac{1}{|P|^{1/2}} \exp\{-\frac{1}{2} \phi^{-1}$ ... |

48 | High dimensional semiparametric Gaussian copula graphical models. Arxiv preprint arXiv:1202.2169
- Liu, Han, et al.
- 2012
Citation Context ...ss is the most commonly used, including Gaussian copula and t copula. In this paper, we focus on the semi-parametric Gaussian copula as it has better convergence properties for multi-dimensional data [26] and most real-world high-dimensional data follow the Gaussian dependence structure [31] that can be modeled by the Gaussian copula. We note that Gaussian copula is not to be confused with Gaussian dis... |

47 | Nonparametric Pricing of Multivariate Contingent Claims
- Rosenberg
- 2003 |

42 | Differential privacy for statistics: What we know and what we want to learn
- Dwork, Smith
- 2003
Citation Context ...all [Figure 4: DPCopula Overview] 4.1 DPCopula-MLE. One basic method of DPCopula is to first compute DP marginal histograms, then estimate DP correlation matrix using the DP MLE method proposed by Dwork [17], then sample DP synthetic data. We illustrate this algorithm schematically in Figure 4(a). Algorithm 1 presents the steps of DPCopula-MLE. We present the details of each step below. Computing DP marg... |

41 | Data mining with differential privacy
- Friedman, Schuster
- 2010
Citation Context ...measure influenced by the inclusion and exclusion of a record in the dataset. A lower privacy parameter requires larger noise to be added and provides a higher level of privacy. Many mechanisms (e.g. [14, 18, 29]) have been proposed for achieving differential privacy for a single computation or a given analytical task and programming platforms have been implemented for supporting interactive differentially pr... |

39 | Differentially private spatial decompositions
- Cormode, Procopiuc, et al.
- 2012 |

37 | Differentially private data cubes: optimizing noise sources and consistency
- Ding, Winslett, et al.
- 2011 |

37 | A statistical framework for differential privacy
- Wasserman, Zhou
Citation Context ...e empirical CDF based on the private histogram, $\hat{F}_n(t)$ is the empirical CDF based on the original histogram, and $F(t)$ is the population CDF when $n$ tends to infinity. Proof. Due to the analysis in [37], we can deduce that the discrimination of $\hat{F}_n(t)$ and $\tilde{F}_n(t)$ is bounded by $O(\frac{\log m}{n})$. Hence, we can achieve that $\lim_{n\to\infty} |\tilde{F}_n(t) - \hat{F}_n(t)| = 0$, leading to $\lim_{n\to\infty} \tilde{F}_n(t) = \lim_{n\to\infty} \hat{F}_n(t)$ and the conclusion... |

31 | Differentially private data release for data mining
- Mohammed, Chen, et al.
- 2011
Citation Context ... data that does not follow multinomial distribution. Differentially private histogram generation. Various approaches have been proposed recently for publishing differentially private histograms (e.g. [2, 40, 19, 41, 9, 10, 30, 1, 33]). Among them, the methods of [19] and [41] are designed for single dimensional histograms. The technique of [33] is proposed especially for two dimensional data. We discuss and compare the methods for... |

29 |
Transformation of non positive semidefinite correlation matrices
- Rousseeuw, Molenberghs
- 1993
Citation Context ...r experience when $\epsilon_2$ is not too small, $\epsilon_2 \geq 0.001$). In this case, $\tilde{P}$ can be transformed to be positive definite using postprocessing methods like the eigenvalue procedure proposed by Rousseeuw et al. [35]. Algorithm 5 presents detailed steps of DP correlation coefficient matrix computation. Algorithm 5 Computing differentially private correlation coefficient matrix Input: Original data vector (X1, . .... |

25 |
Information preserving statistical obfuscation
- Burridge
Citation Context ...nal data. Synthetic data can be used in preserving privacy and confidentiality of the original data. Numerous techniques have been proposed for generating privacy-preserving synthetic data (e.g. [21], [7]). But they do not provide formal privacy guarantees. Machanavajjhala et al. [27] presented a probabilistic DP Multinomial-Dirichlet (MD) synthesizer mechanism. They model the original map data using ... |

22 | Differentially private histogram publication
- Xu, Zhang, et al.
- 2012 |

12 | Differentially private summaries of sparse data
- Cormode, Procopiuc, et al.
- 2012
Citation Context ...et [Table: Attribute (domain size): Age (95), Gender (2), Disability (2), Nativity (2), Number of Years (31), Education (140), Working hours per week (95), Annual income (586)] methods [9], Filter Priority (FP) with consistency checks [10], and P-HP [1]. Among these methods, we observed that PSD and P-HP consistently outperform others in most settings. Hence, after presenting a complete comparison on US dataset, we only show PSD and P-... |

12 | Output perturbation with query relaxation
- Xiao, Tao
- 2008
Citation Context ...rted addressing effective query answering in the interactive setting with differential privacy given a query workload or batch queries by considering the correlations between queries or query history [38, 8, 43, 23, 42]. [Figure 1(a): Dataset, a table with columns id, Age, Hours/week, Edu, … and rows (1, 50, 13, 13), (2, 38, 40, 9), (3, 53, 40, 7), (4, 28, 40, 13), …; followed by a histogram with axes Age and Count] ... |

12 | Low-rank mechanism: Optimizing batch queries under differential privacy
- Yuan, Zhang, et al.
Citation Context ...rted addressing effective query answering in the interactive setting with differential privacy given a query workload or batch queries by considering the correlations between queries or query history [38, 8, 43, 23, 42]. [Figure 1(a): Dataset, a table with columns id, Age, Hours/week, Edu, … and rows (1, 50, 13, 13), (2, 38, 40, 9), (3, 53, 40, 7), (4, 28, 40, 13), …; followed by a histogram with axes Age and Count] ... |

10 | Differentially private grids for geospatial data
- Qardaji, Yang, et al.
Citation Context ... data that does not follow multinomial distribution. Differentially private histogram generation. Various approaches have been proposed recently for publishing differentially private histograms (e.g. [2, 40, 19, 41, 9, 10, 30, 1, 33]). Among them, the methods of [19] and [41] are designed for single dimensional histograms. The technique of [33] is proposed especially for two dimensional data. We discuss and compare the methods for... |

9 | Differentially private histogram publishing through lossy compression
- Acs, Castelluccia, et al.
- 2012
Citation Context ... data that does not follow multinomial distribution. Differentially private histogram generation. Various approaches have been proposed recently for publishing differentially private histograms (e.g. [2, 40, 19, 41, 9, 10, 30, 1, 33]). Among them, the methods of [19] and [41] are designed for single dimensional histograms. The technique of [33] is proposed especially for two dimensional data. We discuss and compare the methods for... |

8 |
Privacy integrated queries: an extensible platform for privacy-preserving data analysis
- McSherry
Citation Context ...ing differential privacy for a single computation or a given analytical task and programming platforms have been implemented for supporting interactive differentially private queries or data analysis [28]. Due to the composability of differential privacy [28], given an overall privacy budget constraint, it has to be allocated to subroutines in the computation or each query in a query sequence to ensur... |

7 | Accurate and efficient private release of datacubes and contingency tables
- Yaroslavtsev, Cormode, et al.
- 2013 |

4 |
Boosting the accuracy of differentially-private histograms through consistency. VLDB
- Hay, Rastogi, et al.
- 2010 |

4 | DPCube: Releasing differentially private data cubes for health information (demo paper) - Xiao, Gardner, et al. - 2012 |

3 | Integrating historical noisy answers for improving data utility under differential privacy
- Chen, Shuigeng, et al.
- 2012 |

1 |
disclosure limitation in longitudinal linked data
- Reiter
- 2001
Citation Context ...original data. Synthetic data can be used in preserving privacy and confidentiality of the original data. Numerous techniques have been proposed for generating privacy-preserving synthetic data (e.g. [21], [7]). But they do not provide formal privacy guarantees. Machanavajjhala et al. [27] presented a probabilistic DP Multinomial-Dirichlet (MD) synthesizer mechanism. They model the original map data us... |

1 |
A limit theorem for copulas. Download from www-m4.ma.tum.de/m4/pers/lindner
- Lindner, Szimayer
- 2003
Citation Context ...copulas. Each {F1i}, . . . , {Fmi} and Ci correspond to Di, i ∈ {1, . . . , t} and are parameterized by the number of records of Di. We have the following theorem: Theorem 3.3 (Convergence of Copulas [25]). For every t in N+, a m-dimensional joint distribution function Ht is defined as Ht(x1, . . . , xm) := Ct(F1t(x1), . . . , Fmt(xm)). Then the sequence {Ht} converges to H0 in distribution, if and on... |

1 |
Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de statistique de l’Université de
- Sklar
- 1959
Citation Context ...P method [1] as representatives of the general-purpose histogram methods. Copula functions. The idea of copula was shown dating back to 1940’s, and the term copula was provided by the Sklar’s theorem [36] stating that copulas are functions connecting multivariate distributions to their one-dimension marginal distributions. An axiomatic definition of copulas can be found in Joe [20] and Nelsen [31]. Co... |