Results 1 - 10 of 219
Differential privacy via wavelet transforms
- In ICDE, 2010
"... Abstract — Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ɛ-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve ɛ-differential privacy, however, offer litt ..."
Abstract
-
Cited by 100 (8 self)
- Add to MetaCart
(Show Context)
Abstract — Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ɛ-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve ɛ-differential privacy, however, offer little data utility. In particular, if the output dataset is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures ɛ-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis of their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.
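The pipeline this abstract outlines, transform the data, perturb the transform coefficients, then invert, can be sketched compactly. Below is a minimal Python illustration using the Haar wavelet on a toy histogram; the flat per-level noise scale `levels / epsilon` is a simplifying assumption, since the paper calibrates noise per coefficient via a generalized sensitivity analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_transform(x):
    """One-dimensional Haar transform; input length must be a power of 2."""
    x = x.astype(float)
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / 2.0)  # detail coefficients
        x = (x[0::2] + x[1::2]) / 2.0             # averages for next level
    coeffs.append(x)                              # overall average
    return coeffs

def haar_inverse(coeffs):
    """Exact inverse of haar_transform."""
    x = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        out = np.empty(2 * len(detail))
        out[0::2] = x + detail
        out[1::2] = x - detail
        x = out
    return x

def noisy_wavelet_release(hist, epsilon):
    """Perturb wavelet coefficients instead of raw counts.
    ASSUMPTION: a flat noise scale of levels/epsilon per coefficient;
    the paper derives finer per-level weights from its sensitivity analysis."""
    coeffs = haar_transform(hist)
    scale = len(coeffs) / epsilon
    noisy = [c + rng.laplace(0.0, scale, size=len(c)) for c in coeffs]
    return haar_inverse(noisy)

hist = np.array([12, 7, 3, 9, 14, 2, 5, 8])  # toy frequency histogram
print(np.round(noisy_wavelet_release(hist, epsilon=1.0), 1))
# A range count such as hist[2:6].sum() touches only O(log n) coefficients,
# which is why the error grows polylogarithmically rather than linearly.
```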
Differentially Private Data Cubes: Optimizing Noise Sources and Consistency
"... Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table’s dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishi ..."
Abstract
-
Cited by 40 (3 self)
- Add to MetaCart
Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table’s dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals’ privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual, and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure, with running time polynomial in the number of cuboids, to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln |L| + 1)² of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 − 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the error of different publishing algorithms, and show that our approaches outperform baselines significantly.
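The reason the choice of initial cuboids matters can be seen in a small sketch: a cuboid derived by summing k noisy cells inherits k times the per-cell noise variance, while computing it directly costs only one unit of noise but consumes privacy budget. The toy comparison below is background intuition, not the paper's greedy selection procedure, and budget splitting between the two releases is elided.

```python
import numpy as np

rng = np.random.default_rng(1)
epsilon = 1.0

# Toy fact table aggregated into a 4 x 4 base cuboid of counts.
base = rng.integers(0, 20, size=(4, 4)).astype(float)

# Strategy A: noise the base cuboid once (sensitivity 1 per cell),
# then DERIVE the row marginal by summing 4 noisy cells.
# Laplace(1/eps) has variance 2/eps^2, so the derived marginal has
# variance 4 * 2/eps^2 = 8 per entry.
noisy_base = base + rng.laplace(0.0, 1.0 / epsilon, size=base.shape)
row_marginal_derived = noisy_base.sum(axis=1)

# Strategy B: compute the row marginal DIRECTLY from the fact table.
# One individual still affects only one count, so sensitivity stays 1
# and the variance is just 2/eps^2 = 2 per entry.
row_marginal_direct = base.sum(axis=1) + rng.laplace(0.0, 1.0 / epsilon, size=4)

print("true:   ", base.sum(axis=1))
print("derived:", np.round(row_marginal_derived, 1))
print("direct: ", np.round(row_marginal_direct, 1))
# Deciding which cuboids to compute directly under one shared budget,
# so that the WORST noise over all published cuboids stays small, is the
# NP-hard selection problem the paper approximates greedily.
```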
Differentially Private Data Release through Multidimensional Partitioning
"... Abstract. Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release based on an interactive differential privacy interface. We propose two multidimension ..."
Abstract
-
Cited by 40 (13 self)
- Add to MetaCart
Abstract. Differential privacy is a strong notion for protecting individual privacy in privacy-preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release based on an interactive differential privacy interface. We propose two multidimensional partitioning strategies: a baseline cell-based partitioning and an innovative kd-tree based partitioning. In addition to providing formal proofs of differential privacy and usefulness guarantees for linear distributive queries, we present a set of experimental results that demonstrate the feasibility and performance of our method.
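A minimal sketch of the kd-tree flavor of partitioning follows, assuming a simple median split on alternating axes and Laplace-noised leaf counts. A faithful instantiation would also keep the split points from leaking (e.g. via noisy medians) and allocate the privacy budget across levels, which the paper addresses and this sketch elides.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_kdtree(points, epsilon, depth=0, max_depth=3):
    """Split on the median of alternating axes; publish a Laplace-noised
    count per leaf. NOTE: a faithful scheme must also protect the split
    points and divide the privacy budget across levels; this sketch
    elides both."""
    if depth == max_depth or len(points) <= 1:
        return {"noisy_count": len(points) + rng.laplace(0.0, 1.0 / epsilon)}
    axis = depth % points.shape[1]
    split = np.median(points[:, axis])
    return {
        "axis": axis,
        "split": split,
        "left": dp_kdtree(points[points[:, axis] <= split], epsilon, depth + 1, max_depth),
        "right": dp_kdtree(points[points[:, axis] > split], epsilon, depth + 1, max_depth),
    }

data = rng.uniform(0, 100, size=(500, 2))  # toy 2-D dataset
tree = dp_kdtree(data, epsilon=0.5)
print(tree["axis"], round(tree["split"], 1))
```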
On the Tradeoff Between Privacy and Utility in Data Publishing
"... In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. In the meanwhile, they reduce the utility of the data. It is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, B ..."
Abstract
-
Cited by 33 (1 self)
- Add to MetaCart
(Show Context)
In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. At the same time, they reduce the utility of the data. It is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, Brickell and Shmatikov proposed an evaluation methodology by comparing the privacy gain with the utility gain resulting from anonymizing the data, and concluded that “even modest privacy gains require almost complete destruction of the data-mining utility”. This conclusion seems to undermine existing work on data anonymization. In this paper, we analyze the fundamental characteristics of privacy and utility, and show that it is inappropriate to directly compare privacy with utility. We then observe that the privacy-utility tradeoff in data publishing is similar to the risk-return tradeoff in financial investment, and propose an integrated framework for considering the privacy-utility tradeoff, borrowing concepts from Modern Portfolio Theory. Finally, we evaluate our methodology on the Adult dataset from the UCI machine learning repository. Our results clarify several common misconceptions about data utility and provide data publishers with useful guidelines on choosing the right tradeoff between privacy and utility.
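The portfolio analogy suggests a simple mechanical reading: treat each candidate anonymization as an asset with a utility score and a privacy-risk score, and discard dominated choices. The sketch below is only a loose illustration of that reading, with invented names and scores; it is not the paper's framework.

```python
# Each candidate anonymization plays the role of an "asset": higher
# utility is the return, higher re-identification risk is the risk.
# All scores below are invented for illustration.
candidates = {
    "k=5 generalization":  {"utility": 0.80, "risk": 0.30},
    "k=50 generalization": {"utility": 0.55, "risk": 0.10},
    "bucketization":       {"utility": 0.70, "risk": 0.35},
    "full suppression":    {"utility": 0.05, "risk": 0.01},
}

def pareto_frontier(cands):
    """Keep options that no other option beats on both utility and risk."""
    frontier = {}
    for name, s in cands.items():
        dominated = any(
            o["utility"] >= s["utility"] and o["risk"] <= s["risk"]
            and (o["utility"] > s["utility"] or o["risk"] < s["risk"])
            for other, o in cands.items() if other != name
        )
        if not dominated:
            frontier[name] = s
    return frontier

for name, s in pareto_frontier(candidates).items():
    print(f"{name}: utility={s['utility']}, risk={s['risk']}")
# "bucketization" drops out: "k=5 generalization" offers more utility at
# lower risk. The publisher then picks a point on the remaining frontier
# according to its risk tolerance, as in portfolio selection.
```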
Differentially Private Data Release for Data Mining
- In Proc. KDD, 2011
"... Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful infor-mation. Among the existing privacy models, -differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary’s background knowledge. Most ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ɛ-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary’s background knowledge. Most of the existing solutions that ensure ɛ-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ɛ-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.
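The two-step recipe in this abstract, generalize first, then perturb, can be sketched as follows. The taxonomy, names, and deterministic generalization are illustrative assumptions; the paper's algorithm chooses the generalization probabilistically (hence "probabilistically generalizes") before adding noise.

```python
import numpy as np

rng = np.random.default_rng(3)
epsilon = 1.0

# Illustrative one-level taxonomy; the paper selects the generalization
# level probabilistically (e.g. via an exponential-mechanism-style
# choice) rather than deterministically as done here.
taxonomy = {"engineer": "professional", "lawyer": "professional",
            "cashier": "service", "waiter": "service"}

raw_jobs = ["engineer", "lawyer", "waiter", "cashier", "engineer", "waiter"]

# Step 1: generalize every record up the taxonomy.
generalized = [taxonomy[job] for job in raw_jobs]

# Step 2: release Laplace-noised counts of the generalized partitions
# (adding or removing one record changes one count by 1: sensitivity 1).
noisy_counts = {
    category: generalized.count(category) + rng.laplace(0.0, 1.0 / epsilon)
    for category in set(taxonomy.values())
}
print(noisy_counts)
# A decision-tree learner is then trained on the noisy generalized
# counts instead of the raw records.
```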
Analyzing multicore dumps to facilitate concurrency bug reproduction
- In ASPLOS, 2010
"... Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments where the ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
(Show Context)
Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises from scheduler interleavings makes it hard to reproduce bugs that may manifest only under certain interleavings. The problem is exacerbated in multi-core environments where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core platforms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment, and uses the result of the analysis to enable reproduction of the bug in a single-core system, under the control of a deterministic scheduler. More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing that identifies a related point in the re-execution. By generating a core dump at this point, and comparing the differences between the two dumps, we are able to guide a search algorithm to efficiently generate a failure-inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.
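The final step the abstract describes, searching for a failure-inducing schedule under a deterministic scheduler, can be illustrated on a classic lost-update race. The two-thread example and exhaustive enumeration below are invented for illustration; the paper's contribution is precisely the dump-diffing and execution-indexing machinery that makes such a search efficient rather than exhaustive.

```python
from itertools import permutations

# Invented two-thread "lost update" on a shared counter: each thread
# loads the counter and stores load+1. A correct serial execution ends
# with counter == 2; some interleavings lose an update.
OPS = [("T1", "load"), ("T1", "store"), ("T2", "load"), ("T2", "store")]

def run(schedule):
    """Deterministically execute one schedule of (thread, op) steps."""
    counter, regs = 0, {}
    for thread, op in schedule:
        if op == "load":
            regs[thread] = counter
        else:  # store
            counter = regs[thread] + 1
    return counter

def interleavings(ops):
    """All schedules that preserve each thread's program order."""
    for perm in permutations(ops):
        per_thread = {t: [o for th, o in perm if th == t] for t in ("T1", "T2")}
        if all(per_thread[t] == ["load", "store"] for t in ("T1", "T2")):
            yield perm

for sched in interleavings(OPS):
    if run(sched) != 2:  # the failure we want to reproduce
        print("failure-inducing schedule:", sched)
        break
```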
Information-theoretic bounds for differentially private mechanisms
- In 24th IEEE Computer Security Foundations Symposium, CSF 2011. IEEE Computer Society, Los Alamitos
"... Abstract—There are two active and independent lines of research that aim at quantifying the amount of information that is disclosed by computing on confidential data. Each line of research has developed its own notion of confidentiality: on the one hand, differential privacy is the emerging consensu ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
Abstract—There are two active and independent lines of research that aim at quantifying the amount of information that is disclosed by computing on confidential data. Each line of research has developed its own notion of confidentiality: on the one hand, differential privacy is the emerging consensus guarantee used for privacy-preserving data analysis. On the other hand, information-theoretic notions of leakage are used for characterizing the confidentiality properties of programs in language-based settings. The purpose of this article is to establish formal connections between both notions of confidentiality, and to compare them in terms of the security guarantees they deliver. We obtain the following results. First, we establish upper bounds for the leakage of every ɛ-differentially private mechanism in terms of ɛ and the size of the mechanism’s input domain. We achieve this by identifying and leveraging a connection to coding theory. Second, we construct a class of ɛ-differentially private channels whose leakage grows with the size of their input domains. Using these channels, we show that there cannot be domain-size-independent bounds for the leakage of all ɛ-differentially private mechanisms. Moreover, we perform an empirical evaluation that shows that the leakage of these channels almost matches our theoretical upper bounds, demonstrating the accuracy of these bounds. Finally, we show that the question of providing optimal upper bounds for the leakage of ɛ-differentially private mechanisms in terms of rational functions of ɛ is in fact decidable.
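The kind of bound the abstract refers to can be made concrete in the single-input-bit case, where an ɛ-differentially private channel has min-entropy leakage at most log₂(2e^ɛ/(1 + e^ɛ)) under a uniform prior, and binary randomized response meets it exactly. The computation below checks this numerically; the paper's general bounds additionally depend on the size of the input domain.

```python
import math

eps = math.log(3)                           # so e^eps = 3
p = math.exp(eps) / (1 + math.exp(eps))     # prob. of reporting the true bit

# Binary randomized response, an eps-DP channel: channel[x][y] = P[y|x].
channel = [[p, 1 - p],
           [1 - p, p]]

# Min-entropy leakage under a uniform prior on the secret bit:
#   L = log2( sum over y of max over x of P[y|x] )
leakage = math.log2(sum(max(channel[x][y] for x in (0, 1)) for y in (0, 1)))

# Single-bit upper bound implied by eps-DP (tight for this channel).
bound = math.log2(2 * math.exp(eps) / (1 + math.exp(eps)))

print(f"leakage = {leakage:.4f} bits, bound = {bound:.4f} bits")
# Both print 0.5850 bits: randomized response saturates the bound,
# echoing the abstract's finding that the bounds are essentially tight.
```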
Towards Privacy for Social Networks: A Zero-Knowledge Based Definition of Privacy
"... Abstract. We put forward a zero-knowledge based definition of privacy. Our notion is strictly stronger than the notion of differential privacy and is particularly attractive when modeling privacy in social networks. We furthermore demonstrate that it can be meaningfully achieved for tasks such as co ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
(Show Context)
Abstract. We put forward a zero-knowledge based definition of privacy. Our notion is strictly stronger than the notion of differential privacy and is particularly attractive when modeling privacy in social networks. We furthermore demonstrate that it can be meaningfully achieved for tasks such as computing averages, fractions, histograms, and a variety of graph parameters and properties, such as average degree and distance to connectivity. Our results are obtained by establishing a connection between zero-knowledge privacy and sample complexity, and by leveraging recent sublinear time algorithms.
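As rough intuition for how averages can be computed under a stronger-than-DP notion, one common pattern is to aggregate over a small random sample and add noise, so that an observer learns little beyond what a simulator with comparable sample access could infer. The sketch below captures only that flavor; the sample size, noise scale, and function name are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(4)

def sampled_noisy_average(values, sample_size, epsilon):
    """Average a random subsample, then add Laplace noise.
    ASSUMPTION: values lie in [0, 1], so changing one sampled record
    moves the sample mean by at most 1/sample_size."""
    sample = rng.choice(values, size=sample_size, replace=False)
    return sample.mean() + rng.laplace(0.0, 1.0 / (sample_size * epsilon))

data = rng.uniform(0, 1, size=10_000)  # toy population of bounded values
print(sampled_noisy_average(data, sample_size=100, epsilon=0.5))
```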
Trajectory privacy in location-based services and data publication
- SIGKDD Explorations
"... The ubiquity of mobile devices with global positioning functionality (e.g., GPS and AGPS) and Internet connectivity (e.g., 3G andWi-Fi)hasresultedinwidespread development of location-based services (LBS). Typical examples of LBS include local business search, e-marketing, social networking, and auto ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
(Show Context)
The ubiquity of mobile devices with global positioning functionality (e.g., GPS and AGPS) and Internet connectivity (e.g., 3G and Wi-Fi) has resulted in widespread development of location-based services (LBS). Typical examples of LBS include local business search, e-marketing, social networking, and automotive traffic monitoring. Although LBS provide valuable services for mobile users, revealing their private locations to potentially untrusted LBS service providers poses privacy concerns. In general, there are two types of LBS, namely, snapshot and continuous LBS. For snapshot LBS, a mobile user only needs to report its current location to a service provider once to get its desired information. On the other hand, a mobile user has to report its location to a service provider in a periodic or on-demand manner to obtain its desired continuous LBS. Protecting user location privacy for continuous LBS is more challenging than for snapshot LBS because adversaries may use the spatial and temporal correlations in the user’s location samples to infer the user’s location information with higher certainty. Such user location trajectories are also very important for many applications, e.g., business analysis, city planning, and intelligent transportation. However, publishing such location trajectories to the public or a third party for data analysis could pose serious privacy concerns. Privacy protection in continuous LBS and trajectory data publication has increasingly drawn attention from the research community and industry. In this survey, we give an overview of the state-of-the-art privacy-preserving techniques for these two problems.
Distributed Anonymization: Achieving Privacy for Both Data Subjects and Data Providers
"... Abstract. There is an increasing need for sharing data repositories containing personal information across multiple distributed and private databases. However, such data sharing is subject to constraints imposed by privacy of individuals or data subjects as well as data confidentiality of institutio ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
Abstract. There is an increasing need for sharing data repositories containing personal information across multiple distributed and private databases. However, such data sharing is subject to constraints imposed by the privacy of individuals or data subjects as well as the data confidentiality of institutions or data providers. Concretely, given a query spanning multiple databases, query results should not contain individually identifiable information. In addition, institutions should not reveal their databases to each other apart from the query results. In this paper, we develop a set of decentralized protocols that enable data sharing for horizontally partitioned databases given these constraints. Our approach includes a new notion, l-site-diversity, for data anonymization to ensure anonymity of data providers in addition to that of data subjects, and a distributed anonymization protocol that allows independent data providers to build a virtual anonymized database while maintaining both privacy constraints.
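The l-site-diversity notion, as described here, asks that anonymized results not single out the data provider of any record. A minimal sketch of one plausible reading, that each equivalence class must draw its records from at least l distinct sites, follows; the data layout and the helper name are assumptions for illustration.

```python
from collections import defaultdict

def l_site_diverse(records, l):
    """records: iterable of (equivalence_class_id, site_id) pairs.
    True iff every anonymized equivalence class contains records
    contributed by at least l distinct sites."""
    sites_per_class = defaultdict(set)
    for eq_class, site in records:
        sites_per_class[eq_class].add(site)
    return all(len(sites) >= l for sites in sites_per_class.values())

anonymized = [("g1", "hospitalA"), ("g1", "hospitalB"),
              ("g2", "hospitalA"), ("g2", "hospitalA")]
print(l_site_diverse(anonymized, l=2))  # False: class g2 has only one site
```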