Results 1 - 10
of
15
Attacks on privacy and de finetti’s theorem
- In SIGMOD
, 2009
"... In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti’s theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization s ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti’s theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated. The difference between the attack presented here and others that have been proposed in the past is that we do not need extensive background knowledge. An attacker only needs to know the nonsensitive attributes of one individual in the data, and can carry out this attack just by building a machine learning model over the sanitized data. The reason this attack is successful is that it exploits a subtle flaw in the way prior work computed the probability of disclosure of a sensitive attribute. We demonstrate this theoretically, empirically, and with intuitive examples. We also discuss how this generalizes to many other privacy schemes.
Anonymizing Transaction Databases for Publication
"... This paper considers the problem of publishing “transaction data” for research purposes. Each transaction is an arbitrary set of items chosen from a large universe. Detailed transaction data provides an electronic image of one's life. This has two implications. One, transaction data are excellent ca ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
This paper considers the problem of publishing “transaction data” for research purposes. Each transaction is an arbitrary set of items chosen from a large universe. Detailed transaction data provides an electronic image of one's life. This has two implications. One, transaction data are excellent candidates for data mining research. Two, use of transaction data would raise serious concerns over individual privacy. Therefore, before transaction data is released for data mining, it must be made anonymous so that data subjects cannot be re-identified. The challenge is that transaction data has no structure and can be extremely high dimensional. Traditional anonymization methods lose too much information on such data. To date, there has been no satisfactory privacy notion and solution proposed for anonymizing transaction data. This paper proposes one way to address this issue.
On the Tradeoff Between Privacy and Utility in Data Publishing
"... In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. In the meanwhile, they reduce the utility of the data. It is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, B ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. In the meanwhile, they reduce the utility of the data. It is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, Brickell and Shmatikov proposed an evaluation methodology by comparing privacy gain with utility gain resulted from anonymizing the data, and concluded that “even modest privacy gains require almost complete destruction of the data-mining utility”. This conclusion seems to undermine existing work on data anonymization. In this paper, we analyze the fundamental characteristics of privacy and utility, and show that it is inappropriate to directly compare privacy with utility. We then observe that the privacy-utility tradeoff in data publishing is similar to the risk-return tradeoff in financial investment, and propose an integrated framework for considering privacy-utility tradeoff, borrowing concepts from the Modern Portfolio Theory for financial investment. Finally, we evaluate our methodology on the Adult dataset from the UCI machine learning repository. Our results clarify several common misconceptions about data utility and provide data publishers useful guidelines on choosing the right tradeoff between privacy and utility.
Modeling and Integrating Background Knowledge in Data Anonymization
"... Abstract — Recent work has shown the importance of considering the adversary’s background knowledge when reasoning about privacy in data publishing. However, it is very difficult for the data publisher to know exactly the adversary’s background knowledge. Existing work cannot satisfactorily model ba ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Abstract — Recent work has shown the importance of considering the adversary’s background knowledge when reasoning about privacy in data publishing. However, it is very difficult for the data publisher to know exactly the adversary’s background knowledge. Existing work cannot satisfactorily model background knowledge and reason about privacy in the presence of such knowledge. This paper presents a general framework for modeling the adversary’s background knowledge using kernel estimation methods. This framework subsumes different types of knowledge (e.g., negative association rules) that can be mined from the data. Under this framework, we reason about privacy using Bayesian inference techniques and propose the skyline (B, t)privacy model, which allows the data publisher to enforce privacy requirements to protect the data against adversaries with different levels of background knowledge. Through an extensive set of experiments, we show the effects of probabilistic background knowledge in data anonymization and the effectiveness of our approach in both privacy protection and utility preservation. I.
Publishing Sensitive Transactions for Itemset Utility
"... We consider the problem of publishing sensitive transaction data with privacy preservation. High dimensionality of transaction data poses unique challenges on data privacy and data utility. On one hand, re-identification attacks tend to use a subset of items that infrequently occur in transactions, ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We consider the problem of publishing sensitive transaction data with privacy preservation. High dimensionality of transaction data poses unique challenges on data privacy and data utility. On one hand, re-identification attacks tend to use a subset of items that infrequently occur in transactions, called moles. On the other hand, data mining applications typically depend on subsets of items that frequently occur in transactions, called nuggets. Thus the problem is how to eliminate all moles while retaining nuggets as much as possible. A challenge is that moles and nuggets are multi-dimensional with exponential growth and are tangled together by shared items. We present a novel and scalable solution to this problem. The novelty lies in a compact border data structure that eliminates the need of generating all moles and nuggets. 1.
Probabilistic Inference Protection on Anonymized Data
"... Abstract—Background knowledge is an important factor in privacy preserving data publishing. Probabilistic distributionbased background knowledge is a powerful kind of background knowledge which is easily accessible to adversaries. However, to the best of our knowledge, there is no existing work that ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Abstract—Background knowledge is an important factor in privacy preserving data publishing. Probabilistic distributionbased background knowledge is a powerful kind of background knowledge which is easily accessible to adversaries. However, to the best of our knowledge, there is no existing work that can provide a privacy guarantee under adversary attack with such background knowledge. The difficulty of the problem lies in the high complexity of the probability computation and the non-monotone nature of the privacy condition. The only solution known to us relies on approximate algorithms with no known error bound. In this paper, we propose an algorithm that overcomes the difficulties of the problem by introducing a bounding condition on probability deviations in the anonymized data groups, which is much easier to compute and which is a monotone function on the grouping sizes. This bounding condition is also in harmony with the utility preservation objective. Our empirical studies show that our method preserves data utilities at a higher or comparable level when compared with some state-of-the-art algorithms that provide less protection. I.
Anonymization-based attacks in privacy-preserving data publishing
, 2009
"... Data publishing generates much concern over the protection of individual privacy. Recent studies consider cases where the adversary may possess different kinds of knowledge about the data. In this article, we show that knowledge of the mechanism or algorithm of anonymization for data publication can ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Data publishing generates much concern over the protection of individual privacy. Recent studies consider cases where the adversary may possess different kinds of knowledge about the data. In this article, we show that knowledge of the mechanism or algorithm of anonymization for data publication can also lead to extra information that assists the adversary and jeopardizes individual privacy. In particular, all known mechanisms try to minimize information loss and such an attempt provides a loophole for attacks. We call such an attack a minimality attack. In this article, we introduce a model called m-confidentiality which deals with minimality attacks, and propose a feasible solution. Our experiments show that minimality attacks are practical concerns on real datasets and that our algorithm can prevent such attacks with very little overhead and information loss.
Can the utility of anonymized data be used for privacy breaches
- TKDD
"... Group based anonymization is the most widely studied approach for privacy-preserving data publishing. Privacy models/definitions using group based anonymization includes k-anonymity, l-diversity, and t-closeness, to name a few. The goal of this article is to raise a fundamental issue regarding the p ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Group based anonymization is the most widely studied approach for privacy-preserving data publishing. Privacy models/definitions using group based anonymization includes k-anonymity, l-diversity, and t-closeness, to name a few. The goal of this article is to raise a fundamental issue regarding the privacy exposure of the approaches using group based anonymization. This has been overlooked in the past. The group based anonymization approach by bucketization basically hides each individual record behind a group to preserve data privacy. If not properly anonymized, patterns can actually be derived from the published data and be used by an adversary to breach individual privacy. For example, from the medical records released, if patterns such as that people from certain countries rarely suffer from some disease can be derived, then the information can be used to imply linkage of other people in an anonymized group with this disease with higher likelihood. We call the derived patterns from the published data the foreground knowledge. This is in contrast to the background knowledge that the adversary may obtain from other channels, as studied in some previous work. Finally, our experimental results show such an attack is realistic in the privacy benchmark dataset under the traditional group based anonymization approach.
Slicing: A New Approach to Privacy Preserving Data Publishing. Arxiv preprint arXiv:0909.2290
- In Proceedings of Security and Privacy
, 2009
"... Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasiidentifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure. 1.
Minimizing Minimality and Maximizing Utility: Analyzing Method-based attacks on Anonymized Data
"... The principle of anonymization for data sharing has become a very popular paradigm for the preservation of privacy of the data subjects. Since the introduction of k-anonymity, dozens of methods and enhanced privacy definitions have been proposed. However, over-eager attempts to minimize the informat ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
The principle of anonymization for data sharing has become a very popular paradigm for the preservation of privacy of the data subjects. Since the introduction of k-anonymity, dozens of methods and enhanced privacy definitions have been proposed. However, over-eager attempts to minimize the information lost by the anonymization potentially allow private information to be inferred. Proof-ofconcept of this “minimality attack ” has been demonstrated for a variety of algorithms and definitions [16]. In this paper, we provide a comprehensive analysis and study of this attack, and demonstrate that with care its effect can be almost entirely countered. The attack allows an adversary to increase his (probabilistic) belief in certain facts about individuals over the data. We show that (a) a large class of algorithms are not affected by this attack, (b) for a class of algorithms that have a “symmetric ” property, the attacker’s belief increases by at most a small constant, and (c) even for an algorithm chosen to be highly susceptible to the attack, the attacker’s belief when using the attack increases by at most a small constant factor. We also provide a series of experiments that show in all these cases that the confidence about the sensitive value of any individual remains low in practice, while the published data is still useful for its intended purpose. From this, we conclude that the impact of such method-based attacks can be minimized. 1.

