Results 1 - 10
of
315
ℓ-diversity: Privacy beyond k-anonymity
- IN ICDE
, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract
-
Cited by 672 (13 self)
- Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying ” attributes. In this paper we show using two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
Protecting respondents’ identities in microdata release
- In IEEE Transactions on Knowledge and Data Engineering (TKDE
, 2001
"... Today’s globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call today for the release of specific data (microdata). In order to protect the anonymity of the ..."
Abstract
-
Cited by 512 (32 self)
- Add to MetaCart
Today’s globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call today for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. De-identifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to re-identify respondents and inferring information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization that captures the property of the release process not to distort the data more than needed to achieve k-anonymity, and present an algorithm for the computation of such a generalization. We also discuss possible preference policies to choose among different minimal generalizations. Index terms:
Incognito: efficient full-domain k-anonymity
- In Proc. of SIGMOD
"... A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes that clearly identify individuals, such as Name and Social Security Number, are generally removed, these databases can sometimes be joined with other public databases on attri ..."
Abstract
-
Cited by 304 (5 self)
- Add to MetaCart
(Show Context)
A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes that clearly identify individuals, such as Name and Social Security Number, are generally removed, these databases can sometimes be joined with other public databases on attributes such as Zipcode, Sex, and Birthdate to re-identify individuals who were supposed to remain anony-mous. “Joining ” attacks are made easier by the availability of other, complementary, databases over the Internet. K-anonymization is a technique that prevents joining at-tacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. In this paper, we pro-vide a practical framework for implementing one model of k-anonymization, called full-domain generalization. We intro-duce a set of algorithms for producing minimal full-domain generalizations, and show that these algorithms perform up to an order of magnitude faster than previous algorithms on two real-life databases. Besides full-domain generalization, numerous other mod-els have also been proposed for k-anonymization. The sec-ond contribution in this paper is a single taxonomy that categorizes previous models and introduces some promising new alternatives. 1.
Differential privacy: A survey of results
- In Theory and Applications of Models of Computation
, 2008
"... Abstract. Over the past five years a new approach to privacy-preserving ..."
Abstract
-
Cited by 258 (0 self)
- Add to MetaCart
Abstract. Over the past five years a new approach to privacy-preserving
Mondrian multidimensional k-anonymity
- in Proc. 22nd ICDE. IEEE
"... K-Anonymity has been proposed as a mechanism for privacy protection in microdata publishing, and numerous recoding “models ” have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (sing ..."
Abstract
-
Cited by 255 (5 self)
- Add to MetaCart
(Show Context)
K-Anonymity has been proposed as a mechanism for privacy protection in microdata publishing, and numerous recoding “models ” have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this flexibility leads to higher-quality anonymizations, as measured both by general-purpose metrics, as well as more specific notions of query answerability. In this paper, we prove that optimal multidimensional anonymization is NP-hard (like previous k-anonymity models). However, we introduce a simple, scalable, greedy algorithm that produces anonymizations that are a constantfactor approximation of optimal. Experimental results show that this greedy algorithm frequently leads to more desirable anonymizations than two optimal exhaustive-search algorithms for single-dimensional models. 1.
Transforming Data to Satisfy Privacy Constraints
, 2002
"... Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying ..."
Abstract
-
Cited by 250 (0 self)
- Add to MetaCart
Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving the anonymity by the use of generalizations and suppressions on the potentially identifying portions of the data. We extend earlier works in this area along various dimensions. First, satisfying privacy constraints is considered in conjunction with the usage for the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
Privacy-Preserving Data Publishing: A Survey on Recent Developments
"... The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange an ..."
Abstract
-
Cited by 219 (16 self)
- Add to MetaCart
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Location Privacy in Mobile Systems: A Personalized Anonymization Model
- IN: ICDCS
, 2005
"... This paper describes a personalized k-anonymity model for protecting location privacy against various privacy threats through location information sharing. Our model has two unique features. First, we provide a unified privacy personalization framework to support location k-anonymity for a wide rang ..."
Abstract
-
Cited by 192 (8 self)
- Add to MetaCart
This paper describes a personalized k-anonymity model for protecting location privacy against various privacy threats through location information sharing. Our model has two unique features. First, we provide a unified privacy personalization framework to support location k-anonymity for a wide range of users with context-sensitive personalized privacy requirements. This framework enables each mobile node to specify the minimum level of anonymity it desires as well as the maximum temporal and spatial resolutions it is willing to tolerate when requesting for k-anonymity preserving location-based services (LBSs). Second, we devise an efficient message perturbation engine which runs by the location protection broker on a trusted server and performs location anonymization on mobile users ’ LBS request messages, such as identity removal and spatio-temporal cloaking of location information. We develop a suite of scalable and yet efficient spatio-temporal cloaking algorithms, called CliqueCloak algorithms, to provide high quality personalized location k-anonymity, aiming at avoiding or reducing known location privacy threats before forwarding requests to LBS provider(s). The effectiveness of our CliqueCloak algorithms is studied under various conditions using realistic location data synthetically generated using real road maps and traffic volume data.
On k-anonymity and the curse of dimensionality
- In VLDB
, 2005
"... In recent years, the wide availability of personal data has made the problem of privacy preserving data mining an important one. A number of methods have recently been proposed for privacy preserving data mining of multidimensional data records. One of the methods for privacy preserving data mining ..."
Abstract
-
Cited by 171 (4 self)
- Add to MetaCart
In recent years, the wide availability of personal data has made the problem of privacy preserving data mining an important one. A number of methods have recently been proposed for privacy preserving data mining of multidimensional data records. One of the methods for privacy preserving data mining is that of anonymization, in which a record is released only if it is indistinguishable from k other entities in the data. We note that methods such as k-anonymity are highly dependent upon spatial locality in order to effectively implement the technique in a statistically robust way. In high dimensional space the data becomes sparse, and the concept of spatial locality is no longer easy to define from an application point of view. In this paper, we view the k-anonymization problem from the perspective of inference attacks over all possible combinations of attributes. We show that when the data contains a large number of attributes which may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to make precise inference attacks, even when individual attributes are partially specified within a range. We provide an analysis of the effect of dimensionality on k-anonymity methods. We conclude that when a data set contains a large number of attributes which