ℓ-diversity: Privacy beyond k-anonymity
- In ICDE, 2006
- Cited by 672 (13 self)
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying” attributes. In this paper we show using two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
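The two properties named in this abstract can be illustrated with a minimal Python sketch. It assumes a table stored as a list of dictionaries, and it checks only the simplest reading of ℓ-diversity (at least ℓ distinct sensitive values per group), which is an assumption made here for illustration and not necessarily the formal criterion defined in the paper. The column names and the sample table are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every record is indistinguishable from at least k - 1 others."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())

def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Simplified check (assumption): each group has >= l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())

# Hypothetical, already generalized table.
table = [
    {"zip": "130**", "age": "<30", "disease": "heart disease"},
    {"zip": "130**", "age": "<30", "disease": "heart disease"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]

k_ok = is_k_anonymous(table, ["zip", "age"], 2)
l_ok = is_distinct_l_diverse(table, ["zip", "age"], "disease", 2)
```

In this sample, the table is 2-anonymous, yet the first group contains a single disease value, so anyone known to be in that group has their sensitive value disclosed. That is the lack-of-diversity attack the abstract describes.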
Incognito: efficient full-domain k-anonymity
- In Proc. of SIGMOD
- Cited by 304 (5 self)
A number of organizations publish microdata for purposes such as public health and demographic research. Although attributes that clearly identify individuals, such as Name and Social Security Number, are generally removed, these databases can sometimes be joined with other public databases on attributes such as Zipcode, Sex, and Birthdate to re-identify individuals who were supposed to remain anonymous. “Joining” attacks are made easier by the availability of other, complementary, databases over the Internet. K-anonymization is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. In this paper, we provide a practical framework for implementing one model of k-anonymization, called full-domain generalization. We introduce a set of algorithms for producing minimal full-domain generalizations, and show that these algorithms perform up to an order of magnitude faster than previous algorithms on two real-life databases. Besides full-domain generalization, numerous other models have also been proposed for k-anonymization. The second contribution in this paper is a single taxonomy that categorizes previous models and introduces some promising new alternatives.
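As a rough illustration of generalization, the sketch below applies one generalization level uniformly to every value of a single attribute and searches for the smallest level at which each generalized value occurs at least k times. It is a single-attribute simplification written for this listing, not the Incognito algorithm; the masking hierarchy, the function names, and the sample zip codes are assumptions.

```python
from collections import Counter

def generalize_zip(zipcode, level):
    """Replace the last `level` characters of a zip code with '*'."""
    if level == 0:
        return zipcode
    return zipcode[:-level] + "*" * level

def min_level_for_k(zipcodes, k):
    """Smallest uniform masking level at which every generalized value
    appears at least k times, or None if no level achieves this.
    Assumes a non-empty list of zip codes that all have the same length."""
    for level in range(len(zipcodes[0]) + 1):
        counts = Counter(generalize_zip(z, level) for z in zipcodes)
        if min(counts.values()) >= k:
            return level
    return None

# Hypothetical zip codes.
zips = ["13053", "13068", "14850", "14853"]
level = min_level_for_k(zips, 2)
released = [generalize_zip(z, level) for z in zips]
```

Every value is generalized to the same level, so the released column stays in a single consistent domain, at the cost of masking more digits than some records would need on their own.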
Mondrian multidimensional k-anonymity
- In Proc. 22nd ICDE. IEEE
- Cited by 255 (5 self)
K-Anonymity has been proposed as a mechanism for privacy protection in microdata publishing, and numerous recoding “models” have been considered for achieving k-anonymity. This paper proposes a new multidimensional model, which provides an additional degree of flexibility not seen in previous (single-dimensional) approaches. Often this flexibility leads to higher-quality anonymizations, as measured both by general-purpose metrics, as well as more specific notions of query answerability. In this paper, we prove that optimal multidimensional anonymization is NP-hard (like previous k-anonymity models). However, we introduce a simple, scalable, greedy algorithm that produces anonymizations that are a constant-factor approximation of optimal. Experimental results show that this greedy algorithm frequently leads to more desirable anonymizations than two optimal exhaustive-search algorithms for single-dimensional models.
Privacy-Preserving Data Publishing: A Survey on Recent Developments
- Cited by 219 (16 self)
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
On k-anonymity and the curse of dimensionality
- In VLDB, 2005
- Cited by 171 (4 self)
In recent years, the wide availability of personal data has made the problem of privacy preserving data mining an important one. A number of methods have recently been proposed for privacy preserving data mining of multidimensional data records. One of the methods for privacy preserving data mining is that of anonymization, in which a record is released only if it is indistinguishable from k other entities in the data. We note that methods such as k-anonymity are highly dependent upon spatial locality in order to effectively implement the technique in a statistically robust way. In high dimensional space the data becomes sparse, and the concept of spatial locality is no longer easy to define from an application point of view. In this paper, we view the k-anonymization problem from the perspective of inference attacks over all possible combinations of attributes. We show that when the data contains a large number of attributes which may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to make precise inference attacks, even when individual attributes are partially specified within a range. We provide an analysis of the effect of dimensionality on k-anonymity methods. We conclude that when a data set contains a large number of attributes which
Personalized Privacy Preservation
- SIGMOD 2006
- Cited by 132 (7 self)
We study generalization for preserving privacy in publication of sensitive data. The existing methods focus on a universal approach that exerts the same amount of preservation for all persons, without catering for their concrete needs. The consequence is that we may be offering insufficient protection to a subset of people, while applying excessive privacy control to another subset. Motivated by this, we present a new generalization framework based on the concept of personalized anonymity. Our technique performs the minimum generalization for satisfying everybody’s requirements, and thus, retains the largest amount of information from the microdata. We carry out a careful theoretical study that leads to valuable insight into the behavior of alternative solutions. In particular, our analysis mathematically reveals the circumstances where the previous work fails to protect privacy, and establishes the superiority of the proposed solutions. The theoretical findings are verified with extensive experiments.
Towards identity anonymization on graphs
- In Proceedings of ACM SIGMOD, 2008
- Cited by 123 (6 self)
The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its basic form the degree of the nodes, can reveal the identities of individuals. To address this issue, we study a specific graph-anonymization problem. We call a graph k-degree anonymous if for every node v, there exist at least k−1 other nodes in the graph with the same degree as v. This definition of anonymity prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes. We formally define the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations. We devise simple and efficient algorithms for solving this problem. Our algorithms are based on principles related to the realizability of degree sequences. We apply our methods to a large spectrum of synthetic and real datasets and demonstrate their efficiency and practical utility.
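The k-degree anonymity definition in this abstract can be checked directly from an edge list. The following is a minimal Python sketch that only tests the property; it does not implement the paper's anonymization algorithms. It assumes an undirected simple graph, and the function names and example graphs are hypothetical.

```python
from collections import Counter

def node_degrees(edges):
    """Degree of each node in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def is_k_degree_anonymous(edges, k, isolated_nodes=()):
    """True if every degree value present in the graph is shared by at least
    k nodes, i.e. each node has at least k - 1 others with the same degree."""
    deg = node_degrees(edges)
    nodes = set(deg) | set(isolated_nodes)
    # Isolated nodes have degree 0; Counter returns 0 for missing keys.
    degree_counts = Counter(deg[n] for n in nodes)
    return all(count >= k for count in degree_counts.values())

path = [("a", "b"), ("b", "c")]                # degrees: a=1, b=2, c=1
triangle = [("a", "b"), ("b", "c"), ("c", "a")]  # all degrees are 2

path_ok = is_k_degree_anonymous(path, 2)
triangle_ok = is_k_degree_anonymous(triangle, 3)
```

In the path, node b is the only node of degree 2, so an adversary who knows that degree can single it out. In the triangle, all three nodes share the same degree.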
Injecting utility into anonymized datasets
- In SIGMOD, 2006
- Cited by 119 (5 self)
Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and ℓ-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well-studied. In this paper we will discuss the shortcomings of current heuristic approaches to measuring utility and we will introduce a formal approach to measuring utility. Armed with this utility metric, we will show how to inject additional information into k-anonymous and ℓ-diverse tables. This information has an intuitive semantic meaning, it increases the utility beyond what is possible in the original k-anonymity and ℓ-diversity frameworks, and it maintains the privacy guarantees of k-anonymity and ℓ-diversity.
To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles
- In WWW, 2009
- Cited by 109 (3 self)
In order to address privacy concerns, many social media websites allow users to hide their personal profiles from the public. In this work, we show how an adversary can exploit an online social network with a mixture of public and private user profiles to predict the private attributes of users. We map this problem to a relational classification problem and we propose practical models that use friendship and group membership information (which is often not hidden) to infer sensitive attributes. The key novel idea is that in addition to friendship links, groups can be carriers of significant information. We show that on several well-known social media sites, we can easily and accurately recover the information of private-profile users. To the best of our knowledge, this is the first work that uses link-based and group-based classification to study privacy implications in social networks with mixed public and private user profiles.
A Peer-to-Peer Spatial Cloaking Algorithm for Anonymous Location-based Services
- In: ACM GIS, 2006
- Cited by 105 (10 self)
This paper tackles a major privacy threat in current location-based services where users have to report their exact locations to the database server in order to obtain their desired services. For example, a mobile user asking about her nearest restaurant has to report her exact location. With untrusted service providers, reporting private location information may lead to several privacy threats. In this paper, we present a peer-to-peer (P2P) spatial cloaking algorithm in which mobile and stationary users can entertain location-based services without revealing their exact location information. The main idea is that before requesting any location-based service, the mobile user will form a group from her peers via single-hop communication and/or multihop routing. Then, the spatial cloaked area is computed as the region that covers the entire group of peers. Two modes of operation are supported within the proposed P2P spatial cloaking algorithm, namely, the on-demand mode and the proactive mode. Experimental results show that the P2P spatial cloaking algorithm operated in the on-demand mode has lower communication cost and better quality of service than the proactive mode, but the on-demand mode incurs a longer response time.
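The step of computing a region that covers the whole peer group can be sketched as follows. This Python sketch assumes the region is an axis-aligned bounding rectangle over planar (x, y) coordinates, which is one simple choice made here for illustration; the abstract does not specify the shape of the region. Peer discovery and routing are not modeled, and all names and coordinates are hypothetical.

```python
def cloaked_region(locations):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y)
    covering every location in a non-empty group of (x, y) points."""
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    return (min(xs), min(ys), max(xs), max(ys))

def contains(region, point):
    """True if the point lies inside or on the border of the region."""
    min_x, min_y, max_x, max_y = region
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y

# Hypothetical group: the requesting user plus three peers.
user = (2.0, 3.0)
peers = [(1.0, 5.0), (4.0, 1.0), (3.0, 4.0)]
region = cloaked_region([user] + peers)
```

The user would then report the rectangle to the service provider in place of her exact position, so the provider learns only that she is one of the group members inside it.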