## k-anonymity: a model for protecting privacy. (2002)

Venue: | International Journal on Uncertainty, Fuzziness and Knowledge-based Systems |

Citations: | 1313 - 15 self |

### Citations

615 | Cryptography and data security.
- Denning
- 1982
Citation Context ...t is the subject of this paper is not so much whether the recipient can get access or not to the information as much as what values will constitute the information the recipient will receive. A general doctrine of the work presented herein is to release all the information but to do so such that the identities of the people who are the subjects of the data (or other sensitive properties found in the data) are protected. Therefore, the goal of the work presented in this paper lies outside of traditional work on access control and authentication. 2.4. Multiple queries can leak inference Denning [17] and others [18, 19] were among the first to explore inferences realized from multiple queries to a database. For example, consider a table containing only (physician, patient, medication). A query listing the patients seen by each physician, i.e., a relation R(physician, patient), may not be sensitive. Likewise, a query itemizing medications prescribed by each physician may also not be sensitive. But the query associating patients with their prescribed medications may be sensitive because medications typically correlate with diseases. One common solution, called query restriction, prohibits q... |
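The multiple-query inference described in this excerpt can be made concrete with a small sketch. The toy records, the `project` helper and the `candidate_medications` function are hypothetical names introduced here for illustration and do not come from the paper. The sketch only shows that two individually innocuous projections, combined on the shared physician column, can narrow down which medications each patient may be taking.

```python
# Hypothetical toy table of (physician, patient, medication) tuples.
records = [
    ("Dr. A", "p1", "m1"),
    ("Dr. A", "p2", "m1"),
    ("Dr. B", "p3", "m2"),
    ("Dr. B", "p4", "m3"),
]


def project(rows, *columns):
    """Return the set of distinct sub-tuples for the given column indexes."""
    return {tuple(row[c] for c in columns) for row in rows}


def candidate_medications(physician_patient, physician_medication):
    """Combine two query results on physician to bound each patient's medications."""
    meds_by_physician = {}
    for physician, medication in physician_medication:
        meds_by_physician.setdefault(physician, set()).add(medication)

    candidates = {}
    for physician, patient in physician_patient:
        candidates.setdefault(patient, set()).update(
            meds_by_physician.get(physician, set())
        )
    return candidates


# Two queries that may each look harmless on their own.
q1 = project(records, 0, 1)  # R(physician, patient)
q2 = project(records, 0, 2)  # R(physician, medication)

candidates = candidate_medications(q1, q2)
for patient in sorted(candidates):
    print(patient, sorted(candidates[patient]))
```

In this toy data, a physician who prescribes a single medication reveals it exactly for every patient, while a physician with several medications only yields a candidate set.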

226 |
Principles of Database and Knowledge-Base Systems.
- Ullman
- 1989
Citation Context ... column is called an attribute and denotes a field or semantic category of information that is a set of possible values; therefore, an attribute is also a domain. Attributes within a table are unique. So by observing a table, each row is an ordered n-tuple of values <d1, d2, …, dn> such that each value dj is in the domain of the j-th column, for j=1, 2, …, n where n is the number of columns. In mathematical set theory, a relation corresponds with this tabular presentation, the only difference is the absence of column names. Ullman provides a detailed discussion of relational database concepts [23]. Definition 1. Attributes Let B(A1,…,An) be a table with a finite number of tuples. The finite set of attributes of B are {A1,…,An}. Given a table B(A1,…,An), {Ai,…,Aj} ⊆ {A1,…,An}, and a tuple t∈B, I use t[Ai,…,Aj] to denote the sequence of the values, vi,…,vj, of Ai,…,Aj in t. I use B[Ai,…,Aj] to denote the projection, maintaining duplicate tuples, of attributes Ai,…Aj in B. Throughout the remainder of this work each tuple is assumed to be specific to one person and no two tuples pertain to the same person. This assumption simplifies discussion without loss of applicability. To draw an infe... |
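The notation in Definition 1 maps directly onto a few lines of code. The following is a minimal sketch, assuming a table is represented as a list of dictionaries keyed by attribute name; `tuple_values` stands in for t[Ai,…,Aj] and `project` for B[Ai,…,Aj], and both names, like the sample attributes, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def tuple_values(t: Row, attrs: Sequence[str]) -> Tuple[str, ...]:
    """t[Ai,...,Aj]: the sequence of values of the named attributes in tuple t."""
    return tuple(t[a] for a in attrs)


def project(table: List[Row], attrs: Sequence[str]) -> List[Tuple[str, ...]]:
    """B[Ai,...,Aj]: projection onto attrs that keeps duplicate tuples."""
    return [tuple_values(t, attrs) for t in table]


# Hypothetical person-specific table: one tuple per person.
B = [
    {"zip": "02138", "gender": "f", "diagnosis": "d1"},
    {"zip": "02138", "gender": "f", "diagnosis": "d2"},
    {"zip": "02139", "gender": "m", "diagnosis": "d1"},
]

projected = project(B, ["zip", "gender"])
print(projected)
```

A list is used instead of a set because the definition maintains duplicate tuples, which matters later when counting how many people share the same values.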

91 |
Guaranteeing anonymity when sharing medical data, the Datafly system.
- Sweeney
- 1997
Citation Context ...his section is to provide a formal framework for constructing and evaluating algorithms and systems that release information such that the released information limits what can be revealed about properties of the entities that are to be protected. For convenience, I focus on person-specific data, so the entities are people, and the property to be protected is the identity of the subjects whose information is contained in the data. However, other properties could also be protected. The formal methods provided in this paper include the k-anonymity protection model. The real-world systems Datafly [20], µ-Argus [21] and kSimilar [22] motivate this approach. Unless otherwise stated, the term data refers to person-specific information that is conceptually organized as a table of rows (or records) and columns (or fields). Each row is termed a tuple. A tuple contains a relationship among the set of values associated with a person. Tuples within a table are not necessarily unique. Each column is called an attribute and denotes a field or semantic category of information that is a set of possible values; therefore, an attribute is also a domain. Attributes within a table are unique. So by observi... |

87 |
A Method for Limiting Disclosure in Microdata Based on Random Noise and Transformation.
- Kim
- 1986
Citation Context ... the world have traditionally been entrusted with the release of statistical information about all aspects of the populace [5]. But like other data holders, statistics offices are also facing tremendous demand for person-specific data for applications such as data mining, L. Sweeney. k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 2002; 557-570. Page 4 cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a mul... |

79 |
On the question of statistical confidentiality.
- Fellegi
- 1972
Citation Context ...asing a version of privately held data so that the individuals who are the subjects of the data cannot be identified is not a new problem. There are existing works in the statistics community on statistical databases and in the computer security community on multi-level databases to consider. However, none of these works provide solutions to the broader problems experienced in today’s data rich setting. 2.1. Statistical databases Federal and state statistics offices around the world have traditionally been entrusted with the release of statistical information about all aspects of the populace [5]. But like other data holders, statistics offices are also facing tremendous demand for person-specific data for applications such as data mining, cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, ... |

70 |
Finding a Needle in a Haystack - or Identifying Anonymous Census Record.
- Dalenius
- 1986
Citation Context ...ability to link released information to other external collections. So the properties to be controlled are operationally realized as attributes in the privately held collection. The data holder is expected to identify all attributes in the private information that could be used for linking with external information. Such attributes not only include explicit identifiers such as name, address, and phone number, but also include attributes that in combination can uniquely identify individuals such as birth date and gender. The set of such attributes has been termed a quasi-identifier by Dalenius [24]. So operationally, a goal of this work is to release person-specific data such that the ability to link to other information using the quasi-identifier is limited. Definition 2. Quasi-identifier Given a population of entities U, an entity-specific table T(A1,…,An), fc: U → T and fg: T→ U', where U ⊆ U'. A quasi-identifier of T, written QT, is a set of attributes {Ai,…,Aj} ⊆ {A1,…,An} where: ∃pi∈U such that fg(fc(pi)[QT]) = pi. Example 2. Quasi-identifier Let V be the voter-specific table described earlier in Figure 1 as the voter list. A quasi-identifier for V, written QV, is {name, address, ... |
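The linking risk that motivates the quasi-identifier can be sketched as a simple join. This is an illustrative sketch only: the `link` function, the attribute names and the sample rows are hypothetical, and the choice of {birth_date, gender} as the linking attributes just follows the example of attributes named in the excerpt.

```python
from collections import defaultdict


def link(released, external, quasi_identifier):
    """Pair each released row with the external rows sharing its quasi-identifier values."""
    index = defaultdict(list)
    for person in external:
        key = tuple(person[a] for a in quasi_identifier)
        index[key].append(person)
    return [
        (row, index.get(tuple(row[a] for a in quasi_identifier), []))
        for row in released
    ]


# Hypothetical released data with explicit identifiers removed.
released = [
    {"birth_date": "1965-02-13", "gender": "f", "diagnosis": "d1"},
    {"birth_date": "1970-07-01", "gender": "m", "diagnosis": "d2"},
]

# Hypothetical external collection that still carries names.
external = [
    {"name": "Person A", "birth_date": "1965-02-13", "gender": "f"},
    {"name": "Person B", "birth_date": "1970-07-01", "gender": "m"},
    {"name": "Person C", "birth_date": "1970-07-01", "gender": "m"},
]

matches = link(released, external, ["birth_date", "gender"])

# A released row is re-identified when exactly one external row matches it.
reidentified = [
    (row["diagnosis"], found[0]["name"]) for row, found in matches if len(found) == 1
]
print(reidentified)
```

Here the first released row links to a single named person, while the second is ambiguous between two people, which is the kind of ambiguity the paper's protection model aims to guarantee.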

65 | The tracker: A threat to statistical database security.
- Denning, Denning, et al.
- 1979
Citation Context ... of this paper is not so much whether the recipient can get access or not to the information as much as what values will constitute the information the recipient will receive. A general doctrine of the work presented herein is to release all the information but to do so such that the identities of the people who are the subjects of the data (or other sensitive properties found in the data) are protected. Therefore, the goal of the work presented in this paper lies outside of traditional work on access control and authentication. 2.4. Multiple queries can leak inference Denning [17] and others [18, 19] were among the first to explore inferences realized from multiple queries to a database. For example, consider a table containing only (physician, patient, medication). A query listing the patients seen by each physician, i.e., a relation R(physician, patient), may not be sensitive. Likewise, a query itemizing medications prescribed by each physician may also not be sensitive. But the query associating patients with their prescribed medications may be sensitive because medications typically correlate with diseases. One common solution, called query restriction, prohibits queries that can reve... |

56 |
A multilevel relational data model.
- Lunt, Schell, et al.
- 1987
Citation Context ... of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at different security classifications and users having different security clearances. Su and Ozsoyoglu formally investigated inference in MDB. They showed that eliminating precise inference compromise due to functional dependencies and multi-valued dependencies is NP-complete. By extension to this work, the precise elimination of all inferences with respect to the identities of the individuals whose information is included in person-specific data is typically impossible to guarantee. Intuitively this makes sense bec... |

42 |
Security and Inference in multilevel database and knowledge based systems.
- Morgenstern
- 1987
Citation Context ...cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at different security classifications and users having different security clearances. Su and Ozsoyoglu formally investigated inference in MDB. They showed that eliminating precise inference compromise due to functional dependencies and multi-valued dependencies is NP-complete. By extension to this work, the precise elimination of all inferences with respect to the id... |

39 |
Statistical disclosure control in practice.
- Willenborg, De Waal
- 1996
Citation Context ... for applications such as data mining, cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at different security classifications and users having different security clearances. Su and Ozsoyoglu formally investigated inference in MDB. They showed that eliminating... |

35 |
Detection and elimination of inference channels in multilevel relational database systems.
- Qian, Stickel, et al.
- 1993
Citation Context ...cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at different security classifications and users having different security clearances. Su and Ozsoyoglu formally investigated inference in MDB. They showed that eliminating precise inference compromise due to functional dependencies and multi-valued dependencies is NP-complete. By extension to this work, the precise elimination of all inferences with respect to the id... |

31 |
µ- and τ-Argus: Software for statistical disclosure control.
- Hundepool, Willenborg
- 1996
Citation Context ... to provide a formal framework for constructing and evaluating algorithms and systems that release information such that the released information limits what can be revealed about properties of the entities that are to be protected. For convenience, I focus on person-specific data, so the entities are people, and the property to be protected is the identity of the subjects whose information is contained in the data. However, other properties could also be protected. The formal methods provided in this paper include the k-anonymity protection model. The real-world systems Datafly [20], µ-Argus [21] and kSimilar [22] motivate this approach. Unless otherwise stated, the term data refers to person-specific information that is conceptually organized as a table of rows (or records) and columns (or fields). Each row is termed a tuple. A tuple contains a relationship among the set of values associated with a person. Tuples within a table are not necessarily unique. Each column is called an attribute and denotes a field or semantic category of information that is a set of possible values; therefore, an attribute is also a domain. Attributes within a table are unique. So by observing a table, ea... |

30 | Aggregation and inference: Facts and fallacies.
- Lunt
- 1989
Citation Context ...cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at different security classifications and users having different security clearances. Su and Ozsoyoglu formally investigated inference in MDB. They showed that eliminating precise inference compromise due to functional dependencies and multi-valued dependencies is NP-complete. By extension to this work, the precise elimination of all inferences with respect to the id... |

22 |
Uniqueness of Simple Demographics in the U.S. Population, LIDAP-WP4.
- Sweeney
- 2000
Citation Context ... all explicit identifiers, such as name, address and telephone number, removed on the assumption that anonymity is maintained because the resulting data look anonymous. However, in most of these cases, the remaining data can be used to re-identify individuals by linking or matching the data to other data or by looking at unique characteristics found in the released data. In an earlier work, experiments using 1990 U.S. Census summary data were conducted to determine how many individuals within geographically situated populations had combinations of demographic values that occurred infrequently [1]. Combinations of few characteristics often combine in populations to uniquely or nearly uniquely identify some individuals. For example, a finding in that study was that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. Clearly, data released containing such information about these individuals should not be considered anonymous. Yet, health and other person-specific data are often publicly available in this form. Below is a demonstration of how such data can be ... |
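The 87% figure quoted above is a ratio of people whose demographic combination is unique (216 million of 248 million, roughly 0.87). A minimal sketch of how such a fraction could be computed on a table follows; the `unique_fraction` helper, the attribute names and the four sample rows are hypothetical and are not the census data or method used in the cited study.

```python
from collections import Counter


def unique_fraction(rows, attrs):
    """Fraction of rows whose combination of values over attrs occurs exactly once."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical population table.
population = [
    {"zip": "02138", "gender": "f", "dob": "1965-02-13"},
    {"zip": "02138", "gender": "f", "dob": "1965-02-13"},
    {"zip": "02139", "gender": "m", "dob": "1970-07-01"},
    {"zip": "02139", "gender": "f", "dob": "1981-11-30"},
]

fraction = unique_fraction(population, ["zip", "gender", "dob"])
print(fraction)
```

Coarsening the attribute set, for example dropping `dob`, generally lowers the unique fraction, which is the intuition behind generalizing or suppressing values before release.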

16 |
Microdata disclosure limitation in statistical databases: query size and random sample query control.
- Duncan, Mukherjee
Citation Context ... of this paper is not so much whether the recipient can get access or not to the information as much as what values will constitute the information the recipient will receive. A general doctrine of the work presented herein is to release all the information but to do so such that the identities of the people who are the subjects of the data (or other sensitive properties found in the data) are protected. Therefore, the goal of the work presented in this paper lies outside of traditional work on access control and authentication. 2.4. Multiple queries can leak inference Denning [17] and others [18, 19] were among the first to explore inferences realized from multiple queries to a database. For example, consider a table containing only (physician, patient, medication). A query listing the patients seen by each physician, i.e., a relation R(physician, patient), may not be sensitive. Likewise, a query itemizing medications prescribed by each physician may also not be sensitive. But the query associating patients with their prescribed medications may be sensitive because medications typically correlate with diseases. One common solution, called query restriction, prohibits queries that can reve... |

9 |
Regression methodology based disclosure of a statistical database
- Palley, Siminoff
- 1986
Citation Context ...atistical information about all aspects of the populace [5]. But like other data holders, statistics offices are also facing tremendous demand for person-specific data for applications such as data mining, cost analysis, fraud detection and retrospective research. But many of the established statistical database techniques, which involve various ways of adding noise [6] to the data while still maintaining some statistical invariant [7, 8], often destroy the integrity of records, or tuples, and so, for many new uses of data, these established techniques are not appropriate. Willenborg and De Waal [9] provide more extensive coverage of traditional statistical techniques. 2.2. Multi-level databases Another related area is aggregation and inference in multi-level databases [10, 11, 12, 13, 14, 15] which concerns restricting the release of lower classified information such that higher classified information cannot be derived. Denning and Lunt [16] described a multilevel relational database system (MDB) as having data stored at diff... |

5 |
Towards the optimal suppression of details when disclosing medical data, the use of sub-combination analysis.
- Sweeney
Citation Context ...al framework for constructing and evaluating algorithms and systems that release information such that the released information limits what can be revealed about properties of the entities that are to be protected. For convenience, I focus on person-specific data, so the entities are people, and the property to be protected is the identity of the subjects whose information is contained in the data. However, other properties could also be protected. The formal methods provided in this paper include the k-anonymity protection model. The real-world systems Datafly [20], µ-Argus [21] and kSimilar [22] motivate this approach. Unless otherwise stated, the term data refers to person-specific information that is conceptually organized as a table of rows (or records) and columns (or fields). Each row is termed a tuple. A tuple contains a relationship among the set of values associated with a person. Tuples within a table are not necessarily unique. Each column is called an attribute and denotes a field or semantic category of information that is a set of possible values; therefore, an attribute is also a domain. Attributes within a table are unique. So by observing a table, each row is an order... |

2 |
Computational Data Privacy Protection, LIDAP-WP5.
- Sweeney
- 2000
Citation Context ...he notion of a quasi-identifier to provide more flexibility and granularity. Both the Datafly and µ-Argus systems weight the attributes of the quasi-identifier. For simplicity in this work, however, I consider a single quasi-identifier based on attributes, without weights, appearing together in an external table or in a possible join of external tables. 3.1. The k-anonymity protection model In an earlier work, I introduced basic protection models termed null-map, k-map and wrong-map which provide protection by ensuring that released information map to no, k or incorrect entities, respectively [25]. To determine how many individuals each released tuple actually matches requires combining the released data with externally available data and analyzing other possible attacks. Making such a determination directly can be an extremely difficult task for the data holder who releases information. Although I can assume the data holder knows which data in PT also appear externally, and therefore what constitutes a quasi-identifier, the specific values contained in external data cannot be assumed. I therefore seek to protect the information in this work by satisfying a slightly different constrain... |
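The excerpt is cut off before the constraint itself is stated, so the following sketch does not reconstruct the paper's wording. It assumes the commonly cited form of the k-anonymity requirement: every combination of quasi-identifier values that appears in the released table appears in at least k tuples. The function name `is_k_anonymous`, the attribute names and the sample rows are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """Check that each quasi-identifier value combination occurs in at least k rows.

    Assumption: this is the commonly cited statement of k-anonymity,
    not text recovered from the truncated excerpt.
    An empty table is treated as trivially satisfying the check.
    """
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical released table with generalized ZIP codes and birth years.
released = [
    {"zip": "0213*", "birth_year": "1965", "diagnosis": "d1"},
    {"zip": "0213*", "birth_year": "1965", "diagnosis": "d2"},
    {"zip": "0214*", "birth_year": "1970", "diagnosis": "d1"},
    {"zip": "0214*", "birth_year": "1970", "diagnosis": "d3"},
]

qi = ["zip", "birth_year"]
print(is_k_anonymous(released, qi, 2))
print(is_k_anonymous(released, qi, 3))
```

The check needs only the released table, which matches the excerpt's point that the data holder cannot assume knowledge of the specific values held in external data.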