Privacy-Preserving Data Mining (2000)
Citations: | 841 - 3 self |
Citations
6599 | C4.5: Programs for machine learning - Quinlan - 1993 |
5962 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...e willing to give not true values but modified values of certain fields. Given a population that satisfies the above assumptions, we address the concrete problem of building decision-tree classifiers =-=[BFOS84]-=- [Qui93] and show that it is possible to develop accurate models while respecting users' privacy concerns. Classification is one of the most used tasks in data mining. Decision-tree classifiers ar... |
967 | Mathematical Methods of Statistics
- Cramer
- 1999
Citation Context ...et O(m^2) time. Stopping Criterion. With omniscience, we would stop when the reconstructed distribution was statistically the same as the original distribution (using, say, the χ² goodness-of-fit test =-=[Cra46]-=-). An alternative is to compare the observed randomized distribution with the result of randomizing the current estimate of the original distribution, and stop when these two distributions are statist... |
754 | Regularization of inverse problems
- Engl, Hanke, et al.
- 1996
Citation Context ...obal method and requires knowledge of values in other records. The problem of reconstructing original distribution from a given distribution can be viewed in the general framework of inverse problems =-=[EHN96]-=-. In [FJS97], it was shown that for smooth enough distributions (e.g. slowly varying time signals), it is possible to fully recover original distribution from non-overlapping, contiguous partial su... |
613 | Cryptography and Data Security
- Denning
- 1982
Citation Context ...on of data cells of small size (e.g. [Cox80]), and clustering entities into mutually exclusive atomic populations (e.g. [YC77]). The perturbation family includes swapping values between records (e.g. =-=[Den82]-=-), replacing the original database by a sample from the same distribution (e.g. [LST83] [LCL85] [Rei84]), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results... |
412 | Security-control methods for statistical databases
- Adam, Wortmann
- 1989
Citation Context ...sire to be able to provide statistical information (sum, count, average, maximum, minimum, pth percentile, etc.) without compromising sensitive information about individuals (see excellent surveys in =-=[AW89]-=- [Sho82].) The proposed techniques can be broadly classified into query restriction and data perturbation. The query restriction family includes restricting the size of query result (e.g. [Fel72] [DDS... |
312 | Sprint: A scalable parallel classifier for data mining
- Shafer, Agrawal, et al.
- 1996
Citation Context ... 0.5 and -0.5 to each point gives similar results. 4 Decision-Tree Classification over Randomized Data 4.1 Background We begin with a brief review of decision tree classification, adapted from [MAR96] =-=[SAM96]-=-. A decision tree [BFOS84] [Qui93] is a class discriminator that recursively partitions the training set until each partition consists entirely or dominantly of examples from the same class. Each non-... |
259 | Randomized response: A survey technique for eliminating evasive answer bias
- Warner
- 1965
Citation Context ... values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. [LST83] [LCL85] [Rei84]), adding noise to the values in the database (e.g. [TYW84] =-=[War65]-=-), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. [Den80]). There are negative results showing that the proposed techniques cannot satisfy the conflict... |
240 | Sliq: A fast scalable classifier for data mining.
- Mehta, Agrawal, et al.
- 1996
Citation Context ... between 0.5 and -0.5 to each point gives similar results. 4 Decision-Tree Classification over Randomized Data 4.1 Background We begin with a brief review of decision tree classification, adapted from =-=[MAR96]-=- [SAM96]. A decision tree [BFOS84] [Qui93] is a class discriminator that recursively partitions the training set until each partition consists entirely or dominantly of examples from the same class. E... |
199 | Classification and Regression Trees - Breiman, Friedman, et al. - 1983 |
175 | Beyond concern: Understanding net users' attitudes about online privacy.
- J, Ackerman
- 1999
Citation Context ...eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases [HE98] [Wes98a] [Wes98b] [Wes99] =-=[CRA99a]-=- [Cra99b]. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious informati... |
129 | Privacy-enhancing technologies for the internet.
- Goldberg, Wagner, et al.
- 1997
Citation Context ...ve in preventing unauthorized access to the system. Other relevant work includes efforts to create tools and standards that provide platform for implementing a system such as ours (e.g. [Wor] [Ben99] =-=[GWB97]-=- [Cra99b] [AC99] [LM99] [LEW99]). Paper Organization We discuss privacy-preserving methods in Section 2. We also introduce a quantitative measure to evaluate the amount of privacy offered by a method ... |
102 | Suppression Methodology and Statistical Disclosure Control.
- Cox
- 1980
Citation Context ...lap amongst successive queries (e.g. [DJL79]), keeping audit trail of all answered queries and constantly checking for possible compromise (e.g. [CO82]), suppression of data cells of small size (e.g. =-=[Cox80]-=-), and clustering entities into mutually exclusive atomic populations (e.g. [YC77]). The perturbation family includes swapping values between records (e.g. [Den82]), replacing the original database by... |
100 | Secure databases: Protection against user influence
- Dobkin, Jones, et al.
- 1979
Citation Context ...to query restriction and data perturbation. The query restriction family includes restricting the size of query result (e.g. [Fel72] [DDS79]), controlling the overlap amongst successive queries (e.g. =-=[DJL79]-=-), keeping audit trail of all answered queries and constantly checking for possible compromise (e.g. [CO82]), suppression of data cells of small size (e.g. [Cox80]), and clustering entities into mutua... |
95 | Security and privacy implications of data mining
- Clifton, Marks
- 1996
Citation Context ...turally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse =-=[CM96]-=- [The98] [Off98] [ECB99]. A fruitful direction for future research in data mining will be the development of techniques that incorporate privacy concerns [Agr99]. Specifically, we address the followin... |
82 | Privacy critics: UI components to safeguard users’ privacy.
- Ackerman, Cranor
- 1999
Citation Context ...unauthorized access to the system. Other relevant work includes efforts to create tools and standards that provide platform for implementing a system such as ours (e.g. [Wor] [Ben99] [GWB97] [Cra99b] =-=[AC99]-=- [LM99] [LEW99]). Paper Organization We discuss privacy-preserving methods in Section 2. We also introduce a quantitative measure to evaluate the amount of privacy offered by a method and evaluate the... |
79 | On the question of statistical confidentiality.
- Fellegi
- 1972
Citation Context ...ys in [AW89] [Sho82].) The proposed techniques can be broadly classified into query restriction and data perturbation. The query restriction family includes restricting the size of query result (e.g. =-=[Fel72]-=- [DDS79]), controlling the overlap amongst successive queries (e.g. [DJL79]), keeping audit trail of all answered queries and constantly checking for possible compromise (e.g. [CO82]), suppression of ... |
77 | Secure statistical databases with random sample queries
- Denning
- 1980
Citation Context ...tion (e.g. [LST83] [LCL85] [Rei84]), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. =-=[Den80]-=-). There are negative results showing that the proposed techniques cannot satisfy the conflicting objectives of providing high quality statistics and at the same time prevent exact or partial disclosu... |
76 | TRUSTe: An Online Privacy Seal Program
- Benassi
- 1999
Citation Context ... effective in preventing unauthorized access to the system. Other relevant work includes efforts to create tools and standards that provide platform for implementing a system such as ours (e.g. [Wor] =-=[Ben99]-=- [GWB97] [Cra99b] [AC99] [LM99] [LEW99]). Paper Organization We discuss privacy-preserving methods in Section 2. We also introduce a quantitative measure to evaluate the amount of privacy offered by a... |
72 | Statistical databases: characteristics, problems, and some solutions.
- Shoshani
- 1982
Citation Context ... be able to provide statistical information (sum, count, average, maximum, minimum, pth percentile, etc.) without compromising sensitive information about individuals (see excellent surveys in [AW89] =-=[Sho82]-=-.) The proposed techniques can be broadly classified into query restriction and data perturbation. The query restriction family includes restricting the size of query result (e.g. [Fel72] [DDS79]), co... |
70 | A data distortion by probability distribution.
- Liew, Choi, et al.
- 1985
Citation Context ...atomic populations (e.g. [YC77]). The perturbation family includes swapping values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. [LST83] =-=[LCL85]-=- [Rei84]), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. [Den80]). There are negati... |
70 | The statistical security of a statistical database
- Traub, Yemini, et al.
- 1984
Citation Context ...swapping values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. [LST83] [LCL85] [Rei84]), adding noise to the values in the database (e.g. =-=[TYW84]-=- [War65]), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. [Den80]). There are negative results showing that the proposed techniques cannot satisfy the ... |
65 | The tracker: A threat to statistical database security.
- Denning, Denning, et al.
- 1979
Citation Context ...W89] [Sho82].) The proposed techniques can be broadly classified into query restriction and data perturbation. The query restriction family includes restricting the size of query result (e.g. [Fel72] =-=[DDS79]-=-), controlling the overlap amongst successive queries (e.g. [DJL79]), keeping audit trail of all answered queries and constantly checking for possible compromise (e.g. [CO82]), suppression of data cel... |
65 | Practical data-swapping: The first steps
- Reiss
- 1984
Citation Context ...opulations (e.g. [YC77]). The perturbation family includes swapping values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. [LST83] [LCL85] =-=[Rei84]-=-), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. [Den80]). There are negative resul... |
63 | Probability theory and mathematical statistics.
- Fisz
- 1963
Citation Context ...riable has a uniform distribution, between [-α, +α]. The mean of the random variable is 0. • Gaussian: The random variable has a normal distribution, with mean μ = 0 and standard deviation σ =-=[Fis63]-=-. We fix the perturbation of an entity. Thus, it is not possible for snoopers to improve the estimates of the value of a field in a record by repeating queries [AW89]. 2.1 Quantifying Privacy For quan... |
56 | Privacy Interfaces For Information Management
- Lau, Etzioni, et al.
- 1999
Citation Context ...ccess to the system. Other relevant work includes efforts to create tools and standards that provide platform for implementing a system such as ours (e.g. [Wor] [Ben99] [GWB97] [Cra99b] [AC99] [LM99] =-=[LEW99]-=-). Paper Organization We discuss privacy-preserving methods in Section 2. We also introduce a quantitative measure to evaluate the amount of privacy offered by a method and evaluate the proposed metho... |
54 | Internet Security: Firewalls and Beyond.
- Oppliger
- 1997
Citation Context ...d not have to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and security (e.g. [Din78] [ST90] =-=[Opp97]-=- [RG98]). Whenever sensitive information is exchanged, it must be transmitted over a secure channel and stored securely. For the purposes of this paper, we assume that appropriate access controls and ... |
49 | A security mechanism for statistical database
- Beck
- 1980
Citation Context ...riginal database by a sample from the same distribution (e.g. [LST83] [LCL85] [Rei84]), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results of a query (e.g. =-=[Bec80]-=-), and sampling the result of a query (e.g. [Den80]). There are negative results showing that the proposed techniques cannot satisfy the conflicting objectives of providing high quality statistics and... |
40 | Recovering Information from Summary Data
- Faloutsos, Jagadish, et al.
- 1997
Citation Context ...and requires knowledge of values in other records. The problem of reconstructing original distribution from a given distribution can be viewed in the general framework of inverse problems [EHN96]. In =-=[FJS97]-=-, it was shown that for smooth enough distributions (e.g. slowly varying time signals), it is possible to fully recover original distribution from non-overlapping, contiguous partial sums. Such par... |
38 | Design of LDV - a Multilevel Secure Relational Database Management System
- Stachour, Thuraisingham
- 1990
Citation Context ...orks did not have to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and security (e.g. [Din78] =-=[ST90]-=- [Opp97] [RG98]). Whenever sensitive information is exchanged, it must be transmitted over a secure channel and stored securely. For the purposes of this paper, we assume that appropriate access contr... |
33 | Data Swapping: Balancing Privacy Against Precision in Mining for Logic Rules
- Estivill-Castro, Brankovic
- 1999
Citation Context ...alytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse [CM96] [The98] [Off98] =-=[ECB99]-=-. A fruitful direction for future research in data mining will be the development of techniques that incorporate privacy concerns [Agr99]. Specifically, we address the following question. Since the pr... |
32 | E-commerce and privacy: what net users want
- Westin
- 1998
Citation Context ...globally [Tim97] [Eco99] [eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases [HE98] =-=[Wes98a]-=- [Wes98b] [Wes99] [CRA99a] [Cra99b]. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valua... |
23 | An Analytic Approach to Statistical Databases
- Lefons, Silvestri, et al.
- 1983
Citation Context ...clusive atomic populations (e.g. [YC77]). The perturbation family includes swapping values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. =-=[LST83]-=- [LCL85] [Rei84]), adding noise to the values in the database (e.g. [TYW84] [War65]), adding noise to the results of a query (e.g. [Bec80]), and sampling the result of a query (e.g. [Den80]). There ar... |
23 | Sliq: A fast scalable classifier for data mining. - Mehta, Agrawal, et al. - 1996 |
21 | SPRINT: A scalable parallel classifier for data mining. - Shafer, Agrawal, et al. - 1996 |
17 | Freebies and privacy: What the net users think.
- Westin
- 1999
Citation Context ...Eco99] [eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases [HE98] [Wes98a] [Wes98b] =-=[Wes99]-=- [CRA99a] [Cra99b]. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious ... |
13 | The Death of Privacy
- Time
- 1997
Citation Context ... ultra large databases that record unprecedented amount of transactional information. In tandem with this dramatic increase in digital data, concerns about informational privacy have emerged globally =-=[Tim97]-=- [Eco99] [eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases [HE98] [Wes98a] [Wes98b... |
12 | Selective partial access to a database
- Conway, Strip
- 1976
Citation Context ...ccuracy the original distributions of the values of the confidential attributes. We adopt from the statistics literature two methods that a person may use in our system to modify the value of a field =-=[CS76]-=-: • Value-Class Membership. Partition the values into a set of disjoint, mutually-exhaustive classes and return the class into which the true value x_i falls. • Value Distortion. Return a value x ... |
9 | Auditing and inference control in statistical databases
- Chin, Ozsoyoglu
- 1982
Citation Context ...ery result (e.g. [Fel72] [DDS79]), controlling the overlap amongst successive queries (e.g. [DJL79]), keeping audit trail of all answered queries and constantly checking for possible compromise (e.g. =-=[CO82]-=-), suppression of data cells of small size (e.g. [Cox80]), and clustering entities into mutually exclusive atomic populations (e.g. [YC77]). The perturbation family includes swapping values between re... |
9 | Privacy concerns & consumer choice
- Westin
- 1998
Citation Context ...[Tim97] [Eco99] [eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases [HE98] [Wes98a] =-=[Wes98b]-=- [Wes99] [CRA99a] [Cra99b]. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-... |
9 | A study on the protection of statistical databases
- Yu, Chin
- 1977
Citation Context ...queries and constantly checking for possible compromise (e.g. [CO82]), suppression of data cells of small size (e.g. [Cox80]), and clustering entities into mutually exclusive atomic populations (e.g. =-=[YC77]-=-). The perturbation family includes swapping values between records (e.g. [Den82]), replacing the original database by a sample from the same distribution (e.g. [LST83] [LCL85] [Rei84]), adding noise ... |
8 | Privacy in the marketplace
- Hine
- 1998
Citation Context ...merged globally [Tim97] [Eco99] [eu998] [Off98]. Privacy issues are further exacerbated now that the World Wide Web makes it easy for the new data to be automatically collected and added to databases =-=[HE98]-=- [Wes98a] [Wes98b] [Wes99] [CRA99a] [Cra99b]. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently disco... |
6 | An Interval Classifier for Database Mining Applications, VLDB-92 - Agrawal, Ghosh, et al. - 1992 |
4 | Data Mining: Crossing the Chasm
- Agrawal
- 1999
Citation Context ...s, is particularly vulnerable to misuse [CM96] [The98] [Off98] [ECB99]. A fruitful direction for future research in data mining will be the development of techniques that incorporate privacy concerns =-=[Agr99]-=-. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise in... |
4 | Data mining and privacy: A conflict in making. DS
- Thearling
- 1998
Citation Context ... extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse [CM96] =-=[The98]-=- [Off98] [ECB99]. A fruitful direction for future research in data mining will be the development of techniques that incorporate privacy concerns [Agr99]. Specifically, we address the following questi... |
3 | A survey of the world wide web security
- Rubin, Greer
- 1998
Citation Context ...ve to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and security (e.g. [Din78] [ST90] [Opp97] =-=[RG98]-=-). Whenever sensitive information is exchanged, it must be transmitted over a secure channel and stored securely. For the purposes of this paper, we assume that appropriate access controls and securit... |
2 | Quasi cubes: Exploiting approximations in multidimensional databases. SIGMOD Record
- Barbara, Sullivan
- 1997
Citation Context ... literature on estimating attribute distributions from partial information [BDF+97]. In the OLAP literature, there is work on approximating queries on sub-cubes from higher-level aggregations (e.g. =-=[BS97]-=-). However, these works did not have to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and secu... |
2 | Design of LDV-A Multilevel Secure Relational Database Management System - Stachour, Thuraisingham - 1990 |
2 | Probability Theory and Mathematical Statistics. - Fisz - 1963 |
1 | Computers and Security
- Dinardo
- 1978
Citation Context ... these works did not have to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and security (e.g. =-=[Din78]-=- [ST90] [Opp97] [RG98]). Whenever sensitive information is exchanged, it must be transmitted over a secure channel and stored securely. For the purposes of this paper, we assume that appropriate acces... |
1 | Method and system for client/server communications with user information revealed as a function of willingness to reveal and whether the information is required
- Lotspiech, Morris
- 1999
Citation Context ...rized access to the system. Other relevant work includes efforts to create tools and standards that provide platform for implementing a system such as ours (e.g. [Wor] [Ben99] [GWB97] [Cra99b] [AC99] =-=[LM99]-=- [LEW99]). Paper Organization We discuss privacy-preserving methods in Section 2. We also introduce a quantitative measure to evaluate the amount of privacy offered by a method and evaluate the propos... |
1 | On the question of statistical confidentiality - Fellegi - 1972 |
1 | Practical data-swapping: The first steps - Reiss - 1984 |
1 | Data mining and privacy: A conflict in making. DS - Thearling - 1998 |
1 | Data swapping: Balancing privacy against precision in mining for logic rules - TODS - 1979 |
1 | The European Union's Directive on Privacy Protection - Fellegi - 1998 |
1 | Recovering information from summary data - Sidiropoulos - 1997 |
1 | The Information Society - Worth - 1998 |
1 | Privacy concerns & consumer choice - Westin - 1998 |