Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
Cited by 337 (12 self)
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
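A minimal sketch of the "run from the past up until the present" idea described above, under stated assumptions: the toy chain (a reflecting random walk on {0, ..., n-1}, whose stationary distribution is uniform) and the names `update` and `cftp_sample` are illustrative and not taken from the paper. Because the update rule is monotone in the state, it is enough to track the chains started from the lowest and highest states; when they agree at time 0, every starting state would have led to the same value.

```python
import random

def update(state, u, n):
    """One step of a reflecting random walk on {0, ..., n-1}, driven by u in [0, 1)."""
    if u < 0.5:
        return max(state - 1, 0)
    return min(state + 1, n - 1)

def cftp_sample(n, rng):
    """Return one sample from the walk's stationary (uniform) distribution."""
    randomness = []  # randomness[t] drives the step from time -(t+1) to time -t
    horizon = 1      # how far into the past the current attempt starts
    while True:
        # Extend the random sequence further into the past, reusing earlier draws.
        while len(randomness) < horizon:
            randomness.append(rng.random())
        low, high = 0, n - 1
        for t in range(horizon - 1, -1, -1):  # from time -horizon up to time 0
            low = update(low, randomness[t], n)
            high = update(high, randomness[t], n)
        if low == high:
            return low   # all starting states have coalesced by time 0
        horizon *= 2     # not coalesced yet: start from further back

if __name__ == "__main__":
    rng = random.Random(2024)
    print([cftp_sample(5, rng) for _ in range(10)])
```

The design point worth noting is that the random numbers for the most recent steps are kept fixed when the starting time is pushed further back; drawing fresh randomness on each attempt would bias the output.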
ℓ-diversity: Privacy beyond k-anonymity
- In ICDE
, 2006
Cited by 294 (8 self)
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying” attributes. In this paper we show, using two simple attacks, that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
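The first attack described above (little diversity among sensitive values) can be illustrated with a short check. This is a sketch under assumptions: the abstract does not state the formal ℓ-diversity criterion, so the code uses the simplest reading, namely that every group of indistinguishable records contains at least ℓ distinct sensitive values. The table, the column names, and the function names are hypothetical.

```python
from collections import defaultdict

def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that agree on every identifying attribute."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        groups[key].append(record)
    return groups

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every group of indistinguishable records has at least k members."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values.

    Assumption: this 'distinct values' reading is a simplification and may be
    weaker than the criterion defined in the paper.
    """
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in group}) >= l for group in groups.values())

# Hypothetical generalized table: two groups of three records each.
records = [
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "heart disease"},
]
quasi_identifiers = ["zip", "age"]

if __name__ == "__main__":
    print(is_k_anonymous(records, quasi_identifiers, 3))
    print(is_distinct_l_diverse(records, quasi_identifiers, "disease", 2))
```

In this hypothetical table the first group is 3-anonymous yet every member shares the same diagnosis, so knowing that someone belongs to that group is enough to learn the sensitive value.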
SOLVING SYSTEMS OF POLYNOMIAL EQUATIONS
, 2002
Cited by 122 (10 self)
These are the lecture notes for ten lectures to be given at the CBMS
Toward privacy in public databases
- In TCC
, 2005
Cited by 66 (11 self)
We initiate a theoretical study of the census problem. Informally, in a census, individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents and utility of the sanitized data. Unlike in the study of secure function evaluation, in which privacy is preserved to the extent possible given a specific functionality goal, in the census problem privacy is paramount; intuitively, things that cannot be learned “safely” should not be learned at all. An important contribution of this work is a definition of privacy (and privacy compromise) for statistical databases, together with a method for describing and comparing the privacy offered by specific sanitization techniques. We obtain several privacy results using two different sanitization techniques, and then show how to combine them via cross training. We also obtain two utility results involving clustering.
Fastest Mixing Markov Chain on a Graph
- SIAM REVIEW
, 2003
Cited by 66 (13 self)
We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e. the mixing rate of the Markov chain, is determined by the second largest (in magnitude) eigenvalue of the transition matrix. In this paper we address the problem of assigning probabilities to the edges of the graph in such a way as to minimize the second largest magnitude eigenvalue, i.e., the problem of finding the fastest mixing Markov chain on the graph. We show that
Honest Exploration of Intractable Probability Distributions Via Markov Chain Monte Carlo
- STATISTICAL SCIENCE
, 2001
Cited by 55 (17 self)
Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burn-in? and (Q2) How long should the sampling continue after burn-in? Developing rigorous answers to these questions presently requires a detailed study of the convergence properties of the underlying Markov chain. Consequently, in most practical applications of MCMC, exact answers to (Q1) and (Q2) are not sought. The goal of this paper is to demystify the analysis that leads to honest answers to (Q1) and (Q2). The authors hope that this article will serve as a bridge between those developing Markov chain theory and practitioners using MCMC to solve practical problems. The ability to formally address (Q1) and (Q2) comes from establishing a drift condition and an associated minorization condition, which together imply that the underlying Markov chain is geometrically ergodic. In this paper, we explain exactly what drift and minorization are as well as how and why these conditions can be used to form rigorous answers to (Q1) and (Q2). The basic ideas are as follows. The results of Rosenthal (1995) and Roberts and Tweedie (1999) allow one to use drift and minorization conditions to construct a formula giving an analytic upper bound on the distance to stationarity. A rigorous answer to (Q1) can be calculated using this formula. The desired characteristics of the target distribution are typically estimated using ergodic averages. Geometric ergodicity of the underlying Markov chain implies that there are central limit theorems available for ergodic averages (Chan and Geyer 1994). The regenerative simulation technique (Mykland, Tierney and Yu 1995, Robert 1995) can be used to get a consistent estimate of the variance of the asymptotic nor...
Markov Chains and Polynomial time Algorithms
, 1994
Cited by 48 (0 self)
This paper outlines the use of rapidly mixing Markov chains in randomized polynomial time algorithms to approximately solve certain counting problems. These problems fall into two classes: combinatorial problems like counting the number of perfect matchings in certain graphs and geometric ones like computing the volumes of convex sets.
Variation of Cost Functions in Integer Programming
- MATHEMATICAL PROGRAMMING
, 1994
Cited by 37 (8 self)
We study the problem of minimizing $c \cdot x$ subject to $A \cdot x = b$, $x \geq 0$ and $x$ integral, for a fixed matrix $A$. Two cost functions $c$ and $c'$ are considered equivalent if they give the same optimal solutions for each $b$. We construct a polytope St(A) whose normal cones are the equivalence classes. Explicit inequality presentations of these cones are given by the reduced Gröbner bases associated with A. The union of the reduced Gröbner bases as c varies (called the universal Gröbner basis) consists precisely of the edge directions of St(A). We present geometric algorithms for computing St(A), the Graver basis [Gra], and the universal Gröbner basis.
Simulatable auditing
- In PODS
, 2005
Cited by 30 (4 self)
Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers – where each answer is either the true answer or “denied” (in the event that revealing the answer compromises privacy) – and given a new query, deny the answer if privacy may be breached or give the true answer otherwise. A related problem is the offline auditing problem where one is given a sequence of queries and all of their true answers and the goal is to determine if a privacy breach has already occurred. We uncover the fundamental issue that solutions to the offline auditing problem cannot be directly used to solve the online auditing problem since query denials may leak information. Consequently, we introduce a new model called simulatable auditing where query denials provably do not leak information. We demonstrate that max queries may be audited in this simulatable paradigm under the classical definition of privacy where a breach occurs if a sensitive value is fully compromised. We also introduce a probabilistic notion of (partial) compromise. Our privacy definition requires that the a priori probability that a sensitive value lies within some small interval is not that different from the posterior probability (given the query answers). We demonstrate that sum queries can be audited in a simulatable fashion under this privacy definition.
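The claim that query denials may leak information can be made concrete with a toy example for max queries. This is a sketch under assumptions, not the paper's algorithm: all function names are hypothetical, and the full-compromise test used here (a value is pinned down when it is the only element of some query that can attain that query's answer) is a simple bound-based argument chosen for illustration.

```python
import math

def extreme_counts(n, answered):
    """For each answered max query, count the indices that can attain its answer."""
    upper = [math.inf] * n  # upper[i]: smallest answer among queries containing i
    for indices, answer in answered:
        for i in indices:
            upper[i] = min(upper[i], answer)
    return [sum(1 for i in indices if upper[i] == answer)
            for indices, answer in answered]

def is_consistent(n, answered):
    """Some dataset matches all answers iff every query has an attaining index."""
    return all(count >= 1 for count in extreme_counts(n, answered))

def is_breached(n, answered):
    """A value is fully determined when a query has exactly one attaining index."""
    return any(count == 1 for count in extreme_counts(n, answered))

def naive_auditor(data, answered, query):
    """Applies the offline test to the TRUE answer, so denial depends on the data."""
    true_answer = max(data[i] for i in query)
    if is_breached(len(data), answered + [(query, true_answer)]):
        return "deny"
    return true_answer

def candidate_answers(answered):
    """One representative per position relative to the past answers."""
    values = sorted({answer for _, answer in answered})
    if not values:
        return [0.0]
    mids = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return [values[0] - 1] + values + mids + [values[-1] + 1]

def simulatable_auditor(data, answered, query):
    """Denies if ANY consistent answer would breach; ignores the true answer."""
    n = len(data)
    for candidate in candidate_answers(answered):
        trial = answered + [(query, candidate)]
        if is_consistent(n, trial) and is_breached(n, trial):
            return "deny"
    return max(data[i] for i in query)

if __name__ == "__main__":
    past = [((0, 1, 2), 9)]  # max(x0, x1, x2) was answered with 9
    for data in ([3, 5, 9], [9, 5, 3]):
        print(data, naive_auditor(data, past, (0, 1)),
              simulatable_auditor(data, past, (0, 1)))
```

With the naive auditor, the second query is denied exactly when the third value equals 9, so the denial itself reveals that value. The simulatable variant reaches its decision from the past queries and answers alone, so an attacker could have predicted the denial and learns nothing from it.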
Gröbner Bases and Polyhedral Geometry of Reducible and Cyclic Models
- J. Combin. Theory Ser. A
, 2002
Cited by 30 (9 self)
This article studies the polyhedral structure and combinatorics of polytopes that arise from hierarchical models in statistics, and shows how to construct Gröbner bases of toric ideals associated to a subset of such models. We study the polytopes for cyclic models, and we give a complete polyhedral description of these polytopes in the binary cyclic case. Further we show how to build Gröbner bases of a reducible model from the Gröbner bases of its pieces. This result also gives a different proof that decomposable models have quadratic Gröbner bases. Finally, we present the solution of a problem posed by Vlach [13] concerning the dimension of fibers coming from models corresponding to the boundary of a simplex.

