Results 1 - 10 of 134
De-anonymizing social networks, 2009
"... Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present ..."
Abstract
-
Cited by 216 (6 self)
- Add to MetaCart
(Show Context)
Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.
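The abstract does not spell out the algorithm; as a minimal, hypothetical sketch of the general idea of topology-only matching, the toy code below pairs nodes of an auxiliary graph with nodes of an anonymized graph by comparing neighbour-degree signatures. It is far weaker than the paper's seed-and-propagate approach, and all names are illustrative.

```python
# Toy illustration only: match nodes across two graphs by comparing the sorted
# degrees of their neighbours. Real de-anonymization is much more involved.
def degree_signature(adj, node):
    """Sorted degrees of a node's neighbours, a crude structural fingerprint."""
    return sorted(len(adj[nbr]) for nbr in adj[node])

def match_by_signature(adj_aux, adj_anon):
    """Greedily pair each auxiliary node with the closest unused anonymous node."""
    def dist(a, b):
        n = max(len(a), len(b))
        a = a + [0] * (n - len(a))          # pad the shorter signature
        b = b + [0] * (n - len(b))
        return sum(abs(x - y) for x, y in zip(a, b))

    sig_anon = {v: degree_signature(adj_anon, v) for v in adj_anon}
    mapping, used = {}, set()
    for u in sorted(adj_aux, key=lambda u: -len(adj_aux[u])):   # high degree first
        sig_u = degree_signature(adj_aux, u)
        best = min((v for v in adj_anon if v not in used),
                   key=lambda v: dist(sig_u, sig_anon[v]), default=None)
        if best is not None:
            mapping[u] = best
            used.add(best)
    return mapping

# Two labelings of the same 4-node graph with edges a-b, a-c, c-d.
aux  = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice", "dave"], "dave": ["carol"]}
anon = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(match_by_signature(aux, anon))   # e.g. {'alice': 1, 'carol': 3, 'bob': 2, 'dave': 4}
```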
Towards identity anonymization on graphs. In Proceedings of ACM SIGMOD, 2008
"... The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itsel ..."
Abstract
-
Cited by 123 (6 self)
- Add to MetaCart
(Show Context)
The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its most basic form the degrees of the nodes, can reveal the identities of individuals. To address this issue, we study a specific graph-anonymization problem. We call a graph k-degree anonymous if for every node v, there exist at least k−1 other nodes in the graph with the same degree as v. This definition of anonymity prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes. We formally define the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations. We devise simple and efficient algorithms for solving this problem. Our algorithms are based on principles related to the realizability of degree sequences. We apply our methods to a large spectrum of synthetic and real datasets and demonstrate their efficiency and practical utility.
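The k-degree anonymity condition above is a simple predicate on the degree sequence. A minimal sketch, assuming an adjacency-list representation (the function name is ours, not the paper's):

```python
from collections import Counter

def is_k_degree_anonymous(adj, k):
    """True if every degree occurring in the graph is shared by at least k nodes,
    i.e. each node has at least k-1 others with exactly its degree."""
    degree_counts = Counter(len(neighbours) for neighbours in adj.values())
    return all(count >= k for count in degree_counts.values())

# A 4-cycle: all four nodes have degree 2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(is_k_degree_anonymous(cycle, 4))   # True
print(is_k_degree_anonymous(cycle, 5))   # False
```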
Resisting Structural Re-identification in Anonymized Social Networks, 2008
"... We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked dat ..."
Abstract
-
Cited by 105 (6 self)
- Add to MetaCart
(Show Context)
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
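A minimal sketch of the model-then-sample idea described above, assuming the node partition into blocks is given as input (the paper derives its own generalization), with illustrative names: record block-level edge counts, then draw a random graph consistent with them.

```python
import random
from itertools import combinations

def summarize(adj, partition):
    """Aggregate model: count edges within and between blocks of a node partition."""
    block_of = {v: b for b, block in enumerate(partition) for v in block}
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:                                   # count each undirected edge once
                key = tuple(sorted((block_of[u], block_of[v])))
                counts[key] = counts.get(key, 0) + 1
    return counts

def sample_graph(partition, counts, rng=random):
    """Draw one random graph consistent with the published block-level edge counts."""
    edges = set()
    for (a, b), m in counts.items():
        if a == b:
            candidates = list(combinations(partition[a], 2))
        else:
            candidates = [(u, v) for u in partition[a] for v in partition[b]]
        edges.update(rng.sample(candidates, m))
    return edges

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
partition = [[0, 1], [2, 3]]
counts = summarize(adj, partition)
print(counts)                 # {(0, 0): 1, (0, 1): 1, (1, 1): 1}
print(sample_graph(partition, counts))
```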
Characterizing Privacy in Online Social Networks
"... Online social networks (OSNs) with half a billion users have dramatically raised concerns on privacy leakage. Users, often willingly, share personal identifying information about themselves, but do not have a clear idea of who accesses their private information or what portion of it really needs to ..."
Abstract
-
Cited by 94 (8 self)
- Add to MetaCart
(Show Context)
Online social networks (OSNs) with half a billion users have dramatically raised concerns about privacy leakage. Users, often willingly, share personal identifying information about themselves, but do not have a clear idea of who accesses their private information or what portion of it really needs to be accessed. In this study we examine popular OSNs from the viewpoint of characterizing potential privacy leakage. Our study identifies what bits of information are currently being shared, how widely, and what users can do to prevent such sharing. We also examine the role of third-party sites that track OSN users and compare it with privacy leakage on popular traditional Web sites. Our long-term goal is to identify the narrow set of private information that users really need to share to accomplish specific interactions on OSNs.
Anonymizing Bipartite Graph Data using Safe Groupings, 2008
"... Private data often comes in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite graph. As with tabular data, it is desirable to be able to publish anonymized versions of such data, ..."
Abstract
-
Cited by 56 (4 self)
- Add to MetaCart
Private data often comes in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite graph. As with tabular data, it is desirable to be able to publish anonymized versions of such data, to allow others to perform ad hoc analysis of aggregate graph properties. However, existing tabular anonymization techniques do not give useful or meaningful results when applied to graphs: small changes or masking of the edge structure can radically change aggregate graph properties. We introduce a new family of anonymizations, for bipartite graph data, called (k, ℓ)-groupings. These groupings preserve the underlying graph structure perfectly, and instead anonymize the mapping from entities to nodes of the graph. We identify a class of “safe” (k, ℓ)-groupings that have provable guarantees to resist a variety of attacks, and show how to find such safe groupings. We perform experiments on real bipartite graph data to study the utility of the anonymized version, and the impact of publishing alternate groupings of the same graph data. Our experiments demonstrate that (k, ℓ)-groupings offer strong tradeoffs between privacy and utility.
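A minimal sketch of the publishing side of a (k, ℓ)-grouping, under the assumption of a naive random chunking rather than the paper's safe-grouping construction: the edge structure is released exactly, while entities are only tied to groups of size k (customers) and ℓ (products). Names are illustrative.

```python
import random

def chunk(items, size, rng=random):
    """Randomly partition items into groups of the given size."""
    items = list(items)
    rng.shuffle(items)
    return [items[i:i + size] for i in range(0, len(items), size)]

def publish_kl_grouping(edges, customers, products, k, l, rng=random):
    """Release the exact edge set over opaque ids, plus tables pairing each set of
    ids with the set of entities it contains (without saying which id is which)."""
    cust_id = {c: i for i, c in enumerate(rng.sample(list(customers), len(customers)))}
    prod_id = {p: i for i, p in enumerate(rng.sample(list(products), len(products)))}
    masked_edges = [(cust_id[c], prod_id[p]) for c, p in edges]      # structure preserved
    cust_groups = [(sorted(cust_id[c] for c in g), sorted(g)) for g in chunk(customers, k, rng)]
    prod_groups = [(sorted(prod_id[p] for p in g), sorted(g)) for g in chunk(products, l, rng)]
    return masked_edges, cust_groups, prod_groups

edges = [("ann", "aspirin"), ("bob", "aspirin"), ("cat", "ibuprofen"), ("dan", "ibuprofen")]
customers, products = ["ann", "bob", "cat", "dan"], ["aspirin", "ibuprofen"]
print(publish_kl_grouping(edges, customers, products, k=2, l=1))
```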
A Clustering Approach for Data and Structural Anonymity in Social Networks. In Privacy, Security, and Trust in KDD Workshop (PinKDD), 2008
"... The advent of social network sites in the last few years seems to be a trend that will likely continue in the years to come. Online social interaction has become very popular around the globe and most sociologists agree that this will not fade away. Such a development is possible due to the advancem ..."
Abstract
-
Cited by 54 (5 self)
- Add to MetaCart
(Show Context)
The advent of social network sites in the last few years seems to be a trend that will likely continue in the years to come. Online social interaction has become very popular around the globe and most sociologists agree that this will not fade away. Such a development is possible due to the advancements in computer power, technologies, and the spread of the World Wide Web. What many naïve technology users may not always realize is that the information they provide online is stored in massive data repositories and may be used for various purposes. Researchers have pointed out for some time the privacy implications of massive data gathering, and a lot of effort has been made to protect the data from unauthorized disclosure. However, most of the data privacy research has been focused on more traditional data models such as microdata (data stored as one relational table, where each row represents an individual entity). More recently, social network data has begun to be analyzed from a different, specific privacy perspective. Since the individual entities in social networks, besides the attribute values that characterize them, also have relationships with other entities, the possibility of privacy breaches increases. Our main contributions in this paper are the development of a greedy privacy algorithm for anonymizing a social network and the introduction of a structural information loss measure that quantifies the amount of information lost due to edge generalization in the anonymization process.
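As a rough, hypothetical sketch of the clustering flavour of anonymization and of an edge-generalization loss measure (neither is the paper's actual greedy algorithm or its structural information loss), one might write:

```python
def greedy_clusters(adj, k):
    """Fill clusters of size k greedily, taking nodes in decreasing degree order
    (the last cluster may be smaller if the node count is not a multiple of k)."""
    order = sorted(adj, key=lambda v: -len(adj[v]))
    return [order[i:i + k] for i in range(0, len(order), k)]

def structural_loss(adj, clusters):
    """Crude edge-generalization proxy: within each cluster, the gap between the
    number of possible intra-cluster edges and the number actually present."""
    loss = 0
    for cluster in clusters:
        members = set(cluster)
        present = sum(1 for u in cluster for v in adj[u] if v in members) // 2
        possible = len(cluster) * (len(cluster) - 1) // 2
        loss += possible - present
    return loss

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2], 4: [5], 5: [4]}
clusters = greedy_clusters(adj, k=2)
print(clusters, structural_loss(adj, clusters))   # e.g. [[0, 2], [1, 3], [4, 5]] 1
```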
Relationship privacy: Output perturbation for queries with joins. In ACM Symposium on Principles of Database Systems, 2009
"... We study privacy-preserving query answering over data containing relationships. A social network is a prime example of such data, where the nodes represent individuals and edges represent relationships. Nearly all interesting queries over social networks involve joins, and for such queries, existing ..."
Abstract
-
Cited by 53 (8 self)
- Add to MetaCart
(Show Context)
We study privacy-preserving query answering over data containing relationships. A social network is a prime example of such data, where the nodes represent individuals and edges represent relationships. Nearly all interesting queries over social networks involve joins, and for such queries, existing output perturbation algorithms severely distort query answers. We propose an algorithm that significantly improves utility over competing techniques, typically reducing the error bound from polynomial in the number of nodes to polylogarithmic. The algorithm is, to the best of our knowledge, the first to answer such queries with acceptable accuracy, even for worst-case inputs. The improved utility is achieved by relaxing the privacy condition. Instead of ensuring strict differential privacy, we guarantee a weaker (but still quite practical) condition based on adversarial privacy. To explain precisely the nature of our relaxation in privacy, we provide a new result that characterizes the relationship between ε-indistinguishability (a variant of the differential privacy definition) and adversarial privacy, which is of independent interest: an algorithm is ε-indistinguishable iff it is private for a particular class of adversaries (defined precisely herein). Our perturbation algorithm guarantees privacy against adversaries in this class whose prior distribution is numerically bounded.
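For contrast with the paper's approach, the sketch below shows only the naive Laplace output-perturbation baseline on a join-heavy query (triangle counting), where the global sensitivity, and hence the noise, grows polynomially in the number of nodes; the query choice and noise calibration are illustrative assumptions, not the paper's algorithm.

```python
import random

def laplace_noise(scale, rng=random):
    """One Laplace(0, scale) sample: the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def triangle_count(adj):
    """A join-heavy query: the number of triangles in an undirected graph."""
    return sum(1 for u in adj for v in adj[u] for w in adj[v]
               if u < v < w and w in adj[u])

def noisy_triangle_count(adj, epsilon):
    """Naive output perturbation with global sensitivity: deleting one edge can
    destroy up to n-2 triangles, so the noise scale is polynomial in the number
    of nodes -- the distortion on join queries that the paper sets out to reduce."""
    sensitivity = len(adj) - 2
    return triangle_count(adj) + laplace_noise(sensitivity / epsilon)

adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}    # K4 has 4 triangles
print(triangle_count(adj), noisy_triangle_count(adj, epsilon=0.5))
```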
K-Automorphism: A general framework for privacy preserving network publication. In VLDB, 2009
"... The growing popularity of social networks has generated interesting data management and data mining problems. An important concern in the release of these data for study is their privacy, since social networks usually contain personal information. Simply removing all identifiable personal informatio ..."
Abstract
-
Cited by 50 (1 self)
- Add to MetaCart
(Show Context)
The growing popularity of social networks has generated interesting data management and data mining problems. An important concern in the release of these data for study is their privacy, since social networks usually contain personal information. Simply removing all identifiable personal information (such as names and social security numbers) before releasing the data is insufficient: it is easy for an attacker to identify the target by performing different structural queries. In this paper we propose k-automorphism to protect against multiple structural attacks and develop an algorithm (called KM) that ensures k-automorphism. We also discuss an extension of KM to handle “dynamic” releases of the data. Extensive experiments show that the algorithm performs well in terms of the protection it provides.
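The definition can be checked directly once candidate automorphisms are in hand; the sketch below verifies k-automorphism for a given set of permutations (it does not reproduce the KM construction, and the helper names are ours).

```python
def is_automorphism(adj, perm):
    """perm (a dict node -> node) is an automorphism if it maps the edge set onto itself."""
    edges = {(u, v) for u in adj for v in adj[u]}
    return {(perm[u], perm[v]) for (u, v) in edges} == edges

def satisfies_k_automorphism(adj, perms, k):
    """Check the definition: together with the identity, the given automorphisms
    send every node to at least k distinct images."""
    if len(perms) < k - 1 or not all(is_automorphism(adj, p) for p in perms):
        return False
    return all(len({v} | {p[v] for p in perms}) >= k for v in adj)

# A 4-cycle 0-1-2-3-0 is 2-automorphic via the rotation v -> v+1 (mod 4).
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
rotation = {0: 1, 1: 2, 2: 3, 3: 0}
print(satisfies_k_automorphism(cycle, [rotation], k=2))   # True
```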
Accurate Estimation of the Degree Distribution of Private Networks
"... Abstract—We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy, and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical ..."
Abstract
-
Cited by 48 (6 self)
- Add to MetaCart
(Show Context)
We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical analysis shows that the error scales linearly with the number of unique degrees, whereas the error of conventional techniques scales linearly with the number of nodes. We complement the theoretical analysis with a thorough empirical analysis on real and synthetic graphs, showing that the algorithm’s variance and bias are low, that the error diminishes as the size of the input graph increases, and that common analyses like fitting a power law can be carried out very accurately. Keywords: privacy; social networks; privacy-preserving data mining; differential privacy.
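The sketch below shows only the baseline the paper improves on: Laplace noise added to every cell of the degree histogram. The constrained post-processing that yields error proportional to the number of unique degrees is not reproduced, and the sensitivity bound in the comment is our assumption for edge-level neighbouring graphs.

```python
import random
from collections import Counter

def laplace_noise(scale, rng=random):
    """One Laplace(0, scale) sample: the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_degree_histogram(adj, epsilon, rng=random):
    """Baseline mechanism: Laplace noise on every cell of the degree histogram.
    Adding or removing one edge shifts two nodes' degrees by one, changing at most
    four cells by one each, so the L1 sensitivity is at most 4 (assuming edge-level
    neighbouring graphs)."""
    hist = Counter(len(neighbours) for neighbours in adj.values())
    scale = 4.0 / epsilon
    return {d: hist.get(d, 0) + laplace_noise(scale, rng) for d in range(len(adj))}

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
print(noisy_degree_histogram(adj, epsilon=1.0))
```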
A Brief Survey on Anonymization Techniques for Privacy Preserving Publishing of Social Network Data. In SIGKDD Explorations
"... Nowadays, partly driven by many Web 2.0 applications, more and more social network data has been made publicly available and analyzed in one way or another. Privacy preserving publishing of social network data becomes a more and more important concern. In this paper, we present a brief yet systemati ..."
Abstract
-
Cited by 48 (2 self)
- Add to MetaCart
(Show Context)
Nowadays, partly driven by many Web 2.0 applications, more and more social network data has been made publicly available and analyzed in one way or another. Privacy-preserving publishing of social network data is becoming an increasingly important concern. In this paper, we present a brief yet systematic review of the existing anonymization techniques for privacy-preserving publishing of social network data. We identify the new challenges in privacy-preserving publishing of social network data compared to the extensively studied relational case, and examine the possible problem formulations along three important dimensions: privacy, background knowledge, and data utility. We survey the anonymization methods for privacy preservation in two categories: clustering-based approaches and graph-modification approaches.