Results 1–10 of 22
DSybil: Optimal Sybil-Resistance for Recommendation Systems
, 2009
Abstract

Cited by 31 (4 self)
Recommendation systems can be attacked in various ways, and the ultimate attack form is reached with a sybil attack, where the attacker creates a potentially unlimited number of sybil identities to vote. Defending against sybil attacks is often quite challenging, and the nature of recommendation systems makes it even harder. This paper presents DSybil, a novel defense for diminishing the influence of sybil identities in recommendation systems. DSybil provides strong provable guarantees that hold even under the worst-case attack and are optimal. DSybil can defend against an unlimited number of sybil identities over time. DSybil achieves its strong guarantees by i) exploiting the heavy-tail distribution of the typical voting behavior of the honest identities, and ii) carefully identifying whether the system is already getting “enough help” from the (weighted) voters already taken into account or whether more “help” is needed. Our evaluation shows that DSybil would continue to provide high-quality recommendations even when a million-node botnet uses an optimal strategy to launch a sybil attack.
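The abstract's core idea, checking whether trusted voters already provide "enough help" before consulting unknown (possibly sybil) identities, can be sketched roughly as follows. This is a hypothetical illustration of the intuition only, not DSybil's actual algorithm; the function names and the weight threshold are my assumptions.

```python
# Toy sketch of the "enough help" idea: only issue a recommendation when the
# trusted voters already counted carry enough total weight; otherwise decline
# rather than lean on unknown identities. Threshold and names are illustrative.

def enough_help(weighted_votes, threshold=1.0):
    """weighted_votes: list of (weight, vote) pairs, vote in {+1, -1}."""
    total_weight = sum(w for w, _ in weighted_votes)
    return total_weight >= threshold

def recommend(weighted_votes):
    """Weighted-majority recommendation, or None when help is insufficient."""
    if not enough_help(weighted_votes):
        return None  # too little trusted weight: seek more help, don't guess
    score = sum(w * v for w, v in weighted_votes)
    return 1 if score >= 0 else -1

votes = [(0.6, 1), (0.5, 1), (0.3, -1)]
print(recommend(votes))  # total trusted weight 1.4 suffices; majority is +1
```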
Benefits of bias: Towards better characterization of network sampling
 In SIGKDD
, 2011
Abstract

Cited by 19 (0 self)
From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they “push” the sampling process towards inclusion of desired properties. Finally, we describe how these sampling biases can be exploited in several real-world applications, including disease outbreak detection and market research.
Fast Nearest-Neighbor Search in Disk-Resident Graphs
, 2010
Abstract

Cited by 12 (1 self)
Link prediction, personalized graph search, fraud detection, and many such graph mining problems revolve around the computation of the most “similar” k nodes to a given query node. One widely used class of similarity measures is based on random walks on graphs, e.g., personalized PageRank, hitting and commute times, and SimRank. There are two fundamental problems associated with these measures. First, existing online algorithms typically examine the local neighborhood of the query node, which can become significantly slower whenever high-degree nodes are encountered (a common phenomenon in real-world graphs). We prove that turning high-degree nodes into sinks results in only a small approximation error, while greatly improving running times. The second problem is that of computing similarities at query time when the graph is too large to be memory-resident. The obvious solution is to split the graph into clusters of nodes and store each cluster on a disk page; ideally random walks will rarely cross cluster boundaries and cause page faults. Our contributions here are twofold: (a) we present an efficient deterministic algorithm to find the k closest neighbors (in terms of personalized PageRank) of any query node in such a clustered graph, and (b) we develop a clustering algorithm (RWDISK) that uses only sequential sweeps over data files. Empirical results on several large publicly available graphs like DBLP, Citeseer and LiveJournal (∼90M edges) demonstrate that turning high-degree nodes into sinks not only improves running time of RWDISK by a factor of 3 but also boosts link prediction accuracy by a factor of 4 on average. We also show that RWDISK returns more desirable (high conductance and small size) clusters than the popular clustering algorithm METIS, while requiring much less memory. Finally, our deterministic algorithm for computing nearest neighbors incurs far fewer page faults (factor of 5) than actually simulating random walks.
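The sink trick described above can be illustrated with a small power-iteration sketch of personalized PageRank: any node whose out-degree exceeds a cap keeps absorbing the probability mass that reaches it instead of spreading it further. This is my own toy rendering of the idea under assumed parameter names, not the paper's algorithm.

```python
# Sketch: personalized PageRank by power iteration on an adjacency-list graph.
# Nodes whose out-degree exceeds `max_degree` are treated as sinks (their
# outgoing mass is dropped), mirroring the abstract's high-degree-sink idea.
# alpha is the restart probability; all parameter values are illustrative.

def personalized_pagerank(adj, source, alpha=0.15, max_degree=2, iters=50):
    nodes = list(adj)
    pr = {v: 0.0 for v in nodes}
    pr[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass always returns to the query node
        for v in nodes:
            out = adj[v]
            if not out or len(out) > max_degree:
                continue  # dangling or high-degree node: acts as a sink
            share = (1 - alpha) * pr[v] / len(out)
            for u in out:
                nxt[u] += share
        pr = nxt
    return pr

adj = {"a": ["b", "c"], "b": ["a"], "c": [], "hub": ["a", "b", "c"]}
scores = personalized_pagerank(adj, "a")
# the k nearest neighbours of "a" are the k nodes with the highest scores
```

Because sinks absorb mass, the scores no longer sum to one; for ranking nearest neighbours of the query node, only their relative order matters.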
A Sharp PageRank Algorithm with Applications to Edge Ranking and Graph Sparsification
, 2010
Abstract

Cited by 7 (5 self)
We give an improved algorithm for computing personalized PageRank vectors with tight error bounds which can be as small as Ω(n^{−p}) for any fixed positive integer p. The improved PageRank algorithm is crucial for computing a quantitative ranking of edges in a given graph. We will use the edge ranking to examine two interrelated problems – graph sparsification and graph partitioning. We can combine the graph sparsification and the partitioning algorithms using PageRank vectors to derive an improved partitioning algorithm.
Graph theory in the information age
, 2009
Abstract

Cited by 6 (0 self)
In the past decade, graph theory has gone through a remarkable shift and a profound transformation. The change is in large part due to the humongous amount of information that we are confronted with. A main way to sort through massive data sets is to build and examine the network formed by interrelations. For example, Google’s successful web search algorithms are based on the WWW graph which contains all Web pages as vertices and hyperlinks as edges. There are all sorts of information networks such as biological networks built from biological databases and social networks formed by email, phone calls, instant messaging, etc., as well as various types of physical networks. Of particular interest to mathematicians is the collaboration graph which is based on the data of Mathematical Reviews. In the collaboration graph, every mathematician is a vertex and two mathematicians who wrote a joint paper are connected by an edge. Figure 1 illustrates a portion of the collaboration graph consisting of about 5000 vertices, representing mathematicians with Erdős number 2 (i.e., who wrote a paper with a coauthor of Paul Erdős).
Manipulation of PageRank and Collective Hidden Markov Models
, 2010
Abstract

Cited by 5 (2 self)
The first part of this thesis explores issues surrounding the manipulation of PageRank, a popular link-analysis-based reputation system for the web. PageRank is an essential part of web search, but it is also subject to manipulation by selfish web authors. We develop an alternative to PageRank, based on expected hitting time in a random walk, that is provably robust to manipulation by outlinks. We then study the effects of manipulation on the network itself by analyzing the stable outcomes of the PageRank Game, a network-formation model where web pages place outlinks strategically to maximize PageRank. The second part of the thesis explores probabilistic inference algorithms for a family of models called collective hidden Markov models. These generalize hidden Markov models (HMMs) to the situation in which one views partial information about many indistinguishable objects that all behave according to the same Markovian dynamics. Collective HMMs are motivated by an important problem in ecology: inferring bird migration paths from a large database of observations.
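The quantity the thesis proposes as a manipulation-resistant alternative, expected hitting time in a random walk, can be estimated with a short Monte Carlo sketch. This toy estimator only illustrates the quantity itself; the graph, cutoff, and seed are my assumptions, and the thesis's actual algorithms are different.

```python
# Hedged sketch: estimate the expected hitting time from `start` to `target`
# (mean number of random-walk steps until the target is first reached) by
# simulating walks. Walks are truncated at `cutoff` steps, which slightly
# biases long-walk cases; parameters are illustrative only.

import random

def estimate_hitting_time(adj, start, target, walks=2000, cutoff=100, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        v, steps = start, 0
        while v != target and steps < cutoff:
            v = rng.choice(adj[v])  # uniform random neighbour
            steps += 1
        total += steps
    return total / walks

# path graph a - b - c: the exact hitting time from a to c is 4
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h = estimate_hitting_time(adj, "a", "c")
```

Intuitively, a page cannot lower its own hitting time by rewiring its outlinks, since hitting time is determined by walks arriving at the page, which is the robustness property the thesis exploits.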
Network Reputation Games
, 2008
Abstract

Cited by 4 (0 self)
Originally, hyperlinks on the web were placed for organic reasons, presumably to aid navigation or identify a resource deemed relevant by the human author. However, link-based reputation measures used by search engines (e.g., PageRank) have altered the dynamics of link placement by introducing new incentives into the system. Strategic authors — spammers and others — now explicitly attempt to boost their own PageRank by careful link placement. This paper investigates the consequences of such strategic behavior via a network formation game. Our model assumes that authors may place outlinks arbitrarily, but have no control over their inlinks, and their objective is to maximize reputation. What is the best link-placement strategy? What are the equilibrium outcomes? What properties do equilibria possess? We show that two similar reputation measures — PageRank and hitting time — lead to dramatically different equilibrium outcomes. Since hitting time is immune to strategic placement of outlinks, any directed graph is a Nash equilibrium. On the other hand, equilibria in the PageRank game have a very rich structure: unless the links are delicately balanced, some page can increase its PageRank by dropping all of its links and pointing to just one carefully chosen page. Every equilibrium has a core in which all edges are bidirectional. In a slightly restricted setting, equilibria are characterized exactly by simple properties, the essential of which is a combinatorial equivalence among all (bidirectional) edges called edgewise walk-regularity. We also demonstrate surprising algebraic properties of equilibria, relating eigenvalues and their multiplicities to graph structure.
On the Sybil-proofness of accounting mechanisms
 In Proc. NetEcon
, 2011
Abstract

Cited by 4 (1 self)
A common challenge in distributed work systems like P2P file-sharing communities or ad-hoc routing networks is to minimize the number of free-riders and incentivize contributions. Without any centralized monitoring it is difficult to distinguish contributors from free-riders. One way to address this problem is via accounting mechanisms which rely on voluntary reports by individual agents and compute a score for each agent in the network. In Seuken et al. [11], we have recently proposed a mechanism which removes any incentive for a user to manipulate the mechanism via misreports. However, we left the existence of sybil-proof accounting mechanisms as an open question. In this paper, we settle this question, and show the striking impossibility result that under reasonable assumptions no sybil-proof accounting mechanism exists. We show that a significantly weaker form of K-sybilproofness can be achieved against certain classes of sybil attacks. Finally, we explain how limited robustness to sybil manipulations can be achieved by using max-flow algorithms in accounting mechanism design.
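The max-flow idea mentioned at the end can be made concrete with a small sketch: score an agent by the maximum flow from a trusted source through the graph of reported contributions. This is my own illustration of the flow-based intuition, not the paper's mechanism; the example graph and capacities are invented.

```python
# Rough illustration: however many sybil nodes an attacker adds behind its
# own edges, the max-flow score of any sybil is capped by the cut at the
# attacker's real edges into the rest of the graph.

from collections import deque
import copy

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity graph."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:   # BFS for a shortest augmenting path
            u = q.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                # reconstruct the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)
        for u, v in path:              # augment along it
            cap[u][v] -= b
            cap.setdefault(v, {}).setdefault(u, 0)
            cap[v][u] += b
        flow += b

# the trusted seed has observed only 2 units of real work by x;
# x spawns sybils s1, s2 and reports huge contributions to them
cap = {"seed": {"x": 2}, "x": {"s1": 100, "s2": 100}}
print(max_flow(copy.deepcopy(cap), "seed", "s1"))  # capped at 2
```

However large the fabricated edges among the sybils, every path from the seed to a sybil crosses the seed→x edge, so the score cannot exceed its capacity.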
Hybrid Transitive Trust Mechanisms
, 2010
Abstract

Cited by 3 (0 self)
Establishing trust amongst agents is of central importance to the development of well-functioning multiagent systems. For example, the anonymity of transactions on the Internet can lead to inefficiencies; e.g., a seller on eBay failing to ship a good as promised, or a user free-riding on a file-sharing network. Trust (or reputation) mechanisms can help by aggregating and sharing trust information between agents. Unfortunately these mechanisms can often be manipulated by strategic agents. Existing mechanisms are either very robust to manipulation (i.e., manipulations are not beneficial for strategic agents), or they are very informative (i.e., good at aggregating trust data), but never both. This paper explores the tradeoff between these competing desiderata. First, we introduce a metric to evaluate the informativeness of existing trust mechanisms. We then show analytically that trust mechanisms can be combined to generate new hybrid mechanisms with intermediate robustness properties. We establish through simulation that hybrid mechanisms can achieve higher overall efficiency in environments with risky transactions and mixtures of agent types (some cooperative, some malicious, and some strategic) than any previously known mechanism.
Evaluating Multi-Way Joins over Discounted Hitting Time
Abstract

Cited by 3 (2 self)
The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, is useful in various applications, including link prediction, collaborative recommendation, and reputation ranking. We examine a novel query, called the multi-way join (or n-way join), on DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a set of n-tuples with the k highest scores, according to some aggregation function of DHT values. This query enables analysis and prediction of complex relationships among n sets of nodes. Since an n-way join is expensive to compute, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since PJ may necessitate the computation of top-(m+1) 2-way joins, we study an incremental solution, which allows the top-(m+1) 2-way join to be derived quickly from the top-m 2-way join results computed earlier. We further examine fast processing and pruning algorithms for 2-way joins. An extensive evaluation on three real datasets shows that PJ accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.
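The DHT score the joins operate over can be sketched with a short Monte Carlo estimator. I assume the common discounted form E[γ^T], where T is the hitting time from source to target (score 1 at distance 0, decaying with distance); the paper's exact definition and algorithms may differ, and the example graph is invented.

```python
# Sketch only: Monte Carlo estimate of the discounted hitting time between
# two nodes, under the assumed definition DHT = E[gamma^T]. Walks that never
# reach the target within the cutoff contribute 0; parameters illustrative.

import random

def dht(adj, start, target, gamma=0.8, walks=3000, cutoff=50, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        v, steps = start, 0
        while v != target and steps < cutoff:
            v = rng.choice(adj[v])  # uniform random-walk step
            steps += 1
        if v == target:
            total += gamma ** steps
    return total / walks

# path graph a - b - c: "b" is closer to "c" than "a" is
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
near = dht(adj, "b", "c")
far = dht(adj, "a", "c")
# nodes nearer the target receive higher DHT scores
```

An n-way join would then aggregate such pairwise scores (e.g., by sum or product) over candidate n-tuples and keep the k highest.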