Results 1 - 10
of
120
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract
-
Cited by 170 (11 self)
- Add to MetaCart
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
Expander Graphs and their Applications
, 2003
"... Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . ..."
Abstract
-
Cited by 113 (4 self)
- Add to MetaCart
Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.1.3 De-randomizing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Magical Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.1 A Super Concentrator with O(n) edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.3 De-randomizing Random Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into ℓ1
- In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science
, 2005
"... In this paper we disprove the following conjecture due to Goemans [16] and Linial [24] (also see [5, 26]): “Every negative type metric embeds into ℓ1 with constant distortion.” We show that for every δ>0, and for large enough n, there is an n-point negative type metric which requires distortion at-l ..."
Abstract
-
Cited by 101 (6 self)
- Add to MetaCart
In this paper we disprove the following conjecture due to Goemans [16] and Linial [24] (also see [5, 26]): “Every negative type metric embeds into ℓ1 with constant distortion.” We show that for every δ>0, and for large enough n, there is an n-point negative type metric which requires distortion at-least (log log n) 1/6−δ to embed into ℓ1. Surprisingly, our construction is inspired by the Unique Games Conjecture (UGC) of Khot [19], establishing a previously unsuspected connection between PCPs and the theory of metric embeddings. We first prove that the UGC implies super-constant hardness results for (non-uniform) SPARSEST CUT and MINIMUM UNCUT problems. It is already known that the UGC also implies an optimal hardness result for MAXIMUM CUT [20]. Though these hardness results depend on the UGC, the integrality gap instances rely “only ” on the PCP reductions for the respective problems. Towards this, we first construct an integrality gap instance for a natural SDP relaxation of UNIQUE GAMES. Then, we “simulate ” the PCP reduction and “translate ” the integrality gap instance of UNIQUE GAMES to integrality gap instances for the respective cut problems! This enables us to prove a (log log n) 1/6−δ integrality gap for (non-uniform) SPARSEST CUT and MIN-IMUM UNCUT, and an optimal integrality gap for MAX-IMUM CUT. All our SDP solutions satisfy the so-called “triangle inequality ” constraints. This also shows, for the first time, that the triangle inequality constraints do not add any power to the Goemans-Williamson’s SDP relaxation of MAXIMUM CUT. The integrality gap for SPARSEST CUT immediately implies a lower bound for embedding negative type metrics into ℓ1. It also disproves the non-uniform version of Arora, Rao and Vazirani’s Conjecture [5], asserting that the integrality gap of the SPARSEST CUT SDP, with the triangle inequality constraints, is bounded from above by a constant.
Euclidean distortion and the Sparsest Cut
- In Proceedings of the 37th Annual ACM Symposium on Theory of Computing
, 2005
"... Bi-Lipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] ..."
Abstract
-
Cited by 77 (20 self)
- Add to MetaCart
Bi-Lipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] which shows that every
Measured descent: A new embedding method for finite metrics
- Geom. Funct. Anal
, 2004
"... We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. ..."
Abstract
-
Cited by 74 (20 self)
- Add to MetaCart
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure.
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract
-
Cited by 65 (6 self)
- Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
On the Hardness of Approximating Multicut and Sparsest-Cut
- In Proceedings of the 20th Annual IEEE Conference on Computational Complexity
, 2005
"... We show that the MULTICUT, SPARSEST-CUT, and MIN-2CNF ≡ DELETION problems are NP-hard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies inapproximability factor of Ω(log log n). 1. ..."
Abstract
-
Cited by 53 (2 self)
- Add to MetaCart
We show that the MULTICUT, SPARSEST-CUT, and MIN-2CNF ≡ DELETION problems are NP-hard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies inapproximability factor of Ω(log log n). 1.
A combinatorial, primal-dual approach to semidefinite programs
- In STOC
, 2007
"... Semidefinite programs (SDP) have been used in many recent approximation algorithms. We develop a general primal-dual approach to solve SDPs using a generalization of the well-known multiplicative weights update rule to symmetric matrices. For a number of problems, such as Sparsest Cut and Balanced S ..."
Abstract
-
Cited by 43 (5 self)
- Add to MetaCart
Semidefinite programs (SDP) have been used in many recent approximation algorithms. We develop a general primal-dual approach to solve SDPs using a generalization of the well-known multiplicative weights update rule to symmetric matrices. For a number of problems, such as Sparsest Cut and Balanced Separator in undirected and directed weighted graphs, and the Min UnCut problem, this yields combinatorial approximation algorithms that are significantly more efficient than interior point methods. The design of our primal-dual algorithms is guided by a robust analysis of rounding algorithms used to obtain integer solutions from fractional ones. 1
Can ISPs and P2P users cooperate for improved performance
- ACM SIGCOMM Computer Communication Review
, 2007
"... This paper addresses the antagonistic relationship between overlay/p2p networks and IPS providers: they both try to manage and control traffic at different level and with different goals, but in a way that inevitably leads to overlapping, duplicated, and conflicting behavior. The creation of a p2p n ..."
Abstract
-
Cited by 40 (2 self)
- Add to MetaCart
This paper addresses the antagonistic relationship between overlay/p2p networks and IPS providers: they both try to manage and control traffic at different level and with different goals, but in a way that inevitably leads to overlapping, duplicated, and conflicting behavior. The creation of a p2p network and the routing at the p2p layer are ultimately treading on the routing functions of ISPs. The paper proposes a solution to develop a synergistic relationship between p2p and ISPs: ISPs maintain an “oracle ” to help p2p networks in making better choices in picking neighboring nodes. The solution provides benefits to both parties. ISPs become able to influence the p2p decisions, and ultimately the amount of traffic that flows in and out of their network, while p2p networks get performance information for “free. ” The reviewers find that the problem is important and the solution is interesting and shows promise. An advantage of the method is that ISPs do not run into legal issues, since they do not engage in caching of potentially illegal content, they just provide performance information. a c m s i g c o m m Public review written by
The multiplicative weights update method: a meta algorithm and applications
, 2005
"... Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies ..."
Abstract
-
Cited by 37 (9 self)
- Add to MetaCart
Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies these disparate algorithms and drives them as simple instantiations of the meta algorithm. 1

