## Local correlation clustering

### Citations

450 | Clustering gene expression patterns
- Ben-Dor, Shamir, et al.
- 1999
Citation Context ...lidean distance of vectors. Thanks to this generality, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [17,27], biology [11], image segmentation [30] and social networks [12]. Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead it automatically finds the optimal ...

400 | Probabilistic computations: Toward a unified measure of complexity
- Yao
- 1977
Citation Context ...on, and then we look at which cluster contains this piece in the best coarsening of the partition. The details are in Section 5. The lower bounds, proven in Section 6, are applications of Yao’s lemma [42]. Broadly speaking, we give the candidate algorithm a perfect clustering of most vertices of the graph into t = O(1/ε) clusters of equal size, and for each of the remaining vertices a “secret” cluster...
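The "Yao's lemma" invoked here is Yao's minimax principle; in the form typically used for query lower bounds it can be stated as:

```latex
% Yao's minimax principle: for every input distribution \mu, the expected
% cost of the best deterministic algorithm on \mu lower-bounds the
% worst-case expected cost of every randomized algorithm.
\max_{\mu}\;
\min_{A \in \mathcal{A}_{\mathrm{det}}}
\operatorname*{\mathbb{E}}_{x \sim \mu}\bigl[\operatorname{cost}(A, x)\bigr]
\;\le\;
\min_{R \in \mathcal{A}_{\mathrm{rand}}}
\max_{x}\;
\mathbb{E}\bigl[\operatorname{cost}(R, x)\bigr]
```

So a lower bound is proved by exhibiting a hard input distribution (here, a planted clustering with "secret" labels for some vertices) on which every deterministic algorithm must incur high expected cost.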

331 | Correlation clustering
- Bansal, Blum, et al.
- 2004
Citation Context ...MaxAgree and MinDisagree, while MaxAgree[k] and MinDisagree[k] refer to the variants of the problem with a bound k on the number of clusters. Not surprisingly, MaxAgree and MinDisagree are NP-complete [10,37]; the same holds for their bounded counterparts, provided that k ≥ 2. Therefore approximate solutions are of interest. For MaxAgree, there is a (randomized) PTAS: the first such result was due to Bans...

250 | Szemerédi’s Regularity Lemma and its applications in graph theory
- Komlos, Simonovits
- 1996
Citation Context ...ity lemma. One of the cornerstone results in graph theory is the regularity lemma of Szemerédi, which has found a myriad of applications in combinatorics, number theory and theoretical computer science [31]. It asserts that every graph G can be approximated by a small collection of random bipartite graphs; in fact, from G we can construct a small “reduced” weighted graph G̃ of constant size which inherit...

226 | Aggregating inconsistent information: Ranking and clustering
- Ailon, Charikar, et al.
- 2008

145 | Quick approximation to matrices and applications
- Frieze, Kannan
- 1999
Citation Context ...of the clustering found by working with the reduced graph, even if no heuristics were applied. To address these issues, we opt to use a weaker variant of the regularity lemma due to Frieze and Kannan [21,22]. It has better quantitative parameters and gives an implicit description of the partition, which opens the door for local clustering. 5.2 Cut decompositions of matrices. The idea of Frieze and Kannan ...
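As a toy illustration of what a cut decomposition is (not the sampling-based Frieze–Kannan algorithm itself), the sketch below greedily peels cut matrices d·1_R 1_Sᵀ off a matrix and measures the residual in the cut norm. The function names are mine, and the exhaustive subset search is only feasible for tiny matrices; the real lemma finds the cuts by sampling.

```python
import itertools
import numpy as np

def cut_norm_bruteforce(M):
    """Cut norm ||M||_C = max over row set R and column set S of
    |sum_{i in R, j in S} M[i, j]|, by exhaustive search (tiny matrices only)."""
    n, m = M.shape
    best = 0.0
    for R in itertools.product([0, 1], repeat=n):
        r = np.array(R, dtype=float)
        for S in itertools.product([0, 1], repeat=m):
            s = np.array(S, dtype=float)
            best = max(best, abs(r @ M @ s))
    return best

def greedy_cut_decomposition(A, eps, max_terms=8):
    """Peel off cut matrices d * outer(1_R, 1_S) until the residual's cut
    norm drops below eps * n * m. The Frieze-Kannan lemma guarantees
    O(1/eps^2) terms suffice; they locate the cuts by sampling, whereas
    this sketch locates them exhaustively."""
    n, m = A.shape
    residual = A.astype(float).copy()
    terms = []
    for _ in range(max_terms):
        best, best_rs = 0.0, None
        for R in itertools.product([0, 1], repeat=n):
            r = np.array(R, dtype=float)
            for S in itertools.product([0, 1], repeat=m):
                s = np.array(S, dtype=float)
                v = abs(r @ residual @ s)
                if v > best:
                    best, best_rs = v, (r, s)
        if best <= eps * n * m:
            break  # residual is already small in cut norm
        r, s = best_rs
        d = (r @ residual @ s) / (r.sum() * s.sum())  # mean of residual on R x S
        terms.append((d, r, s))
        residual -= d * np.outer(r, s)
    return terms, residual
```

On a matrix that is a single planted block, one cut matrix already captures everything, which is exactly the "implicit description of the partition" the passage alludes to.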

122 | Clustering with qualitative information
- Charikar, Guruswami, et al.
- 2003
Citation Context ...(O(1/ε)), later improved to n · 2^poly(1/ε) by Giotis and Guruswami [25]. The latter also presented a PTAS for MaxAgree[k] that runs in time n · k^O(ε^−3 log(k/ε)). In contrast, MinDisagree is APX-hard [14], so we do not expect a PTAS. Nevertheless, there are constant-factor approximation algorithms [2,10,14]. The best factor (2.5) was given by Ailon et al. [2], who also present a simple, elegant algorit...
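The "simple, elegant algorithm" of Ailon et al. referred to here is the pivot scheme (QuickCluster). A minimal sketch, assuming the "+" edges are supplied as adjacency sets (the function name and input format are my own choices):

```python
import random

def quick_cluster(vertices, positive, rng=random):
    """Pivot-based correlation clustering: pick a random remaining vertex
    as pivot, make a cluster out of it and its remaining '+' neighbours,
    and repeat on what is left. On complete '+'/'-' graphs this scheme
    achieves expected approximation ratio 3 for MinDisagree (Ailon et al.).
    `positive[v]` is the set of vertices joined to v by a '+' edge."""
    remaining = set(vertices)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | (positive[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

On a graph that is already a disjoint union of "+"-cliques the algorithm recovers the cliques exactly, whichever pivots are drawn.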

105 | A characterization of the (natural) graph properties testable with one-sided error
- Alon, Shapira
Citation Context ...hereditary graph property (closed under removal and renaming of vertices), hence it can be tested with one-sided error using a constant number of queries by the powerful result of Alon and Shapira [8]. Combined with the work of Fischer and Newman [20], this also yields estimators for cluster edit distance that run in time independent of the graph size. Unfortunately, the query complexity of the al...

104 | The algorithmic aspects of the regularity lemma
- Alon, Duke, et al.
- 1994
Citation Context ...in is chosen such that “internal” edges among vertices from the same class are few enough to be ignored. The original result was existential, but algorithms to construct a regular partition are known [6,19,23] which run in time polynomial in |V| (for constant ε). This naturally suggests trying to use the partition classes in order to obtain an approximation of the optimal clustering. Nevertheless, to the ...

89 | The regularity lemma and approximation schemes for dense problems
- Frieze, Kannan
- 1996
Citation Context ...of the clustering found by working with the reduced graph, even if no heuristics were applied. To address these issues, we opt to use a weaker variant of the regularity lemma due to Frieze and Kannan [21,22]. It has better quantitative parameters and gives an implicit description of the partition, which opens the door for local clustering. 5.2 Cut decompositions of matrices. The idea of Frieze and Kannan ...

87 | Non-redundant data clustering.
- Gondek, Hofmann
- 2004
Citation Context ...constant hidden in the notation has an exponential dependence on the approximation parameter. The literature on active clustering also contains algorithms with sublinear query complexity (see, e.g., [28]); many of them are heuristic or do not apply to correlation clustering. Ailon et al. [1] obtain algorithms for MinDisagree[k] with sublinear query complexity, but the running time of their solutions ...

79 | Lower bounds of tower type for Szemerédi’s uniformity lemma
- Gowers
- 1997
Citation Context ...uced graph G̃, and apply standard clustering algorithms to G̃. Since the partition size m required by the lemma is a (1/ε^5)-level iterated exponential of m_min (and this kind of growth rate is necessary [26]), they propose heuristics to avoid this tower-exponential behaviour. However, the running time of their algorithms is at least n^ω, where ω ∈ [2, 2.373) is the exponent for matrix multiplication. More...

61 | Testing of clustering
- Alon, Dar, et al.
- 2003
Citation Context ...n [25] and [29]. There is also work on correlation clustering on incomplete graphs [10,14,17,25,41]. Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5,9,15,16,32]. Many of these find implicit representations of the clustering they output. There is a natural implicit representation for most of these problems, e.g., the set of k cluster centers. By contrast, in c...

59 | A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning.
- Spielman, Teng
- 2008
Citation Context ...eir framework: it corresponds to query-oblivious, parallelizable, strongly local algorithms that compute a cluster label function in constant time. Finally, we point out the work of Spielman and Teng [39] pertaining to local clustering algorithms. In their papers an algorithm is “local” if it can, given a vertex v, output v’s cluster in time nearly linear in the cluster’s size. Our local clustering algor...

49 | Correlation clustering: maximizing agreements via semidefinite programming.
- Swamy
- 2004
Citation Context ...ly weaker expected approximation ratio of 3, called QuickCluster (see Section 4). For MinDisagree[k], PTASs appeared in [25] and [29]. There is also work on correlation clustering on incomplete graphs [10,14,17,25,41]. Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5,9,15,16,32]. Many of these find implicit representations of the clustering they output. Ther...

46 | Sublinear time approximate clustering
- Mishra, Oblinger, et al.
- 2001
Citation Context ...n [25] and [29]. There is also work on correlation clustering on incomplete graphs [10,14,17,25,41]. Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5,9,15,16,32]. Many of these find implicit representations of the clustering they output. There is a natural implicit representation for most of these problems, e.g., the set of k cluster centers. By contrast, in c...

41 | Correlation clustering in general weighted graphs.
- Demaine, Emanuel, et al.
- 2006
Citation Context ...c such as the Euclidean distance of vectors. Thanks to this generality, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [17,27], biology [11], image segmentation [30] and social networks [12]. Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead it automatically find...

40 | A discriminative framework for clustering via similarity functions.
- Balcan, Blum, et al.
- 2008
Citation Context ...n [25] and [29]. There is also work on correlation clustering on incomplete graphs [10,14,17,25,41]. Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5,9,15,16,32]. Many of these find implicit representations of the clustering they output. There is a natural implicit representation for most of these problems, e.g., the set of k cluster centers. By contrast, in c...

40 | On approximating the minimum vertex cover in sublinear time and the connection to distributed algorithms
- Parnas, Ron
- 2005
Citation Context ...eries (Section 6), we need less stringent requirements. One way is to allow an additional ε-fraction of edges to be violated, compared to the optimal clustering of cost OPT. Following Parnas and Ron [33], we study (c, ε) approximations: solutions with at most c · OPT + ε · n² disagreements. These solutions form ... We remark that in a different model that uses neighborhood oracles [4], it is possible to ...
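The (c, ε) criterion is concrete enough to state in code. A minimal sketch, assuming a complete "+"/"−" graph given by "+" adjacency sets; the helper names are my own:

```python
def disagreements(clusters, positive, vertices):
    """Cost of a clustering: '+' pairs split across clusters plus
    '-' pairs (absent from `positive`) kept inside one cluster."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    vs = sorted(vertices)
    cost = 0
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            same_cluster = label[u] == label[v]
            plus_edge = v in positive[u]
            if plus_edge != same_cluster:
                cost += 1
    return cost

def is_c_eps_approximation(cost, opt, c, eps, n):
    """Parnas-Ron style (c, eps) approximation: cost <= c * OPT + eps * n^2,
    i.e. a multiplicative factor plus an additive eps-fraction of all pairs."""
    return cost <= c * opt + eps * n * n
```

The additive ε·n² slack is what makes sublinear query complexity possible at all: a multiplicative-only guarantee would require distinguishing OPT = 0 from OPT = 1, which forces inspecting essentially every pair.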

39 | Survey of local algorithms
- Suomela
Citation Context ...re. Each vertex of a sparse graph is assigned a processor, and each processor can compute a certain function in a constant number of rounds by passing messages to its neighbours (see Suomela’s survey [40]). Our algorithms are also local in this sense. Recently, Rubinfeld et al. [34] introduced a model that encompasses notions from several algorithmic subfields, such as locally decodable codes, local r...

38 | Correlation clustering with a fixed number of clusters
- Giotis, Guruswami
- 2006
Citation Context ...ackground and related work. Correlation clustering. Minimizing disagreements is the same as maximizing agreements for exact algorithms, but the two tasks differ with regard to approximation. Following [25], we refer to these two problems as MaxAgree and MinDisagree, while MaxAgree[k] and MinDisagree[k] refer to the variants of the problem with a bound k on the number of clusters. Not surprisingly, MaxAg...

35 | Testing versus estimation of graph properties
- Fischer, Newman
- 2005
Citation Context ...and renaming of vertices), hence it can be tested with one-sided error using a constant number of queries by the powerful result of Alon and Shapira [8]. Combined with the work of Fischer and Newman [20], this also yields estimators for cluster edit distance that run in time independent of the graph size. Unfortunately, the query complexity of the algorithm given by these results would be a tower exp...

29 | Framework for Evaluating Clustering Algorithms in Duplicate Detection
- Hassanzadeh, Chiang, et al.
- 2009
Citation Context ...c such as the Euclidean distance of vectors. Thanks to this generality, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [17,27], biology [11], image segmentation [30] and social networks [12]. Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead it automatically find...

29 | Higher-order correlation clustering for image segmentation.
- Kim, Nowozin, et al.
- 2011
Citation Context ...s. Thanks to this generality, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [17,27], biology [11], image segmentation [30] and social networks [12]. Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead it automatically finds the optimal number. Despite its appea...

21 | Biclustering in data mining
- Busygin, Pardalos
- 2008
Citation Context ...v’s cluster. Variants of correlation clustering. The second algorithm (based on cut matrices) can easily be extended to chromatic correlation clustering [12], and to bi-clustering (co-clustering) [13]. 8 Concluding remarks. This paper initiates the investigation into local correlation clustering, devising algorithms with sublinear time and query complexity. The tradeoff between the running time of ...

19 | Sublinear-time approximation algorithms for clustering via random sampling. Random Structures & Algorithms
- Czumaj, Sohler
- 2007

18 | A simple algorithm for constructing Szemerédi’s regularity partition
- Frieze, Kannan
- 1999
Citation Context ...in is chosen such that “internal” edges among vertices from the same class are few enough to be ignored. The original result was existential, but algorithms to construct a regular partition are known [6,19,23] which run in time polynomial in |V| (for constant ε). This naturally suggests trying to use the partition classes in order to obtain an approximation of the optimal clustering. Nevertheless, to the ...

17 | Property-preserving data reconstruction. Algorithmica 51
- Ailon, Chazelle, et al.
- 2008
Citation Context ...usterable graph) so that the modified graph we output is close to the input and satisfies the property of being clusterable. This fits the paradigm of local property-preserving data reconstruction of [3] and [35]. To the best of our knowledge, this is the first work about local algorithms for correlation clustering. (Footnote: this bound can in fact be reduced to t for the non-adaptive algorithms we devise.) ...

17 | On graph problems in a semi-streaming model. Theoretical Computer Science
- Feigenbaum, Kannan, et al.
- 2005
Citation Context ...ase the sublinear behaviour is lost because we still need to process every edge. However, the memory footprint of the algorithm can be brought down from Ω(n²) to O(n) (called the semi-streaming model [18]). Indeed, note that given a fixed random seed, for every vertex v the set of all possible queries Qv that can be made during the computation of cℓ(v) has size at most 2t. This set can be computed be...

17 | Fast Monte Carlo algorithms for finding low-rank approximations
- Frieze, Kannan, et al.
Citation Context ...e of [2] with an additive error term. The algorithm and its analysis are given in Section 4. The second local algorithm (Theorem 3.6) borrows ideas from the PTAS for dense MaxCut of Frieze and Kannan [24] and uses low-rank approximations to the adjacency matrix of the graph. (Interestingly, while such approximations have been known for a long time, their implications for correlation clustering have be...

13 | Cluster graph modification problems. Discrete Applied Mathematics
- Shamir, Sharan, et al.
- 2004
Citation Context ...ween the similarity graph and a clustering when it is a “+” edge connecting vertices in different clusters, or a “–” edge connecting vertices within the same cluster. If we were given a cluster graph [37] (or clusterable graph), i.e., a graph whose set of positive edges is the union of vertex-disjoint cliques, we would be able to produce a perfect (i.e., cost-0) clustering simply by computing the conn...
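The observation in this passage — a cluster graph admits a cost-0 clustering via connected components — can be sketched directly. The BFS formulation and the adjacency-set input format are my own choices:

```python
from collections import deque

def perfect_clustering(vertices, positive):
    """If the '+' edges already form a disjoint union of cliques (a cluster
    graph), the connected components of the '+' graph are a clustering with
    zero disagreements. `positive[v]` is the set of '+'-neighbours of v."""
    seen, clusters = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in positive[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(component)
    return clusters
```

On an arbitrary graph the same procedure still returns a clustering, just no longer a perfect one; the hardness of correlation clustering is entirely in the non-clusterable case.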

12 | Local monotonicity reconstruction
- Saks, Seshadhri
- 2010
Citation Context ...e graph) so that the modified graph we output is close to the input and satisfies the property of being clusterable. This fits the paradigm of local property-preserving data reconstruction of [3] and [35]. To the best of our knowledge, this is the first work about local algorithms for correlation clustering. (Footnote: this bound can in fact be reduced to t for the non-adaptive algorithms we devise.) ...

10 | Active learning using smooth relative regret approximations with applications
- Ailon, Begleiter, et al.
- 2012
Citation Context ...mial-time approximation scheme for correlation clustering known to date. 1 Introduction. In correlation clustering we are given a set V of n objects and a pairwise similarity function sim : V × V → [0, 1], and the goal is to cluster the items in such a way that, to the best possible extent, similar objects are put in the same cluster and dissimilar objects are put in different clusters. Assuming that ...
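With a general similarity function sim : V × V → [0, 1], the objective this passage describes charges each same-cluster pair 1 − sim(u, v) and each split pair sim(u, v). A minimal rendering (function name mine):

```python
def correlation_cost(clusters, sim, vertices):
    """Correlation clustering objective for sim: V x V -> [0, 1]:
    a same-cluster pair costs 1 - sim(u, v) (penalty for grouping
    dissimilar items), a split pair costs sim(u, v) (penalty for
    separating similar items)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    vs = sorted(vertices)
    cost = 0.0
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            s = sim(u, v)
            cost += (1.0 - s) if label[u] == label[v] else s
    return cost
```

With 0/1 similarities this counts exactly the disagreements minimized by MinDisagree; note that no number of clusters appears anywhere in the objective.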

10 | Linear time approximation schemes for the Gale-Berlekamp game and related minimization problems
- Karpinski, Schudy
- 2009
Citation Context ...[2], who also present a simple, elegant algorithm that achieves a slightly weaker expected approximation ratio of 3, called QuickCluster (see Section 4). For MinDisagree[k], PTASs appeared in [25] and [29]. There is also work on correlation clustering on incomplete graphs [10,14,17,25,41]. Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5,9,15, ...

9 | Fast local computation algorithms.
- Rubinfeld, Tamir, et al.
- 2011
Citation Context ...an compute a certain function in a constant number of rounds by passing messages to its neighbours (see Suomela’s survey [40]). Our algorithms are also local in this sense. Recently, Rubinfeld et al. [34] introduced a model that encompasses notions from several algorithmic subfields, such as locally decodable codes, local reconstruction and local distributed computation. Our definition fits into their...

7 | Approximate hypergraph partitioning and applications
- Fischer, Matsliah, et al.
Citation Context ...in is chosen such that “internal” edges among vertices from the same class are few enough to be ignored. The original result was existential, but algorithms to construct a regular partition are known [6,19,23] which run in time polynomial in |V| (for constant ε). This naturally suggests trying to use the partition classes in order to obtain an approximation of the optimal clustering. Nevertheless, to the ...

6 | Correlation clustering revisited: The “true” cost of error minimization problems
- Ailon, Liberty
- 2009
Citation Context ...ing Parnas and Ron [33], we study (c, ε) approximations: solutions with at most c · OPT + ε · n² disagreements. These solutions form ... We remark that in a different model that uses neighborhood oracles [4], it is possible to bypass the Ω(n²) lower bound for multiplicative approximations that holds for edge queries. In fact, from our analysis we can derive the first sublinear-time constant-factor approxi...

5 |
- Alon, Wenceslas Fernandez de la
Citation Context ...the query complexity of the algorithm given by these results would be a tower exponential of height poly(1/ε), where ε is the approximation parameter. Approximation algorithms for MIN-2-CSP problems [7] also give estimators for cluster edit distance. However, they provide no way of computing each variable assignment in constant time. Moreover, they use time ∼ n² to calculate all assignments, and hen...

5 | Small space representations for metric min-sum k-clustering and their applications
- Czumaj, Sohler

4 | Chromatic Correlation Clustering
- Bonchi, Gionis, et al.
- 2012
Citation Context ...ity, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [17,27], biology [11], image segmentation [30] and social networks [12]. Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead it automatically finds the optimal number. Despite its appeal, correlation clustering...

3 | Szemerédi’s regularity lemma and its applications to pairwise clustering and segmentation. In Energy minimization methods in computer science and pattern recognition, volume 4679
- Sperotto, Pelillo
- 2007
Citation Context ...an approximation of the optimal clustering. Nevertheless, to the best of our knowledge, the only prior attempts to exploit the regularity lemma for clustering are the papers of Sperotto and Pelillo [38] and Sárközy, Song, Szemerédi and Trivedi [36]. They use the constructive versions of the lemma to find the reduced graph G̃, and apply standard clustering algorithms to G̃. Since the partition siz...

1 | A practical regularity partitioning algorithm and its applications in clustering
- Sárközy, Song, et al.
Citation Context ...theless, to the best of our knowledge, the only prior attempts to exploit the regularity lemma for clustering are the papers of Sperotto and Pelillo [38] and Sárközy, Song, Szemerédi and Trivedi [36]. They use the constructive versions of the lemma to find the reduced graph G̃, and apply standard clustering algorithms to G̃. Since the partition size m required by the lemma is a (1/ε^5)-level iterat...
