Results 1 – 10 of 32
SDP gaps and UGC-hardness for MaxCutGain, 2008
"... Given a graph with maximum cut of (fractional) size c, the Goemans–Williamson semidefinite programming (SDP)-based algorithm is guaranteed to find a cut of size at least .878 · c. However, this guarantee becomes trivial when c is near 1/2, since making random cuts guarantees a cut of size 1/2 (i.e., ..."
Cited by 25 (4 self)
. This implies that beating the Charikar–Wirth guarantee with any efficient algorithm is NP-hard, assuming the Unique Games Conjecture (UGC). This result essentially settles the asymptotic approximability of Max-Cut, assuming UGC. Building on the first contribution, we show how “randomness reduction” on related
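The snippet above describes the Goemans–Williamson pipeline: solve the SDP relaxation to obtain one unit vector per vertex, then split the vertices with a random hyperplane. Below is a minimal sketch of just the rounding step, assuming the SDP has already been solved (the vectors passed in are hypothetical inputs, not produced here):

```python
import random

def hyperplane_round(vectors):
    """Round SDP unit vectors to a cut via a random hyperplane.

    vectors: dict mapping vertex -> list of floats (unit vectors
    assumed to come from an already-solved Max-Cut SDP relaxation).
    Returns a dict vertex -> +1/-1, the two sides of the cut.
    """
    dim = len(next(iter(vectors.values())))
    # A random Gaussian vector; the hyperplane orthogonal to it
    # splits the unit sphere uniformly at random.
    r = [random.gauss(0.0, 1.0) for _ in range(dim)]
    side = {}
    for v, vec in vectors.items():
        dot = sum(a * b for a, b in zip(vec, r))
        side[v] = 1 if dot >= 0 else -1
    return side

def cut_weight(edges, side):
    """Total weight of edges whose endpoints fall on opposite sides."""
    return sum(w for (u, v, w) in edges if side[u] != side[v])
```

Antipodal vectors (an edge the SDP "stretches" fully) are separated by almost every hyperplane, which is the geometric intuition behind the .878 factor.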
Maximizing quadratic programs: extending Grothendieck's inequality
"... Maximizing quadratic programs: extending Grothendieck's inequality. Moses Charikar, Princeton University; Anthony Wirth, Princeton University. Abstract: This paper considers the following type of quadratic programming problem. Given an arbitrary matrix A, whose diagonal elements are zero, find x ∈ { ..."
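The problem statement cut off above asks, for a matrix A with zero diagonal, to maximize xᵀAx over sign vectors x ∈ {−1, +1}ⁿ. For tiny n the objective can be checked exhaustively; this is only an illustration of the problem being solved, not the Charikar–Wirth algorithm itself (which uses an SDP relaxation to get an efficient approximation):

```python
from itertools import product

def quad_objective(A, x):
    """Compute x^T A x for a square matrix A (zero diagonal)
    and a sign vector x in {-1, +1}^n."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_max(A):
    """Exhaustively maximize over all 2^n sign vectors (tiny n only)."""
    n = len(A)
    return max(quad_objective(A, x) for x in product((-1, 1), repeat=n))
```

The exponential enumeration here is exactly what the SDP-based approach avoids.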
Improved Approximation Algorithms for Bipartite Correlation Clustering
"... Abstract. In this work we study the problem of Bipartite Correlation Clustering (BCC), a natural bipartite counterpart of the well-studied Correlation Clustering (CC) problem. Given a bipartite graph, the objective of BCC is to generate a set of vertex-disjoint bicliques (clusters) which minimize ..."
Cited by 1 (1 self)
minimizes the symmetric difference to it. The best known approximation algorithm for BCC, due to Amit (2004), guarantees an 11-approximation ratio. In this paper we present two algorithms. The first is an improved 4-approximation algorithm. However, like the previous approximation algorithm, it requires
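The BCC objective described above, the symmetric difference between the input bipartite graph and a union of vertex-disjoint bicliques, can be evaluated directly for a candidate clustering. A small sketch of that cost function (names and the clustering representation are my own, not from the paper):

```python
def bcc_cost(left, right, edges, cluster):
    """Symmetric difference between a bipartite graph and the union
    of bicliques induced by a clustering.

    left, right: vertex lists of the two sides; edges: set of (l, r)
    pairs; cluster: dict vertex -> cluster id. The cost counts edges
    the bicliques omit plus non-edges they would include.
    """
    cost = 0
    for l in left:
        for r in right:
            inside = cluster[l] == cluster[r]   # pair lies in one biclique
            has_edge = (l, r) in edges
            if inside != has_edge:              # disagreement either way
                cost += 1
    return cost
```

The approximation algorithms in the paper aim to minimize exactly this count.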
Approximation algorithms for clustering, 2005
"... Clustering items into groups is a fundamental problem in the information sciences. Many typical clustering optimization problems are NP-hard and so cannot be expected to be solved optimally in a reasonable amount of time. Although the use of heuristics is common, in this dissertation we seek appro ..."
approximation algorithms, whose performance ratio in relation to the optimal solution can be guaranteed and whose running time is a polynomial function of the problem instance size. We start by examining variants of the asymmetric k-center problem. We demonstrate an O(log* n)-approximation algorithm
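The O(log* n) bound mentioned above uses the iterated logarithm: the number of times log₂ must be applied before the value drops to at most 1, a function that grows extremely slowly (it is 5 or less for any n that fits in physical memory). A quick illustration:

```python
import math

def log_star(n):
    """Iterated logarithm: how many applications of log2 are needed
    before the value drops to at most 1."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count
```

For example, 65536 → 16 → 4 → 2 → 1 takes four applications, so log*(65536) = 4.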
Correlation Clustering in Data Streams
"... In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes and the goal is to find a node-partition such that the endpoints of negative-weight edges are typically in different clusters w ..."
In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes and the goal is to find a node-partition such that the endpoints of negative-weight edges are typically in different clusters whereas the endpoints of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n · polylog n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the “quality” of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. However, the standard LP and SDP formulations are not obviously solvable in O(n · polylog n) space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. Note that the improved space and running-time bounds achieved from streaming algorithms are also useful for offline settings such as MapReduce models.
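The disagreement cost of a fixed node-partition, as described above, can be computed in a single pass over the edge-weight updates: a positive-weight edge disagrees when its endpoints are split, a negative-weight edge when they share a cluster. The naive sketch below illustrates that objective but stores every distinct edge, which is exactly what the paper's linear sketches are designed to avoid; it is an assumption-laden simplification, not the authors' O(n · polylog n)-space method:

```python
def disagreement_cost(updates, part):
    """One-pass disagreement count for a fixed node-partition.

    updates: iterable of (u, v, w) edge-weight updates (weights may
    arrive incrementally); part: dict node -> cluster id.
    NOTE: this naive version keeps all edge weights in memory, so it
    does not meet the streaming space bound discussed above.
    """
    weight = {}  # accumulated weight per edge
    for u, v, w in updates:
        key = (min(u, v), max(u, v))
        weight[key] = weight.get(key, 0) + w
    cost = 0
    for (u, v), w in weight.items():
        same = part[u] == part[v]
        if (w > 0 and not same) or (w < 0 and same):
            cost += abs(w)
    return cost
```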
unknown title, 2007
"... Let G be an undirected graph for which the standard Max-Cut SDP relaxation achieves at least a c fraction of the total edge weight, 1/2 ≤ c ≤ 1. If the actual optimal cut for G is at most an s fraction of the total edge weight, we say that (c, s) is an SDP gap. We define the SDP gap curve GapSDP: [ ..."
Improved Consensus Clustering via Linear Programming
"... We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it ..."
Cited by 1 (0 self)
We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it as a Boolean Integer Program (BIP). Unfortunately, the size of the BIP grows cubically in the number of data items, hence this method is applicable to only small sets of items. In this paper we show how to tackle the problem progressively, leading to much improved solution times and far less memory usage than previously. For the case where approximate clusterings are acceptable, we show a number of heuristic techniques for extracting good clusterings from the solutions of the linear relaxation of the BIP, and on several very large data sets we demonstrate much higher quality approximations than previously possible. When optimal solutions are desired, the problem is much harder, and we present some novel and existing techniques that can assist in finding candidate answers and proving the optimality thereof. For the first time we present optimal Consensus Clusterings for several complete, albeit small, data sets.
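The cubic growth noted above comes from the standard co-clustering formulation: one binary variable x_ij per pair of items (x_ij = 1 when i and j are co-clustered, under one common convention) plus transitivity constraints of the form x_ik ≥ x_ij + x_jk − 1 for every triple, three per unordered triple. A sketch that enumerates those constraints and makes the Θ(n³) count concrete (the variable convention is an assumption; the paper's exact BIP may differ):

```python
from itertools import combinations

def triangle_constraints(n):
    """Enumerate the transitivity constraints of the co-clustering BIP:
    three per unordered triple of items, hence Theta(n^3) in total.
    Each constraint is returned as a tuple of the three pair-variables
    involved: (implied_pair, premise_pair_1, premise_pair_2)."""
    cons = []
    for i, j, k in combinations(range(n), 3):
        cons.append(((i, k), (i, j), (j, k)))
        cons.append(((i, j), (i, k), (j, k)))
        cons.append(((j, k), (i, j), (i, k)))
    return cons
```

Already at a few thousand items this constraint set reaches billions of rows, which is why the paper attacks the problem progressively rather than materializing the full BIP.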
Divisive Correlation Clustering . . . of genes: detecting varying patterns in expression profiles, 2008
"... Motivation: Cluster analysis (of gene-expression data) is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data. However most of these al ..."
Motivation: Cluster analysis (of gene-expression data) is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data. However, most of these algorithms have several shortcomings for gene-expression data clustering. In the present article, we focus on several shortcomings of conventional clustering algorithms and propose a new one that is able to produce a better clustering solution than those produced by some others. Results: We present the Divisive Correlation Clustering Algorithm (DCCA), which is suitable for finding groups of genes having a similar pattern of variation in their expression values. To detect clusters with high correlation and biological significance, we use the correlation clustering concept introduced by Bansal et al. Our proposed algorithm DCCA produces a clustering solution without taking the number of clusters to be created as an input. DCCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation with the genes in that cluster. To test the performance of the DCCA, we have applied DCCA and some well-known conventional methods to an artificial dataset and nine gene-expression datasets, and compared the performance of the algorithms. The clustering results of the DCCA are found to be more significantly relevant to the biological annotations than those of the other methods. All these facts show the superiority of the DCCA over the others for the clustering of gene-expression data. Availability: The software has been developed using the C and Visual Basic languages, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from
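The DCCA criterion described above, that every gene should have the highest average correlation with the members of its own cluster, can be checked directly from a correlation matrix. A small illustrative sketch (Pearson correlation implemented by hand; this is not the authors' C/Visual Basic code):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def avg_within_correlation(corr, cluster, gene):
    """Mean correlation of `gene` with the other members of `cluster`.

    corr: nested dict corr[g1][g2] of pairwise correlations;
    cluster: list of genes; gene: a member of that list.
    """
    others = [g for g in cluster if g != gene]
    return sum(corr[gene][g] for g in others) / len(others)
```

A divisive scheme in this spirit would split a cluster whenever some gene's average correlation is higher with another cluster than with its own.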