Results 1-10 of 498
Correlation Clustering
MACHINE LEARNING, 2002
Cited by 329 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting
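The disagreement objective in this abstract can be made concrete with a small sketch. It assumes edge labels are stored as '+' (similar) or '-' (different) in a dictionary keyed by unordered vertex pairs; the function name `count_disagreements` and the data layout are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def count_disagreements(labels, clustering):
    """Count edges whose label conflicts with a clustering.

    labels: dict mapping frozenset({u, v}) -> '+' or '-' for every vertex pair
            (hypothetical layout, chosen for illustration).
    clustering: dict mapping each vertex to a cluster id.
    """
    disagreements = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        label = labels[frozenset((u, v))]
        if same_cluster and label == "-":
            disagreements += 1  # a "different" pair placed inside one cluster
        elif not same_cluster and label == "+":
            disagreements += 1  # a "similar" pair split across clusters
    return disagreements


# A triangle with labels +, +, - cannot be clustered without a disagreement.
labels = {
    frozenset(("a", "b")): "+",
    frozenset(("b", "c")): "+",
    frozenset(("a", "c")): "-",
}
one_cluster = {"a": 0, "b": 0, "c": 0}
all_separate = {"a": 0, "b": 1, "c": 2}
print(count_disagreements(labels, one_cluster))
print(count_disagreements(labels, all_separate))
```

The sketch only evaluates a given clustering; finding the clustering that minimizes this count is the optimization problem the abstract describes.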
Polynomial Time Approximation Schemes for Dense Instances of NP-Hard Problems
1995
Cited by 195 (32 self)
We present a unified framework for designing polynomial time approximation schemes (PTASs) for "dense" instances of many NP-hard optimization problems, including maximum cut, graph bisection, graph separation, minimum k-way cut with and without specified terminals, and maximum 3-satisfiability. By dense graphs we mean graphs with minimum degree Ω(n), although our algorithms solve most of these problems so long as the average degree is Ω(n). Denseness for non-graph problems is defined similarly. The unified framework begins with the idea of exhaustive sampling: picking a small random set of vertices, guessing where they go on the optimum solution, and then using their placement to determine the placement of everything else. The approach then develops into a PTAS for approximating certain smooth integer programs where the objective function and the constraints are "dense" polynomials of constant degree.
Efficient Testing of Large Graphs
Combinatorica
Cited by 187 (49 self)
Let P be a property of graphs. An ε-test for P is a randomized algorithm which, given the ability to make queries whether a desired pair of vertices of an input graph G with n vertices are adjacent or not, distinguishes, with high probability, between the case of G satisfying P and the case that it has to be modified by adding and removing more than εn^2 edges to make it satisfy P. The property P is called testable if for every ε there exists an ε-test for P whose total number of queries is independent of the size of the input graph. Goldreich, Goldwasser and Ron [8] showed that certain graph properties admit an ε-test. In this paper we make a first step towards a logical characterization of all testable graph properties, and show that properties describable by a very general type of coloring problem are testable. We use this theorem to prove that first order graph properties not containing a quantifier alternation of type "∀∃" are always testable, while we show that some properties containing this alternation are not. Our results are proven using a combinatorial lemma, a special case of which, that may be of independent interest, is the following. A graph H is called ε-unavoidable in G if all graphs that differ from G in no more than ε|G|^2 places contain an induced copy of H. A graph H is called δ-abundant in G if G contains at least δ|G|^|H| induced copies of H. If H is ε-unavoidable in G then it is also δ(ε, |H|)-abundant.
The art of uninformed decisions: A primer to property testing
Science, 2001
Cited by 157 (25 self)
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
Quick Approximation to Matrices and Applications
1998
Cited by 151 (7 self)
We give algorithms to find the following simply described approximation to a given matrix. Given an m × n matrix A with entries between, say, −1 and 1, and an error parameter ε between 0 and 1, we find a matrix D (implicitly) which is the sum of O(1/ε^2) simple rank 1 matrices so that the sum of entries of any submatrix (among the 2^(m+n)) of (A − D) is at most εmn in absolute value. Our algorithm takes time dependent only on ε and the allowed probability of failure (not on m, n). We draw on two lines of research to develop the algorithms: one is built around the fundamental Regularity Lemma of Szemerédi in Graph Theory and the constructive version of Alon, Duke, Lefmann, Rödl and Yuster. The second one is from the papers of Arora, Karger and Karpinski, Fernandez de la Vega and most directly Goldwasser, Goldreich and Ron who develop approximation algorithms for a set of graph problems, typical of which is the maximum cut problem. From our matrix approximation, the...
Property Testing in Bounded Degree Graphs
Algorithmica, 1997
Cited by 133 (38 self)
We further develop the study of testing graph properties as initiated by Goldreich, Goldwasser and Ron. Whereas they view graphs as represented by their adjacency matrix and measure distance between graphs as a fraction of all possible vertex pairs, we view graphs as represented by bounded-length incidence lists and measure distance between graphs as a fraction of the maximum possible number of edges. Thus, while the previous model is most appropriate for the study of dense graphs, our model is most appropriate for the study of bounded-degree graphs. In particular, we present randomized algorithms for testing whether an unknown bounded-degree graph is connected, k-connected (for k > 1), planar, etc. Our algorithms work in time polynomial in 1/ε, always accept the graph when it has the tested property, and reject with high probability if the graph is ε-away from having the property. For example, the 2-Connectivity algorithm rejects (w.h.p.) any N-vertex d-degree graph for which more ...
Sampling from large matrices: an approach through geometric functional analysis
Journal of the ACM, 2006
Cited by 128 (5 self)
We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ‖A‖_F^2 / ‖A‖_2^2 is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2-CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
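As a rough illustration of the numerical rank defined in this abstract (squared Frobenius norm divided by squared spectral norm), the sketch below computes it in plain Python. The spectral norm is estimated by power iteration, which is a swapped-in standard technique and not the paper's method; the function names, the all-ones starting vector, and the iteration count are assumptions made for the example.

```python
import math


def spectral_norm_squared(A, iterations=200):
    """Estimate the squared spectral norm of A by power iteration on A^T A.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector (true for the examples below, not for every matrix).
    """
    m, n = len(A), len(A[0])
    x = [1.0] * n
    for _ in range(iterations):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        y = [sum(A[i][j] * Ax[i] for i in range(m)) for j in range(n)]  # A^T (A x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(v * v for v in Ax)  # ||A x||^2 for a unit vector x


def numerical_rank(A):
    """Return the squared Frobenius norm divided by the squared spectral norm."""
    frobenius_squared = sum(a * a for row in A for a in row)
    spectral_squared = spectral_norm_squared(A)
    if spectral_squared == 0.0:
        raise ValueError("numerical rank is undefined for this input")
    return frobenius_squared / spectral_squared


# diag(2, 1) has rank 2 but numerical rank (4 + 1) / 4 = 1.25.
print(numerical_rank([[2.0, 0.0], [0.0, 1.0]]))
```

The example shows the "stable relaxation" reading: a small second singular value lowers the numerical rank smoothly, while the ordinary rank stays at 2.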
Using Output Codes to Boost Multiclass Learning Problems
MACHINE LEARNING: PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE, 1997 (ICML97)
Cited by 112 (8 self)
This paper describes a new technique for solving multiclass learning problems by combining Freund and Schapire's boosting algorithm with the main ideas of Dietterich and Bakiri's method of error-correcting output codes (ECOC). Boosting is a general method of improving the accuracy of a given base or "weak" learning algorithm. ECOC is a robust method of solving multiclass learning problems by reducing to a sequence of two-class problems. We show that our new hybrid method has advantages of both: Like ECOC, our method only requires that the base learning algorithm work on binary-labeled data. Like boosting, we prove that the method comes with strong theoretical guarantees on the training and generalization error of the final combined hypothesis assuming only that the base learning algorithm perform slightly better than random guessing. Although previous methods were known for boosting multiclass problems, the new method may be significantly faster and require less programming effort in creating the base learning algorithm. We also compare the new algorithm experimentally to other voting methods.
A characterization of the (natural) graph properties testable with one-sided error
Proc. of FOCS 2005
Cited by 111 (19 self)
The problem of characterizing all the testable graph properties is considered by many to be the most important open problem in the area of property-testing. Our main result in this paper is a solution of an important special case of this general problem. Call a property tester oblivious if its decisions are independent of the size of the input graph. We show that a graph property P has an oblivious one-sided error tester if and only if P is (almost) hereditary. We stress that any "natural" property that can be tested (either with one-sided or with two-sided error) can be tested by an oblivious tester. In particular, all the testers studied thus far in the literature were oblivious. Our main result can thus be considered as a precise characterization of the "natural" graph properties which are testable with one-sided error. One of the main technical contributions of this paper is in showing that any hereditary graph property can be tested with one-sided error. This general result contains as a special case all the previous results about testing graph properties with one-sided error. These include the results of [20] and [5] about testing k-colorability, the characterization of [21] of the graph-partitioning problems that are testable with one-sided error, the induced vertex colorability properties of [3], the induced edge colorability properties of [14], a transformation from two-sided to one-sided error testing [21], as well as a recent result about testing monotone graph properties [10]. More importantly, as a special case of our main result, we infer that some of the most well studied graph properties, both in graph theory and computer science, are testable with one-sided error. Some of these properties are the well known graph properties of being Perfect, Chordal, Interval, Comparability, Permutation and more. None of these properties was previously known to be testable.
Clustering Large Graphs via the Singular Value Decomposition
MACHINE LEARNING, 2004
Cited by 109 (2 self)
We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we
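The discrete objective named in this abstract (sum of squared distances from each point to its cluster center) can be sketched as follows. The sketch assumes each cluster center is the mean of the points assigned to it, and the name `kmeans_cost` is illustrative; it evaluates a given partition only and does not implement the SVD relaxation or the 2-approximation.

```python
def kmeans_cost(points, assignment, k):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    points: list of equal-length coordinate tuples.
    assignment: list of cluster ids in range(k), one per point.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assignment):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Cluster center = coordinate-wise mean; empty clusters contribute nothing.
    centers = [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]
    cost = 0.0
    for p, c in zip(points, assignment):
        cost += sum((p[d] - centers[c][d]) ** 2 for d in range(dim))
    return cost


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(kmeans_cost(points, [0, 0, 1, 1], k=2))  # left pair vs right pair
print(kmeans_cost(points, [0, 1, 0, 1], k=2))  # bottom pair vs top pair
```

Comparing the two partitions shows why the choice of partition matters: grouping nearby points gives a much smaller cost than grouping distant ones.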