Proof verification and hardness of approximation problems
 IN PROC. 33RD ANN. IEEE SYMP. ON FOUND. OF COMP. SCI
, 1992
Cited by 822 (39 self)
We show that every language in NP has a probabilistic verifier that checks membership proofs for it using a logarithmic number of random bits and by examining a constant number of bits in the proof. If a string is in the language, then there exists a proof such that the verifier accepts with probability 1 (i.e., for every choice of its random string). For strings not in the language, the verifier rejects every provided "proof" with probability at least 1/2. Our result builds upon and improves a recent result of Arora and Safra [6], whose verifiers examine a nonconstant number of bits in the proof (though this number is a very slowly growing function of the input length). As a consequence, we prove that no MAX SNP-hard problem has a polynomial-time approximation scheme unless NP = P. The class MAX SNP was defined by Papadimitriou and Yannakakis [82], and hard problems for this class include vertex cover, maximum satisfiability, maximum cut, metric TSP, Steiner trees, and shortest superstring. We also improve upon the clique hardness results of Feige, Goldwasser, Lovász, Safra and Szegedy [42] and of Arora and Safra [6], and show that there exists a positive $\epsilon$ such that approximating the maximum clique size in an N-vertex graph to within a factor of $N^{\epsilon}$ is NP-hard.
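The verifier guarantees stated in this abstract can be written compactly as follows. This is only a restatement of the two conditions above; the symbols $V$ (the verifier), $\pi$ (the proof), $r$ (the random string), and $n$ (the input length) are notation assumed here for illustration and are not taken from the listing.

```latex
% Assumed notation: V^{\pi}(x; r) is the verifier run on input x with
% random string r and oracle access to the proof \pi.
% r is uniform over \{0,1\}^{O(\log n)}, and V reads O(1) bits of \pi.
\begin{align*}
x \in L    &\implies \exists \pi :\; \Pr_{r}\bigl[V^{\pi}(x; r) \text{ accepts}\bigr] = 1, \\
x \notin L &\implies \forall \pi :\; \Pr_{r}\bigl[V^{\pi}(x; r) \text{ rejects}\bigr] \ge \tfrac{1}{2}.
\end{align*}
```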
A Threshold of ln n for Approximating Set Cover
 JOURNAL OF THE ACM
, 1998
Cited by 778 (5 self)
Given a collection F of subsets of $S = \{1, \ldots, n\}$, set cover is the problem of selecting as few as possible subsets from F such that their union covers S, and max k-cover is the problem of selecting k subsets from F such that their union has maximum cardinality. Both these problems are NP-hard. We prove that $(1 - o(1)) \ln n$ is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low order terms) between the ratio of approximation achievable by the greedy algorithm (which is $(1 - o(1)) \ln n$), and previous results of Lund and Yannakakis, that showed hardness of approximation within a ratio of $(\log_2 n)/2 \approx 0.72 \ln n$. For max k-cover we show an approximation threshold of $(1 - 1/e)$ (up to low order terms), under the assumption that $P \neq NP$.
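The greedy algorithm mentioned in this abstract is the standard textbook rule: repeatedly pick the subset that covers the most still-uncovered elements. The sketch below is a minimal illustration of that rule, not code from the paper; the function name `greedy_set_cover` and its signature are assumptions made for this example.

```python
from typing import Hashable, Iterable, List, Set


def greedy_set_cover(
    universe: Iterable[Hashable], subsets: List[Set[Hashable]]
) -> List[int]:
    """Return the indices of the subsets picked by the greedy rule.

    Hypothetical helper for illustration: at each step it selects the
    subset covering the largest number of still-uncovered elements
    (ties go to the lowest index).
    """
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        # Index of the subset with the largest overlap with `uncovered`.
        best = max(
            range(len(subsets)),
            key=lambda i: len(subsets[i] & uncovered),
            default=None,
        )
        if best is None or not (subsets[best] & uncovered):
            raise ValueError("the given subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


if __name__ == "__main__":
    family = [{1, 2, 3}, {4, 5}, {1, 4}, {6}, {3, 6}]
    print(greedy_set_cover(range(1, 7), family))
```

The function returns indices rather than the sets themselves so that a caller can tell which members of F were selected even when F contains duplicate sets.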
Automatic Subspace Clustering of High Dimensional Data
 Data Mining and Knowledge Discovery
, 2005
Cited by 724 (12 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
Selection of relevant features and examples in machine learning
 ARTIFICIAL INTELLIGENCE
, 1997
Cited by 590 (2 self)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
Probabilistic checking of proofs: a new characterization of NP
 JOURNAL OF THE ACM
, 1998
Cited by 437 (27 self)
We give a new characterization of NP: the class NP contains exactly those languages L for which membership proofs (a proof that an input x is in L) can be verified probabilistically in polynomial time using a logarithmic number of random bits and by reading a sublogarithmic number of bits from the proof. We discuss implications of this characterization; specifically, we show that approximating Clique and Independent Set, even in a very weak sense, is NP-hard.
A Parallel Repetition Theorem
 SIAM Journal on Computing
, 1998
Cited by 378 (9 self)
We show that a parallel repetition of any two-prover one-round proof system (MIP(2, 1)) decreases the probability of error at an exponential rate. No constructive bound was previously known. The constant in the exponent (in our analysis) depends only on the original probability of error and on the total number of possible answers of the two provers. The dependency on the total number of possible answers is logarithmic, which was recently proved to be almost the best possible [U. Feige and O. Verbitsky, Proc. 11th Annual IEEE Conference on Computational Complexity, IEEE Computer Society Press, Los Alamitos, CA, 1996, pp. 70-76].
Approximation Algorithms for Connected Dominating Sets
 Algorithmica
, 1996
Cited by 376 (9 self)
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in the dominating set. We focus on the question of finding a connected dominating set of minimum size, where the graph induced by vertices in the dominating set is required to be connected as well. This problem arises in network testing, as well as in wireless communication. Two polynomial time algorithms that achieve approximation factors of $O(H(\Delta))$ are presented, where $\Delta$ is the maximum degree, and H is the harmonic function. This question also arises in relation to the traveling tourist problem, where one is looking for the shortest tour such that each vertex is either visited, or has at least one of its neighbors visited. We study a generalization of the problem when the vertices have weights, and give an algorithm which achieves a performance ratio of 3 ln n. We also consider the ...
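The definition of a connected dominating set given in this abstract can be checked directly. The sketch below only verifies the two conditions (domination, and connectivity of the induced subgraph); it is not either of the paper's approximation algorithms. The adjacency-dictionary representation and the name `is_connected_dominating_set` are assumptions made for this example.

```python
from collections import deque
from typing import Dict, Hashable, Set

# Assumed representation: an undirected graph as a mapping from each
# vertex to the set of its neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def is_connected_dominating_set(graph: Graph, candidate: Set[Hashable]) -> bool:
    """Check whether `candidate` is a connected dominating set of `graph`."""
    if not candidate or not candidate.issubset(graph):
        return False

    # Domination: every vertex is in the set or adjacent to a member.
    for vertex, neighbours in graph.items():
        if vertex not in candidate and not (neighbours & candidate):
            return False

    # Connectivity: breadth-first search restricted to the induced subgraph.
    start = next(iter(candidate))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current] & candidate:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == candidate


if __name__ == "__main__":
    path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(is_connected_dominating_set(path, {2, 3}))
    print(is_connected_dominating_set(path, {1, 4}))
```

On the four-vertex path used in the example, the two middle vertices form a connected dominating set, while the two endpoints dominate every vertex but do not induce a connected subgraph.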
Routing in Ad-Hoc Networks Using Minimum Connected Dominating Sets
, 1997
Cited by 303 (3 self)
In this paper, we impose a virtual backbone structure on the ad-hoc network, in order to support unicast, multicast, and fault-tolerant routing within the ad-hoc network. This virtual backbone differs from the wired backbone of cellular networks in two key ways: (a) it may change as nodes move, and (b) it is not used primarily for routing packets or flows, but only for computing and updating routes. The primary routes for packets and flows are still computed by a shortest-paths computation; the virtual backbone can, if necessary, provide backup routes to handle interim failures. Because of the dynamic nature of the virtual backbone, our approach splits the routing problem into two levels: (a) find and update the virtual backbone, and (b) then find and update routes. The key contribution of this paper is to describe several alternatives for the first part of finding and updating the virtual backbone. In this paper, to keep the virtual backbone as small as possible, we use an approximation to the minimum connected dominating set (MCDS) of the ad-hoc network topology as the virtual backbone. The hosts in the MCDS maintain local copies of the global topology of the network, along with shortest paths between all pairs of nodes. We note that the concept of a virtual backbone is not new. Ephremides et al.
When trees collide: An approximation algorithm for the generalized Steiner problem on networks
, 1994
Cited by 256 (39 self)
We give the first approximation algorithm for the generalized network Steiner problem, a problem in network design. An instance consists of a network with link-costs and, for each pair $\{i, j\}$ of nodes, an edge-connectivity requirement $r_{ij}$. The goal is to find a minimum-cost network using the available links and satisfying the requirements. Our algorithm outputs a solution whose cost is within $2\lceil \log_2 (r + 1) \rceil$ of optimal, where r is the highest requirement value. In the course of proving the performance guarantee, we prove a combinatorial min-max approximate equality relating minimum-cost networks to maximum packings of certain kinds of cuts. As a consequence of the proof of this theorem, we obtain an approximation algorithm for optimally packing these cuts; we show that this algorithm has application to estimating the reliability of a probabilistic network.
Greedy strikes back: Improved facility location algorithms
 Journal of Algorithms
, 1999
Cited by 221 (11 self)
A fundamental facility location problem is to choose the location of facilities, such as industrial plants and warehouses, to minimize the cost of satisfying the demand for some commodity. There are associated costs for locating the facilities, as well as transportation costs for distributing the commodities. We assume that the transportation costs form a metric. This problem is commonly referred to as the uncapacitated facility location (UFL) problem. Applications to bank account location and clustering, as well as many related pieces of work, are discussed by Cornuejols, Nemhauser and Wolsey [2]. Recently, the first constant factor approximation algorithm for this problem was obtained by Shmoys, Tardos and Aardal [16]. We show that a simple greedy heuristic combined with the algorithm by Shmoys, Tardos and Aardal can be used to obtain an approximation guarantee of 2.408. We discuss a few variants of the problem, demonstrating better approximation factors for restricted versions of the problem. We also show that the problem is Max SNP-hard. However, the inapproximability constants derived from the Max SNP hardness are very close to one. By relating this problem to Set Cover, we prove a lower bound of 1.463 on the best possible approximation ratio assuming $NP \notin DTIME[n^{O(\log \log n)}]$.