Results 1  10
of
2,000
Randomized Algorithms
, 1995
"... Randomized algorithms, once viewed as a tool in computational number theory, have by now found widespread application. Growth has been fueled by the two major benefits of randomization: simplicity and speed. For many applications a randomized algorithm is the fastest algorithm available, or the simp ..."
Abstract

Cited by 2210 (37 self)
 Add to MetaCart
Randomized algorithms, once viewed as a tool in computational number theory, have by now found widespread application. Growth has been fueled by the two major benefits of randomization: simplicity and speed. For many applications a randomized algorithm is the fastest algorithm available, or the simplest, or both. A randomized algorithm is an algorithm that uses random numbers to influence the choices it makes in the course of its computation. Thus its behavior (typically quantified as running time or quality of output) varies from
The space complexity of approximating the frequency moments
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1996
"... The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, ..."
Abstract

Cited by 855 (12 self)
 Add to MetaCart
(Show Context)
The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers F0, F1 and F2 can be approximated in logarithmic space, whereas the approximation of Fk for k ≥ 6 requires nΩ(1) space. Applications to data bases are mentioned as well.
Mining Generalized Association Rules
, 1995
"... We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy th ..."
Abstract

Cited by 577 (7 self)
 Add to MetaCart
(Show Context)
We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (isa hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets isa outerwear isa clothes, we may infer a rule that "people who buy outerwear tend to buy shoes". This rule may hold even if rules that "people who buy jackets tend to buy shoes", and "people who buy clothes tend to buy shoes" do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these "extended transactions ". However, this "Basic" algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one reallife dataset). We also present a new interes...
The Capacity of LowDensity ParityCheck Codes Under MessagePassing Decoding
, 2001
"... In this paper, we present a general method for determining the capacity of lowdensity paritycheck (LDPC) codes under messagepassing decoding when used over any binaryinput memoryless channel with discrete or continuous output alphabets. Transmitting at rates below this capacity, a randomly chos ..."
Abstract

Cited by 569 (9 self)
 Add to MetaCart
(Show Context)
In this paper, we present a general method for determining the capacity of lowdensity paritycheck (LDPC) codes under messagepassing decoding when used over any binaryinput memoryless channel with discrete or continuous output alphabets. Transmitting at rates below this capacity, a randomly chosen element of the given ensemble will achieve an arbitrarily small target probability of error with a probability that approaches one exponentially fast in the length of the code. (By concatenating with an appropriate outer code one can achieve a probability of error that approaches zero exponentially fast in the length of the code with arbitrarily small loss in rate.) Conversely, transmitting at rates above this capacity the probability of error is bounded away from zero by a strictly positive constant which is independent of the length of the code and of the number of iterations performed. Our results are based on the observation that the concentration of the performance of the decoder around its average performance, as observed by Luby et al. [1] in the case of a binarysymmetric channel and a binary messagepassing algorithm, is a general phenomenon. For the particularly important case of beliefpropagation decoders, we provide an effective algorithm to determine the corresponding capacity to any desired degree of accuracy. The ideas presented in this paper are broadly applicable and extensions of the general method to lowdensity paritycheck codes over larger alphabets, turbo codes, and other concatenated coding schemes are outlined.
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
"... For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has ..."
Abstract

Cited by 548 (13 self)
 Add to MetaCart
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
On the Resemblance and Containment of Documents
 In Compression and Complexity of Sequences (SEQUENCES’97
, 1997
"... Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection probl ..."
Abstract

Cited by 499 (7 self)
 Add to MetaCart
(Show Context)
Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document.
Sampling Large Databases for Association Rules
, 1996
"... Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the data ..."
Abstract

Cited by 465 (4 self)
 Add to MetaCart
Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the database activity considerably. Theidea is to pick a random sample, to ndusingthis sample all association rules that probably hold in the whole database, and then to verify the results with the restofthe database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and inthose rare cases where our sampling method does not produce all association rules, the missing rules can be found inasecond pass. Our experiments show that the proposed algorithms can nd association rules very e ciently in only onedatabase pass. 1
A PolynomialTime Approximation Algorithm for the Permanent of a Matrix with NonNegative Entries
 JOURNAL OF THE ACM
, 2004
"... We present a polynomialtime randomized algorithm for estimating the permanent of an arbitrary n ×n matrix with nonnegative entries. This algorithm—technically a “fullypolynomial randomized approximation scheme”—computes an approximation that is, with high probability, within arbitrarily small spec ..."
Abstract

Cited by 436 (25 self)
 Add to MetaCart
We present a polynomialtime randomized algorithm for estimating the permanent of an arbitrary n ×n matrix with nonnegative entries. This algorithm—technically a “fullypolynomial randomized approximation scheme”—computes an approximation that is, with high probability, within arbitrarily small specified relative error of the true value of the permanent.
A Random Graph Model for Massive Graphs
 STOC 2000
, 2000
"... We propose a random graph model which is a special case of sparse random graphs with given degree sequences. This model involves only a small number of parameters, called logsize and loglog growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from t ..."
Abstract

Cited by 414 (26 self)
 Add to MetaCart
We propose a random graph model which is a special case of sparse random graphs with given degree sequences. This model involves only a small number of parameters, called logsize and loglog growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from these parameters, various properties of the graph can be derived. For example, for certain ranges of the parameters, we will compute the expected distribution of the sizes of the connected components which almost surely occur with high probability. We will illustrate the consistency of our model with the behavior of some massive graphs derived from data in telecommunications. We will also discuss the threshold function, the giant component, and the evolution of random graphs in this model.
Concentration Of Measure And Isoperimetric Inequalities In Product Spaces
, 1995
"... . The concentration of measure phenomenon in product spaces roughly states that, if a set A in a product# N of probability spaces has measure at least one half, "most" of the points of# N are "close" to A. We proceed to a systematic exploration of this phenomenon. The meaning ..."
Abstract

Cited by 383 (4 self)
 Add to MetaCart
. The concentration of measure phenomenon in product spaces roughly states that, if a set A in a product# N of probability spaces has measure at least one half, "most" of the points of# N are "close" to A. We proceed to a systematic exploration of this phenomenon. The meaning of the word "most" is made rigorous by isoperimetrictype inequalities that bound the measure of the exceptional sets. The meaning of the work "close" is defined in three main ways, each of them giving rise to related, but di#erent inequalities. The inequalities are all proved through a common scheme of proof. Remarkably, this simple approach not only yields qualitatively optimal results, but, in many cases, captures near optimal numerical constants. A large number of applications are given, in particular to Percolation, Geometric Probability, Probability in Banach Spaces, to demonstrate in concrete situations the extremely wide range of application of the abstract tools. AMS Classification numbers: Primary 60E15, 28A35, 60G99; Secondary 60G15, 68C15. Typeset by A M ST E X 1 2 M. TALAGRAND Table of Contents I.