Results 1–10 of 25
Random sampling from a search engine’s index
 In Proceedings of the 15th International World Wide Web Conference (WWW)
, 2006
Abstract

Cited by 94 (6 self)
We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search engine’s public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from a well-recorded bias: it favors long documents. In this paper we introduce two novel sampling algorithms: a lexicon-based algorithm and a random walk algorithm. Our algorithms produce biased samples, but each sample is accompanied by a weight, which represents its bias. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we resort to four well-known Monte Carlo simulation methods: rejection sampling, importance sampling, the Metropolis-Hastings algorithm, and the Maximum Degree method. The limited access to search engines forces our algorithms to use bias weights that are only “approximate”. We characterize analytically the effect of approximate bias weights on Monte Carlo methods and conclude that our algorithms are guaranteed to produce near-uniform samples from the search engine’s corpus. Our study of approximate Monte Carlo methods could be of independent interest. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long documents. We use our algorithms to collect comparative statistics about the corpora of the Google, MSN Search, and Yahoo! search engines.
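The weighted-sample scheme this abstract describes can be illustrated with rejection sampling, one of the four Monte Carlo methods it names. The sketch below assumes exact (not approximate) weights, and the names `draw_biased`, `weight`, and `w_min` are illustrative, not from the paper:

```python
import random

def near_uniform_sample(draw_biased, weight, w_min, max_tries=100_000):
    # Rejection sampling: draw_biased() returns a document with
    # probability proportional to weight(doc); accepting it with
    # probability w_min / weight(doc) simulates a near-uniform draw.
    for _ in range(max_tries):
        doc = draw_biased()
        if random.random() < w_min / weight(doc):
            return doc
    raise RuntimeError("no sample accepted within max_tries")
```

A document drawn with probability proportional to w and then accepted with probability proportional to 1/w is returned with a probability that no longer depends on w, which is the bias-correction step the abstract refers to.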
Efficient search engine measurements
 In Proc. 16th WWW
, 2007
Abstract

Cited by 29 (3 self)
We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed estimators for such metrics [2, 6] suffer from significant bias and/or poor performance, due to inaccurate approximation of the so-called “document degrees”. We present two new estimators that are able to overcome the bias introduced by approximate degrees. Our estimators are based on a careful implementation of an approximate importance sampling procedure. Comprehensive theoretical and empirical analysis of the estimators demonstrates that they have essentially no bias even in situations where document degrees are poorly approximated. Building on an idea from [6], we discuss Rao-Blackwellization as a generic method for reducing variance in search engine estimators. We show that Rao-Blackwellizing our estimators results in significant performance improvements, while not compromising accuracy.
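The importance-sampling machinery underlying such estimators can be shown in miniature. The following self-normalized estimator is a generic sketch under toy distributions, not the paper's estimator; all names are illustrative:

```python
import random

def importance_estimate(f, draw_q, q_prob, target_prob, n=20_000):
    # Self-normalized importance sampling: estimate E_target[f] from
    # draws of a proposal q, weighting each draw x by
    # target_prob(x) / q_prob(x) and normalizing by the weight sum.
    num = den = 0.0
    for _ in range(n):
        x = draw_q()
        w = target_prob(x) / q_prob(x)
        num += w * f(x)
        den += w
    return num / den
```

The self-normalized form only needs weights up to a constant factor, which is why it tolerates the kind of approximate "document degrees" the abstract discusses better than an unnormalized estimator would.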
Statistical Blockade: A Novel Method for Very Fast Monte Carlo Simulation of Rare Circuit Events and Its Application
 In Proc. Design, Automation and Test in Europe Conf. (DATE 07), IEEE CS
, 2007
Abstract

Cited by 24 (3 self)
Circuit reliability under statistical process variation is an area of growing concern. For highly replicated circuits such as SRAMs and flip-flops, a rare statistical event for one circuit may induce a not-so-rare system failure. Existing techniques perform poorly when tasked to generate both efficient sampling and sound statistics for these rare events. Statistical Blockade is a novel Monte Carlo technique that allows us to efficiently filter—to block—unwanted samples insufficiently rare in the tail distributions we seek. The method synthesizes ideas from data mining and Extreme Value Theory, and shows speedups of 10X to 100X over standard Monte Carlo.
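The blocking idea can be sketched as a classifier-driven filter placed in front of an expensive simulator. Everything below is illustrative: the real method trains its classifier from a small Monte Carlo run and fits a Generalized Pareto distribution to the collected tail, neither of which is shown here:

```python
import random

def statistical_blockade(draw_params, classifier, simulate, n, tail_threshold):
    # A cheap classifier blocks parameter samples unlikely to land in
    # the distribution tail; only unblocked samples reach the costly
    # simulator, whose tail outputs are collected for later EVT fitting.
    tail_values = []
    for _ in range(n):
        x = draw_params()
        if classifier(x):            # cheap pre-filter (the "blockade")
            y = simulate(x)          # expensive circuit simulation
            if y > tail_threshold:
                tail_values.append(y)
    return tail_values
```

In practice the classifier's threshold is set below the true tail cut, so that misclassification near the boundary blocks few genuine tail events.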
Mining Search Engine Query Logs via Suggestion Sampling
, 2008
Abstract

Cited by 22 (4 self)
Many search engines and other web applications suggest auto-completions as the user types in a query. The suggestions are generated from hidden underlying databases, such as query logs, directories, and lexicons. These databases consist of interesting and useful information, but they are typically not directly accessible. In this paper we describe two algorithms for sampling suggestions using only the public suggestion interface. One of the algorithms samples suggestions uniformly at random and the other samples suggestions proportionally to their popularity. These algorithms can be used to mine the hidden suggestion databases. Example applications include comparison of popularity of given keywords within a search engine’s query log, estimation of the volume of commercially oriented queries in a query log, and evaluation of the extent to which a search engine exposes its users to negative content. Our algorithms employ Monte Carlo methods in order to obtain unbiased samples from the suggestion database. Empirical analysis using a publicly available query log demonstrates that our algorithms are efficient and accurate. Results of experiments on two major suggestion services are also provided.
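A random walk over query prefixes, of the kind such samplers build on, can be sketched as follows. The `suggest` interface, the alphabet, and the fixed stopping rule are all hypothetical; a real sampler would feed the returned walk probability into a Monte Carlo bias correction rather than use the raw draw:

```python
import random
import string

def sample_suggestion(suggest, alphabet=string.ascii_lowercase, max_depth=10):
    # Random prefix walk: extend the prefix one random character at a
    # time, querying the public suggest(prefix) interface; return a
    # suggestion together with the probability of the walk that reached
    # it, which serves as the sample's bias weight.
    prefix, prob = "", 1.0
    for _ in range(max_depth):
        prefix += random.choice(alphabet)
        prob /= len(alphabet)
        hits = suggest(prefix)
        if not hits:
            return None, prob            # dead end: the walk failed
        if random.random() < 0.5:        # coin flip: stop here
            prob *= 0.5 / len(hits)
            return random.choice(hits), prob
        prob *= 0.5                      # coin flip: keep extending
    return None, prob
```

Failed walks (`None` results) are simply retried; the per-sample weight is what lets rejection or importance sampling turn these biased draws into uniform or popularity-proportional ones.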
30.1 SRAM Parametric Failure Analysis
Abstract

Cited by 11 (6 self)
With aggressive technology scaling, SRAM design has been seriously challenged by the difficulties in analyzing rare failure events. In this paper we propose to create statistical performance models with accuracy sufficient to facilitate probability extraction for SRAM parametric failures. A piecewise modeling technique is first proposed to capture the performance metrics over the large variation space. A controlled sampling scheme and a nested Monte Carlo analysis method are then applied for the failure probability extraction at cell level and array level, respectively. Our 65nm SRAM example demonstrates that by combining the piecewise model and the fast probability extraction methods, we have significantly accelerated the SRAM failure analysis.
Power grid simulation via efficient sampling-based sensitivity analysis and hierarchical symbolic relaxation
 In DAC
, 2005
Abstract

Cited by 4 (0 self)
On-chip supply networks are playing an increasingly important role in modern nanometer-scale designs. However, the ever-growing sizes of power grids make the analysis problem extremely difficult, thereby introducing severe challenges in design and optimization. The inherent analysis complexity calls for innovations in simulation techniques that must provide appropriate accuracy and efficiency, as well as trade-offs thereof, to aid design verification and optimization. In this paper, we first present a sampling-based sensitivity analysis by employing the notion of importance sampling in a Monte Carlo based circuit simulation framework. This technique allows the extraction of multi-parameter sensitivities for the node voltages of interest in the same Monte Carlo runs that are used for computing the nominal voltage values. For more efficient non-structured whole-grid solution approaches, we further introduce a new direct solution method by embedding symbolic relaxation steps in a hierarchical fashion. As a direct method, the proposed hierarchical symbolic relaxation is suitable for both dc and transient analyses. Circuit examples are included to demonstrate the efficacy of the proposed techniques.
Random Sampling from a Search Engine’s Corpus
, 2006
Abstract

Cited by 4 (0 self)
We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search engine’s public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from a well-recorded bias: it favors long documents. In this paper we introduce two novel sampling algorithms: a lexicon-based algorithm and a random walk algorithm. Our algorithms produce biased samples, but each sample is accompanied by a weight, which represents its bias. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we resort to four well-known Monte Carlo simulation methods: rejection sampling, importance sampling, the Metropolis-Hastings algorithm, and the Maximum Degree method. The limited access to search engines forces our algorithms to use bias weights that are only “approximate”. We characterize analytically the effect of approximate bias weights on Monte Carlo methods and conclude that our algorithms are guaranteed to produce near-uniform samples from the search engine’s corpus. Our study of approximate Monte Carlo methods could be of independent interest. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long documents. We use our algorithms to collect fresh comparative statistics about the corpora of the Google, MSN Search, and Yahoo! search engines.
Recursive statistical blockade: an enhanced technique for rare event simulation with application to SRAM circuit design
 In VLSI Design Conference, 2008
Abstract

Cited by 4 (1 self)
Circuit reliability under statistical process variation is an area of growing concern. For highly replicated circuits such as SRAMs and flip-flops, a rare statistical event for one circuit may induce a not-so-rare system failure. The authors of [1] proposed Statistical Blockade as a Monte Carlo technique that allows us to efficiently filter—to block—unwanted samples insufficiently rare in the tail distributions we seek. However, there are significant practical problems with the technique. In this work, we show common scenarios in SRAM design where these problems render Statistical Blockade ineffective. We then propose significant extensions to make Statistical Blockade practically usable in these common scenarios. We show speedups of 10²+ over standard Statistical Blockade and 10⁴+ over standard Monte Carlo, for an SRAM cell in an industrial 90nm technology.
Counterfactual Reasoning and Learning Systems
, 2013
Abstract

Cited by 3 (0 self)
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system performance. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.