68 citations found.
K. Bharat and A. Broder. 1998. A Technique for Measuring the Relative Size and Overlap of the Public Web Search Engines. In Proceedings of WWW7, pages 379--388.

This paper is cited in the following contexts:

Web Document Clustering Using Hyperlink Structures - He, Zha, Ding, Simon (2001)   (1 citation)

....of clusters obtained for several broad topic queries. 2. Organizing query result sets for search engines. The World Wide Web is a rich source of information. It has grown to more than one billion unique web documents [9] and continues to grow roughly at a rate of one million documents per day [4]. While the users are enjoying the large amount of information available on the World Wide Web, it poses a problem to both the users and the search engines. The problem lies in the sheer quantity of the web documents. When users submit a query, the search engine returns the retrieved web documents ....

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. Proc. of the 7th World-Wide Web Conference (WWW7), 1998.


Metasearch: Data Fusion for Document Retrieval - Montague

....database size of the metasearch engine it can be as small as the intersection of the constituent databases for a poor fusion algorithm, or as large as the union for a good one. Interestingly, the difference between the database union and intersection can be large. In 1997, Bharat and Broder [8] estimated the size of the indexable portion of the Web at 200 million pages. Of these, an estimated 160 million pages were in the union of the databases of HotBot, AltaVista, Excite, and Infoseek. However, the estimated intersection of their databases was less than 1.4% of their union only about ....

K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public Web search engines. In The 7th WWW Conference, pages 379-388, 1998.


An Ontology-Based Knowledge Management Platform - Aldea..

....journals, attending conferences or, quite often, by hearsay. Instead, the web (as well as other information resources) offers scattered and distributed information that is impossible to analyse manually. It has been estimated that the World Wide Web contains more than 300 million static objects [Bharat and Broder, 1998] accessible through 100 million internet hosts [Cameron, 2002]. In addition, organisations have intranets amounting to several million pages. The large majority of these documents are weakly structured. These repositories are usually searched by means of keyword-based search engines allowing a ....

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In 7th WWW Conference, Brisbane, Australia, 1998.


Web Information Retrieval - an Algorithmic Perspective - Henzinger (2000)

....(also called spider or robot) collects documents by recursively fetching links from a set of starting pages. Each crawler has different policies with respect to which links are followed, how deeply various sites are explored, etc. Thus, the pages indexed by various search engines differ considerably [27, 2]. The indexer processes the pages collected by the crawler. First it decides which of them to index. For example, it might discard duplicate documents. Then it builds various data structures representing the pages. Most search engines build some variant of an inverted index data structure (see ....

K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of the Seventh International World Wide Web Conference 1998, pages 379-388.


A Survey On Web Information Retrieval Technologies - Lan Huang Computer (2000)   (2 citations)

....opinions explicitly. 5 Measuring the Web. There are many questions related to the statistics of the Web itself: How big is the Web? How fast does the Web grow? What is the proportion of pages in French, XML, etc.? How do various search engines compare? Bharat and Broder described a technique in [4] to compare the coverage of different search engines. They measure search engine coverage and overlap through random queries. The idea behind their approach is simple: Consider sets A and B of size 4 and 12 units respectively. Let the size of their intersection A ∩ B be 2. Suppose that we do not ....

....Web, since the Web is a directed graph with highly non-uniform degrees and has many small cuts. Little is known about the graph structure of the Web. Hence the length of the random walks required to generate a distribution close to the uniform may be extremely large. The solution to this used in [4] is to make use of the search engines themselves to generate page samples. First they build a lexicon of the Web and their associated frequencies on the Web. A random URL is generated from a search engine by firing a random query at it and selecting a random URL from the result set. Both ....

[Article contains additional citation context not shown here]

K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the Seventh International World Wide Web Conference [WWW7], pages 379-388.
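
The size-comparison idea quoted in the Huang survey context above (sets A and B of size 4 and 12 with an intersection of 2) can be illustrated with a short sketch. This is a hedged illustration only, not code from the cited paper or from the survey: the function name `estimate_size_ratio` and the toy integer "URLs" are assumptions made for the example, and both samples are assumed to be uniform, which the quoted context notes is hard to achieve on the real Web.

```python
def estimate_size_ratio(sample_a, sample_b, in_a, in_b):
    """Estimate |A| / |B| from overlap fractions measured on two samples.

    sample_a, sample_b: URLs sampled (assumed uniformly) from engines A and B.
    in_a, in_b: hypothetical membership checks, "is this URL indexed by A/B?".

    Since |A ∩ B| / |A| is the fraction of A found in B, and |A ∩ B| / |B| is
    the fraction of B found in A, dividing the second by the first gives |A| / |B|.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    frac_a_in_b = sum(1 for url in sample_a if in_b(url)) / len(sample_a)
    frac_b_in_a = sum(1 for url in sample_b if in_a(url)) / len(sample_b)
    if frac_a_in_b == 0:
        raise ValueError("no overlap observed; the ratio is undefined")
    return frac_b_in_a / frac_a_in_b


# Toy data matching the quoted example: |A| = 4, |B| = 12, |A ∩ B| = 2.
index_a = set(range(0, 4))    # {0, 1, 2, 3}
index_b = set(range(2, 14))   # {2, ..., 13}

# For a deterministic illustration, use the full sets as the "samples".
ratio = estimate_size_ratio(
    sorted(index_a),
    sorted(index_b),
    in_a=index_a.__contains__,
    in_b=index_b.__contains__,
)
print(ratio)
```

With the full sets used as samples, the ratio works out to (2/12) / (2/4) = 1/3, so A is estimated at one third the size of B without either total size being known in advance.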


The Design And Evaluation Of Web Prefetching and Caching Techniques - Davison (2002)   (1 citation)

....of pages, no one entity has a complete enumeration. Even the major search engines only know of a fraction of the Web, and the pages retained in those datasets are biased samples of the Web [LG98c, LG99]. As a result, the unbiased selection of a random subset of the Web is an open question [BB98], although some progress has been made [HHMN00]. Accordingly, the data set used as the starting points in this chapter were selected at random from a subset of the Web. We randomly selected 100,000 pages out of the approximately 3 million pages that our local research search engine (DiscoWeb ....

Krishna Bharat and Andrei Broder. A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.


Autonomous Cooperating Web Crawlers - McLearn (2002)   (2 citations)

....corresponding to non-web crawlers. damental level. Studies in 1997 indicate a small region (1.4%) of crawling overlap common to major web crawlers at the time. Further analysis showed that pairwise crawling overlap regions between four major search engines of the time ranged from 0.24 to 4.08 [3]. Of course, since this study, the results have very likely been significantly altered. Over the years, more and more crawlers have been released on the web. Many web crawlers now make use of distributed or parallel technology in efforts to increase a search engine's web coverage. Web crawlers ....

Krishna Bharat and Andrei Broder. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of the 7th International World Wide Web Conference, pages 379--388, Brisbane, Australia, April 1998.


A Clustering Interface For Web Search Results In Polish And English - Weiss

.... a scientific articles searcher, FTP Lycos [FtpLycos, el] which indexes files on FTP (file transfer protocol) servers or Moreover [Moreover, el] collecting only the freshest articles from newspapers. Due to the fast growth of the Internet (estimated to be 25 million pages per month in 1998 [Bharat and Broder, 98]), an ordinary Web user is yet again lost, this time in the abundance of searching capabilities. As Mamma meta search engine combines several popular services noticed by Eric Salberg: no single Web search service can provide all the germane [to the query] information, but perhaps ....

Bharat K., Broder A.: A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines, In Proceedings of the 7th World Wide Web Conference, Brisbane, Australia, April 1998.


Efficient URL Caching for World Wide Web Crawling - Broder, Najork, Wiener (2003)   (1 citation)  Self-citation (Broder)

No context found.

K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th World Wide Web Conference, pages 379--388, 1998.


A Figure of Merit for the Evaluation of Web-Corpus Randomness - Ciaramita, Baroni (2006)

No context found.

K. Bharat and A. Broder. 1998. A Technique for Measuring the Relative Size and Overlap of the Public Web Search Engines. In Proceedings of WWW7, pages 379--388.


Downloading Hidden Web Content - Ntoulas, Zerfos, Cho (2004)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW, 1998.


The Indexable Web is More than 11.5 Billion Pages - Gulli, Signorini (2005)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In 7th WWW, 1998.


A Personalized Search Engine Based on Web-Snippet.. - Ferragina, Gulli (2005)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW7, 1998.


Analysis of Web Caching Architectures: Hierarchical.. - Rodriguez, Spanner.. (2001)   (13 citations)

No context found.

K. Bharat and A. Broder, "A technique for measuring the relative size and overlap of public web search engines," in Proc. 7th Int. WWW Conf., Brisbane, Australia, Apr. 1998, pp. 379--388.


Methods for Sampling Pages Uniformly from the World.. - Rusmevichientong.. (2001)   (4 citations)

No context found.

Bharat, K., and Broder, A. 1998. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th International World Wide Web Conference.


Bringing the Web to the Network Edge: Large Caches and.. - Rodriguez, Biersack (2000)   (7 citations)

No context found.

K. Bharat and A. Broder, "A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines", In Seventh International WWW Conference, Brisbane, Australia, April 1998.


Web Mining - Fürnkranz (2004)   (1 citation)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. Computer Networks, 30(1--7):379--388, 1998. Proceedings of the 7th International World Wide Web Conference (WWW-7), Brisbane, Australia.


An efficient algorithm to rank Web resources - Zhang, Dong (2000)   (3 citations)

No context found.

K. Bharat and A. Broder, A technique for measuring the relative size and overlap of public Web search engines, Computer Networks and ISDN Systems 30 (1998) 379--388.


The Anatomy of a Clustering Engine for Web-page Snippets - Ferragina, Gulli (2004)

No context found.

Krishna Bharat and Andrei Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW7, 1998.


Query Refinement for Domain-Specific Web Search - Oyama

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th International World Wide Web Conference, 1998.


Web Search Services - Jiying Wang And

No context found.

Bharat, K., and Broder, A., "A technique for measuring the relative size and overlap of public Web search engines," Computer Networks 30(1-7), 379-388, 1998.


Using SiteRank for P2P Web Retrieval - Wu, Aberer (2004)   (1 citation)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proc. of the 7th World-Wide Web Conference (WWW7), 1998.


The Web as a graph - Kumar, Raghavan, Rajagopalan..   (27 citations)

No context found.

K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998.


Database Selection for Longer Queries - Wu, Yu, Meng (2003)

No context found.

K. Bharat and A. Broder. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. WWW7, April 1998.


Personalized and Focused Web Spiders - Chau, Chen (2003)

No context found.

Bharat, K. and Broder, A.: A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, Apr 1998.

CiteSeer.IST - Copyright Penn State and NEC