K. Bharat and A. Broder. 1998. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of WWW7, pages 379--388.
....of clusters obtained for several broad topic queries. 2. Organizing query result sets for search engines. The World Wide Web is a rich source of information. It has grown to more than one billion unique web documents [9] and continues to grow at a rate of roughly one million documents per day [4]. While users enjoy the large amount of information available on the World Wide Web, it poses a problem for both users and search engines: the sheer quantity of web documents. When users submit a query, the search engine returns the retrieved web documents ....
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. Proc. of the 7th World-Wide Web Conference (WWW7), 1998.
....database size of the metasearch engine: it can be as small as the intersection of the constituent databases for a poor fusion algorithm, or as large as the union for a good one. Interestingly, the difference between the database union and intersection can be large. In 1997, Bharat and Broder [8] estimated the size of the indexable portion of the Web at 200 million pages. Of these, an estimated 160 million pages were in the union of the databases of HotBot, AltaVista, Excite, and Infoseek. However, the estimated intersection of their databases was less than 1.4% of their union, only about ....
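The bounds described above (the intersection for a poor fusion algorithm, the union for a good one) can be illustrated with plain set operations. This is a toy sketch, assuming each engine's index is available as a set of URLs; the engine names and URLs are hypothetical placeholders, not the 1997 data.

```python
from functools import reduce

def coverage_bounds(indexes):
    """Return (intersection size, union size) for a dict of engine -> set of URLs.

    A poor fusion algorithm can shrink a metasearch engine's effective
    database toward the intersection; a good one approaches the union.
    """
    sets = list(indexes.values())
    common = reduce(set.intersection, sets)
    combined = reduce(set.union, sets)
    return len(common), len(combined)

# Hypothetical toy indexes (illustration only).
indexes = {
    "engine_a": {"u1", "u2", "u3", "u4"},
    "engine_b": {"u2", "u3", "u5"},
    "engine_c": {"u3", "u6", "u7"},
    "engine_d": {"u3", "u4", "u8"},
}

low, high = coverage_bounds(indexes)
print(f"intersection={low}, union={high}, ratio={low / high:.1%}")
# prints: intersection=1, union=8, ratio=12.5%
```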
K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public Web search engines. In The 7th WWW Conference, pages 379-388, 1998.
....journals, attending conferences or, quite often, by hearsay. Instead, the web (as well as other information resources) offers scattered and distributed information that is impossible to analyse manually. It has been estimated that the World Wide Web contains more than 300 million static objects [Bharat and Broder, 1998] accessible through 100 million internet hosts [Cameron, 2002]. In addition, organisations have intranets amounting to several million pages. The large majority of these documents are weakly structured. These repositories are usually searched by means of keyword-based search engines allowing a ....
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In 7th WWW Conference, Brisbane, Australia, 1998.
....(also called spider or robot) collects documents by recursively fetching links from a set of starting pages. Each crawler has different policies with respect to which links are followed, how deeply various sites are explored, etc. Thus, the pages indexed by various search engines differ considerably [27, 2]. The indexer processes the pages collected by the crawler. First, it decides which of them to index. For example, it might discard duplicate documents. Then it builds various data structures representing the pages. Most search engines build some variant of an inverted index data structure (see ....
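The indexer step described above can be sketched in a few lines. This is a minimal toy, assuming pages arrive as (url, text) pairs and that "duplicate" means byte-identical text (real indexers also detect near-duplicates, which this sketch does not attempt). It discards exact duplicates by hashing the content, then builds an inverted index mapping each term to the set of URLs containing it. The function name `build_index` and the example URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def build_index(pages):
    """Build a toy inverted index from (url, text) pairs.

    Exact-duplicate pages (identical text) are skipped: only the first
    URL seen for a given content hash is indexed.
    """
    seen_hashes = set()
    index = defaultdict(set)  # term -> set of URLs
    for url, text in pages:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # duplicate document: do not index it again
        seen_hashes.add(digest)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index

pages = [
    ("http://a.example/1", "Web search engines crawl the Web"),
    ("http://b.example/2", "Inverted index data structures"),
    ("http://c.example/3", "Web search engines crawl the Web"),  # duplicate of /1
]
index = build_index(pages)
print(sorted(index["web"]))    # ['http://a.example/1']
print(sorted(index["index"]))  # ['http://b.example/2']
```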
K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of the Seventh International World Wide Web Conference 1998, pages 379-388.
....opinions explicitly. 5. Measuring the Web. There are many questions related to the statistics of the Web itself: How big is the Web? How fast does the Web grow? What is the proportion of pages in French, XML, etc.? How do various search engines compare? Bharat and Broder described a technique in [4] to compare the coverage of different search engines. They measure search engine coverage and overlap through random queries. The idea behind their approach is simple: consider sets A and B of size 4 and 12 units, respectively. Let the size of their intersection A ∩ B be 2. Suppose that we do not ....
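The arithmetic behind this example can be made explicit. If samples are drawn uniformly from each set, the fraction of A's samples that lie in B estimates |A ∩ B|/|A| (here 2/4 = 1/2), and the fraction of B's samples that lie in A estimates |A ∩ B|/|B| (here 2/12 = 1/6); their ratio, (1/6)/(1/2) = 1/3, equals |A|/|B| without either size being known. The sketch below assumes uniform sampling directly from the sets, which is precisely what cannot be done for real search engines; the element labels, sample count, and seed are arbitrary illustrative choices.

```python
import random
from fractions import Fraction

def exact_ratio(a, b):
    """|A|/|B| recovered from the two conditional overlap fractions."""
    common = len(a & b)
    frac_a_in_b = Fraction(common, len(a))  # Pr[x in B | x drawn from A]
    frac_b_in_a = Fraction(common, len(b))  # Pr[x in A | x drawn from B]
    return frac_b_in_a / frac_a_in_b

def estimated_ratio(a, b, samples, rng):
    """Monte Carlo version: sample uniformly from each set, test membership."""
    a_list, b_list = sorted(a), sorted(b)
    hits_a = sum(rng.choice(a_list) in b for _ in range(samples))
    hits_b = sum(rng.choice(b_list) in a for _ in range(samples))
    return (hits_b / samples) / (hits_a / samples)

# Sets of size 4 and 12 sharing 2 elements, as in the example above.
A = {0, 1, 2, 3}
B = {2, 3} | set(range(100, 110))
print(exact_ratio(A, B))  # 1/3
print(round(estimated_ratio(A, B, 20_000, random.Random(7)), 2))  # close to 0.33
```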
....Web, since the Web is a directed graph with highly non-uniform degrees and has many small cuts. Little is known about the graph structure of the Web. Hence the length of the random walks required to generate a distribution close to uniform may be extremely large. The solution used in [4] is to use the search engines themselves to generate page samples. First, they build a lexicon of words and their associated frequencies on the Web. A random URL is generated from a search engine by firing a random query at it and selecting a random URL from the result set. Both ....
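The sampling step just described can be sketched against a toy engine. This assumes a hypothetical engine modeled as an in-memory dict (term → ranked list of URLs) and a lexicon of term → frequency; it picks a one-word query from terms whose frequency lies in a chosen band (one plausible use of the frequencies), runs it, and returns a random URL from the result set. The function name `sample_url` and the parameters `min_freq`, `max_freq`, `max_results`, and `max_tries` are illustrative assumptions, not values from the study.

```python
import random

def sample_url(engine, lexicon, rng, min_freq=1, max_freq=1000,
               max_results=100, max_tries=50):
    """Return a random URL by firing a random one-word query at `engine`.

    engine:  dict term -> ranked list of URLs (toy stand-in for a search engine)
    lexicon: dict term -> estimated frequency on the Web
    The frequency band, result cutoff, and retry limit are assumptions.
    """
    # Restrict queries to terms whose frequency lies in the chosen band.
    candidates = sorted(t for t, f in lexicon.items() if min_freq <= f <= max_freq)
    for _ in range(max_tries):
        term = rng.choice(candidates)
        results = engine.get(term, [])[:max_results]  # keep only the top results
        if results:
            return rng.choice(results)  # random URL from the result set
    raise RuntimeError("no query in the lexicon returned any results")

# Hypothetical toy engine and lexicon.
engine = {
    "broder": ["http://x.example/a", "http://x.example/b"],
    "crawler": ["http://x.example/b", "http://x.example/c"],
}
lexicon = {"broder": 12, "crawler": 340, "the": 10_000_000, "zzzq": 3}
url = sample_url(engine, lexicon, random.Random(1))
print(url)
```

Samples obtained this way are not uniform over the engine's index: a page that matches many query terms, or that ranks within the result cutoff for many queries, is more likely to be drawn.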
[Article contains additional citation context not shown here]
K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the Seventh International World Wide Web Conference (WWW7), pages 379-388.
....of pages, no one entity has a complete enumeration. Even the major search engines only know of a fraction of the Web, and the pages retained in those datasets are biased samples of the Web [LG98c, LG99]. As a result, the unbiased selection of a random subset of the Web is an open question [BB98], although some progress has been made [HHMN00]. Accordingly, the pages used as starting points in this chapter were selected at random from a subset of the Web. We randomly selected 100,000 pages out of the approximately 3 million pages that our local research search engine (DiscoWeb ....
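Drawing a fixed-size uniform sample from a large local collection, as in the selection of 100,000 of roughly 3 million pages described above, can be done in a single pass with reservoir sampling (Algorithm R) when the collection is read as a stream. This is a standard swapped-in technique, not necessarily what the authors used; if the full list of page IDs fits in memory, Python's `random.sample` does the same job. The page IDs and the scaled-down sizes below are hypothetical.

```python
import random
from itertools import islice

def reservoir_sample(stream, k, rng):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    it = iter(stream)
    reservoir = list(islice(it, k))  # the first k items fill the reservoir
    for i, item in enumerate(it, start=k):
        # `item` is element number i (0-based); keep it with probability k/(i+1)
        j = rng.randint(0, i)
        if j < k:
            reservoir[j] = item
    return reservoir

# Scaled-down illustration: 1,000 of 30,000 hypothetical page IDs.
pages = (f"page-{n}" for n in range(30_000))
sample = reservoir_sample(pages, 1_000, random.Random(42))
print(len(sample), len(set(sample)))  # 1000 1000
```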
Krishna Bharat and Andrei Broder. A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.
....corresponding to non-web crawlers. damental level. Studies in 1997 indicated a small region (1.4%) of crawling overlap common to the major web crawlers of the time. Further analysis showed that pairwise crawling overlap regions between four major search engines of the time ranged from 0.24 to 4.08 [3]. Of course, these results have very likely changed significantly since that study. Over the years, more and more crawlers have been released on the web. Many web crawlers now use distributed or parallel technology in an effort to increase a search engine's web coverage. Web crawlers ....
Krishna Bharat and Andrei Broder. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of the 7th International World Wide Web Conference, pages 379--388, Brisbane, Australia, April 1998.
.... a scientific articles searcher, FTP Lycos [FtpLycos, el], which indexes files on FTP (File Transfer Protocol) servers, or Moreover [Moreover, el], which collects only the freshest articles from newspapers. Due to the fast growth of the Internet (estimated at 25 million pages per month in 1998 [Bharat and Broder, 98]), an ordinary Web user is once again lost, this time in the abundance of searching capabilities. The Mamma metasearch engine combines several popular services; as Eric Salberg noticed, "no single Web search service can provide all the germane [to the query] information, but" perhaps ....
Bharat K., Broder A.: A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of the 7th World Wide Web Conference, Brisbane, Australia, April 1998.
No context found.
K. Bharat and A. Z. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th World Wide Web Conference, pages 379--388, 1998.
No context found.
K. Bharat and A. Broder. 1998. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of WWW7, pages 379--388.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW, 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In 7th WWW, 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW7, 1998.
No context found.
K. Bharat and A. Broder, "A technique for measuring the relative size and overlap of public web search engines," in Proc. 7th Int. WWW Conf., Brisbane, Australia, Apr. 1998, pp. 379--388.
No context found.
Bharat, K., and Broder, A. 1998. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th International World Wide Web Conference.
No context found.
K. Bharat and A. Broder, "A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines", In Seventh International WWW Conference, Brisbane, Australia, April 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. Computer Networks and ISDN Systems, 30(1--7):379--388, 1998. Proceedings of the 7th International World Wide Web Conference (WWW-7), Brisbane, Australia.
No context found.
K. Bharat and A. Broder, A technique for measuring the relative size and overlap of public Web search engines, Computer Networks and ISDN Systems 30 (1998) 379--388.
No context found.
Krishna Bharat and Andrei Broder. A technique for measuring the relative size and overlap of public web search engines. In WWW7, 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proceedings of the 7th International World Wide Web Conference, 1998.
No context found.
Bharat, K., and Broder, A., "A technique for measuring the relative size and overlap of public Web search engines," Computer Networks 30(1-7), 379-388, 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. In Proc. of the 7th World-Wide Web Conference (WWW7), 1998.
No context found.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998.
No context found.
K. Bharat and A. Broder. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. WWW7, April 1998.
No context found.
Bharat, K. and Broder, A.: A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, Apr 1998.
CiteSeer.IST - Copyright Penn State and NEC