| M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles and M. Gori, Focused Crawling Using Context Graphs, in Proceedings of VLDB 2000, pages 527-534. |
....evaluation which has shown promising results. Keywords: Web Searching and Crawling, Semantic Web 1. INTRODUCTION The enormous growth of the World Wide Web in recent years has made it important to develop document discovery mechanisms based on intelligent techniques such as focused crawling [3, 6, 1]. In its classical sense a crawler is a program that retrieves Web pages. This service is commonly used by search engines or Web caches to build up their repositories. Crawling is the basic technique for building huge data storages. Focused crawling goes a step further than the classic approach. ....
....are also mentioned. However, no detailed evaluation of their approach is provided. The idea of tunnelling [2] through several links without actually rating them is a proposal on how to prevent getting caught in one domain. Whereas our idea is based on text and contents, the approach presented in [6] uses so called context graphs as a means to model the paths leading to relevant web pages. Context graphs in their sense represent link hierarchies within which relevant Web pages occur together. A combination of context graphs and ontology focused crawling might be future work. [4] did ....
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused Crawling using Context Graphs. In VLDB-00, 2000.
....version of Kleinberg's algorithm [16] to find topical hubs. The hubs provide links to many authoritative sources on the topic. The distiller is activated at various times during the crawl and some of the top hubs are added to the frontier. 3.4 Context Focused Crawler Context focused crawlers [13] use Bayesian classifiers to guide their crawl. However, unlike the focused crawler described above, these classifiers are trained to estimate the link distance between a crawled page and the relevant pages. We can appreciate the value of such an estimation from our own s http: dmoz.org 14 ....
....may be trained to identify the pages that are relevant to the information need or task. The training is done using the seed (or pre specified relevant) pages as positive examples. The trained classifier will then provide boolean or continuous relevance scores to each of the crawled pages [9, 13]. 5. Retrieval system rank: N different crawlers are started from the same seeds and allowed to run till each crawler gathers P pages. All of the N·P pages collected from the crawlers are ranked against the initiating query or description using a retrieval system such as SMART. The rank provided ....
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th International Conference on Very Large Databases (VLDB 2000), pages 527-534, Cairo, Egypt, 2000.
....while being robust against crashes, manageable, and considerate of resources and web servers. There has been some recent academic interest in the first issue, including work on strategies for crawling important pages first [12, 21], crawling pages on a particular topic or of a particular type [9, 8, 23, 13], recrawling (refreshing) pages in order to optimize the overall freshness of a collection of pages [11, 10], or scheduling of crawling activity over time [25]. In contrast, there has been less work on the second issue. Clearly, all the major search engines have highly optimized crawling ....
....focus only on certain types of pages, e.g. pages on a particular topic or in a particular language, images, mp3 files, or computer science research papers. In addition to heuristics, more general approaches have been proposed based on link structure analysis [9, 8] and machine learning techniques [13, 23]. The goal of a focused crawler is to find many pages of interest without using a lot of bandwidth. Thus, most of the previous work does not use a high performance crawler, although doing so could support large specialized collections that are significantly more up to date than a broad search ....
M. Diligenti, F. Coetzee, S. Lawrence, C. Giles, and M. Gori. Focused crawling using context graphs. In Proc. of 26th Int. Conf. on Very Large Data Bases, September 2000.
....Budapest, Hungary. ACM xxx. a small but focused document collection from the Web that can then be thoroughly mined for appropriate information using off-the-shelf text mining, indexing and ranking tools. Topical crawlers, also called focused crawlers, have been studied extensively in the past [6, 8, 3, 12, 7, 1, 2]. In our previous evaluation studies of topical crawlers, we found a similarity based Best First crawler to be quite effective [13, 14, 18]. However, the Best First crawler made no use of the inherent structure available in an HTML document. We also studied algorithms that attempt to identify the ....
....harvested relevant pages at considerable link distances from the seeds. It is important to note that the method used by the distiller to identify the top hubs is a modified version of Kleinberg's algorithm [9] and is different from the way we identify hubs in the current work. Diligenti et al. [7] trained classifiers to predict the link distance of a given page from the relevant targets. The classifiers were then used to prioritize the crawl frontier. Aggarwal et al. [1] introduced an intelligent crawling infrastructure that could learn predicate dependent features. In more recent work ....
[Article contains additional citation context not shown here]
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th International Conference on Very Large Databases (VLDB 2000.
....breadth first [34] and depth first crawlers [18] defining the beginnings of research on crawlers, we now see a variety of crawler algorithms. There is Shark Search [24], a more aggressive variant of De Bra's Fish Search [18]. There are crawlers whose decisions rely heavily on link based criteria [15, 19]. Others exploit lexical and conceptual knowledge such as those provided by a topic hierarchy [13]. Still others emphasize contextual knowledge [1, 28] for the topic including those received via relevance feedback. Our work also relates to collaborative filtering [26, 5, 25] since the queries ....
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th International Conference on Very Large Databases (VLDB 2000.
....relevant pages only. Because the crawler is only a computer program, it cannot determine how relevant a web page is. In the literature, there are many researches mentioning about the topic specific crawling. For example, they are the breadth first search [12], pagerank [4,8,15], focused crawling [17,10], shark search [11], adaptive agent [2,3,9], reinforcement learning [6], artificial life agent [13], arbitrary predicate [1], WTMS [18,19], etc. The main purpose of these algorithms is to gather the most relevant web pages as possible. Therefore, the assessment of topic specific web crawling is ....
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, and M. Gori. "Focused Crawling Using Context Graphs." In Proceedings of the 26th International Conference on Very Large Data Bases, 2000.
....centralized indexing schemes to decentralized mechanisms that navigate the underlying network without knowledge of its global structure. The decentralized approach appears in a variety of settings: in the behavior of users browsing the Web by following hyperlinks; in the design of focused crawlers [4, 5, 8] and other agents that explore the Web's links to gather information; and in the search protocols underlying decentralized peer to peer systems such as Gnutella [10], Freenet [7] and recent research prototypes [21, 22, 23] through which users can share resources without a central server. In ....
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori, "Focused Crawling Using Context Graphs," Proc. 26th Intl. Conf. on Very Large Databases (VLDB), 2000.
....such as news and financial data, came along a steep growth of dynamic documents on the web. Thus, search engines require exhaustive crawling work to maintain and update their indices to the increasingly time sensitive web content. Currently, publicly indexed documents exceed one billion in numbers [1]. In addition to the general purpose crawlers, an ever growing number of focused crawlers selectively seek out documents that are relevant to a specific pre defined set of subjects [2]. To obtain information about a product or service requested by a customer (e.g. price, expected delivery time, ....
M. Diligenti, F. Coetzee, S. Lawrence, C. Giles and M. Gori, "Focused Crawling Using Context Graphs," Proc. 26th VLDB Conf., Egypt 2000.
....breadth first [33] and depth first crawlers [14] defining the beginnings of research on crawlers, we now see a variety of crawler algorithms. There is Shark Search [18], a more aggressive variant of De Bra's Fish Search [14]. There are crawlers whose decisions rely heavily on link based criteria [12, 15, 6]. Others exploit lexical and conceptual knowledge such as those provided by a topic hierarchy [11]. Still others emphasize contextual knowledge [1, 29, 25] for the topic including those received via relevance feedback. In a companion paper we study several machine learning issues related to ....
....different from the target pages. In this case there is less prior information available about the target pages when the crawl begins. Links become important not only in terms of the particular pages pointed to, but also in terms of the likelihood of reaching relevant documents further down the path [15]. This problem is also very realistic since quite commonly users are unable to specify a known relevant page and also the search engines may not return relevant pages. With a few exceptions, this second task is rarely considered in the literature. The effort in [1] is somewhat related to this task ....
[Article contains additional citation context not shown here]
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th International Conference on Very Large Databases (VLDB 2000.
No context found.
Diligenti, M.; Coetzee, F.; Lawrence, S.; Giles, C. L.; and Gori, M. 2000. Focused crawling using context graphs. In 26th International Conference on Very Large Databases, VLDB 2000, 527--534.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proceedings of the 26th Conf. on Very Large Data Bases, 2000.
No context found.
Diligenti M., Coetzee F., Lawrence S., Giles C., Gori M., "Focused crawling using context graphs", In Proceedings of the 26 Int. Conf. On Very Large Databases, Cairo, Egypt, 2000.
No context found.
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. and Gori, M. (2000) Focused crawling using context graphs, Proceedings of the 26th Int. Conf. On Very Large Databases, Cairo, Egypt.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Proc. 26th Intl. Conference on Very Large Data Bases (VLDB 2000.
....to find as more T on topic pages as possible in a predefined number of steps. By step is meant visiting (and downloading and indexing) a page reachable from S following hyperlinks from pages in S. In other words it is important to estimate whether an outgoing link is promising or not. In [3][4] and [5] different techniques are described. In any case when crawler decides to take into account page for link expansion, all links from the page are inserted into the crawl frontier (links that are to be visited). But many of them are not interesting at all (i.e. this page is designed by ....
....we constructed a mixture of classifiers each one trained to recognize words appearing in a specific portion of the page. In the future, we plan to use Neural Networks to find the optimal weights of our mixture of classifiers. We hope our system will also improve focus crawling strategies [4] by estimating importance of the link based on its position and neighborhood. We believe that our visual page representation can find its application in many other areas related to search engines, information retrieval and data mining from the Web. 7. ....
Diligenti M., Coetzee F., Lawrence S., Giles C., Gori M., "Focused crawling using context graphs", Proceedings of the 26 Int. Conf. On Very Large Databases, Cairo, Egypt, 2000.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles and M. Gori, Focused Crawling Using Context Graphs, in Proceedings of VLDB 2000, pages 527-534.
No context found.
M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles and M. Gori. Focused crawling using context graphs. In Proceeding of the 26th VLDB Conference, Cairo, Egypt, 2000.
No context found.
M.Diligenti, F.M.Coetzee, S. Lawrence, C.L.Giles and M.Gori. Focused Crawling using Context Graphs, 26th International Conference on Very Large Databases, VLDB2000.
No context found.
Diligenti, M. et al: Focused crawling using context graphs. In Proceedings of VLDB (2000) 527-534.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, "Focused crawling using context graphs," in 26th International Conference on Very Large Databases, VLDB 2000.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, "Focused crawling using context graphs," in 26th International Conference on Very Large Databases, VLDB 2000.
No context found.
M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, M. Gori, "Focused Crawling Using Context Graphs", VLDB 2000, pp. 527-534.
No context found.
Diligenti, M., Coetzee, F.M., Lawrence, S., Giles, C.L., and Gori, M., "Focused crawling using context graphs," Proc. 26 Intl. Conf. on Very Large Data Bases, 527-534, 2000. Available at http://external.nj.nec.com/homepages/coetzee/focusCrawler.pdf
No context found.
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L. and Gori, M. 2000 Focused crawling using context graphs. In VLDB 2000, Proc. 26th Int. Conf. on Very Large Data Bases,
No context found.
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C. L., and Gori, M. Focused Crawling using Context Graphs. In Proceedings of the 26th International Conference on Very Large Databases, VLDB 2000, Cairo, Egypt, pp. 527-534.
CiteSeer.IST - Copyright Penn State and NEC