26 citations found.
C.C. Aggarwal, F. Al-Garawi, and P.S. Yu. Intelligent crawling on the world wide web with arbitrary predicates. In Proceedings of the WWW Conference, 2001.


This paper is cited in the following contexts:
Ontology-Focused Crawling of Web Documents - Ehrig, Maedche (2003)   (3 citations)

....evaluation which has shown promising results. Keywords: Web Searching and Crawling, Semantic Web. 1. INTRODUCTION. The enormous growth of the World Wide Web in recent years has made it important to develop document discovery mechanisms based on intelligent techniques such as focused crawling [3, 6, 1]. In its classical sense a crawler is a program that retrieves Web pages. This service is commonly used by search engines or Web caches to build up their repositories. Crawling is the basic technique for building huge data storages. Focused crawling goes a step further than the classic approach. ....

....has 52 nodes and 167 lexical entries. Metric. Perhaps the most crucial evaluation of focused crawling is to measure the rate at which relevant pages are acquired, and how effectively irrelevant pages are filtered off from the crawl. The metric used in our scenario is the well-known harvest rate [3, 1]: hr = #r / #p, hr ∈ [0, 1]. It represents the fraction of Web pages crawled satisfying the crawling target #r among the retrieved pages #p. This harvest ratio must be high, otherwise the focused crawler would spend a lot of time merely eliminating irrelevant pages, and it may be better to ....
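The harvest rate defined in this excerpt is a simple ratio. Below is a minimal Python sketch. It assumes each retrieved page has already been judged as satisfying the crawling target or not. The function name `harvest_rate` is hypothetical and is not taken from either paper.

```python
from typing import Sequence


def harvest_rate(relevant_flags: Sequence[bool]) -> float:
    """Return hr = #r / #p for a crawl.

    relevant_flags holds one entry per retrieved page (#p entries); an entry is
    True when that page satisfies the crawling target (counted in #r).
    """
    p = len(relevant_flags)
    if p == 0:
        raise ValueError("harvest rate is undefined before any page is retrieved")
    r = sum(1 for flag in relevant_flags if flag)
    return r / p


# Hypothetical crawl: 3 of 4 retrieved pages satisfy the target.
print(harvest_rate([True, False, True, True]))  # 0.75
```

Because #r can never exceed #p, the result always lies in [0, 1].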

[Article contains additional citation context not shown here]

C. C. Aggarwal, F. Al-Garawi, and P. Yu. Intelligent crawling on the world wide web with arbitrary predicates. In WWW-10, Hong Kong, 2001.


Crawling the Web - Pant, Srinivasan, Menczer (2004)   (1 citation)

....discuss a few such precision-like measures: 1. Acquisition rate: In case we have boolean relevance scores we could measure the explicit rate at which good pages are found. Therefore, if 50 relevant pages are found in the first 500 pages crawled, then we have an acquisition rate or harvest rate [1] of 10% at 500 pages. 2. Average relevance: If the relevance scores are continuous they can be averaged over the crawled pages. This is a more general form of harvest rate [9, 21, 8]. The scores may be provided through simple cosine similarity or a trained classifier. Such averages (see Figure ....
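The two measures in this excerpt can be sketched in Python. The sketch assumes relevance judgments are available as a list in crawl order: booleans for the acquisition rate and floats for average relevance. The function names are illustrative only. With 50 relevant pages among the first 500, the acquisition rate is 50 / 500 = 0.1, which is 10%.

```python
from typing import Sequence


def acquisition_rate(is_relevant: Sequence[bool], n: int) -> float:
    """Fraction of relevant pages among the first n crawled (boolean relevance)."""
    window = is_relevant[:n]
    if not window:
        raise ValueError("need at least one crawled page")
    return sum(window) / len(window)  # True counts as 1, False as 0


def average_relevance(scores: Sequence[float], n: int) -> float:
    """Mean continuous relevance score over the first n crawled pages."""
    window = scores[:n]
    if not window:
        raise ValueError("need at least one crawled page")
    return sum(window) / len(window)


# Hypothetical crawl: 50 relevant pages among the first 500 fetched.
crawl = [True] * 50 + [False] * 450
print(acquisition_rate(crawl, 500))  # 0.1, i.e. 10%
```

If the crawl has fewer than `n` pages, both functions average over the pages that exist. When the scores are restricted to 0 and 1, `average_relevance` returns the same value as `acquisition_rate`. This is the sense in which average relevance generalizes the harvest rate.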

C. C. Aggarwal, F. Al-Garawi, and P.S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.


A General Framework for Searching in Distributed Data.. - Bakiras, Kalnis..

....sites that contain the results. Typical examples in this category are web search engines (e.g. Google [WWW3]) and music distribution systems (e.g. Napster [WWW4]). Changes of the source data are reflected in the central index either by periodically polling the source sites (e.g. web crawling [AGY01]) or by allowing the sites to notify the server whenever updates occur. Centralized systems suffer from several drawbacks: i) they cannot follow high frequency changes in the source data, ii) they require expensive dedicated infrastructure (i.e. high end server farms, fast network ....

Aggarwal, C., Al-Garawi, F., Yu, P. Intelligent crawling on the world wide web with arbitrary predicates. 10th Int. WWW Conf., 2001.


Topical Crawling for Business Intelligence - Pant, Menczer

....a small but focused document collection from the Web that can then be thoroughly mined for appropriate information using off-the-shelf text mining, indexing and ranking tools. Topical crawlers, also called focused crawlers, have been studied extensively in the past [6, 8, 3, 12, 7, 1, 2]. In our previous evaluation studies of topical crawlers, we found a similarity-based Best First crawler to be quite effective [13, 14, 18]. However, the Best First crawler made no use of the inherent structure available in an HTML document. We also studied algorithms that attempt to identify the ....

....of Kleinberg's algorithm [9] and is different from the way we identify hubs in the current work. Diligenti et al. [7] trained classifiers to predict the link distance of a given page from the relevant targets. The classifiers were then used to prioritize the crawl frontier. Aggarwal et al. [1] introduced an intelligent crawling infrastructure that could learn predicate-dependent features. In more recent work Chakrabarti et al. [2] use the DOM structure of a Web page in a focused crawler. The features of HTML elements on the DOM tree are used to train a classifier. However, this use ....

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.


Search Engine-Crawler Symbiosis - Pant, Bradshaw, Menczer (2003)

.... Shark Search [24], a more aggressive variant of De Bra's Fish Search [18]. There are crawlers whose decisions rely heavily on link-based criteria [15, 19]. Others exploit lexical and conceptual knowledge such as those provided by a topic hierarchy [13]. Still others emphasize contextual knowledge [1, 28] for the topic, including those received via relevance feedback. Our work also relates to collaborative filtering [26, 5, 25] since the queries submitted by the users help in preparing the collection for similar queries by other users in the future. The use of queries to collaborate between users ....

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.


A General Evaluation Framework for Topical Crawlers - Srinivasan, Menczer, Pant (2002)   (1 citation)

.... Search [18], a more aggressive variant of De Bra's Fish Search [14]. There are crawlers whose decisions rely heavily on link-based criteria [12, 15, 6]. Others exploit lexical and conceptual knowledge such as those provided by a topic hierarchy [11]. Still others emphasize contextual knowledge [1, 29, 25] for the topic, including those received via relevance feedback. In a companion paper we study several machine learning issues related to crawler algorithms, including, for example, the role of adaptation in crawling and the scaling of algorithms [27]. One research area that is gathering increasing ....

....of humans and other Web agents than humans themselves. Thus it is quite reasonable to explore crawlers in a context where the parameters of crawl time and crawl distance may be beyond the limits of human acceptance imposed by user based experimentation. Our analysis of the crawler literature [1, 2, 4, 6, 8, 11, 10, 17, 18, 29, 37] and our own experience [21, 24, 25, 26, 22, 31, 32, 27] indicate that in general, when embarking upon an experiment comparing crawling algorithms, several critical decisions are made. These impact not only the immediate outcome and value of the study but also the ability to make comparisons with ....

[Article contains additional citation context not shown here]

CC Aggarwal, F Al-Garawi, and PS Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th International World Wide Web Conference, pages 96--105, 2001.


Topic-Oriented Collaborative Crawling - Chung, Clarke (2002)   (1 citation)

....crawler is a program that gathers resources from the Web. Web crawlers are widely used to gather pages for indexing by Web search engines [12, 23], but may also be used to gather information for Web data mining [16, 20], for question answering [14, 28], and for locating pages with specific content [1, 9]. A crawler operates by maintaining a pending queue of URLs that the crawler intends to visit. At each stage of ....
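The pending queue described in this excerpt can be sketched as a best-first frontier. In this variant the queue is a priority queue keyed by an estimated score, so the most promising URL is visited next. This is a generic illustration and not the algorithm of Chung and Clarke or of Aggarwal et al. The in-memory `out_links` graph stands in for fetching pages, and `score` is a hypothetical relevance estimate.

```python
import heapq
from typing import Callable, Dict, List


def best_first_crawl(
    seeds: List[str],
    out_links: Dict[str, List[str]],
    score: Callable[[str], float],
    max_pages: int,
) -> List[str]:
    """Visit URLs in order of decreasing score, starting from the seeds."""
    # heapq is a min-heap, so scores are stored negated to pop the best URL first.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)  # URLs already queued, so none is enqueued twice
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)  # stand-in for fetching and processing the page
        for link in out_links.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited


# Hypothetical link graph and scores.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
scores = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.8, "e": 0.1}
print(best_first_crawl(["a"], graph, scores.__getitem__, 10))  # ['a', 'c', 'b', 'd', 'e']
```

Page "d" is reached only after "b" has been visited, even though "d" scores higher than "b". A crawl order driven by a frontier can only reorder URLs that have already been discovered.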

....describe focused crawlers that learn to identify apparently off-topic pages that reliably lead to on-topic pages. Menczer et al. [31] consider the problem of evaluating and comparing the effectiveness of the strategies used by focused crawlers. The intelligent crawler proposed by Aggarwal et al. [1] generalizes the idea of a focused crawler, encompassing much of the prior research in this area. Their work assumes the existence of a predicate that determines membership in the target group. Starting at an arbitrary point, the crawler adaptively learns linkage structures in the Web that lead to ....

Charu C Aggarwal, Fatima Al-Garawi, and Philip S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Tenth International World Wide Web Conference, pages 96--105, May 2001.


Information Retrieval on the World Wide Web and.. - Barfourosh.. (2002)

....outperforms the others; Infospider is in second place and finally page rank has the lowest efficiency among them. They conclude that the page rank method is too general to be very useful in topic specific search. A focused crawler that tries to learn the link structure of the web is detailed in [61]. This approach tries to find some features in a page that make it more likely that its links lead to on-topic pages. For example, page content, URL structure of page, link structure of web or a combination of them can be considered as features. The researchers of this approach claim that learning ....

C. Aggarwal, F. Al-Garawi, P. Yu, Intelligent Crawling on the World Wide Web with Arbitrary Predicates, Proceedings of the 10th International World Wide Web Conference, Hong Kong, 2001.


Accelerated Focused Crawling through Online Relevance.. - Chakrabarti, Punera.. (2002)   (14 citations)

....central to the surfing activity. Automatic programs which can learn this capability would be valuable for a number of applications which can be broadly characterized as personalized, topic specific information foragers. Large scale, topic specific information gatherers are called focused crawlers [1, 9, 14, 28, 30]. In contrast to giant, all purpose crawlers which must process large portions of the Web in a centralized manner, a distributed federation of focused crawlers can cover specialized topics in more depth and keep the crawl more fresh, because there is less to cover for each crawler. In its ....

....Lexical proximity and contextual features have been used extensively in natural language processing for disambiguating word sense [15]. Compared to plain text, DOM trees and hyperlinks give us a richer set of potential features. Aggarwal et al. have proposed an intelligent crawling framework [1] in which only one classifier is used, but similar to our system, that classifier trains as the crawl progresses. They do not use our apprentice-critic approach, and do not exploit features derived from tag trees to guide the crawler. The intelligent agents literature has brought forth several ....

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW2001.


Topic-Driven Crawlers: Machine Learning Issues - Menczer, Pant, Srinivasan (2002)   (1 citation)

....will continue to grow. Hence the solution offered by search engines, i.e. the capacity to answer any query from any user, is recognized as being limited. It therefore comes as no surprise that the development of topic driven crawler algorithms has received significant attention in recent years [1, 6, 8, 16, 26, 28, 24]. Topic driven crawlers (also known as focused crawlers) respond to the particular information needs expressed by topical queries or interest profiles. These could be the needs of an individual user (query time or online crawlers) or those of a community with shared interests (topical search ....

....likelihood of a page leading to a relevant page, even if the source page itself is not relevant [11]. Others exploit lexical and conceptual knowledge. For example in [6] Chakrabarti et al. use a hierarchical topic classifier to select links for crawling. Still others emphasize contextual knowledge [1, 26, 29] for the topic, including that received via relevance feedback. For example in [1] Aggarwal et al. learn a statistical model of the features appropriate for a topic while crawling. In previous work by one of the authors, Menczer and Belew [26] show that in well organized portions of the Web, ....
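One simple way to learn, during a crawl, which features are associated with satisfying the target is to keep running counts. For each feature, the crawler can then compare P(target | feature) with the overall rate P(target). The sketch below illustrates that idea under this assumption. It is not the exact statistical model of Aggarwal et al. The class name and the use of string features, such as URL tokens or words near a link, are hypothetical.

```python
from collections import Counter
from typing import Iterable


class OnlineFeatureStats:
    """Running counts of how often pages carrying a feature satisfied the target."""

    def __init__(self) -> None:
        self.pages = 0                  # pages observed so far
        self.hits = 0                   # pages that satisfied the target
        self.feature_pages = Counter()  # pages observed per feature
        self.feature_hits = Counter()   # satisfying pages per feature

    def update(self, features: Iterable[str], satisfied: bool) -> None:
        """Record one crawled page and whether it satisfied the target."""
        self.pages += 1
        self.hits += int(satisfied)
        for f in set(features):  # count each feature at most once per page
            self.feature_pages[f] += 1
            self.feature_hits[f] += int(satisfied)

    def lift(self, feature: str) -> float:
        """P(satisfied | feature) / P(satisfied); 1.0 when there is no evidence."""
        if self.hits == 0 or self.feature_pages[feature] == 0:
            return 1.0
        p_given_f = self.feature_hits[feature] / self.feature_pages[feature]
        p = self.hits / self.pages
        return p_given_f / p


# Hypothetical observations from four crawled pages.
stats = OnlineFeatureStats()
stats.update(["crawler", "edu"], True)
stats.update(["crawler"], True)
stats.update(["sports"], False)
stats.update(["sports", "edu"], False)
print(stats.lift("crawler"), stats.lift("sports"), stats.lift("edu"))  # 2.0 0.0 1.0
```

A lift above 1 suggests that unvisited URLs carrying the feature should move up the frontier, and a lift below 1 suggests they should move down. Features with no observations yet default to 1.0, meaning no evidence either way.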

[Article contains additional citation context not shown here]

CC Aggarwal, F Al-Garawi, and PS Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th Intl. World Wide Web Conference, pages 96--105, 2001.


Web Crawling Agents for Retrieving Biomedical Information - Srinivasan, Mitchell.. (2002)   (2 citations)

....to estimate the likelihood of a page leading to a relevant page, even if it is not relevant itself [9]. Others exploit lexical and conceptual knowledge. For example Chakrabarti et al. [5] use a hierarchical topic classifier to select links for crawling. Still others emphasize contextual knowledge [1, 22, 27] for the topic, including that received via relevance feedback. We are at present part of a highly creative phase regarding the design of topic driven retrieval agents. Almost all of this research is characterized by an emphasis on general topic queries, such as those derived from the Yahoo and ....

C. Aggarwal, F. Al-Garawi, and P. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th Intl. World Wide Web Conference, pages 96--105, 2001.


Exploration versus Exploitation in Topic Driven Crawlers - Pant, Srinivasan, Menczer (2002)

....benefit that crawlers can be driven by a rich context (topics, queries, user profiles) within which to interpret pages and select the links to be visited. It comes as no surprise, therefore, that the development of topic driven crawler algorithms has received significant attention in recent years [9, 14, 8, 18, 1, 19]. Topic driven crawlers (also known as focused crawlers) respond to the particular information needs expressed by topical queries or interest profiles. These could be the needs of an individual user (query time or online crawlers) or those of a community with shared interests (topical search ....

....guide their focused crawlers. Fetuccino [3] and InfoSpiders [18] begin their focused crawling with starting points generated from CLEVER [7] or other search engines. Most crawlers follow fixed strategies, while some can adapt in the course of the crawl by learning to estimate the quality of links [18, 1, 22]. The question of exploration versus exploitation in crawler strategies has been addressed in a number of papers, more or less directly. Fish Search [11] limited exploration by bounding the depth along any path that appeared suboptimal. Cho et al. [9] found that exploratory crawling behaviors ....
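The depth-bounding idea attributed here to Fish Search can be sketched as a budget carried by each URL. A relevant page passes the full budget to its children, and an irrelevant page passes one unit less. A path whose budget reaches zero is not expanded further. This is a simplified illustration with plain FIFO order, an in-memory link graph, and a hypothetical `is_relevant` test. It is not the original Fish Search implementation.

```python
from collections import deque
from typing import Callable, Dict, List


def depth_bounded_crawl(
    seeds: List[str],
    out_links: Dict[str, List[str]],
    is_relevant: Callable[[str], bool],
    max_depth: int,
    max_pages: int,
) -> List[str]:
    """Crawl breadth-first, cutting off paths after max_depth irrelevant pages in a row."""
    queue = deque((s, max_depth) for s in seeds)
    seen = set(seeds)
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # stand-in for fetching the page
        # Relevant pages restore the full budget; irrelevant pages spend one unit.
        child_depth = max_depth if is_relevant(url) else depth - 1
        if child_depth <= 0:
            continue  # this path looks unpromising: do not expand its links
        for link in out_links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, child_depth))
    return visited


# Hypothetical chain a -> b -> c -> d where only "a" is relevant.
links = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(depth_bounded_crawl(["a"], links, lambda u: u == "a", 2, 10))  # ['a', 'b', 'c']
```

With a budget of 2, the crawl stops expanding after two consecutive irrelevant pages, so "d" is never fetched. Raising `max_depth` trades more exploration for a lower harvest rate along unpromising paths.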

C. Aggarwal, F. Al-Garawi, and P. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th Intl. World Wide Web Conference, pages 96--105, 2001.


Exploration versus Exploitation in Topic Driven Crawlers - Pant, Srinivasan, Menczer (2002)

....benefit that crawlers can be driven by a rich context (topics, queries, user profiles) within which to interpret pages and select the links to be visited. It comes as no surprise, therefore, that the development of topic driven crawler algorithms has received significant attention in recent years [9, 16, 8, 23, 1, 24]. Topic driven crawlers (also known as focused crawlers) respond to the particular information needs expressed by topical queries or interest profiles. These could be the needs of an individual user (query time or online crawlers) or those of a community with shared interests (topical search ....

....guide their focused crawlers. Fetuccino [3] and InfoSpiders [23] begin their focused crawling with starting points generated from CLEVER [7] or other search engines. Most crawlers follow fixed strategies, while some can adapt in the course of the crawl by learning to estimate the quality of links [23, 1, 27]. The question of exploration versus exploitation in crawler strategies has been addressed in a number of papers, more or less directly. Fish Search [12] limited exploration by bounding the depth along any path that appeared suboptimal. Cho et al. [9] found that exploratory crawling behaviors ....

CC Aggarwal, F Al-Garawi, and PS Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th Intl. World Wide Web Conference, pages 96--105, 2001.


World Wide Web Search Technologies - Hu   (1 citation)

....synthesizes the relevance, authority, integrativity, and novelty of each Web resource, and can be computed efficiently through solving a group of linear equations. A variety of other improved ranking algorithms can be found in [16, 48]. • Various enhanced crawlers can be found in the literature [1, 17, 43]. Some crawlers are extensible, personally customized, relocatable, scalable, and Web site specific [26, 42]. Web viewers usually consider certain Web pages more important. A crawler which collects those important pages first is advantageous for users [13]. • Artificial Intelligence (AI) can ....

Charu C. Aggarwal, Fatima Al-Garawi, and Philip S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proceedings of the 10th International World Wide Web Conference, Hong Kong, May 2001.


Collaborative Crawling: Mining User Experiences for Topical.. - Aggarwal (2002)   (1 citation)  Self-citation (Aggarwal)

....topical locality on the web. This locality may be used in order to design effective techniques for resource discovery by starting at a few well chosen points and maintaining the crawler within the ranges of these known topics. A recent paper discusses the application of learning methodologies [3] in order to improve the effectiveness of the crawl. While the focussed crawling technique is designed for finding web pages belonging to one of the classes from a hierarchically arranged topical collection, the intelligent crawling technique is capable of more flexible and arbitrary predicates ....

....technique is capable of more flexible and arbitrary predicates such as a combination of different kinds of topical queries, keyword queries or other constraints on the content or metainformation about the web page such as its URL domain. Both the focussed crawling and intelligent crawling methods [3, 4] use only linkage based information in order to find topical resources. In the former case, the linkage based information is used directly under the assumption that web pages show topical locality, whereas in the latter case, a learning process is used to identify the most suitable web pages. It ....
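Treating the crawl target as an arbitrary predicate means the crawler only needs a yes/no test for each page. That test can be composed from simpler conditions such as keyword matches and URL-domain constraints. The sketch below shows one way to compose such tests. The `Page` type and all helper names are hypothetical, and the keyword test is a naive case-insensitive substring match. A topical classifier could be added as another predicate of the same type.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    text: str


Predicate = Callable[[Page], bool]


def has_keywords(*words: str) -> Predicate:
    """True when every word occurs in the page text (case-insensitive substring)."""
    return lambda page: all(w.lower() in page.text.lower() for w in words)


def in_domain(suffix: str) -> Predicate:
    """True when the URL's host name ends with the given suffix, e.g. '.edu'."""
    return lambda page: (urlparse(page.url).hostname or "").endswith(suffix)


def all_of(*preds: Predicate) -> Predicate:
    return lambda page: all(p(page) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda page: any(p(page) for p in preds)


# Hypothetical target: pages on focused or topical crawling hosted under .edu.
target = all_of(
    any_of(has_keywords("focused crawler"), has_keywords("topical crawler")),
    in_domain(".edu"),
)
print(target(Page("http://www.cs.example.edu/paper.html", "A Focused Crawler for ...")))  # True
print(target(Page("http://news.example.com/", "focused crawler")))  # False
```

The crawler treats `target` as a black box. Changing the crawl goal only changes the predicate, not the crawling machinery.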

[Article contains additional citation context not shown here]

C. C. Aggarwal, F. Al-Garawi, P. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. WWW Conference, 2001.


Semantic Web Mining and the Representation, - Analysis And Evolution

No context found.

C.C. Aggarwal, F. Al-Garawi, and P.S. Yu. Intelligent crawling on the world wide web with arbitrary predicates. In Proceedings of the WWW Conference, 2001.


Adaptive Web Search Based on a Colony of Cooperative.. - Gasparetti, Micarelli (2003)

No context found.

Aggarwal, C.C., Al-Garawi, F., Yu, P.S.: Intelligent Crawling on the World Wide Web with Arbitrary Predicates. In Proc. of the WWW10 Conference (2001) 96--105.


Defining Evaluation Methodologies for Topical Crawlers - Srinivasan, Menczer, Pant

No context found.

CC Aggarwal, F Al-Garawi, and PS Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th International World Wide Web Conference, pages 96--105, 2001.


Deriving Link-context from HTML Tag Tree - Gautam Pant (2003)

No context found.

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.


Deriving Link-context from HTML Tag Tree - Pant (2003)

No context found.

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In WWW10, Hong Kong, May 2001.


Cooperative Crawling - Marina Buzzi (IIT-CNR)

No context found.

C. Aggarwal, F. Al-Garawi, P. Yu, "Intelligent crawling on the World Wide Web with arbitrary predicates", WWW 2001, pp. 96-105.


Defining Evaluation Methodologies for Topical Crawlers - Padmini Srinivasan

No context found.

CC Aggarwal, F Al-Garawi, and PS Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th International World Wide Web Conference, pages 96--105, 2001.


Focused Search on the Web using WeQueL - Laboratoire

No context found.

Charu C. Aggarwal, Fatima Al-Garawi, and Philip S. Yu. Intelligent crawling on the world wide web with arbitrary predicates. In World Wide Web, pages 96--105, 2001.


Panorama: Extending Digital Libraries with Topical.. - Pant..

No context found.

C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. Intelligent crawling on the World Wide Web with arbitrary predicates. In Proc. 10th Intl. World Wide Web conference, Hong Kong, May 2001.


Crawling on the World Wide Web - Wang, Fox

No context found.

Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates, WWW10, May 2001, Hong Kong.


CiteSeer.IST - Copyright Penn State and NEC