Results 1 - 10 of 22

Demonstrating Intelligent Crawling and Archiving of Web Applications

by Muhammad Faheem, Pierre Senellart, Télécom ParisTech, 2013
"... We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems). By adapting the crawling strategy to the Web application type, one is able to ..."
Cited by 1 (1 self)

Une démonstration d’un crawler intelligent pour les applications Web [A demonstration of an intelligent crawler for Web applications]

by Muhammad Faheem, Pierre Senellart
"... We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems). By adapting the crawling strategy to the Web application type, one is able to ..."
to crawl a given Web application (say, a given forum or blog) with fewer requests than traditional crawling techniques. Additionally, the application-aware helper is able to extract semantic content from the Web pages crawled, which results in a Web archive of richer value to an archive user. In our

Intelligent and adaptive crawling of Web applications for Web archiving

by Muhammad Faheem, Pierre Senellart - In ICWE, 2013
"... Web sites are dynamic in nature with content and structure changing over time. Many pages on the Web are produced by content management systems (CMSs) such as WordPress, vBulletin, or phpBB. Tools currently used by Web archivists to preserve the content of the Web blindly crawl and store We ..."
Cited by 10 (4 self)
Web pages, disregarding the CMS the site is based on (leading to suboptimal crawling strategies) and whatever structured content is contained in Web pages (resulting in page-level archives whose content is hard to exploit). We present in this paper an application-aware helper (AAH) that fits

Intelligent crawling of Web applications for Web archiving

by Muhammad Faheem, supervised by Pierre Senellart
"... The steady growth of the World Wide Web raises challenges regarding the preservation of meaningful Web data. Tools used currently by Web archivists blindly crawl and store Web pages found while crawling, disregarding the kind of Web site currently accessed (which leads to suboptimal crawling strateg ..."
Cited by 4 (2 self)
application that uses Web standards such as HTML and HTTP to publish information on the Web, accessible by Web browsers. Examples include Web forums, social networks, geolocation services, etc. We claim that the best strategy to crawl these applications is to make the Web crawler aware of the kind

Intelligent and Adaptive Crawling of Web Applications for Web Archiving

by Muhammad Faheem, Pierre Senellart, Télécom ParisTech, 2012
"... Web sites are dynamic in nature with their content and structure changing over time. Many pages on the Web are produced by content management systems (CMSs) such as WordPress, vBulletin, or phpBB. Tools currently used by Web archivists to preserve the content of the Web blindly crawl and store Web pa ..."
pages, disregarding the CMS the site is based on (which leads to suboptimal crawling strategies) and whatever structured content is contained in Web pages (which results in page-level archives whose content is hard to exploit). We present in this paper an application-aware helper (AAH) that fits

Workload-Aware Web Crawling and Server Workload Detection

by Shaozhi Ye, Guohan Lu, Xing Li - In Network Research Workshop, 18th Asian Pacific Advanced Network Meeting (APAN 2004), 2004
"... With the development of search engines, more and more web crawlers are used to gather web pages. The rising crawling traffic has brought the concern that crawlers may impact web sites. On the other hand, more efficient crawling strategy is required for the coverage and freshness of search engine index. ..."
Cited by 2 (0 self)
server workload-aware crawling strategy is proposed. By measuring the web service time with a hybrid back-to-back packets pair, server workload is detected on the client side, thus crawler can adapt its crawling speed to web server. The experiment results show the power of our workload detection approach

Rank-Aware Crawling of Hidden Web sites

by George Valkanas, Alexandros Ntoulas, Dimitrios Gunopulos
"... An ever-increasing amount of valuable information on the Web today is stored inside online databases and is accessible only after the users issue a query through a search interface. Such information is collectively called the “Hidden Web” and is mostly inaccessible by traditional search engine crawler ..."
Cited by 2 (1 self)
, if we can crawl a Hidden Web site in breadth, i.e. download just the top results for all potential queries, we can enable such applications without the need for allocating resources for fully crawling a potentially huge Hidden Web site. In this paper we present algorithms for crawling a Hidden Web site

An agent-based focused crawling framework for topic- and genre-related web document discovery

by Georgios Katsimpras, Efstathios Stamatatos - In: Proc. of the 24th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2012
"... The discovery of web documents about certain topics is an important task for web-based applications including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web do ..."
Cited by 1 (1 self)

Towards Designing an Efficient Crawling Window to Analysis and Annotate Changes in Linked Data Sources

by David Urdiales-Nieto, José F. Aldana-Montes
"... Today the popularity of data quality is increasing in linked data, and its changes are being annotated. Linked data consuming applications need to be aware of changes in a dataset. Changes such as update, remove or creation links may occur for a time so it is necessary to detect them to update local ..."
Cited by 1 (0 self)

Uncovering the relational web

by Michael J. Cafarella, Eugene Wu - Under review, 2008
"... The World-Wide Web consists of a huge number of unstructured hypertext documents, but it also contains structured data in the form of HTML tables. Many of these tables contain both relational-style data and a small “schema” of labeled and typed columns, making each such table a small structured dat ..."
Cited by 28 (8 self)
to recover column label and type information. Our mix of hand-written detectors and statistical classifiers takes a raw Web crawl as input, and generates a collection of databases that is five orders of magnitude larger than any other collection we are aware of. Relation recovery achieves precision