Kahle, B. (1997). Preserving the Internet. Scientific American, 264(3).
....sites were designed to attract visitors, and many other facets of this medium and society in general. Thus, these archives may well end up forming one of the most fascinating collections of popular digital cultural heritage in the future. While several initiatives are already building Web archives [1, 7, 10, 15], several significant challenges remain to be solved, requiring models for preserving the digital artifacts [5] or concepts for cost-efficient distributed storage [4], to name just a few. When it comes to the usage (or prospected usage, as many of these archives currently provide limited or no ....
....Section 6 gives initial results. We finally provide an outlook on future work in Section 7. 2 Related Work In the last years we have witnessed the creation of numerous initiatives building archives of the World Wide Web. Among the most famous of these we find, for example, the Internet Archive [9, 10], located in the US, which, among many other collections, has the largest archive of Web pages from all over the world, donated by the search engine Alexa. Within Europe, the leading project with respect to Web archiving is the Kulturaw3 project by the Swedish Royal National Library [1]. Its ....
B. Kahle. Preserving the internet. Scientific American, March 1997. http://www.sciam.com/0397issue/0397kahle.html.
....directly to the Czech initiative. This should happen transparently, at best the user does not even learn where the document was retrieved from. Already, scientists from various backgrounds have emphasised their interest in such a collection representing a valuable resource for their studies [Kah97]. In order to allow analysis of the material in the archive particular tools can be provided. Even though specialised tools will be required for the individual research projects, basic facilities can be provided from the outset, offering functionality as used in statistical analysis or data mining. ....
Brewster Kahle. Preserving the Internet. Scientific American, March 1997.
....of storing asserted facts and figures, but of preserving the big picture, of capturing the sociological and cultural dimension of the Internet and its inhabitants. Already now the material gathered by spearheading initiatives has proven a useful resource to historians analyzing e.g. elections [Kah97]. As we start to recognize the importance of preserving at least parts of the entities that make up web, more and more projects are initiated to address the technical challenges associated with such an endeavor. In this paper we provide an overview of issues pertaining to the archiving of digital ....
....documents. Thereby the archive may contain offensive material without being aware of it. This topic has to be handled with care and sensitivity in terms of access provision. Data in the Internet characteristically has a high volatility. Estimates put the average lifetime of a document at 44 days [Kah97]. Therefore, it is inevitable that intermediate versions of some documents are missed out, most actually will be lost at all. This again impedes the striving for a solution as complete as possible. Reconsidering the initial motivation of wanting to convey a picture of the Internet at a given ....
[Article contains additional citation context not shown here]
B. Kahle. Preserving the internet. Scientific American, March 1997. http://www.sciam.com/0397issue/0397kahle.html.
....to the web's sheer size (about 10^9 pages (Inktomi NEC January 19 2000; Lawrence & Giles 1999) and growing) and its distributed and dynamic nature. Exhaustive enumeration of all web pages is a technically challenging and costly task, and any results become rapidly outdated (Brin & Page 1998; Kahle 1997). A more reasonable approach is to infer statistics based on a random sample of web pages. Generating a uniform sample of web pages is itself a nontrivial problem. Several methods have been proposed, though no standard methodology has emerged. Lawrence and Giles (Lawrence & Giles 1998) queried ....
Kahle, B. 1997. Preserving the Internet. Scientific American.
....due to the web's sheer size (about 10^9 pages (Inktomi NEC January 19 2000; Lawrence & Giles 1999) and growing) and its distributed and dynamic nature. Exhaustive enumeration of all web pages is a technically challenging and costly task, and any results become rapidly outdated (Brin & Page 1998; Kahle 1997). A more reasonable approach is to infer statistics based on a random sample of web pages. Generating a uniform sample of web pages is itself a nontrivial problem. Several methods have been proposed, though no standard methodology has emerged. Lawrence and Giles (Lawrence & Giles 1998) queried ....
Kahle, B. 1997. Preserving the Internet. Scientific American.
....of storing asserted facts and figures, but of preserving the big picture, of capturing the sociological and cultural dimension of the Internet and its inhabitants. Already now the material gathered by spearheading initiatives has proven a useful resource to historians analyzing e.g. elections [Kah97]. As we start to recognize the importance of preserving at least parts of the entities that make up web, more and more projects are initiated to address the technical challenges associated with such an endeavor. In this paper we provide an overview of issues pertaining to the archiving of digital ....
....documents. Thereby the archive may contain offensive material without being aware of it. This topic has to be handled with care and sensitivity in terms of access provision. Data in the Internet characteristically has a high volatility. Estimates put the average lifetime of a document at 44 days [Kah97]. Therefore, it is inevitable that intermediate versions of some documents are missed out, most actually will be lost at all. This again impedes the striving for a solution as complete as possible. Reconsidering the initial motivation of wanting to convey a picture of the Internet at a given ....
[Article contains additional citation context not shown here]
B. Kahle. Preserving the internet. Scientific American, March 1997. http://www.sciam.com/0397issue/0397kahle.html.
....a commitment to both maintaining the object and keeping it accessible. It is even possible that entire digital libraries will disappear if efforts are not made to maintain them. Several years ago, an analysis of existing Web sites found the average lifetime of a URL was only 44 days [4]. This discouraging statistic may be accounted for in a number of ways, including that data has been moved, not deleted, but also that we, as a community, are right to be concerned about these issues. Conclusion When creating digital library systems containing valuable content, we are ....
Kahle, B. Preserving the Internet. Sci. Am. (Mar. 1997); see www.sciam.com/0397issue/0397kahle.html.
....Finally, Krishnamurthy and Rexford [25] discuss robust mechanisms for cleaning HTTP logs. Problems with HTTP logs Inaccuracies HTTP logs provide snapshots of the use of web resources at a particular time. Unfortunately, since on average the lifetime of a web page is short (less than two months [33, 21, 24, 16]) any captured log loses its value quickly as more references within it are no longer valid, either by changing content or by becoming inaccessible. For example, when replaying the request trace P1 in late August, 1998 (requests only 1-4 months old) approximately 10% of the requests resulted ....
Brewster Kahle. Preserving the Internet. Scientific American, 276(3):82-83, March 1997.
....our civilization. Electronic media improves transmission of and access to information, but the important issue of its preservation has yet to be effectively addressed. The importance of this issue is now apparent as the general objective of preservation has received considerable recent attention [6, 11, 2, 9, 7, 14, 10, 13]. In addition, other projects such as [3] appear to be at a formative stage. The notion of Archival Intermemory was introduced in [8] The authors are listed in alphabetical order. At the time this prototype was designed all authors were with NEC Research Institute, 4 Independence Way, ....
B. Kahle. Preserving the internet. Scientific American, pages 82-83, March 1997.
....is also affiliated with Georgia Institute of Technology, the third is now with Intertrust Corporation, the fourth is also with New York University, and the fifth with the University of Washington. Direct correspondence to the sixth author at pny@research.nj.nec.com. considerable recent attention [5, 2, 7, 9, 10]. In addition, other projects such as [3] appear to be at a formative stage. The notion of Archival Intermemory was introduced in [6]. The Intermemory project aims to develop large-scale highly survivable and available storage systems made up of widely distributed processors that are individually ....
Kahle, B. Preserving the internet. Scientific American (March 1997), 82-83.
....search engines of today include databases of about 150 million indexed Webpages, and they crawl more than 10 million webpages per day [9] which are stored in the database. This crawling speed will most likely have to be increased in future, as the average life time of a Webpage is only 44 days [10], the exponential growth of the total storage size of all Webpages will be sustained for quite some time. Most search engines store the page and perform some simple page relevance ranking. However, further postprocessing methods of the search information differ a lot. According to the ....
Kahle, B., Preserving the Internet, Scientific American, 1997, pp. 82--84.
....to this problem is the centralized archiving of document editions by a trusted third party (TTP). Through this approach document editions can be secured against tampering. The Xanadu project [4], among others, provides this capability. A further partial example is the Internet Archiving project [3], which captures snapshots of the Web. A decentralized approach is another alternative. It requires authors and publishers to maintain their own document edition archives. [2, 5, 6, 7] provide a few examples. The systems described, however, were not designed to detect or prevent version ....
B. Kahle. Preserving the internet. Scientific American, 276:82--83, March 1997.
....radically more convenient than a trip to the bookshelf or library, suggesting that influential digital libraries such as those of ACM and IEEE would benefit users by providing papers, or at least their reference lists, in HTML format. Web technologies such as archiving the Web (Kahle 1997 [19]), Web based document version retention (Simonson et al. 1998 [38] provides a survey; see also www.webdav.org), URNs (Sollin and Masinter 1994 [39]) and PURLs (Permanent URLs, see www.purl.org) take on additional significance when viewed as ways to increase the reliability of citations ....
Kahle, B. Preserving the internet. Scientific American, 276 (March 1997), 82-83. (Cited in Sec. 3.4)
.... [CK99] [CK00] [EWCS96] [FCAB98] [FJCL99] [FRC98] [FRCar] [Fel98] [FCD 99] [FVYI00] [FB96] [GRC97] [GCR97] [GCR98] [GC97] [Gla94] [GPB98] [Gri97] [GB97] [GPV98] [GNPV98] [GS96a] [GS96b] [GS97] [HMY97] [Hed98] [HN96] [HWMS98] [HSY98] [HK97] [HJWC98] [IKY97] [IST98] [JC98] [JDB96] [JK97] [JK98] [Kah97] [KKO98] [WMS98a] [KS98a] [KW97a] [KW97b] [KW98] [KMK99] [KR99] [KW99] [KA99] [KS98b] [KLM97] [KSW98b] [KSW98a] [LG98] [LHC 98] [LSCH98] [LWS 99] [LD99] [LAJF98] [Liu98] [LC97] [LC98] [TB97] [LOG96] [LA94] [Luo98] [Mah99] [MWE00] [MEW00] [MLB95] [MR97] [MS97] [MSC98] [Mar96] [MC98] ....
Brewster Kahle. Preserving the Internet. Scientific American, 276(3):82--83, March 1997.
....to the semantic model. Accounts of such systems cannot be found within the existing database research and therefore we provide an alternative transaction model (Section 4) which takes into account this characteristic. 7 For example, an average lifetime of a URL is estimated as 44 days [17]. 24 6 Conclusion This paper proposes a Web based application management system as a generic architecture offering specific functionalities for managing Web based applications and coping with semantic diversity among its information sources. These functionalities include facilities for ....
B. Kahle. Preserving the internet. http://www.sciam.com/0397issue/0397kahle.html.
....two distinct roles: maintenance of a historical record, and selection of appropriate materials. Web indexing approaches dispense with the second role and allow a user to view all the world has to offer. They fail, however, to deal with the first. Creating a record of everything on the Internet [9] represents, by contrast, an emphasis on the first role and eliminates entirely the second. Our concept of Intermemory combines the archival function with what amounts to a self selecting publication process. It is worthwhile noting that Intermemory solves the preservation problem associated with ....
B. Kahle. Preserving the internet. Scientific American, pages 82--83, March 1997.
....follow a particular distribution) which are necessary for testing, but not necessarily known to be true. Captured logs. Using actual client request logs can be more realistic than artificial workloads, but since on average the lifetime of a web page is short (less than two months [Wor94, GS96, Kah97, DFKM97]) any captured log loses its value quickly as more references within it are no longer valid, either by becoming inaccessible or by changing content (i.e. looking at a page a short time later may not give the same content). In addition, unless the logs are recorded and processed ....
Brewster Kahle. Preserving the Internet. Scientific American, 276(3):82--83, March 1997.
No context found.
Kahle, B. (1997). Preserving the Internet. Scientific American, 264(3).
No context found.
Kahle, B. 1997. Preserving the Internet. Scientific American.
No context found.
Brewster Kahle, "Preserving the Internet", Scientific American, pp. 82--84, March 1997.
No context found.
Brewster Kahle, "Preserving the Internet", Scientific American, pp. 82--84, March 1997.
No context found.
Kahle, B.: Preserving the Internet. Scientific American, Mar 1997.
No context found.
B. Kahle. Preserving the internet. http://www.sciam.com/0397issue/0397kahle.html.
CiteSeer.IST - Copyright Penn State and NEC