The PageRank Citation Ranking: Bringing Order to the Web
- Stanford InfoLab
, 1999
Cited by 1600 (1 self)
The importance of a Web page is an inherently subjective matter, which depends on the readers' interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And we show how to apply PageRank to search and to user navigation.
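The random-surfer idea in this abstract can be illustrated with a short power-iteration sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the toy graph are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not taken from the abstract.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start the surfer with equal probability on every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without outgoing links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web used only for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page `c` collects links from three pages and ends up ranked highest, while `d`, which nothing links to, keeps only the random-jump share.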
Building Domain-Specific Search Engines with Machine Learning Techniques
, 1999
Cited by 58 (6 self)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age group, size, location and cost over summer camps. Unfortunately, these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that automates efficient spidering, populating topic hierarchies, and identifying informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers. It already contains over 33,000 papers and is publicly available at www.cora.jprc.com.
Finding near-replicas of documents on the Web
- In International Workshop on the World Wide Web and Databases (WebDB’98)
, 1998
Cited by 54 (0 self)
We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers and web archivers, and in the presentation of search results, among others. We report statistics on how common replication is on the web, and on the cost of computing the above information for a relatively large subset of the web: about 24 million web pages, which corresponds to about 150 gigabytes of textual information.
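To make "overlap between pairs of documents" concrete, here is a minimal sketch that scores pairs by the Jaccard similarity of their word shingles. The abstract does not say how overlap is measured, so the shingle size, the Jaccard measure, the threshold, and the names `shingles`, `overlap` and `near_replicas` are all assumptions of this example. The brute-force pairwise loop is quadratic in the number of documents and would not scale to the millions of pages the abstract mentions.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle, or none if the text is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def near_replicas(docs, threshold=0.5, k=3):
    """Compare every pair of documents and keep those above the threshold.

    docs maps a document name to its text. The threshold is an assumed value.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(docs.items()), 2):
        score = overlap(text_a, text_b, k)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical documents used only for illustration.
docs = {
    "x": "the quick brown fox jumps over the lazy dog",
    "y": "the quick brown fox jumps over the lazy cat",
    "z": "completely unrelated text about search engines",
}
print(near_replicas(docs))
```

Documents `x` and `y` differ in one word, so they share six of eight distinct shingles and are reported as a near-replica pair; `z` shares none with either.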
Shaping the Web: why the politics of search engines matters
- To appear, The Information Society 16
, 2000
Cited by 27 (0 self)
This article argues that search engines raise not merely technical issues but also political ones. Our study of search engines suggests that they systematically exclude (in some cases by design and in some, accidentally) certain sites and certain types of sites in favor of others, systematically giving prominence to some at the expense of others. We argue that such biases, which would lead to a narrowing of the Web’s functioning in society, run counter to the basic architecture of the Web as well as to the values and ideals that have fueled widespread support for its growth and development. We consider ways of addressing the politics of search engines, raising doubts whether, in particular, the market mechanism could serve as an acceptable corrective. Keywords: search engines, bias, values in design, World Wide Web, digital divide, information access. The Internet, no longer merely an e-mail and file-sharing system, has emerged as a dominant interactive medium. Received 17 July 1997; accepted 24 November 1998. We are indebted to many colleagues for commenting on and questioning earlier versions of this article: audiences at the conference “Computer Ethics: A Philosophical Enquiry,” London; members of the seminars at the Kennedy School of Government, Harvard University,
Measuring Index Quality using Random Walks on the Web
- In Proceedings of the Eighth International World Wide Web Conference
, 1999
Recent research has studied how to measure the size of a search engine, in terms of the number of pages indexed. In this paper, we consider a different measure for search engines, namely the quality of the pages in a search engine index. We provide a simple, effective algorithm for approximating the quality of an index by performing a random walk on the Web, and we use this methodology to compare the index quality of several major search engines.
Crawling for Images on the WWW
Search engines are useful because they allow the user to find information of interest from the World-Wide Web. These engines use a crawler to gather information from Web sites. However, with the explosive growth of the World-Wide Web it is not possible for any crawler to gather all the information available. Therefore, an efficient crawler tries to only gather important and popular information. In this paper we discuss a crawler that uses various heuristics to find sections of the WWW that are rich sources of images. This crawler is designed for AMORE, a Web search engine that allows the user to retrieve images from the Web by specifying relevant keywords or a similar image.
The PageRank Citation Ranking: Bringing Order to the Web
, 1998
The importance of a Web page is an inherently subjective matter, which depends on the readers' interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And we show how to apply PageRank to search and to user navigation.

