Building a Distributed Full-Text Index for the Web (2000)  (25 citations)
Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina
World Wide Web



View or download:
stanford.edu/~rsram/pub...www10paper.ps

From: stanford.edu/~rsram/cv

Abstract: We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present...

Cited by:
A Scalable Topic-Based Open Source Search Engine - Buntine, Löfström, Perkiö..
On the Usage of Global Document Occurrences in.. - Papapetrou..
Optimized Index Structures for Querying RDF from the Web - Harth, Decker (2005)

Similar documents (at the sentence level):
57.2%:   Building a Distributed Full-Text Index for the Web - Melnik, Raghavan, Yang.. (2000)
8.4%:   Searching the Web - Arasu, Cho, Garcia-Molina, Paepcke.. (2001)

Active bibliography (related documents):
0.5:   Integrating Diverse Information Management Systems: A Brief .. - Raghavan, Garcia-Molina (2001)
0.5:   A Reliable Storage Management Layer for - Distributed Information Retrieval
0.5:   Comparing Hybrid Peer-to-Peer Systems - Yang, Garcia-Molina (2001)

Similar documents based on text:
0.3:   Philip A. Bernstein Philip A. Bernstein Sergey Melnik Sergey .. - Philbe Philbe Melnik
0.3:   WebBase : A repository of web pages - Hirai, Raghavan, Garcia-Molina.. (1999)
0.3:   Dynamic Maintenance of Web Indexes Using Landmarks - Lim, Wang, Padmanabhan, al. (2003)

Related documents from co-citation:
12:   Managing gigabytes: compressing and indexing documents and images - Witten, Moffat et al. - 1994
12:   The anatomy of a large-scale hypertextual Web search engine - Brin, Page
11:   Modern Information Retrieval - Baeza-Yates - 1999

BibTeX entry:

S. Melnik et al. Building a distributed full-text index for the web. Technical report, Stanford Digital Library Project, July 2000. Available at www-diglib.stanford.edu/cgi-bin/get/SIDL-WP-2000-0140. http://citeseer.ist.psu.edu/melnik00building.html

@inproceedings{ melnik01building,
    author = "Sergey Melnik and Sriram Raghavan and Beverly Yang and Hector Garcia-Molina",
    title = "Building a distributed full-text index for the Web",
    booktitle = "World Wide Web",
    pages = "396-406",
    year = "2001",
    url = "citeseer.ist.psu.edu/melnik00building.html" }
Citations (may not include all citations):
280   Managing Gigabytes: Compressing and Indexing Documents and I.. - Witten, Moffat et al. - 1999
150   Accessibility of information on the web - Lawrence, Giles - 1999 - http://www.wwwmetrics.com/
71   The evolution of the web and implications for an incremental.. - Cho, Garcia-Molina - 2000
64   WebBase: A repository of web pages - Hirai, Raghavan et al. - 2000
50   Signature files: An access method for documents and its anal.. - Faloutsos, Christodoulakis - 1984
47   Dissemination of collection wide information in a distribute.. - Viles, French - 1995
43   Incremental update of inverted list for text document retrie.. - Tomasic, Garcia-Molina et al.
36   Self-indexing inverted files for fast text retrieval - Moffat, Zobel - 1996
34   Suffix arrays: A new method for on-line string searches - Manber, Myers - 1990
30   Berkeley DB - Olson, Bostic et al.
25   Overview of TREC-7 very large collection track - Hawking, Craswell - 1998
25   Building a distributed full-text index for the web - Melnik
21   Supporting full-text information retrieval with a persistent.. - Brown, Callan et al. - 1994
20   Database System Implementation - Garcia-Molina, Ullman et al. - 2000
17   Resource scheduling for parallel database and scientific app.. - Chakrabarti, Muthukrishnan - 1996
13   Inverted file partitioning schemes in multiple disk systems - Jeong, Omiecinski - 1995
12   Compressed inverted files with reduced decoding overheads - Vo, Moffat - 1998
8   Query performance for tightly coupled distributed digital li.. - Ribeiro-Neto, Barbosa - 1998
8   Personal Communication - Burrows
7   Query processing and inverted indices in shared-nothing docu.. - Tomasic, Garcia-Molina - 1993
6   A design of a distributed full text retrieval system - Martin, MacLeod et al. - 1986
5   Structuring text within a relational system - Grossman, Driscoll - 1992
5   An efficient indexing technique for full-text database systems - Zobel, Moffat et al. - 1992
4   Information Retrieval: Data Structures and Algorithms - Salton - 1989
2   Efficient distributed algorithms to build inverted files - Ribeiro-Neto, Moura et al. - 1999
www.inktomi.com/webmap/

Documents on the same site (http://www-db.stanford.edu/~rsram/cv.html):
Search Middleware and the Simple Digital Library.. - Paepcke, Brandriff.. (2000)
A Rearrangeable Algorithm for the Construction of.. - Raghavan, Manimaran (1999)
An Integrated Scheme for Establishing Dependable.. - Raghavan, Manimaran.. (1999)

CiteSeer.IST - Copyright Penn State and NEC