Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. Technical Report USC-TR 91-06, University of Southern California, Computer Science Department, 1991.
....uniform user interface to a large number of databases accessible from several retrieval systems. User queries are translated into commands on the different systems. However, there is no support for the automatic selection of relevant databases for a user's query. The Distributed Indexing mechanism [DANO91] is based on precomputed indices of databases that summarize the holdings on particular topics of other databases. The architecture has a three-layer fixed structure, which is suitable for incorporating bibliographic databases, but it is not flexible enough for content-based access to file ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. Technical Report USC-TR 91-06, University of Southern California, Computer Science Department, 1991.
....structure that ranked the relevance of each collection to a given query. Moffet et al. [1] qualified this position by assuming that the central server had complete statistics on each document collection but used them only as a filter for discarding or including specific collections. Danzig et al. [6] provided a mechanism for maintaining similar groupings automatically, using broker agents that maintain centralised indexing on a periodic basis by querying remote collections. 3.2 Ranking. Ranking typically occurs after query processing at the collection server and again prior to the return of ....
Peter Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1991.
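The excerpt above contrasts ranking collections with using complete central statistics only as an include/exclude filter. A minimal sketch of the filtering variant follows, assuming the statistics are per-collection document frequencies and that a collection passes if any query term occurs in it; `filter_collections` and the statistics layout are hypothetical and are not taken from the cited work.

```python
from typing import Dict, List

# Hypothetical layout: collection name -> {term: document frequency}.
CollectionStats = Dict[str, Dict[str, int]]


def filter_collections(stats: CollectionStats, query: List[str]) -> List[str]:
    """Return the collections to search, sorted by name.

    Assumption: a collection passes the filter if at least one query term
    has a nonzero document frequency in it. No score is computed, so the
    result is an unranked include/exclude decision.
    """
    return [
        name
        for name in sorted(stats)
        if any(stats[name].get(term, 0) > 0 for term in query)
    ]


if __name__ == "__main__":
    stats = {
        "news": {"election": 120, "broker": 3},
        "law": {"appeal": 80},
        "cs": {"broker": 45, "index": 200},
    }
    print(filter_collections(stats, ["broker", "index"]))  # ['cs', 'news']
```

Replacing `any` with `all` would give a stricter conjunctive filter; neither rule is claimed to be the one assumed by Moffet et al.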
....a network of workstations. We evaluate the performance of different configurations and explore improvements. Under realistic workloads, we present a scalable architecture for distributed information retrieval. Other researchers have also designed distributed IR architectures [Harman et al. 1991, Danzig et al. 1991, Sheldon et al. 1994, Crowder and Nicholas, 1995]. Our contribution is that we show how different system parameters affect the performance and scalability of a distributed IR system. We also show that a simple distributed architecture performs well under large, realistic configurations. Several ....
Danzig, P. B., Ahn, J., Noll, J., and Obraczka, K. (1991). Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 221--229, Chicago, IL.
....rules match the new document are sent a descriptor of the new document. The generator objects associated with the brokers are gathered by a directory of servers, which users query initially to obtain a list of the brokers whose generator rules match the given query. See also [Danzig et al. 1991]. [Barbara and Clifton 1992], [Ordille and Miller 1992], and [Simpson and Alonso 1989] are other examples of this type of approach, in which users query meta-information databases. A content-based routing system is used in [Sheldon et al. 1994] to address the resource discovery problem. The ....
Danzig, P. B., Ahn, J., Noll, J., and Obraczka, K. 1991. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the Fourteenth ACM International Conference on Research and Development in Information Retrieval (SIGIR'91) (Oct. 1991).
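The excerpt above describes brokers whose generator rules decide which new documents they receive descriptors for, and a directory of servers that users consult first to find brokers whose rules match their query. A minimal sketch of that flow, assuming a generator rule is simply a set of required keywords; the names `Broker`, `DirectoryOfServers`, `publish`, and `brokers_for_query` are hypothetical, and the actual Indie rule language and protocol are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Broker:
    name: str
    # Hypothetical generator rule: every keyword in this set must be present.
    required_keywords: Set[str]
    descriptors: List[str] = field(default_factory=list)

    def rule_matches(self, keywords: Set[str]) -> bool:
        return self.required_keywords <= keywords


class DirectoryOfServers:
    """Hypothetical directory that gathers the generator rules of all brokers."""

    def __init__(self) -> None:
        self.brokers: List[Broker] = []

    def register(self, broker: Broker) -> None:
        self.brokers.append(broker)

    def publish(self, doc_id: str, keywords: Set[str]) -> List[str]:
        # Send a descriptor of the new document to every broker whose rule matches.
        notified = []
        for broker in self.brokers:
            if broker.rule_matches(keywords):
                broker.descriptors.append(doc_id)
                notified.append(broker.name)
        return notified

    def brokers_for_query(self, query_terms: Set[str]) -> List[str]:
        # Users ask the directory first for brokers whose rules match the query.
        return [b.name for b in self.brokers if b.rule_matches(query_terms)]


if __name__ == "__main__":
    directory = DirectoryOfServers()
    directory.register(Broker("ir-broker", {"retrieval"}))
    directory.register(Broker("net-broker", {"network", "routing"}))
    print(directory.publish("doc-42", {"distributed", "retrieval", "index"}))  # ['ir-broker']
    print(directory.brokers_for_query({"network", "routing", "congestion"}))  # ['net-broker']
```

Keeping the rules in one directory lets a query be routed to a few brokers instead of to every retrieval system; in this sketch the directory is a single in-memory list rather than a replicated service.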
....(the result merging problem or the collection fusion problem). The work on collection selection directly benefits the execution performance of a distributed information system, since searching fewer collections takes less time. 2.3.1 Automatic Collection Selection. Danzig et al. [23] use a hierarchy of brokers to maintain indices for abstracts of primary databases (individual collections) and support Boolean keyword matching to locate the primary databases. This broker architecture is a component of the Harvest system [9] and uses the Essence system [35] to generate the ....
Danzig, P. B., Ahn, J., Noll, J., and Obraczka, K. Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Chicago, IL, 1991), pp. 221--229.
....the work presented in this paper is the first that attacks this problem. The closest work to selecting a relevant partial replica, which is a subset of the original collection, is collection selection, i.e., locating the most relevant collections [Callan et al. 1995b, Chakravarthy and Haase, 1995, Danzig et al. 1991, Fuhr, 1996, Gravano et al. 1994, Voorhees et al. 1995], where collections are disjoint. Replica selection also differs because it directs as many queries as possible to relevant partial replicas in order to obtain performance improvements. Danzig et al. [Danzig et al. 1991] use a hierarchy ....
....and Haase, 1995, Danzig et al. 1991, Fuhr, 1996, Gravano et al. 1994, Voorhees et al. 1995], where collections are disjoint. Replica selection also differs because it directs as many queries as possible to relevant partial replicas in order to obtain performance improvements. Danzig et al. [Danzig et al. 1991] use a hierarchy of brokers to maintain indices for document abstracts as a representation of the contents of primary collections, and support Boolean keyword matching to locate the primary collections. If users' queries do not use keywords in the brokers, they have difficulty finding the right ....
Danzig, P. B., Ahn, J., Noll, J., and Obraczka, K. (1991). Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 221--229, Chicago, IL.
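The excerpts above describe a hierarchy of brokers that index document abstracts and use Boolean keyword matching to locate primary collections, and note that queries using keywords absent from the brokers fail. A minimal sketch under stated assumptions: each broker keeps an inverted index from abstract words to collections, a query is a conjunction (AND) of keywords, and a parent broker unions its own answer with its children's. `IndexBroker` and `locate` are hypothetical names, not the published design.

```python
from typing import Dict, List, Optional, Set


class IndexBroker:
    """Hypothetical broker: an inverted index from abstract words to the
    primary collections they came from, plus optional child brokers."""

    def __init__(self, name: str, children: Optional[List["IndexBroker"]] = None) -> None:
        self.name = name
        self.children = children or []
        self.postings: Dict[str, Set[str]] = {}

    def add_abstract(self, collection: str, abstract: str) -> None:
        # Index every word of the abstract under its primary collection.
        for word in abstract.lower().split():
            self.postings.setdefault(word, set()).add(collection)

    def locate(self, keywords: List[str]) -> Set[str]:
        """Boolean AND over this broker's index, unioned with the children's answers."""
        found: Set[str] = set()
        if keywords:
            found = set.intersection(
                *(self.postings.get(k.lower(), set()) for k in keywords)
            )
        for child in self.children:
            found |= child.locate(keywords)
        return found


if __name__ == "__main__":
    physics = IndexBroker("physics")
    physics.add_abstract("arxiv-hep", "quark gluon plasma collisions")
    computing = IndexBroker("computing")
    computing.add_abstract("acm-dl", "distributed indexing for information retrieval")
    computing.add_abstract("usc-tr", "distributed file systems")
    root = IndexBroker("root", children=[physics, computing])

    print(sorted(root.locate(["distributed"])))               # ['acm-dl', 'usc-tr']
    print(sorted(root.locate(["distributed", "retrieval"])))  # ['acm-dl']
    print(root.locate(["search"]))                            # set()
```

The last query shows the limitation noted in the excerpt: a keyword that never appears in any indexed abstract locates nothing, even if relevant collections exist.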
....whose generator rules match the new document are sent a descriptor of the new document. The generator objects associated with the brokers are gathered by a directory of servers, which users query initially to obtain a list of the brokers whose generator rules match the given query. See also [DANO91]. [SA89], [BC92], and [OM92] are other examples of this type of approach, in which users query meta-information databases. The content-based routing system of [SDW 94, DS94] keeps a content label for each information server (or collection of objects, more generally) with attributes ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....time to ensure consistency. Hence, Prospero meets all the goals mentioned in section 1.3 except goal (4). In this chapter, we surveyed some existing systems that provide novel ways to access information. There are many other systems that have interesting ideas and discuss related issues [7, 10, 14, 16, 18, 24, 37, 42]. In general, systems that are very flexible and powerful, like Prospero, do not have a consistency model, and systems that are intuitive and simple, like MIT SFS, offer consistency guarantees but are not as powerful and do not allow users to organize the retrieved information by name and content ....
....This solution is certainly not scalable as the amount of data in different file systems increases. However, F can summarize the information in F′ and copy the summaries into F. The technique is not new. It has been adopted in wide-area information retrieval systems like Essence [24], Indie [16, 18], Harvest [3], and the Synopsis file system [8]. Some of these systems also have mechanisms to keep the summary of the information consistent with the actual information. These systems allow users to browse the summary (F), filter out the information they are really interested in, and then ....
P. Danzig, J. Ahn, J. Noll, and K. Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. Technical Report USC-TR 91-06, University of Southern California, Computer Science Dept., 1991.
....summary as the database changes. A GlOSS-based meta-information query facility has been implemented for WAIS servers.¹¹ System architectures range from centralized servers such as Lycos¹² and Yahoo¹³ to collections of brokers or mediators. In Indie (shorthand for "Distributed Indexing") [10, 9] and Harvest [5], brokers each know about some subset of the data sources, with a special broker that keeps information .... (Footnotes: ⁹ The World Wide Web Worm is accessible at http://www.cs.colorado.edu/home/mcbryan/WWWW.html. ¹⁰ ALIWEB is accessible at http://web.nexor.co.uk/aliweb/doc/aliweb.html. ¹¹ GlOSS ....)
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....into incremental queries and run periodically. Their work concentrates on relational databases, while ours is concerned with the dissemination of unstructured data (documents) using information retrieval techniques. Related to the idea of a profile index is that of the segment tree presented in [3]. There, Danzig et al. present a distributed indexing scheme as a way to provide efficient retrospective search of a large number of retrieval systems. Special sites, called index brokers, maintain indexes of remote retrieval systems. They subscribe generator queries that keep them informed of ....
DANZIG, P., AHN, J., NOLL, J., and OBRACZKA, K. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proc. ACM SIGIR Conference (Chicago, Oct. 1991), pp. 220-229.
....rules match the new document are sent a descriptor of the new document. The generator objects associated with the brokers are gathered by a directory of servers, which is queried initially by the users to obtain a list of the brokers whose generator rules match the given query. See also [DANO91]. [SA89], [BC92], and [OM92] are other examples of this type of approach, in which users query meta-information databases. A content-based routing system is used in [SDW 94] to address the database discovery problem. The content routing system keeps a content label for each information ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....¹⁰ ALIWEB is accessible at http://web.nexor.co.uk/aliweb/doc/aliweb.html. .... WAIS servers.¹¹ System architectures range from centralized servers such as Lycos¹² and Yahoo¹³ to collections of brokers or mediators. In Indie (shorthand for "Distributed Indexing") [10, 9] and Harvest [5], brokers each know about some subset of the data sources, with a special broker that keeps information about all other brokers. [32] and WHOIS [39] allow brokers (index servers in WHOIS) to exchange information about sources they index, and to forward queries they receive to ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....available nor practical in all environments (e.g., the Internet). Some commercial services group databases into sets with common themes, for example newspaper collections or appellate court decisions. Such grouping was a manual process for many years, but automatic methods are becoming more common [11, 2, 14]. Database grouping is an effective solution for information needs or information access patterns that can be anticipated. When information needs are diverse, content-based database selection is a more effective solution. Content-based database selection algorithms rank databases by their ....
P. B. Danzig, J. Ahn, J. Noll, and K. Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pages 220--229, Chicago, IL, October 1991. ACM.
....[Figure 13: Disk I/Os per document vs. relevance threshold (PI, SPI).] [Figure 14: Multiplications per document vs. relevance threshold (BF, PI, SPI).] Related to the idea of a profile index is that of the segment tree presented in [3]. There, Danzig et al. present a distributed indexing scheme as a way to provide efficient retrospective search of a large number of retrieval systems. Special sites, called index brokers, maintain indexes of remote retrieval systems. They subscribe generator queries that keep them informed of ....
DANZIG, P., AHN, J., NOLL, J., and OBRACZKA, K. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proc. ACM SIGIR Conference (Chicago, Oct. 1991), pp. 220-229.
....collaborate with other groups to extend it. We are in the process of adding support for the retrieval of documents maintained by the Wide Area Information Service (WAIS) [8]. We plan to add filters that access directory information maintained by semantic file systems [6] and distributed indices [4] when those systems are available. We plan to implement a new application interface for Prospero in order to allow use by existing applications without relinking. This will be accomplished by adding Prospero support to an NFS server [20], the same approach taken by semantic file systems [6] and ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, October 1991.
....to organize information. These systems work by computing an index over the information in the database. In a large system, the amount of data to be indexed, the frequency of updates, and the need to cross administrative boundaries would preclude the use of a single index. Distributed indexing [Danzig et al. 91] is a proposed approach for generating and maintaining multiple indices. Each index would contain information on a particular topic, separate from the data being indexed. To use multiple indices for organizing information in a large system would still require the development of a mechanism to ....
P. B. Danzig, J. Ahn, J. Noll, and K. Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, October 1991.
....from information sources periodically and disseminates relevant information to passive users. User profiles, which consist of a number of example messages, are examined periodically, and all profiles are examined. This falls into the brute-force paradigm in our comparison. Danzig et al. [6] present a distributed indexing scheme as a way to provide efficient retrospective search of a large number of retrieval systems. Special sites, called index brokers, maintain indexes of remote retrieval systems. They subscribe generator queries (similar to profiles) that keep them informed of ....
P. Danzig, J. Ahn, J. Noll, K. Obraczka, "Distributed Indexing: A Scalable Mechanism for Distributed Information Retrieval," Proceedings of the ACM SIGIR Conference, Chicago, October 1991, 220-229.
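The excerpt above classifies examining every user profile for incoming information as the brute-force paradigm. A minimal sketch of that baseline, assuming documents and profiles are bags of words compared by cosine similarity against a relevance threshold; the similarity measure, the threshold, and the names `cosine` and `brute_force_disseminate` are assumptions for illustration, not details of the cited systems.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_disseminate(doc: str, profiles: Dict[str, str], threshold: float) -> List[str]:
    """Compare one incoming document against every profile (brute force)."""
    doc_vec = Counter(doc.lower().split())
    matched = []
    for user, profile_text in profiles.items():  # no index: every profile is scored
        if cosine(doc_vec, Counter(profile_text.lower().split())) >= threshold:
            matched.append(user)
    return matched


if __name__ == "__main__":
    profiles = {"alice": "distributed indexing brokers", "bob": "quark physics"}
    print(brute_force_disseminate("brokers for distributed indexing", profiles, 0.5))  # ['alice']
```

Because every profile is scored for every document, the cost grows linearly with the number of profiles; an index over profiles aims to avoid scoring profiles that share no terms with the document.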
....Menke et al. 1991, NASA 1990]. U.S. government agencies have initiated efforts to support collaborative research and development [NASA 1991, Wulf 1989]. Finally, the number and variety of network-accessible resources motivate efforts in directory service and resource discovery [CCITT 1988, Danzig et al. 1991, Peterson 1988, Schwartz 1991c]. Because of their scale and scope of distribution, wide-area distributed applications exhibit many characteristics not found in locally distributed applications. Operations can experience additional failure modes stemming from the complexity of wide-area internets, ....
P. B. Danzig, J. Ahn, J. Noll and K. Obraczka. Distributed Indexing: A Scalable Mechanism for Distributed Information Retrieval. Jan. 1991.
....and performance problems is to design a more sophisticated broker architecture. In principle, we can achieve greater effectiveness by creating brokers that specialize in a certain topic. Scalability comes from removing the central server bottleneck. In Indie (shorthand for "Distributed Indexing") [10, 9] and Harvest [5], each broker knows about some subset of the data sources, with a special broker that keeps information about all other brokers. Reference [34] and WHOIS [45] allow brokers (index servers in WHOIS) to exchange information about sources they index, and to forward queries they ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....5.1 Related Work. Research on keyword-based collection ranking has been gaining attention from the information retrieval community in the last few years. Some researchers have proposed the use of standard subject classification systems, such as the U.S. Library of Congress subject numbering [2], Dewey Decimal Coding, and the ACM Computing Review Classification system, to categorize document collections. The main problem with this method is that it is not always easy to find which category or categories a user query falls into, unless a large and ever-expanding online ....
P. Danzig, J. Ahn, J. Noll and K. Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 220-229, 1991.
....rules match the new document are sent a descriptor of the new document. The generator objects associated with the brokers are gathered by a directory of servers, which is queried initially by the users to obtain a list of the brokers whose generator rules match the given query. See also [DANO91]. [SA89] and [BC92] are other examples of this type of approach, in which users query a meta-information database. A content-based routing system is used in [SDW ] to address the database discovery problem. The content routing system keeps a content label for each information server ....
Peter B. Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th Annual SIGIR Conference, October 1991.
....uniform user interface to a large number of databases accessible from several retrieval systems. User queries are translated into commands on the different systems. However, there is no support for the automatic selection of relevant databases for a user's query. The Distributed Indexing mechanism [4] is based on precomputed indices of databases that summarize the holdings on particular topics of other databases. The architecture has a three-layer fixed structure that is suitable for bibliographic databases, but it is not flexible enough for content-based access to general information servers. ....
P. B. Danzig et al. Distributed indexing: A scalable mechanism for distributed information retrieval. Technical Report USC-TR 91-06, University of Southern California, Computer Science Department, 1991.
....network bandwidth, cost, and other factors will need support from the network layers responsible for routing, flow control, accounting, and policy considerations. A similar problem arises when distributing widely accessed documents [Malamud 1991, Postel 1982], directory information [CCITT 1988, Danzig et al. 1991, Schwartz 1990], and generally in any circumstance where a large volume of information is circulated among many recipients. Caching becomes particularly appealing if the information contains a moderate-sized subset that is required at a large proportion of sites. For example, we conjecture that ....
P. B. Danzig, J. Ahn, J. Noll and K. Obraczka. Distributed Indexing: A Scalable Mechanism for Distributed Information Retrieval. Jan. 1991.
....about networks, such as topology, congestion, routing, and protocol usage, and a global electronic mail study [31], which investigates the organization of human social networks by analyzing mail logs collected from different Internet sites. Indie. Distributed Indexing, or Indie for short [5, 6], is a distributed information discovery and retrieval architecture. Indie consists of a replicated directory of services and a collection of broker databases that automatically cluster references to related information by indexing their own data, and data stored in other brokers, databases, and ....
Peter Danzig, Jongsuk Ahn, John Noll, and Katia Obraczka. Distributed indexing: A scalable mechanism for distributed information retrieval. Proceedings of the 14th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pages 220--229, October 1991.
CiteSeer.IST - Copyright Penn State and NEC