| L. Gravano and H. Garcia-Molina. Generalizing GloSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995. |
....based on the degree to which a collection ranking produced by an approach can approximate a desired collection ranking. Collection selection evaluation measures are discussed in detail in French and Powell [11]. For these experiments, we use only the $R_n$ measure defined by Gravano and García-Molina [21]. The $R_n$ measure is calculated with respect to two rankings, a baseline ranking B that represents the desired collection ranking and an estimated ranking E produced by the collection selection approach. Our goal is to determine how well E approximates B. We assume that each collection $C_i$ has ....
....i and $C_{e_i}$ denote the collection in the i-th ranked position of rankings B and E respectively. Let $B_i = \mathrm{merit}(q, C_{b_i})$ and $E_i = \mathrm{merit}(q, C_{e_i})$ (4) denote the merit associated with the i-th ranked collection in the baseline and estimated rankings respectively. Gravano and García-Molina [21] defined $R_n$ as follows: $R_n = \frac{\sum_{i=1}^{n} E_i}{\sum_{i=1}^{n} B_i}$ (5). This is a measure of how much of the available merit in the top n ranked collections of the baseline has been accumulated via the top n collections in the estimated ranking. 6.6.2 Document Retrieval For the document retrieval ....
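The measure described in this excerpt, the merit accumulated by the top n collections of the estimated ranking divided by the merit available in the top n collections of the baseline ranking, can be illustrated with a minimal Python sketch. The function name `r_n`, the example collections, and their merit values are hypothetical, and returning 0.0 when the baseline has no merit is an assumption made here, not something the excerpt specifies.

```python
from typing import Mapping, Sequence


def r_n(baseline: Sequence[str], estimated: Sequence[str],
        merit: Mapping[str, float], n: int) -> float:
    """Ratio of merit in the top-n estimated collections to the
    merit in the top-n baseline collections."""
    if n <= 0:
        raise ValueError("n must be positive")
    baseline_merit = sum(merit[c] for c in baseline[:n])
    estimated_merit = sum(merit[c] for c in estimated[:n])
    # Assumption: an all-zero baseline yields 0.0 instead of an error.
    return estimated_merit / baseline_merit if baseline_merit > 0 else 0.0


# Hypothetical merit values for four collections and one query.
merit = {"A": 10.0, "B": 6.0, "C": 3.0, "D": 1.0}
# The baseline ranking orders collections by decreasing merit.
baseline = sorted(merit, key=merit.get, reverse=True)
# A made-up ranking produced by some selection approach.
estimated = ["B", "D", "A", "C"]

score_at_2 = r_n(baseline, estimated, merit, 2)  # (6 + 1) / (10 + 6)
score_at_4 = r_n(baseline, estimated, merit, 4)  # same collections in both top-4 sets
```

Once n reaches the total number of collections, both sums cover the same set and the ratio is 1, so the measure is informative mainly for small n.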
L. Gravano and H. García-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proc. of the 21st VLDB Conference, pages 78--89, 1995.
....left out due to space limitation. Experimental results will be presented in Section 6. We conclude the paper in Section 7. 2. RELATED WORK In the last several years, a large number of research papers on issues related to metasearch engines or distributed collections have been published (e.g. [1, 5, 8, 9, 10, 17, 18, 19, 26, 27, 31, 33, 30, 34]). Due to space limitation, we compare our approach only with the existing works which are closest to what is presented here. A classification of different approaches can be found in [22]. In [31] it is shown that if databases are ranked in descending order of similarity of the most similar ....
....of m. Another possible definition to rank databases optimally is to have the highest ranked database having the largest sum of similarities for documents having similarities beyond certain threshold, the second ranked database having the second largest sum, etc. Such an approach was taken by [10]. It was shown in [30] that the definition suggested here yields significantly higher retrieval effectiveness than that given in [10]. A different database selection algorithm is given in [28]. However, without query expansion, substantial deterioration in retrieval effectiveness relative to ....
[Article contains additional citation context not shown here]
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB, 1995.
.... larger and of higher quality than the Surface Web indexed by search engines [1]. In order to assist users accessing the information in the Hidden Web, recent efforts have focused on building a metasearcher or a mediator that automatically selects the most relevant databases to a user's query [2, 5, 14, 15, 16, 18, 21, 24, 25, 26]. In this framework, the metasearcher maintains a summary or statistics on each database, and consults the summary to estimate the relevancy of each database to a query. For example, Gravano et al. [14, 16] maintain (keyword, document frequency) pairs to estimate the databases with the most number ....
....if it contains the highest number of matching documents [14, 16]. This number of matching documents is referred to as the document frequency of the query in the database. Document-similarity-based: A database is considered the most relevant if it contains the most similar document(s) to the query [15, 21, 25]. Query-document similarity is often computed using the standard Cosine function [22]. Relevancy estimation: A metasearcher has to estimate the approximate relevancy of a database to a query using a pre-collected summary. Note that this estimate may or may not be the same as the actual relevancy ....
[Article contains additional citation context not shown here]
L. Gravano, and H. Garcia-Molina. Generalizing GLOSS to Vector-Space databases and Broker Hierarchies. In Proc. of VLDB '95, Switzerland, 1995.
....assuming that we want to submit an information retrieval query to only a subset of the available databases, decide which databases are more likely to contain the most relevant documents. A number of algorithms have been proposed and experimental results show that these algorithms achieve good results [3, 9, 11, 23, 22]. Recent work [14, 17] shows that good performance can be achieved in this setting, if the collections are conceptually separated. However, these algorithms assume that the querying party has some statistical knowledge about the contents of each database (for example, word frequencies in ....
L. Gravano, and H. Garcia-Molina. Generalizing gloss to vector-space databases and broker hierarchies. In Proceedings of the 21st VLDB Conference (Zurich, Switzerland, 1995).
....collections. Moreover, it is a growing challenge to provide high precision search results given the vast scope of the Web or commercial databases. This situation introduces the need for reliable, high performance distributed searching. Because of the work of Gravano, Callan, French, and others [16, 5, 11, 26], aspects of distributed search have been divided into four principal activities: 1) collection ranking; 2) collection selection; 3) searching the chosen collections; 4) merging the results into a uniform set. Their approaches to these issues have made considerable progress in terms of ....
.... is evidence that improved collection recall can lead to improved overall distributed retrieval [36, 26]. Gravano et al. were among the seminal investigators of the database selection problem for large data environments, including the Internet, by way of a Glossary of Servers Server (GLOSS) [17, 16, 18]. Originally developed around a Boolean query retrieval model (bGlOSS), Gravano subsequently generalized the approach to the vector space retrieval model (vGlOSS) [16] and demonstrated that this framework and its associated goodness metric deliver effective document retrieval in large data ....
[Article contains additional citation context not shown here]
L. Gravano and H. García-Molina. Generalizing GLOSS to vector-space databases and broker hierarchies. In Proc. of VLDB '95, pages 78-89. Morgan Kaufmann, Sept. 1995.
....lists of all sources. For retrieval from text, one of the methods for information source selection is use of automatically generated metadata from the content. Comparing the query to metadata about the sources can reveal the possible relevance of each source to the query. CORI [4] and gGloss [13] are examples of such metadata in information source selection. The CORI model is based on inference networks. CORI creates a virtual document containing Document Frequency (DF) and Inverse Collection Frequency (ICF). The ICF indicates importance of the term across the collections and is ....
L. Gravano and H. Garcia-Molina. Generalizing gloss to vector-space databases and broker hierarchies. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.
.... document frequency (df) and collection term frequency (ctf) on the effectiveness of database selection. We introduce a family of database selection algorithms based on this information, and compare their effectiveness to two existing database selection approaches, CORI [3] and gGlOSS [9]. We demonstrate that a simple selection algorithm that uses only document frequency information is more effective than gGlOSS, and achieves effectiveness that is very close to that of CORI. 1 Introduction As the number of online databases or collections increases, database or collection ....
....of a selection technique is measured as how well the technique estimates the given baseline ranking. Database selection is also called collection selection [3] or text database resource discovery [10]. Several algorithms for database selection have been proposed and independently evaluated [1–4, 8, 9, 12–14, 16–20]. Two well known selection algorithms, CORI [3] and gGlOSS [9], have been compared to each other, and CORI has been found to be more effective than gGlOSS when the goal is to locate collections containing the largest number of relevant documents [6]. In this work we introduce a family of database ....
[Article contains additional citation context not shown here]
L. Gravano and H. Garcia-Molina, Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies, Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995.
....paper, we survey the measures for collection ranking used in the current research, discuss their strengths and weaknesses, and propose new measures based on the concepts of precision and recall. In Section 2, we briefly survey the measures used in our previous research [1], NetSerf [g], and GLOSS [4][5]. In Section 3 we define the new measures, which are variations of precision and recall. In the final section, we summarize the discussions. 2 A Survey of Measures in Collection Ranking Evaluation 2.1 Ranking with INQUERY In our previous research on collection ranking [1] the mean square error ....
....documents, Ranks can retrieve 9 relevant documents, and Rank4 can retrieve 1 relevant document. 2.3 GLOSS GLOSS defines its measures based on the well known precision and recall in conventional document retrieval. GLOSS has two versions: the Boolean version [4] and the vector space version [5]. The definitions of the measures in these two versions have slight differences. GLOSS uses databases to refer to collections in our context. 2.3.1 The Boolean Version In the Boolean version of GLOSS, GLOSS defines Right(q, DB) where q is a query and DB is a set of databases. There are two ....
[Article contains additional citation context not shown here]
Luis Gravano and Hector Garcia-Molina. Generalizing GLOSS to Vector-Space Databases and Broker Hierarchies. Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.
....based on the degree to which a collection ranking produced by an approach can approximate a desired collection ranking. Collection selection evaluation measures are discussed in detail in French and Powell [9]. For these experiments, we use only the $R_n$ measure defined by Gravano and García-Molina [18]. The $R_n$ measure is calculated with respect to two rankings, a baseline ranking B that represents the desired collection ranking and an estimated ranking E produced by the collection selection approach. Our goal is to determine how well E approximates B. We assume that each collection $C_i$ has some ....
....respect to query q. Let $C_{b_i}$ and $C_{e_i}$ denote the collection in the i-th ranked position of rankings B and E respectively. Let $B_i = \mathrm{merit}(q, C_{b_i})$ and $E_i = \mathrm{merit}(q, C_{e_i})$ (1) denote the merit associated with the i-th ranked collection in the baseline and estimated rankings respectively. Gravano et al. [18] defined $R_n$ as follows: $R_n = \frac{\sum_{i=1}^{n} E_i}{\sum_{i=1}^{n} B_i}$ (2). This is a measure of how much of the available merit in the top n ranked collections of the baseline has been accumulated via the top n collections in the estimated ranking. 5.6.2 Document Retrieval For the document retrieval experiments reported ....
L. Gravano and H. García-Molina. Generalizing GLOSS to Vector-Space Databases and Broker Hierarchies. In Proc. of the 21st VLDB Conf., pages 78-89, 1995.
....on mediators. The concept of mediator was initially proposed by Wiederhold [23]. Since then many different approaches have been proposed in order to build mediators over relational databases (e.g. see [13, 7, 8, 24]), SGML documents (e.g. see [5]), or information retrieval systems (e.g. see [22, 9, 6, 19, 16]) and web based sources (e.g. see [1, 3]). Comparing to the integration approaches for relational databases, namely, federated databases, relational warehousing, and relational mediation, our model is similar in spirit to the relational mediators (see [8] for a review). Roughly, relational ....
L. Gravano and H. Garcia-Molina. "Generalizing GlOSS To Vector-Space Databases and Broker Hierarchies". In Proc 21st VLDB Conf., Zurich, Switzerland, 1995.
....The conclusion is drawn in Section 5. 2 Related Work Meta search is also referred to as hierarchical distributed search. Many papers addressing this issue have been published recently. As mentioned before, server ranking is the most critical problem in meta search. 1. gGloss server ranking [5, 6] is often regarded as the baseline for other server ranking methods. For each document containing some query terms in a server's index database, it computes the similarity between the document and the query. In gGloss ranking, the ideal(0) goodness score of the server is the sum of these ....
....c, is defined as $sim_r$ times the number of documents that contain some query terms in c: $sim_c = p_r N \, sim_r = \left[1 - (1 - p_1)(1 - p_2) \cdots (1 - p_m)\right] N \, sim_r$ (1). 3.3.2 High Correlation Scenario Our high correlation scenario is more or less similar to that of gGloss [5], although our specification is slightly different. We make two assumptions in the same manner as those made in gGloss [5]: 1. Assume $p_1 \le p_2$. If a document contains $t_1$, then it will also contain $t_2$. 2. The weight of a term is distributed uniformly over all the documents containing the ....
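The expression at the start of this excerpt appears to be an independence-style estimate: the probability that a document contains at least one of the m query terms is one minus the product of the per-term absence probabilities, and that probability is scaled by the number of documents N and a representative similarity. The Python sketch below illustrates that reading only. The function names are hypothetical, and treating each p_i as the fraction of documents containing term t_i is an assumption, since the excerpt is truncated before defining these symbols.

```python
from math import prod
from typing import Sequence


def prob_some_term(p: Sequence[float]) -> float:
    """Probability that a document contains at least one query term,
    assuming (hypothetically) that terms occur independently."""
    return 1.0 - prod(1.0 - p_i for p_i in p)


def estimated_sim_c(p: Sequence[float], n_docs: int, sim_r: float) -> float:
    """sim_c = p_r * N * sim_r, with p_r taken from prob_some_term."""
    return prob_some_term(p) * n_docs * sim_r


# Made-up example: two query terms, each present in half of 100 documents.
p_r = prob_some_term([0.5, 0.5])             # 1 - 0.5 * 0.5
sim_c = estimated_sim_c([0.5, 0.5], 100, 0.2)
```

With a single query term the estimate reduces to p_1 times N times the representative similarity, which is a quick sanity check on the formula.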
[Article contains additional citation context not shown here]
L. Gravano and H. Garcia-Molina. Generalizing GLOSS to vector-space databases and broker hierarchies. In VLDB, pages 78--89, 1995.
.... and the contents of those collections cannot be easily ascertained without exhaustive measures, automatic collection selection algorithms can be used to assist in the choice of which collections to search by identifying those collections that are most likely to satisfy the information need [2, 3, 4, 6, 8, 11, 12, 13, 14]. Collection selection algorithms require information about the contents of the collections among which they are selecting. We will use the terminology of Callan et al. [1] and refer to the summary content information as a language model of the collection. For our purposes, a language model is ....
L. Gravano and H. García-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st VLDB Conference, pages 78--89, 1995.
....engine has been accumulated in recent years. One of the main challenging problems is the database selection problem, which is to identify, for a given user query, the local search engines that are likely to contain useful documents (Baumgarten, 1997; Callan et al., 1995; Dreilinger and Howe, 1997; Gravano and Garcia-Molina, 1995; Kahle and Medlar, 1991; Koster, 1994; Liu et al., 2001; Manber and Bigot, 1997; Meng et al., 1998; Selberg and Etzioni, 1997; Yu et al., 1999; Yuwono and Lee, 1997). The objective of performing database selection is to improve efficiency as it enables the metasearch engine to send each query to ....
....methods rank databases for each query based on some quality measure. These measures are often based on the similarities between the query and the documents in each database. Similarities are computed based on the match of terms in the query and documents. For example, a measure used in gGlOSS (Gravano and Garcia Molina, 1995) to determine the quality of a database with respect to a given query is the sum of the similarities between the query and highly similar documents in the database when the similarity is greater than or equal to a threshold. As another example, we have developed a scheme to rank databases ....
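The quality measure described in this excerpt, the sum of query-document similarities that are greater than or equal to a threshold, can be sketched in a few lines of Python. This is only an illustration of the ideal measure computed from known similarities; it does not show how gGlOSS estimates the measure from database summaries. The function names and the example similarity values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def goodness(similarities: Sequence[float], threshold: float) -> float:
    """Sum of the similarities that are at or above the threshold."""
    return sum(s for s in similarities if s >= threshold)


def rank_databases(sims_by_db: Dict[str, Sequence[float]],
                   threshold: float) -> List[Tuple[str, float]]:
    """Rank databases by decreasing goodness for one query."""
    scores = {db: goodness(sims, threshold) for db, sims in sims_by_db.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up query-document similarities for three databases.
sims_by_db = {
    "db1": [0.9, 0.1, 0.4],
    "db2": [0.5, 0.5, 0.2],
    "db3": [0.05],
}
ranking = rank_databases(sims_by_db, threshold=0.4)
ranked_names = [name for name, _ in ranking]
```

Raising the threshold can change the order: at 0.6 only the 0.9 document in `db1` still counts, so `db2` drops to a score of zero.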
[Article contains additional citation context not shown here]
Gravano L, Garcia-Molina H (1995) Generalizing GlOSS to vector-space databases and broker hierarchies. Proceedings of the International Conference on Very Large Data Bases, Zurich, Switzerland, September 1995, pp. 78-89.
....technique based on this framework. Experimental results will be presented in Section 5. We conclude the paper in Section 6. 2 Related Work In the last several years, a large number of research papers on issues related to metasearch engines or distributed collections have been published (e.g. [1, 7, 10, 12, 13, 25, 27, 28, 29, 34, 35, 36, 37, 40, 43, 44]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [12], CORI Net uses the probability that a database contains relevant documents due to ....
....have been published (e.g. [1, 7, 10, 12, 13, 25, 27, 28, 29, 34, 35, 36, 37, 40, 43, 44]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [12], CORI Net uses the probability that a database contains relevant documents due to the terms in a given query [7], D-WISE uses the sum of the document frequencies of query terms weighted by the cue validity variance of the document frequencies of each query term [44], Q-Pilot uses the dot product ....
L. Gravano, and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB, 1995.
....However, when hundreds or thousands of searchable text databases are available, a person may need help deciding which databases to select. Automatic database selection algorithms assist with this choice by identifying the databases that best satisfy the information need, according to some metric [9, 4, 16, 15, 18, 12, 6, 17]. Database selection algorithms need to know what each database contains. This information is often derived from a unigram language model, which lists the words that occur in the database and their frequencies of occurrence. Two methods have been proposed for acquiring such metadata ....
....about the behavior of query based sampling and three well known ranking algorithms under a range of realistic conditions. Whatever the outcome, the results would provide information to guide future research. 3.1 Database Ranking Algorithms Three database ranking algorithms were studied: gGlOSS [9, 7, 10], CORI [4, 7, 17] and CVV [18]. These algorithms were chosen because they are relatively well known in database and/or information retrieval research communities. All three algorithms are easy to implement. None of them require training data. Two other database rankings were included as ....
[Article contains additional citation context not shown here]
L. Gravano and H. García-Molina. Generalizing GLOSS to vector-space databases and broker hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), pages 78--89, 1995.
....be presented in Section 5. We briefly describe our prototype system in Section 6. Finally, we conclude the paper in Section 7. 2. RELATED WORK In the last several years, a large number of research papers on issues related to metasearch engines or distributed collections have been published (e.g. [1, 4, 6, 8, 9, 17, 19, 20, 21, 26, 27, 28, 32, 37]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [8], CORI Net uses the probability that a database contains relevant documents due to ....
....have been published (e.g. [1, 4, 6, 8, 9, 17, 19, 20, 21, 26, 27, 28, 32, 37]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [8], CORI Net uses the probability that a database contains relevant documents due to the terms in a given query [4], D-WISE uses the sum of weighted document frequencies of query terms [37], Q-Pilot uses the dot product similarity between an expansion query and a database description [27] and one of ....
[Article contains additional citation context not shown here]
L. Gravano, and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB, 1995.
....Links between traders are treated as dynamic and can be created and destroyed randomly. While the previously mentioned approaches deal with object trading only and because of that cannot be applied to our problem directly, Stanford suggests a model for federated trading for literature sources (Gravano and García-Molina, 1995). Their main idea is the installation of glossary services (GlOSS) that summarise the content of text databases. Accordingly, they suggest hierarchies of glossary services by summarising the content of several glossaries and selecting appropriate glossary services to forward queries. However, the ....
L. Gravano and H. García-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In: Proceedings of the 21st Very Large Data Bases Conference, Zürich, Switzerland, 1995.
.... or thousands of searchable databases and the content of those databases cannot be easily ascertained, automatic database selection algorithms can be used to assist in the choice of which databases to search by identifying those databases that are most likely to satisfy the information need [GGM95, CLC95, FPV98, FPC99b, FPC99a, XC99]. Database selection algorithms require information about the contents of those databases over which they are selecting. A number of ways have been proposed for defining what information is required and how to acquire it, cf. [GCGMP97, HT99, CCD99] ....
L. Gravano and H. Garcia-Molina. Generalizing gloss to vector-space databases and broker hierarchies. In Proceedings of the 21st International Conference on Very Large Databases, 1995.
....invoking useless search engines for a query, we should first identify those search engines that are most likely to provide useful results to the query and then pass the query to only the identified search engines. Examples of systems that employ this approach include WAIS [10], ALIWEB [14], gGlOSS [6], SavvySearch [8], ProFusion [5] and D-WISE [26]. The problem of identifying potentially useful databases to search is known as the database selection problem. Database selection is done by comparing each query with all database representatives. If a user only wants the m most desired documents ....
....paper to incorporate the linkage information among documents. In order that the linkage information is utilized properly, the database representatives as well as the procedure to estimate the similarity of the most similar document in each database need to be modified. 3. The gGlOSS project [6] is similar in the sense that it ranks databases according to some measure. However, there is no necessary and sufficient condition for optimal ranking of databases; there is no precise algorithm for determining which documents from which databases are to be returned. 4. A necessary and ....
[Article contains additional citation context not shown here]
L. Gravano, and H. Garcia-Molina. Generalizing GlOSS to Vector-Space databases and Broker Hierarchies. International Conference on Very Large Data Bases, 1995.
....University of New York at Binghamton, Binghamton, NY 13902 Abstract We consider the processing of digital library queries, consisting of a text component and a structured component in distributed environments. The text component can be processed using techniques given in previous papers such as [8, 11, 7]. In this paper, we concentrate on the processing of the structured component of a distributed query. Histograms are constructed and algorithms are given to provide estimates of the desirabilities of the databases with respect to the given query. Databases are selected in descending order of ....
....of digital video and even library to a lesser extent). This component can be matched against the title field with each title being treated as a document. Methods for selecting potentially useful databases for textual queries in distributed environments are given in various papers such as [7, 8, 11]. In [11] an intermediate result of processing such a text query is a ranking of the databases in descending order of similarity. Each database is associated with a similarity (which is an inverse of distance) with respect to the query. The second component of the query is around 1998 and can ....
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space databases and Broker Hierarchies, VLDB, 1995.
....suggest to employ an inference network to rank not only documents at the provider sites, but also subcollections at the broker site, and to use the subcollection ranking for selecting subcollections and for fusing local rankings. A third example is the further development of the GlOSS approach [GGM95] described by Meng et al. [MLY98]. Their approach to overcome the subcollection fusion problem shows some similarities to the one presented in this paper and first outlined in [Bau97]. However, the metadata to be maintained at the broker site is more extensive and the agreement on one all ....
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proc. 21st VLDB Conf., 1995.
....preliminary version of this paper appears as a poster paper in ACM DL 99 latest technical reports on a particular topic from the databases of physics research laboratories across the country. There are various approaches to facilitate the retrieval of information scattered in distributed sources [4, 8, 13, 14]. An approach is to construct a global search engine, also known as metabroker or metasearch engine, on top of a group of local search engines (databases). However, the local search engines underlying a global search engine are usually autonomous. With respect to a given query Q, the similarity of ....
L. Gravano and H. Garcia-Molina. Generalizing gloss to vector-space databases and broker hierarchies. In VLDB, 1995.
....collects and reorganizes the results from its underlying search engines. A simple two level architecture of a metasearch engine is depicted in Figure 1. This two level architecture can be generalized to a hierarchy of more than two levels when the number of local search engines becomes large [2, 26, 77]. [Figure 1: A Simple Metasearch Architecture. A global interface sits above Search Engine 1, Search Engine 2, ..., Search Engine n.] There are a number of reasons for the development of a metasearch engine and we discuss these reasons below. 1. Increase the search coverage of the Web. A recent study ....
....and stored in each database representative and queries are expanded based on a technique called local context analysis [72], then the CORI Net approach can select useful databases more accurately. gGlOSS approach: The gGlOSS (generalized Glossary Of Servers Server) system is a research prototype [26]. In gGlOSS, each component database is represented by a set of pairs $(df_i, W_i)$, where $df_i$ is the document frequency of the i-th term and $W_i$ is the sum of the weights of the i-th term over all documents in the component database. A threshold is associated with each query in gGlOSS to indicate ....
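The database representative described in this excerpt, one pair per term holding the term's document frequency and the sum of its weights over all documents, can be built as in the Python sketch below. The function name `build_representative` and the input layout (one term-to-weight dictionary per document) are hypothetical, and how the term weights are computed is left open because the excerpt does not say.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def build_representative(
    docs: Iterable[Mapping[str, float]],
) -> Dict[str, Tuple[int, float]]:
    """Map each term to (df, W): the number of documents that contain
    the term and the sum of the term's weights over those documents."""
    df: Dict[str, int] = defaultdict(int)
    weight_sum: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            # Assumption: a non-positive weight means the term is absent.
            if weight > 0:
                df[term] += 1
                weight_sum[term] += weight
    return {term: (df[term], weight_sum[term]) for term in df}


# Made-up weighted documents for one component database.
docs = [
    {"database": 0.5, "search": 0.25},
    {"database": 0.25},
]
representative = build_representative(docs)
```

The representative grows with the vocabulary of the database, not with the number of documents, which is what makes it practical for a broker to hold one per database.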
[Article contains additional citation context not shown here]
L. Gravano, and H. Garcia-Molina. Generalizing GlOSS to Vector-Space databases and Broker Hierarchies. VLDB, 1995.
....technique based on this framework. Experimental results will be presented in Section 5. We conclude the paper in Section 6. 2 Related Work In the last several years, a large number of research papers on issues related to metasearch engines or distributed collections have been published (e.g. [1, 4, 6, 8, 9, 17, 19, 20, 21, 26, 27, 28, 31, 36]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [8], CORI Net uses the probability that a database contains relevant documents due to the ....
....collections have been published (e.g. [1, 4, 6, 8, 9, 17, 19, 20, 21, 26, 27, 28, 31, 36]). For database selection, most approaches rank the databases for a given query based on certain usefulness measures. For example, gGlOSS uses the sum of document similarities that are higher than a threshold [8], CORI Net uses the probability that a database contains relevant documents due to the terms in a given query [4], D-WISE uses the sum of the document frequencies of query terms weighted by the cue validity variance of the document frequencies of each query term [36], Q-Pilot uses the dot product ....
[Article contains additional citation context not shown here]
L. Gravano, and H. Garcia-Molina. Generalizing GlOSS to Vector-Space databases and Broker Hierarchies. VLDB, 1995.
.... collections [4]; others require a set of reference queries or topics with relevance judgements, and select those collections with the largest numbers of relevant documents for topics that are similar to the new query [17]. We are interested in the class of approaches including CORI [1], gGlOSS [6], and others [8, 20] that characterize different collections using collection statistics like term frequencies. These statistics, which are used to select or rank the available collections relevance to a query, are usually assumed to be available from cooperative providers. Alternatively, ....
Gravano, L., and Garcia-Molina, H. Generalizing GLOSS to vector-space databases and broker hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), pages 78-89, 1995.
....1999; French et al. 1998). Indeed, there are few widely available alternative sources of data for creating resource selection testbeds. The alternative data used most widely, created at Stanford as part of research on the GlOSS and gGlOSS resource selection algorithms (Gravano et al. 1994; Gravano and García-Molina, 1995), is large and realistic, but does not provide the same breadth of relevance judgements. 3. RESOURCE DESCRIPTION The first task in an environment containing many databases is to discover and represent what each database contains. Discovery and representation are closely related tasks, because ....
....it contains about U.S. politics, international affairs, wine, and other information of general interest. A simple and robust solution is to represent each database by a description consisting of the words that occur in the database, and their frequencies of occurrence (Gravano et al. 1994; Gravano and García-Molina, 1995; Callan et al. 1995b) or statistics derived from frequencies of occurrence (Voorhees et al. 1995a). We call this type of representation a unigram language model. Unigram language models are compact and can be obtained automatically by examining the documents in a database or the document ....
[Article contains additional citation context not shown here]
Gravano, L. and García-Molina, H. (1995). Generalizing GLOSS to vector-space databases and broker hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), pages 78--89.
....of these resources, and the de facto requirement of pruning the resource set of interest to manageable size has increased attention on retrieval in the distributed environment. Distributed information retrieval encompasses many important problems, including: database or collection selection [7, 11, 15, 13, 18, 24, 25, 14]; collection fusion or results merging [2, 7, 8, 23, 22]; and dissemination of collection information to increase retrieval effectiveness [10, 21, 20]. In this paper we focus on database selection as a fundamental problem in the distributed environment, providing the first direct ....
....[9] and Xu and Callan [24] has focused on the importance of including relevance information either in the evaluation of the ranked list of collections to send the query to, or in the eventual lists of documents returned by all collections. The DB approach, embodied in the work of Gravano et al. [13, 14], does not include relevance information and compares the selection technique against an ideal ordering that represents the behavior of the individual search systems. The rationale is that a selection technique can't do better than the underlying constituent systems. Several techniques for ....
[Article contains additional citation context not shown here]
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995.
....the overhead in locating a specialized engine are too great. The existence of better methods for locating specialized search engines can help, and much research has been done in this area. Several methods of selecting search engines based on user queries have been proposed; for example, GlOSS [33, 34] maintains word statistics on available databases in order to estimate which databases are most useful for a given query. Related research includes [19, 24, 26, 32, 46, 49, 61, 62]. It would be of great benefit if the major web search engines attempted to direct users to the best specialized ....
Luis Gravano and Héctor García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
....for their selection is GlOSS [30]. It focuses on the identification of relevant text databases for a given query and uses the word frequencies to estimate the result sizes of the query. The hard problem of modeling a user's information need is not tackled in GlOSS. The generalized version gGlOSS [29] also deals with vector space databases and queries, but at the expense of additionally required statistical information about the databases. The Stanford Proposal for Internet Metasearching STARTS [24] tries to facilitate the three main tasks a metasearcher has to perform: the selection of the ....
Luis Gravano and Héctor García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In Proc. of the 21st Int. Conf. on Very Large Data Bases (VLDB), pages 78--89, September 1995. URL: http://pub-db.stanford.edu/ or http://www-db.stanford.edu/pub/gravano/1995/vldb95.ps
....However, we would like to point out that the Potential function as defined above is used only to rank anchor points. It is not used to compute an accurate estimate of the number of relevant pages. In fact, this independent assumption is used in other information retrieval techniques, such as Gloss [8, 9, 10]. A Gloss server is one that maintains certain statistics about a number of information sources such as document libraries. Given a user query (conjunctive or disjunctive) the Gloss server estimates, based on the statistics, which information source is most likely to contain the largest number of ....
L. Gravano et al. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB'95, May 1995.
....access. Furthermore, presumably only a few collections contain useful items for a given query. Therefore, a crucial component of a Digital Library is a tool that assists users in discovering the useful resources for their queries. Finding the best collections for a query is the goal of GlOSS [5, 6], a resource discovery service within our Digital Library testbed. A user submits a query to GlOSS, and GlOSS returns a rank of the available collections. This rank is based on estimates of the expected number of hits for the query at each collection (Fig. 1) Then, the user submits the query to ....
Luis Gravano and Héctor García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In Proceedings of VLDB '95, pages 78--89, September 1995.
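The two excerpts above describe GlOSS as keeping per-collection word statistics and ranking collections by the estimated number of hits for a query, under an independence assumption between query words. A minimal sketch of that idea for a conjunctive query follows. The names `estimate_hits` and `rank_collections`, the summary layout, and the example numbers are hypothetical; the estimate shown (collection size times the product of per-word document fractions) is one common reading of the independence assumption and is not claimed to be the exact estimator used in the cited work.

```python
def estimate_hits(num_docs, doc_freq, query_terms):
    """Estimate how many documents match ALL query terms.

    Assumes query words occur independently of each other, so the
    expected match count is num_docs * prod(doc_freq[t] / num_docs).
    """
    if num_docs == 0:
        return 0.0
    estimate = float(num_docs)
    for term in query_terms:
        estimate *= doc_freq.get(term, 0) / num_docs
    return estimate


def rank_collections(summaries, query_terms):
    """Rank collections by estimated hits, best first.

    summaries maps a collection name to (num_docs, doc_freq); this
    layout is an assumption made for the example.
    """
    scored = [
        (name, estimate_hits(num_docs, doc_freq, query_terms))
        for name, (num_docs, doc_freq) in summaries.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up statistics for two hypothetical collections.
summaries = {
    "A": (100, {"wine": 50, "politics": 20}),
    "B": (10, {"wine": 10, "politics": 1}),
}
ranking = rank_collections(summaries, ["wine", "politics"])
```

Only the summaries are consulted, so the ranking can be produced without contacting the collections themselves; the user then submits the query to the top-ranked ones.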
No context found.
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. International Conference on Very Large Data Bases, 1995.
....query. If high values of both Rn and Pn are of interest, then vGlOSS should produce ranks based on the high correlation assumption of Section 2.3.1: rank Max (l) is the best candidate for rank Ideal(l) with l 0. If only high values of Rn are of interest, then vGlOSS can do without matrix .... [Fig. 2. Parameter Rn as a function of n, the number of databases examined; plotted series: Max(0.2), Sum(0.2), Max(0), Sum(0)] ....
Gravano, L. and García-Molina, H. 1995b. Generalizing GlOSS to vector-space databases and broker hierarchies. Technical Report STAN-CS-TN-95-21 (May), Computer Science Department, Stanford University.
No context found.
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995.
No context found.
L. Gravano and H. García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
No context found.
L. Gravano and H. Garcia-Molina. "Generalizing GlOSS To Vector-Space Databases and Broker Hierarchies". In Proc 21st VLDB Conf., Zurich, Switzerland, 1996.
No context found.
L. Gravano and H. García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
No context found.
L. Gravano and H. García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
No context found.
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995.
No context found.
GRAVANO, L. and GARCÍA-MOLINA, H., Generalizing GlOSS to vector-space databases and broker hierarchies, in: Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95), pp. 78--89 (1995).
No context found.
L. Gravano and H. Garcia-Molina. "Generalizing GlOSS To Vector-Space Databases and Broker Hierarchies". In Proc 21st VLDB Conf., Zurich, Switzerland, 1996.
No context found.
GRAVANO, L., AND GARCÍA-MOLINA, H. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB (1995), pp. 78--89.
No context found.
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), 1995.
No context found.
Gravano L., and Garcia-Molina H. "Generalizing GlOSS to vector-space databases and broker hierarchies". In Proceedings of the 21st VLDB Conference (Zurich, Switzerland, 1995).
No context found.
L. Gravano and H. Garcia-Molina, "Generalizing GlOSS to vector-space databases and broker hierarchies". Proc. of VLDB'95, Zurich, Switzerland, 1995.
No context found.
L. Gravano and H. Garcia-Molina. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. In Proc. of VLDB '95, Switzerland, 1995.
No context found.
L. Gravano and H. García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
No context found.
L. Gravano and H. García-Molina. Generalizing GlOSS to vector-space databases and broker hierarchies. In International Conference on Very Large Databases, VLDB, pages 78--89, 1995.
No context found.
L. Gravano and H. Garcia-Molina. "Generalizing GlOSS To Vector-Space Databases and Broker Hierarchies". In Proc 21st VLDB Conf., Zurich, Switzerland, 1996.
No context found.
L. Gravano and H. García-Molina. Generalizing GLOSS to vector-space databases and broker hierarchies. In Proceedings of the 21st International Conference on Very Large Databases (VLDB), pages 78--89, 1995.