IPEIROTIS, P. AND GRAVANO, L. 2002. Distributed search over the Hidden-Web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).
....to detect possible compromises [13]. Schema Discovery and Heterogeneity: We do not address the question of how to find which database contains which tables and what the attribute names are; we assume that the database schemas are known. We also do not address issues of schema heterogeneity. See [21, 29] and references therein for some approaches to these problems. 2.4 Related Work: In [35] the authors consider the problem of finding the intersection of two lists while revealing only the intersection. They present two solutions: the first involves oblivious evaluations of n polynomials of ....
P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In 28th Int'l Conference on Very Large Databases, Hong Kong, China, August 2002.
.... larger and of higher quality than the Surface Web indexed by search engines [1]. In order to assist users accessing the information in the Hidden Web, recent efforts have focused on building a metasearcher or a mediator that automatically selects the most relevant databases for a user's query [2, 5, 14, 15, 16, 18, 21, 24, 25, 26]. In this framework, the metasearcher maintains a summary or statistics on each database, and consults the summary to estimate the relevancy of each database to a query. For example, Gravano et al. [14, 16] maintain (keyword, document frequency) pairs to estimate the databases with the largest number ....
....distributed, db1 will have $20{,}000 \cdot \frac{2{,}000}{20{,}000} \cdot \frac{10{,}000}{20{,}000} = 1{,}000$ documents with both words breast and cancer. Similarly, db2 will have $20{,}000 \cdot \frac{3{,}500}{20{,}000} \cdot \frac{5{,}000}{20{,}000} = 875$ matching documents. Based on this estimation, the metasearcher returns db1 to the user. (Footnote 2: Reference [18] explains in detail how we may construct this table from hidden Web databases.) In this paper, we improve upon this existing framework by introducing the concept of probabilistic correctness and dynamic probing for Hidden Web database selection. One of the main weaknesses of the existing method is ....
[Article contains additional citation context not shown here]
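The breast/cancer estimate above can be reproduced with a short sketch. It assumes, as the snippet indicates, that words occur independently, so the expected number of matching documents is the database size times the product of each word's document frequency divided by that size. The function names and the summary layout are hypothetical, and mapping 2,000 and 10,000 (and 3,500 and 5,000) to breast and cancer respectively is an assumption, since the snippet does not show the underlying table; the product, and therefore the ranking, is the same either way.

```python
def estimate_matches(num_docs, doc_freqs, query_words):
    """Estimate how many documents contain every query word.

    Assumes words occur independently, so the estimate is
    num_docs * prod(df(w) / num_docs), computed with a single division at
    the end so integer inputs stay exact.
    """
    if num_docs == 0:
        return 0.0
    numerator = num_docs
    for word in query_words:
        numerator *= doc_freqs.get(word, 0)  # a word absent from the summary gives 0
    return numerator / num_docs ** len(query_words)


def rank_databases(summaries, query_words):
    """Rank databases by estimated matches, highest first.

    `summaries` maps a database name to (num_docs, {word: document_frequency}),
    a hypothetical layout for (keyword, document frequency) summaries.
    """
    scores = {
        name: estimate_matches(num_docs, dfs, query_words)
        for name, (num_docs, dfs) in summaries.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Figures from the example; the word-to-count mapping is assumed.
summaries = {
    "db1": (20_000, {"breast": 2_000, "cancer": 10_000}),
    "db2": (20_000, {"breast": 3_500, "cancer": 5_000}),
}
ranked, scores = rank_databases(summaries, ["breast", "cancer"])
print(ranked[0], scores)
```

Run as written, this prints db1 with estimates of 1000.0 and 875.0, matching the worked numbers. Keeping the arithmetic in integers until the final division avoids floating-point drift in the intermediate factors.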
P.G. Ipeirotis, L. Gravano. Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection. In Proc. of VLDB '02, China, 2002.
No context found.
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB, 2002.
No context found.
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB 2002.
....task that typically relies on statistical summaries of the database contents. Unfortunately, web accessible text databases do not generally export summaries of their contents. In the past, query based algorithms have been proposed to automatically build such summaries for web accessible databases [4, 8]. The goal is to construct an augmented dictionary of all words that appear in the database, and their frequency. One of the algorithms described in [4] automatically discovers the content of a text database by first querying the database with some seed words, and then extracting new words from ....
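The seed-word procedure attributed to [4] can be sketched as a loop that issues a query, records the retrieved documents, and draws the next query word from the vocabulary discovered so far. This is a toy model under stated assumptions: `search` is a hypothetical stand-in for a database's web search interface that returns up to `k` matching documents, documents are modeled as sets of words, and choosing the next query word at random from discovered words is an assumption, since the snippet breaks off before saying exactly where new words are extracted from. The output is a document-frequency summary of the sample, not of the whole database.

```python
import random
from collections import Counter


def search(database, word, k):
    """Hypothetical search interface: indices of up to k documents containing word."""
    return [i for i, doc in enumerate(database) if word in doc][:k]


def sample_content_summary(database, seed_words, k=4, max_queries=50, seed=0):
    """Build a sampled {word: document frequency} summary by querying with words.

    Starts from `seed_words`; every word seen in a retrieved document becomes a
    candidate for a later query. Returns the summary and the number of distinct
    documents sampled.
    """
    rng = random.Random(seed)
    sampled = {}                   # document index -> set of its words
    vocabulary = list(seed_words)  # discovered words, in discovery order
    known = set(vocabulary)
    queried = set()
    for _ in range(max_queries):
        candidates = [w for w in vocabulary if w not in queried]
        if not candidates:
            break                  # every discovered word has been queried
        word = rng.choice(candidates)
        queried.add(word)
        for i in search(database, word, k):
            if i in sampled:
                continue           # count each document only once
            sampled[i] = set(database[i])
            # Extract new words from the retrieved document for future queries.
            for new_word in sorted(sampled[i] - known):
                known.add(new_word)
                vocabulary.append(new_word)
    summary = Counter()
    for words in sampled.values():
        summary.update(words)      # document frequency within the sample
    return summary, len(sampled)


# Toy database; the diabetes document shares no word with the others.
toy_db = [
    {"breast", "cancer"},
    {"cancer", "therapy"},
    {"diabetes", "insulin"},
    {"breast", "screening"},
]
summary, n_sampled = sample_content_summary(toy_db, ["cancer"])
print(n_sampled, dict(summary))
```

In this toy run the diabetes document is never retrieved, so the sampled summary is incomplete, which is the kind of gap query-based summaries can have.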
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB 2002).
....[Figure 4: Associating content summaries with categories. Recoverable figure text: breast 133,680; cancer 101,423; diabetes 11,344; WebMD, NumDocs: 3,346,639; Category: Health, NumDBs: 5, NumDocs: 3,747,366.] .... for this statement in [17]. Hence, we can view the (potentially incomplete) content summaries of all databases in a category as complementary, and exploit this view for better database selection. For example, consider the CANCERLIT database and its associated content summary in Figure 4. As we can see, CANCERLIT was ....
....6. 5.1 Experimental Setting: To evaluate the algorithms described in this paper, we use two data sets: one set of Controlled databases that we assembled locally with newsgroup articles, and another set of Web databases, which we could only access through their web search interface. In [17], we also report experiments involving the three databases used in [3] to validate our comparison further. We use a 3-level subset of the Yahoo topic hierarchy consisting of 72 categories, with 54 leaf and 18 internal topics. Controlled Database Set: We gathered 500,000 newsgroup articles ....
[Article contains additional citation context not shown here]
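The Figure 4 snippet treats the content summaries of all databases in a category as complementary. One way to sketch that idea, assuming a category summary pools its member databases by summing their document counts and document frequencies, is shown below; the pooling rule is an assumption rather than something the snippet states, and `category_summary`, its input layout, and the example numbers are hypothetical.

```python
from collections import Counter


def category_summary(db_summaries):
    """Pool database summaries into one category-level summary.

    `db_summaries` is a list of (num_docs, {word: document_frequency}) pairs.
    Returns (total documents, {word: fraction of the category's documents
    that contain the word}).
    """
    total_docs = sum(num_docs for num_docs, _ in db_summaries)
    pooled_df = Counter()
    for _, dfs in db_summaries:
        pooled_df.update(dfs)  # adds document frequencies word by word
    if total_docs == 0:
        return 0, {}
    return total_docs, {w: df / total_docs for w, df in pooled_df.items()}


# Hypothetical summaries for two databases in the same category.
num_docs, word_probs = category_summary([
    (100, {"cancer": 30, "breast": 10}),
    (300, {"cancer": 10, "diabetes": 60}),
])
print(num_docs, word_probs)
```

Pooling lets a word missing from one database's possibly incomplete sample still appear in the category summary through its sibling databases.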
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. Technical report, Columbia University, Computer Science Department, 2002.
No context found.
IPEIROTIS, P. AND GRAVANO, L. 2002. Distributed search over the Hidden-Web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).
No context found.
P. Ipeirotis and L. Gravano. (2002). Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).
No context found.
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proc. VLDB, pages 394-405, 2002.
No context found.
P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB, 2002.
No context found.
P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the Twenty-eighth International Conference on Very Large Databases, 2002.
No context found.
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden-web: Hierarchical database sampling and selection. In VLDB, 2002.
No context found.
P. Ipeirotis and L. Gravano. Distributed search over the hidden-web: Hierarchical database sampling and selection. In VLDB Conference, 2002.
No context found.
P. G. Ipeirotis and L. Gravano. "Distributed search over the hidden web: hierarchical database sampling and selection," Proc. 28th VLDB Conf., 2002.
No context found.
P. Ipeirotis and L. Gravano. Distributed Search over the Hidden-Web: Hierarchical Database Sampling and Selection. In Proc. of VLDB, 2002.
No context found.
P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB '02.
No context found.
CiteSeer.IST - Copyright Penn State and NEC