IPEIROTIS, P. AND GRAVANO, L. 2002. Distributed search over the Hidden-Web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).

This paper is cited in the following contexts:
Information Sharing Across Private Databases - Agrawal, Evfimievski, Srikant (2003)   (16 citations)

....to detect possible compromises [13]. Schema Discovery and Heterogeneity: We do not address the question of how to find which database contains which tables and what the attribute names are; we assume that the database schemas are known. We also do not address issues of schema heterogeneity. See [21, 29] and references therein for some approaches to these problems. 2.4 Related Work: In [35] the authors consider the problem of finding the intersection of two lists while revealing only the intersection. They present two solutions: the first involves oblivious evaluations of n polynomials of ....

P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In 28th Int'l Conference on Very Large Databases, Hong Kong, China, August 2002.


D_Pro: A Probabilistic Approach for Hidden Web Database.. - Liu, Luo, Cho, Chu

.... larger and of higher quality than the Surface Web indexed by search engines [1]. In order to assist users accessing the information in the Hidden Web, recent efforts have focused on building a metasearcher or a mediator that automatically selects the most relevant databases to a user's query [2, 5, 14, 15, 16, 18, 21, 24, 25, 26]. In this framework, the metasearcher maintains a summary or statistics on each database, and consults the summary to estimate the relevancy of each database to a query. For example, Gravano et al. [14, 16] maintain (keyword, document frequency) pairs to estimate the databases with the most number ....

....distributed, db1 will have $20{,}000 \cdot \frac{2{,}000}{20{,}000} \cdot \frac{10{,}000}{20{,}000} = 1{,}000$ documents with both words breast and cancer. Similarly, db2 will have $20{,}000 \cdot \frac{3{,}500}{20{,}000} \cdot \frac{5{,}000}{20{,}000} = 875$ matching documents. Based on this estimation, the metasearcher returns db1 to the user. [Footnote 2: References [18] explain in detail how we may construct this table from hidden Web databases.] In this paper, we improve upon this existing framework by introducing the concept of probabilistic correctness and dynamic probing for Hidden Web database selection. One of the main weaknesses of the existing method is ....
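
The arithmetic in the excerpt above can be checked with a minimal sketch. It assumes that query words occur independently within a database, which is what the excerpt's estimate appears to rely on. The function name `estimate_matches`, the dictionary layout, and the label `db1` for the first database are illustrative assumptions, not taken from the cited paper.

```python
def estimate_matches(num_docs, doc_freqs):
    """Estimate how many documents contain every query word.

    Assumes the words occur independently, so the estimate is
    num_docs multiplied by the fraction of documents containing each word.
    """
    estimate = float(num_docs)
    for df in doc_freqs:
        estimate *= df / num_docs
    return estimate


# Figures from the excerpt; the keys and database labels are hypothetical.
summaries = {
    "db1": {"num_docs": 20_000, "breast": 2_000, "cancer": 10_000},
    "db2": {"num_docs": 20_000, "breast": 3_500, "cancer": 5_000},
}

query = ["breast", "cancer"]
estimates = {
    name: estimate_matches(s["num_docs"], [s[w] for w in query])
    for name, s in summaries.items()
}

# The metasearcher would pick the database with the largest estimate.
best = max(estimates, key=estimates.get)
print(estimates, best)
```

The sketch only reproduces the point estimate; the probabilistic correctness and dynamic probing that the excerpt goes on to mention are not modeled here.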

[Article contains additional citation context not shown here]

P.G. Ipeirotis, L. Gravano. Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection. In Proc' of VLDB 02, China, 2002


Modeling and Managing Content Changes in Text Databases - Ipeirotis, Ntoulas, Cho..   Self-citation (Ipeirotis Gravano)

No context found.

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB, 2002.


When one Sample is not Enough: Improving Text Database.. - Ipeirotis, Gravano (2004)   Self-citation (Ipeirotis Gravano)

No context found.

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB 2002.


Modeling Query-Based Access to Text Databases - Agichtein, Ipeirotis, Gravano (2003)   (2 citations)  Self-citation (Ipeirotis Gravano)

....task that typically relies on statistical summaries of the database contents. Unfortunately, web accessible text databases do not generally export summaries of their contents. In the past, query based algorithms have been proposed to automatically build such summaries for web accessible databases [4, 8]. The goal is to construct an augmented dictionary of all words that appear in the database, and their frequency. One of the algorithms described in [4] automatically discovers the content of a text database by first querying the database with some seed words, and then extracting new words from ....
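
The query-based sampling idea in the excerpt above (probe with seed words, then draw new query words from the retrieved documents) can be illustrated with a minimal sketch. It is not the algorithm of [4]: the function `sample_summary`, its parameters, and the in-memory `TOY_DB` with its `toy_search` interface are all hypothetical stand-ins for a real search interface.

```python
import random
from collections import Counter


def sample_summary(search, seed_words, max_queries=20, top_k=4, rng=None):
    """Build an approximate content summary by probing a search interface.

    `search(word)` is assumed to return a list of (doc_id, text) pairs.
    Returns (sample document frequencies, number of sampled documents).
    """
    rng = rng or random.Random(0)
    seen_docs = set()
    doc_freq = Counter()          # word -> number of sampled docs containing it
    candidates = list(seed_words)
    used = set()
    queries = 0
    while candidates and queries < max_queries:
        word = candidates.pop(rng.randrange(len(candidates)))
        if word in used:
            continue              # skip words that were already sent as queries
        used.add(word)
        queries += 1
        for doc_id, text in search(word)[:top_k]:
            if doc_id in seen_docs:
                continue
            seen_docs.add(doc_id)
            words = set(text.lower().split())
            doc_freq.update(words)
            # Words from retrieved documents become future query candidates.
            candidates.extend(w for w in words if w not in used)
    return doc_freq, len(seen_docs)


# Hypothetical toy database standing in for a web-accessible text database.
TOY_DB = {
    1: "breast cancer screening",
    2: "cancer treatment trial",
    3: "diabetes treatment diet",
}


def toy_search(word):
    return [(i, t) for i, t in sorted(TOY_DB.items()) if word in t.split()]


summary, num_sampled = sample_summary(toy_search, ["cancer"])
print(num_sampled, summary.most_common(3))
```

The frequencies returned are counts within the sample only; estimating the true database frequencies from them is a separate step that this sketch does not attempt.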

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB 2002).


Distributed Search over the Hidden Web: Hierarchical.. - Ipeirotis, Gravano (2002)   (16 citations)  Self-citation (Ipeirotis Gravano)

..... [Figure 4: Associating content summaries with categories. The figure lists word frequencies (breast 133,680; cancer 101,423; diabetes 11,344), a WebMD entry (NumDocs: 3,346,639), and a Health category entry (NumDBs: 5; NumDocs: 3,747,366).] .... for this statement in [17]. Hence, we can view the (potentially incomplete) content summaries of all databases in a category as complementary, and exploit this view for better database selection. For example, consider the CANCERLIT database and its associated content summary in Figure 4. As we can see, CANCERLIT was ....

....6. 5.1 Experimental Setting: To evaluate the algorithms described in this paper, we use two data sets: one set of Controlled databases that we assembled locally with newsgroup articles, and another set of Web databases, which we could only access through their web search interface. In [17], we also report experiments involving the three databases used in [3] to validate our comparison further. We use a 3-level subset of the Yahoo topic hierarchy consisting of 72 categories, with 54 leaf and 18 internal topics. Controlled Database Set: We gathered 500,000 newsgroup articles ....

[Article contains additional citation context not shown here]

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. Technical report, Columbia University, Computer Science Department, 2002.


A Semisupervised Learning Method to Merge Search Engine Results - Si, Callan (2003)   (1 citation)

No context found.

IPEIROTIS, P. AND GRAVANO, L. 2002. Distributed search over the Hidden-Web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).


Relevant Document Distribution Estimation Method for Resource.. - Si, Callan (2003)   (3 citations)

No context found.

P. Ipeirotis and L. Gravano. (2002). Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB).


Search in Peer-to-Peer File-Sharing System: Like Metasearch.. - Yee, Jia, Nguyen (2005)

No context found.

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proc. VLDB, pages 394--405, 2002.


Downloading Hidden Web Content - Ntoulas, Zerfos, Cho (2004)

No context found.

P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB, 2002.


Frontiers in Web Data Management - Junghoo John Cho

No context found.

P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the Twenty-eighth International Conference on Very Large Databases, 2002.


Query Routing: Finding Ways in the Maze of the Deep Web - Govind Kabra Chengkai (2005)

No context found.

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden-web: Hierarchical sampling and selection. In VLDB, 2002.


Organizing Structured Web Sources by Query Schemas: A.. - Bin He Tao (2004)

No context found.

P. Ipeirotis and L. Gravano. Distributed search over the hidden-web: Hierarchical database sampling and selection. In VLDB Conference, 2002.


Information Discovery, Extraction and Integration for the Hidden Web - Wang

No context found.

P. G. Ipeirotis and L. Gravano. "Distributed search over hidden web: hierarchical database sampling and selection," Proc. 28 VLDB Conf., 2002.


A Frequency-based Approach for Mining Coverage Statistics in .. - Nie, Kambhampati (2004)

No context found.

P. Ipeirotis and L. Gravano. Distributed Search over the Hidden-Web: Hierarchical Database Sampling and Selection. In Proc. of VLDB, 2002.


Information Sharing across Private Databases - Agrawal, Evfimievski, Srikant (2003)   (16 citations)

No context found.

P. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In 28th Int'l Conference on Very Large Databases, Hong Kong, China, August 2002.


A Probabilistic Approach to Metasearching with Adaptive Probing - Zhenyu Liu Chang (2004)   (1 citation)

No context found.

P.G. Ipeirotis, L. Gravano. Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection. In Proc' of VLDB 02, China, 2002


Probe, Cluster, and Discover: Focused Extraction of.. - Caverlee, Liu, Buttler (2004)

No context found.

P. G. Ipeirotis and L. Gravano. Distributed search over the hidden web: Hierarchical database sampling and selection. In VLDB '02.


