34 citations found.
J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97-130, 2001.

This paper is cited in the following contexts:

Extending SDARTS: Extracting Metadata from Web Databases .. - Ipeirotis, Barry.. (2002)

....summaries, we resort to document sampling. A good quality content summary of a collection can be derived from a small, representative document sample from the collection [3]. Earlier research has shown that we can extract such a document sample with a relatively small number of query probes [3, 4, 12]. An approximate content summary can then be built from the documents that best match each query probe at the collection in question. Interestingly, the effectiveness of the best database selection algorithms does not suffer significantly from using approximate content summaries extracted in this ....

....probe at the collection in question. Interestingly, the effectiveness of the best database selection algorithms does not suffer significantly from using approximate content summaries extracted in this way, rather than the complete content summaries for which the algorithms were originally designed [6, 4]. To make the generation of content summaries in SDARTS completely automatic we adopted the method we described in [12] which uses a small number of topically focused query probes to produce highly accurate approximate content summaries. Our probing method relies on short query probes generated ....

Jamie Callan and Margaret Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Query- vs. Crawling-based Classification of Searchable.. - Gravano, Ipeirotis.. (2002)

....web databases. The average number of queries sent to each database was 182, and no documents needed to be retrieved from the databases. Furthermore, the number of words per query ranged between just one and four words. Further details of our algorithm and evaluation are described in [8]. See [2, 3, 10, 6, 7] [Figure 2 (axis label: Specificity; databases: CNN Sports Illustrated, Johns Hopkins AIDS Service, Tom's Hardware Guide, Office of Scientific and Technical Information, Duke University Rare Books; categories: Arts, Computers, Health, Science, Sports): Distribution of documents in the ....]

Jamie Callan and Margaret Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Modeling Query-Based Access to Text Databases - Agichtein, Ipeirotis, Gravano (2003)   (2 citations)

....way to retrieve their documents is via querying. Second, applications often require only a small fraction of a database's contents, so retrieving relevant documents via querying is an attractive choice from an efficiency viewpoint, even for crawlable databases. Various query-based methods (e.g. [4, 1]) have been proposed in the past for retrieving and extracting the information stored in databases via querying. These algorithms share the general approach of starting with a small set of queries, retrieving some documents from the database, extracting some information from them, and potentially ....

....task that typically relies on statistical summaries of the database contents. Unfortunately, web accessible text databases do not generally export summaries of their contents. In the past, query based algorithms have been proposed to automatically build such summaries for web accessible databases [4, 8]. The goal is to construct an augmented dictionary of all words that appear in the database, and their frequency. One of the algorithms described in [4] automatically discovers the content of a text database by first querying the database with some seed words, and then extracting new words from ....

[Article contains additional citation context not shown here]

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Query- vs. Crawling-based Classification - Of Searchable Web (2002)

....[Figure 2: Distribution of documents in the top level categories for five searchable web databases. Axis label: Specificity; databases shown include Johns Hopkins AIDS Service, Tom's Hardware Guide, Office of Scientific and Technical Information, Duke University Rare Books; categories: Arts, Computers, Health, Science, Sports.] ....in [8]. See [2, 3, 10, 6, 7] for other related work relevant to database classification. As we discussed in the introduction, our technique can be also applied to the classification of any database that offers a search interface for its contents, no matter if its contents are hidden or not. 3.2 Crawling based ....

Jamie Callan and Margaret Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Using Sampled Data and Regression to Merge Search Engine Results - Si, Callan (2002)   (2 citations)  Self-citation (Callan)

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97-130, 2001.


Resource Selection and Data Fusion in Multimedia - Distributed Digital Libraries   Self-citation (Callan)

No context found.

J.P. Callan and M.E. Connell. Query-based sampling of text databases. ACM TOIS, 19(2):97--130, 2001.


The Effect of Database Size Distribution on Resource Selection.. - Si, Callan (2003)   Self-citation (Callan)

No context found.

Callan, J., Connell, M.: Query-based sampling of text databases. ACM Transactions on Information Systems (2001) 97-130


Collaborative Research - Digital Government: A Language.. - Croft, Callan (2004)   Self-citation (Callan)

No context found.

Callan, J., and Connell, M., "Query-based sampling of text databases." ACM Transactions on Information Systems, 19(2), pp. 97-130. 2001.


sPLMap: A probabilistic approach to schema matching - Henrik Nottelmann And (2005)

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


The Personalized, Collaborative Digital Library Environment.. - Candela, Straccia (2003)

No context found.

Jamie Callan and Margaret Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


User Awareness in Cyclades: A Collaborative and.. - Avancini, Straccia (2004)

No context found.

Callan, J. and Connell, M. 2001. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97-130.


Downloading Hidden Web Content - Ntoulas, Zerfos, Cho (2004)

No context found.

J. P. Callan and M. E. Connell. Query-based sampling of text databases. Information Systems, 19(2):97--130, 2001.


Automated Discovery of Search Interfaces on the Web - Jared Cope Australian (2003)   (1 citation)

No context found.

Callan, J. & Connell, M. (2001), Query-based sampling of text databases, in `ACM Transactions on Information Systems', Vol. 19, pp. 97--130.


Modeling and Managing Content Changes in Text Databases - Ipeirotis, Ntoulas, Cho..

No context found.

J. P. Callan and M. Connell. Query-based sampling of text databases. ACM TOIS, 19(2), 2001.


From Retrieval Status Values to Probabilities of Relevance.. - Nottelmann, Fuhr

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Discovering and Ranking Data Intensive Web Services: - Source-Biased Approach James (2003)

No context found.

J. P. Callan and M. E. Connell. Query-based sampling of text databases. Information Systems, 19(2):97--130, 2001.


When one Sample is not Enough: Improving Text Database.. - Ipeirotis, Gravano (2004)

No context found.

J. P. Callan and M. Connell. Query-based sampling of text databases. ACM TOIS, 19(2), 2001.


The MIND Architecture for Heterogeneous Multimedia - Federated Digital Libraries

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Decision-Theoretic Resource Selection for - Different Data Types

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.


Distributed Search in the Semantic Web - Umberto Straccia Isti-Cnr

No context found.

J. Callan and M. Connell. Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2):97--130, 2001.
