ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
In WebDB, 2003
Cited by 86 (3 self)
this paper appears in [15], and updated information is available at http://cis.poly.edu/westlab/odissea/
Improving Text Collection Selection with Coverage and Overlap Statistics
2005
Cited by 23 (3 self)
In an environment of distributed text collections, the first step in the information retrieval process is to identify which of all available collections are more relevant to a given query and which should thus be accessed to answer the query. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario which has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach which uses collection-specific coverage and overlap statistics. We describe our experimental results which show that the presented approach displays the desired behavior of retrieving more new results early on in the collection order, and performs consistently and significantly better than CORI, previously considered to be one of the best collection selection systems.
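The goal of "retrieving more new results early on in the collection order" can be illustrated with a small greedy sketch. This is not the COSCO algorithm itself: COSCO works from stored coverage and overlap statistics, whereas this toy version assumes the set of result IDs each collection would return is already known. The function name `order_collections` and the toy data are hypothetical.

```python
def order_collections(results_by_collection):
    """Greedy ordering: repeatedly pick the collection that would
    contribute the most documents not yet seen.

    results_by_collection maps a collection name to the set of document
    IDs it is assumed to return for the query (an assumption of this
    sketch; a real system would estimate this from statistics).
    """
    remaining = dict(results_by_collection)
    seen = set()
    order = []
    while remaining:
        # sorted() makes ties deterministic (alphabetical by name).
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order


toy = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {1, 2},
    "D": {6, 7},
}
print(order_collections(toy))  # ['A', 'D', 'B', 'C']
```

Collection D is ranked ahead of B even though B holds more documents, because most of B's results overlap with A, which has already been selected.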
Bookmark-driven query routing in peer-to-peer web search
Proceedings of the SIGIR Workshop on Peer-to-Peer Information Retrieval, 2004, pp. 46–57
Cited by 19 (12 self)
We consider the problem of collaborative Web search and query routing strategies in a peer-to-peer (P2P) environment. In our architecture every peer has a full-fledged search engine with a (thematically focused) crawler and a local index whose contents may be tailored to the user’s specific interest profile. Peers are autonomous and post meta-information about their bookmarks and index lists to a global directory, which is efficiently implemented in a decentralized manner using Chord-style distributed hash tables. A query posed by one peer is first evaluated locally; if the result is unsatisfactory, the query is forwarded to selected peers. These peers are chosen based on a benefit/cost measure, where benefit reflects the thematic similarity of peers’ interest profiles, derived from bookmarks, and cost captures estimated peer load and response time. The meta-information that is needed for making these query routing decisions is efficiently looked up in the global directory; it can also be cached and proactively disseminated for higher availability and reduced network load.
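A benefit/cost routing decision of the kind described in this abstract might be sketched as follows. The abstract does not give the formulas, so everything concrete here is an assumption for illustration: benefit is taken as the Jaccard similarity of bookmark term sets, cost as `(1 + load) * response_time`, and the names `PeerInfo`, `jaccard`, and `route` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    name: str
    bookmark_terms: frozenset  # terms derived from the peer's bookmarks
    load: float                # estimated load, assumed to lie in [0, 1]
    response_time: float       # estimated response time in seconds, assumed > 0


def jaccard(a, b):
    """Jaccard similarity of two term sets (stand-in for thematic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route(own_terms, candidates, k=2):
    """Return the names of the k candidate peers with the best benefit/cost ratio."""
    def score(peer):
        benefit = jaccard(own_terms, peer.bookmark_terms)
        cost = (1.0 + peer.load) * peer.response_time  # assumed cost model
        return benefit / cost

    ranked = sorted(candidates, key=score, reverse=True)
    return [peer.name for peer in ranked[:k]]


me = frozenset({"p2p", "search", "index"})
peers = [
    PeerInfo("p1", frozenset({"p2p", "search", "dht"}), load=0.5, response_time=0.2),
    PeerInfo("p2", frozenset({"cooking", "recipes"}), load=0.0, response_time=0.1),
    PeerInfo("p3", frozenset({"p2p", "search", "index"}), load=0.9, response_time=1.0),
]
print(route(me, peers))  # ['p1', 'p3']
```

In this toy run the thematically identical peer p3 ranks below p1 because its assumed load and response time make it much more expensive to contact.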
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
2005
Cited by 19 (1 self)
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.
Unified Utility Maximization Framework for Resource Selection
In Proc. ACM CIKM Conf., 2004
Cited by 18 (6 self)
This paper presents a unified utility framework for resource selection in distributed text information retrieval. This new framework shows an efficient and effective way to infer the probabilities of relevance of all the documents across the text databases. With the estimated relevance information, resource selection can be made by explicitly optimizing the goals of different applications. Specifically, when used for database recommendation, the selection is optimized for the goal of high recall (include as many relevant documents as possible in the selected databases); when used for distributed document retrieval, the selection targets the high-precision goal (high precision in the final merged list of documents). This new model provides a more solid framework for distributed information retrieval. Empirical studies show that it is at least as effective as other state-of-the-art algorithms.
Beauty and the beast: The theory and practice of information integration
In ICDT, 2007
Cited by 16 (1 self)
Information integration is becoming a critical problem for businesses and individuals alike. Data volumes are sky-rocketing, and new sources and types of information are proliferating. This paper briefly reviews some of the key research accomplishments in information integration (theory and systems), then describes the current state-of-the-art in commercial practice, and the challenges (still) faced by CIOs and application developers. One critical challenge is choosing the right combination of tools and technologies to do the integration. Although each has been studied separately, we lack a unified (and certainly, a unifying) understanding of these various approaches to integration. Experience with a variety of integration projects suggests that we need a broader framework, perhaps even a theory, which explicitly takes into account requirements on the result of the integration, and considers the entire end-to-end integration process.
Modeling Search Engine Effectiveness for Federated Search
Cited by 16 (2 self)
Federated search links multiple search engines into a single, virtual search system. Most prior research on federated search focused on selecting search engines that have the most relevant contents, but ignored the retrieval effectiveness of individual search engines. This omission can cause serious problems when federating search engines of different qualities.
Discovering and exploiting keyword and attribute-value co-occurrences to improve p2p routing indices
In CIKM, 2006
Cited by 14 (6 self)
Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the
Effective keyword-based selection of relational databases
In Proceedings of SIGMOD, 2007
Cited by 14 (5 self)
over World Wide Web has fueled the demand for incorporating keyword-based search over structured databases. However, most of the current research work focuses on keyword-based searching over a single structured data source. With the growing interest in distributed databases and service-oriented architecture over the Internet, it is important to extend such a capability over multiple structured data sources. One of the most important problems for enabling such a query facility is to be able to select the most useful data sources relevant to the keyword query. Traditional database summary techniques used for selecting unstructured data sources developed in IR literature are inadequate for our problem, as they do not capture the structure of the data sources. In this paper, we study the database selection problem for relational data sources, and propose a method that effectively summarizes the relationships between keywords in a relational database based on its structure. We develop effective ranking methods based on the keyword relationship summaries in order to select the most useful databases for a given keyword query. We have implemented our system on PlanetLab. In that environment we use extensive experiments with real datasets to demonstrate the effectiveness of our proposed summarization method.

