Comparing the performance of collection selection algorithms
- ACM Transactions on Information Systems, 2003
Cited by 15 (0 self)
The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multicollection environment. Multicollection searching is cast in three parts: collection selection (also referred to as database selection), query processing and results merging. In this work, we focus our attention on the evaluation of the first step, collection selection. In this article, we present a detailed discussion of the methodology that we used to evaluate and compare collection selection approaches, covering both test environments and evaluation measures. We compare the CORI, CVV and gGlOSS collection selection approaches using six test environments utilizing three document testbeds. We note similar trends in performance among the collection selection approaches, but the CORI approach consistently outperforms the other approaches, suggesting that effective collection selection can be achieved using limited information about each collection. The contributions of this work are both the assembled evaluation methodology as well as the application of that methodology to compare collection selection approaches in a standardized environment.
Generalizing GlOSS to vector-space databases and broker hierarchies
- VLDB'95, Proceedings of the 21st International Conference on Very Large Data Bases, 1995
Cited by 14 (0 self)
As large numbers of text databases have become available on the Internet, it is harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [1], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations.
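The idea of a broker that ranks databases from summary statistics alone can be sketched as follows. This is a minimal illustration, not the estimator from the paper: the abstract does not give a formula, so the scoring rule (query weight times the summed term weight in each database) and all function names here are assumptions made for the example.

```python
from collections import defaultdict


def build_summary(docs):
    """Summarize one database as term -> (document frequency, summed weight).

    `docs` is a list of dicts mapping term -> weight (hypothetical input format).
    """
    summary = defaultdict(lambda: [0, 0.0])
    for doc in docs:
        for term, weight in doc.items():
            summary[term][0] += 1
            summary[term][1] += weight
    return {term: (freq, total) for term, (freq, total) in summary.items()}


def estimate_goodness(query, summary):
    """Assumed scoring rule: sum of query weight x summed term weight."""
    return sum(qw * summary[term][1] for term, qw in query.items() if term in summary)


def rank_databases(query, summaries):
    """Return database names ordered from most to least promising."""
    scores = {name: estimate_goodness(query, s) for name, s in summaries.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


# Two toy databases; the broker only ever sees their summaries.
summaries = {
    "med": build_summary([{"heart": 0.5, "drug": 0.25}, {"heart": 0.25}]),
    "law": build_summary([{"court": 0.5}, {"drug": 0.125}]),
}
ranking = rank_databases({"heart": 1.0, "drug": 1.0}, summaries)
```

The point of the sketch is that the broker never touches the documents at query time, which is why a small summary can be replicated widely.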
Ontological Approach for Information Discovery in Internet Databases
Cited by 12 (4 self)
The Internet has solved the age-old problem of network connectivity, thus enabling potential access to, and data sharing among, large numbers of databases. However, enabling users to discover useful information requires an adequate metadata infrastructure that must scale with the diversity and dynamism of both users' interests and Internet-accessible databases. In this paper, we present a model that partitions the information space into distributed, highly specialized domain ontologies. We also introduce inter-ontology relationships to cater for user-based interests across ontologies defined over Internet databases. We also describe an architecture that implements these two fundamental constructs over Internet databases. The aim of the proposed model and architecture is to eventually facilitate data discovery and sharing for Internet databases.
Query-driven document partitioning and collection selection
- INFOSCALE 2006: Proceedings of the First International Conference on Scalable Information Systems, 2006
Cited by 10 (3 self)
We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model. Each document is represented as a list recording the queries for which the document itself is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results, and are used for collection selection. We show that this document partition strategy greatly boosts the performance of standard collection selection algorithms, including CORI, w.r.t. a round-robin assignment. Secondly, we show that by performing collection selection by matching the query to the existing query clusters and successively choosing only one server, we reach an average precision-at-5 up to 1.74 and we consistently improve on CORI's precision by a factor between 11% and 15%. As a side result, we show a way to select rarely asked-for documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the first 100 top-ranked results of any query.
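The query-vectors representation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (a mapping from query id to its ranked result list) and the function names are hypothetical, and the co-clustering step is omitted.

```python
def build_query_vectors(query_results, top_k=100):
    """Map each document to the (query_id, rank) pairs for which it was returned.

    `query_results` maps query_id -> list of doc ids in rank order
    (hypothetical format standing in for a processed query log).
    """
    vectors = {}
    for query_id, ranked_docs in query_results.items():
        for rank, doc_id in enumerate(ranked_docs[:top_k], start=1):
            vectors.setdefault(doc_id, []).append((query_id, rank))
    return vectors


def never_returned(all_doc_ids, vectors):
    """Documents with an empty query vector: never in the top_k of any logged query."""
    return sorted(doc_id for doc_id in all_doc_ids if doc_id not in vectors)


# Toy query log with two queries over a four-document collection.
query_results = {"q1": ["d1", "d2"], "q2": ["d2", "d3"]}
vectors = build_query_vectors(query_results)
silent = never_returned(["d1", "d2", "d3", "d4"], vectors)
```

Documents that end up in `silent` correspond to the rarely asked-for documents that the abstract proposes to separate from the main index.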
Broker's Lounge - an Environment for Multi-Dimensional User-Adaptive Knowledge Management
- HICSS-34: 34th Hawaii International Conference on System Sciences, Maui, Hawaii, 2001
Cited by 9 (4 self)
The Broker's Lounge is a shell for knowledge structuring and dynamic user interface generation which supports the personalisation of both information structures and user interfaces, emphasising options for context change and multi-dimensional constraint propagation. Two major applications have been developed so far: ELFI, an advisor that manages knowledge about research programs in Germany such that proposers can identify appropriate funding schemes; and MarketMonitor, a tool that helps companies monitor the web pages of competitors, suppliers, and customers for early detection of changes in the market situation.
Supporting Information Brokers with an Organisational Memory
- Würzburg University, 1999
Cited by 2 (1 self)
Information Brokers perform knowledge-intensive tasks that require support in the form of an Organisational Memory (OM). In this paper we define the information broking process and analyse the roles involved and their corresponding tasks. We present an architecture for information broking environments with OM and describe bizzyB, a broking application in use at the Milan Chamber of Commerce that instantiates this architecture.
1. Introduction
The role of electronic markets and market intermediaries is emerging (cf. [2]) and changing rapidly. Workloads increase as information as a tradable good becomes more important, and selling the right information selected from an increasing number of (online) information sources is a complex task. Rapidly changing information sources available over networks raise the need for new techniques to support brokers in their daily work. These techniques have to (1) unburden qualified brokers from routine tasks to save their time for intellectually challeng...
Generalizing GlOSS To Vector-Space Databases and . . .
- 1995
Cited by 1 (0 self)
As large numbers of text databases have become available on the Internet, it is harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [1], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations.
REFEREE
- 2004
For hundreds of years mankind has organized information in order to make it more accessible to others. The latest medium born to provide information globally is the Internet. With the Web, in particular, the name of the Internet has spread all over the world. Due to its impressive size and its high dynamism, when we need to search for information on the Web, we usually begin by querying a Web Search Engine. A Web Search Engine maintains and catalogs the content of Web pages in order to make them easier to find and browse. Even though the various Search Engines are similar, each one differs from the others in its methods for scouring, storing, and retrieving information from

