Optimized Seamless Integration of Biomolecular Data
- IEEE Symposium on Bio-Informatics and Biomedical Engineering (BIBE’2001), Washington DC, 2001
Cited by 13 (2 self)
Today, scientific data is inevitably digitized, stored in a variety of heterogeneous formats, and accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, and data that is locally materialized in warehouses or generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent metadata about source contents and access costs; providing decision support to select sources and capabilities using cost-based and semantic knowledge; and generating low-cost query evaluation plans.
Biological Data Integration: Wrapping Data and Tools
- 2002
Cited by 13 (0 self)
Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced tools for data access, analysis, and visualization. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, and data generated by software. We present an approach to wrapping web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: first, it sends a query to the source to retrieve data; second, it builds the expected output with respect to the virtual structure. To perform these two tasks, our wrappers are composed of, respectively, a retrieval component based on an intermediate object view mechanism called search views, which maps the source capabilities to attributes, and an eXtensible Markup Language (XML) engine. The originality of the approach consists of: 1) a generic view mechanism to seamlessly access data sources with limited capabilities and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of the multidatabase system supporting queries via uniform object protocol model (OPM) interfaces.
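The two wrapper tasks described in this abstract (retrieve data from the source, then build output in the expected structure) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the in-memory `FAKE_SOURCE`, the `SEARCH_VIEW` attribute set, the record fields, and the XML element names are hypothetical and are not taken from the paper or from the OPM system.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a remote source (flat file, web form, ...).
# The records are made up for illustration only.
FAKE_SOURCE = [
    {"accession": "X0001", "organism": "human", "name": "example protein A"},
    {"accession": "X0002", "organism": "mouse", "name": "example protein B"},
    {"accession": "X0003", "organism": "human", "name": "example protein C"},
]

# Hypothetical "search view": the attributes this source can be queried on.
# A source with limited capabilities exposes only some of its fields here.
SEARCH_VIEW = {"accession", "organism"}


def retrieve(conditions):
    """Task 1: send a query to the source, honoring its limited capabilities."""
    unsupported = set(conditions) - SEARCH_VIEW
    if unsupported:
        raise ValueError(f"source cannot search on: {sorted(unsupported)}")
    return [
        record
        for record in FAKE_SOURCE
        if all(record[attr] == value for attr, value in conditions.items())
    ]


def build_output(records):
    """Task 2: build the output as XML in one uniform structure."""
    root = ET.Element("results")
    for record in records:
        entry = ET.SubElement(root, "entry")
        for field, value in record.items():
            ET.SubElement(entry, field).text = value
    return ET.tostring(root, encoding="unicode")


def wrap(conditions):
    """A wrapper is the composition of the two tasks."""
    return build_output(retrieve(conditions))


if __name__ == "__main__":
    print(wrap({"organism": "human"}))
```

Keeping the set of searchable attributes separate from the output builder is one simple way to model a source whose query interface is narrower than the data it returns; the paper's own search-view mechanism may differ substantially.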
Embedded Databases for Embedded Real-Time Systems: A Component-Based Approach
- 2002
Cited by 11 (5 self)
In recent years the deployment of embedded real-time systems has increased dramatically. At the same time, the amount of data that needs to be managed by embedded real-time systems is increasing, thus requiring efficient and structured data management. Hence, database functionality is needed to provide support for storage and manipulation of data in embedded real-time systems. However, a database that can be used in an embedded real-time system must fulfill requirements from both an embedded system and a real-time system, i.e., the database needs to be an embedded and a real-time database at the same time. The real-time database must handle transactions with temporal constraints, as well as maintain consistency as in a conventional database. The main objectives for an embedded database are low memory usage, i.e., a small memory footprint, portability to different operating system platforms, efficient resource management, e.g., minimization of CPU usage, the ability to run for long periods of time without administration, and the ability to be tailored for different applications. In addition, development costs must be kept as low as possible, with short time-to-market and reliable software. In this report we survey embedded and real-time database platforms developed in industrial and research environments. This survey represents the state of the art in the area of embedded databases for embedded real-time systems. The survey enables us to identify a gap between embedded systems, real-time systems and database systems, i.e., embedded databases suitable for real-time systems are sparse. Furthermore, it is observed that there is a need for a more generic embedded database that can be tailored, such that the application designer can get an optimized database for a specif...
Complex Query Formulation Over Diverse Information Sources in TAMBIS
- 1999
Cited by 3 (0 self)
Biologists increasingly need to ask complex questions over the large number of data and analysis tools that are available on the Internet. To do this, the individual resources need to be made to work together. The knowledge needed to accomplish this, for example about the locations of the sources and their capabilities, places barriers between biologists and the questions they would like to ask. The TAMBIS project (Transparent Access to Multiple Bioinformatics Information Sources) has sought to remove some of these barriers, thereby making the process of asking questions against multiple sources more straightforward. Central to the TAMBIS system is an ontology of bioinformatics and biological terms. Users express retrieval requests in terms of the concepts and relationships described in the ontology, rather than by making direct reference to individual sources. This allows TAMBIS to be used to formulate rich, declarative queries over multiple sources. The ontology is constructed in a manner that ensures only biologically meaningful queries can be posed. Users' queries are constructed using an interactive ontology browsing and query construction tool, and are rewritten by a query planner for evaluation using a wrapper layer. This paper provides an overview of the TAMBIS approach to source integration, focusing on the way the ontology is used to support query formulation and refinement.
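The flow described in this abstract (a query phrased in ontology terms, checked against the ontology, then rewritten into source-level calls) can be sketched roughly as follows. This is only a toy illustration under stated assumptions: the `ONTOLOGY` contents, the `SOURCE_MAP` table, and the wrapper call names are hypothetical and do not reproduce the actual TAMBIS ontology, planner, or wrapper layer.

```python
# Hypothetical toy ontology: concept -> {relationship: target concept}.
ONTOLOGY = {
    "Protein": {"has_function": "Function", "found_in": "Organism"},
    "Function": {},
    "Organism": {},
}

# Hypothetical mapping from (concept, relationship) to a wrapper call name.
# The user never sees these names; only the planner does.
SOURCE_MAP = {
    ("Protein", "has_function"): "source_a.lookup_by_function",
    ("Protein", "found_in"): "source_b.lookup_by_organism",
}


def formulate(concept, restrictions):
    """Build a declarative query, rejecting anything the ontology does not allow."""
    if concept not in ONTOLOGY:
        raise ValueError(f"unknown concept: {concept}")
    for relationship, _value in restrictions:
        if relationship not in ONTOLOGY[concept]:
            raise ValueError(f"{concept} has no relationship {relationship}")
    return {"concept": concept, "restrictions": list(restrictions)}


def plan(query):
    """Rewrite the ontology-level query into a list of (wrapper call, argument)."""
    return [
        (SOURCE_MAP[(query["concept"], relationship)], value)
        for relationship, value in query["restrictions"]
    ]


if __name__ == "__main__":
    query = formulate(
        "Protein", [("has_function", "binding"), ("found_in", "human")]
    )
    for call, argument in plan(query):
        print(call, argument)
```

Validating against the ontology before planning mirrors the idea that only meaningful queries can be posed, while the source names stay hidden behind the mapping table.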
COMMUNICATION COST MODELING FOR FEDERATED DATABASE SYSTEMS
Federated database systems are a useful tool for businesses and researchers around the world. These systems allow data from multiple remote data sources to be logically combined into one unified local data source. Using such a system, queries that would traditionally require query fragments to be submitted to multiple sites can be performed by submitting one query to a central site. This central site can make use of data stored at the different remote sources as though the central site were simply an application requesting data. Currently, the performance of global queries in most federated database systems is much worse than the performance of local queries. These so-called global queries must be optimized, but many additional factors combine to make global query optimization complicated. Beyond the problems of local query optimization, additional costs, including the cost of communication and the cost of remote site optimization, must be factored into cost models. This thesis presents benchmarking experiments performed at iAnywhere Solutions, Inc. during a cooperative work term and at the University of Waterloo. It discusses the results of this benchmarking and presents a model for estimating communication cost. The results of testing the model are also provided.
Links and Paths
- In Proc. International Workshop on Data Integration in the Life Sciences (DILS 2004), 2004
An abundance of biological data sources contain data on classes of scientific entities, such as genes and sequences. Logical relationships between scientific objects are implemented as URLs and foreign IDs. Query processing typically involves traversing links and paths (concatenations of links) through these sources. We model the data objects in these sources and the links between objects as an object graph.
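A minimal Python sketch of the object graph idea follows: objects are nodes, links are directed edges, and a path is a concatenation of links. The class name `ObjectGraph`, the `(source, id)` node encoding, the `max_links` bound, and the sample objects are all assumptions made for illustration; the paper's own model and algorithms are not reproduced here.

```python
from collections import defaultdict


class ObjectGraph:
    """Directed graph: nodes are data objects, edges are links between them."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_link(self, src, dst):
        """Record a link (e.g. a URL or foreign ID) from object src to object dst."""
        self._links[src].add(dst)

    def paths(self, start, goal, max_links=4):
        """Enumerate simple paths from start to goal using at most max_links links."""
        found = []
        stack = [[start]]
        while stack:
            path = stack.pop()
            node = path[-1]
            if node == goal and len(path) > 1:
                found.append(path)
                continue
            if len(path) - 1 >= max_links:
                continue  # path already uses the maximum number of links
            for nxt in self._links.get(node, ()):
                if nxt not in path:  # keep paths simple (no repeated objects)
                    stack.append(path + [nxt])
        return sorted(found)


if __name__ == "__main__":
    # Hypothetical objects, written as (source name, local identifier).
    gene = ("genes", "g1")
    seq = ("sequences", "s1")
    prot = ("proteins", "p1")

    graph = ObjectGraph()
    graph.add_link(gene, prot)  # direct link
    graph.add_link(gene, seq)   # two-link path: gene -> sequence -> protein
    graph.add_link(seq, prot)

    for path in graph.paths(gene, prot):
        print(" -> ".join(f"{source}:{ident}" for source, ident in path))
```

Bounding the number of links keeps enumeration finite on cyclic graphs; a real system would likely also weigh paths by cost or result quality, which this sketch does not attempt.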

