Results 1 - 10
of
226
Models and issues in data stream systems
- IN PODS
, 2002
"... In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work releva ..."
Abstract
-
Cited by 786 (19 self)
- Add to MetaCart
(Show Context)
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.
Tinydb: An acquisitional query processing system for sensor networks
- ACM Trans. Database Syst
, 2005
"... We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acq ..."
Abstract
-
Cited by 626 (8 self)
- Add to MetaCart
We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination, and execution. We evaluate these issues in the context of TinyDB, a distributed query processor for smart sensor devices, and show how acquisitional techniques can provide significant reductions in power consumption on our sensor devices. Categories and Subject Descriptors: H.2.3 [Database Management]: Languages—Query languages; H.2.4 [Database Management]: Systems—Distributed databases; query processing
The design of an acquisitional query processor for sensor networks
- In SIGMOD
, 2003
"... We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acq ..."
Abstract
-
Cited by 523 (25 self)
- Add to MetaCart
(Show Context)
We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination, and execution. We evaluate these issues in the context of TinyDB, a distributed query processor for smart sensor devices, and show how acquisitional techniques can provide significant reductions in power consumption on our sensor devices. 1.
Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach
- In SIGMOD Conference
, 2001
"... A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that empl ..."
Abstract
-
Cited by 424 (50 self)
- Add to MetaCart
(Show Context)
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD nds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
Aurora: a new model and architecture for data stream management
, 2003
"... This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual in ..."
Abstract
-
Cited by 401 (31 self)
- Add to MetaCart
(Show Context)
This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators.
Fjording the Stream: An Architecture for Queries over Streaming Sensor Data
, 2002
"... If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have inter ..."
Abstract
-
Cited by 281 (8 self)
- Add to MetaCart
(Show Context)
If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have intermittent connectivity, are limited by severe power constraints, and typically sample periodically and push immediately, keeping no record of historical information. These limitations make traditional database systems inappropriate for queries over sensors. We present the Fjords architecture for managing multiple queries over many sensors, and show how it can be used to limit sensor resource demands while maintaining high query throughput. We evaluate our architecture using traces from a network of traffic sensors deployed on Interstate 80 near Berkeley and present performance results that show how query throughput, communication costs, and power consumption are necessarily coupled in sensor environments.
Continuously Adaptive Continuous Queries over Streams
- In SIGMOD
, 2002
"... We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because o ..."
Abstract
-
Cited by 270 (8 self)
- Add to MetaCart
(Show Context)
We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive crossquery sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators -- both selections and join state -- at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates. We include measurements of the performance of our core system, along with a comparison to existing continuous query approaches.
Evaluating Window Joins over Unbounded Streams
- IN ICDE
, 2003
"... We investigate algorithms for evaluating sliding window joins over pairs of unbounded streams. We introduce a unittime -basis cost model to analyze the expected performance of these algorithms. Using this cost model, we propose strategies for maximizing the efficiency of processing joins in thre ..."
Abstract
-
Cited by 161 (3 self)
- Add to MetaCart
(Show Context)
We investigate algorithms for evaluating sliding window joins over pairs of unbounded streams. We introduce a unittime -basis cost model to analyze the expected performance of these algorithms. Using this cost model, we propose strategies for maximizing the efficiency of processing joins in three scenarios. First, we consider the case where one stream is much faster than the other. We show that asymmetric combinations of join algorithms, (e.g., hash join on one input, nested-loops join on the other) can outperform symmetric join algorithm implementations. Second, we investigate the case where system resources are insufficient to keep up with the input streams. We show that we can maximize the number of join result tuples produced in this case by properly allocating computing resources across the two input streams. Finally, we investigate strategies for maximizing the number of result tuples produced when memory is limited, and show that proper memory allocation across the two input streams can result in significantly lower resource usage and/or more result tuples produced.
Semantic integration research in the database community: A brief survey
- AI Magazine
, 2005
"... Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration, and disc ..."
Abstract
-
Cited by 145 (4 self)
- Add to MetaCart
(Show Context)
Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration, and discuss the difficulties underlying the integration process. We then describe recent progress and identify open research issues. We will focus in particular on schema matching, a topic that has received much attention in the database community, but will also discuss data matching (e.g., tuple deduplication), and open issues beyond the match discovery context (e.g., reasoning with matches, match verification and repair, and reconciling inconsistent data values). For previous surveys of database research on semantic integration, see (Rahm & Bernstein 2001;
Rate-Based Query Optimization for Streaming Information Sources
- In Proc. of the 2002 ACM SIGMOD Intl. Conf. on Management of Data
, 2002
"... Relational query optimizers have traditionally relied upon table cardinalities when estimating the cost of the query plans they consider. While this approach has been and continues to be successful, the advent of the Internet and the need to execute queries over streaming sources requires a differen ..."
Abstract
-
Cited by 139 (2 self)
- Add to MetaCart
(Show Context)
Relational query optimizers have traditionally relied upon table cardinalities when estimating the cost of the query plans they consider. While this approach has been and continues to be successful, the advent of the Internet and the need to execute queries over streaming sources requires a different approach, since for streaming inputs the cardinality may not be known or may not even be knowable (as is the case for an unbounded stream.) In view of this, we propose shifting from a cardinality-based approach to a rate-based approach, and give an optimization framework that aims at maximizing the output rate of query evaluation plans. This approach can be applied to cases where the cardinality-based approach cannot be used. It may also be useful for cases where cardinalities are known, because by focusing on rates we are able not only to optimize the time at which the last result tuple appears, but also to optimize for the number of answers computed at any specified time after the query evaluation commences. We present a preliminary validation of our rate-based optimization framework on a prototype XML query engine, though it is generic enough to be used in other database contexts. The results show that rate-based optimization is feasible and can indeed yield correct decisions.