The CQL continuous query language: semantic foundations and query execution (2006)
by A. Arasu, S. Babu, J. Widom
Venue: The VLDB Journal

Results 1 - 10 of 354

The End of an Architectural Era (It's Time for a Complete Rewrite)

by Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos - Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), 2007
"... In previous papers [SC05, SBC+07], some of us predicted the end of “one size fits all ” as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in ..."
Abstract - Cited by 200 (23 self) - Add to MetaCart
In previous papers [SC05, SBC+07], some of us predicted the end of “one size fits all” as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C. We conclude that the current RDBMS code lines, while attempting to be a “one size fits all” solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs.

Citation Context

...L, a generalization of SQL that allows a programmer to mix stored tables and streams in the FROM clause of a SQL statement. This work has evolved from the pioneering work of the Stanford Stream group [ABW06] and is being actively discussed for standardization. Of course, StreamSQL supports relational schemas for both tables and streams. However, commercial feeds, such as Reuters, Infodyne, etc., have all...
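
Since every entry on this page cites CQL's semantics, a concrete picture helps. CQL factors a continuous query into three operator classes: stream-to-relation (windows), relation-to-relation (ordinary relational operators), and relation-to-stream (Istream, Dstream, Rstream). The sketch below models that pipeline in Python under our own assumptions: the tuple layout, function names, and driver are illustrative, not the paper's formalism.

    # Toy model of CQL's three operator classes. A "relation" here is the
    # set of tuples visible at each timestamp; a "stream" is a sequence of
    # (timestamp, tuple) pairs. Illustrative only.
    from collections import deque

    def row_window(stream, n):
        # Stream-to-relation: a ROWS n sliding window; at each timestamp the
        # relation holds the last n stream tuples (a multiset, approximated
        # by a set here).
        window = deque(maxlen=n)
        for ts, tup in stream:
            window.append(tup)
            yield ts, set(window)

    def join_with_table(instances, table, key):
        # Relation-to-relation: join each window instance with a stored table.
        for ts, rel in instances:
            yield ts, {(t, table[t[key]]) for t in rel if t[key] in table}

    def istream(instances):
        # Relation-to-stream: Istream emits a tuple when it enters the relation.
        previous = set()
        for ts, rel in instances:
            for tup in sorted(rel - previous):
                yield ts, tup
            previous = rel

    # A trade stream joined against a stored symbol table (both in FROM).
    trades = [(1, ("IBM", 100)), (2, ("MSFT", 50)), (3, ("IBM", 200))]
    sectors = {"IBM": "tech", "MSFT": "tech"}
    for ts, out in istream(join_with_table(row_window(iter(trades), 2), sectors, 0)):
        print(ts, out)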

Frenetic: A Network Programming Language

by Nate Foster, Rob Harrison, Michael J. Freedman, Jennifer Rexford, Alec Story, David Walker
"... Modern networks provide a variety of interrelated services including routing, traffic monitoring, load balancing, and access control. Unfortunately, the languages used to program today’s networks lack modern features—they are usually defined at the low level of abstraction supplied by the underlying ..."
Abstract - Cited by 128 (23 self) - Add to MetaCart
Modern networks provide a variety of interrelated services including routing, traffic monitoring, load balancing, and access control. Unfortunately, the languages used to program today’s networks lack modern features—they are usually defined at the low level of abstraction supplied by the underlying hardware and they fail to provide even rudimentary support for modular programming. As a result, network programs tend to be complicated, error-prone, and difficult to maintain. This paper presents Frenetic, a high-level language for programming distributed collections of network switches. Frenetic provides a declarative query language for classifying and aggregating network traffic as well as a functional reactive combinator library for describing high-level packet-forwarding policies. Unlike prior work in this domain, these constructs are—by design—fully compositional, which facilitates modular reasoning and enables code reuse. This important property is enabled by Frenetic’s novel runtime system which manages all of the details related to installing, uninstalling, and querying low-level packet-processing rules on physical switches. Overall, this paper makes three main contributions: (1) We analyze the state of the art in languages for programming networks and identify the key limitations; (2) We present a language design that addresses these limitations, using a series of examples to motivate and validate our choices; (3) We describe an implementation of the language and evaluate its performance on several benchmarks.
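
The compositionality claim is the crux, and a small sketch makes it concrete: if packet predicates are ordinary values, a monitoring query and a forwarding policy can be written separately and combined. This is a hedged illustration in the spirit of the design; the names and API are ours, not Frenetic's.

    # Packet predicates as first-class, composable values (illustrative API).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Packet:
        src: str
        dst: str
        port: int

    def match(**fields):
        # A predicate selecting packets whose named fields equal the given values.
        return lambda pkt: all(getattr(pkt, f) == v for f, v in fields.items())

    def both(p, q):
        # Conjunction: predicates compose without knowing each other's internals.
        return lambda pkt: p(pkt) and q(pkt)

    is_web = match(port=80)          # one concern: classify web traffic
    from_h1 = match(src="h1")        # another concern: traffic from host h1
    monitor = both(is_web, from_h1)  # composed monitoring rule

    pkts = [Packet("h1", "h2", 80), Packet("h3", "h2", 80), Packet("h1", "h2", 22)]
    print(sum(1 for p in pkts if monitor(p)))  # counts h1's web packets -> 1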

Citation Context

...ge and run-time system, which uses the capabilities of switches to avoid sending packets to the controller. At a high level, Frenetic is also similar to streaming languages such as StreamIt [38], CQL [6], Esterel [7], Brooklet [37], etc. The FRP operators used in Frenetic are more to our taste, but one could easily build a system that retained the main elements of our design (e.g., the query language...

Adaptive cleaning for RFID data streams

by Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin, 2006
"... ABSTRACT To compensate for the inherent unreliability of RFID data streams, most RFID middleware systems employ a "smoothing filter", a sliding-window aggregate that interpolates for lost readings. In this paper, we propose SMURF, the first declarative, adaptive smoothing filter for RFID ..."
Abstract - Cited by 101 (0 self) - Add to MetaCart
To compensate for the inherent unreliability of RFID data streams, most RFID middleware systems employ a "smoothing filter", a sliding-window aggregate that interpolates for lost readings. In this paper, we propose SMURF, the first declarative, adaptive smoothing filter for RFID data cleaning. SMURF models the unreliability of RFID readings by viewing RFID streams as a statistical sample of tags in the physical world, and exploits techniques grounded in sampling theory to drive its cleaning processes. Through the use of tools such as binomial sampling and π-estimators, SMURF continuously adapts the smoothing window size in a principled manner to provide accurate RFID data to applications.
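
The sampling view can be made concrete. If a tag present in a reader's range is detected in each epoch independently with probability p, a window of w epochs misses it with probability (1-p)^w, so choosing w >= ln(1/δ)/p bounds the miss probability by δ. The toy below applies that rule with a naively estimated read rate; the estimation loop and names are ours, not SMURF's actual estimators.

    # Toy adaptive smoothing window driven by the binomial-sampling bound.
    import math

    def required_window(p_hat, delta=0.05):
        # Smallest w with (1 - p_hat)^w <= delta, via (1-p)^w <= exp(-p*w).
        return max(1, math.ceil(math.log(1.0 / delta) / p_hat))

    def smooth(readings, delta=0.05):
        # readings: one 0/1 value per epoch for a single tag.
        # Yields (epoch, present?) using a window sized from the read rate.
        for epoch in range(1, len(readings) + 1):
            seen = readings[:epoch]
            p_hat = max(sum(seen) / len(seen), 1e-3)  # crude read-rate estimate
            w = required_window(p_hat, delta)
            yield epoch, any(seen[-w:])               # interpolate lost readings

    print(list(smooth([1, 0, 1, 0, 0, 1, 0, 0, 0])))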

Fault-tolerance in the Borealis distributed stream processing system

by Magdalena Balazinska, Hari Balakrishnan, Samuel R. Madden, Michael Stonebraker - In Proc. of the 2005 ACM SIGMOD International Conference on Management of Data, 2005
"... Over the past few years, Stream Processing Engines (SPEs) have emerged as a new class of software systems, enabling low latency processing of streams of data arriving at high rates. As SPEs mature and get used in monitoring applications that must continuously run (e.g., in network security monitorin ..."
Abstract - Cited by 97 (9 self) - Add to MetaCart
Over the past few years, Stream Processing Engines (SPEs) have emerged as a new class of software systems, enabling low-latency processing of streams of data arriving at high rates. As SPEs mature and get used in monitoring applications that must continuously run (e.g., in network security monitoring), a significant challenge arises: SPEs must be able to handle various software and hardware faults that occur, masking them to provide high availability (HA). In this paper, we develop, implement, and evaluate DPC (Delay, Process, and Correct), a protocol to handle crash failures of processing nodes and network failures in a distributed SPE. Like previous approaches to HA, DPC uses replication and masks many types of node and network failures. In the presence of network partitions, the designer of any replication system faces a choice between providing availability or data consistency across the replicas. In DPC, this choice is made explicit: the user specifies an availability bound (no result should be delayed by more than a specified delay threshold even under failure if the corresponding input is available), and DPC attempts to minimize the resulting inconsistency between replicas (not all of which might have seen the input data) while meeting the given delay threshold. Although conceptually simple, the DPC protocol tolerates the occurrence of multiple simultaneous failures as well as any ...
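
The availability/consistency knob the abstract describes reduces to a simple per-input decision, sketched below under our own simplifications (real DPC coordinates replicas and issues versioned corrections; none of that is modeled here).

    # Toy decision rule: wait for a missing input only while the user's
    # availability bound permits; afterwards process tentatively and plan
    # to correct once the input arrives. Illustrative, not the protocol.
    def decide(ticks_waited, input_available, delay_threshold):
        if input_available:
            return "process-stable"
        if ticks_waited < delay_threshold:
            return "delay"
        return "process-tentative"  # inconsistent now, corrected later

    for t in range(5):
        print(t, decide(t, input_available=False, delay_threshold=3))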

Citation Context

...anagers [1, 30] or continuous query processors [14]) are a class of software systems that handle the data processing requirements mentioned above. Much work has been done on data models and operators [1, 6, 16, 28, 41], efficient processing [7, 8, 12, 30], and resource management [13, 17, 30, 35, 38] for SPEs. Stream processing applications are inherently distributed, both because input streams often arrive from ge...

Towards Expressive Publish/Subscribe Systems

by Alan Demers, Johannes Gehrke, Mingsheng Hong, Mirek Riedewald, Walker White - In Proc. EDBT, 2006
"... Abstract. Traditional content based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring RSS streams, stock tickers, or management of RFID data streams require the ability to handle stateful s ..."
Abstract - Cited by 96 (10 self) - Add to MetaCart
Traditional content-based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring RSS streams, stock tickers, or management of RFID data streams require the ability to handle stateful subscriptions. In this paper, we introduce Cayuga, a stateful pub/sub system based on nondeterministic finite state automata (NFA). Cayuga allows users to express subscriptions that span multiple events, and it supports powerful language features such as parameterization and aggregation, which significantly extend the expressive power of standard pub/sub systems. Based on a set of formally defined language operators, the subscription language of Cayuga provides non-ambiguous subscription semantics as well as unique opportunities for optimizations. We experimentally demonstrate that common optimization techniques used in NFA-based systems such as state merging have only limited effectiveness, and we propose novel efficient indexing methods to speed up subscription processing. In a thorough experimental evaluation we show the efficacy of our approach.
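
A two-state example shows what "stateful" and "parameterized" buy over stateless pub/sub: the subscription below matches a buy event followed by a sell event for the same symbol, binding the symbol as a parameter on the first edge. The event shape and names are invented for illustration; Cayuga's actual operators are richer.

    # A hand-rolled two-state NFA for "buy(x) followed by sell(x)".
    def buy_then_sell(events):
        open_buys = {}  # automaton state: symbol -> buy price (bound parameter)
        for kind, symbol, price in events:
            if kind == "buy":
                open_buys[symbol] = price                   # take the first edge
            elif kind == "sell" and symbol in open_buys:
                yield symbol, open_buys.pop(symbol), price  # accepting state

    events = [("buy", "IBM", 100), ("buy", "HP", 30), ("sell", "IBM", 110)]
    print(list(buy_then_sell(events)))  # [('IBM', 100, 110)]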

Citation Context

...haps the most powerful formal approach is STREAM’s CQL query language [25], which extends SQL with support for window queries. Like SQL itself, CQL is declarative and admits of a formal specification [6]; and there are some initial results characterizing a sub-class of queries that can be computed with bounded memory [28, 5]. However, as we pointed out in the introduction, it is not clear whether SQL...

SPADE: The System S Declarative Stream Processing Engine

by Buğra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S. Yu, Myungcheol Doo - in SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
"... In this paper, we present Spade − the System S declarative stream processing engine. System S is a large-scale, distributed data stream processing middleware under development at IBM T. J. Watson Research Center. As a front-end for rapid application development for System S, Spade provides (1) an in ..."
Abstract - Cited by 79 (8 self) - Add to MetaCart
In this paper, we present Spade, the System S declarative stream processing engine. System S is a large-scale, distributed data stream processing middleware under development at IBM T. J. Watson Research Center. As a front-end for rapid application development for System S, Spade provides (1) an intermediate language for flexible composition of parallel and distributed data-flow graphs, (2) a toolkit of type-generic, built-in stream processing operators that support scalar as well as vectorized processing and can seamlessly inter-operate with user-defined operators, and (3) a rich set of stream adapters to ingest/publish data from/to outside sources. More importantly, Spade automatically brings performance optimization and scalability to System S applications. To that end, Spade employs a code generation framework to create highly optimized applications that run natively on the Stream Processing Core (SPC), the execution and communication substrate of System S, and take full advantage of other System S services. Spade allows developers to construct their applications with fine-granularity stream operators without worrying about the performance implications that might exist, even in a distributed system. Spade’s optimizing compiler automatically maps applications into appropriately sized execution units in order to minimize communication overhead, while at the same time exploiting available parallelism. By virtue of the scalability of the System S runtime and Spade’s effective code generation and optimization, we can scale applications to a large number of nodes. Currently, we can run Spade jobs on ≈ 500 processors within more than 100 physical nodes in a tightly connected cluster environment. Spade has been in use at IBM Research to create real-world streaming applications.
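
The flow-composition idea in (1) is easy to picture: an application is a graph of small stream operators that a compiler can later fuse into execution units. The generator-based sketch below is our own illustration; the operator names echo common stream-algebra vocabulary, not SPADE's actual language.

    # A data-flow graph as chained generators: source -> functor -> aggregate.
    def source(values):
        yield from values

    def functor(stream, fn):        # per-tuple transformation operator
        for t in stream:
            yield fn(t)

    def aggregate(stream, size):    # tumbling-window sum operator
        buf = []
        for t in stream:
            buf.append(t)
            if len(buf) == size:
                yield sum(buf)
                buf = []

    flow = aggregate(functor(source(range(10)), lambda x: 2 * x), size=5)
    print(list(flow))  # [20, 70]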

Citation Context

...anding about the collection of operators they intend to use. For example, database engineers typically conceive their applications in terms of the operators available in the stream relational algebra [5, 23]. Likewise, MATLAB [20] programmers have several toolkits at their disposal, from numerical optimization to symbolic manipulation to signal processing, which, depending on the application domain, are ...

High-availability algorithms for distributed stream processing

by Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Michael Stonebraker, Stan Zdonik, 2004
"... Stream-processing systems are designed to support an emerging class of applications that require sophisticated and timely processing of high-volume data streams, often originating in distributed environments. Unlike traditional dataprocessing applications that require precise recovery for correctnes ..."
Abstract - Cited by 74 (15 self) - Add to MetaCart
Stream-processing systems are designed to support an emerging class of applications that require sophisticated and timely processing of high-volume data streams, often originating in distributed environments. Unlike traditional data-processing applications that require precise recovery for correctness, many stream-processing applications can tolerate and benefit from weaker recovery guarantees. In this paper, we study various recovery guarantees and pertinent recovery techniques that can meet the correctness and performance requirements of stream-processing applications. We discuss the design and algorithmic challenges associated with the proposed recovery techniques and describe how each can provide different guarantees with proper combinations of redundant processing, checkpointing, and remote logging. Using analysis and simulations, we quantify the cost of our recovery guarantees and examine the performance and applicability of the recovery techniques. We also analyze how the knowledge of query network properties can help decrease the cost of high availability.

Citation Context

...ry (including nondeterministic), deterministic, convergent-capable, and repeatable. Figure 3 depicts the containment relationship among these operator types and the classification of Aurora operators [1, 2]. The type of a query network is determined by the type of its most general operator. An operator is deterministic if it produces the same output stream every time it starts from the same initial stat...

STREAM: The Stanford Data Stream Management System

by Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Keith Ito, Rajeev Motwani, Utkarsh Srivastava, Jennifer Widom , 2004
"... Traditional database management systems are best equipped to run onetime queries over finite stored data sets. However, many modern applications such as network monitoring, financial analysis, manufacturing, and sensor networks require long-running, or continuous, queries over continuous unbounded ..."
Abstract - Cited by 71 (0 self) - Add to MetaCart
Traditional database management systems are best equipped to run one-time queries over finite stored data sets. However, many modern applications such as network monitoring, financial analysis, manufacturing, and sensor networks require long-running, or continuous, queries over continuous unbounded streams of data.

Citation Context

...the addition of aggregation, subqueries, windowing constructs, and joins of streams and relations, the semantics of a conventional relational language applied to these queries quickly becomes unclear [3]. To address this problem, we have defined a formal abstract semantics for continuous queries, and we have designed CQL, a concrete declarative query language that implements the abstract semantics. ...

Efficient Computation of Frequent and Top-k Elements in Data Streams

by Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi - In ICDT, 2005
"... We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For ..."
Abstract - Cited by 71 (7 self) - Add to MetaCart
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For general data distributions, our top-k algorithm returns k elements that have roughly the highest frequencies; and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, the analysis ensures that only the top-k elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with no loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to be able to answer continuous queries about frequent and top-k elements. Although the problems of incremental reporting of frequent and top-k elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
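
The integrated algorithm this abstract describes is known in the literature as Space-Saving: keep m counters; when an unmonitored element arrives and all counters are taken, evict the element with the minimum count and let the newcomer inherit that count as its maximum overestimation error. A minimal sketch, with our own data layout:

    # Space-Saving with m counters: estimates never undercount, and each
    # estimate exceeds the true count by at most the recorded error.
    def space_saving(stream, m):
        counts, errors = {}, {}
        for x in stream:
            if x in counts:
                counts[x] += 1
            elif len(counts) < m:
                counts[x], errors[x] = 1, 0
            else:
                victim = min(counts, key=counts.get)   # minimum-count element
                cmin = counts.pop(victim)
                errors.pop(victim)
                counts[x], errors[x] = cmin + 1, cmin  # inherit count as error
        return counts, errors

    counts, errors = space_saving("abacabadabacaba", m=3)
    print(counts)  # {'a': 8, 'b': 4, 'c': 3}: true count <= estimate
    print(errors)  # {'a': 0, 'b': 0, 'c': 2}: estimate - error <= true count

Frequent elements are then those whose guaranteed count (estimate minus error) clears the support threshold, and top-k answers are read off the largest counters.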

Processing flows of information: from data stream to complex event processing

by Gianpaolo Cugola, Alessandro Margara - ACM Computing Surveys, 2011
"... A large number of distributed applications requires continuous and timely processing of information as it flows from the periphery to the center of the system. Examples include intrusion detection systems which analyze network traffic in real-time to identify possible attacks; environmental monitori ..."
Abstract - Cited by 67 (11 self) - Add to MetaCart
A large number of distributed applications require continuous and timely processing of information as it flows from the periphery to the center of the system. Examples include intrusion detection systems, which analyze network traffic in real-time to identify possible attacks; environmental monitoring applications, which process raw data coming from sensor networks to identify critical situations; or applications performing online analysis of stock prices to identify trends and forecast future values. Traditional DBMSs, which need to store and index data before processing it, can hardly fulfill the requirements of timeliness coming from such domains. Accordingly, during the last decade, different research communities developed a number of tools, which we collectively call Information Flow Processing (IFP) systems, to support these scenarios. They differ in their system architecture, data model, rule model, and rule language. In this article, we survey these systems to help researchers, who often come from different backgrounds, in understanding how the various approaches they adopt may complement each other. In particular, we propose a general, unifying model to capture the different aspects of an IFP system and use it to provide a complete and precise classification of the systems and mechanisms proposed so far.

Citation Context

...guages embed simple operators for pattern detection, blurring the distinction between transforming and detecting languages. As a representative example for declarative languages, let us consider CQL [Arasu et al. 2006], created within the Stream project [Arasu et al. 2003] and currently adopted by Oracle [Oracle 2010]. CQL defines three classes of operators: relation-to-relation operators are similar to the standar...
