Results 1 - 10 of 49
PERF Join: An Alternative To Two-way Semijoin And Bloomjoin
In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 1995
"... This paper presents 'Positionally Encoded Record Filters' (PERFs) and describes their use in a distributed query processing technique called PERF join. A PERF is a novel two-way join reduction implementation primitive. While having the same storage and transmission efficiency as a hash fi ..."
Cited by 10 (1 self)
... analytical studies that PERF join performs significantly better than two-way Bloomjoin and two-way semijoin variants under a wide range of relevant cost parameter values. For the large number of distributed query processing algorithms relying on Bloomjoin or semijoin variants to reduce their network cost, we ...
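The snippet does not spell out how a PERF encodes the backward reduction. Below is a minimal sketch of the idea, assuming (as the name "positionally encoded" suggests) that the forward phase ships R's join-column values in a fixed order and that S replies with one bit per position instead of returning the matching values, as a two-way semijoin would. Both sites are simulated in one process, and all function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)

def forward_message(r_rows: List[Row]) -> List[int]:
    """Site R ships its join-column values in a fixed order (assumed)."""
    return [key for key, _ in r_rows]

def reduce_s(keys_from_r: List[int], s_rows: List[Row]) -> List[Row]:
    """Forward reduction at S: keep only S rows whose key occurs in R."""
    r_keys = set(keys_from_r)
    return [row for row in s_rows if row[0] in r_keys]

def build_perf(keys_from_r: List[int], s_rows: List[Row]) -> List[bool]:
    """Backward message: bit i says whether R's i-th value matched in S."""
    s_keys = {key for key, _ in s_rows}
    return [key in s_keys for key in keys_from_r]

def reduce_r(r_rows: List[Row], perf: List[bool]) -> List[Row]:
    """Backward reduction at R: the bit's position identifies the row."""
    return [row for row, bit in zip(r_rows, perf) if bit]

r = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
s = [(2, "x"), (3, "y"), (5, "z")]
keys = forward_message(r)
perf = build_perf(keys, s)
print(reduce_s(keys, s))   # [(2, 'x'), (3, 'y')]
print(perf)                # [False, True, True, True]
print(reduce_r(r, perf))   # [(2, 'b'), (3, 'c'), (2, 'd')]
```

In this sketch the backward message costs one bit per R tuple regardless of key width, which is the sense in which it matches a hash filter's size while, unlike a Bloom filter, producing no false positives.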
Join Processing Using Bloom Filter in MapReduce
"... MapReduce is a programming model which is extensively used for large-scale data analysis. The join operation is one of the essential operations for the data analysis. However, MapReduce is not very efficient to perform the join operation since it always processes all records in the datasets even in ..."
Cited by 2 (0 self)
... in cases where only a small fraction of the datasets is relevant to the join operation. We alleviate this problem by applying the bloomjoin algorithm, a classic distributed join algorithm. We improve join performance using Bloom filters in MapReduce. In our approach, the Bloom filters are constructed ...
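The bloomjoin idea in this entry can be illustrated with a single-process simulation of a reduce-side join. A Bloom filter built over the join keys of the smaller input lets the map phase drop records of the larger input that cannot join before they are shuffled. This is a sketch under stated assumptions, not the authors' construction, which the snippet does not describe; `BloomFilter`, `bloomjoin`, and the `m`/`k` defaults are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (join key, payload)

class BloomFilter:
    """Minimal Bloom filter: m cells, k probe positions per key (double hashing)."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per cell, for readability

    def _positions(self, key: str) -> Iterator[int]:
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

def bloomjoin(small: List[Record], large: List[Record],
              m: int = 1024, k: int = 3) -> List[Tuple[str, str, str]]:
    # Hypothetical preliminary step: build the filter over the small side's keys.
    bf = BloomFilter(m, k)
    for key, _ in small:
        bf.add(key)
    # "Map" phase: group by key; large-side records that fail the filter are
    # dropped before the shuffle. False positives pass; true matches never drop.
    groups = defaultdict(lambda: ([], []))
    for key, val in small:
        groups[key][0].append(val)
    for key, val in large:
        if key in bf:
            groups[key][1].append(val)
    # "Reduce" phase: per key, emit the cross product of the two sides.
    return sorted((key, a, b) for key, (left, right) in groups.items()
                  for a in left for b in right)

users = [("u1", "alice"), ("u2", "bob")]
events = [("u1", "click"), ("u3", "view"), ("u2", "buy"), ("u1", "view")]
print(bloomjoin(users, events))
# [('u1', 'alice', 'click'), ('u1', 'alice', 'view'), ('u2', 'bob', 'buy')]
```

Because a Bloom filter has no false negatives, every record that can join reaches the reduce step; a false positive only costs extra shuffle traffic and yields a group with an empty side, which emits nothing.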
Starburst Mid-Flight: As the Dust Clears
IEEE Transactions on Knowledge and Data Engineering, 1990
Cited by 120 (4 self)
... ter, is improving the design of relational database management systems and enhancing their performance, while building an extensible system to better support nontraditional applications (such as engineering, geographic, office, etc.), and to serve as a testbed for future improvements in database technology. As of November 1989, we have an initial prototype of our system up and running. In this paper, we reflect on the design and implementation of the Starburst system to date. We examine some key design decisions, and how they affect the goal of improved structure and performance. We also examine how well we have met our goal of extensibility: what aspects of the system are extensible, how extensions can be done, and how easy it is to add extensions. We discuss some actual extensions to the system, including the experiences of our first real customizers. Index Terms: Access methods, data structures, extensibility, plan optimization, query processing, relational database system, rule systems.
Reed: Robust, efficient filtering and event detection in sensor networks
In VLDB, 2005
"... This paper presents a set of algorithms for efficiently evaluating join queries over static data tables in sensor networks. We describe and evaluate three algorithms that take advantage of distributed join techniques. Our algorithms are capable of running in limited amounts of RAM, can distribute th ..."
Cited by 32 (0 self)
Notes on Randomized Algorithms CS 469/569: Fall 2014
2014
"... Table of contents, List of figures, List of tables, List of algorithms ..."
Real-time Approximate Range Motif Discovery & Data Redundancy Removal Algorithm
"... Removing redundancy in the data is an important problem as it helps in resource and compute efficiency for downstream processing of massive (10 million to 100 million records) datasets. In application domains such as IR, stock markets, telecom and others there is a strong need for real-time data red ..."
... nearest neighbour search but is more computationally expensive. Real-time scalable approximate Range Motif discovery on massive datasets is a challenging problem. We present the design of novel sequential and parallel approximate Range Motif discovery and data de-duplication algorithms using Bloom filters ...
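As a generic illustration of Bloom-filter de-duplication (not the Range Motif algorithm itself, which the snippet does not describe), the generator below uses a Bloom filter as an approximate "seen" set over a stream. The function name and the `m`/`k` defaults are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, Iterator

def dedup_stream(records: Iterable[str], m: int = 1 << 16, k: int = 4) -> Iterator[str]:
    """Yield records not seen before, using a Bloom filter as the 'seen' set.

    Approximate: a false positive makes a new record look like a duplicate,
    so it is dropped; a true duplicate is never emitted twice.
    """
    bits = bytearray(m)
    for rec in records:
        digest = hashlib.sha256(rec.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        positions = [(h1 + i * h2) % m for i in range(k)]
        if all(bits[p] for p in positions):
            continue  # every probed cell set: treat as already seen
        for p in positions:
            bits[p] = 1
        yield rec

print(list(dedup_stream(["a", "b", "a", "c", "b"])))
```

Memory stays fixed at `m` cells however long the stream runs, which is the appeal for massive datasets; the price is that the false-positive rate climbs as the filter fills, so some distinct records are eventually dropped.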
R* Optimizer Validation and Performance Evaluation for Distributed Queries
"... Few database query optimizer models have been validated against actual performance. This paper extends an earlier optimizer validation and performance evaluation of R* to distributed queries, i.e. single SQL statements having tables at multiple sites. Actual R* message, I/O, and CPU resources ..."
... of the inner table must be transferred to the join site, shipping the whole inner table dominated the strategy of fetching only those inner tuples that matched each outer-table value, even though the former strategy may require additional I/O. Bloomjoins (hashed semijoins) consistently performed better than ...
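A Bloomjoin ships a bit vector instead of the join values themselves, so its network cost depends on the filter size. The standard sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n keys at false-positive rate p, can be checked with a few lines. The helper names are hypothetical, and the figures are illustrative rather than taken from the R* experiments.

```python
import math
from typing import Tuple

def bloom_size(n: int, p: float) -> Tuple[int, int]:
    """Bits m and hash count k for n keys at target false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k

def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k

n = 1_000_000
m, k = bloom_size(n, 0.01)
print(m, k, round(false_positive_rate(n, m, k), 4))
print("bits to ship raw 4-byte keys:", 32 * n)
```

For n = 1,000,000 keys at p = 0.01 this gives about 9.6 bits per key (roughly 9.59 million bits) with k = 7, compared with 32 million bits to ship the same keys as 4-byte integers.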
Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams
Data intensive applications and computing has emerged as a central area of modern research with the explosion of data stored world-wide. Applications involving telecommunication call data records, web pages, online transactions, medical records, stock markets, climate warning systems, etc., necessitate efficient management and processing of such massively exponential amount of data from diverse sources. Duplicate detection and removal of redundancy from such multi-billion datasets helps in resource and compute efficiency for downstream processing. De-duplication or Intelligent Compression in streaming scenarios for approximate identification and elimination of duplicates from such unbounded data stream is a greater challenge given the real-time nature of data arrival. Stable Bloom Filters (SBF) addresses this problem to a certain extent. However, SBF suffers from a high false negative rate and slow convergence rate, thereby rendering it inefficient for applications with low false negative rate tolerances.
∗This work was completed at IBM Research, India.
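The Stable Bloom Filter mentioned in this abstract (introduced by Deng and Rafiei) replaces bits with small counters and decrements a few cells on every insertion so that old elements fade out of the filter; those decrements are also the source of the false negatives the abstract refers to. Below is a minimal sketch, assuming the decremented cells are chosen uniformly at random. The class and parameter names are illustrative, and this is not the improved algorithm the paper proposes.

```python
import hashlib
import random
from typing import List

class StableBloomFilter:
    """Sketch of a Stable Bloom Filter: m counters in [0, max_val],
    k hash probes, and p random decrements per insertion."""

    def __init__(self, m: int, k: int, p: int, max_val: int = 3, seed: int = 0) -> None:
        self.m, self.k, self.p, self.max_val = m, k, p, max_val
        self.cells: List[int] = [0] * m
        self.rng = random.Random(seed)  # seeded for reproducibility

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def seen_then_insert(self, item: str) -> bool:
        """Report whether item looks like a duplicate, then record it."""
        positions = self._positions(item)
        duplicate = all(self.cells[i] > 0 for i in positions)
        # Decrement p random cells (floored at 0). This is what lets stale
        # items fade, and what can erase a recent item (a false negative).
        for _ in range(self.p):
            j = self.rng.randrange(self.m)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        # Set the item's own cells to the maximum value.
        for i in positions:
            self.cells[i] = self.max_val
        return duplicate

sbf = StableBloomFilter(m=1000, k=3, p=10, seed=42)
print([sbf.seen_then_insert(x) for x in ["a", "b", "a"]])
```

Raising `p` makes stale elements fade faster, which keeps the false-positive rate bounded on an unbounded stream but raises the false-negative rate; the defaults here are illustrative only.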
Efficient Processing Distributed Joins with Bloomfilter using MapReduce
"... The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. As an essential problem, join operation among large clusters attracts more and more attention in recent years due to the utilization of MapReduce. Many strategies have been proposed to impro ..."
... Reduce. Based on these strategies, we design two algorithms for two-way join and one algorithm for multi-way join. The experimental results show that our algorithms can significantly improve the efficiency of current join algorithm. Moreover, cost models of these algorithms are characterized in order to find ...