Querying uncertain spatio-temporal data
- In Proc. ICDE
, 2012
Cited by 11 (6 self)
Abstract. The problem of modeling and managing uncertain data has received a great deal of interest, due to its manifold applications in spatial, temporal, multimedia and sensor databases. There exists a wide range of work covering spatial uncertainty in the static (snapshot) case, where only one point of time is considered. In contrast, the problem of modeling and querying uncertain spatio-temporal data has only been treated as a simple extension of the spatial case, disregarding time dependencies between consecutive timestamps. We present a framework for efficiently modeling and querying uncertain spatio-temporal data. The key idea of our approach is to model possible object trajectories by stochastic processes. This approach has three major advantages over previous work. First, it allows answering queries in accordance with the possible worlds model. Second, dependencies between object locations at consecutive points in time are taken into account. Third, it is possible to reduce all queries on this model to simple matrix multiplications. Based on these concepts we propose efficient solutions for different probabilistic spatio-temporal queries for a particular stochastic process, the Markov chain. In an experimental evaluation we show that our approaches are several orders of magnitude faster than state-of-the-art competitors.
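To make the "queries reduce to matrix multiplications" idea concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the three-state space, the transition matrix `P`, and the function names are all hypothetical, chosen only to show how a Markov chain can answer "is the object in a region at time t?" and "does it enter the region at some time in an interval?" while respecting dependencies between consecutive timestamps.

```python
# Hypothetical 3-state space (e.g. three grid cells).
# P[i][j] = probability of moving from state i to state j in one time step.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]


def step(dist, P):
    """Advance a location distribution by one time step (row vector times P)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(start, P, t):
    """Location distribution after t steps, starting from `start` at time 0."""
    dist = list(start)
    for _ in range(t):
        dist = step(dist, P)
    return dist


def prob_in_region_at(start, P, t, region):
    """Probability that the object is in one of the `region` states at time t."""
    dist = distribution_at(start, P, t)
    return sum(dist[s] for s in region)


def prob_visits_region(start, P, t_from, t_to, region):
    """Probability that the object is in `region` at some time in [t_from, t_to].

    Mass that reaches the region is moved to `hit` and removed from the
    distribution, so each possible trajectory is counted at most once.
    """
    dist = distribution_at(start, P, t_from)
    hit = 0.0
    for t in range(t_from, t_to + 1):
        if t > t_from:
            dist = step(dist, P)
        for s in region:
            hit += dist[s]
            dist[s] = 0.0
    return hit


start = [1.0, 0.0, 0.0]  # object is known to be in state 0 at time 0
print(prob_in_region_at(start, P, 2, [2]))      # 0.125
print(prob_visits_region(start, P, 1, 3, [2]))  # 0.25
```

The interval query cannot be obtained by combining the per-timestamp probabilities as if they were independent; the sketch instead tracks the probability mass that has not yet entered the region, which is one simple way to account for the time dependencies the abstract mentions.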
Uncertain Time-Series Similarity: Return to the Basics
Cited by 7 (6 self)
In recent years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities to ensure efficiency, product quality and safety, hydrologic and geologic observing systems, pollution management, and others. Due to the inherent imprecision of sensor observations, many investigations have recently turned to querying, mining and storing uncertain data. Uncertainty can also be due to data aggregation, privacy-preserving transforms, and error-prone mining algorithms. In this study, we survey the techniques that have been proposed specifically for modeling and processing uncertain time series, an important model for temporal data. We provide an analytical evaluation of the alternatives that have been proposed in the literature, highlighting the advantages and disadvantages of each approach, and further compare these alternatives with two additional techniques that were carefully studied before. We conduct an extensive experimental evaluation with 17 real datasets, and discuss some surprising results, which suggest that a fruitful research direction is to take into account the temporal correlations in the time series. Based on our evaluations, we also provide guidelines useful for practitioners in the field.
DUST: a generalized notion of similarity between uncertain time series
- In SIGKDD
, 2010
Cited by 6 (0 self)
Large-scale sensor deployments and an increased use of privacy-preserving transformations have led to an increasing interest in mining uncertain time series data. Traditional distance measures such as Euclidean distance or dynamic time warping are not always effective for analyzing uncertain time series data. Recently, some measures have been proposed to account for uncertainty in time series data. However, we show in this paper that their applicability is limited. Specifically, these approaches do not provide an intuitive way to compare two uncertain time series and do not easily accommodate multiple error functions. In this paper, we provide a theoretical framework that generalizes the notion of similarity between uncertain time series. Secondly, we propose DUST, a novel distance measure that accommodates uncertainty and degenerates to the Euclidean distance when the distance is large compared to the error. We provide an extensive experimental validation of our approach for the following applications: classification, top-k motif search, and top-k nearest-neighbor queries.
Real-time data analytics in sensor networks
- in Managing and Mining Sensor Data
, 2012
Cited by 3 (3 self)
Abstract. The proliferation of Wireless Sensor Networks (WSNs) in the past decade has provided the bridge between the physical and digital worlds, enabling the monitoring and study of physical phenomena at a granularity and level of detail that was never before possible. In this study, we review the efforts of the research community with respect to two important problems in the context of WSNs: real-time collection of the sensed data, and real-time processing of these data series.
Top-k Nearest Neighbor Search In Uncertain Data Series
Cited by 2 (2 self)
Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top-k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top-k nearest neighbor problem for uncertain data series, and describe a model for uncertain data series that captures both uncertainty and correlation. This distinguishes our approach from prior work that compromises the accuracy of the model by assuming independence of the value distribution at neighboring time-stamps. We introduce the Holistic-PkNN algorithm, which uses novel metric bounds for uncertain series and an efficient refinement strategy to reduce the overall number of required probability estimates. We evaluate our proposal under a variety of settings using a combination of synthetic and 45 real datasets from diverse domains. The results demonstrate the significant advantages of the proposed approach.
Random Error Reduction in Similarity Search on Time Series: A Statistical Approach
Errors in measurement can be categorized into two types: systematic errors that are predictable, and random errors that are inherently unpredictable and have null expected value. Random error is always present in a measurement. More often than not, readings in time series may contain inherent random errors due to causes like dynamic error, drift, noise, hysteresis, digitalization error and limited sampling frequency. Random errors may affect the quality of time series analysis substantially. Unfortunately, most of the existing time series analysis methods do not address random errors, possibly because random error in a time series, which can be modeled as a random variable of unknown distribution, is hard to handle. In this paper, we tackle this challenging problem. Taking similarity search as an example, which is an essential task in time series analysis, we develop MISQ, a statistical approach for random error reduction in time series analysis. The major intuition in our method is to use only the readings at different time instants in a time series to reduce random errors. We achieve a highly desirable property in MISQ: it can ensure that the recall is above a user-specified threshold. An extensive empirical study on 20 benchmark real data sets clearly shows that our method can lead to better performance than the baseline method without random error reduction in real applications such as classification. Moreover, MISQ achieves good quality in similarity search.
Managing Uncertain Spatio-Temporal Data
Many spatial query problems defined on uncertain data are computationally expensive, in particular if, in addition to spatial attributes, a time component is added. Although there exists a wide range of applications dealing with uncertain spatio-temporal data, no solution for efficient management of such data is available yet. This paper is the first work to propose general models for spatio-temporal uncertain data that have the potential to allow efficient processing of a wide range of queries. The main challenge here is to unfold this potential by developing new algorithms based on these models. In addition, we give examples of interesting spatio-temporal queries on uncertain data.
Top-k Similarity Join over Multi-valued Objects
Abstract. Top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects U and V, a top-k similarity join returns the k pairs of most similar objects from U × V. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space and the similarity between two objects is usually measured by distance metrics such as Euclidean distance. However, in many applications an object may be described by multiple values (instances) and the conventional model is not applicable since it does not address the distributions of object instances. In this paper, we study top-k similarity join queries over multi-valued objects. We apply quantile based distance to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance, statistic and weight based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
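As a point of reference, the conventional model described in the abstract (each object is a single point, similarity is Euclidean distance) can be sketched as a brute-force join. This is only the baseline the paper contrasts itself with, not its quantile-based method for multi-valued objects; the function name and the sample points are hypothetical.

```python
import heapq
import math
from itertools import product


def top_k_similarity_join(U, V, k):
    """Return the k closest pairs from U x V as (distance, u, v), nearest first.

    Brute force: every pair is scored with Euclidean distance. Objects are
    tuples of coordinates, so ties on distance fall back to tuple ordering.
    """
    scored = ((math.dist(u, v), u, v) for u, v in product(U, V))
    return heapq.nsmallest(k, scored)


U = [(0, 0), (5, 5)]
V = [(0, 1), (5, 7), (9, 9)]
for distance, u, v in top_k_similarity_join(U, V, 2):
    print(u, v, distance)
# (0, 0) (0, 1) 1.0
# (5, 5) (5, 7) 2.0
```

The sketch examines all |U| × |V| pairs; a filtering-refinement framework such as the one the abstract mentions aims to avoid scoring most of them.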
Sliding Windows over Uncertain Data Streams
- Under consideration for Knowledge and Information Systems
, 2013
Abstract. Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is less than 1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them have studied, in depth, the implications of existential uncertainty on sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding window to define the novel concept of uncertain sliding windows, and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can efficiently operate while providing a high quality approximation in query answering. In addition, we show that
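The two kinds of uncertainty defined in the abstract above can be illustrated with a short sketch. It is not one of the paper's window-maintenance algorithms: the dictionary representation, the function names, and the assumption that tuples exist independently of each other are all choices made here for illustration. It shows why a count-based window is harder to maintain under existential uncertainty: the number of tuples that actually exist among the most recent arrivals is itself a random variable.

```python
def existence_probability(alternatives):
    """Probability that a tuple exists.

    `alternatives` maps each possible value to its probability. A sum below 1
    means the tuple is existentially uncertain.
    """
    total = sum(alternatives.values())
    if total < 0.0 or total > 1.0 + 1e-9:
        raise ValueError("probabilities of possible values must sum to at most 1")
    return min(total, 1.0)


def existing_count_distribution(exist_probs):
    """Distribution of how many tuples exist, assuming independent existence.

    Returns a list `dist` where dist[c] is the probability that exactly c of
    the tuples exist (a dynamic program over the tuples, one at a time).
    """
    dist = [1.0]
    for p in exist_probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            nxt[c] += mass * (1.0 - p)   # this tuple does not exist
            nxt[c + 1] += mass * p       # this tuple exists
        dist = nxt
    return dist


stream = [
    {10: 0.5, 12: 0.25},  # value and existential uncertainty (sum 0.75)
    {7: 1.0},             # certain tuple
    {3: 0.5},             # existential uncertainty only
]
probs = [existence_probability(t) for t in stream]
print(probs)                               # [0.75, 1.0, 0.5]
print(existing_count_distribution(probs))  # [0.0, 0.125, 0.5, 0.375]
```

In this example a window meant to hold three tuples is actually full with probability 0.375 only, which hints at why the window semantics need to be extended.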
ARC DP120104168, and NSFC61021004.
"... Efficient top-k similarity join processing ..."