Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases
Cited by 13 (1 self)
Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. As a result, mining sequential patterns from inaccurate data, such as sensor readings and GPS trajectories, is important for discovering hidden knowledge in such applications. Previous work uses expected support as the measurement of pattern frequentness, which has inherent weaknesses with respect to the underlying probability model, and is therefore ineffective for mining high-quality sequential patterns from uncertain sequence databases. In this paper, we propose to measure pattern frequentness based on the possible world semantics. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (or p-FSPs) from data that conform to our models. Based on the prefix-projection strategy of the famous PrefixSpan algorithm, we develop two new algorithms, collectively called U-PrefixSpan, for p-FSP mining. U-PrefixSpan effectively avoids the problem of “possible world explosion”, and when combined with our three pruning techniques and one validating technique, achieves good performance. The efficiency and effectiveness of U-PrefixSpan are verified through extensive experiments on both real and synthetic datasets.
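The difference between expected support and a possible-world-based frequentness measure can be illustrated with a small sketch. It is not the U-PrefixSpan algorithm. It assumes, purely for illustration, that each sequence contains the pattern independently with a known probability, so the support follows a Poisson binomial distribution; the function names `expected_support` and `frequentness_probability` and the threshold `min_sup` are hypothetical.

```python
from typing import List


def expected_support(probs: List[float]) -> float:
    """Expected number of sequences that contain the pattern."""
    return sum(probs)


def frequentness_probability(probs: List[float], min_sup: int) -> float:
    """P(support >= min_sup), assuming independent sequences (an assumption).

    dist[k] is the probability that exactly k sequences contain the pattern,
    for k < min_sup; dist[min_sup] is an absorbing "at least min_sup" bucket.
    """
    if min_sup <= 0:
        return 1.0
    dist = [0.0] * (min_sup + 1)
    dist[0] = 1.0
    for p in probs:
        # Update from the top down so each step reads the previous round's values.
        dist[min_sup] += dist[min_sup - 1] * p
        for k in range(min_sup - 1, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return dist[min_sup]


# Two toy databases with the same expected support but different distributions.
skewed = [0.9, 0.9, 0.1, 0.1]
uniform = [0.5, 0.5, 0.5, 0.5]
print(expected_support(skewed), frequentness_probability(skewed, 2))
print(expected_support(uniform), frequentness_probability(uniform, 2))
```

Both toy inputs have an expected support of 2, yet the probability of reaching a support of at least 2 differs between them, which is the kind of information expected support alone does not capture. The dynamic program runs in time proportional to the number of sequences times `min_sup`, without enumerating possible worlds.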
Community Trend Outlier Detection using Soft Temporal Pattern Mining
Cited by 11 (8 self)
Numerous applications, such as bank transactions, road traffic, and news feeds, generate temporal datasets, in which data evolves continuously. To understand the temporal behavior and characteristics of the dataset and its elements, we need effective tools that can capture evolution of the objects. In this paper, we propose a novel and important problem in evolution behavior discovery. Given a series of snapshots of a temporal dataset, each of which consists of evolving communities, our goal is to find objects which evolve in a dramatically different way compared with the other community members. We define such objects as community trend outliers. It is a challenging problem as evolutionary patterns are hidden deeply in noisy evolving datasets and thus it is difficult to distinguish anomalous objects from normal ones. We propose an effective two-step procedure to detect community trend outliers. We first model the normal evolutionary behavior of communities across time using soft patterns discovered from the dataset. In the second step, we propose effective measures to evaluate chances of an object deviating from the normal evolutionary patterns. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting community trend outliers.
Mining Probabilistic Frequent Spatio-Temporal Sequential Patterns with Gap Constraints from Uncertain Databases
Cited by 2 (0 self)
Uncertainty is common in real-world applications, for example, in sensor networks and moving object tracking, resulting in much interest in itemset mining for uncertain transaction databases. In this paper, we focus on pattern mining for uncertain sequences and introduce probabilistic frequent spatial-temporal sequential patterns with gap constraints. Such patterns are important for the discovery of knowledge given uncertain trajectory data. We propose a dynamic programming approach for computing the frequentness probability of these patterns, which has linear time complexity, and we explore its embedding into pattern enumeration algorithms using both breadth-first search and depth-first search strategies. Our extensive empirical study shows the efficiency and effectiveness of our methods for synthetic and real-world datasets.
Keywords: Uncertain databases, Uncertain pattern mining, Sequential patterns, Spatial-temporal data
On probabilistic models for uncertain sequential pattern mining
2010
Cited by 1 (1 self)
We study uncertainty models in sequential pattern mining. We consider situations where there is uncertainty either about a source or an event. We show that both these types of uncertainties could be modelled using probabilistic databases, and give possible-worlds semantics for both. We then describe "interestingness" criteria based on two notions of frequentness (previously studied for frequent itemset mining), namely expected support [C. Aggarwal et al., KDD’09; Chui et al., PAKDD’07,’08] and probabilistic frequentness [Bernecker et al., KDD’09]. We study the interestingness criteria from a complexity-theoretic perspective, and show that in the case of source-level uncertainty, evaluating probabilistic frequentness is #P-complete, and thus no polynomial-time algorithms are likely to exist, but that the interestingness predicate can be evaluated in polynomial time in the remaining cases.
Efficient Matching of Substrings in Uncertain Sequences
Cited by 1 (0 self)
Substring matching is fundamental to data mining methods for sequential data. It involves checking the existence of a short subsequence within a longer sequence, ensuring no gaps within a match. Whilst a large amount of existing work has focused on substring matching and mining techniques for certain sequences, there are only a few results for uncertain sequences. Uncertain sequences provide powerful representations for modelling sequence behavioural characteristics in emerging domains, such as bioinformatics, sensor streams and trajectory analysis. In this paper, we focus on the core problem of computing substring matching probability in uncertain sequences and propose an efficient dynamic programming algorithm for this task. We demonstrate that our approach is theoretically competitive as well as effective and scalable experimentally. Our results contribute towards a foundation for adapting classic sequence mining methods to deal with uncertain data.
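One way to make the notion of substring matching probability concrete is the sketch below. It is not the algorithm from the paper. It assumes a simple character-level model in which every position of the uncertain sequence is an independent distribution over symbols (a dict from symbol to probability that sums to 1), and it tracks probability mass over the states of a KMP-style matching automaton; the names `build_automaton` and `substring_match_probability` are hypothetical.

```python
from typing import Dict, Iterable, List


def build_automaton(pattern: str, alphabet: Iterable[str]) -> List[Dict[str, int]]:
    """KMP transition table: delta[state][symbol] -> next state (0..len(pattern))."""
    m = len(pattern)
    fail = [0] * m  # fail[i] = length of the longest proper border of pattern[:i+1]
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    symbols = list(alphabet)
    delta = []
    for state in range(m):
        row = {}
        for c in symbols:
            j = state
            while j and pattern[j] != c:
                j = fail[j - 1]
            if pattern[j] == c:
                j += 1
            row[c] = j
        delta.append(row)
    return delta


def substring_match_probability(seq: List[Dict[str, float]], pattern: str) -> float:
    """P(pattern occurs contiguously at least once), assuming independent positions."""
    m = len(pattern)
    if m == 0:
        return 1.0
    alphabet = {c for position in seq for c in position}
    delta = build_automaton(pattern, alphabet)
    state_prob = [0.0] * m  # probability of each partial-match state, no match yet
    state_prob[0] = 1.0
    matched = 0.0  # absorbing mass: a full match has already occurred
    for position in seq:
        nxt = [0.0] * m
        for s, ps in enumerate(state_prob):
            if ps == 0.0:
                continue
            for c, pc in position.items():
                t = delta[s][c]
                if t == m:
                    matched += ps * pc
                else:
                    nxt[t] += ps * pc
        state_prob = nxt
    return matched


coin = {"a": 0.5, "b": 0.5}
print(substring_match_probability([coin, coin, coin], "aa"))
```

Under these assumptions the dynamic program visits each position once and each automaton state once per position, so it avoids enumerating the exponentially many possible worlds of the sequence.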
Mining 3D Key-Pose-Motifs for Action Recognition
Recognizing an action from a sequence of 3D skeletal poses is a challenging task. First ...
Mining Order-preserving SubMatrices from Probabilistic Matrices
2014
The Order-Preserving SubMatrices (OPSMs) capture consensus trends over columns shared by rows in a data matrix. Mining OPSM patterns discovers important and interesting local correlations in many real applications, such as those involving biological data or sensor data. The prevalence of uncertain data in various applications, however, poses new challenges for OPSM mining, since data uncertainty must be incorporated into OPSM modeling and the algorithmic aspects. In this paper, we define new probabilistic matrix representations to model uncertain data with continuous distributions. A novel Probabilistic Order-Preserving SubMatrix (POPSM) model is formalized in order to capture similar local correlations in probabilistic matrices. The POPSM model adopts a new probabilistic support measure that evaluates the extent to which a row belongs to a POPSM pattern. Due to the intrinsic high computational complexity of the POPSM mining problem, we utilize the anti-monotonic property of the probabilistic support measure and propose an efficient Apriori-based mining framework called PROBAPRI to mine POPSM patterns. The framework consists of two mining methods, UNIAPRI and NORMAPRI, which are developed for mining POPSM patterns respectively from two representative types of probabilistic matrices, the UniDist matrix (assuming uniform data distributions) and the NormDist matrix (assuming normal data distributions). We show that the NORMAPRI method is practical enough for mining POPSM patterns from
Uncertainty in Sequential Pattern Mining
We study uncertainty models in sequential pattern mining. We discuss some kinds of uncertainties that could exist in data, and show how these uncertainties can be modelled using probabilistic databases. We then obtain possible world semantics for them and show how frequent sequences could be mined using the probabilistic frequentness measure.
OUTLIER DETECTION FOR INFORMATION NETWORKS
2013
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science which studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work in the area of outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data and time series data. By its inherent nature, network data provides very different challenges that need to be addressed in a special way. Network data is gigantic, contains nodes of different types, rich nodes with associated attribute data, noisy attribute data, noisy link data, and is dynamically evolving in multiple ways. This thesis focuses on outlier detection for such networks with respect to two interesting perspectives: (1) community based outliers and (2) query based outliers. For community based outliers, we discuss the problem in both static as well as dynamic settings.
An Effective Approach to Mine Frequent Sequential Pattern over Uncertain dataset
In recent years, due to the wide application of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable rather than a fixed occurrence count. Mining uncertain data is useful in several applications, such as sensor network monitoring and moving object search. In this paper, we focus on mining frequent sequential patterns using the SeqU-PrefixSpan algorithm, which finds the frequent sequential patterns in an uncertain database. We propose an incremental approach for this algorithm that also finds the patterns but reduces execution time by dividing the data into parts. The input data is divided into parts at random. As a result, the execution time is lower than that of the existing algorithm.