Generating semantic annotations for frequent patterns with context analysis
- In Proc. of the 12th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’06)
, 2006
Cited by 8 (1 self)
As a fundamental data mining task, frequent pattern mining has widespread applications in many different domains. Research in frequent pattern mining has so far mostly focused on developing efficient algorithms to discover various kinds of frequent patterns, but little attention has been paid to the important next step: interpreting the discovered frequent patterns. Although some recent work has studied the compression and summarization of frequent patterns, the proposed techniques can only annotate a frequent pattern with non-semantic information (e.g., support), which provides only limited help for a user trying to understand the patterns. In this paper, we propose the novel problem of generating semantic annotations for frequent patterns. The goal is to annotate a frequent pattern with in-depth, concise, and structured information that better indicates the hidden meaning of the pattern. We propose a general approach to generating such an annotation for a frequent pattern by constructing its context model, selecting informative context indicators, and extracting representative transactions and semantically similar patterns. This general approach has many potential applications, such as generating a dictionary-like description for a pattern, finding synonym patterns, discovering semantic relations, and summarizing semantic classes of a set of frequent patterns. Experiments on different datasets show that our approach is effective in generating semantic pattern annotations.
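The three annotation steps named in this abstract (build a context model, pick context indicators, extract representative transactions and similar patterns) can be illustrated with a small sketch. It assumes that context units are single items and weights each one by its positive pointwise mutual information with the pattern; these choices and the names `context_model` and `annotate` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter


def supporting(transactions, pattern):
    """Transactions that contain every item of the pattern."""
    p = set(pattern)
    return [t for t in transactions if p <= set(t)]


def context_model(transactions, pattern):
    """Assumed context model: weight each co-occurring item by its positive
    pointwise mutual information (PMI) with the pattern."""
    sup = supporting(transactions, pattern)
    if not sup:
        return {}
    n = len(transactions)
    overall = Counter(i for t in transactions for i in set(t))
    local = Counter(i for t in sup for i in set(t))
    model = {}
    for item, count in local.items():
        if item in pattern:
            continue
        pmi = math.log((count / len(sup)) / (overall[item] / n))
        if pmi > 0:  # keep only items that co-occur more often than chance
            model[item] = pmi
    return model


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def annotate(transactions, pattern, candidates, k=3):
    """Hypothetical annotation: top-k indicators, transactions, similar patterns."""
    ctx = context_model(transactions, pattern)
    indicators = sorted(ctx, key=lambda i: (-ctx[i], i))[:k]
    reps = sorted(
        supporting(transactions, pattern),
        key=lambda t: cosine(ctx, {i: 1.0 for i in t}),
        reverse=True,
    )[:k]
    similar = sorted(
        ((cosine(ctx, context_model(transactions, c)), c)
         for c in candidates if c != pattern),
        reverse=True,
    )[:k]
    return {"indicators": indicators, "representatives": reps, "similar": similar}


baskets = [
    ["milk", "bread", "butter"],
    ["milk", "bread", "butter"],
    ["milk", "cereal"],
    ["bread", "jam"],
    ["beer", "chips"],
]
print(annotate(baskets, ("bread",), [("butter",), ("beer",)]))
```

Comparing context vectors by cosine similarity is one simple way to surface "semantically similar" patterns; here `("butter",)` scores above `("beer",)` because its context overlaps with that of `("bread",)`.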
A better tool than Allen’s relations for expressing temporal knowledge in interval data
, 2006
Cited by 8 (1 self)
Temporal patterns composed of symbolic intervals are commonly formulated with Allen’s interval relations originating in temporal reasoning. We show that this representation has severe disadvantages for knowledge discovery. The patterns are not robust, in the sense that small disturbances of interval boundaries lead to different patterns for similar situations. The representation is ambiguous since the same pattern can have quantitatively widely varying appearances. For all but very simple cases the patterns are not understandable because the textual descriptions are lengthy and unstructured. We present the Time Series Knowledge Representation (TSKR), a new hierarchical language for interval patterns to express the temporal concepts of coincidence and partial order. We demonstrate the superiority of this novel form of representing temporal knowledge over Allen’s relations for data mining. Results on a real data set support our claims and show a successful application.
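The robustness problem described above can be seen by computing Allen's relation for two intervals and nudging one boundary slightly. The sketch below classifies intervals given as `(start, end)` pairs with `start < end`; the function name `allen_relation` is our own.

```python
def allen_relation(a, b):
    """Allen relation of interval a to interval b; each is (start, end), start < end."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s1 < s2 and e1 < e2:
        return "overlaps"  # s2 < e1 holds because "before"/"meets" were ruled out
    if s1 < e2:
        return "overlapped-by"
    return "met-by" if s1 == e2 else "after"


# Three nearly identical situations yield three different relations.
print(allen_relation((1, 5), (5, 8)))    # meets
print(allen_relation((1, 5), (5.1, 8)))  # before
print(allen_relation((1, 5.1), (5, 8)))  # overlaps
```

A boundary shift of 0.1 time units changes the symbolic pattern, which is the kind of fragility the abstract attributes to Allen-based representations.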
Swarm: Mining Relaxed Temporal Moving Object Clusters
Cited by 8 (3 self)
Recent improvements in positioning technology make massive moving object data widely available. One important analysis is to find the moving objects that travel together. Existing methods impose a strong constraint when defining a moving object cluster: they require the moving objects to stick together for consecutive timestamps. Our key observation is that the moving objects in a cluster may actually diverge temporarily and congregate at certain timestamps. Motivated by this, we propose the concept of a swarm, which captures the moving objects that move within arbitrarily shaped clusters for certain timestamps that are possibly nonconsecutive. The goal of our paper is to find all discriminative swarms, namely closed swarms. While the search space for closed swarms is prohibitively large, we design a method, ObjectGrowth, to efficiently retrieve the answer. In ObjectGrowth, two effective pruning strategies are proposed to greatly reduce the search space, and a novel closure checking rule is developed to report closed swarms on the fly. Empirical studies on real data as well as large synthetic data demonstrate the effectiveness and efficiency of our methods.
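To make "swarm" and "closed swarm" concrete, here is a brute-force sketch under assumed definitions: a swarm is an object set of size at least `min_o` that shares one cluster at at least `min_t` (possibly nonconsecutive) timestamps, and it is closed if adding any object loses a timestamp. This exhaustive enumeration is exponential and is not ObjectGrowth; the data layout and function names are hypothetical.

```python
from itertools import combinations


def max_timestamps(snapshots, objs):
    """Timestamps at which every object in objs lies in the same cluster.
    snapshots: {timestamp: {object: cluster_id}}; a missing object is unclustered."""
    common = set()
    for t, assign in snapshots.items():
        ids = {assign.get(o) for o in objs}
        if len(ids) == 1 and None not in ids:
            common.add(t)
    return frozenset(common)


def closed_swarms(snapshots, min_o, min_t):
    """Brute-force enumeration (exponential in the number of objects)."""
    objects = sorted({o for assign in snapshots.values() for o in assign})
    found = []
    for size in range(max(min_o, 1), len(objects) + 1):
        for objs in combinations(objects, size):
            ts = max_timestamps(snapshots, objs)
            if len(ts) < min_t:
                continue  # not a swarm
            # Closed: adding any single object loses at least one timestamp.
            if all(max_timestamps(snapshots, objs + (o,)) != ts
                   for o in objects if o not in objs):
                found.append((objs, tuple(sorted(ts))))
    return found


snapshots = {
    1: {"a": 1, "b": 1, "c": 2},
    2: {"a": 1, "b": 2, "c": 2},  # a and b diverge temporarily
    3: {"a": 1, "b": 1, "c": 1},
    4: {"a": 3, "b": 3, "c": 2},
}
print(closed_swarms(snapshots, min_o=2, min_t=2))
# [(('a', 'b'), (1, 3, 4)), (('b', 'c'), (2, 3))]
```

Checking single-object extensions suffices for closure: if some larger object set kept all timestamps, each object added on the way would keep them too. Objects `a` and `b` form a swarm at timestamps 1, 3, 4 even though they separate at timestamp 2.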
Mining frequent closed unordered trees through natural representations
- Proceedings of ICCS 2007, 15th International Conference on Conceptual Structures
, 2007
Cited by 7 (6 self)
Many knowledge representation mechanisms consist of link-based structures; they may be studied formally by means of unordered trees. Here we consider the case where labels on the nodes are nonexistent or unreliable, and propose data mining processes focusing on just the link structure. We propose a representation of ordered trees, describe a combinatorial characterization and some properties of it, and use them to propose an efficient algorithm for mining frequent closed subtrees from a set of input trees. We then focus on unordered trees and show that intrinsic characterizations of our representation make it possible to avoid exploring the same unordered tree repeatedly; building on this, we give an efficient algorithm for mining frequent closed unordered trees.
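A common way to avoid revisiting the same unordered tree is to map every tree to one canonical ordered representative. The sketch below assumes a preorder depth sequence as a stand-in for the paper's natural representation and canonicalizes by recursively sorting children; it is an illustration of the idea, not the authors' algorithm. Trees are unlabeled and written as nested lists of children.

```python
def depth_sequence(tree, depth=0):
    """Preorder list of node depths; a tree is a list of its child subtrees."""
    seq = [depth]
    for child in tree:
        seq.extend(depth_sequence(child, depth + 1))
    return seq


def canonical(tree):
    """Recursively order children by their own depth sequences (largest first),
    so that isomorphic unordered trees map to one ordered representative."""
    children = [canonical(c) for c in tree]
    children.sort(key=depth_sequence, reverse=True)
    return children


def same_unordered_tree(t1, t2):
    return depth_sequence(canonical(t1)) == depth_sequence(canonical(t2))


t1 = [[[], []], []]   # root: first child has two leaves, second child is a leaf
t2 = [[], [[], []]]   # same unordered tree, children swapped
t3 = [[[]], [[]]]     # root with two children, each having one leaf
print(depth_sequence(canonical(t1)))                              # [0, 1, 2, 2, 1]
print(same_unordered_tree(t1, t2), same_unordered_tree(t1, t3))  # True False
```

Because a preorder depth sequence determines an ordered unlabeled tree uniquely, two unordered trees are isomorphic exactly when their canonical sequences are equal, so a miner can skip any candidate that is not in canonical form.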
Algorithms for Time Series Knowledge Mining
, 2006
Cited by 7 (2 self)
Temporal patterns composed of symbolic intervals are commonly formulated with Allen’s interval relations originating in temporal reasoning. This representation has severe disadvantages for knowledge discovery. The Time Series Knowledge Representation (TSKR) is a new hierarchical language for interval patterns expressing the temporal concepts of coincidence and partial order. We present effective and efficient mining algorithms for such patterns based on itemset techniques. A novel form of search space pruning effectively reduces the size of the mining result to ease interpretation and speed up the algorithms. On a real data set a concise set of TSKR patterns can explain the underlying temporal phenomena, whereas the patterns found with Allen’s relations are far more numerous yet only explain fragments of the data.
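The TSKR concept of coincidence can be illustrated by finding the maximal time segments during which several labeled intervals hold simultaneously. This sweep over interval boundaries is a simplified sketch: it assumes half-open intervals and ignores TSKR details such as minimum durations, and the name `coincidences` is hypothetical.

```python
def coincidences(intervals, min_size=2):
    """intervals: (label, start, end) triples, treated as half-open [start, end).
    Returns maximal (labels, start, end) segments where at least min_size labels hold."""
    points = sorted({x for _, s, e in intervals for x in (s, e)})
    segments = []
    for lo, hi in zip(points, points[1:]):
        active = frozenset(lab for lab, s, e in intervals if s <= lo and hi <= e)
        if len(active) < min_size:
            continue
        if segments and segments[-1][0] == active and segments[-1][2] == lo:
            segments[-1] = (active, segments[-1][1], hi)  # extend the previous run
        else:
            segments.append((active, lo, hi))
    return segments


labeled = [("A", 0, 10), ("B", 3, 7), ("C", 5, 12)]
for labels, start, end in coincidences(labeled):
    print(sorted(labels), start, end)
# ['A', 'B'] 3 5
# ['A', 'B', 'C'] 5 7
# ['A', 'C'] 7 10
```

Unlike pairwise Allen relations, each segment states directly which labels coincide; a partial order over such segments would then express the second TSKR concept.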
Parallel mining of closed sequential patterns
- In: Proc. of the 11th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining
, 2005
Cited by 6 (2 self)
Discovery of sequential patterns is an essential data mining task with broad applications. Among the several variations of sequential patterns, the closed sequential pattern is the most useful one, since it retains all the information of the complete pattern set but is often much more compact. Unfortunately, no parallel closed sequential pattern mining method has been proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed-memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling, to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.
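The divide-and-conquer property referred to above can be sketched with prefix projection: each frequent first item defines an independent sub-problem over its projected database. The sketch below is a shared-memory stand-in under simplifying assumptions (single-item elements, all frequent patterns rather than only closed ones, a thread pool instead of MPI processes, no selective sampling); the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def project(db, item):
    """Suffixes after the first occurrence of item (PrefixSpan-style projection)."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def frequent_items(db, min_sup):
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # each sequence counts an item at most once
    return sorted(i for i, c in counts.items() if c >= min_sup)


def mine(prefix, db, min_sup):
    """All frequent patterns extending prefix; db is prefix's projected database."""
    patterns = [(tuple(prefix), len(db))]
    for item in frequent_items(db, min_sup):
        patterns.extend(mine(prefix + [item], project(db, item), min_sup))
    return patterns


def parallel_mine(db, min_sup, workers=4):
    """One independent task per frequent first item; the pool hands each task
    to whichever worker is free, a simple form of dynamic scheduling."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mine, [item], project(db, item), min_sup)
                   for item in frequent_items(db, min_sup)]
        return sorted(p for f in futures for p in f.result())


db = [list("abc"), list("ac"), list("bc")]
print(parallel_mine(db, min_sup=2))
# [(('a',), 2), (('a', 'c'), 2), (('b',), 2), (('b', 'c'), 2), (('c',), 3)]
```

Tasks need no communication once their projected databases are built, which is why this partitioning keeps interprocessor traffic low. In CPython, threads illustrate the scheduling but give little speedup for CPU-bound work.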
Mining control flow abnormality for logic error isolation
- In Proceedings of 2006 SIAM International Conference on Data Mining (SDM’06)
, 2006
Cited by 6 (4 self)
Analyzing the executions of a buggy program is essentially a data mining process: tracing the data generated during program executions may disclose important patterns and outliers that could eventually reveal the location of software errors. In this paper, we investigate program logic errors, which rarely incur memory access violations but generate incorrect outputs. We show that by mining program control flow abnormality, we can isolate many logic errors without knowing the program semantics. To detect the control flow abnormality, we propose a hypothesis-testing-like approach that statistically contrasts the evaluation probability of condition statements between correct and incorrect executions. Based on this contrast, we develop two algorithms that rank functions by their likelihood of containing the hidden error. We evaluated these two algorithms on a set of standard test programs, and the results clearly indicate their effectiveness.
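The contrast between correct and incorrect executions can be sketched as follows: for each condition statement, compute per-run the fraction of evaluations that were true, then measure how far the failing runs' mean departs from the passing runs. This standardized-difference score and the max-per-function aggregation are our own simplifying assumptions, not the paper's exact statistic; all names and the profile data are hypothetical.

```python
import math
from statistics import mean, pstdev


def evaluation_bias(n_true, n_false):
    """Fraction of a predicate's evaluations that were true in one run."""
    total = n_true + n_false
    return n_true / total if total else None


def abnormality(passing, failing):
    """Standardized shift of the mean failing-run bias away from the passing runs.
    passing/failing: lists of (n_true, n_false) counts, one pair per run."""
    p = [b for b in (evaluation_bias(*r) for r in passing) if b is not None]
    f = [b for b in (evaluation_bias(*r) for r in failing) if b is not None]
    if len(p) < 2 or not f:
        return 0.0  # not enough evidence to contrast the two groups
    sigma = max(pstdev(p), 1e-6)  # avoid division by zero for constant biases
    return abs(mean(f) - mean(p)) * math.sqrt(len(f)) / sigma


def rank_functions(profiles):
    """profiles: {(function, predicate): (passing_runs, failing_runs)}.
    A function's score is its most abnormal predicate (an assumed aggregation)."""
    scores = {}
    for (func, _pred), (passing, failing) in profiles.items():
        scores[func] = max(scores.get(func, 0.0), abnormality(passing, failing))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


profiles = {
    ("parse", "x > 0"): ([(5, 5), (4, 6), (6, 4)], [(9, 1), (8, 2)]),
    ("emit", "y == 0"): ([(3, 7), (2, 8), (3, 7)], [(3, 7), (2, 8)]),
}
print(rank_functions(profiles))
```

The predicate in `parse` is true far more often in failing runs than in passing ones, so `parse` is ranked first; no knowledge of what the predicate means is required.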
Efficient mining of closed repetitive gapped subsequences from a sequence database
- In: ICDE 2009: Proc. of the 25th Int. Conf. on Data Engineering. IEEE Computer Society, Los Alamitos
, 2009
Cited by 5 (1 self)
There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward in analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called a repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) of some sequences in the database. We introduce the concept of repetitive support to measure how frequently a pattern repeats in the database. Unlike support in the sequential pattern mining problem, repetitive support captures not only repetitions of a pattern in different sequences but also repetitions within a sequence. Given a user-specified support threshold min_sup, we study finding the set of all patterns with repetitive support no less than min_sup. To obtain a compact yet complete result set and improve efficiency, we also study finding closed patterns. Efficient mining algorithms to find the complete set of desired patterns are proposed based on the idea of instance growth. Our performance study on various datasets shows the efficiency of our approach. A case study is also performed to show the utility of our approach.
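The difference between ordinary sequence support and a support that also counts repetitions within a sequence can be shown with a small sketch. It uses a deliberately strict, simplified notion of repetition (instances that follow one another without interleaving, gaps allowed), which may count fewer instances than the paper's repetitive support; the function names are hypothetical.

```python
def non_interleaved_repeats(seq, pattern):
    """Greedy count of pattern instances in seq that follow one another without
    interleaving; gaps between matched events are allowed."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count, j = 0, 0
    for event in seq:
        if event == pattern[j]:
            j += 1
            if j == len(pattern):  # one complete instance found
                count += 1
                j = 0
    return count


def simplified_repetitive_support(db, pattern):
    return sum(non_interleaved_repeats(s, pattern) for s in db)


def sequence_support(db, pattern):
    """Classic sequential-pattern support: number of sequences containing pattern."""
    return sum(1 for s in db if non_interleaved_repeats(s, pattern) > 0)


db = ["ABCABC", "AABBC"]
print(simplified_repetitive_support(db, "AB"), sequence_support(db, "AB"))  # 3 2
```

Matching greedily and restarting after each completed instance maximizes the number of non-interleaved instances, because finishing an instance as early as possible leaves the most room for later ones.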
Mining Long Sharable Patterns in Trajectories of Moving Objects
- In Proc. of STDBM
, 2006
Cited by 5 (2 self)
The efficient analysis of spatio-temporal data generated by moving objects is an essential requirement for intelligent location-based services. Spatio-temporal rules can be found by constructing spatio-temporal baskets, from which traditional association rule mining methods can discover spatio-temporal rules. When the items in the baskets are spatio-temporal identifiers derived from trajectories of moving objects, the discovered rules represent frequently travelled routes. For some applications, e.g., an intelligent ride-sharing application, these frequent routes are only interesting if they are long and sharable, i.e., can potentially be shared by several users. This paper presents a database-projection-based method for efficiently extracting such long, sharable frequent routes. The method prunes the search space by making use of the minimum length and sharability requirements and avoids generating the exponential number of sub-routes of long routes. Considering alternative modelling options for trajectories leads to two effective variants of the method. SQL-based implementations are described, and extensive experiments on both real-life and large-scale synthetic data show the effectiveness of the method and its variants.
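The two requirements on a route, being long and being sharable, can be illustrated with a level-wise sketch. It assumes trajectories are sequences of region identifiers and that a route is a contiguous run of regions; it prunes with the sharability (minimum support) requirement only. Unlike the paper's projection-based method, it still generates every sharable sub-route, and its names are hypothetical.

```python
def contains(trajectory, route):
    """True if route occurs as a contiguous run of regions in trajectory."""
    n = len(route)
    return any(tuple(trajectory[i:i + n]) == route
               for i in range(len(trajectory) - n + 1))


def support(trajectories, route):
    return sum(1 for t in trajectories if contains(t, route))


def long_sharable_routes(trajectories, min_sup, min_len):
    """Grow routes one region at a time, keeping only sharable ones."""
    regions = sorted({r for t in trajectories for r in t})
    frontier = [(r,) for r in regions if support(trajectories, (r,)) >= min_sup]
    result = []
    while frontier:
        next_frontier = []
        for route in frontier:
            if len(route) >= min_len:
                result.append(route)
            for r in regions:  # extending a non-sharable route can never help
                candidate = route + (r,)
                if support(trajectories, candidate) >= min_sup:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result


trips = [[1, 2, 3, 4], [0, 1, 2, 3, 4], [1, 2, 3, 5], [7, 1, 2]]
print(long_sharable_routes(trips, min_sup=2, min_len=3))
# [(1, 2, 3), (2, 3, 4), (1, 2, 3, 4)]
```

Pruning is safe because any trajectory containing an extended route also contains the route itself, so support can only shrink as a route grows.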
Sequential pattern mining: A survey on issues and approaches
- in Encyclopedia of Data Warehousing and Mining, Information Science Publishing
, 2005

