Results 1 - 10 of 12
Mining Closed Episodes with Simultaneous Events
Cited by 9 (2 self)
Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns describing events that often occur in the vicinity of each other. Episodes can impose restrictions on the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial, parallel, and injective episodes, while discovering general episodes is understudied. In this paper we extend the definition of an episode in order to be able to represent cases where events often occur simultaneously. We present an efficient and novel miner for discovering frequent and closed general episodes. Such a task presents unique challenges. First, we cannot define closure based on frequency. We solve this by computing a more conservative closure that we use to reduce the search space, and we discover the closed episodes as a postprocessing step. Second, episodes are traditionally presented as directed acyclic graphs. We argue that this representation has drawbacks leading to redundancy in the output. We address these drawbacks by defining a subset relationship that allows us to remove the redundant episodes. We demonstrate the efficiency of our algorithm and the need for using closed episodes empirically on synthetic and real-world datasets.
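As a rough illustration of the episode notion in the abstract above, the following Python sketch counts the sliding windows in which a serial episode occurs. The fixed-window frequency, the integer timestamps, and the names `occurs_serially` and `window_frequency` are assumptions made for this example. The abstract does not state a frequency definition, and this is not the paper's miner.

```python
def occurs_serially(window, episode):
    """Return True if the labels in `episode` appear in `window` in order.

    `window` is a list of (time, label) pairs sorted by time. Assumption:
    a serial order requires strictly increasing timestamps, so two
    simultaneous events never satisfy "a before b".
    """
    idx = 0
    last_time = None
    for time, label in window:
        if idx == len(episode):
            break
        # Greedily match the next required label at the earliest valid time.
        if label == episode[idx] and (last_time is None or time > last_time):
            last_time = time
            idx += 1
    return idx == len(episode)


def window_frequency(sequence, episode, width):
    """Count windows [start, start + width) that contain the serial episode."""
    sequence = sorted(sequence)
    if not sequence:
        return 0
    first, last = sequence[0][0], sequence[-1][0]
    count = 0
    # Slide over every integer start whose window overlaps the sequence.
    for start in range(first - width + 1, last + 1):
        window = [(t, e) for t, e in sequence if start <= t < start + width]
        if occurs_serially(window, episode):
            count += 1
    return count
```

For example, `window_frequency([(1, "a"), (2, "b"), (3, "a"), (5, "b")], ("a", "b"), 3)` counts the three windows starting at times 0, 1, and 3. Two events with the same timestamp never match a serial episode under this sketch, which is the kind of case the extended episode definition is meant to capture.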
The long and the short of it: summarising event sequences with serial episodes.
In KDD, 2012
Cited by 8 (0 self)
An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal: because of the pattern explosion, they typically return very large numbers of highly redundant patterns. We pursue the ideal for sequential data by employing a pattern set mining approach, in which we consider results as a whole instead of ranking patterns individually. Pattern set mining has been successfully applied to transactional data, but has been surprisingly understudied for sequential data. In this paper, we employ the MDL principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As search strategies, we propose two approaches: the first algorithm selects a good pattern set from a large candidate set, while the second is a parameter-free any-time algorithm that mines pattern sets directly from the data. Experimentation on synthetic and real data demonstrates that we efficiently discover small sets of informative patterns.
Mining Train Delays
Cited by 1 (0 self)
The Belgian railway network has a high traffic density with Brussels as its center of gravity. The star shape of the network implies heavily loaded bifurcations in which knock-on delays are likely to occur. Knock-on delays should be minimized to improve the total punctuality in the network. Based on experience, the most critical junctions in the traffic flow are known, but others might be hidden. To reveal the hidden patterns of trains passing delays to each other, we study, adapt and apply the state-of-the-art techniques for mining frequent episodes to this specific problem.
MARBLES: Mining Association Rules Buried in Long Event Sequences
Cited by 1 (0 self)
Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns that describe events that often occur in the vicinity of each other. Episodes can impose restrictions on the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial and parallel episodes, while discovering general episodes is surprisingly understudied. This is particularly true when it comes to discovering association rules between them. In this paper we propose an algorithm that mines association rules between two general episodes. On top of the traditional definitions of frequency and confidence, we introduce two novel confidence measures for the rules. The major challenge in mining these association rules is pattern explosion. To limit the output, we aim to eliminate all redundant rules. We define the class of closed association rules, and show that this class contains all non-redundant output. To make the algorithm efficient, we use further pruning steps along the way. First, we generate only free and closed frequent episodes, from which we create candidate rules. Second, we speed up the evaluation of the rules. Finally, we prune the remaining non-closed rules from the output.
Mining and Classification of Multivariate Sequential Data
Cited by 1 (1 self)
This thesis would not have been possible without the help of many people along the way. First and foremost I would like to thank my supervisors Prof. Gal A. Kaminka and Prof. Sarit Kraus for their endless guidance, insight and encouragement. Gal and Sarit provided a unique environment and, like parents, complement one another to create the perfect team. It was an honor and a pleasure to have them as my supervisors. Many thanks are due to all members of the MAS group, especially to Noa Agmon, Tammar Shrot and Galit Haim for their friendship and advice, and to Yael Ejgenberg, Anat Sadeh-Or, Yoav Schwartz and Yael Blumberg for their assistance. I would like to thank Prof. Patrice L. Weiss and Dr. Sara Rosenblum for introducing me to the world of handwriting deficiencies. It was a pleasure working together. I gratefully acknowledge the financial support of the Israeli Ministry of Industry and Trade under the NEGEV project. I would like to thank my family. Thanks to my parents Joan and Robert; without their upbringing and support, I would never be who I am. To my one and only sister Tammy, who never fails to listen and always has something sensible to
Mining Complex Event Patterns in Computer Networks
More and more ubiquitous and mobile computer networks are becoming available, which leads to a massive growth in the amount of traffic and corresponding log messages. For handling and managing networks efficiently, sophisticated approaches for network management and analysis are necessary. In this paper, we show how to use temporal data mining in a declarative framework for analysing log files of computer networks. From a sequence of network management protocol messages, we derive temporal association rules, which state frequent dependencies between events. We also present methods for extendable and modular parsing of text messages and their analysis in log files based on XML.
Mining Closed Strict Episodes
Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns is episodes, patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also constrain the order in which the events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adapting existing approaches for discovering traditional patterns, such as closed itemsets, to episodes is not straightforward. First of all, we cannot define a unique closure based on frequency because an episode may have several closed superepisodes. Moreover, to define a closedness concept for episodes we need a subset relationship between episodes, which is not trivial to define. We approach these problems by introducing strict episodes. We argue that this class is general enough, and at the same time we are able to define a natural subset relationship within it and use it efficiently. In order to mine closed episodes we define an auxiliary closure operator. We show that this closure satisfies the needed properties so that we can use the existing framework for mining closed patterns. Discovering the true closed episodes can be done as a post-processing step. We combine these observations into an efficient mining algorithm and demonstrate its performance empirically.
unknown title
2011
An unsupervised learning method for human activity recognition based on a temporal qualitative model
unknown title
The development of methods for dynamic information analysis, such as incremental clustering and novelty detection methods, is becoming a central concern in a large number of applications whose main goal is to process large volumes of information that vary over time. These applications relate to very diverse and highly strategic domains, such as Web exploration and information retrieval, user behavior analysis and recommender systems, technology and science watch, and the analysis of genomic information in bioinformatics. To take just one type of application on textual data as an example, publications on methods for detecting technological breakthroughs and innovative topics are clearly very present in conferences and journals. This interest is underscored by the establishment by the Commission
REEF: Resolving Length Bias in Frequent Sequence Mining
2013
Classic support-based approaches efficiently address frequent sequence mining. However, support-based mining has been shown to suffer from a bias towards short sequences. In this paper, we propose a method to resolve this bias when mining the most frequent sequences. In order to resolve the length bias we define norm-frequency, based on the statistical z-score of support, and use it to replace support-based frequency. Our approach mines the subsequences that are frequent relative to other subsequences of the same length. Unfortunately, naive use of norm-frequency hinders mining scalability. Using norm-frequency breaks the anti-monotonic property of support, which is essential for pruning large sets of candidate sequences. We describe a bound that enables pruning to provide scalability. Experimental results on textual and computer user input data show that we successfully overcome the short-sequence bias and illustrate that our mining algorithm produces meaningful sequences.
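The norm-frequency idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: subsequences are taken to be contiguous, the z-score uses the population standard deviation over all observed subsequences of the same length, and a length with zero variance gets a score of 0.0. The function names `ngram_support` and `norm_frequency` are hypothetical, and the sketch omits the pruning bound the paper describes.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_support(seq, n):
    """Count how often each contiguous length-n subsequence occurs in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def norm_frequency(seq, max_len):
    """Score each subsequence by the z-score of its support.

    The z-score is computed relative to the other observed subsequences
    of the same length (an assumption of this sketch).
    """
    scores = {}
    for n in range(1, max_len + 1):
        support = ngram_support(seq, n)
        if not support:
            continue  # the sequence is shorter than n
        counts = list(support.values())
        mu = mean(counts)
        sigma = pstdev(counts)
        for pattern, count in support.items():
            # Assumption: with zero variance every pattern scores 0.0.
            scores[pattern] = (count - mu) / sigma if sigma > 0 else 0.0
    return scores
```

Because each length is normalised separately, a length-2 pattern is compared only with other length-2 patterns, so longer patterns are not penalised simply for having lower raw support than single symbols.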