Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1997.
....a framework, first described in (Lee Stolfo 1998), of applying data mining techniques to build intrusion detection models. This framework consists of classification (and meta classification (Chan Stolfo 1993)), association rule (Agrawal, Imielinski, Swami 1993), and frequent episode (Mannila, Toivonen, Verkamo 1995) programs, as well as a support environment that enables system builders to interactively and iteratively drive the process of constructing and evaluating detection models. The end product is concise and intuitive classifica.... (Copyright © 1998, American Association for Artificial Intelligence) ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1995.
....the structure of the combination in the particular match. However, a different approach would be to look for frequent combinations of sites in a particular order from the very beginning. A data mining tool that can be adjusted for ordered combination discovery is the so-called episode analysis tool (Mannila, Toivonen, Verkamo 1995; Hätönen et al. 1996). We are currently implementing a transcription site combination analysis tool based on .... Find the occurrences of the factors within close range (600). Factors considered: BAF1 CBF1 RAP1 SBF E TUF TFIID. Chromosome: VII. ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1995.
....quantities. A similar approach has been undertaken in the machine learning community, where it is usually called discrete sequence prediction (Laird Saul 1994). In both cases, the goal is to predict a forthcoming event. A third, more recent approach is the frequent episodes discovery approach (Mannila, Toivonen, Verkamo 1995): the problem here is to find frequently appearing sequences given a stream of events labeled with a finite number of discrete symbols. This last approach is much more similar to the one described in this paper, which is to cluster whole sequences. The field of knowledge discovery in databases ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1995.
....and relatively efficient algorithms exist for finding such patterns, but these common patterns are not guaranteed to be useful for prediction. Nonetheless, such algorithms have been used to identify regularities in telecommunication network alarm sequences in order to help predict future faults (Mannila, Toivonen, Verkamo 1995) and to find sequential patterns in a database of customer transactions (Agrawal Srikant 1995). The Event Prediction Problem: This section defines the event prediction problem, since our formulation differs in several key ways from the traditional time series prediction problem. Basic Problem ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1995.
....is the standard deviation of the sequence) forcing the mean to be 0 and the variance 1. Recently, more sophisticated time series distance measures have been investigated, such as the dynamic time warping (Berndt Clifford 1994) measure, the longest common subsequence measure (Das, Gunopulos, Mannila 1997; Bollobás et al. 1997), and various probabilistic distance measures (Keogh Smyth 1997). Due to space limitations we omit the details of their use but note that the results below can be easily generalized to handle any such distance measures. Clustering methods: The first step in the ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1997.
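The normalization mentioned in the excerpt above (forcing the mean to 0 and the variance to 1) can be illustrated with a minimal sketch. The function name `z_normalize` and the use of the population standard deviation are assumptions made for illustration; the excerpt does not say which estimator the citing paper uses.

```python
import math


def z_normalize(seq):
    """Shift and scale seq so that its mean is 0 and its variance is 1.

    Assumption: the population standard deviation (divide by n) is used;
    the excerpt does not specify the estimator.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        raise ValueError("constant sequence has zero standard deviation")
    return [(x - mean) / std for x in seq]


if __name__ == "__main__":
    # Hypothetical input series, chosen only for illustration.
    print(z_normalize([1.0, 2.0, 3.0, 4.0]))
```

Normalizing each sequence this way makes distance measures insensitive to offset and amplitude, which is presumably why the excerpt mentions it before turning to distance measures.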
....sets and the smallest support thresholds. Our results support the notion that dbms techniques can be used profitably in building data mining tools (Holsheimer et al. 1995). We are currently investigating how this approach works on other topics, e.g. for finding integrity constraints on databases (Mannila Räihä 1994). While our goal was not to develop a yet faster association rule finding method, the approach described above gives some possibilities even for that. For example, if the construction of the tree in Section 4 succeeds in an optimal way, there will be very few alarms. While an optimal construction ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1994.
....some $t_1, t_2, t'_1, t'_2 \in [t, t']$ we have $[t_1, t'_1] \in mo(P_1)$ and $[t_2, t'_2] \in mo(P_2)$, and furthermore $t = \min\{t_1, t_2\}$ and $t' = \max\{t'_1, t'_2\}$. $\Box$ We use the same algorithm skeleton as in the discovery of association rules (Agrawal Srikant 1994; Mannila, Toivonen, Verkamo 1994). Namely, having found the set $L_k$ of frequent simple episodes of size $k$, we form the set $C_{k+1}$ of candidate episodes of size $k+1$, i.e. episodes whose all subepisodes are frequent, and then find out which candidate episodes $P \in C_{k+1}$ are really frequent by forming the set $mo(P)$ Algorithm 9, ....
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1994.
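The candidate-generation step quoted in the excerpt above (from the frequent patterns of size k, form the candidates of size k+1 whose subpatterns are all frequent) can be sketched as follows. This is an illustration under a simplifying assumption: patterns are modelled as plain sets of items, as in association rule discovery, not as the simple episodes of the cited paper, whose subepisode relation is richer. The name `generate_candidates` is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Return every set of size k+1 whose size-k subsets are all frequent.

    Assumption: patterns are frozensets of items (the association-rule
    setting), not episodes; frequent_k holds sets of one common size k.
    """
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    candidates = set()
    # Any valid candidate is the union of two of its frequent size-k subsets,
    # so joining pairs of frequent sets cannot miss one.
    for a, b in combinations(frequent_k, 2):
        union = a | b
        if len(union) != k + 1:
            continue
        # Pruning: keep the union only if all of its size-k subsets are frequent.
        if all(frozenset(s) in frequent_k for s in combinations(union, k)):
            candidates.add(union)
    return candidates


if __name__ == "__main__":
    # Hypothetical frequent sets of size 2.
    l2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
    print(generate_candidates(l2))
```

Only the candidates that survive this pruning need to be checked against the data, which is the point of reusing the association-rule skeleton.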
....approach. In data mining, related problems in the area of discovering frequent patterns include association rules (Agrawal et al. 1996), episodes in sequences (Mannila, Toivonen, Verkamo 1997), and sequential patterns (Agrawal Srikant 1995), a family of problems discussed more generally in (Mannila Toivonen 1997). Within ILP, a closely related problem is the discovery of queries in first order logic that succeed with respect to a sufficient number of examples (Dehaspe De Raedt 1997). In (Dehaspe Toivonen 1998) we discuss the relationship of ILP to frequent pattern discovery, and relate data mining ....
....rent compound substructures and properties. Since the properties are also a result of the structure of a compound, for the rest of the paper we just talk collectively about (sub)structure discovery. This problem is an instance of the generic problem of finding all potentially interesting sentences (Mannila Toivonen 1997). Given a database $r$, a class $L$ of sentences (patterns) and a selection predicate $q$ which is used for evaluating whether a sentence $Q \in L$ defines a potentially interesting pattern in $r$. The task is to find the theory of $r$ with respect to $L$ and $q$, i.e. the set $Th(L, r, q) = \{Q \in L \mid q(r, Q)$ is ....
[Article contains additional citation context not shown here]
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1997.
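The generic formulation in the last excerpt above (find the set of sentences Q in L that the selection predicate q accepts on the database r) can be illustrated by a brute-force sketch. Everything concrete here is an assumption made for illustration: the database is a list of item sets, the language L is all non-empty item sets, and q is a minimum-count test. The helper names `theory`, `itemset_language` and `is_frequent` are hypothetical.

```python
from itertools import combinations


def theory(sentences, r, q):
    """Return the sentences of the language that q accepts on database r."""
    return {s for s in sentences if q(r, s)}


def itemset_language(r):
    """Assumed language L: every non-empty set of items occurring in r."""
    items = sorted(set().union(*r))
    return [frozenset(c)
            for n in range(1, len(items) + 1)
            for c in combinations(items, n)]


def is_frequent(min_count):
    """Assumed predicate q: the item set occurs in at least min_count rows."""
    def q(r, s):
        return sum(1 for row in r if s <= row) >= min_count
    return q


if __name__ == "__main__":
    # Hypothetical three-row database.
    r = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
    print(theory(itemset_language(r), r, is_frequent(2)))
```

Enumerating the whole language is only feasible for tiny examples; practical algorithms avoid it by pruning the search space.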
CiteSeer.IST - Copyright Penn State and NEC