MANNILA, H., 1997, "Methods and problems in data mining", In: Proceedings of the 6th International Conference on Database Theory (ICDT'97), LNCS - Lecture Notes in Computer Science, v. 1186, Springer-Verlag, pp. 41-55, Delphi, Greece, Jan.
....values of l and m have been proposed [11, 23, 31]. PPM is useful to find patterns in sequences and particularly in hyperlink traversals. In contrast, our technique to generate prediction rules is aimed to find semantic links among queries. Data Mining is about techniques for finding patterns in data [24, 2, 15, 28]. An important technique of data mining is the generation of association rules, which find intra-session patterns. Association rules were first introduced in [3] and have motivated a considerable amount of work, as presented in [4]. The algorithms presented in this paper are inspired from those ....
H. Mannila. Methods and problems in data mining. In Proceedings of the 6th International Conference on Database Theory ICDT '97, pages 41-55, Delphi, Greece, January 1997.
.... D, the problem of mining association rules is to generate all association rules that have support and confidence greater than a user-specified minimum support (minsup) and minimum confidence (minconf). Generating association rules involves looking for so-called frequent itemsets in the data [19]. Indeed, the support of the rule $X \Rightarrow Y$ equals the frequency of the itemset $\{X, Y\}$. Thus by looking for frequent itemsets, we can determine the support of each rule. Definition 1 Frequency of an itemset (adapted from [19]): s(X, D) represents the frequency of itemset X in D, i.e. the fraction of ....
....rules involves looking for so-called frequent itemsets in the data [19]. Indeed, the support of the rule $X \Rightarrow Y$ equals the frequency of the itemset $\{X, Y\}$. Thus by looking for frequent itemsets, we can determine the support of each rule. Definition 1 Frequency of an itemset (adapted from [19]): s(X, D) represents the frequency of itemset X in D, i.e. the fraction of transactions in D that contain X. Definition 2 Frequent itemset (adapted from [19]): An itemset X is called frequent in D, if $s(X, D) \geq \sigma$, with $\sigma$ the minsup. A typical approach [2] to discover all frequent sets X is to ....
Mannila, H. Methods and problems in data mining. In Afrati, F., and Kolaitis, P. (eds.). Proceedings of the International Conference on Database Theory, 41-55, 1997.
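The frequency and frequent-itemset definitions quoted in the excerpts of this entry can be illustrated with a minimal sketch. It assumes transactions are represented as Python sets of item names and that the minsup comparison is `>=`; the function names `frequency`, `is_frequent`, `rule_support` and `rule_confidence` are hypothetical and do not come from the cited papers. The confidence formula is the usual textbook one, not a quotation.

```python
from typing import Iterable, Sequence, Set


def frequency(itemset: Iterable[str], transactions: Sequence[Set[str]]) -> float:
    """s(X, D): fraction of transactions in D that contain every item of X."""
    if not transactions:
        return 0.0
    x = frozenset(itemset)
    # x <= t is the subset test: the transaction t contains all items of x
    return sum(1 for t in transactions if x <= t) / len(transactions)


def is_frequent(itemset: Iterable[str], transactions: Sequence[Set[str]],
                minsup: float) -> bool:
    """X is frequent in D if s(X, D) reaches minsup (>= is an assumption)."""
    return frequency(itemset, transactions) >= minsup


def rule_support(x: Iterable[str], y: Iterable[str],
                 transactions: Sequence[Set[str]]) -> float:
    """Support of the rule X => Y: the frequency of the union of X and Y."""
    return frequency(frozenset(x) | frozenset(y), transactions)


def rule_confidence(x: Iterable[str], y: Iterable[str],
                    transactions: Sequence[Set[str]]) -> float:
    """Confidence of X => Y: support of the rule divided by s(X, D)."""
    fx = frequency(x, transactions)
    return rule_support(x, y, transactions) / fx if fx > 0 else 0.0
```

With four toy baskets `{a, b}`, `{a}`, `{b, c}`, `{a, b, c}`, the rule a => b has support 0.5, because two of the four baskets contain both items.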
....language in Section 3 using examples of which some are classical and others novel. Section 4 then gives a brief description of the theoretical models that the language is built upon. Finally, we summarize our work in Section 5. 2. Related Work The work in this paper was motivated from Mannila's [9] discussion on a theoretical framework for data mining. He commented on the ad hoc situation of data mining research and called for a systematic framework to develop KDD applications. A framework for mining rules was discussed but lacked concrete details. Our work continues from where Mannila ....
H. Mannila. Methods and Problems in Data Mining. In Proc. of the Int. Conf. on Database Theory, Delphi, Greece, Jan. 1997.
....semantics of their composition and the objective of the domain analyst. In summary, the characteristics observed above should provide sufficient expressiveness in mining intra-transaction rules currently considered by the framework. 6. Related Work The work in this paper was motivated from Mannila's [15, 21, 22] discussion on a theoretical framework for data mining. He commented on the ad hoc situation of data mining research and called for a systematic framework to develop KDD applications. A framework for mining rules was discussed but lacked details to support its feasibility. Our work continues from ....
H. Mannila. Methods and Problems in Data Mining. In Proc. of ICDT, Delphi, Greece, January 1997.
....language in Section 3 using examples of which some are classical and others novel. Section 4 then gives a brief description of the theoretical models that the language is built upon. Finally, we summarize our work in Section 5. 2. Related Work The work in this paper was motivated from Mannila's [10] discussion on a theoretical framework for data mining. He commented on the ad hoc situation of data mining research and called for a systematic framework to develop KDD applications. A framework for mining rules was discussed but lacked concrete details. Our work continues from where Mannila ....
H. Mannila. Methods and Problems in Data Mining. In Proc. of ICDT, Greece, 1997.
....is C4.5 in order to be able to handle numerical attributes (which its predecessor ID3 cannot). B. Highlighting Prediction association rules Here we are going to implement a more general approach which is in fact usable in other kinds of KDD tasks as well. It was described by Mannila in [6]. A fairly large class of data mining tasks can be described as the search for interesting and frequently occurring patterns from the data. That is, we are given a class P of patterns or sentences that describe properties of the data, and we can specify whether a pattern $p \in P$ occurs frequently ....
H. Mannila, "Methods and Problems in Data Mining," in the Proceedings of International Conference on Database Theory, Jan. 1997, Delphi, Springer-Verlag.
.... of association rule mining includes discovery of web content and usage rules [20], phrase association rules from text [38], reducing telecommunication order failures and detecting redundant medical tests [10], recurrent images in medical image databases [76], co-citation in scientific papers [55], automatic classification of e-mail messages [43]. There are variants of association rule mining such as implication rules which can help find rules like "heads of households do not have personal care limitations" from U.S. Census data [14]. We will revisit association rules once again since it is ....
....that there is a need for novel parallel algorithms. We refer the reader to Fayyad et al. [27, 28, 26, 71] for an overview of KDD and data mining. For an overview from the database perspective, see Chen's survey [16]. In another introductory article Mannila discusses the field and open problems [55]. For association rule mining, an excellent survey is due to Zaki [78] which covers both sequential and parallel algorithms and discusses open problems in parallel association mining. Hipp et al. [42] study sequential algorithms and benchmark some of them. In following sections we expound on ....
H. Mannila. Methods and problems in data mining. In Proceedings of International Conference on Database Theory (ICDT), pages 41-55, 1997.
.... anti-monotonicity property underlying the a priori algorithm has subsequently been generalized to levelwise search [10]. As a matter of fact, the a priori trick is applicable in many other data mining tasks, such as the discovery of keys, inclusion dependencies, functional dependencies, episodes [9, 10], and other kinds of rules [15]. With the advent of data mining primitives in query languages, it is interesting and important to explore to which extent the a priori technique can be incorporated into next-generation query optimizers. During an invited tutorial at ICDT'97, Heikki Mannila raised ....
H. Mannila. Methods and problems in data mining. In Proc. Int. Conf. on Database Theory, Delphi, Greece, 1997.
....et al. The second sub problem can be solved in main memory in a straightforward manner once all frequent itemsets and their support are known. Hence, the problem of mining association rules is reduced to the problem of finding frequent itemsets. Many algorithms have been proposed in the literature [2, 5, 12, 13, 15, 16, 18]. Although they are very dissimilar, they are all based on the Apriori mining method [2] pruning the subset lattice in order to find frequent itemsets. This relies on the basic properties that all subsets of a frequent itemset are frequent and that all supersets of an infrequent itemset are ....
H. Mannila. Methods and problems in data mining. Proceedings of the 6th Int'l Conference on Database Theory, pp. 41--55 (1997).
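The pruning property mentioned in the excerpt above (all subsets of a frequent itemset are frequent) is what makes a levelwise search possible. The following is a small illustrative sketch of that idea, not the implementation of any of the cited papers: the function name `apriori`, the set-based transaction representation and the `>=` threshold test are assumptions, and the candidate generation is deliberately naive.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set


def apriori(transactions: Sequence[Set[str]],
            minsup: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose frequency is at least minsup, with its frequency."""
    n = len(transactions)
    if n == 0:
        return {}

    def freq(candidate: FrozenSet[str]) -> float:
        # Fraction of transactions containing the whole candidate
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        c = frozenset([item])
        f = freq(c)
        if f >= minsup:
            level[c] = f
    frequent = dict(level)

    k = 1
    while level:
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count step: only the surviving candidates are checked against the data
        level = {}
        for c in candidates:
            f = freq(c)
            if f >= minsup:
                level[c] = f
        frequent.update(level)
    return frequent
```

On the toy baskets `{a, b}`, `{a}`, `{b, c}`, `{a, b, c}` with `minsup = 0.5`, the candidate `{a, b, c}` is discarded in the prune step without touching the data, because its subset `{a, c}` is not frequent.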
....zero at a rate that is a negative exponential function of the square of the difference $b p_l - k$ when $k$ is above $b p_l$. When k is large compared to 1, the probability that the set needs processing approaches 1 at a similar negative exponential rate. 1. Introduction The Apriori Algorithm [1, 2, 3, 8] solves the frequent item sets problem. The algorithm analyzes a data set to determine which combination of items occur together frequently. Consider a store with $|I|$ items where b shoppers each have a single basket. Each shopper selects a set of items for his basket. The input to the Apriori ....
....k, the algorithm determines which sets of items are contained in at least k of the b baskets. The Apriori Algorithm is at the core of various algorithms for data mining problems. The best known such problem is the problem of finding the association rules that hold in a basket items relation [1, 2, 3, 8, 12]. Other data mining problems based on the Apriori Algorithm are discussed in [7, 8, 10, 13, 14]. Let $J_l$ be a subset of size l 1 that is selected from the $|I|$ items. For a particular set $J_l$, define $J_l^{-h}$ to be the set obtained from $J_l$ by omitting element h (a set of size $l - 1$ ....
H. Mannila, Methods and problems in data mining, in Proceedings of the International Conference on Database Theory, pp 41--55, 1997.
.... masses of unanalyzed or underanalyzed data, has become an important research area [6]. Considering a data mining process as a sequence of queries over the data but also generalizations of the data, the so-called theory of the data, has been more or less explicitly used for various mining tasks [8, 11]. Given a language L of patterns (e.g. association rules, data dependencies), the theory of a database r with respect to L and a selection predicate q is the set $\mathrm{Th}(r, L, q) = \{\varphi \in L \mid q(r, \varphi)\}$. The predicate q indicates whether a pattern is considered interesting (e.g. denotes a property ....
Heikki Mannila. Methods and problems in data mining. In Proceedings of the International Conference on Database Theory (ICDT'97), volume 1186 of Lecture Notes in Computer Science, pages 41-55. Springer-Verlag, 1997.
....may involve spreadsheets, statistical packages, graphical tools etc. The above mentioned features are very similar to the relational database facilities (as discussed in the subsection 2.1). Now we will pass to some specific features of KDD. The KDD process is inherently interactive and iterative [8]. According to Mannila, the KDD systems should be seen as an interactive tool, not as automatic analysis systems. An iteration of the basic steps in the KDD process is necessary: The discovered patterns can show that some changes should be made to the data set formation step. Post-processing of ....
Mannila, H., Methods and problems in data mining. Proc. of Int. Conf. on Database Theory, 1997
....support (minsup) and minimum confidence (minconf). Generating association rules involves looking for so-called frequent sets in the data. Indeed, the support of the rule $X \Rightarrow Y$ equals the frequency of the set $\{X, Y\}$. Thus by looking for frequent sets, we can determine the support of each rule [Mannila 1997]. Definition 1 Frequency of an itemset: s(X, D) represents the frequency of itemset X in D, i.e. the fraction of transactions of D that contain X. Definition 2 Frequent itemset: An itemset X is called frequent in D, if $s(X, D) \geq \sigma$ with $\sigma$ the minsup. A typical approach [Agrawal, Mannila, ....
Mannila, H. 1997. Methods and problems in data mining. In Afrati, F.; and Kolaitis, P. eds. Proceedings of the International Conference on Database Theory, Springer-Verlag, 41-55.
....itemsets in that class exceeding a minimum user-defined support threshold. In fact, by using a minimum support threshold, we are certain that all discovered rules are minimally s complete. The discovery of frequent itemsets has been studied extensively in the literature on association rules [Mannila 1997; Agrawal and Srikant 1994; Agrawal, Imielinski and Swami, 1993] of which the following provides a short formal overview. Definition 4 Frequency of an itemset: s(X, D) represents the frequency of itemset X in D, i.e. the fraction of transactions in the database D that contain X. Definition 5 ....
Mannila, H. (1997), "Methods and problems in data mining", Proceedings of the International Conference on Database Theory, 41-55.
....the amounts of data in many databases have grown tremendously large. KDD means the application of nontrivial procedures for identifying effective, coherent, potentially useful, and previously unknown patterns in large databases [13]. The KDD process generally consists of the following three phases [12, 25]. 1) Pre-processing: This consists of all the actions taken before the actual data analysis process starts [12]. Famili et al. think that it may be performed on the data for the following reasons: solving data problems that may prevent us from performing any type of analysis on the data, ....
H. Mannila, "Methods and problems in data mining," The International Conference on Database Theory, 1997.
....necessarily a qualitative one. Several machine learning techniques have been proposed so far, including the induction of decision trees [17], association rules [13], or neural networks [7]. Bratko, Muggleton, and Varsek [1] introduced another interesting approach for deriving qualitative models out of available data. Their work is quite similar to our work with one exception. They use inductive logic programming techniques for deriving QSIM models from data, ....
....we compare the outcome of our methods with the outcome of two machine learning and data mining methods applied to the same data sets. The first method is the ID3 algorithm [17] computing a decision tree from data. The second method computes a set of association rules from data using frequent sets [13]. For the evaluation we convert the available numerical data from the ozone forecasting domain to a qualitative data set using the same mapping as for our neural network approach. Figure 2 depicts the mappings for the parameters ozone, temperature, wind speed, and cloud cover. 5.1. Decision ....
Heikki Mannila. Methods and problems in data mining. In Proceedings of the International Conference on Database Theory, Delphi, Greece, January 1997. Springer-Verlag.
....The big problem is the absence of a general format to represent documents that allows one to associate a semantic meaning to some parts of the documents. We think that the use of the Document Type Definition of XML may help to solve this problem. Our idea is to use data mining techniques [13] to split a set of documents in subsets having a quite similar structure using the structural information present in the document. In particular, two approaches are possible. The first one is to define prototype documents to which the documents are compared. If a document is similar to a ....
H. Mannila. Methods and Problems in Data Mining. In F. Afrati and P. Kolaitis, editors, Database Theory - ICDT'97, pages 41--55, 1997.
....The second subproblem can be solved in main memory in a straightforward manner once all frequent itemsets and their support are known. Hence, the problem of mining association rules is reduced to the problem of finding frequent itemsets. Many algorithms have been proposed in the literature [2, 3, 9, 8, 11, 12, 13]. Although they are very different from each other, they are all based on the Apriori mining method [2]: pruning of the subset lattice for finding frequent itemsets. This relies on the basic properties that all subsets of a frequent itemset are frequent and that all supersets of an infrequent ....
H. Mannila. Methods and problems in data mining. Proceedings of the Int'l Conference on Database Theory, pages 41-55, January 1997.
....traversal subsequences in log data to extract frequently occurring consecutive subsequences. This leads to maximal reference sequence which are those frequent subsequences that are not subset of others. The problem is similar to order version of finding large itemsets in transaction databases [20,21] discussed under mining of association rules. An improvement over maximal forward references which considers backward references is discussed in [22] and uses a transaction model for data extracted from server access logs to discover sequential patterns. This approach combines all the entries for ....
H. Mannila, Methods and Problems in Data Mining. In Proceedings of International Conference on Database theory, Delphi, Greece, January 1997.
....of subsets using techniques like the hot set selection introduced by Data Desk would be very effective. 3 Visualisation and Data Mining Groups of methods for obtaining information from large data sets have recently attracted attention under the common name of Data Mining (Fayyad et al. [1996], Mannila [1997]). Up till now there has not been much use of visualisation in this field and there is little available of any consequence in Data Mining software. This may be because many methods work with only discrete variables (so that continuous variables have to be discretised) and because graphical tools ....
Mannila, H. (1997). Methods and Problems in Data Mining. In Afrati, F. and Kolaitis, P. (Ed.), International Conference on Database Theory. Delphi: Springer.
....itemsets in that class exceeding a minimum user-defined support threshold. In fact, by using a minimum support threshold, we are certain that all discovered rules are minimally s complete. The discovery of frequent itemsets has been studied extensively in the literature on association rules (Mannila, 1997; Agrawal and Srikant, 1994; Agrawal, Imielinski and Swami, 1993) of which the following provides a short formal overview. Definition 4 Frequency of an itemset: s(X, D) represents the frequency of itemset X in D, i.e. the fraction of transactions in the database D that contain X. Definition ....
Mannila, H. (1997), "Methods and problems in data mining", Proceedings of the International Conference on Database Theory, 41-55.
....Minimal Keys in a Relation Instance C. Giannella and C.M. Wyss May 14, 1999 Abstract Mannila [11] cites as an open problem in Data Mining the problem of finding all minimal keys in a relation instance using only time that is polynomial in the number of relation attributes and number of minimal keys and sub quadratic in the size of the relation instance. This paper investigates the efficacy of ....
....of finding all minimal keys given a particular relation instance (without recourse to a pre-existing set of functional dependencies) has barely been touched upon. Yet Mannila cites the latter key finding problem (given only a relation instance) as an important open problem in data mining (see [11]). It is hard, at first glance, to think of any important data mining applications of finding minimal keys given a relation instance. In data mining, we are most often interested in functional dependencies or association rules in large bodies of data, not keys. What would be the advantage of ....
Mannila, Heikki. "Methods and Problems in Data Mining.", Proceedings of International Conference on Database Theory, January 1997, Afrati, Kolaitis (ed.), Springer-Verlag
....of these fields. KDD process is an interactive and iterative multi-step process which uses data mining techniques to extract interesting knowledge according to some specific measures and thresholds. Fayyad et al. [FPSS96a, FPSS96b] and Mannila [Man96, Man97] describe the steps of knowledge discovery as follows: 1. Understanding the domain, the prior knowledge and the goals of end user, 2. creating a target data set, 3. pre-processing the data set (selection of data resources, cleaning the data from errors and noise, handling unknown values, ....
Heikki Mannila. Methods and problems in data mining. In Proceedings of 6th Intl. Conf. on Database Theory (ICDT'97), pages 41--55, Delphi, Greece, January 1997.
....When we know the change of the trees we could reconstruct them by using the information attached. No incremental calculation has been discussed here thus it is impossible to deal with a large amount of information. Knowledge discovery in databases (KDD) is one of the hot research topics (see [Mannila, 1997]). It is nothing but knowledge acquisition activity but differs from using database as background (domain) knowledge. Our works show how to obtain new database schemes that are suitable for current database instances (see [Miura et al. 1996; 1998]). 6 Conclusion In this work, we have proposed ....
Mannila, H. Methods and Problems in Data Mining. Intn'l Conf. on Database Theory (ICDT) (1997).
....MFCS only when a new infrequent itemset is discovered. In fact, MaxClique can be viewed as a special case of our Pincer Search algorithm, if we discard the implementation details. 2.5 Other Related Work General survey papers regarding data mining problems can be found in, e.g. [FPS96], [FPSU96], [M97], [PBKKS97]. In addition to the algorithms discussed so far, there has been extensive research relating to the problem of association rule mining such as [BMS97], [GKMT97], [HCC92], [HF95], [MT96b], [ORS98], [S96], [SA95b], [SA96b], [SVA97], [T96a] and [KMRTB94]. Similar candidate pruning ....
H. Mannila. Methods and problems in data mining (a tutorial). In Proc. of International Conference on Database Theory (ICDT), Jan. 1997.
....Keywords. Data mining, association rules, hypertext systems, trails 1 Introduction Data Mining is an active research field whose significance has recently been increasing on account of its role as a tool to cope with the explosive growth in the amount of data being stored [CY96, FPSS96, Man97, FPSM91]. In fact, the capacity to store data has been increasing at a faster pace than the ability of the tools available to inspect and extract meaningful information from such large amounts of data. Researchers in the field of data mining and Knowledge Discovery are aiming to restrain this ....
Heikki Mannila. Methods and problems in data mining. In F. Afrati and P. Kolaitis, editors, Proc. International Conference on Database Theory (ICDT'97), pages 41--55, Delphi, Greece, January 1997.
....store of the Toysrus chain was within 2 dollars $\pm$ 50 cents throughout 1997. The advantage is that relations over scheme 6 are, on average, twelve times smaller than relations over scheme 4 (as there are twelve months in a year). The significance of condensed representations is discussed in [9]. 3.3 Data Cube Approximation A relation over scheme 3 can be represented as a three-dimensional data cube that maps each triple of $\mathrm{dom}(item) \times \mathrm{dom}(day) \times \mathrm{dom}(store)$ to a member of $\mathrm{dom}(cent)$. The schema transformations discussed in subsections 3.1 and 3.2 are useful in OLAP. A relation ....
H. Mannila. Methods and problems in data mining. In Proceedings of International Conference on Database Theory, Delphi, Greece, 1997.
....to develop DBS-based techniques to meet the requirements of the KDD process. The successful development of such second generation systems should be based on data independence, optimisation of ad hoc queries and affords especially: An embedded query language extensible by user-defined operations [18] which perform or support concrete data mining algorithms [13, 10]. Special query optimisations (efficient cache management [12], metadata [10]) which are adapted to the query patterns of a KDD process [15]. Integration of previously computed query results into query processing [13] ....
H. Mannila. Methods and problems in data mining. In Proceedings of the International Conference on Database Theory. Springer-Verlag, 1997.
....a query flock is a query about its parameters. The result of the flock is not the result of the parametrized query that is used to help specify the flock. 2.1 Our Languages for Flocks The idea of expressing both a query form and a filter condition has been proposed before. For example, Mannila ([Man97]) talks about a logic in which both can be expressed. However, Mannila's formulation puts more in the filter, e.g. one of the items in a market basket must be beer, while for us the role of the filter is limited to a condition about the result of the query. We would simply eliminate one of the ....
H. Mannila, "Methods and problems in data mining," Proc. Intl. Conf. on Database Theory, 1997, pp. 41--55, Springer-Verlag.
....likely be knowledge that has been previously discovered. The output of the knowledge discovery process can be on various formats, depending on the nature of the analysed information. Examples of output formats are frequently occurring patterns, clusterings of database objects, or a set of rules (Mannila 1997). The rough set approach extracts knowledge in the form of rules. 2.3 Rough sets as a data mining tool As described in the previous section, data mining is the core step in the knowledge discovery process. In the data mining step, a tool for extracting rules or patterns from data is needed. ....
Mannila, H. (1997), `Methods and problems in data mining', Proceedings of International Conference on Database Theory, Delphi, Greece.
....many application sectors. Naturally, there are numerous aspects of data mining that are not yet fully clarified and numerous open problems, both from the point of view of the general theory of data mining and as regards the computational efficiency of the algorithms and architectures used [25]. Nevertheless, data mining is a rapidly growing field for the analysis and extraction of data in many application areas. Indeed, analysts at META Group, a US company specializing in information technology market analysis, predict that, ....
Mannila, H., Methods and Problems in Data Mining. Proc. Int. conf. on Database Theory, Delphi, Greece, Springer-Verlag, 1997.
....a query flock is a query about its parameters. The result of the flock is not the result of the parametrized query that is used to help specify the flock. 2.1 Our Languages for Flocks The idea of expressing both a query form and a filter condition has been proposed before. For example, Mannila ([Man97]) talks about a logic in which both can be expressed. However, Mannila's formulation puts more in the filter, e.g. one of the items in a market basket must be beer, while for us the role of the filter is limited to a condition about the result of the query. We would simply eliminate one of the ....
H. Mannila, "Methods and problems in data mining," Proc. Intl. Conf. on Database Theory, 1997, pp. 41--55, Springer-Verlag.
.... $\varphi \in L$ defines a potentially interesting property of r. Therefore, a mining task is to compute the theory of r with respect to L and q, i.e. the set $\mathrm{Th}(L, r, q) = \{\varphi \in L \mid q(r, \varphi) \text{ is true}\}$. A reasonable collection of data mining tasks have already been carried out using this approach (see [7] for a survey). Example 1 Consider the discovery of dependencies that hold in a database. Assume $L_1$ is the language of inclusion dependencies and consider $q_1$ as the satisfaction predicate: let r and s be the relations corresponding to R and S and $\delta = R[X] \subseteq S[Y] \in L_1$, $q_1(r, \delta)$ is ....
H. Mannila. Methods and problems in data mining. In Proc. ICDT'97, Springer-Verlag LNCS 1186, pages 41--55, 1997.
....and weaknesses are well explored [26, 35]. The process of finding useful patterns in data is nowadays referred to as knowledge discovery in databases (KDD) or data mining [6]. Data mining is the process of applying specific algorithms for extracting patterns (models) from data. In the spirit of [7, 13], we regard clustering as one of the essential techniques during a data mining process in order to enable the discovery of useful data patterns. Due to the fact that the various documents comprising the text archive do not lend themselves to immediate analysis some pre-processing with intellectual ....
H. Mannila. Methods and problems in data mining. In Int. Conference on Database Theory, Delphi, Greece, 1997.
....to user-friendly information access, the time consuming tasks of text classification and document interrelation can be performed in a highly automated form. Finally, we want to stress that this approach to classification fits nicely within the current framework of research in data mining [Fayyad96, Mannila97]. Moreover, due to the fast training time this approach is highly promising in a time of huge collections of digital libraries available on the Internet, accessible via the World Wide Web. 5 Related Work Neural networks found some attention for encapsulation of legal knowledge. This might be due ....
H. Mannila, Methods and Problems in Data Mining, Proc. of the Int. Conf. on Database Theory, Delphi, Greece, 1997
....prior to Codd's [8] introduction of the relational model: different data mining problems seem to have little to do with one another, approaches are generally ad hoc, there is no concise or precise means of specifying problems, and so forth. This situation has been observed elsewhere, for example [16, 20]. As with databases when all access was navigational, data mining semantics is often defined by implementation, in this case search. There is no distinction between what is being sought and how the search is being carried out. While some of this admittedly is the result of data mining's diversity, ....
.... Xi With this formulation and C; E pairs is given by C such that sCt j N y (s) N y (t) and E = X ] the desired result is immediate. Proposition 5.2 X i Y jZ iff C 4 E. Xi 5. 3 Association Rules Association rules (AR) have been gaining popularity in both data mining and databases, discussed in [4, 5, 21, 20]. As a concept, AR begins with an instance r over R[A] where each attribute A i 2 A has a boolean domain f ; Gammag. For W A, we write W to mean the set of tuples ftjt 2 r t:A i = for each A i 2 Wg. Without loss of generality, let X = fA 1 ; A k g and Y = fA k 1 ; Am g. ....
Mannila, H. Methods and problems in data mining. In (to appear) Proc. of International Conference on Database Theory, Delphi, Greece (January 1997), F. Afrati and P.Kolaitis, Eds., Springer-Verlag.
....are needed for general purpose query languages [8]. A possible approach is to formulate a data mining task as locating interesting sentences from a given logic that are true in the database. Then the task of the user analyst can be viewed as querying this set, the so-called theory of the database [12]. Discovering knowledge from data, the so-called KDD process, contains several steps: understanding the domain, preparing the data set, discovering patterns, postprocessing of discovered patterns, and putting the results into use. This is a complex interactive and iterative process for which many ....
H. Mannila. Methods and problems in data mining. In ICDT'97, volume 1186 of LNCS, pages 41--55. Springer-Verlag, 1997.
....true}. It is possible to consider generic algorithms to compute such theories following the popular learning as search paradigm. A reasonable collection of data mining tasks (association rules, sequential patterns, data dependencies, etc.) have already been carried out using this approach (see [12] for a survey). Example 2.1. Consider the discovery of dependencies that hold in a database. Assume $L_1$ is the language of inclusion dependencies and consider $q_1$ as the satisfaction predicate: let r and s be the instances of R and S, and $\delta = R[X]$ ....
H. Mannila, Methods and problems in data mining, in: Proc. ICDT'97, Springer-Verlag LNCS 1186, 1997, pp. 41-55.
....needed for general purpose query languages [11]. A possible approach is to formulate a data mining task as locating interesting sentences from a given logic that are true in the database. Then the task of the user analyst can be viewed as querying this set, the so-called theory of the database [15]. Discovering knowledge from data, the so-called KDD process, contains several steps: understanding the domain, preparing the data set, discovering patterns (i.e. computing a theory), postprocessing of discovered patterns, and putting the results into use. This is a complex interactive and ....
H. Mannila. Methods and problems in data mining. In ICDT'97, volume 1186 of LNCS, pages 41--55. Springer-Verlag, 1997.
....of the language is interesting. This definition is quite general: asserting $q(r, \varphi)$ might mean that $\varphi$ is a property that holds, that almost holds, or that defines (in some way) an interesting subgroup of r. This approach has been more or less explicitly used for various data mining tasks (see [12] for a survey and [13] for a detailed study of this setting). Discovering knowledge from data can be seen as a process containing several steps: understanding the domain, preparing the data set, discovering patterns, To appear in Proceedings of the second European Symp. on Principles of Data ....
H. Mannila. Methods and problems in data mining. In ICDT'97, volume 1186 of LNCS, pages 41--55. Springer-Verlag, 1997.
No context found.
MANNILA, H., 1997, "Methods and problems in data mining", In: Proceedings of the 6th International Conference on Database Theory (ICDT'97), LNCS - Lecture Notes in Computer Science, v. 1186, Springer-Verlag, pp. 41-55, Delphi, Greece, Jan.
No context found.
H. Mannila, Methods and problems in data mining, Proc. International Conference on Database Theory, Delphi, Greece, Springer-Verlag, 1997.
No context found.
H. Mannila, "Methods and problems in data mining," In Proceedings of the International Conference on Database Theory, Delphi, Greece, January 1997. Springer-Verlag.
No context found.
H. Mannila, Methods and problems in data mining, in: Proceedings of the International Conference on Database Theory, 1997, pp. 41--55.
No context found.
Heikki Mannila. Methods and problems in data mining. In Foto N. Afrati and Phokion Kolaitis, editors, Database Theory---ICDT'97, 6th International Conference, volume 1186 of Lecture Notes in Computer Science, pages 41--55. Springer, Delphi, Greece, 8--10 January 1997.
No context found.
H. Mannila, Methods and problems in data mining, in Proceedings of the International Conference on Database Theory, pp 41--55, 1997.
No context found.
Mannila, H. Methods and problems in data mining. In Proceedings of the International Conference on Database Theory, Delphi, Greece, January 8-10, 1997, pp. 41-55.
No context found.
Mannila, Heikki (1997). Methods and Problems in Data Mining, Proc. International Conference on Database Theory, Delphi, Greece, January 1997.
No context found.
Mannila, Heikki. "Methods and Problems in Data Mining." Proceedings of the ICDT, pg. 83-99, January 1997.
No context found.
H. Mannila. Methods and problems in data mining. In Proceedings of International Conference on Database Theory. Springer Verlag, 1997.