V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, 1993.
....C4.5 is also discussed in [4]. In a search for an appropriate classification algorithm, efficiency is important. To increase efficiency, some feature selection and data abstraction methods can be tried, by excluding some items from the data set at the beginning of the process, as described in [4, 5]. Acknowledgments. We thank Beğendik Corp. for providing us their basket data, which belongs to one of their supermarkets located in Ankara. ....
V. Dhar, A. Tuzhilin, Abstract-Driven Pattern Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993.
....to it as a measure of coverage of the concept E. 3.2 One-way support. The absolute support AS(H | E) is the other standard measure used for mining association rules [1], called the confidence of the rule E → H. Different names have been given to this measure, including accuracy [13, 29], strength [8, 15, 26], and certainty factor [15]. In the context of information retrieval, the same measure is referred to as precision [32]. Tsumoto and Tanaka [29] used the quantity AS(E | H) for measuring the coverage, or true positive rate. It is regarded as a measure of sensitivity by Klösgen [15]. ....
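To make the two measures in this excerpt concrete, here is a minimal Python sketch that estimates them from raw counts. It assumes transactions are sets of items and that AS(H | E) and AS(E | H) are estimated by relative frequencies; the function name `rule_measures` and the basket data are hypothetical, not taken from the cited paper.

```python
from fractions import Fraction

def rule_measures(transactions, e, h):
    """Support, confidence and coverage of the rule E -> H (hypothetical helper).

    transactions: list of sets of items; e, h: sets of items.
    Exact fractions are returned so results are easy to check by hand.
    """
    n = len(transactions)
    n_e = sum(1 for t in transactions if e <= t)         # transactions containing E
    n_h = sum(1 for t in transactions if h <= t)         # transactions containing H
    n_eh = sum(1 for t in transactions if (e | h) <= t)  # transactions containing both
    if n_e == 0 or n_h == 0:
        raise ValueError("E and H must each occur in at least one transaction")
    return {
        "support": Fraction(n_eh, n),       # fraction of all transactions with E and H
        "confidence": Fraction(n_eh, n_e),  # estimate of P(H | E): accuracy / precision
        "coverage": Fraction(n_eh, n_h),    # estimate of P(E | H): true positive rate / sensitivity
    }

# Hypothetical toy baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"bread"}]
print(rule_measures(baskets, {"bread"}, {"milk"}))
```

On these toy baskets every transaction contains bread, so the rule bread → milk has confidence 1/2 but coverage 1: every milk basket is covered, yet only half of the bread baskets contain milk.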
Dhar, V. and Tuzhilin, A. Abstract-driven pattern discovery in databases, IEEE Transactions on Knowledge and Data Engineering, 5, 926-938, 1993.
....between these sets of attributes. The interestingness or usefulness of the rule is usually measured by some predefined metric function such as confidence and support [2], gain [9], chi-squared value [4], gini [22], entropy gain [23, 22], Laplace [6, 32], lift [16], interest [5], strength [8], and conviction [5]. Several proposals for mining different types of rules according to different types of pre-specified interest metrics have been suggested in the literature. The suggested techniques are fully automatic but need to have predefined tasks. The groundwork of formalizing the ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
....surprising a rule, the more interesting it is for the user. This measure is also called surprisingness. .... the statistics associated with them. Objective measures include the J-measure [Smyth & Goodman 1991; Wang, Tay & Liu 1998], certainty [Hong & Mao 1991], RI [Piatetsky-Shapiro 1991], and strength [Dhar & Tuzhilin 1993]. More recent measures of objective interestingness include R-interestingness [Srikant & Agrawal 1995], intensity of implication [Suzuki & Kodratoff 1998; Guillaume, Guillet & Philipp 1998], and discrimination [Gray & Orlowska 1998]. Kamber and Shinghal [1996] propose specific measures of rule ....
Dhar V., and Tuzhilin A. (1993). Abstract-driven pattern discovery in databases, in IEEE Transactions on Knowledge and Data Engineering, 5(6).
....every 100 seconds. 5.4.3 Biological Pattern Discovery. In this subsection, we present the results of our experiment applying PLinda to a data mining application. We parallelized a sequential biological pattern discovery program [63] using PLinda. These types of data mining applications [1, 23, 35] are interesting because they are compute-intensive and are usually coarse-grained parallel problems. First, we explain the problem and the sequential approach. We then describe our parallel approach and show the performance results. Biological pattern discovery is the problem of finding ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, December 1993.
....other hand, objective measures of rule interestingness are based on the structure of the rules and the statistics associated with them. Objective measures include the J-measure (Smyth and Goodman, 1991; Wang, Tay and Liu, 1998), certainty (Hong and Mao, 1991), RI (Piatetsky-Shapiro, 1991), and strength (Dhar and Tuzhilin, 1993). More recent measures of objective interestingness include R-interestingness (Srikant and Agrawal, 1995), intensity of implication (Suzuki and Kodratoff, 1998; Guillaume, Guillet and Philipp, 1998), and discrimination (Gray and Orlowska, 1998). Kamber and Shinghal (1996) propose specific measures of ....
Dhar V., and Tuzhilin A. (1993). Abstract-driven pattern discovery in databases, in IEEE Transactions on Knowledge and Data Engineering, 5(6).
.... in Data is used for investigating the behaviour of the legacy IS in terms of its operational data and the way that such data is presently being used within the chosen business processes; it can also be used for identifying behavioural patterns which may give rise to new business processes [Dhar and Tuzhilin 1993]. The approach advocated in this paper is underpinned by three key activities: 1. Modelling of enterprise objectives, rules, and processes for describing both the AS-IS and the TO-BE situations. 2. Analysis of legacy IS for discovering the actual behaviour of the system against a set of ....
Dhar, V. and Tuzhilin, A. (1993) Abstract-Driven Pattern Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December, 1993, pp. 926-938.
....if its support is at least minsup. A rule is said to have a large improvement if its improvement is at least minimp. Other measures of predictive ability that are sometimes used to rank and filter rules in place of confidence include lift [9, 15] (which is also known as interest [10] and strength [13]) and conviction [10]. Below we show that these values can each be expressed as a function of the rule's confidence and the frequency of the consequent; further, note that both functions are monotone in confidence: .... Though we frame the remainder of this work in terms of confidence alone, it can be ....
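The relationship stated in this excerpt can be checked numerically. This is a minimal sketch, assuming the standard definitions lift(A → C) = conf(A → C) / fr(C) and conviction(A → C) = (1 − fr(C)) / (1 − conf(A → C)), where fr(C) is the relative frequency of the consequent. The excerpt's own formulas are not reproduced above, so these definitions are assumed rather than quoted.

```python
from fractions import Fraction

def lift(conf, fr_c):
    """lift(A -> C) = conf(A -> C) / fr(C), fr_c being the consequent's frequency."""
    return conf / fr_c

def conviction(conf, fr_c):
    """conviction(A -> C) = (1 - fr(C)) / (1 - conf(A -> C)); unbounded when conf == 1."""
    if conf == 1:
        return float("inf")
    return (1 - fr_c) / (1 - conf)

# With fr(C) held fixed, both measures increase with confidence, so ranking rules
# that share a consequent by lift or conviction gives the same order as confidence.
fr_c = Fraction(2, 5)
for conf in (Fraction(1, 5), Fraction(2, 5), Fraction(3, 5), Fraction(4, 5)):
    print(conf, lift(conf, fr_c), conviction(conf, fr_c))
```

At conf = fr(C), the independence case, both functions evaluate to 1.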
Dhar, V. and Tuzhilin, A. 1993. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6).
....to select a small subset of records sharing some common characteristics. For example, the predicate HIGH_CREDIT(X) could be used to select the top N of customer records based on their credit ratings. A novel approach to field- and record-based focusing is the use of abstracts, as described in [Dhar and Tuzhilin, 1993] in this issue. 3.5. Extracting Patterns. The term pattern refers to any relation among elements of a database, i.e., the records, fields, and values. Simple examples of patterns include: ADMISSION_DATE(x) … RELEASE_DATE(x); if REGION(x) = west then SALES(x) … average(SALES). When such patterns ....
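Record-based focusing with a predicate such as HIGH_CREDIT(X) can be sketched as a simple filter. This is a minimal sketch, assuming a hypothetical numeric credit rating per customer and unique customer names; the `Customer` type and the `high_credit` and `focus` functions are illustrative and not taken from the cited work.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    name: str           # assumed unique (hypothetical record layout)
    credit_rating: int  # hypothetical numeric rating; higher is better

def high_credit(customers, n):
    """Hypothetical HIGH_CREDIT predicate: names of the top-n customers by rating.

    sorted() is stable even with reverse=True, so ties keep their list order.
    """
    ranked = sorted(customers, key=lambda c: c.credit_rating, reverse=True)
    return {c.name for c in ranked[:n]}

def focus(customers, n):
    """Record-based focusing: keep only records satisfying HIGH_CREDIT, in original order."""
    selected = high_credit(customers, n)
    return [c for c in customers if c.name in selected]

records = [Customer("Ann", 720), Customer("Bob", 580), Customer("Cem", 810), Customer("Dee", 650)]
print(focus(records, 2))
```

The focused subset keeps the records in their original order, so later pattern extraction sees the same record sequence minus the filtered-out rows.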
Vasant Dhar and Alexander Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, to appear, 1993.
....obtained by filtering the interesting attributes and by generalizing the attribute values so that two different tuples can be represented by a single summarized tuple. A measure of rule strength has been given; this strength is representative of the rule's generalization power. [4] adds the possibility of aggregating two attributes in order to obtain rules that better fit the user's needs. Some operational OLAP (On-Line Analytical Processing) systems [12], for example, have been developed as an upper layer for relational databases. Their aim is to analyse multivariable data, and their ....
V. Dhar, A. Tuzhilin, Abstract-Driven Pattern Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, pp. 926-938, Vol. 5, No. 6, 1993.
....graphs) but can produce perceptually compelling iconic-style pictures, and the developers of EXVIS argue for its integration into knowledge discovery systems and discuss barriers to such integration [21]. 6.2.3. Related Research. There is a growing body of work in the area of knowledge discovery [2, 7, 16, 17, 20, 22, 24, 25, 29, 30, 34]. While this work shares our goal of extracting information from large databases, most of it has emphasized the data mining approach. This work uses either statistical methods or statistically oriented machine learning algorithms to extract dependencies or correlations from data. The kinds of ....
....where the interest and significance of classes is determined by the user. In addition, there is no emphasis on data viewing or work reuse. However, we consider the domain modelling stage of our work to play the role of integrating multiple database schemas, one goal of this work. Dhar and Tuzhilin [17] take a similar approach to ours in forming new classes. They use a separate data dictionary for users to define conceptual views; these correspond to new concept definitions in IMACS. Their language for forming these views is also customized by the user with user predicates, built up from ....
[Article contains additional citation context not shown here]
Dhar, V. and Tuzhilin, A., Abstract-Driven Pattern Discovery in Databases, Center for Research on Information Systems, NYU, Working Paper Series STERN IS-92-11, March 1992.
....is abstraction. This method assumes the existence of additional knowledge in the form of classification hierarchies on the attributes of the data. Algorithms for abstraction are described by Han et al. [HCC93], and a data mining system employing abstraction methods is described by Dhar and Tuzhilin [DT93]. The following is an abstraction example described in their paper. Example: Abstraction of Credit Data. This example concerns a database for credit card information having the following schema: customer(Name; Addr; Income; Profession; Age; Card_type; Marital_status) transaction(Name; Merchant; ....
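The idea of abstraction over classification hierarchies can be sketched on a few attributes of the customer schema above. This is a minimal sketch, assuming a hypothetical hierarchy on Profession and a hypothetical banding of Age; it is not the example from Dhar and Tuzhilin's paper, whose details are cut off in this excerpt.

```python
from collections import Counter

# Hypothetical classification hierarchy (not from the paper): raw value -> higher-level concept.
PROFESSION_HIERARCHY = {
    "surgeon": "medical", "nurse": "medical",
    "lawyer": "legal", "paralegal": "legal",
}

def age_group(age):
    """Hypothetical abstraction of Age into coarse bands."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"

def abstract_customers(customers):
    """Replace Profession and Age by their abstractions and count how many
    customers fall into each (profession class, age group, card type) cell."""
    return Counter(
        (PROFESSION_HIERARCHY[c["Profession"]], age_group(c["Age"]), c["Card_type"])
        for c in customers
    )

customers = [
    {"Name": "A", "Profession": "surgeon", "Age": 45, "Card_type": "gold"},
    {"Name": "B", "Profession": "nurse", "Age": 52, "Card_type": "gold"},
    {"Name": "C", "Profession": "lawyer", "Age": 28, "Card_type": "classic"},
]
print(abstract_customers(customers))
```

After abstraction, the surgeon and the nurse fall into the same (medical, middle-aged, gold) cell, which is the kind of higher-level regularity that is invisible at the level of raw attribute values.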
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, December 1993.
.... rules which associate a set of generalized attribute properties in a logic implication rule, by integration of attribute-oriented induction and the methods for mining association rules [7, 39, 66, 78]. Moreover, statistical pattern discovery can also be performed using attribute-oriented induction [24]. The core of the attribute-oriented induction technique is on-line data generalization, which is performed by first examining the data distribution for each attribute in the set of relevant data, calculating the corresponding abstraction level that data in each attribute should be generalized to, ....
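The generalize-then-merge loop described in this excerpt can be sketched as follows. This is a simplified sketch, assuming one common control strategy: a per-attribute threshold on the number of distinct values, with each hierarchy given as a hypothetical value-to-parent dictionary. The `aoi` function and the data are illustrative, not the cited authors' implementation.

```python
from collections import Counter

def aoi(tuples, hierarchies, threshold):
    """Simplified attribute-oriented induction (hypothetical helper).

    tuples: list of tuples of attribute values.
    hierarchies: one dict per attribute mapping a value to its parent concept;
        values without a parent stay as they are.
    threshold: maximum number of distinct values allowed per attribute.
    Returns a Counter of generalized tuples with the number of tuples merged into each.
    """
    rows = [list(t) for t in tuples]
    for i, parent in enumerate(hierarchies):
        # Climb one hierarchy level at a time while attribute i has too many distinct values.
        while len({r[i] for r in rows}) > threshold:
            climbed = [parent.get(r[i], r[i]) for r in rows]
            if climbed == [r[i] for r in rows]:
                break  # no higher concept available; full AOI might drop the attribute here
            for r, v in zip(rows, climbed):
                r[i] = v
    # Merge identical generalized tuples, keeping a count for each.
    return Counter(tuple(r) for r in rows)

# Hypothetical data and hierarchies.
data = [("Paris", "junior"), ("Lyon", "junior"), ("Berlin", "senior"), ("Munich", "senior")]
city_h = {"Paris": "France", "Lyon": "France", "Berlin": "Germany",
          "Munich": "Germany", "France": "Europe", "Germany": "Europe"}
rank_h = {}
print(aoi(data, [city_h, rank_h], threshold=2))
```

Lowering the threshold forces attributes further up their hierarchies, trading detail for a smaller, more general set of summarized tuples.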
....in the schema item(id; name; category; producer; date_made; cost; price), {category, producer, date_made} ≺ {category, date_made} indicates that the former forms a lower-level concept than the latter. Moreover, rules and view definitions can also be used as the definitions of concept hierarchies [24]. Conceptual hierarchies for numerical or ordered attributes can be generated automatically based on the analysis of data distributions in the set of relevant data [38]. Moreover, a given hierarchy may not be best suited for a particular data mining task. Therefore, such hierarchies should be ....
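Automatic generation of a hierarchy for a numerical attribute from its data distribution can be illustrated with equal-frequency (quantile) binning. This is a minimal sketch using that one simple technique; it is not necessarily the method of [38]. The function names and the income values are hypothetical.

```python
def equal_frequency_cuts(values, k):
    """k-1 cut points splitting the sorted values into k groups of (nearly) equal size.

    Assumes 1 <= k <= len(values); heavily repeated values can yield duplicate cuts.
    """
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def bin_of(value, cuts):
    """Index of the interval containing value: interval j holds cuts[j-1] <= v < cuts[j]."""
    return sum(1 for c in cuts if value >= c)

def numeric_hierarchy(values, levels=(4, 2)):
    """Hypothetical multi-level hierarchy: cut points for each level, finest first.

    When each coarser count divides the finer one (4 and 2 here), the coarser
    cut points are a subset of the finer ones, so the intervals nest.
    """
    return {k: equal_frequency_cuts(values, k) for k in levels}

incomes = [10, 20, 30, 40, 50, 60, 70, 80]
print(numeric_hierarchy(incomes))
```

With the hypothetical incomes above, an income of 45 falls in the second of four fine intervals and the first of two coarse intervals, and the coarse interval contains the fine one.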
V. Dhar and A. Tuzhilin. Abstract-Driven Pattern Discovery in Databases. IEEE Transactions on Knowledge and Data Engineering, pages 926--938, December 1993.
....subjective measures. Objective measures typically involve analyzing the discovered patterns' structures, their predictive performance, and their statistical significance [4, 8, 12]. Examples of objective measures are coverage, certainty factor, strength, statistical significance, and simplicity [3, 6, 8, 11]. It has been noted in [11], however, that objective measures are insufficient for determining the interestingness of the discovered patterns; subjective measures are needed. The two main subjective measures are unexpectedness and actionability [4, 11]. [8] defined pattern interestingness in terms ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Trans. Knowl. Data Eng. 5(6), 1993.
....much research done in recent years quantifying the interestingness of a rule, and several metrics have been proposed and used as a result of this work. Among objective metrics, besides confidence and support [8], there are gain [36], variance and chi-squared value [59], gini [58], strength [27], conviction [20], sc- and pc-optimality [13], etc. Subjective metrics include unexpectedness [73, 53, 79, 60] and actionability [66, 73, 1]. Any of these metrics can be used as a part of the interestingness-based filtering tool, and the validation system can support different interestingness ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
....done in recent years quantifying the interestingness of a rule, and several metrics have been proposed and used as a result of this work. Among objective metrics, besides confidence and support [AIS93], there are gain [FMMT96], variance and chi-squared value [Mor98], gini [MFM+98], strength [DT93], conviction [BMUT97], sc- and pc-optimality [BA99], etc. Subjective metrics include unexpectedness [ST96b, LH96, Suz97, PT98] and actionability [PSM94, ST96b, AT97]. Any of these metrics can be used as a part of the interestingness-based filtering operator, and the validation system can ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
....the discovery process and help focus on the discovery of unexpected patterns. Second, user-defined beliefs are crucial for the discovery process in some applications, such as Weblog applications. In these applications, important patterns are often expressed in terms of the user-defined vocabulary [DT93], and beliefs provide the means for identifying this vocabulary and driving the discovery processes. As explained in the introduction, we do not describe how to generate an initial system of beliefs. To generate such beliefs, we use the methods described in [ST96b]. However, more work needs to be ....
Dhar, V., and Tuzhilin, A., 1993. Abstract-Driven Pattern Discovery in Databases. IEEE Transactions on Knowledge and Data Engineering, December 1993.
....of this paper, it does not matter how the KDS works and what the structure of the discovered patterns is. Therefore, we will treat the KDS as a black box and assume that a pattern is an arbitrary first-order sentence expressed in terms of the schema of the database or the user-defined vocabulary [2]. It has been recognized in the knowledge discovery literature that a discovery system can generate a glut of patterns, most of which are of no interest to .... To avoid ....
[This work was supported in part by the NSF under Grant IRI-93-18773. ¹This is a simplified version of Figure 1-1 from [3].]
....A → B is usually defined as a function of p(A), p(B), and p(A ∧ B), where p(α) is the probability that condition α is true. Typical examples of such objective measures of the interestingness of a rule are its information content based on the J-measure [10], a certainty factor [4], and a strength [2]. It has been noted in [8] that objective measures of interestingness, although useful in many respects, usually do not capture all the complexities of the pattern discovery process, and that subjective measures of interestingness are needed to define the interestingness of a pattern. These ....
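As an instance of an objective measure computed only from p(A), p(B), and p(A ∧ B), here is a minimal sketch of the J-measure in the form usually attributed to Smyth and Goodman, applied to the rule A → B. It is an illustrative implementation, not the code of the cited paper; the function names are hypothetical, and it assumes p(A) > 0 and 0 < p(B) < 1.

```python
import math

def _term(p, q):
    """p * log2(p / q), with the convention 0 * log(0 / q) = 0."""
    return 0.0 if p == 0 else p * math.log2(p / q)

def j_measure(p_a, p_b, p_ab):
    """J-measure of the rule A -> B from p(A), p(B) and p(A and B).

    Assumes p_a > 0 and 0 < p_b < 1 (hypothetical helper).
    """
    p_b_given_a = p_ab / p_a
    # Divergence of the posterior p(B|A) from the prior p(B) over the two outcomes
    # of B, weighted by how often the left-hand side A holds.
    j = _term(p_b_given_a, p_b) + _term(1 - p_b_given_a, 1 - p_b)
    return p_a * j

print(j_measure(0.5, 0.5, 0.5))
```

When A and B are independent, p(B | A) = p(B) and the measure is 0; it grows as the rule's conclusion departs from the prior and as the rule applies more often.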
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
....an interactive process between the end user and a KDD system. Then the role of the end user is to analyze the patterns discovered by the system and provide feedback on where the KDD system should focus the search for new patterns. For example, in the abstract-driven discovery framework [DT93], the end user should interactively provide information about the regions of the search space on which the discovery system should concentrate. This concept of interactive communication between the end user and the discovery search engine is graphically presented in Figure 2. The end user ....
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
....generate a large number of strong rules that are objectively interesting but of little interest to the user. To address this problem, in [8] the authors propose a system of templates, which are rules expressed not in terms of the attributes of the data but in terms of the user-defined vocabulary [4] that is defined in terms of the data attributes. Then a pattern (rule) is interesting if it matches a restrictive template [8]. In their work, Klemettinen et al. bring the users into the discovery process by letting them specify the templates. However, [8] does not address the question of what ....
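Matching rules stated in data attributes against a template stated in a user-defined vocabulary can be sketched as follows. This is a deliberately simplified sketch, not Klemettinen et al.'s template language: it assumes a hypothetical item-to-class vocabulary and a single template of the form "antecedent classes ⇒ consequent class", and the names `VOCABULARY` and `matches` are illustrative.

```python
# Hypothetical user-defined vocabulary: raw data items -> user-level classes.
VOCABULARY = {
    "apple": "fruit", "pear": "fruit",
    "milk": "dairy", "cheese": "dairy",
    "beer": "beverage",
}

def matches(rule, template, vocab=VOCABULARY):
    """Simplified template match (hypothetical helper).

    rule: (set of antecedent items, consequent item), in data terms.
    template: (set of allowed antecedent classes, required consequent class),
        in vocabulary terms. The rule matches if every antecedent item maps to an
        allowed class and the consequent maps to the required class.
    """
    antecedent, consequent = rule
    allowed, target = template
    return (all(vocab.get(item) in allowed for item in antecedent)
            and vocab.get(consequent) == target)

template = ({"fruit"}, "dairy")  # "fruit items => a dairy item"
rules = [({"apple", "pear"}, "milk"), ({"apple", "beer"}, "cheese"), ({"pear"}, "beer")]
print([r for r in rules if matches(r, template)])
```

Because the template is written in the user's vocabulary, one template covers every combination of concrete items in those classes, which is what lets users steer the filter without enumerating attribute values.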
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993.
No context found.
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, 1993.
No context found.
Dhar, V., Tuzhilin, A.: Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering 5 (1993)
No context found.
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, 1993.
No context found.
V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):926--938, 1993.
No context found.
V. Dhar and A. Tuzhilin, Abstract-Driven Pattern Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December 1993.
CiteSeer.IST - Copyright Penn State and NEC