K.C.C. Chan and A. K. C. Wong, "A statistical technique for extracting classificatory knowledge from databases," in G. Piatetsky-Shapiro and W. J. Frawley, eds., Knowledge Discovery in Databases, Menlo Park, CA: AAAI/MIT, 1991, pp. 107-124.
....version space for the same concepts should be p: 20] The search space for attribute-oriented generalization is much smaller than the one for tuple-oriented generalization. Similar arguments hold for the attribute-oriented method in comparison with other tuple-oriented approaches [13, 14]. An attribute-oriented generalization requires the testing of redundant tuples in processing, which is performed after the generalization of all the values on each attribute. In contrast, a tuple-oriented approach requires the testing for concept coverage, which should be performed after each ....
....from the learned quantitative rules by pruning the tuples which are below the specified pruning thresholds. Moreover, rules relevant to a subset of the previously studied set of attributes can often be extracted directly from the previously learned rules. We examine one such example. Example 4: Suppose the learning task is to characterize the professors in Applied Sciences relevant to the attributes Age and Salary only. The learning task is almost the same as the task posed in Example 1, except that it is relevant to a subset of previously studied set of attributes. Instead of ....
K.C.C. Chan and A. K. C. Wong, "A statistical technique for extracting classificatory knowledge from databases," in G. Piatetsky-Shapiro and W. J. Frawley, eds., Knowledge Discovery in Databases, Menlo Park, CA: AAAI/MIT, 1991, pp. 107-124.
....applied to a number of real world datasets, to see if they offer improved efficiency or usefulness compared to visualisation without any form of data management. Two of the algorithms are based on decision trees (Agrawal et al. 1992; Quinlan, 1986), one algorithm is based on statistical tables (Chan and Wong, 1991), and one algorithm is based on rough sets (Ziarko, 1991). All four algorithms build their classification rules from a user-supplied training set. The decision tree algorithms use chi-squared tests to maximize information gain at each level in the tree; leaves in the tree hold a single ....
Chan, K. C. C. and Wong, A. K. C. (1991). A statistical technique for extracting classificatory knowledge from databases. In Piatetsky-Shapiro, G. and Frawley, W. J., editors, Knowledge Discovery in Databases, pages 107--123. AAAI Press/MIT Press, Menlo Park, California.
....in r 0 is generalized to or just below its attribute generalization threshold. Exceptional data often occur in a large relation. It is important to consider exceptional cases when learning in databases. Statistical information helps learning from examples to handle exceptions and or noisy data [15, 18]. A special attribute, count, can be added to each generalized relation to register the number of tuples in the original relation which are generalized to the current tuple in the generalized relation. The attribute count carries database statistics and supports the pruning of scattered data and ....
....among classes) Since relational operations are set oriented and have been implemented efficiently in many existing systems, our approach is not only efficient but easily exported to many relational systems. Our approach has absorbed many advanced features of recently developed learning algorithms [27, 18]. As shown in our study, attribute oriented induction can learn disjunctive rules and handle exceptional cases elegantly by incorporating statistical techniques in the learning process. Moreover, incremental learning has been developed in many learning algorithms [15, 28] When a new tuple is ....
K. C. C. Chan and A. K. C. Wong. A statistical technique for extracting classificatory knowledge from databases. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 107--124. AAAI/MIT Press, 1991.
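The count attribute described in the context above can be illustrated with a minimal Python sketch. It assumes each attribute has a one-level concept hierarchy given as a plain dictionary; the names generalize_with_count and prune_by_count, the threshold parameter, and the sample data are hypothetical and are not taken from the cited papers.

```python
from collections import Counter


def generalize_with_count(rows, hierarchies):
    """Generalize each attribute value and merge identical tuples.

    rows        -- iterable of tuples, one value per attribute
    hierarchies -- one dict per attribute mapping a value to its parent
                   concept (hypothetical one-level hierarchy); values
                   without an entry are kept unchanged
    Returns a Counter mapping each generalized tuple to its count, i.e.
    the number of original tuples that were generalized to it.
    """
    counts = Counter()
    for row in rows:
        generalized = tuple(
            hierarchies[i].get(value, value) for i, value in enumerate(row)
        )
        counts[generalized] += 1
    return counts


def prune_by_count(counts, min_count):
    """Drop generalized tuples supported by fewer than min_count tuples."""
    return {row: c for row, c in counts.items() if c >= min_count}


rows = [
    ("biology", "junior"),
    ("chemistry", "junior"),
    ("physics", "senior"),
    ("history", "junior"),
]
hierarchies = [
    {"biology": "science", "chemistry": "science",
     "physics": "science", "history": "art"},
    {},  # second attribute is left ungeneralized in this example
]
counts = generalize_with_count(rows, hierarchies)
frequent = prune_by_count(counts, min_count=2)
```

Here the count plays the role of the database statistic: scattered generalized tuples with a low count can be pruned, while the remaining counts could be turned into proportions for quantitative rules.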
....the correlation between the decision classes and the other features, the importance score is used to heuristically verify the relation between them. In some cases, the Chi square test was not of help in discovering such a good correlation. More details on the Chi square method can be found in (Chan, 1991; Imam, 1993) The performance of a feature subset and the predictive accuracy of the AQ produced classification rules are measured by applying a fitness function which will be described in later sections. The AQ algorithm The AQ algorithm is a rule induction technique used to produce a complete ....
Chan, K. C., and Wong, A. K., "A Statistical Technique for Extracting Classificatory Knowledge from Databases", Knowledge Discovery In Databases, Piatetsky-Shapiro, G., Frawley, W., (Eds.), AAAI Press, 1991.
....defined. Section V harmonizes the discovered rules with the existing rules. Finally, the contributions of this paper and future work are described in Section VI. II. Related Work Until recently, the only methodology available about reasoning from databases has been based on statistical methods [2,20]. For example, Smyth and Goodman [20] have introduced the form of the probabilistic rule IF Y = y then X = x with probability p, in which the probability p(x | y) is added to the rule. However, statistical or probabilistic methods are not always efficient for evolutionary systems because of ....
....a notion of justifying rules to suggest the refinements of the rules. Piatetsky Shapiro [16] has discussed the expected accuracy of the discovered rules by using a statistical function about the number of tuples. There is a significant difference between our approach and the previous research [2,3,6,9,11,20,21] The previous research has concentrated on how artificial intelligence can help in knowledge discovery, without considering the characteristics of databases. As shown in the diagonal vector (2 ) of the following diagram, logically, if new rules X are discovered from a database S (i.e. S X ) ....
K. C. Chan and A. K. Wong. A statistical technique for extracting classificatory knowledge from databases. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 107--124. MIT Press, 1991.
.... such as database systems, machine learning, intelligent information systems, statistics, data warehousing and knowledge acquisition in expert systems [4] It may be noted that data mining is different from the goals and emphases of the individual fields, though it may heavily use their results [5, 3, 6, 7, 8]. In the following we present the basic differences (and or similarities) between a data mining problem and research interests of the various allied fields. In developing database systems to manage uncertain (or imprecise) information as well as certain (or precise) information, several extensions ....
.... The last few years have seen an increasing use of techniques in data mining that draw upon or are based on statistics; namely, in feature selection [12] data dependency involving two variables for constructing data dependency networks [13, 14] classification of objects based on descriptions [7], discretization of continuous values [13, 15] data summarization [14] predicting missing values [16] etc. The motivation behind this trend can be explained by the fact that statistical techniques for data analysis are well developed and in some cases, we do not have any other means to apply. ....
K. C. C. Chan and A. K. C. Wong, "A statistical technique for extracting classificatory knowledge from databases," in Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. J. Frawley, eds.), pp. 107--123, Cambridge, MA: AAAI/MIT, 1991.
....tuple (cat, 4) would pass through the interval containing cat (in the root node) then through the interval containing 4 (in the child node) and be assigned a classification value of medium. 7. 5 Statistical Tables Chan and Wong describe a data classification technique based on statistical theory [Chan and Wong, 1991]. The user provides an initial training set, with each tuple classified using one of P possible values fc 1 ; c P g. Each tuple consists of N attributes A 1 ; AN , where each domain(A i ) fa i;j j j = 1; n i g has n i possible values. The algorithm associates an attribute ....
Chan, K. C. C. and Wong, A. K. C. (1991). A statistical technique for extracting classificatory knowledge from databases. In Knowledge Discovery in Databases, Piatetsky-Shapiro, G. and Frawley, W. J., Eds., 107--123. AAAI Press/MIT Press, Menlo Park, California.
.... [35] Spatial data mining, or knowledge discovery in spatial database, refers to the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases [34] Previous works in machine learning [17, 38, 39] database systems [50, 51] and statistics [9, 19, 31, 47] laid the foundation for research into knowledge discovery in databases. Also, advances in spatial databases, such as spatial data structures [22, 23, 46] spatial reasoning [10, 12] computational geometry [43] etc. paved the way for the study of spatial data mining. A crucial challenge to ....
D. K. Y. Chiu, A. K. C. Wong, and B. Cheung. A Statistical Technique for Extracting Classificatory Knowledge from Databases. In Piatetsky-Shapiro and Frawley [43], pp. 125--141.
....data warehousing and knowledge acquisition in expert systems [4]. It may be noted that data mining is a distinct discipline and its objectives are different from the goals and emphases of the individual fields. Data mining may, however, heavily use theories and developments of these fields [5, 3, 6, 7, 8]. In the following we present basic differences (and/or similarities) between data mining and various allied research areas. In developing database systems to manage uncertain (or imprecise) information as well as certain (or precise) information, several extensions to relational model have been ....
.... The last few years have seen an increasing use of techniques in data mining that draw upon or are based on statistics; namely, in feature selection [12] data dependency involving two variables for constructing data dependency networks [13, 14] classification of objects based on descriptions [7], discretization of continuous values [13, 15] data summarization [14] predicting missing values [16] etc. The motivation behind this trend can be explained by the fact that statistical techniques for data analysis are well developed and in some cases, we do not have any other means to apply. ....
K. C. C. Chan and A. K. C. Wong, "A statistical technique for extracting classificatory knowledge from databases," in Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. J. Frawley, eds.), pp. 107--123, Cambridge, MA: AAAI/MIT, 1991.
....of objects according to certain criterion. A commonly accepted object clustering criterion is the principle of conceptual clustering [18] clustering a set of objects in an attempt to maximize intraclass similarity and inter class differences. Data clustering has been studied in statistics [11], machine learning [13, 18, 19] and databases [42, 15] with different methods and different emphases. Previous approaches, probability based (like most approaches in machine learning) or distance based (like many methods in statistics) do not adequately consider the cases the data sets can be ....
K. C. C. Chan and A. K. C. Wong. A statistical technique for extracting classificatory knowledge from databases. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 107--124. AAAI/MIT Press, 1991.
....graphs) but can produce perceptually compelling iconic style pictures, and the developers of EXVIS argue for its integration into knowledge discovery systems and discuss barriers to such integration [21] 6.2.3. Related Research There is a growing body of work in the area of knowledge discovery [2, 7, 16, 17, 20, 22, 24, 25, 29, 30, 34]. While this work shares our goal of extracting information from large databases, most of it has emphasized the data mining approach. This work uses either statistical methods or statistically oriented machine learning algorithms to extract dependencies or correlations from data. The kinds of ....
....mining approach. This work uses either statistical methods or statistically oriented machine learning algorithms to extract dependencies or correlations from data. The kinds of knowledge that are sought include unobserved functional relationships from empirical data [34] classification knowledge [16], and probabilistic rules that can be used for describing the data and predicting characteristics of new data based on discovered dependencies. The underlying assumption is that the data includes hidden dependencies that can be discovered even though the data is noisy . For example, the Knowledge ....
Chan, K. C. C., and Wong, A. K. C., A Statistical Technique for Extracting Classification Knowledge from Databases, in [30].
....test our hypothesis, we implemented four existing techniques, then tested them to see if they offered improved efficiency or usefulness compared to visualization without any form of data management. We chose two algorithms based on decision trees [1, 11] one algorithm based on statistical tables [2], and one algorithm based on rough sets [24] All four data mining algorithms build their classification rules from a user supplied training set. The decision tree algorithms begin by identifying significant attributes using chi squared tests. The attribute that provides the largest information ....
CHAN, K. C. C., AND WONG, A. K. C. A statistical technique for extracting classificatory knowledge from databases. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. J. Frawley, Eds. AAAI Press/MIT Press, Menlo Park, California, 1991, pp. 107-- 123.
....and to select the best generalized rules by domain experts and/or users. Exceptional data often occur in a large relation. It is important to consider exceptional cases when learning in databases. Statistical information helps learning from examples to handle exceptions and/or noisy data [3, 15]. A special attribute, vote, can be added to each generalized relation to register the number of tuples in the original relation which are generalized to the current tuple in the generalized relation. The attribute vote carries database statistics and supports the pruning of scattered data and the ....
....classes) Since relational operations are set-oriented and have been implemented efficiently in many existing systems, our approach is not only efficient but easily exported to many relational systems. Our approach has absorbed many advanced features of recently developed learning algorithms [3, 13]. As shown in our study, attribute-oriented induction can learn disjunctive rules and handle exceptional cases elegantly by incorporating statistical techniques in the learning process. Moreover, when a new tuple is inserted into a database relation, rather than restarting the learning process from ....
K. C. C. Chan and A. K. C. Wong, A Statistical Technique for Extracting Classificatory Knowledge from Databases, in G. Piatetsky-Shapiro and W. J. Frawley (eds.), Knowledge Discovery in Databases, AAAI/MIT Press, 1991, 107-124.
.... and stored for efficient data retrieval and knowledge discovery [8, 11, 14] In order to represent general characteristics at a high concept level, attribute concept hierarchies should be provided by domain experts or constructed automatically or semiautomatically by data statistical analysis [15]. In our algorithms, an attribute hierarchy is represented by a function c parent(attri val ) which returns a parent (high level) concept for a given attribute value. A spatial hierarchy may be represented by two functions, s parent(obj ) which returns the parent node of the object obj , and ....
A. K. C. Wong, A Statistical Technique for Extracting Classificatory Knowledge from Databases, in G. Piatetsky-Shapiro and W. J. Frawley (eds.), Knowledge Discovery in Databases, AAAI/MIT Press, 1991, 107-124.
....in spatial DB adopts a learning from examples approach which treats the task relevant data as examples for learning processes and relies mainly on the generalization process. There have been many studies on machine learning [5, 6] and some recent studies on knowledge discovery in large databases [3, 7, 9, 10, 12, 16]. These studies set up the foundation for knowledge discovery in spatial databases. Recently, an attribute oriented approach has been developed for discovery of different kinds of knowledge rules in relational databases [9] Moreover, a multi resolution relational data model has been developed ....
K. C. C. Chan and A. K. C. Wong, A Statistical Technique for Extracting Classificatory Knowledge from Databases, in G. Piatetsky-Shapiro and W. J. Frawley (eds.), Knowledge Discovery in Databases, AAAI/MIT Press, 1991, 107-124.
K.C.C. Chan and A.K.C. Wong, "A Statistical Technique for Extracting Classificatory Knowledge from Databases," in G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Knowledge Discovery in Databases, Menlo Park, CA; Cambridge, MA: AAAI/MIT Press, 1991, pp. 107-123.
K. C. C. Chan, and A. K. C. Wong, "A Statistical Technique for Extracting Classificatory Knowledge from Databases", in G. Piatetsky-Shapiro, and W. J. Frawley (Eds.), Knowledge Discovery in Databases, AAAI/MIT Press, 1991, pp. 107-123.
....also have X. If both support and confidence are greater than the user-supplied thresholds, the association is considered interesting. A weakness of these approaches lies in the difficulty in deciding what these thresholds should be. To overcome this problem, F-APACS utilizes adjusted difference [3 5] analysis to identify interesting associations among attributes. Unlike other data mining algorithms (e.g. [1 2, 10, 15 16]), the use of this technique has the advantage that it does not require any user-supplied thresholds, which are often hard to determine. Furthermore, F-APACS also has the ....
....categorical attributes. 3 F-APACS for Mining Fuzzy Association Rules In this section, we describe a novel algorithm, called F-APACS, which makes use of linguistic terms to represent the regularities and exceptions discovered from databases. Furthermore, F-APACS also employs adjusted difference [3 5] analysis to identify interesting associations among attributes. The definition of linguistic terms is presented in Section 3.1. An overview of F-APACS is then given in Section 3.2. After that, we describe how interesting associations can be identified in Section 3.3. A confidence measure, called ....
K.C.C. Chan, and A.K.C. Wong, "A Statistical Technique for Extracting Classificatory Knowledge from Databases," in [14], pp. 107-123.
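The threshold-based notion of interestingness that the context above contrasts with adjusted difference analysis can be sketched as follows. This is a minimal illustration of the standard support and confidence measures over crisp records, assuming each record is a Python set of items; the function name support_confidence and the sample records are hypothetical.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records    -- non-empty list of sets of items (assumed representation)
    antecedent -- set of items on the left-hand side of the rule
    consequent -- set of items on the right-hand side of the rule
    """
    n = len(records)
    # Records that contain the antecedent.
    with_antecedent = sum(1 for r in records if antecedent <= r)
    # Records that contain both antecedent and consequent.
    with_both = sum(
        1 for r in records if antecedent <= r and consequent <= r
    )
    support = with_both / n
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


records = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]
support, confidence = support_confidence(records, {"a"}, {"b"})
```

Whether this rule counts as interesting then depends entirely on the two user-supplied cut-offs, which is the weakness the citing paper points to.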
....on fuzzy set theory and hence we call the rules having these terms fuzzy association rules. Unlike other data mining algorithms (e.g. [1, 11]), which utilize user-supplied thresholds to identify interesting associations, FARM employs an objective interestingness measure, called adjusted difference [2 5], to mine fuzzy association rules. The use of this technique has the advantage that it does not require any user-supplied thresholds, which are often hard to determine. Furthermore, FARM also has the advantage that it allows us to discover both positive and negative association rules. A positive ....
....3.2 Identification of Interesting Associations In order to decide whether the association between a linguistic term, $L_{jk}$, and another linguistic term, $L_{pq}$, is interesting, we employ the adjusted difference [2 5], which is defined as $d_{L_{pq}L_{jk}} = z_{L_{pq}L_{jk}} / \sqrt{\gamma_{L_{pq}L_{jk}}}$ (1), where $z_{L_{pq}L_{jk}}$ is the standardized difference [2 5] given by $z_{L_{pq}L_{jk}} = (deg_{L_{pq}L_{jk}} - e_{L_{pq}L_{jk}}) / \sqrt{e_{L_{pq}L_{jk}}}$ (2), and $e_{L_{pq}L_{jk}}$ is the sum of degrees to which records are expected to be characterized by $L_{pq}$ and $L_{jk}$ and is calculated by ....
K.C.C. Chan and A.K.C. Wong, "A Statistical Technique for Extracting Classificatory Knowledge from Databases," in [10], pp. 107-123.
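The adjusted difference referred to in the contexts above can be illustrated on an ordinary contingency table of crisp counts. The sketch below computes the standardized residual and divides it by the square root of the usual maximum-likelihood variance estimate (one minus the row proportion, times one minus the column proportion). This crisp-count form is an assumption made for illustration; the fuzzy version in the citing paper sums degrees of membership instead of counts, and the function name adjusted_differences is hypothetical.

```python
import math


def adjusted_differences(table):
    """Return the matrix of adjusted differences for a table of counts.

    table -- list of rows of observed counts; every row and column total
             is assumed to be positive and smaller than the grand total,
             so that no division by zero occurs.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            # Standardized difference.
            z = (observed - expected) / math.sqrt(expected)
            # Estimated variance of z.
            variance = (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append(z / math.sqrt(variance))
        result.append(out_row)
    return result


d = adjusted_differences([[30, 10], [10, 30]])
```

Because the adjusted difference is approximately standard normal under independence, its magnitude can be compared against a fixed percentile of that distribution (1.96 for the 95% level is a common choice, assumed here) rather than against a user-chosen support or confidence cut-off. A positive value suggests a positive association and a negative value a negative one.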
....also have X. If both support and confidence are greater than the user-supplied thresholds, the association is considered interesting. A weakness of these approaches lies in the difficulty in deciding what these thresholds should be. To overcome this problem, F-APACS utilizes adjusted difference [2 5] analysis to identify interesting associations among attributes. The use of this technique has the advantage that it does not require any user-supplied thresholds, which are often hard to determine. Furthermore, F-APACS also has the advantage that it allows us to discover both positive and negative ....
....rules. A positive association rule tells us that a record having a certain characteristic will also have another characteristic, whereas a negative association rule tells us that a record having a certain characteristic will not have another characteristic. Many data mining algorithms (e.g. [1 2, 4 5, 10 12]) require the class labels (conclusions of rules) to be crisp, and the variables representing the class labels are therefore qualitative. As a result, quantitative values cannot be inferred from those rules. To be more effective, F-APACS is able to deal with class boundaries that are fuzzy and to ....
K.C.C. Chan, and A.K.C. Wong, "A Statistical Technique for Extracting Classificatory Knowledge from Databases," in [9], pp. 107-123.
Chiu, D. K. Y., A. K. C. Wong, and B. Cheung. A Statistical Technique for Extracting Classificatory Knowledge from Databases. In Piatetsky-Shapiro and Frawley, pp. 125--141. Daly, P. and P. Misra. "GPS Global Navigation Satellite System (GLONASS)." Chapter 9 in Global Positioning System: Theory and application (The Blue Book) 1996.