Matheus, C. J.; Chan, P. K.; and Piatetsky-Shapiro, G. 1993. Systems for knowledge discovery in databases. IEEE Transactions on Knowledge and Data Engineering 5(6):903--913. Special Issue on Learning and Discovery in Knowledge-Based Databases.
....in traditional approaches like OLS, using methods of dummy variables and orthogonal polynomials [17]. The proposed GA-based decile maximization approach is equally applicable with more sophisticated representations. Rule-based representations capable of discerning complex patterns in the data [3, 15, 23] can be particularly advantageous. Genetic algorithms have been successfully applied in learning rules comprising logical combinations of attribute value restrictions [7, 12-14]. Such nonlinear representations can be expected to exhibit superior performance across file depths, and models ....
C.J. MATHEUS, P.K. CHAN, and G. PIATETSKY-SHAPIRO, 1993. Systems for Knowledge Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering 5, 903--913.
....applications to satisfy complex mining requirements. Psaila [17] uses operators to execute the KDD process in the AMORE system, exhibiting tight coupling between the KDD process and SQL-based database systems. Architectures for several KDD systems have been reported by researchers. Matheus et al. [16] present a model of an idealized KDD system and describe the way its components handle the requirements for knowledge discovery in real-life applications. The DBMiner system tightly integrates On-Line Analytical Processing (OLAP) with a wide spectrum of data mining functions [11]. Mineset, ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. IEEE Trans. on Knowledge and Data Engineering, 5(6), Dec 1993.
....rapid access to huge amounts of heterogeneous information in a distributed environment without any relocation, restructuring, or reformatting of data. Many researchers have investigated the use of metadata to support run time access to the original information [1,3,8,9,11,12,13,19,26]. Others [5,11,21,27] have investigated the use of data mining for the automatic extraction of metadata. We refine and synthesize some of the ideas contained in these efforts to provide advanced search and browsing capabilities without ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases", IEEE Trans. on Knowledge and Data Eng., Dec. 1993.
....interest in data mining. To perform data mining on these huge databases, the data mining algorithms must be scalable and efficient. The running time of the algorithms must be predictable and acceptable in very large databases. Sampling and focusing are some of the solutions to overcome this problem [5]. There are several different data mining problems, based on different kinds of knowledge that we can mine from databases [5]. These problems include mining of association rules, classification rules, clustering, similarity search, mining of path traversal patterns etc. This paper describes design ....
....The running time of the algorithms must be predictable and acceptable in very large databases. Sampling and focusing are some of the solutions to overcome this problem [5]. There are several different data mining problems, based on different kinds of knowledge that we can mine from databases [5]. These problems include mining of association rules, classification rules, clustering, similarity search, mining of path traversal patterns etc. This paper describes design of a Decision Tree Classifier which uses dynamic pruning and is scalable to a large training set. To evaluate splits, gini ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE TKDE, 5(6), Dec 1993.
....process. In actual practice it may contain loops between any two steps. A set of integrated procedures, processes and possibly human experts, involved in the process of KDD constitute a KDD System. Chen et al. [9] throw light on the requirements and the goals of a KDD system. Matheus et al. [33] present a model of a KDD system which handles all the practical problems of real-world databases, viz. noisy, incomplete, redundant and even sparse information. The different components of the system are based on certain basic functions easily identifiable in all knowledge discovery tasks. ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. IEEE TKDE, 5(6), Dec 1993.
....of a KDD system. 3.4. To show discovered knowledge in an appropriate way. The system must present the discovered knowledge in a useful and understandable way for the user. For example, it may generate reports using natural language templates with the user's terms, as the CoverStory system does (Matheus et al. 1993). Also at this point external domain knowledge is required to decide the most convenient output for the system. 4. Autonomy versus versatility, and the role of data dependencies. The help of an experienced analyst who knows both the tool and the application domain well is needed to profit from ....
....for example, they are used for database design and normalization, and for query optimization. All this justifies the interest of developing both better methods to perform the dependency analysis across all types of data, and extraction algorithms that make a greater use of the dependency networks (Matheus et al. 1993). It is equally important to develop tools for presenting the results to the user. But knowledge from data dependencies, that is, implicit domain knowledge, is not the only one needed to accomplish the KDD task. External domain knowledge is also necessary. In this way, better methods for ....
Matheus C.J., Chan P.K. and Piatetsky-Shapiro G. (1993). Systems for Knowledge Discovery in Databases. IEEE Transactions on Knowledge and Data Engineering. Vol. 5, No. 6, 903-913.
....step can result in changes in preceding or succeeding steps. Furthermore, the nature of a real world data set, which may contain noisy, incomplete, dynamic, redundant, continuous, and missing values, certainly makes all steps critical on the path going from data to knowledge (Deogun et al. 1997; Matheus, Chan, and Piatetsky-Shapiro, 1993). One of the methods in data mining step is inductive learning, which is mostly concerned with finding general descriptions of a concept from a set of training examples. Practical data mining tools generally employ a number of inductive learning algorithms. For example, Silicon Graphics data ....
....to handle uncertainty. Generally, uncertainty-tolerant classification requires relaxing the constraint that the induced descriptions must classify the consistent part of training data (Clark and Niblett, 1989), which is equivalent to saying that classification methods should generate almost true rules (Matheus, Chan, and Piatetsky-Shapiro, 1993). Considering this point, a noise-tolerant version of the ILA metric has been developed. The above idea is also supported by one of the guiding principles of soft computing: Exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, and low ....
Matheus, C.J., P.K. Chan, and G. Piatetsky-Shapiro. 1993. Systems for Knowledge Discovery in Databases. IEEE Trans. on Knowledge and Data Engineering 5(6): 903-913.
....by humans. Knowledge discovery systems in databases [1] are designed to analyze the data, find regularities in the data (knowledge) and present it to humans in understandable formats. One of the goals of knowledge discovery in databases is to extract explicit concepts from the raw data [2, 1, 3, 4, 5]. The existence of numeric data and large amounts of records in a database poses a challenging task toward this goal due to the huge data space determined by numeric attributes. This paper introduces a method that reduces numeric data vertically and horizontally, keeps the discriminating power of ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. on Knowledge and Data Engineering, 5(6), December 1993.
....The pruning is performed by cutting the appropriate tree in the SLD tree. In practice this means that after performing unfolding, the pruning can be realized by removing the literal corresponding to the sub tree to be removed. (Footnote 3: SPECialization by TRansformation and Elimination. Footnote 4: M P 0 denotes the least Herbrand model as described in [7].) A more extensive description of the SPECTRE algorithm is provided by Bostrom and Idestam-Almquist [1]. In the current paper these experiments will be extended by research involving the application of the SPECTRE algorithm on a real world domain (See ....
Matheus, C.J., Chan, P.K. and Piatetsky-Shapiro, G., "Systems for Knowledge Discovery in Databases." IEEE Transactions on Knowledge and Data Engineering, volume 5, number 6 (1993), 903--913
....that the particular knowledge acquisition or machine learning tools can work with. Some of the issues which need to be addressed within the KDD context include noise, redundant information, missing values and attributes, large data sets and sparse data (Frawley, Piatetsky-Shapiro and Matheus 1992, Matheus, Chan and Piatetsky-Shapiro 1993). Redundancy can result from the inclusion of attributes and records in the source data which are irrelevant or superfluous to the data mining. Determining which attributes and records are redundant is usually difficult. Removing apparently irrelevant attributes can lead to a reduction in the ....
....knowledge from the user's viewpoint. KDD requires that the discovered knowledge be useful in some sense. The Pattern Evaluation function governs the selection of the patterns which are of interest to the user and has been identified by others as an integral part of KDD (Frawley et al. 1992, Matheus et al. 1993, Fayyad, Piatetsky-Shapiro and Smyth 1996). Pattern evaluation is also important in reducing the search space (Holsheimer, Kersten, Mannila and Toivonen 1995). Formally, a Pattern Evaluation function F is a function that maps from a set of statements expressed in some language L (e.g. ....
Matheus, C. J., Chan, P. K. and Piatetsky-Shapiro, G.: 1993, Systems for knowledge discovery in databases, IEEE Transactions on Knowledge and Data Engineering 5(6), 903--913.
....restructuring, or reformatting of data. InfoHarness is aimed at facilitating individual and enterprise productivity by harnessing existing and new information assets. Many researchers have investigated the use of meta data to support runtime access to the original information [2,3,10,12]. Others [8,12] have investigated the use of data mining for the automatic extraction of meta data. Our own work develops and synthesizes some of the ideas contained in these efforts to provide advanced search and browsing capabilities without imposing constraints on information suppliers or creators. ....
....J. Remde, M. Lesk, C. Lochbaum, and D. Ketchum, "Enhancing the usability of text through computer delivery and formative evaluation: the SuperBook Project", in C. McKnight, A. Dillon, and J. Richardson (eds.), Hypertext: A Psychological Perspective, Chichester: Ellis Horwood, 1993, pp. 71-136. [8] C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, December 1993. [9] John R. Rymer, "Distributed Object Computing", Distributed Computing Monitor, Vol. 8, No. 8, Boston, 1993. [10] A. Sheth and ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, December 1993.
....be transformed into a format that the particular data mining tools can work with. Some of the issues which need to be addressed within the KDD context include noise, redundant information, missing values and attributes, large data sets and sparse data (Frawley, Piatetsky-Shapiro and Matheus 1992, Matheus, Chan and Piatetsky-Shapiro 1993). Redundancy can result from the inclusion of attributes and records in the source data which are irrelevant or superfluous to the data mining. Determining which attributes and records are redundant is usually difficult. Removing apparently irrelevant attributes can lead to a reduction in the ....
....knowledge from the user's viewpoint. KDD requires that the discovered knowledge be useful in some sense. The Pattern Evaluation function governs the selection of the patterns which are of interest to the user and has been identified by others as an integral part of KDD (Frawley et al. 1992, Matheus et al. 1993, Fayyad, Piatetsky-Shapiro and Smyth 1996). Pattern evaluation is also important in reducing the search space (Holsheimer, Kersten, Mannila and Toivonen 1995). Formally, a Pattern Evaluation function F is a function that maps from a set of statements expressed in L (e.g. production rules) to a ....
Matheus, C. J., Chan, P. K. and Piatetsky-Shapiro, G.: 1993, Systems for knowledge discovery in databases, IEEE Transactions on Knowledge and Data Engineering 5(6), 903--913.
....whose initiative belongs to different agents as separate mutually interfering plans. If we prefer to take the plot as a single plan, we may concentrate on the viewpoint of one agent, such as the (database owner) corporation itself. After a database has been put to use, a vital knowledge discovery [MCP93] opportunity is offered by the study of plots: to create a library of typical plans employed by users to reach successfully (and also, optionally, without success) their identifiable goals. Several techniques are applicable for combining analogous plots to build the patterns of the typical plans, ....
Matheus, C. J., Chan, P. K. and Piatetsky-Shapiro, G. - "Systems for knowledge discovery in databases". IEEE Transactions on Knowledge and Data Engineering, 5, 6, 1993.
....more meaningful groups. Most ILP systems can be used with different quantities and qualities of background knowledge. Adding background knowledge to the specialization problem has the advantage that the search process can be biased to raise the efficiency of the search process (see [11], [7], [5], [13], [3]). When databases grow larger this topic becomes of more interest. Background knowledge can be elicited by performing Knowledge Acquisition and is normally represented in a symbolical form. (This is one of the reasons for our approach to represent the examples symbolically as well.) The ....
....can only be mentioned here due to space constraints. As mentioned above, adding background knowledge might increase efficiency and accuracy of the search algorithm by biased searching and avoiding (known) local maxima. Focussing is found to be a useful process in the stage of learning from databases (see [7] for more on this topic). Interviews and reports resulted in a preliminary model of the processes involved and their connections. Partly this knowledge was selected to provide knowledge implemented in the form of intermediate predicates, whereas another part of this knowledge was used to focus the ....
Matheus, C.J., Chan, P.K. and Piatetsky-Shapiro, G., "Systems for Knowledge Discovery in Databases." IEEE Transactions on Knowledge and Data Engineering, volume 5, number 6 (1993), 903--913
.... 1994; Loucopoulos 1994; Loucopoulos and Katsouli 1992; Loucopoulos and Kavakli 1995; Rolland and Grosz 1994]. The term knowledge discovery in data refers to correlations between data variables, identification of rules, and classifications implicitly contained in large amounts of corporate data [Matheus, Chan, et al. 1993; Yoon and Kerschberg 1993]. Enterprise knowledge modelling provides the basis for developing models of current business processes and objectives for change, including changes to the business goals and business rules. Knowledge Discovery in Data is used for investigating the behaviour of the ....
Matheus, C.J., Chan, P.K. and Piatetsky-Shapiro, G. (1993) Systems for Knowledge Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December, 1993, pp. 903-913.
....intractable without computer assistance and powerful analytical tools. Standard computer-based statistical and analytical packages alone, however, are of limited benefit without the guidance of trained statisticians to apply them correctly and the domain experts to filter and interpret the results [MCP93]. Data mining has been ranked as one of the most promising topics for research for the 1990s by both database and machine learning researchers [SSU91]. William Frawley and his colleagues [FPM91] give a definition of knowledge as follows: Given a set of facts (data) F , a language L, and some ....
....data and storing knowledge rules. The algorithms in this system do not take advantage of database implementation techniques in the learning process. 2.4.2 KDW System. Like INLEN, the Knowledge Discovery Workbench (KDW) is a collection of tools for the interactive analysis of large databases [MCP93]. Its components have evolved through three versions (KDW, KDW II, and KDW ) all of which provide a graphical user interface to a suite of tools for accessing database tables, creating new fields, defining a focus, plotting data and results, applying discovery algorithms and handling domain ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro (1993). Systems for Knowledge Discovery in Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5(6), 903-913.
....be analyzed, though they contain a potential gold mine of valuable information. Unfortunately, the database technology of today offers little functionality to explore such data. At the same time, knowledge discovery techniques (see footnote 1) for intelligent data analysis are not yet mature for large data sets [23, 2, 49, 10, 25]. Furthermore, the fact that data has been organized and collected around the needs of organizational activities may pose a real difficulty in locating relevant data for knowledge discovery techniques from diverse sources. The .... (Footnote 1: Knowledge discovery can be defined as the nontrivial extraction of ....)
....of the data that should be addressed by a KDD system. Then we consider the case where a KDD system has a DBMS interface. Second, we present a taxonomy of database mining queries, which is not exhaustive; however, it constitutes an interesting subset of the ones cited in the literature [1, 4, 19, 25, 36]. (Footnote 2: In the literature, the database mining problem is also known as data mining or knowledge discovery in databases.) 2.1 Characteristic features of the database mining problem. 1. Ultra Large Data: The volume of data in real world database systems has already reached the level of giga ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. on Knowledge and Data Engineering, 5(6):903--913, 1993.
....this data, creating a need for automated analysis of databases. Knowledge Discovery in Databases (KDD) [6], which is also referred to as data mining, is the process of identifying valid, novel, and potentially useful knowledge from databases. Several KDD tasks have been defined in the literature [10], e.g. class identification, classification and dependency analysis. The task considered in this paper is class identification, i.e. the grouping of the objects of a database into meaningful subclasses. In this paper, the objects to be clustered are points in multidimensional space. These points ....
Matheus, C. J., Chan, P. K., Piatetsky-Shapiro, G. (1993), "Systems for Knowledge Discovery in Databases", IEEE Trans. on Knowledge and Data Engineering, Vol. 5, No. 6, 903--913.
....[6] [7]. The work done in data mining focuses on the semi-automatic extraction of knowledge. In all mentioned areas, important advances have been made over the last years. Many novel data mining techniques have been developed and several advanced data mining systems have been implemented [1] [8]. Nowadays, however, only a limited number of approaches work for very large amounts of data (millions of data items) and little interest has been given to noisy data [8]. Examples for techniques that work for very large data sets are DHP [9], Apriori [10], and DBLearn [11], and examples for ....
....years. Many novel data mining techniques have been developed and several advanced data mining systems have been implemented [1] [8]. Nowadays, however, only a limited number of approaches work for very large amounts of data (millions of data items) and little interest has been given to noisy data [8]. Examples for techniques that work for very large data sets are DHP [9], Apriori [10], and DBLearn [11], and examples for techniques that also work for noisy data are DBLearn [11] and CLARANS [12]. An interesting observation is that all mentioned techniques work fully automatically but need to ....
C. J. Matheus, P. K. Chan, G. Piatetsky-Shapiro: `Systems for Knowledge Discovery in Databases', IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 903-913, 1993.
....design processes over again and very few, if any, of these steps are automated. Thus, the grand challenge of KDD is to automatically process large quantities of raw data, identify the most significant and meaningful patterns, and present these as knowledge appropriate for achieving a user's goals (Matheus, Chan, and Piatetsky-Shapiro, 1993). A piece of knowledge is a relationship or pattern among data elements that is potentially interesting and useful. In general, discovery means finding something that is hidden or previously unknown. A knowledge discovery system, then, is a system that can discover knowledge. When a knowledge ....
....interesting and useful. In general, discovery means finding something that is hidden or previously unknown. A knowledge discovery system, then, is a system that can discover knowledge. When a knowledge discovery system operates on data in a large, real-world database, it becomes a KDD system (Matheus, Chan, and Piatetsky-Shapiro, 1993) or a data mining system. Unfortunately, the relational database technology of today offers little functionality to explore data in such a fashion. At the same time KD techniques for intelligent data analysis are not yet mature for large real-world databases, the contents of which may be of poor ....
Matheus, C. J., Chan, P. K. & Piatetsky-Shapiro, G. (1993). Systems for knowledge discovery in databases, IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, 903-913.
....Chapter 1 Introduction. With the rapid growth in size and number of available databases in commercial, industrial, administrative and other applications, it is necessary and interesting to examine how to extract knowledge automatically from huge amounts of data [33, 72, 36]. For example, the Wal-Mart databases collect 20 million transactions every day. Knowledge Discovery in Databases (KDD), or data mining, is the effort to understand, analyze, and eventually make use of the huge volume of data available. Through the extraction of knowledge in databases, large ....
....1.2. A data mining session is usually an interactive process of data mining query submission, task analysis, data collection from the database, interesting pattern search, and findings presentation. 1.1 Data Mining Tasks. There have been many interesting studies on knowledge discovery in databases [33, 72, 87]. These studies cover a wide variety of data mining tasks and use different methodologies. The most common types of data mining tasks, classified based on the kind of knowledge they are looking for, are listed as follows. A survey of different methodological approaches to KDD, including machine ....
C. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. Knowledge and Data Engineering, 5:903--913, 1993.
....automatic fashion, rather than developing individual applications for each user's need. Unfortunately, the database technology of today offers little functionality to explore data in such a fashion. At the same time KD techniques for intelligent data analysis are not yet mature for large data sets [3]. Furthermore, the fact that data has been organized and collected around the needs of organizational activities may pose a real difficulty in locating relevant data for knowledge discovery techniques from diverse sources. The data mining problem is defined to emphasize the challenges of ....
.... such as database systems, machine learning, intelligent information systems, statistics, data warehousing and knowledge acquisition in expert systems [4]. It may be noted that data mining is different from the goals and emphases of the individual fields, though it may heavily use their results [5, 3, 6, 7, 8]. In the following we present the basic differences (and/or similarities) between a data mining problem and research interests of the various allied fields. In developing database systems to manage uncertain (or imprecise) information as well as certain (or precise) information, several extensions ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, pp. 903--913, 1993.
....inference generalizes precise inference in that it models the ability to infer sets of values or inference chunks. It is widespread in precise databases and plays an important role in knowledge discovery and database security. Imprecise inference analysis can mine rules from database data [4]. In security applications, it helps determine whether or not a database system is free from inference attacks [3,5,6,8]. This paper considers the problem of catalytic inference analysis. Common sense knowledge possessed by users must be included explicitly in database inference analysis. Such ....
C. Matheus, P. Chan and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 903-913, 1993.
....provides services for progressively describing the result of the activity in an object-oriented manner, thanks to the proposed representation model and the associated schema construction primitives. We recently began investigating the potential benefits of mixing Knowledge Discovery mechanisms [13] and program analysis in the re-engineering and the reverse engineering of software systems. 4 Conceptual Integration. The reverse engineering step, as applied to every individual system in the legacy systems, delivers abstract individual representations. The aim of the conceptual integration step ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. IEEE Trans. on Knowledge and Data Engineering, 5(6):903--913, December 1993.
....1.2 Spatial Data Mining Architecture. Various architectures (models) have been proposed for data mining. They include Han's architecture for the general data mining prototype DBLEARN/DBMINER [24], Holsheimer et al.'s parallel architecture [29], and Matheus et al.'s multicomponent architecture [37]. Almost all of these architectures have been used or extended to handle spatial data mining. Matheus et al.'s architecture seems to be very general and has been used by other researchers in spatial data mining, including Ester et al. [13]. This architecture comparable to others is presented ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. In IEEE Trans. Knowledge and Data Engineering, 5:903--913, 1993.
....automatic fashion, rather than developing individual applications for each user need. Unfortunately, the database technology of today offers little functionality to explore data in such a fashion. At the same time KD techniques for intelligent data analysis are not yet mature for large data sets [3]. Furthermore, the fact that data has been organized and collected around the needs of organizational activities may pose a real difficulty in locating relevant data for knowledge discovery techniques from diverse sources. The data mining problem is defined to emphasize the challenges of ....
....data warehousing and knowledge acquisition in expert systems [4]. It may be noted that data mining is a distinct discipline and its objectives are different from the goals and emphases of the individual fields. Data mining may, however, heavily use theories and developments of these fields [5, 3, 6, 7, 8]. In the following we present basic differences (and/or similarities) between data mining and various allied research areas. In developing database systems to manage uncertain (or imprecise) information as well as certain (or precise) information, several extensions to the relational model have been ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, pp. 903--912, 1993.
....including object oriented, deductive, and spatial database systems. The implementation status of DBMiner, a system prototype which applies the method, is also reported here. 16.1 Introduction With an upsurge of the application demands and research activities on knowledge discovery in databases (Matheus, Chan and Piatetsky Shapiro 1993; Piatetsky Shapiro and Frawley 1991) an attribute oriented induction method (Cai, Cercone and Han 1991; Han, Cai and Cercone 1993) has been developed as an interesting technique for mining knowledge from data. The method integrates a machine learning paradigm (Michalski 1983) especially ....
....mechanism leads to an efficient implementation of a data mining system. Besides further advance of our study on the attribute oriented induction methodology, we are also investigating other data mining methods (Agrawal, Imielinski and Swami 1993; Chu and Chiang 1994; Kivinen and Mannila 1994; Matheus, Chan and PiatetskyShapiro 1993; Michalski et al. 1992; Piatetsky Shapiro and Frawley 1991; Shen et al. 1994; Uthurusamy, Fayyad and Spangler 1991; Ziarko 1994) and working on the integrated method for discovery of various kinds of knowledge in different kinds of database and information systems. Acknowledgements Research is ....
Matheus, C., Chan, P. K. and Piatetsky-Shapiro, G. 1993. Systems for knowledge discovery in databases. IEEE Trans. Knowledge and Data Engineering, 5(6):903--913.
....values ranges) or information chunks [8,9]. It is widespread even in precise databases and has important applications in knowledge discovery and database security. By systematically analyzing the extent of inference in databases, it is possible to mine new facts and rules from database data [1,10,12,14]. In security applications, inference analysis helps determine whether or not a database system is free from potential inference compromises [9,13,15,18,19]. Examples of imprecise inference compromises range from identifying an individual's salary within a few thousand dollars based on mortgage ....
C. Matheus, P. Chan and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 903-913, 1993.
....will become increasingly tedious as the size of the database grows. If the robot can discover associations and regularities in the data, these can be used both to make expedient predictions about new objects and to index database objects for efficient retrieval. Current knowledge discovery systems (Matheus et al. 1993) use a highly restricted knowledge representation language for representing patterns and regularities: they can only capture associations among different features of objects. A new generation of knowledge discovery tools now deals with structured concepts: they capture associations between ....
Matheus, C. J.; Chan, P. K.; and Piatetsky-Shapiro, G. 1993. Systems for knowledge discovery in databases. IEEE Trans. Knowledge and Data Engineering 5(6):903--913.
....hold for large databases. Furthermore, the runtime of CLARANS is prohibitive on large databases. In general, the issue of interfacing KDD systems with a database management system (DBMS) has received little attention in the KDD literature and many systems are not yet integrated with a DBMS (cf. [MCP 93]). [MCP 93] proposes an architecture of a KDD system including a DBMS interface and a focusing component. Well-known techniques are, e.g., focusing on a small subset of all tuples or focusing on a subset of all attributes. [AIS 93] presents a set of basic operations for solving different KDD tasks and shows ....
Matheus C.J., Chan P.K., Piatetsky-Shapiro G.: "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, Vol.5, No.6, 1993, pp. 903-913.
....in preceding or succeeding steps. Furthermore, the nature of a large, real-world data set, which may contain noisy, incomplete, dynamic, redundant, sparse, and missing values, certainly requires that existing techniques and approaches be extended to cope with such problems (Deogun et al. 1996; Matheus, Chan & Piatetsky-Shapiro, 1993). This paper uses the rough set model to address issues related to some aspects of real-world data and investigates the interactions between feature selection algorithms and rough classifiers. The potential for using rough set methodology for investigating problems relating to very large and dynamic ....
Matheus, C. J., Chan, P. K. & Piatetsky-Shapiro, G. (1993). Systems for knowledge discovery in databases, IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, 903-913.
....knowledge discovery becomes more and more important in spatial databases. Knowledge discovery in databases (KDD) is the non-trivial extraction of implicit, previously unknown, and potentially useful information from databases [FPM 91]. A wide variety of algorithms have been proposed for KDD. [MCP 93] tries to classify these algorithms and identifies the following generic tasks: class identification, i.e. grouping the objects of the database into meaningful subclasses; classification, i.e. finding rules that describe the partition of the database into a given set of classes; dependency ....
Matheus C.J., Chan P.K., Piatetsky-Shapiro G.: "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, Vol.5, No.6, 1993, pp. 903-913.
....rules that generalize well can be safely applied to the application database with unknown classes to determine each tuple's class. This problem has been widely studied by researchers in the AI field [28]. It has recently been reexamined by database researchers in the context of large database systems [5, 7, 14, 15, 13]. Two basic approaches to the classification problems studied by AI researchers are the symbolic approach and the connectionist approach. The symbolic approach is based on decision trees and the connectionist approach mainly uses neural networks. In general, neural networks give a lower ....
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. on Knowledge and Data Engineering, 5(6), December 1993.
....of an item. 3.2.4 Analysis Function Similarity in analysis (sim a ) can be considered as a way of knowledge mining, i.e. a way of finding interesting relations within an information base (Cook and Holder, 1994; Milosavljevic and Jurka, 1993; Arikawa et al. 1993; Parsaye et al. 1991; Matheus, Chan and Piatetsky-Shapiro, 1993; Frawley and Piatetsky-Shapiro, 1991; Langley, Simon and Bradshaw, 1990; Hamilton, 1990). Formally, we define analysis function as follows: Definition 3.8 (Similarity in analysis) Given an information base Delta, then sim a (var SI ; var Omega ; Delta) implies that Omega 0 2 var Omega iff ....
Matheus, C. J., Chan, P. K., and Piatetsky-Shapiro, G. (1993). Systems for knowledge discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):903--913.
....agreements on nomenclature over the sites. These agreements already exist in a few communities, notably libraries for bibliographic data, and in electronic commerce (Electronic Data Interchange or EDI, Becker 1990). A particular example of a new class of service is data mining on a data warehouse (Matheus et al. 1993). This technology uses algorithms related to SQL aggregation to identify patterns in large data sets. It is very computation-intensive and relies on successive data reductions on the server to extract a relatively small pattern. A group of major retailers could make their data warehouses available ....
Matheus, C.J., Chan, P.K. and Piatetsky-Shapiro, G. (1993) "Systems for knowledge discovery in databases" IEEE Transactions on Knowledge and Data Engineering Vol. 5, No. 6, pp. 903-913.
....which do not require any preliminary or additional information about data, as opposed to rough sets in probabilistic approximation spaces. We use upper classifiers and decision tables to address some aspects of very large data that can be listed as redundant, incomplete, noisy, and dynamic data [14]. In the rough set literature, the terms inconsistent and nondeterministic decision algorithms (or rules) are used interchangeably [15, 16] though they are different concepts. As shown in Deogun et al. [17], inconsistent decision algorithms, under an appropriate representation structure, can ....
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, pp. 903--913, 1993.
....precise inference in that it models the inference of sets of values. Imprecise inference is prevalent even in precise databases and plays an important role in knowledge discovery and database security. Imprecise inference analysis can be used to mine rule-based knowledge from database data [1,7,15]. Also, it can be used to determine whether or not a database system is free from inference attacks [5,8,9,12]. This paper considers the important problem of catalytic inference analysis. Operating under the closed world assumption requires that a priori knowledge possessed by users be ....
C. Matheus, P. Chan and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 903-913, 1993.
....discovery and database security control. 1 Introduction The problem of analyzing inference in databases is interesting and important, especially in knowledge discovery and database security. Many researchers have shown that knowledge can be mined from precise relational databases (see, e.g. [1,10,11,13,18,25]). One facet of this knowledge is a collection of rules obtained by characterizing dependencies between condition and consequent attributes. Systematic database mining is a rich source of rule-structured knowledge. By analyzing the extent of inference in databases, it is possible to extract new ....
C. Matheus, P. Chan and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 903-913, 1993.
....by a learning algorithm when it is presented with large amounts of data, which results in intolerable performance or inability of the algorithm to execute. More importantly, machine learning is central to knowledge discovery in databases/data mining (KDD/DM) (Piatetsky-Shapiro & Frawley, 1991; Matheus et al. 1993) systems. In most cases research in this area is faced with massive databases. That is, learning systems are facing vast amounts of information and scaling them up is a critical issue facing machine learning research. In the next section we explore the relationship between inductive learning and ....
Matheus, C., Chan, P., & Piatetsky-Shapiro, G. (1993). Systems for knowledge discovery in databases. IEEE Trans. Knowledge and Data Engineering, 5(6), 903--913.
....coming age of very large network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real-world importance. The Grand Challenges of HPCC [20] are perhaps the best examples. Learning techniques are central to knowledge discovery [11] and the approach proposed here may substantially increase the amount of data a Knowledge Discovery system can handle effectively. Quinlan [14] approached the problem of efficiently applying learning systems to data that are substantially larger than available main memory with a windowing ....
C. Matheus, P. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. Know. Data. Eng., 1993. To appear.
....direct human inspection or are overlooked due to the sheer volume of data. Various machine learning algorithms have been proposed to learn descriptive relationships and uncover rules that classify and explain what the data mean. These techniques are central to automated knowledge discovery systems [12], which generally also contain statistical data analysis tools. Attaining high prediction accuracy is the primary goal of most of the work on inductive learning techniques (techniques that use minimal or no domain knowledge and are further discussed in Section 2) and is also the focus of our ....
C. Matheus, P. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. Know. Data. Eng., 1993. To appear.
Matheus, C. J.; Chan, P. K.; and Piatetsky-Shapiro, G. 1993. Systems for knowledge discovery in databases. IEEE Transactions on Knowledge and Data Engineering 5(6):903--913. Special Issue on Learning and Discovery in Knowledge-Based Databases.
C.J. Matheus, P.K. Chan, and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Transactions on Knowledge and Data Engineering, vol. 5, pp. 903--913, 1993.
Matheus, C. J., Chan, P. K., Piatetsky-Shapiro, G., (1993), "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, December, pp. 903-913.
Matheus, C. J., P. K. Chan, and G. Piatetsky-Shapiro. Systems for Knowledge Discovery in Databases. In IEEE Trans. Knowledge and Data Engineering, pp. 903-913, 1993.
Matheus C., Chan P.K. and Piatetsky-Shapiro G., 1993. Systems for knowledge discovery in databases, IEEE Transactions on Knowledge and Data Engineering, 5(6).
C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro. Systems for knowledge discovery in databases. IEEE Trans. Knowl. & Data Eng., 5(6), 1993.
CiteSeer.IST - Copyright Penn State and NEC