U. Fayyad, D. Haussler, and P. Stolorz. Mining science data. Communications of the ACM, 39(11), 1996.
....could leave the reader in a state of doubt: with a path littered with so many explosive mines, is there any hope to data mining endeavors? The answer is a resounding yes. Indeed, many data mining activities have resulted in great results. For examples of applications in science data analysis, see [10], and in industry, see [4]. The primary reasons for hope in data mining include: Database Regularity. Most databases exhibit a good degree of regularity and do not approach the theoretically possible levels of difficulty. For example, consider data on user supermarket shopping, or equivalently ....
....seem to do just fine. Frequent experience is that greedy heuristic algorithms that examine one dimension at a time, locally ignoring interactions between dimensions, appear to perform nicely over real world data sets. This surprising fact is responsible for many of the successes of data mining [10, 4]. Sampling can help. It may not be necessary or desirable to work over the entire contents of a large database. If the model being built is simple, then using lots of data may not be called for. Recall that if all one is doing is estimating the average value of a column, then a small sample of a ....
[Article contains additional citation context not shown here]
U. Fayyad, D. Haussler, and P. Stolorz. Mining science data. Communications of ACM, 39(11), 1996.
....measure among different cases and the location of cluster centres and corresponding boundaries. Like the nearest neighbour technique, clustering is completely dependent on the distance measure to be applied. Clustering is an important application area for many fields including data mining [47], statistical data analysis [75, 49], data compression or reduction [161], psychology, biology, sociology, and business applications [16]. For instance, clustering techniques can be used in marketing for finding customer groups; to find patients with similar profiles in the health care ....
Fayyad U., Haussler D., and Stolorz P., "Mining Science Data". Communications of the ACM, 39(11), 1996.
....based on our similarity query execution algorithms are discussed. The performance of these techniques is evaluated in Sec. 7. Finally, Sec. 8 concludes the paper and provides an overview of our future plans. 2 Related Work Clustering has been used in a number of areas such as statistics [18, 11], pattern recognition [14, 10] and machine learning [12], to name a few. In addition, there has been a significant amount of work in the area of content based extraction [15, 1]. In this paper, we attempt to merge these two areas to develop techniques to improve the performance of similarity ....
U. Fayyad, D. Haussler, and P. Stolorz. Mining Science Data. Communications of the ACM, 39(11), 1996.
....of our similarity query execution algorithms are presented in Sec. 5. The performance of these techniques is evaluated in Sec. 6. Finally, Sec. 7 concludes the paper and provides an overview of our future plans. 2 Related Work Clustering has been used in a number of areas such as statistics [18, 11], pattern recognition [14, 10] and machine learning [12], to name a few. In addition, there has been a significant amount of work in the area of content based extraction [15, 1]. In this paper, we attempt to merge these two areas to develop techniques to improve the performance of similarity ....
U. Fayyad, D. Haussler, and P. Stolorz. Mining Science Data. Communications of the ACM, 39(11), 1996.
....of these techniques is evaluated in Sec. 7. Finally, Sec. 8 concludes the paper and provides an overview of our future plans. (Footnote 1: To simplify, here we assume a uniform distribution of objects into clusters.) 2 Related Work Clustering has been used in a number of areas such as statistics [17, 10], pattern recognition [13, 9] and machine learning [11], to name a few. In addition, there has been a significant amount of work in the area of content based extraction [15, 1]. In this paper, we attempt to merge these two areas to develop techniques to improve the performance of similarity ....
U. Fayyad, D. Haussler, and P. Stolorz. Mining Science Data. Communications of the ACM, 39(11), 1996.
....in memory implementations to large databases. 1 Preliminaries and Motivation Data clustering is important in many fields, including data mining [FPSU96], statistical data analysis [KR89,BR93], compression [ZRL97], and vector quantization [DH73]. Applications include data analysis and modeling [FDW97,FHS96], image segmentation, marketing, fraud detection, predictive modeling, data summarization, general data reporting tasks, data cleaning and exploratory data analysis [B 96]. Clustering is a crucial data mining step and performing this task over large databases is essential. Scaling EM Clustering ....
U. Fayyad, D. Haussler, and P. Stolorz. Mining Science Data. Communications of the ACM 39(11), 1996.
No context found.
Fayyad, U., D. Haussler, and P. Stolorz. "Mining Science Data." Communications of the ACM 39(11), November 1996.
....data. The framework is naturally extended to update multiple clustering models simultaneously. We empirically evaluate on synthetic and publicly available data sets. Introduction Clustering is an important application area for many fields including data mining [FPSU96], statistical data analysis [KR89,BR93,FHS96], compression [ZRL97], vector quantization, and other business applications [B 96]. Clustering has been formulated in various ways in the machine learning [F87], pattern recognition [DH73,F90], optimization [BMS97,SI84], and statistics literature [KR89,BR93,B95,S92,S86]. The fundamental clustering ....
U. Fayyad, D. Haussler, and P. Stolorz. "Mining Science Data." Communications of the ACM 39(11), 1996.
....Scaling EM Clustering to Large Databases Bradley, Fayyad, and Reina 1 1 Introduction Data clustering is important in many fields, including data mining [FPSU96], statistical data analysis [KR89,BR93], compression [ZRL97], and vector quantization. Applications include data analysis and modeling [FDW97,FHS96], image segmentation, marketing, fraud detection, predictive modeling, data summarization, and general data reporting tasks [B 96]. It has important applications in data cleaning and exploratory data analysis. Clustering is a crucial data mining step and performing the task over massive databases ....
U. Fayyad, D. Haussler, and P. Stolorz. "Mining Science Data." Communications of the ACM 39(11), 1996.
No context found.
U. Fayyad, D. Haussler, and P. Stolorz. "Mining Science Data." Communications of the ACM 39(11), 1996.
No context found.
U. Fayyad, D. Haussler, and P. Stolorz. "Mining Science Data." Communications of the ACM 39(11), 1996.
....databases [93]. 7 Research Challenges Successful KDD applications continue to appear, driven mainly by a glut in databases that have clearly grown to surpass raw human processing abilities. For examples of success stories in applications in industry see [15] and in science analysis see [43]. More detailed case studies are found in [45]. Driving the growth of this field are strong forces (both economic and social) that are a product of the data overload phenomenon. We view the need to deliver workable solutions to pressing problems as a very healthy pressure on the KDD field. Not ....
U. Fayyad, D. Haussler, and P. Stolorz. Mining science data. Communications of ACM, 39(11), 1996.