D.B. Skillicorn, "Strategies for Parallel Data Mining," IEEE Concurrency, Oct./Dec. 1999.
....done using volunteer computing. Other examples of data mining not yet implemented with volunteer computing include finding patterns in customers' buying records that may help guide marketing strategies (e.g., customers who buy product A are likely to buy product B) [110], and many others [122, 140]. These applications could bring the benefits of volunteer computing beyond scientific applications to commercial applications as well. For future work, it would be useful not only to implement more examples of applications under the classes already described here, but to find ....
D.B. Skillicorn. Strategies for Parallel Data Mining. Technical report TR1999.
....[Figure 8: Performance of k Nearest Neighbor] .... (while they report results on an SMP machine, it is using MPI.) Second, I/O is handled explicitly by the programmers in their approach. The similarity among parallel versions of different data mining techniques has also been observed by Skillicorn [37, 36]. Our work is different in offering a middleware to exploit the similarity and ease parallel application development. The challenges in scalable and parallel data mining we listed in Section 1 have also been observed by a number of other authors [5, 13, 21, 26, 28, 31, 36]. Several runtime ....
....has also been observed by Skillicorn [37, 36]. Our work is different in offering a middleware to exploit the similarity and ease parallel application development. The challenges in scalable and parallel data mining we listed in Section 1 have also been observed by a number of other authors [5, 13, 21, 26, 28, 31, 36]. Several runtime support libraries and file systems have been developed to support efficient I/O in a parallel environment [15, 34]; most notable among these is the PASSION library designed by Alok Choudhary's group [39, 40]. They usually provide a collective I/O interface, in which all ....
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
....for developing parallel implementations of data mining algorithms. However, they only focus on distributed memory parallelization, and I/O is handled explicitly by the programmers. The similarity among parallel versions of different data mining techniques has also been observed by Skillicorn [25]. Our work is different in offering a middleware to exploit the similarity and ease parallel application development. OpenMP is the generally accepted standard for shared memory programming. OpenMP currently only supports scalar reduction variables and a small number of simple reduction ....
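The snippet above notes that OpenMP reductions are limited to scalar variables, whereas many mining algorithms reduce into larger structures such as count tables. The following is a minimal sketch of that pattern, written in Python rather than OpenMP. Each worker fills a private copy of the reduction object, and the copies are merged at the end. The names `local_reduce` and `parallel_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def local_reduce(partition: Sequence[str]) -> Counter:
    # Each worker fills its own private reduction object, so no locking is needed.
    return Counter(partition)


def parallel_count(data: Sequence[str], workers: int = 4) -> Counter:
    # Split the data into roughly equal contiguous chunks, one per worker.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks: List[Sequence[str]] = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_reduce, chunks))
    # Global reduction: merge the private copies element-wise.
    total: Counter = Counter()
    for part in partials:
        total.update(part)
    return total


print(parallel_count(list("abracadabra"), workers=3))
```

The threads here only show the structure. Under CPython's global interpreter lock, this sketch is not expected to run faster than a sequential count.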
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
....(while they report results on an SMP machine, it is using MPI.) Second, I/O is handled explicitly by the programmers in their approach. The similarity among parallel versions of different data mining techniques, which motivated the design of our middleware, has also been observed by Skillicorn [26, 25]. Several runtime support libraries and file systems have been developed to support efficient I/O in a parallel environment [11, 23]; most notable among these is the PASSION library designed by Alok Choudhary's group [27, 28]. They usually provide a collective I/O interface, in which all ....
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
....human experts and meta-learning algorithms. Fast DM algorithms are essential, as is the efficient coupling between them and the software managing the data. This problem is most evident for parallel DM, where I/O bandwidth and communication are the two terms that must be balanced when exploiting parallelism [14]. To exploit parallelism at all levels, from the algorithm down to the I/O system, and thus remove any bottleneck, a higher degree of integration has already been advocated in the literature [15, 9]. Ideally, parallel implementations of the DM algorithms, the file system, DBMS, and Data Warehouse ....
....to columns, breaking the records. Either of the two may be suited to a particular method for I/O and algorithmic reasons. Of course, besides these two simple schemes, other parallel organizations come from the coordinate decomposition of the input data and the structure of the search space [14]. 4. Apriori Association Rules. The problem of association rule mining (ARM), which was proposed back in 1993, has its classical application in market basket analysis. From the sales database we want to detect rules of the form AB ⇒ C, meaning that a customer who buys objects A and B together ....
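A rule of the form AB ⇒ C is conventionally scored by two measures. Its support is the fraction of transactions containing A, B, and C. Its confidence is the support of {A, B, C} divided by the support of {A, B}. The minimal sketch below computes both measures over a small, made-up basket list. The function names and data are illustrative only. The sketch scores a single given rule and does not implement Apriori's candidate-generation search.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    s_lhs = support(lhs, transactions)
    if s_lhs == 0:
        return 0.0
    return support(lhs | rhs, transactions) / s_lhs


# Toy market baskets (illustrative data, not from the cited work).
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C"},
    {"A", "C"},
]
print(support({"A", "B", "C"}, baskets))       # 0.4 (2 of 5 baskets)
print(confidence({"A", "B"}, {"C"}, baskets))  # about 0.667 (0.4 / 0.6)
```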
D. Skillicorn, Strategies for Parallel Data Mining, IEEE Concurrency 7 (4) (1999) 26-35.
....From the sequential ILP algorithm I can see that disk access is one of the most significant bottlenecks. Therefore, how to divide access to the dataset and minimize communication between processors is important to the total performance. In general, there are three different approaches [29] to parallelizing data mining. They are: • Independent Search. Each processor has access to the whole dataset, but each heads off into a different part of the search space, starting from a randomly chosen initial position. • Parallelize a sequential data mining algorithm. There are two forms ....
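The independent-search strategy described above can be sketched as follows. The snippet's own context is inductive logic programming. This sketch swaps in one-dimensional k-means (Lloyd's algorithm) purely as a stand-in search problem, and it simulates the "processors" sequentially, each with a distinct random seed. All names and the toy data are illustrative assumptions.

```python
import random
from typing import List, Tuple


def kmeans_1d(data: List[float], k: int, seed: int, iters: int = 20) -> Tuple[float, List[float]]:
    """One independent search: Lloyd's k-means from a random start; returns (SSE, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)  # random initial position in the search space
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
            clusters[j].append(x)
        # Recompute each centroid; keep the old one if its cluster became empty.
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    sse = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return sse, centroids


def independent_search(data: List[float], k: int, processors: int = 4) -> Tuple[float, List[float]]:
    # Every simulated processor sees the whole dataset but starts from a different
    # random position; the searches do not communicate until the final comparison.
    results = [kmeans_1d(data, k, seed=p) for p in range(processors)]
    return min(results)  # keep the result with the lowest SSE


print(independent_search([1.0, 1.2, 0.8, 10.0, 10.2, 9.8], k=2))
```

Because the searches share nothing until the final `min`, the only communication is collecting one (score, model) pair per processor. The trade-off is that every processor must be able to read the whole dataset.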
D.B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, 7(4):26-35, October 1999.
....new and improved serial algorithm is found, and one is forced to come up with new parallel formulations. Thus, it is crucial that the PKDD system support rapid development and testing of algorithms to facilitate algorithmic performance evaluation. One recent effort in this direction is discussed by Skillicorn (1999). He emphasizes the importance of, and presents, a set of cost measures that can be applied to parallel algorithms to predict their computation, data access, and communication performance. These measures make it possible to compare different parallel implementation strategies for data mining ....
D. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, 7(4):26-35, October-December 1999.
....for parallel computing, especially as those who want the results of parallel programs derive a direct commercial advantage from them and are thus willing to pay the costs involved [43]. Many data mining algorithms can partition the data evenly and are thus a good fit with BSP-structured programs [32]. Data mining techniques can be roughly divided into two kinds. The first kind is given data about customers for whom the outcome is known and computes predictors that can be used to predict the outcome for future, new customers. The second kind produces clusters showing how customers are ....
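The remark that evenly partitionable algorithms suit BSP-structured programs can be illustrated with a two-superstep computation. This is a minimal sketch that is not taken from the cited work. Python threads stand in for BSP processors, a shared list stands in for the communication step, and `threading.Barrier` provides the barrier that ends each superstep. A global mean is used only as the simplest case of local work followed by a global combine. `bsp_mean` is a hypothetical name.

```python
import threading
from typing import List, Tuple


def bsp_mean(data: List[float], p: int = 4) -> List[float]:
    """Each of p simulated BSP processors ends up holding the global mean."""
    if not data:
        raise ValueError("data must be non-empty")
    # Even (cyclic) partition: processor i owns data[i], data[i + p], ...
    blocks = [data[i::p] for i in range(p)]
    partial: List[Tuple[float, int]] = [(0.0, 0)] * p  # one "message slot" per processor
    results = [0.0] * p
    barrier = threading.Barrier(p)

    def worker(i: int) -> None:
        # Superstep 1: purely local computation on this processor's block.
        partial[i] = (sum(blocks[i]), len(blocks[i]))
        barrier.wait()  # barrier synchronization: all partial results are now visible
        # Superstep 2: every processor combines all partial results (total exchange).
        total = sum(s for s, _ in partial)
        count = sum(n for _, n in partial)
        results[i] = total / count

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(bsp_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], p=4))
```

The even partition keeps superstep 1 balanced across processors. That balance is the property the snippet points to as making such algorithms a good fit for BSP.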
D. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, 7(4):26-35, October-December 1999.
....redundant examples. 3. Parallel Inductive Logic. There are usually a number of ways in which a data mining algorithm can be parallelized. It is often possible to make a relatively accurate judgement of the best way by taking account of the costs of computation, data access, and communication [10]. In many cases, an approach based on replicating the sequential data mining algorithm, and communicating frequently to spread the knowledge acquired by each processor, has been .... [Figure pseudocode: divide dataset into p subsets; forall processors i: repeat: if there is still an e in E_i, select e in E_i, form a ....]
D.B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, 7(4):26-35, October 1999.
No context found.
D.B. Skillicorn, "Strategies for Parallel Data Mining," IEEE Concurrency, Oct./Dec. 1999.
No context found.
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
No context found.
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
No context found.
David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, Oct-Dec 1999.
CiteSeer.IST - Copyright Penn State and NEC