Parallel and Distributed Association Mining: A Survey (1999)
Abstract:
This article presents a survey of the state of the art in parallel and distributed association rule mining (ARM) algorithms. Such a survey is much needed, given the importance of association rules to data mining and the tremendous amount of research they have attracted in recent years. The article provides a taxonomy of the extant association mining methods, characterizing them according to the database format used, the search and enumeration techniques utilized, whether they enumerate all or only maximal patterns, and their complexity in terms of the number of database scans. The survey lays out the design space of parallel and distributed ARM algorithms based on the platform used (distributed or shared-memory), the kind of parallelism exploited (task or data), and the load balancing strategy used (static or dynamic). A large number of parallel and distributed ARM methods are reviewed and grouped into related techniques. It is shown that there are a few dominant paradigms, while the other techniques propose optimizations over these base schemes. This survey has two goals. The first is to serve as a reference for both researchers and practitioners interested in the state of the art in parallel and distributed ARM methods. The second is to point out the challenges and open research problems in this exciting field.

