HKU CSIS Tech Report TR-2002-05
A GSP-based Efficient Algorithm for Mining Frequent Sequences
Abstract:
This paper studies the problem of mining frequent sequences in transactional databases. In [3], Agrawal and Srikant proposed the GSP algorithm for extracting frequently occurring sequences. GSP is an iterative algorithm: the number of database scans it performs depends on the length of the longest frequent sequences in the database. The I/O cost is thus substantial if the database contains very long frequent sequences. In this paper, we extend the candidate-generating function used by GSP and propose a new two-stage algorithm, MFS. Our algorithm first mines a sample of the database to obtain a rough estimate of the frequent sequences and then refines the solution. Experimental results show that MFS incurs significantly less I/O cost than GSP.
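To illustrate why a level-wise miner in the style of GSP needs roughly one database scan per sequence length, the following is a minimal sketch and not the algorithm from the report. It assumes sequences of single items rather than itemsets, an absolute support count, and no time constraints; the function names are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in candidate)


def levelwise_frequent_sequences(database, min_support):
    """Return (frequent sequences with their support counts, number of scans).

    Simplified level-wise mining: pass k counts candidates of length k,
    so every pass costs one full scan of the database.
    """
    items = sorted({item for seq in database for item in seq})
    candidates = [(item,) for item in items]
    frequent = {}
    scans = 0
    while candidates:
        scans += 1
        counts = {c: 0 for c in candidates}
        for seq in database:  # one full pass over the database
            for c in candidates:
                if is_subsequence(c, seq):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: extend a with the last item of b when a minus its first
        # item equals b minus its last item.
        joined = [a + b[-1:] for a in level for b in level if a[1:] == b[:-1]]
        # Prune step: drop candidates that have an infrequent subsequence
        # obtained by deleting a single item.
        candidates = [
            c for c in joined
            if all(c[:i] + c[i + 1:] in level for i in range(len(c)))
        ]
    return frequent, scans


database = [
    ("a", "b", "c"),
    ("a", "c", "b", "c"),
    ("a", "b"),
    ("b", "c"),
]
frequent, scans = levelwise_frequent_sequences(database, min_support=2)
print(scans, sorted(frequent))
```

On this toy database the longest frequent sequence has length 3 and the sketch makes three passes. In general the pass count is at least the length of the longest frequent sequence, which is the I/O cost the abstract refers to.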
Citations
| 299 | Mining sequential patterns: Generalizations and performance improvements – Srikant, Agrawal - 1996 |

