HKU CSIS Tech Report TR-2002-05: A GSP-based Efficient Algorithm for Mining Frequent Sequences

by Minghua Zhang, Ben Kao, Chi-lap Yip, David Cheung
http://www.csis.hku.hk/research/techreps/document/TR-2002-05.pdf

Abstract:

This paper studies the problem of mining frequent sequences in transactional databases. In [3], Agrawal and Srikant proposed the GSP algorithm for extracting frequently occurring sequences. GSP is an iterative algorithm: the number of database scans it makes depends on the length of the longest frequent sequences in the database. The I/O cost is therefore substantial if the database contains very long frequent sequences. In this paper, we extend the candidate-generating function used by GSP and propose a new two-stage algorithm, MFS. Our algorithm first mines a sample of the database to obtain a rough estimate of the frequent sequences and then refines the solution. Experimental results show that MFS significantly reduces I/O cost compared with GSP.
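
The link between the length of the longest frequent sequence and the number of database scans can be illustrated with a minimal level-wise sketch in the style of GSP. This is not the report's code. It assumes each sequence is a tuple of single items (no itemsets, time constraints, or taxonomies), and the names `is_subsequence`, `levelwise_frequent_sequences`, and `min_count` are hypothetical. The sample-then-refine stage of the two-stage algorithm is not shown.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate's items occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in candidate)


def levelwise_frequent_sequences(database, min_count):
    """Simplified GSP-style mining over sequences of single items.

    Returns (frequent, scans): a dict mapping each frequent sequence to its
    support count, and the number of support-counting passes over the database.
    """
    scans = 0
    frequent = {}
    # Length-1 candidates: every distinct item (assumed cheap to collect).
    candidates = sorted({(item,) for seq in database for item in seq})
    while candidates:
        scans += 1  # one full database pass per candidate length
        counts = {
            cand: sum(is_subsequence(cand, seq) for seq in database)
            for cand in candidates
        }
        level = {cand: n for cand, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: extend a with the last item of b when a minus its first
        # item equals b minus its last item.
        candidates = sorted(
            {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        )
    return frequent, scans


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
    found, passes = levelwise_frequent_sequences(db, min_count=2)
    print(passes, sorted(found, key=len)[-1])
```

On this toy database with `min_count=2`, the longest frequent sequence has length 3 and the loop makes 3 counting passes. Raising `min_count` to 3 shortens the longest frequent sequence to length 2 and the passes drop to 2.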

Citations

299 Mining sequential patterns: Generalizations and performance improvements – Srikant, Agrawal - 1996