Mining a commercial banking data set: The SaintEtiQ approach
Abstract:
In this paper, an original approach to database summarization is applied to a massive data set provided by a bank marketing department. The overall summarization process relates to the knowledge discovery paradigm, even though the purposes of the approach are quite different from those of KDD. The summarization process is intended to find general representations of the data across the whole database, whereas KDD processes deal with knowledge nugget extraction from data, without prioritizing the cover property. The summarization process is based on an incremental and hierarchical conceptual clustering algorithm that builds a summary hierarchy from database records. The levels of the hierarchy provide views of the entire database at different granularities. Each summary describes part of the data set. Furthermore, the fuzzy set-based representation of summaries makes the system robust and accurate with respect to the well-known threshold effect of crisp clustering methods. The summarization process is also supported by background knowledge, which provides a user-friendly vocabulary for describing summaries with high-level semantics. Even though our method is not primarily concerned with computational performance, its low time and memory requirements make it appropriate for large real-life databases. The scalability of the process is demonstrated through its application to a banking data set. The produced summary hierarchy not only provides human-friendly views of the whole database, but can also be queried from a knowledge discovery perspective.
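As a minimal sketch of the threshold effect mentioned in the abstract, the Python fragment below contrasts a crisp partition of an attribute with a fuzzy one. Everything in it is an assumption made for illustration: the attribute (a customer's age), the labels, their breakpoints, and the function names are hypothetical and are not taken from the SaintEtiQ background knowledge or implementation.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy set with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


# Hypothetical linguistic labels for an "age" attribute (illustrative breakpoints).
AGE_LABELS = {
    "young": (0, 0, 25, 35),
    "middle-aged": (25, 35, 50, 60),
    "senior": (50, 60, 120, 120),
}


def crisp_label(age):
    """Crisp partition: each value gets exactly one label, with hard cut-offs."""
    if age < 30:
        return "young"
    if age < 55:
        return "middle-aged"
    return "senior"


def fuzzy_labels(age):
    """Fuzzy partition: every label with a non-zero membership degree."""
    degrees = {name: trapezoid(age, *params) for name, params in AGE_LABELS.items()}
    return {name: mu for name, mu in degrees.items() if mu > 0.0}


if __name__ == "__main__":
    for age in (29, 31):
        print(age, crisp_label(age), fuzzy_labels(age))
```

With the crisp partition, ages 29 and 31 fall on opposite sides of the cut-off and receive different labels. With the fuzzy partition, both ages belong to "young" and "middle-aged" at the same time, and only the membership degrees shift gradually between the two.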

