Mining time-changing data streams (2001)
| Venue: | IN PROC. OF THE 2001 ACM SIGKDD INTL. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING |
| Citations: | 196 - 4 self |
BibTeX
@INPROCEEDINGS{Hulten01miningtime-changing,
author = {Geoff Hulten and Laurie Spencer and Pedro Domingos},
title = {Mining time-changing data streams},
booktitle = {IN PROC. OF THE 2001 ACM SIGKDD INTL. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING},
year = {2001},
pages = {97--106},
publisher = {}
}
Years of Citing Articles
OpenURL
Abstract
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a station-ary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying pro-cesses generating them changed during this time, sometimes radically. Although a number of algorithms have been pro-posed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes ques-tionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.







