Results 1–4 of 4
Discretization from data streams: applications to histograms and data mining
In Proceedings of the 2006 ACM Symposium on Applied Computing (SAC’06), 2006
Abstract

Cited by 14 (2 self)
In this paper we propose a new method to perform incremental discretization. The basic idea is to perform the task in two layers. The first layer receives the sequence of input data and keeps some statistics on the data using many more intervals than required. Based on the statistics stored by the first layer, the second layer creates the final discretization. The proposed architecture processes streaming examples in a single scan, in constant time and space even for infinite sequences of examples. We experimentally demonstrate that incremental discretization is able to maintain the performance of learning algorithms in comparison to a batch discretization. The proposed method is much more appropriate for incremental learning, and for problems where data flows continuously, as in most recent data mining applications.
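The two-layer scheme described in this abstract can be illustrated with a minimal sketch. The class name, the assumption of a known value range, equal-width layer-1 intervals, and the parameters `n_primary` and `n_final` are all illustrative choices, not the authors' implementation:

```python
class TwoLayerDiscretizer:
    """Sketch of a two-layer incremental discretizer.

    Layer 1 keeps counts over many fine equal-width intervals,
    updated in a single pass over the stream; layer 2 merges them
    into coarser, approximately equal-frequency intervals on demand.
    """

    def __init__(self, lo, hi, n_primary=200, n_final=10):
        self.lo, self.hi = lo, hi
        self.n_primary = n_primary      # many more intervals than required
        self.n_final = n_final          # intervals in the final discretization
        self.counts = [0] * n_primary   # layer-1 statistics
        self.total = 0
        self.width = (hi - lo) / n_primary

    def update(self, x):
        """Layer 1: O(1) time and constant space per streaming example."""
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.n_primary - 1)  # clamp out-of-range values
        self.counts[i] += 1
        self.total += 1

    def final_cuts(self):
        """Layer 2: derive equal-frequency cut points from layer-1 counts."""
        target = self.total / self.n_final
        cuts, acc = [], 0
        for i, c in enumerate(self.counts):
            acc += c
            if acc >= target and len(cuts) < self.n_final - 1:
                cuts.append(self.lo + (i + 1) * self.width)
                acc = 0
        return cuts
```

Because layer 2 only reads the stored counts, the final discretization can be regenerated at any point in the stream without revisiting past examples.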
Sawtooth: Learning from huge amounts of data
, 2004
Abstract

Cited by 7 (0 self)
Data scarcity was a problem in data mining until recently. Now, in the era of the Internet and tremendous advances in both data storage devices and high-speed computing, databases are filling up at rates never imagined before. The machine learning problems of the past have been augmented by an increasingly important one: scalability. Extracting useful information from arbitrarily large data collections or data streams is now of special interest within the data mining community. In this research we find that mining from such large datasets may actually be quite simple. We address the scalability issues of previous widely-used batch learning algorithms and discretization techniques used to handle continuous values within the data. Then, we describe an incremental algorithm that addresses the scalability problem of Bayesian classifiers, and propose a Bayesian-compatible online discretization technique that handles continuous values, both with a “simplicity first” approach and very low memory (RAM) requirements.
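The incremental Bayesian learning this abstract refers to boils down to maintaining per-class, per-interval counts that are updated in O(1) per example. The sketch below shows that counting scheme over already-discretized attribute values; it is not the Sawtooth algorithm itself, and the Laplace-smoothing denominator is a simplification assumed for the sketch:

```python
from collections import defaultdict
import math

class IncrementalNaiveBayes:
    """Sketch of an incremental Naive Bayes classifier over
    discretized attribute values (interval indices). One pass
    over the data, constant work per example."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # (class, attribute index, interval index) -> count
        self.attr_counts = defaultdict(int)
        self.n = 0

    def update(self, intervals, label):
        """Absorb one labelled example; O(#attributes) time."""
        self.class_counts[label] += 1
        self.n += 1
        for j, v in enumerate(intervals):
            self.attr_counts[(label, j, v)] += 1

    def predict(self, intervals):
        """Pick the class maximizing the log-posterior."""
        best, best_lp = None, -math.inf
        for c, cc in self.class_counts.items():
            lp = math.log(cc / self.n)
            for j, v in enumerate(intervals):
                # Laplace smoothing; the "+2" denominator assumes
                # roughly binary bins, purely for illustration
                lp += math.log((self.attr_counts[(c, j, v)] + 1) / (cc + 2))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Because both learner and discretizer update from a single pass, memory stays bounded by the number of classes, attributes, and intervals rather than by the size of the stream.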
Incremental Discretization for Naïve-Bayes Classifier
Abstract

Cited by 4 (1 self)
Abstract. Naïve-Bayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB’s incremental learning in the face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from temperature readings to share prices. In this paper, we present a novel incremental discretization method for NB, incremental flexible frequency discretization (IFFD). IFFD discretizes values of a quantitative attribute into a sequence of intervals of flexible sizes. It allows online insertion and splitting operations on intervals. Theoretical analysis and experimental tests are conducted to compare IFFD with alternative methods. Empirical evidence suggests that IFFD is efficient and effective. NB coupled with IFFD achieves a rapport between high learning efficiency and high classification accuracy in the context of incremental learning.
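The online insertion and splitting that IFFD performs can be sketched as follows. This is only in the spirit of the method: the split threshold, the median split rule, and keeping full per-interval sample buffers are illustrative assumptions, not IFFD's actual bookkeeping:

```python
import bisect

class FlexibleFrequencyDiscretizer:
    """Sketch of flexible-frequency discretization: intervals of
    flexible size that accept online insertion of values and are
    split when they grow past a count threshold."""

    def __init__(self, max_count=30):
        self.max_count = max_count
        self.cuts = []     # sorted interval boundaries
        self.values = []   # per-interval sorted sample buffers

    def insert(self, x):
        """Online insertion: place x in its interval, split if needed."""
        i = bisect.bisect_right(self.cuts, x)
        while len(self.values) <= i:
            self.values.append([])
        bisect.insort(self.values[i], x)
        if len(self.values[i]) > self.max_count:
            self._split(i)

    def _split(self, i):
        """Split interval i at its median value into two intervals."""
        buf = self.values[i]
        mid = len(buf) // 2
        self.cuts.insert(i, buf[mid])
        self.values[i:i + 1] = [buf[:mid], buf[mid:]]
```

Insertion costs a binary search plus an in-interval insert, so intervals track the data distribution without ever re-scanning earlier examples.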
Partition Incremental Discretization Carlos
Abstract
Abstract — In this paper we propose a new method to perform incremental discretization. This approach consists of splitting the task into two layers. The first layer receives the sequence of input data and stores statistics of these data, using a higher number of intervals than is usually required. The final discretization is generated by the second layer, based on the statistics stored by the previous layer. The proposed architecture processes streaming examples in a single scan, in constant time and space even for infinite sequences of examples. We demonstrate with examples that incremental discretization achieves better results than batch discretization, maintaining the performance of learning algorithms. The proposed method is much more appropriate for evaluating incremental algorithms, and for problems where data flows continuously, as in most recent data mining applications.