
Learning in the presence of concept drift and hidden contexts, (1996)

by G Widmer, M Kubat
Venue: Mach. Learn.
Results 1 - 10 of 285

Mining time-changing data streams

by Geoff Hulten, Laurie Spencer, Pedro Domingos - In Proc. of the 2001 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2001
Abstract - Cited by 338 (5 self)
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
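
The contrast between O(w) and O(1) work per example in the abstract above can be illustrated with a small sketch. This is not CVFDT itself: it only shows, under simplifying assumptions, how counts over a moving window can be kept current by adding the arriving example and forgetting the one that leaves, instead of recounting the whole window. The class name `WindowedCounts` and its methods are hypothetical.

```python
from collections import Counter, deque


class WindowedCounts:
    """Counts of (attribute, value, label) over the last `window_size` examples.

    Illustrative only: this is not the CVFDT implementation, just the
    add-one/forget-one bookkeeping that keeps the per-example cost
    independent of the window size.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()  # (attribute index, value, label) -> count

    def add(self, features, label):
        # Count the arriving example.
        self.window.append((features, label))
        for i, value in enumerate(features):
            self.counts[(i, value, label)] += 1
        # Forget the oldest example once the window overflows.
        if len(self.window) > self.window_size:
            old_features, old_label = self.window.popleft()
            for i, value in enumerate(old_features):
                key = (i, value, old_label)
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]

    def count(self, attribute, value, label):
        return self.counts.get((attribute, value, label), 0)
```

Each call to `add` touches only the attributes of two examples, so its cost grows with the number of attributes and not with w.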

Collaborative filtering with temporal dynamics

by Yehuda Koren - In Proc. of KDD ’09, 2009
Abstract - Cited by 246 (4 self)
Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be a key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the eco-system intersecting multiple products and customers, many different characteristics are shifting simultaneously, while many of them influence each other and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns. The paradigm we offer is creating a model tracking the time changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset by Netflix. Results are encouraging and better than those previously reported on this dataset.

Citation Context

...mpact on future behavior, while capturing longer-term trends that reflect the inherent nature of the data. This led to many works on the problem, which is also widely known as concept drift; see e.g. [15, 25]. Modeling temporal changes in customer preferences brings unique challenges. One kind of concept drift in this setup is the emergence of new products or services that change the focus of customers. R...

A streaming ensemble algorithm (SEA) for large-scale classification

by W. Nick Street, 2001
Abstract - Cited by 167 (1 self)
Classification

Citation Context

...uilding our classifier. Further, such problems are subject to gradual or sudden changes in the underlying concept, as business conditions not reflected in the predictive features (the "hidden context" [22]) can change without warning. This concept drift requires an algorithm that can adjust quickly to changing conditions. One approach to large-scale classification is to improve the storage efficiency of t...

The Problem of Concept Drift: Definitions and Related Work

by Alexey Tsymbal, 2004
Abstract - Cited by 110 (5 self)
In the real world concepts are often not stable but change with time. Typical examples of this are weather prediction rules and customers' preferences. The underlying data distribution may change as well. Often these changes make the model built on old data inconsistent with the new data, and regular updating of the model is necessary. This problem, known as concept drift, complicates the task of learning a model from data and requires special approaches, different from commonly used techniques, which treat arriving instances as equally important contributors to the final concept. This paper considers different types of concept drift, peculiarities of the problem, and gives a critical review of existing approaches to the problem.

Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift

by Jeremy Z. Kolter, Marcus A. Maloof, 2003
Abstract - Cited by 91 (0 self)
Algorithms for tracking concept drift are important for many applications. We present a general method based on the Weighted Majority algorithm for using any on-line learner for concept drift. Dynamic Weighted Majority (DWM) maintains an ensemble of base learners, predicts using a weighted-majority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method using incremental naive Bayes and Incremental Tree Inducer (ITI) as experts.

Citation Context

...coping with concept drift [1, 2, 4–6, 9–14]. The stagger algorithm [1] was the first designed expressly for concept drift, as were many of the algorithms that followed, such as flora2, flora3, flora4 [10], aq-pm [6], aq11-pm [12, 13], and aq11-pm-wah [14]. stagger [1] uses a probabilistic concept description, so it responds to drift by adjusting counts and weights. All of the other methods learn rules...
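
The ensemble behaviour described in the Dynamic Weighted Majority abstract above (weighted vote, creating and deleting experts as performance changes) might be sketched roughly as follows. The abstract gives no parameters, so `beta` (weight decay on a mistake), `theta` (removal threshold) and `period` (how often weights are updated) are assumptions of this sketch, and `MajorityLearner` is a hypothetical toy base learner standing in for naive Bayes or ITI.

```python
from collections import Counter, defaultdict


class MajorityLearner:
    """Toy incremental base learner (hypothetical): predicts the most frequent label."""

    def __init__(self):
        self.label_counts = Counter()

    def predict(self, x):
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.label_counts[y] += 1


class DynamicWeightedEnsemble:
    """Sketch of a DWM-style ensemble; parameter names and defaults are assumptions."""

    def __init__(self, make_learner, beta=0.5, theta=0.01, period=1):
        self.make_learner = make_learner
        self.beta = beta
        self.theta = theta
        self.period = period
        self.experts = [[make_learner(), 1.0]]  # [learner, weight] pairs
        self.seen = 0

    def predict(self, x):
        # Weighted-majority vote over the current experts.
        votes = defaultdict(float)
        for learner, weight in self.experts:
            votes[learner.predict(x)] += weight
        return max(votes, key=votes.get)

    def learn(self, x, y):
        self.seen += 1
        update = self.seen % self.period == 0
        votes = defaultdict(float)
        for expert in self.experts:
            guess = expert[0].predict(x)
            if update and guess != y:
                expert[1] *= self.beta  # penalize an expert that was wrong
            votes[guess] += expert[1]
        ensemble_guess = max(votes, key=votes.get)
        if update:
            # Rescale so the best expert has weight 1, then drop weak experts.
            top = max(weight for _, weight in self.experts)
            for expert in self.experts:
                expert[1] /= top
            self.experts = [e for e in self.experts if e[1] >= self.theta]
            # A wrong ensemble vote triggers the creation of a fresh expert.
            if ensemble_guess != y:
                self.experts.append([self.make_learner(), 1.0])
        for learner, _ in self.experts:
            learner.learn(x, y)
```

Because weights are rescaled before pruning, at least one expert always survives as long as `theta` does not exceed 1.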

Learning with drift detection

by João Gama, Pedro Medas, Gladys Castillo, Pedro Rodrigues - In SBIA Brazilian Symposium on Artificial Intelligence, 2004
Abstract - Cited by 91 (7 self)
Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error-rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the actual model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the actual context we define a warning level, and a drift level. A new context is declared, if in a sequence of examples, the error increases reaching the warning level at example kw, and the drift level at example kd. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since kw. The method was tested with a set of eight artificial datasets and a real world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show a good performance detecting drift and with learning the new concept. We also observe that the method is independent of the learning algorithm.

Citation Context

...ithms on artificial and real datasets. Section 5 concludes the paper and present future work. 2 Tracking Drifting Concepts There are several methods in machine learning to deal with changing concepts [7, 6, 5, 12]. In machine learning drifting concepts are often handled by time windows or weighted examples according to their age or utility. In general, approaches to cope with concept drift can be classified in...
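
The warning-level and drift-level idea in the drift detection abstract above can be sketched as an error-rate monitor. The abstract does not state how the levels are computed; the thresholds used here (minimum observed error rate plus 2 and 3 standard deviations) and the 30-example warm-up are assumptions of this sketch, and the class name `DriftDetector` is hypothetical.

```python
import math


class DriftDetector:
    """Monitors the online error rate and signals 'warning' or 'drift'.

    The factors 2.0 and 3.0 and the warm-up length are assumptions,
    not values taken from the abstract.
    """

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_examples=30):
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_examples = min_examples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)   # its standard deviation
        if self.n < self.min_examples:
            return "stable"
        # Remember the best (lowest) error level seen in this context.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()  # a new context starts here
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"
```

In a full learner, the first `"warning"` would mark example kw, from which examples are buffered, and `"drift"` would mark kd, at which point a new model is trained on the buffer. That surrounding loop is omitted here.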

Classifier technology and the illusion of progress

by David J. Hand - Statist. Sci., 2006
Abstract - Cited by 85 (2 self)
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.

Citation Context

... case. The term concept drift is sometimes used to describe changes to the definitions of the classes. See, for example, the special issue of Machine Learning (1998, Vol. 32, No. 2), Widmer and Kubat [46] and Lane and Brodley [30]. The problem of changing class definitions has been examined in [25, 26] and [28]. If the very definitions of the classes may change between designing the classification rul...

Learning Implicit User Interest Hierarchy for Context in Personalization

by Hyoung R. Kim, Philip K. Chan - In Proc. of International Conference on Intelligent User Interface (IUI , 2003
Abstract - Cited by 63 (4 self)
To provide a more robust context for personalization, we desire to extract a continuum of general (long-term) to specific (short-term) interests of a user. Our proposed approach is to learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a divisive hierarchical clustering (DHC) algorithm to group words (topics) into a hierarchy where more general interests are represented by a larger set of words. Each web page can then be assigned to nodes in the hierarchy for further processing in learning and predicting interests. This approach is analogous to building a subject taxonomy for a library catalog system and assigning books to the taxonomy. Our approach does not need user involvement and learns the UIH "implicitly." Furthermore, it allows the original objects, web pages, to be assigned to multiple topics (nodes in the hierarchy). In this paper, we focus on learning the UIH from a set of visited pages. We propose a few similarity functions and dynamic threshold-finding methods, and evaluate the resulting hierarchies according to their meaningfulness and shape.

Classifier ensembles for changing environments

by Ludmila I. Kuncheva - In Multiple Classifier Systems, 2004
Abstract - Cited by 55 (1 self)
We consider strategies for building classifier ensembles for non-stationary environments where the classification task changes during the operation of the ensemble. Individual classifier models capable of online learning are reviewed. The concept of “forgetting” is discussed. Online ensembles and strategies suitable for changing environments are summarized.

Citation Context

...ent illumination are fed to the system, the class descriptions might change so as to make the system worse than a random guess. The type of changes can be roughly summarized as follows – Random noise [1, 23] – Random trends (gradual changes) [13] – Random substitutions (abrupt changes) [23] – Systematic trends (“recurring contexts”) [23] 2 The term SPAM is coined to denote unsolicited e-mail, usually of ...

Extracting Hidden Context

by Michael Harries, Claude Sammut, Kim Horn, 1997
Abstract - Cited by 50 (2 self)
Concept drift due to hidden changes in context complicates learning in many domains including financial prediction, medical diagnosis, and network performance. Existing machine learning approaches to this problem use an incremental learning, on-line paradigm. Batch, offline learners tend to be ineffective in domains with hidden changes in context as they assume that the training set is homogeneous. An offline, meta-learning approach for the identification of hidden context is presented. The new approach uses an existing batch learner and the process of contextual clustering to identify stable hidden contexts and the associated context specific, locally stable concepts. The approach is broadly applicable to the extraction of context reflected in time and spatial attributes. Several algorithms for the approach are presented and evaluated. A successful application of the approach to a complex control task is also presented.