
Knowledge Discovery from Data Streams, Chapman and Hall (2010)

by J. Gama

Results 1 - 10 of 47

Mining Big Data: Current Status, and Forecast to the Future

by Wei Fan
Cited by 18 (0 self)
Big Data is a new term used to identify datasets that, due to their large size and complexity, cannot be managed with our current methodologies or data mining software tools. Big Data mining is the capability of extracting useful information from these large datasets or streams of data, which was not possible before because of their volume, variability, and velocity. The Big Data challenge is becoming one of the most exciting opportunities for the coming years. We present in this issue a broad overview of the topic, its current status, controversy, and forecast to the future. We introduce four articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics on Big Data mining.

Citation Context

...is important that Big Data mining techniques should be able to adapt and, in some cases, to detect change first. For example, the data stream mining field has very powerful techniques for this task [13]. • Compression: When dealing with Big Data, the amount of space needed to store it is very relevant. There are two main approaches: compression, where we don't lose anything, or sampling, where we choose...
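The excerpt credits data stream mining with powerful change-detection techniques. As one illustration (not taken from the cited paper), here is a minimal Python sketch of the Page-Hinkley test, a classic sequential test for detecting an increase in the mean of a stream. The parameter names `delta` (tolerated magnitude of change) and `threshold` (the alarm level, often written λ) and their defaults are assumptions for the example, and the running mean is updated incrementally, as many implementations do.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta          # tolerated magnitude of change (assumed default)
        self.threshold = threshold  # alarm level lambda (assumed default)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # m_T = sum_t (x_t - mean - delta)
        self.min_cum = 0.0          # M_T = minimum of m_t seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        # Alarm when the cumulative deviation rises far above its historical minimum.
        return self.cum - self.min_cum > self.threshold
```

After an alarm, a monitoring system would typically reset the detector and adapt its model; that step is left out of this sketch.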

Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm

by Dariusz Brzezinski, Jerzy Stefanowski
Cited by 5 (1 self)
Data stream mining has been receiving increasing attention due to its presence in a wide range of applications such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward; however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm was experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches, in different drift scenarios. Out of all the compared algorithms, AUE2 provided the best average classification accuracy while proving to be less memory-consuming than other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios involving many types of drift as well as static environments. Index Terms: concept drift, data stream mining, ensemble classifier, nonstationary environments.
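To make accuracy-based weighting concrete, here is a minimal Python sketch of how one component classifier could be weighted on the most recent data block. It assumes the non-linear weight w_i = 1 / (MSE_r + MSE_i + ε) associated with AUE2, where MSE_i is the component's mean squared error on the block and MSE_r is the error of a classifier that guesses according to the block's class distribution; check the exact formula against the paper before relying on it. The function names and the dictionary-of-probabilities input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def mse_random(labels: Sequence[str]) -> float:
    # Error of a classifier guessing from the block's class distribution:
    # MSE_r = sum over classes y of p(y) * (1 - p(y))**2
    n = len(labels)
    return sum((c / n) * (1.0 - c / n) ** 2 for c in Counter(labels).values())


def mse_component(probas: Sequence[Dict[str, float]], labels: Sequence[str]) -> float:
    # MSE_i = block average of (1 - P_i(true class | x))**2
    return sum((1.0 - p.get(y, 0.0)) ** 2 for p, y in zip(probas, labels)) / len(labels)


def aue2_weight(probas: Sequence[Dict[str, float]], labels: Sequence[str],
                eps: float = 1e-6) -> float:
    # Assumed AUE2-style weight: w_i = 1 / (MSE_r + MSE_i + eps)
    return 1.0 / (mse_random(labels) + mse_component(probas, labels) + eps)
```

Under this form, a component that predicts the block perfectly gets a weight close to 1/MSE_r, one whose error equals that of random guessing gets about 1/(2·MSE_r), and ε keeps the weight finite when both errors are zero.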

Citation Context

...gorithms work in dynamic environments, where data are continuously generated. Sensor networks, monitoring, traffic management, telecommunication, or web log analysis are examples of such applications [1]. In these dynamic environments, incoming data form a data stream characterized by huge volumes of instances and a rapid arrival rate, which often requires a quick, real-time response. Compared to static e...

Accuracy Updated Ensemble for Data Streams with Concept Drift

by Dariusz Brzeziński, Jerzy Stefanowski
Cited by 5 (2 self)
In this paper we study the problem of constructing accurate block-based ensemble classifiers from time-evolving data streams. AWE is the best-known representative of these ensembles. We propose a new algorithm called Accuracy Updated Ensemble (AUE), which extends AWE by using online component classifiers and updating them according to the current distribution. Additional modifications of the weighting functions solve the problem of undesired classifier exclusion seen in AWE. Experiments with several evolving data sets show that, while still requiring constant processing time and memory, AUE is more accurate than AWE.

Citation Context

...classifiers. On the other hand, a new type of problem is becoming more visible, one in which learning algorithms work in dynamic environments with data continuously generated in the form of a stream [1]. Processing data streams implies new requirements concerning a limited amount of memory, short processing time, and a single scan of incoming data. Moreover, the data distributions and definitions of target...
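The requirements named above (limited memory, short processing time, a single scan of the data) can be illustrated with Welford's online algorithm, which computes the mean and sample variance of a stream in one pass using constant memory. This is a generic textbook example chosen for illustration, not a method from the cited paper, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) after a single scan of the stream."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance
```

Because only three numbers are kept, the stream can be an unbounded generator; nothing is stored or revisited.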

Combining block-based and online methods in learning ensembles from concept drifting data streams

by Dariusz Brzezinski, Jerzy Stefanowski - Information Sciences, 2014
Cited by 5 (1 self)
Most stream classifiers are designed to process data incrementally, run in resource-aware environments, and react to concept drifts, i.e., unforeseen changes of the stream's underlying data distribution. Ensemble classifiers have become an established research line in this field, mainly due to their modularity, which offers a natural way of adapting to changes. However, in environments where class labels are available after each example, ensembles that process instances in blocks do not react to sudden changes sufficiently quickly. On the other hand, ensembles that process streams incrementally do not take advantage of the periodical adaptation mechanisms known from block-based ensembles, which offer accurate reactions to gradual and incremental changes. In this paper, we analyze whether and how the characteristics of block and incremental processing can be combined to produce new types of ensemble classifiers. We consider and experimentally evaluate three general strategies for transforming a block ensemble into an incremental learner: online component evaluation, the introduction of an incremental learner, and the use of a drift detector. Based on the results of this analysis, we put forward a new incremental ensemble classifier, called Online Accuracy Updated Ensemble, which weights component classifiers based on their error in constant time and memory. The proposed algorithm was experimentally compared with four state-of-the-art online ensembles and provided the best average classification accuracy on real and synthetic datasets simulating different drift scenarios.
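The abstract mentions weighting components by their error in constant time and memory. Below is a minimal Python sketch of one way to get that property, assuming each component's error is tracked as a squared error over a fixed window of the most recent examples: a running sum is adjusted as examples enter and leave the window, so each update is O(1) and memory is bounded by the window size rather than the stream length. The class and parameter names are hypothetical, and this is not claimed to be the exact Online Accuracy Updated Ensemble procedure.

```python
from collections import deque


class WindowedMSE:
    """Squared error of a component over the last `window` examples, O(1) per update."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.errors: deque = deque()  # squared errors currently inside the window
        self.total = 0.0              # running sum of the errors in the window

    def add(self, p_true: float) -> None:
        # p_true: probability the component assigned to the example's true class
        err = (1.0 - p_true) ** 2
        self.errors.append(err)
        self.total += err
        if len(self.errors) > self.window:
            # Drop the oldest error so the sum covers exactly `window` examples.
            self.total -= self.errors.popleft()

    def value(self) -> float:
        return self.total / len(self.errors) if self.errors else 0.0
```

Because the sum is maintained by subtraction, floating-point error can accumulate over very long streams; an implementation could periodically recompute it from the buffer.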

Citation Context

(DOI: http://dx.doi.org/10.1016/j.ins.2013.12.011) ...mechanisms in order to adjust to changing environments. Moreover, depending on the rate of these changes, concept drifts are usually divided into sudden or gradual ones, both of which require different reactions [34]. As standard data mining algorithms are not capable of dealing with concept drifts and the rigorous processing requirements posed by data streams, several new techniques have been proposed [14, 24]. Out of the many algorithms proposed to tackle evolving data streams, ensemble methods play an important role. Due to their modularity, they provide a natural way of adapting to change by modifying their structure, either by retraining ensemble members, replacing old component classifiers with new ones, or updating the rules for aggregating component predictions [23]. Current adaptive ensembles can be further divided into block-based and online approaches [14]. Block-based approaches are designed to work in environments where examples arrive in portions, called blocks or chunks. Most block ensembles p...

SMM: a data stream management system for knowledge discovery

by Hetal Thakkar, Nikolay Laptev, Hamid Mousavi, Barzan Mozafari, Vincenzo Russo, Carlo Zaniolo - In Proceedings of the 27th International Conference on Data Engineering (ICDE), 2011
Cited by 4 (0 self)
The problem of supporting data mining applications proved to be difficult for database management systems, and it is now proving to be very challenging for data stream management systems (DSMSs), where the limitations of SQL are made even more severe by the requirements of continuous queries. The major technical advances that were achieved separately on DSMSs and on data stream mining algorithms have failed to converge and produce powerful data stream mining systems. Such systems, however, are essential, since the traditional pull-based approach of cache mining is no longer applicable, and the push-based computing mode of data streams and their bursty traffic complicate application development. For instance, to write mining applications with quality of service (QoS) levels approaching those of DSMSs, a mining analyst would have to contend with many arduous tasks, such as support for data...

Citation Context

...and DSMS was discussed in the introduction. Here we focus on data stream mining algorithms. Online data stream mining has been the focus of many research efforts, and a recent review can be found in [22]. For instance, Ester et al. [15] proposed extending a static clustering algorithm, namely DBSCAN, for continuous clustering of data streams. Similarly, there have been efforts to build online classif...

Mining Big Data in Real Time

by Albert Bifet - Informatica, 2013
Cited by 3 (0 self)
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years. We are creating the same quantity of data every two days as we created from the dawn of time up until 2003. Evolving data stream methods are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome during the next years.

Citation Context

...orks, measurements in network monitoring and traffic management, log records or click-streams in web exploring, manufacturing processes, call detail records, email, blogging, Twitter posts and others [17]. In fact, all data generated can be considered as streaming data or as a snapshot of streaming data, since it is obtained from an interval of time. In the data stream model, data arrive at high speed...

Control-flow Discovery from Event Streams

by Andrea Burattin, Alessandro Sperduti, Wil M. P. van der Aalst - In Proceedings of the IEEE Congress on Evolutionary Computation, IEEE, 2014
Cited by 3 (3 self)
Process Mining represents an important research field that connects Business Process Modeling and Data Mining. One of the most prominent tasks of Process Mining is the discovery of a control-flow starting from event logs. This paper focuses on the important problem of control-flow discovery starting from a stream of event data. We propose to adapt Heuristics Miner, one of the most effective control-flow discovery algorithms, to the treatment of streams of event data. Two adaptations, based on Lossy Counting and Lossy Counting with Budget, as well as a sliding-window-based version of Heuristics Miner, are proposed and experimentally compared against both artificial and real streams. Experimental results show the effectiveness of control-flow discovery algorithms for streams on artificial and real datasets.
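Lossy Counting (Manku and Motwani) is the approximate frequency-counting algorithm that the first adaptation builds on. The following Python sketch implements its standard form for arbitrary hashable items: the stream is split into buckets of width ⌈1/ε⌉, each entry keeps a count f and a maximum possible undercount Δ, and entries with f + Δ ≤ the current bucket id are pruned at every bucket boundary. In the process-mining setting the items could be, for example, activities or pairs of directly-following activities, but that mapping and the class and method names are assumptions, not the paper's implementation.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounting:
    """Approximate frequency counts with error at most epsilon * N."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1.0 / epsilon)  # bucket width w
        self.n = 0                             # items seen so far (N)
        # item -> (observed count f, maximum undercount delta)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            f, d = self.entries[item]
            self.entries[item] = (f + 1, d)
        else:
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {k: (f, d) for k, (f, d) in self.entries.items()
                            if f + d > bucket}

    def frequent(self, support: float) -> List[Hashable]:
        # Items whose observed count reaches (support - epsilon) * N.
        return [k for k, (f, _) in self.entries.items()
                if f >= (support - self.epsilon) * self.n]
```

For example, with ε = 0.1 and a 100-item stream in which one item appears 50 times and 50 other items appear once each, only the repeated item survives pruning and is reported for support 0.3.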

Citation Context

...) stream “concepts” (i.e., models generating data) are assumed to be stationary or evolving [15]. The task of mining data streams is typically focused on specific types of algorithms [16], [15], [13], [17]. In particular, techniques have been developed for clustering, classification, frequency counting, time series analysis, and change diagnosis (concept drift detection). The remainder of this paper i...

Applying neural networks for concept drift detection in financial markets

by Bruno Silva, Nuno Marques, Gisele Panosso - In ECAI 2012, Ubiquitous Data Mining Workshop, 2012
Cited by 2 (2 self)
Traditional stock market analysis is based on the assumption of stationary market behavior. The recent financial crisis was an example of the inappropriateness of such an assumption, as it revealed much higher variations than traditional models would normally predict. Data stream methods present an alternative for modeling the vast amounts of data arriving each day to a financial analyst. This paper discusses the use of a framework based on an artificial neural network that continuously monitors itself and allows the implementation of a multivariate, non-stationary financial model of market behavior. An initial study is performed over ten years of the Dow Jones Industrial Average index (DJI), and shows empirical evidence of concept drift in the multivariate financial statistics used to describe the index data stream.

Citation Context

...tationary, i.e., the target concept may change over time. Concept drift means that the concept about which data is being collected may shift from time to time, each time after some minimum permanence [6]. In this paper we address the detection and analysis of concept drift in financial markets by employing a methodology based on Artificial Neural Networks (ANN). ANN are a set of biologically inspired...
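As a concrete example of concept drift detection on a stream of prediction errors, here is a minimal Python sketch of the Drift Detection Method (DDM) of Gama et al. (2004), which tracks a classifier's online error rate p and its standard deviation s = √(p(1 − p)/n). It is a different technique from the neural-network methodology of this paper and is shown only for illustration. The warning and drift levels (2 and 3 standard deviations above the recorded minimum) and the 30-example warm-up follow commonly cited defaults; the class and method names are hypothetical.

```python
import math


class DDM:
    """Drift Detection Method over a stream of 0/1 prediction errors."""

    def __init__(self, min_instances: int = 30) -> None:
        self.min_instances = min_instances  # warm-up before any signal
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 1.0                 # running error rate (overwritten on first update)
        self.s = 0.0                 # its standard deviation
        self.p_min = float("inf")    # p at the point where p + s was smallest
        self.s_min = float("inf")    # s at that same point

    def update(self, error: int) -> str:
        """error: 1 if the classifier misclassified the example, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3 * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if self.p + self.s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

On a stream whose error rate jumps from 10% to 100%, this detector stays quiet during the first phase and reports drift a few dozen examples after the change, then resets its statistics.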

A Framework for Time-aware Recommendations

by Kostas Stefanidis, Irene Ntoutsi, Hans-Peter Kriegel
Cited by 2 (1 self)
Recently, recommendation systems have received significant attention. However, most existing approaches focus on recommending items of potential interest to users without taking into consideration how temporal information influences the recommendations. In this paper, we argue that time-aware recommendations need to be pushed to the foreground. We introduce an extensive model for time-aware recommendations from two perspectives. From a fresh-based perspective, we propose using a suite of aging schemes so that recommendations depend mostly on fresh and novel user preferences. From a context-based perspective, we focus on providing different suggestions under different temporal specifications. The proposed strategies are experimentally evaluated using real movie ratings.

Citation Context

...e popularity of the items themselves, fresh-based recommendation approaches care for suggesting items taking mainly into account recent and novel user preferences. Driven by the work in stream mining [10], we use different types of aging mechanisms to define the way that the historical information (in the form of ratings) is incorporated in the recommendation process. Aging in streams is typically impleme...
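One common way to implement aging is exponential time decay, in which each rating's weight shrinks with its age. The following Python sketch computes a time-decayed average rating per item, assuming a decay parameterized by a half-life (a rating loses half its weight every `half_life` time units). The paper proposes a suite of aging schemes; this is only one possible scheme, and the function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def aged_item_scores(ratings: List[Tuple[str, float, float]],
                     now: float, half_life: float) -> Dict[str, float]:
    """ratings: (item, rating, timestamp) triples; returns a decayed mean per item."""
    decay = math.log(2) / half_life  # rate such that weight halves every half_life
    num: Dict[str, float] = {}
    den: Dict[str, float] = {}
    for item, rating, t in ratings:
        w = math.exp(-decay * (now - t))  # older ratings get smaller weights
        num[item] = num.get(item, 0.0) + w * rating
        den[item] = den.get(item, 0.0) + w
    return {item: num[item] / den[item] for item in num}
```

With a half-life of 10, a rating given 10 time units ago counts half as much as one given now, so an old 5-star rating and a fresh 1-star rating average to about 2.33 rather than 3.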

Dynamic Visual Analytics – Facing the Real-Time Challenge

by Florian Mansmann, Fabian Fischer, Daniel A. Keim
Cited by 2 (2 self)
Modern communication infrastructures enable more and more information to be available in real time. While this has proven to be useful for very targeted pieces of information, the human capability to process larger quantities of mostly textual information is definitely limited. Dynamic visual analytics has the potential to circumvent this real-time information overload by combining incremental analysis algorithms and visualizations to facilitate data stream analysis and provide situational awareness. In this book chapter we will thus define dynamic visual analytics, discuss its key requirements, and present a pipeline focusing on the integration of human analysts in real-time applications. To validate this pipeline, we will demonstrate its applicability in a real-time monitoring scenario of server logs.

Citation Context

...es in the derived models and as a result of high data volumes it might no longer be possible to process the data efficiently more than once. In addition to this edited book, Gama's comprehensive book [7] focuses on a number of stream mining solutions based on adaptive learning algorithms. Thereby the set of examples is not only incremented for a given learning algorithm, but also outdated examples ar...
