Results 1 - 7 of 7
A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems
, 2006
Cited by 20 (8 self)
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system’s functionality, such as message routing, information retrieval and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.
Distributed Data Mining for Sustainable Smart Grids
Cited by 1 (0 self)
Electric power infrastructure is rapidly running up against oversized growth, scale and efficiency. Electricity production, distribution and consumption play a critical role in the sustainability of the planet and its natural resources. Smart Grids, which enable two-way communication and monitoring between producers and end-users, need novel computational algorithms for supporting generation of power from a wide range of sources, efficient energy distribution, and sustainable consumption. This paper explores fundamentally distributed approaches with more local flexibility, leading to a more sustainable methodology than the traditional centralized frameworks for analyzing and processing data. The paper considers the problems of aggregation and prediction of power generation and consumption trends over a distributed smart grid. The need for more local control, privacy issues, and cost sensitivity for transmission of remote sensory data over the low-bandwidth wireless network is leading toward a more distributed approach to data analysis in smart grids. This paper reviews our recent work on a more sustainable distributed asynchronous methodology for constructing energy demand prediction models in a smart grid by multivariate linear regression, as well as a dynamic pricing model built on distributed rank aggregation that will help shape power consumption and optimize the grid.
REFERENCES
Das et al. (2013) describe a modification of the general orthogonal regression (GOR) method (Fuller, 1987) applied to the magnitude conversion problem that the same authors have already published at least twice in other journals (Das et al., 2012; Wason et al., 2012). Unfortunately, as more exhaustively discussed in our comment to Wason et al. (2012), published by the Geophysical Journal International (Gasperini and Lolli, 2014), some assumptions made by the authors are wrong and therefore their method has to be rejected. The main mistake made by them consists of assuming as goodness-of-fit statistics the simple standard deviation between observed and computed Mw estimates, whereas such statistics are only valid if the errors in the independent variable (mb) are negligible. The correct statistics to evaluate the goodness of fit of
Monitoring Least Squares Models of Distributed Streams
Least squares regression is widely used to understand and predict data behavior in many fields. As data evolves, regression models must be recomputed, and indeed much work has focused on quick, efficient and accurate computation of linear regression models. In distributed streaming settings, however, periodically recomputing the global model is wasteful: communicating new observations or model updates is required even when the model is, in practice, unchanged. This is prohibitive in many settings, such as in wireless sensor networks, or when the number of nodes is very large. The alternative, monitoring prediction accuracy, is not always sufficient: in some settings, for example, we are interested in the model’s coefficients, rather than its predictions. We propose the first monitoring algorithm for multivariate regression models of distributed data streams that guarantees a bounded model error. It maintains an accurate estimate using a fraction of the communication by recomputing only when the precomputed model is sufficiently far from the (hypothetical) current global model. When the global model is stable, no communication is needed. Experiments on real and synthetic datasets show that our approach reduces communication by up to two orders of magnitude while providing an accurate estimate of the current global model in all nodes.
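The recompute-only-on-drift idea in this abstract can be illustrated with a minimal, centralized sketch. It is not the paper's algorithm: the paper monitors distributed streams without collecting the data, whereas this toy version refits a one-variable model on every observation purely to show the decision rule. The names `fit_line`, `model_distance`, `DriftMonitor` and the threshold `epsilon` are hypothetical and do not come from the source.

```python
import math


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def model_distance(m1, m2):
    """Euclidean distance between two (slope, intercept) coefficient pairs."""
    return math.hypot(m1[0] - m2[0], m1[1] - m2[1])


class DriftMonitor:
    """Keeps a reference model and replaces it only when the current fit drifts.

    Assumption: a single site sees all data. The paper's setting is distributed.
    """

    def __init__(self, initial_points, epsilon):
        self.points = list(initial_points)
        self.epsilon = epsilon  # hypothetical bound on tolerated model error
        self.reference = fit_line(self.points)
        self.syncs = 0  # how many times the reference model was replaced

    def observe(self, x, y):
        """Add one observation; return True if the reference model was replaced."""
        self.points.append((x, y))
        current = fit_line(self.points)
        if model_distance(current, self.reference) > self.epsilon:
            self.reference = current
            self.syncs += 1
            return True
        return False
```

In this sketch, an observation that agrees with the reference model leaves it untouched, and only an observation that moves the coefficients by more than `epsilon` triggers a replacement.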
Automated Change Detection and Reactive Clustering in Multivariate Streaming Data
Many automated systems need the capability of automatic change detection without a given detection threshold. This paper presents an automated change detection algorithm for streaming multivariate data. Two overlapping windows are used to quantify the changes. While one window is used as the reference window from which the clustering is created, the other, called the current window, captures the newly incoming data points. A newly incoming data point can be considered a change point if it is not a member of any cluster. As our clustering-based change detector does not require a detection threshold, it is an automated detector. Based on this change detector, we propose a reactive clustering algorithm for streaming data. Our empirical results show that our clustering-based change detector works well with multivariate streaming data. The detection accuracy depends on the number of clusters in the reference window and on the window width.
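The membership test described in this abstract can be sketched as follows. The abstract does not say which clustering method or membership criterion the authors use, so this sketch makes two assumptions: the reference window is clustered with a small k-means, and a point counts as a member of a cluster when it lies within that cluster's largest observed radius. All function names are hypothetical.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Tiny Lloyd's k-means. Seeds with evenly spaced samples (an assumption).

    Requires len(points) >= k.
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def build_reference(points, k):
    """Cluster the reference window and record each cluster's maximum radius."""
    centroids = kmeans(points, k)
    radii = [0.0] * k
    for p in points:
        i = nearest_index(p, centroids)
        radii[i] = max(radii[i], math.dist(p, centroids[i]))
    return centroids, radii


def is_change_point(point, centroids, radii):
    """A point is a change point if no cluster's radius covers it."""
    return not any(math.dist(point, c) <= r for c, r in zip(centroids, radii))
```

Because the radii are derived from the reference window itself, no detection threshold is supplied by the user, although the number of clusters `k` is still a parameter, consistent with the abstract's remark that accuracy depends on it.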
A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system’s functionality, such as message routing, information retrieval and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.