Results 1–10 of 25
Distributed Data Mining: Algorithms, Systems, and Applications
, 2002
Cited by 70 (5 self)
This paper presents a brief overview of DDM algorithms, systems, applications, and emerging research directions. The paper is organized as follows. We first present related DDM research and illustrate data distribution scenarios. Then DDM algorithms are reviewed. Subsequently, the architectural issues in DDM systems and future directions are discussed.
Privacy-preserving computation of Bayesian networks on vertically partitioned data
 IEEE Transactions on Knowledge and Data Engineering
Cited by 15 (4 self)
Abstract—Traditionally, many data mining techniques have been designed in the centralized model, in which all data is collected and available at one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by businesses, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent them from sharing it. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned between two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data. Index Terms—Data privacy, Bayesian networks, privacy-preserving data mining.
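The core primitive in such vertically partitioned protocols is computing joint frequency counts across the two parties' attribute columns without pooling the raw records. A minimal sketch of that count as a scalar product of indicator vectors — the paper's protocol would compute this product securely; the plain dot product, the variable names, and the toy data here are illustrative assumptions:

```python
import numpy as np

# Toy sketch: each party holds columns over the SAME records.  The joint
# count needed for a conditional probability table is the dot product of
# the parties' indicator vectors.  A real protocol would evaluate this
# dot product with a secure scalar-product subprotocol instead of sharing
# the vectors directly.

rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 2, n)   # Party A's attribute column
y = rng.integers(0, 2, n)   # Party B's attribute column

a = (x == 1).astype(int)    # A's indicator vector for X = 1
b = (y == 1).astype(int)    # B's indicator vector for Y = 1

# #(X=1, Y=1): the value the secure protocol reveals without exposing a or b.
joint_count = int(a @ b)
assert joint_count == int(np.sum((x == 1) & (y == 1)))
```

From counts like this, the parties can fill in the sufficient statistics a score-based structure learner needs, which is why the scalar product is the natural cryptographic target.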
Learning Bayesian network structure from distributed data
 In Proceedings of the 3rd SIAM International Data Mining Conference
, 2003
Cited by 13 (0 self)
We propose a collective method to address the problem of learning the structure of a Bayesian network from distributed, heterogeneous data sources. In this case, the dataset is distributed among several sites, with different features at each site. The collective method has four steps: local learning, sample selection, cross learning, and combination of the results. The parents of local nodes can be correctly identified in local learning. The main task of cross learning is to identify the links whose vertices are in different sites (cross links). This is done by transmitting a small subset of samples from each local site to a central site. The combination step involves removing extra links from local Bayesian networks that may be introduced during local learning due to the well-known hidden variable problem. The sample selection step selects samples, based on a likelihood criterion, that are possibly evidence of cross links. The overall procedure is called collective learning. Experimental results verify that, for sparsely connected networks, the collective learning method can learn the same structure as that obtained by a centralized learning method (which simply aggregates data from all local sites into a single site).
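The sample selection step described above can be sketched as a likelihood filter: samples the local model explains poorly are treated as candidate evidence of cross links and transmitted to the central site. A minimal sketch, assuming a precomputed per-sample log-likelihood and an illustrative selection fraction (the function name and threshold are hypothetical, not from the paper):

```python
import numpy as np

def select_samples(loglik, fraction=0.1):
    """Return indices of the lowest-likelihood samples (candidate
    evidence of cross links to send to the central site)."""
    k = max(1, int(len(loglik) * fraction))
    return np.argsort(loglik)[:k]

# Per-sample log-likelihoods under the LOCAL Bayesian network.
loglik = np.array([-1.2, -0.3, -5.0, -0.8, -9.1, -0.1])
idx = select_samples(loglik, fraction=0.34)
print(sorted(idx.tolist()))  # → [2, 4]  (the two worst-explained samples)
```

Only this small subset travels over the network, which is what keeps the communication cost of collective learning low compared with shipping all data to one site.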
Multi-relational classification: a multiple view approach
 Knowledge and Information Systems,
, 2008
Cited by 9 (1 self)
Abstract Multi-relational classification aims at discovering useful patterns across multiple interconnected tables (relations) in a relational database. Many traditional learning techniques, however, assume a single table or a flat file as input (the so-called propositional algorithms). Existing multi-relational classification approaches either "upgrade" mature propositional learning methods to deal with relational representation or extensively "flatten" multiple tables into a single flat file, which is then processed by propositional algorithms. This article reports a multiple view strategy, where neither "upgrading" nor "flattening" is required, for mining in relational databases. Our approach learns from multiple views (feature sets) of a relational database, and then integrates the information acquired by individual view learners to construct a final model. Our empirical studies show that the method compares well with the classifiers induced by the majority of multi-relational mining systems, in terms of accuracy obtained and running time needed. The paper explores the implications of this finding for multi-relational research and applications. In addition, the method has practical significance: it is appropriate for directly mining many real-world databases.
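The multiple view strategy can be caricatured in a few lines: train one learner per feature view, then integrate the per-view predictions in a combination step. The sketch below uses a plain majority vote as the combiner (the paper integrates view learners with a learned meta-model; the data and names here are illustrative):

```python
import numpy as np

# One row of predictions per view learner, over four test records.
view_predictions = np.array([
    [1, 0, 1, 1],   # learner trained on view 1 (e.g. one relation's features)
    [1, 1, 0, 1],   # learner trained on view 2
    [0, 0, 1, 1],   # learner trained on view 3
])

# Combination step: majority vote across views for each record.
combined = (view_predictions.mean(axis=0) >= 0.5).astype(int)
print(combined.tolist())  # → [1, 0, 1, 1]
```

The point of the design is that no view learner ever sees a flattened join of the tables, so the propositional algorithms can be reused unchanged on each view.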
Efficient Peer-to-Peer Belief Propagation
Cited by 6 (2 self)
Abstract. In this paper, we present an efficient approach for distributed inference. We run belief propagation's message-passing algorithm on top of a DHT storing a Bayesian network. Nodes in the DHT run a variant of the spring relaxation algorithm to redistribute the Bayesian network among them. Thereafter, correlated data is stored close together, reducing the message cost of inference. We simulated our approach in Matlab and show the message reduction and the achieved load balance for random, tree-shaped, and scale-free Bayesian networks of different sizes. As a possible application, we envision a distributed software knowledge base maintaining encountered software bugs under users' system configurations, together with possible solutions for other users having similar problems. Users would not only be able to repair their systems but also to foresee possible problems before installing software updates or new applications.
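The message-passing primitive this approach distributes over the DHT is the standard sum-product operation. A minimal sketch on a two-node network X → Y (the DHT layer and the spring-relaxation placement are not modeled here, and the probability tables are illustrative):

```python
import numpy as np

p_x = np.array([0.6, 0.4])                 # prior P(X)
p_y_given_x = np.array([[0.9, 0.1],        # P(Y | X=0)
                        [0.2, 0.8]])       # P(Y | X=1)

# Sum-product message from X to Y: multiply the incoming belief P(X)
# into the factor P(Y|X) and marginalize X out.  In the DHT setting,
# this vector is exactly what one peer sends to another.
msg_x_to_y = p_x @ p_y_given_x             # = P(Y)
print(np.round(msg_x_to_y, 2))             # → [0.62 0.38]
```

Placing X and Y on the same peer turns this network message into a local memory operation, which is why the spring-relaxation placement of correlated variables cuts the message count.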
Privacy-Preserving Bayesian Network Learning Using Post Randomization (in preparation)
, 2006
Cited by 3 (2 self)
Abstract—In this paper, we propose a post randomization technique to learn a Bayesian network (BN) from distributed heterogeneous data in a privacy-sensitive fashion. In this case, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only information required from the data set is a set of sufficient statistics for learning both the network structure and parameters. The proposed method estimates the sufficient statistics from the randomized data. The estimated sufficient statistics are then used to learn a BN. For structure learning, we face the familiar extra-link problem, since estimation errors tend to break the conditional independence among the variables. We propose modifications of the score functions used for BN learning to solve this problem. We show both theoretically and experimentally that post randomization is an efficient, flexible, and easy-to-use method to learn a Bayesian network from privacy-sensitive data.
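The recovery of sufficient statistics from randomized data can be illustrated with the standard post-randomization (PRAM) identity: if categories are perturbed with a known transition matrix P, observed counts satisfy observed ≈ true · P, so the true counts are estimated as observed · P⁻¹. A minimal sketch with illustrative numbers (the matrix and counts are assumptions, not taken from the paper):

```python
import numpy as np

# Known randomization matrix: P[i, j] = Pr(report category j | true category i).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

true_counts = np.array([700.0, 300.0])

# What randomization yields on average: observed = true @ P.
expected_observed = true_counts @ P

# Count recovery by inverting the known transition matrix.
estimated = expected_observed @ np.linalg.inv(P)
print(np.round(estimated).astype(int))   # → [700 300]
```

On finite samples the observed counts only approximate `true @ P`, so the inverted estimates carry noise; that estimation error is exactly what produces the extra-link problem the abstract's modified score functions are designed to counter.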
Data Mining with Big Data Using HACE Theorem
, 2015
Abstract—The term Big Data comprises large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and in the Big Data revolution.
Privacy-Preserving Bayesian Network Learning From Heterogeneous Distributed Data
 Conference on Data Mining (DMIN'06)
Abstract—In this paper, we propose a post randomization technique to learn a Bayesian network (BN) from distributed heterogeneous data in a privacy-sensitive fashion. In this case, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only information required from the data set is a set of sufficient statistics for learning both the network structure and parameters. The proposed method estimates the sufficient statistics from the randomized data. The estimated sufficient statistics are then used to learn a BN. For structure learning, we face the familiar extra-link problem, since estimation errors tend to break the conditional independence among the variables. We propose modifications of the score functions used for BN learning to solve this problem. We show both theoretically and experimentally that post randomization is an efficient, flexible, and easy-to-use method to learn a Bayesian network from privacy-sensitive data.