On the Complexity of Rule Discovery from Distributed Data (2005)

by Martin Scholz
http://www-ai.cs.uni-dortmund.de/DOKUMENTE/scholz_2005e.pdf

Abstract:

This paper analyses the complexity of rule selection for supervised learning in distributed scenarios. The selection of rules is usually guided by a utility measure such as predictive accuracy or weighted relative accuracy. Other examples are support and confidence, known from association rule mining. A common strategy for tackling rule selection from distributed data is to evaluate rules locally on each dataset. While this works well for homogeneously distributed data, this work proves limitations of the strategy when the local distributions are allowed to deviate. Identifying the subsets for which local and global distributions deviate may be regarded as an interesting learning task in its own right, one that explicitly takes the locality of data into account. This task can be shown to be essentially as complex as discovering the globally best rules from local data. Based on the theoretical results, some guidelines for algorithm design are derived.
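As a rough illustration of the utility measures named in the abstract, the sketch below computes support, confidence, and weighted relative accuracy for a single rule from per-site counts, and then from the pooled counts. It is not taken from the paper: the `Counts` class, the function names, and the two example sites are hypothetical, and weighted relative accuracy is taken in its usual form, P(body) * (P(target | body) - P(target)). The numbers are chosen so that the rule scores zero at each site when evaluated locally, yet scores clearly above zero on the combined data, which is one simple way local evaluation can mislead when distributions differ between sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Contingency counts for one rule on one dataset (hypothetical helper)."""
    n: int            # total number of examples
    covered: int      # examples matching the rule body
    positive: int     # examples belonging to the target class
    covered_pos: int  # examples matching the body AND the target class

    def __add__(self, other: "Counts") -> "Counts":
        # Pooling two sites simply adds up the raw counts.
        return Counts(
            self.n + other.n,
            self.covered + other.covered,
            self.positive + other.positive,
            self.covered_pos + other.covered_pos,
        )


def support(c: Counts) -> float:
    """Fraction of all examples that match both body and target."""
    return c.covered_pos / c.n


def confidence(c: Counts) -> float:
    """Fraction of covered examples that belong to the target class."""
    return c.covered_pos / c.covered if c.covered else 0.0


def wracc(c: Counts) -> float:
    """Weighted relative accuracy: P(body) * (P(target | body) - P(target))."""
    if c.covered == 0:
        return 0.0
    return (c.covered / c.n) * (c.covered_pos / c.covered - c.positive / c.n)


# Two made-up sites with deliberately different class distributions.
site_a = Counts(n=100, covered=80, positive=80, covered_pos=64)
site_b = Counts(n=100, covered=20, positive=20, covered_pos=4)
pooled = site_a + site_b

for name, c in [("site A", site_a), ("site B", site_b), ("pooled", pooled)]:
    print(f"{name}: support={support(c):.2f} "
          f"confidence={confidence(c):.2f} wracc={wracc(c):.3f}")
```

In this toy setting the rule's confidence equals the class prior at each site, so neither site alone has a reason to report it, while on the pooled data its confidence (0.68) exceeds the global prior (0.50).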
