unknown authors
http://opim.wharton.upenn.edu/~balaji/icdm02.pdf

Abstract:

Many applications are characterized by naturally incomplete data on customers, where data on only some fixed set of local variables is gathered. However, a more complete picture can help build better models. The naïve solution to this problem, acquiring complete data for all customers, is often impractical because of the cost of doing so. A possible alternative is to acquire complete data for “some” customers and to use this to improve the models built. The data acquisition problem is determining how many, and which, customers to acquire additional data from. In this paper we suggest using active-learning-based approaches for the data acquisition problem. In particular, we present initial methods for data acquisition and evaluate these methods experimentally on web usage data and UCI datasets. Results show that the methods perform well and indicate that active-learning-based methods for data acquisition can be a promising area for data mining research.
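
The abstract does not spell out the acquisition methods themselves, so the following is only a generic illustration of the "which customers" question, not the paper's approach. It is a minimal sketch of uncertainty sampling, a common active learning heuristic: given class probabilities from a model trained on the local (incomplete) variables, it picks the customers whose predictions are least certain. The function names, the customer ids, the probabilities, and the fixed `budget` parameter are all hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_customers(predicted_probs, budget):
    """Return ids of the `budget` customers with the most uncertain predictions.

    predicted_probs: dict mapping customer id -> P(class = 1), assumed to come
    from a model trained only on the local variables (hypothetical input).
    """
    ranked = sorted(
        predicted_probs,
        key=lambda cid: binary_entropy(predicted_probs[cid]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Made-up probabilities for five customers; acquire complete data for two.
probs = {"c1": 0.97, "c2": 0.52, "c3": 0.10, "c4": 0.45, "c5": 0.80}
chosen = select_customers(probs, budget=2)
print(chosen)
```

Here the two customers with probabilities nearest 0.5 are selected. The "how many" half of the problem is left out of the sketch: `budget` is simply supplied by the caller.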

Citations

2138 UCI Repository of Machine Learning Databases – Merz, Murphy - 1996
505 The EM Algorithm and Extensions – McLachlan, Krishnan - 1996
445 Statistical analysis with missing data – Little, Rubin - 1986
261 Active learning with statistical models – Cohn, Ghahramani, et al. - 1995
168 Selective sampling using the query by committee algorithm – Freund, Seung, et al. - 1997
162 Information-based objective functions for active data selection – MacKay - 1992
99 Neural network exploration using optimal experiment design – Cohn - 1994
58 The usefulness of optimum experimental designs – Atkinson - 1996
44 Selecting concise training sets from clean data – Plutowski, White - 1993
23 Active Learning for Structure in Bayesian Networks – Tong, Koller - 2001
17 Multiple imputation for multivariate missing-data problems: a data analyst’s perspective – Schafer, Olsen - 1998
16 Personalization from incomplete data: what you don’t know can hurt – Padmanabhan, Zheng, et al. - 2001
10 Minimizing Statistical Bias with Queries – Cohn - 1997
3 Additive Logistic Regression: A statistical view of Boosting – Hastie, T, et al. - 1998
2 Survey Sampling: Theory and Methods – Chaudhuri, Stenger - 1992
1 Working with missing data. Family Science Review – Acock - 1997
1 Active Learning in Neural Networks, working paper, University of Bielefeld – Hasenjäger, Ritter - 1999
1 Active Learning for – Maytal, Provost - 2001
1 Multiple Imputation for Missing Data – Yuan - 2000