  Using Machine Learning to Analyze Biological Macromolecular Crystallization Data

by Vanathi Gopalakrishnan, Daniel Hennessy, Bruce Buchanan, John Rosenberg, Devika Subramanian (affiliation listed: Departments of Biological Sciences)
http://www.cs.pitt.edu/~vanathi/journal.ps

Abstract:

The crystallization of a new macromolecule is still largely a trial-and-error process. In an effort to uncover useful trends in the crystallization of new macromolecules, Samudzi, Fivash and Rosenberg [12] performed a cluster analysis on the Biological Macromolecule Crystallization Database (BMCD) [7]. The crystallization parameters used to differentiate among the experiments were a subset of the BMCD parameters: pH, temperature, molecular weight, macromolecular concentration, precipitant type and crystallization method. Samudzi et al. performed a purely statistical analysis of the data and identified the clusters by visually inspecting the results. We have attempted to recreate their clusters using two different methods: SAS clustering (the method Samudzi et al. used) and COBWEB (a machine learning and discovery program). We then applied RL, an inductive learning program, to the clusters discovered by each method, and both verified and extended the Samudzi results. Apart from using the clusters as input to RL, we also ran RL on the entire BMCD in an attempt to learn interesting correlations among the various crystallization parameters. From the point of view of crystallography, we have discovered possibly significant new empirical relationships. From a machine learning perspective, our work has led to the refinement of existing methods for incorporating detailed domain knowledge into inductive analysis techniques. In this paper we report these initial experiments and the findings from applying RL to the BMCD as well as to the Samudzi and COBWEB clusters.
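COBWEB decides how to group experiments by maximizing category utility, which rewards partitions whose clusters make attribute values more predictable than they are across the whole data set. The sketch below computes the standard category-utility score for a partition of nominal records. It is illustrative only and is not the authors' setup: the attribute names (`precipitant`, `method`) and the four records are hypothetical. It also handles only nominal attributes, so numeric BMCD parameters such as pH or temperature would need to be discretized first (an assumption).

```python
from collections import Counter
from typing import Dict, List, Sequence

# A record maps attribute names to nominal values (hypothetical schema).
Record = Dict[str, str]


def expected_correct_guesses(records: Sequence[Record], attributes: Sequence[str]) -> float:
    """Sum over attributes of the squared value probabilities, sum_i sum_j P(A_i = V_ij)^2."""
    total = 0.0
    for attr in attributes:
        counts = Counter(r[attr] for r in records)
        total += sum((c / len(records)) ** 2 for c in counts.values())
    return total


def category_utility(clusters: Sequence[List[Record]], attributes: Sequence[str]) -> float:
    """Category utility of a partition, as used by COBWEB for nominal attributes:
    CU = (1/K) * sum_k P(C_k) * [sum_ij P(A_i=V_ij | C_k)^2 - sum_ij P(A_i=V_ij)^2]
    """
    all_records = [r for cluster in clusters for r in cluster]
    n = len(all_records)
    k = len(clusters)
    baseline = expected_correct_guesses(all_records, attributes)
    score = 0.0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        gain = expected_correct_guesses(cluster, attributes) - baseline
        score += (len(cluster) / n) * gain
    return score / k


# Hypothetical toy experiments (not BMCD data).
ATTRS = ["precipitant", "method"]
a = {"precipitant": "PEG", "method": "vapor diffusion"}
b = {"precipitant": "PEG", "method": "vapor diffusion"}
c = {"precipitant": "ammonium sulfate", "method": "batch"}
d = {"precipitant": "ammonium sulfate", "method": "batch"}

good_partition = [[a, b], [c, d]]  # groups experiments with shared conditions
bad_partition = [[a, c], [b, d]]   # mixes conditions within each cluster

print(category_utility(good_partition, ATTRS))  # 0.5
print(category_utility(bad_partition, ATTRS))   # 0.0
```

The partition that keeps matching conditions together scores higher, which is the property COBWEB's incremental placement of each new record tries to increase.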

Citations

Cluster Analysis for Applications – Anderberg – 1973
AutoClass: A Bayesian classification system – Cheeseman, Kelly, et al. – 1988
Learning from observation: Conceptual clustering – Michalski, Stepp – 1983
RL4: A tool for knowledge-based induction – Clearwater – 1990
Inductive policy – Provost – 1992
The Biological Macromolecule Crystallization Database – Gilliland, Tung, et al. – 2001
Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems – Buchanan, Wilkins – 1993
Inductive strengthening: The effects of a simple heuristic for restricting hypothesis space search – Provost, Buchanan – 1992
Cluster analysis of the biological macromolecular crystallization database – Samudzi, Fivash, et al. – 1992
DENDRAL and Meta-DENDRAL: Roots of knowledge systems and expert system applications – Feigenbaum, Buchanan – 1993
The use of a knowledge-based program to predict chemical carcinogenesis in rodents – Ambrosino, Lee, et al. – 1993
Hydrogen ion buffers for biological research (Biochemistry 5) – Good, Winget, et al. – 1966
SAS User's Guide: Statistics, Version 5 Edition – SAS Institute – 1985