R. D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(S5):1031-1040, 1996.
....CProgol4.1 was first described in [8]. Since then a number of advances have been made over the original CProgol4.1 in systems such as CProgol4.2 [9], PProgol2.1 and PProgol2.2. The development of these systems has been informed by feedback from experiments on a variety of real-world applications [3, 6, 5, 14, 2]. This chapter describes the theory and use of CProgol4.4, a publicly distributed version of the Progol family of ILP systems. In order to follow the examples in this chapter, it is assumed that the reader is familiar with the aims of ILP, Horn clause logic and Prolog notation for clauses. It is ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031-1040, 1996.
....theories is not gained at the expense of predictive accuracy. This work is also much in the spirit of studies comparing various methods (FOIL vs. PROGOL [14], propositional learning vs. relational learning [15]) in the domain of mutagenicity. However, the closest work in the literature is [7], which reports the application of the ILP algorithm PROGOL to one of the databases also used in this study. 3 Description of the Data. In this section we describe the datasets used in our experiments as is, without the data engineering steps to define the learning problems. The next step ....
....our experiments as is, without the data engineering steps to define the learning problems. The next step dealing with the precise definition of the learning problems will be documented in the subsequent section. Our starting point are two databases: The first one, provided by King and Srinivasan [7] (abbreviated by K&S), contains information about the carcinogenicity of 330 compounds, as classified by the NIEHS. The second database, the Carcinogenic Potency Database (CPD) [4], is provided by Gold and co-workers, and contains information about bioassays including the species, the strain and the ....
[Article contains additional citation context not shown here]
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.
.... apparently ILP algorithms have not yet been adopted by tool vendors offering tools for Knowledge Discovery and Data Mining (KDD). We thus think that one way to improve the acceptance could be to devise easy-to-use methods for the specification of declarative language bias for relational application domains. In particular, generating or compiling a low-level specification from an abstract high-level specification appears to be an option. In fact, this has been hinted at in .... [Footnote 1:] There have been, nevertheless, several successful applications of ILP algorithms to real-world problems, e.g. [20, 10, 13, 15].
....by the algorithm. Secondly, the algorithm, by design, always fulfills the type constraints for the assignment of variables. This also holds with respect to meta types. We also did some initial validation of the approach: the algorithm has been applied to several chemical domains ([9], [20], [13]) using all graph-based metaschemata (see appendix D). Subsequently, we compared the results generated by the algorithm with the manually engineered declarative language bias. Not surprisingly, there were only a few observed differences. Some of the generated schemata were not included in the ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.
....we are trying to predict the half rate of surface water aerobic aqueous biodegradation in hours (Dzeroski and Kompare 1995). The class to learn is whether this quantity exceeds a certain threshold. The dataset contains 62 chemicals, and we performed 6-fold cross validation in our tests. The third database, provided by King and Srinivasan (King and Srinivasan 1997), contains information about the carcinogenicity of 330 compounds, as classified by the National Institute of Environmental Health Sciences (NIEHS). The NIEHS has classified these chemicals as non-carcinogenic, equivocal and carcinogenic. Here, we performed 5-fold cross validation. Note that the ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.
....is comparable to the accuracy of other computational methods. However, alternative techniques do not produce a structural model that one can use to visualize spatial relations and thus to posit the deeper causes of mutation, 2 so that the results justified publication in the chemistry literature (King et al. 1996). As in other applications, the developers aided the discovery process in a number of ways. They chose to formulate the task in terms of finding a classifier that labels chemicals as causing mutation or not, rather than predicting levels of mutagenicity. King et al. also presented their system ....
....in that they made clear contact with chemical concepts, the authors aided their interpretation by presenting graphical depictions of their structural claims. Similar interventions have been used by the developers on related scientific problems, including prediction of carcinogenicity (King & Srinivasan, 1996) and pharmacophore discovery (Finn, Muggleton, Page, & Srinivasan, 1998). [Footnote 2:] This task does not actually involve structural modeling in the sense discussed in Section 2, since the structures are generalizations from observed data rather than combinations of unobserved entities posited to explain ....
King, R. D., & Srinivasan, A. (1996). Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104 (Supplement 5), 1031--1040.
....to be so by linear regression. Recently a set of rules developed by Progol for predicting carcinogenicity were entered in a global competition run by the National Toxicology Program (NTP) of the National Institute of Environmental Health Studies in the USA. In the initial results reported in [9] the Progol rule predictions came top out of all systems which were provided with only public data for training. Recent experiments have shown that when Progol's mutagenic rules are added to its other rules derived from the NTP data, the predictive accuracy increases from 64% to 72%, making it the ....
R. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5), 1996.
....the chance of finding the correct concept during propositionalization is very small. So usually the work is divided by the propositionalization algorithm and by the subsequently applied learning algorithm. 3.2 Carcinogenicity Domain. Next, we performed experiments in the carcinogenicity domain [7]. The database contains information about the carcinogenicity of 330 compounds, as classified by the US National Institute of Environmental .... [Footnote 1:] So in fact we do not allow for clauses of arbitrary length, but still the bounds used are far too large for considering all clauses up to this length.
....a propositional representation. The hypothesis language of LINUS is restricted to function-free constrained DHDB (deductive hierarchical database) clauses. This implies that no recursion is allowed, and that no new variables may be introduced. DINUS [10] weakens the language ....
[Footnote 2:] The experiment with Progol has been described in [7].
Table 1: Quantitative results for the carcinogenicity domain obtained by 5-fold cross validation.
Method            Accuracy
Default           55.00
Ames Test         63.00
C4.5 prune        58.79
C4.5 rules        60.76
T2                65.00
M5                69.93
FOIL              25.15
Progol            63.00
SRT               72.46
SP C4.5 prune     66.78
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.
....the chance of finding the correct concept during propositionalization is very small. So usually the work is divided by the propositionalization algorithm and by the subsequently applied learning algorithm. 3.2 Carcinogenicity Domain. Next, we performed experiments in the carcinogenicity domain [King and Srinivasan, 1997]. The database contains information about the carcinogenicity of 330 compounds, as classified by the US National Institute of Environmental Health Sciences (NIEHS). Chemicals are classified as carcinogenic or not (compounds classified as .... [Footnote 1:] So in fact we do not allow for clauses of arbitrary ....
....
Default           55.00
Ames Test         63.00
C4.5 prune        58.79
C4.5 rules        60.76
T2                65.00
M5                69.93
FOIL              25.15
Progol            63.00
SRT               72.46
SP C4.5 prune     66.78
Table 1: Quantitative results for the carcinogenicity domain obtained by 5-fold cross validation.
[Footnote 2:] The experiment with Progol has been described in [King and Srinivasan, 1997].
4 Related Work. In this section we briefly review related work on propositionalization and stochastic search in machine learning and Inductive Logic Programming. LINUS [Lavrac and Dzeroski, 1994] was the first system to transform a relational representation into a propositional ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.
....but ILP-generated theories tend to be more comprehensible. This work is also much in the spirit of studies comparing various methods (FOIL vs. Progol (Srinivasan, Muggleton, & King 1995), propositional learning vs. relational learning (Srinivasan et al. 1996)) in the domain of mutagenicity. (King & Srinivasan 1997) report on the application of Progol to one of the databases also used here. Description of the Data. In this section we describe the datasets used in our experiments as is, without the data engineering steps to define the learning problems. Our starting point are two databases: The first one ....
....report on the application of Progol to one of the databases also used here. Description of the Data. In this section we describe the datasets used in our experiments as is, without the data engineering steps to define the learning problems. Our starting point are two databases: The first one (King & Srinivasan 1997) (abbreviated by K&S) contains information about the carcinogenicity of 330 compounds, as classified by the NIEHS. The second database, the Carcinogenic Potency Database (CPD) (Gold 1995), contains information about bioassays including the species, the strain and the sex of the animals, and the ....
[Article contains additional citation context not shown here]
King, R., and Srinivasan, A. 1997. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives.
....as Progol [23] which are designed to allow implicit background knowledge, such as of arithmetic. This representation is completely general for chemical compounds and no special attributes need to be invented. This general representation was applied to the prediction of chemical carcinogenicity [17, 27, 28]. The atom and bond representation could also be easily extended to the representation of 3-dimensional molecular structure. This can be achieved by simply adding Cartesian coordinates to the atom predicate and coding information about 3-D geometry in the background knowledge as first suggested ....
....the advantage of propositional StruQT over standard chemometric programs. This dataset is unsuitable for comparison with ILP StruQT as the molecules all come from the same series (the problem is essentially propositional). The most commonly used ILP SAR database is that of mutagenesis [17]. This dataset is (currently) unsuitable for comparison with ILP StruQT because some of the molecules in this dataset are too large to apply the quantum chemistry programs required to calculate the electron densities. The most suitable available datasets are those of the thermolysin and glycogen ....
R. D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(S5):1031-1040, 1996.
.... between researchers now variously located at the Universities of Edinburgh, Louisville, Oxford, Wales, and York; the Imperial Cancer Research Fund (ICRF); Pfizer UK; and Smith Kline Beecham has resulted in applications of symbolic machine learning to problems in molecular biology and biochemistry [6, 14, 23, 26, 31, 33]. Much of this has been accomplished within the setting of Inductive Logic Programming (ILP: see [21]). This is an anecdotal account of some practical guidelines that I have found useful during the course of the applied work. That they have had a role to play in my thinking about the biological ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031--1040, 1996.
....structural alerts from chemicals in the NTP data base. In the first instance, predictions from these alerts will be compared against other predictions available for PTE 1. This will be followed by predictions for compounds in PTE 2. [Footnote 4:] A preliminary effort by the ILP system Progol is presented in [14]. The results in this paper subsume these early results as a number of toxicology indicators were unavailable to us at that time. Further details are in Section 4. 4 Carcinogenesis predictions using Progol 4.1 Aims The experiment described here has the following aims. 1. Use the ILP system ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031--1040, 1996.
....at least since the 1980s (for example, MARVIN [Sammut and Banerji, 1986] and MIS [Shapiro, 1983]), records of their application to non-trivial problems date largely from 1991. Since then, ILP systems have been used to construct predictive models for problems in molecular biology and biochemistry [King et al. 1996, King et al. 1992, Muggleton et al. 1992], electronic circuit diagnosis [Feng, 1992], finite element mesh design in engineering [Dolsak and Muggleton, 1992], environmental data monitoring [Dzeroski et al. 1994], discovery of invariants in software [Bratko and M.Grobelnik, 1993] and natural ....
....beam search that use probabilistic arguments to guide clause search and selection. The investigation is an empirical study of the utility of equipping a particular ILP system Aleph with these methods. Utility is judged by a comparison of (1) the estimated predictive accuracy; .... [Figure: panels for Mutagenesis [King et al. 1996] and Carcinogenesis [King and Srinivasan, 1996]; panel label "Search space"; x-axis: Clause length (literals); axis tick values omitted] ....
[Article contains additional citation context not shown here]
King, R. and Srinivasan, A. (1996). Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031--1040.
....Aberystwyth covery tasks. Early specialised programs [for example, Feigenbaum et al. 1971; Langley et al. 1983] have given way to more general-purpose ones [for example, Muggleton, 1995; Muggleton and Feng, 1990] which have been applied with some success in areas of biochemistry [King et al. 1996; 1992; Muggleton et al. 1992]. While the experimental studies reported are preliminary, they have at least one commendable feature, namely, they constitute examples of AI programs participating in true scientific discovery tasks. By true here, we mean problems where existing scientific ....
....i to the theory H and repeats from 1) with examples not covered so far until no more compression is possible. Compression is here defined as the difference, in numbers of descriptors, between E_i and D_i. 5.2 Background knowledge. The generic atom-bond representation used in an earlier study [King et al. 1996; Srinivasan et al. 1996] is used. This consists of two basic relations to represent structure: atom and bond. For example, the fact atom(127,127_1,c,ar_c_6_ring,0.133) states that in compound 127, atom no. 1 is of element carbon, and of type aromatic carbon in a 6-membered ring, and has a ....
[Article contains additional citation context not shown here]
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031--1040, 1996.
....structural alerts from chemicals in the NTP data base. In the first instance, predictions from these alerts will be compared against other predictions available for PTE 1. This will be followed by predictions for compounds in PTE 2. [Footnote 1:] A preliminary effort by the ILP system Progol is presented in [14]. The results in this paper subsume these early results as a number of toxicology indicators were unavailable to us at that time. Further details are in Section 4. 4 Carcinogenesis predictions using Progol 4.1 Aims The experiment described here has the following aims. 1. Use the ILP system ....
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5):1031--1040, 1996.
....http://www.comlab.ox.ac.uk/oucl/groups/machlearn/PTE . The set we have used contains 337 compounds, 182 (54%) of which have been classified as carcinogenic and the remaining 155 (46%) otherwise. Each compound is basically described as a set of atoms and their bond connectivities, as proposed in (King et al. 1996). The atoms of a compound are represented as Datalog facts such as atom(d1,d1_25,h,1,0.327) stating that compound d1 contains atom d1_25 of element h and type 1 with partial charge 0.327. For convenience, we have defined additional view predicates atomel, atomty, and atomch; e.g. atomel(d1,d1 ....
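The atom facts and view predicates described in the snippet above can be illustrated with a minimal Python sketch. This is not the citing authors' code: the snippet is truncated before it shows the arguments of atomel, so the sketch assumes each view is a plain projection of the atom relation onto (compound, atom, value), and it assumes the atom identifier is written d1_25. The names Atom, ATOMS and the three functions are hypothetical.

```python
from typing import NamedTuple, List, Set, Tuple


class Atom(NamedTuple):
    """One atom fact: atom(Compound, AtomId, Element, Type, Charge)."""
    compound: str
    atom_id: str
    element: str
    atom_type: int
    charge: float


# The single fact quoted in the snippet; the identifier spelling is an assumption.
ATOMS: List[Atom] = [Atom("d1", "d1_25", "h", 1, 0.327)]


def atomel(atoms: List[Atom]) -> Set[Tuple[str, str, str]]:
    """Assumed view: (compound, atom, element)."""
    return {(a.compound, a.atom_id, a.element) for a in atoms}


def atomty(atoms: List[Atom]) -> Set[Tuple[str, str, int]]:
    """Assumed view: (compound, atom, type)."""
    return {(a.compound, a.atom_id, a.atom_type) for a in atoms}


def atomch(atoms: List[Atom]) -> Set[Tuple[str, str, float]]:
    """Assumed view: (compound, atom, partial charge)."""
    return {(a.compound, a.atom_id, a.charge) for a in atoms}
```

Under these assumptions, a query such as "which atoms of d1 are hydrogen" becomes a membership test on atomel(ATOMS), which mirrors how a Datalog query would use the narrower view instead of the five-argument atom relation.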
....methyl(C,S), occurs_in(A,S) is a pattern representing a carbon atom A that occurs in a methyl structure S within compound C. Related work. Related problems in structure discovery in molecular biology have been considered, e.g. in (Wang et al. 1997; Kramer, Pfahringer, & Helma 1997; King et al. 1996; King & Srinivasan 1996). Substructure discovery and the utilization of background knowledge have been discussed in (Djoko, Cook, & Holder 1995). Discovery of logical patterns, similar to Datalog queries, has been considered in (De Raedt & Dehaspe 1997) and in the context of metaqueries (Shen ....
[Article contains additional citation context not shown here]
King, R., and Srinivasan, A. 1996. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives 104(5):1031--1040.
No context found.
R.D. King and A. Srinivasan. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 1997.