T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537, 1999.
....profiling studies of different types of (tissue) specimens are motivated largely by a desire to create clinical decision support systems for accurate tumour classification and to identify robust and reliable targets, biomarkers, for imaging, diagnosis, prognosis and therapeutic intervention [14, 3, 13, 27, 18, 23, 9, 25, 28, 19, 21, 24]. Meeting these biological challenges includes addressing the general statistical problems of classification and prediction, and relevant feature identification. Support Vector Machines (SVMs) [30, 8] have been employed successfully for cancer classification based on transcript profiles [5, 22, ....
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999. The data are available at the URL waldo.wi.mit.edu/MPR/data_sets.html.
....supervised methods have been applied to the analysis of cDNA microarrays and high density oligonucleotide chips. These methods include decision trees, Fisher linear discriminant, multi-layer perceptrons (MLP), nearest neighbours classifiers, linear discriminant analysis, Parzen windows and others [10, 17, 23, 30, 39]. In particular, support vector machines (SVM) are well suited to manage and classify high dimensional data [50], as microarray data usually are, and have been recently applied to the classification of normal and malignant tissues using dot product (linear) kernels [22]. When we are faced with more ....
....problem in the genomic diagnosis (and in perspective, therapy) of tumors, consists in selecting subsets of genes mostly related to carcinogenic processes. Previous studies used feature ranking methods to select genes that are most correlated or that individually classify best the training data [22, 23, 39]. These methods can offer useful information about single genes, but they assume that the expression patterns of each gene are independent, while usually mRNA levels are coordinately expressed by groups of dependent genes. We propose a simple heuristic approach that takes into account a priori ....
T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537, 1999.
....To investigate the performance of sKDA in gene expression data classification we considered two high density oligonucleotide cancer data sets produced using Affymetrix microarrays (both data sets with supplementary information can be obtained from www-genome.wi.mit.edu/cancer). The first example [22] is a leukemia data set where the microarray profiles are divided into two groups, 47 cases of acute lymphoblastic leukemia (ALL) and 25 cases of acute myeloid leukemia (AML). The second example [23] consists of profiles of two types of lymphomas, 58 cases of diffuse large B cell lymphoma (DLBCL) ....
....large B cell lymphoma (DLBCL) and 19 cases of follicular lymphoma (FL). All profiles include 7129 expression measurements. Both data sets (obtained by Affymetrix software) have undergone a linear scaling to correct for minor differences in microarray intensity (see the supplementary information of [22, 23]). We followed the preprocessing steps used in [22, 23]. Thus, giving in parentheses the value for the leukemia data set if it differs from that used with the lymphoma data, gene expression values smaller than 20 (100) were replaced with 20 (100) and values higher than 16000 were set to 16000. ....
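The thresholding step described in the excerpt above can be sketched as follows. This is a minimal illustration and not the citing authors' code: the function name clip_expression and the toy values are assumptions, while the floors (20 for the lymphoma data, 100 for the leukemia data) and the ceiling of 16000 are taken from the excerpt.

```python
def clip_expression(values, floor, ceiling=16000):
    """Clamp every expression value into the range [floor, ceiling]."""
    return [min(max(v, floor), ceiling) for v in values]


# Toy expression values (hypothetical, for illustration only).
raw_profile = [5, 20, 350, 20000]

# Floor of 20 for the lymphoma data, 100 for the leukemia data.
lymphoma_profile = clip_expression(raw_profile, floor=20)
leukemia_profile = clip_expression(raw_profile, floor=100)
```

Values already inside the range pass through unchanged, so the step only affects the very low and very high ends of each profile.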
T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, E. Lander, Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531-537.
....analysis) because the cdc15 data set had a lot of points missing and the elu data set had been sampled for one period, and only coarsely at that. Microarray data has also been used to distinguish between tissue types, providing new methods for diagnosing different types of cancers and leukemia [1, 8]. These data sets compare expression in two distinct cell populations, and typically do not represent time series courses. 3 Integrating Cell Cycle Data Sets We sought to interleave all four of the Cho Spellman cell cycle data sets, with the goal that the integrated data set would gain precision ....
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.
....is significantly improved and the number of genes of the classification models is notably reduced for all datasets. 1. Introduction A right and accurate cancer classification allows the medical staff to apply specific therapies and treatments related to the specific cancer type [14]. Although cancer classification has been improved over the past three decades, it has been traditionally based on the morphological appearance of the tumor. However, this non automatic and systematic classification has obvious and serious limitations, so related with human errors and ....
....gene selection), outlier detection, principal component analysis, discovery of gene relationships and cluster analysis (unsupervised classification). In this way, DNA microarray datasets are an appropriate starting point to carry out the explained systematic and automatic cancer classification [14]. Cancer classification is divided into two major issues: the discovery of previously unknown types of cancer (class discovery) and the assignment of tumor samples to already known cancer types (class prediction). While the class discovery is related with the cluster analysis (or unsupervised ....
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, 'Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring', Science, 286, 531-537, (1999).
....step, but there is a lot more that can be done with the data. For example, one can use supervised learning techniques to cluster or classify the conditions. Such methods were recently shown to yield very good results in determining cancer types, with important potential applications to diagnostics [Golub et al. 1999, Alizadeh et al. 2000, Ben-Dor et al. 2000, Brown et al. 2000, Califano et al. 2000]. Another useful idea is to cluster both the genes and the conditions, and to pinpoint subsets of the genes and the conditions (biclustering) [Getz et al. 2000a, Cheng and Church, 2000]. Given the ....
T. R. Golub, D. K. Slonim, et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, October 1999.
.... amadou ee mail.engr.ccny.cuny.edu basu ccny.cuny.edu Abstract The goal of this work is to explore the use of gene expression data (ged) in discriminating two types of very similar cancers, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). Classification results are reported in [1] using methods other than neural networks. Here, we explore the role of the feature vector in classification. Each feature vector consists of 6817 elements which are gene expression data for 6817 genes. We show in this preliminary experiment that learning using neural network is possible when the ....
....network may be a viable option for automating the classification task. To prospect the ability of neural networks to cancer classification, we are applying it to gene expression data of two types of cancer, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The data is available in [1] where classification was done using Bayes network and a weighted voting scheme. In particular, we use neural networks to determine the minimum informative set of gene expression data. 2 Classification In this section, we describe the classification method used. We are given a training set D, ....
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring." Science, 286:531-537, 1999.
....medicine, where an individual's genetic composition is determined and analyzed to determine the best course of treatment. New technologies such as microarrays [5, 8] offer promise for obtaining sequence and expression data on an individual scale. Microarray studies of leukemia and breast cancer [1, 9] tissues have demonstrated that cancer subtypes can be accurately diagnosed on the basis of genomic data, and with them the prognosis for survival under various treatments. Such microarray studies will continue to help develop our understanding of gene expression and disease. However, the ....
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.
....is freely available on the web (www.cs.columbia.edu/compbio/svm). For a more complete explanation of our SVM methods, see [5; 4] and the accompanying web page (www.cse.ucsc.edu/research/compbio/genex). The second classifier is a k-nearest neighbor technique, similar to that used by Golub et al. [12]. For a given test vector, the k closest members of the training set are located using the Pearson correlation coefficient [24] as similarity metric. The predicted label of the test vector is determined by a vote among the k neighbors. In this work, we use k = 3. We evaluate the classification ....
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531-537, 1999.
.... various tumors using gene expression data [6; 8; 1]. It has been shown that gene expression data acquired from leukemia patients can be used to build predictors that can discriminate between two acute leukemia subtypes, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) [1]. To maximize the efficacy of cancer treatment while at the same time reducing its toxicity, it is imperative to target specific therapies to pathogenetically distinct tumor types. Thus, improved cancer classification is invaluable to advances in cancer treatment. Subtypes of acute leukemia, ALL and ....
....data can vastly improve the accuracy of diagnosis and effectiveness of treatment. Currently, no single test is sufficient to make a diagnosis; leukemia classification still remains imperfect. Based on gene expression data collected from 72 patients suffering from either ALL or AML, it has been shown [1] that a large number of genes (approximately 1100) have higher correlations with the AML/ALL distinction than can be expected by chance. Classifiers built with small subsets of these genes, selected based upon their individual correlations with the cancer subtypes, have been used to predict the ....
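The excerpt above mentions selecting genes by their individual correlation with the cancer subtype. A rough sketch of that idea follows, assuming a plain Pearson correlation against a 0/1 class label as the score; the statistic actually used in [1] may differ, and the names pearson and rank_genes as well as the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_genes(profiles, labels, top_n):
    """Return indices of the top_n genes by absolute correlation with the labels."""
    n_genes = len(profiles[0])
    scored = []
    for gene in range(n_genes):
        column = [sample[gene] for sample in profiles]
        scored.append((abs(pearson(column, labels)), gene))
    # Highest score first; ties are broken by the lower gene index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:top_n]]


# Toy data (assumption): 4 samples x 3 genes, label 1 = AML, 0 = ALL.
profiles = [[10, 5, 1], [12, 5, 2], [1, 5, 1], [2, 6, 2]]
labels = [1, 1, 0, 0]
selected = rank_genes(profiles, labels, top_n=2)
```

Scoring each gene on its own is exactly the independence assumption that some of the citing papers on this page criticize, so this sketch illustrates the filtering approach rather than endorsing it.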
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield and E.S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), pages 531-537, 1999.
....is freely available on the web (www.cs.columbia.edu/compbio/svm). For a more complete explanation of our SVM methods, see [5; 4] and the accompanying web page (www.cse.ucsc.edu/research/compbio/genex). The second classifier is a k-nearest neighbor technique, similar to that used by Golub et al. [12]. For a given test vector, the k closest members of the training set are located using the Pearson correlation coefficient [24] as similarity metric. The predicted label of the test vector is determined by a vote among the k neighbors. In this work, we use k = 3. We evaluate the classification ....
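The nearest-neighbor rule described in the excerpt above (Pearson correlation as the similarity, majority vote among k = 3 neighbors) can be sketched as follows. This is an illustrative reading of the excerpt, not the citing authors' implementation; the function names and the toy training data are assumptions.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def knn_predict(train_vectors, train_labels, test_vector, k=3):
    """Label a test vector by majority vote among its k most correlated training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: pearson(pair[0], test_vector),
        reverse=True,  # most similar (highest correlation) first
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical): rising profiles are "ALL", falling ones "AML".
train_vectors = [[1, 2, 3, 4], [2, 4, 6, 9], [1, 3, 5, 6], [4, 3, 2, 1], [9, 6, 4, 2]]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML"]

rising_prediction = knn_predict(train_vectors, train_labels, [1, 2, 3, 5])
falling_prediction = knn_predict(train_vectors, train_labels, [5, 4, 2, 1])
```

With two classes and an odd k the vote cannot tie, which is one practical reason to choose k = 3.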
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531-537, 1999.
....set is composed with 62 instances of colon cancer patients. Each instance is characterized by 2000 predictive variables, each one related with the numeric expression of a certain gene. The task to be predicted is whether patients suffer colon cancer disease. The second data set was proposed by [6]. It contains 72 instances of leukemia patients involving 7129 variables, each one related with the numeric expression of a certain gene. The class to be predicted is the specific type of leukemia: AML or ALL. For the discrete naive Bayes models each variable is discretized into two values ....
....features.
Table 1: Results of Leave One Out with All the Variables and SFS.
DATA      TYPE   ALL FEATURES          SFS
                 accuracy   no.var.    accuracy   no.var.
Colon     disc   70.97      2000       91.93      5
Colon     cont   53.23      2000       95.83      3
Leukemia  disc   63.89      7129       98.61      6
Leukemia  cont   84.72      7129       87.09      2
These results follow the discoveries of [6] and [15] relating the low number of features needed to improve the accuracy of the whole feature set. For each dataset and initialization method 10 independent EDA runs have been executed. Table 2 shows the estimated accuracy of naive Bayes and the number of selected features for the best ....
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537.
....methods on (a) a linear problem and (b) a nonlinear problem both with many irrelevant features. The x axis is the number of training points, and the y axis the test error as a fraction of test points. 8.2 DNA Microarray Data Next, we tested this idea on two leukemia discrimination problems [7] and a problem of predicting treatment outcome for Medulloblastoma 6 . The first problem was to classify myeloid versus lymphoblastic leukemias based on the expression of 7129 genes. The training set consists of 38 examples and the test set 34 examples. Standard linear SVMs achieve 1 error on the ....
....test set. Using gradient descent on R 2 = 2 we achieved 0 error using 30 genes and 1 error using 1 gene. Using the Fisher score to select features resulted in 1 error for both 1 and 30 genes. The second leukemia classification problem was discriminating B versus T cells for lymphoblastic cells [7]. Standard linear SVMs make 1 error for this problem. Using either the span bound or gradient descent on R 2 = 2 results in 0 error using 5 genes, whereas the Fisher score gets 2 errors using the same number of genes. The final problem is one of predicting treatment outcome of patients that ....
T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.
....by measuring expression levels associated with two kinds of tissue, tumor or non tumor, one obtains labeled data sets that can be used to build diagnostic classifiers. The number of replicates in these experiments are often severely limited, however; indeed, in the data that we analyze here (cf. Golub, et al. 1999), there are only 72 observations of the expression levels of each of 7130 genes. In this extreme of very few observations on very many features, it is natural and perhaps essential to investigate feature selection and regularization methods. Feature selection methods have received much attention ....
....In a problem with over 7000 features, filtering methods have the key advantage of significantly smaller computational complexity than wrapper methods, and for this reason these methods are the main focus of this paper. Earlier papers that have analyzed microarray data have also used filtering methods (Golub et al. 1999; Chow et al. in press; Dudoit et al. 2000). We show, however, that it is also possible to exploit prediction error oriented wrapper methods in the context of a large feature space. In particular, we adopt the spirit of Ng's FS-ORDERED approach and present a specific algorithmic instantiation of ....
Golub, T., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M. L., Downing, J., Caligiuri, M., Bloomfield, C., & Lander, E. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
....of inferring the function of unknown genes based on their expression profiles is to use machine learning techniques based on supervised learning [24]. This has been recently recognized by a number of researchers and a few attempts have been made to use such algorithms. In particular, Golub et al. [14] showed that, by looking at expression profiles of a subset of human genes, a particular type of leukemia can be distinguished from another type of the disease. Brown et al. [2, 3] used several classification algorithms to predict if a gene has a particular function based on expression profiles and obtain ....
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, October 1999.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J., Caligiuri, M. A., Bloomfield, C. D., & Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537, 1999.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., Lander, E. S., 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J., Caligiuri, M. A., Bloomfield, C. D. & Lander, E. S. (1999), 'Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring', Science 286, 531-537.
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537, 1999.
Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
T. R. Golub, D. K. Slonim, et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531-537.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, Vol. 286(15):531-537, October 1999.
Golub,T., Slonim,D., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J., Coller,H., Loh,M., Downing,J. & Caligiuri,M. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, E. Lander, Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531-537.
CiteSeer.IST - Copyright Penn State and NEC