
## Bayesian class discovery in microarray datasets

Venue: IEEE Trans Biomed Eng

Citations: 8 - 1 self

### Citations

11966 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ... share an identical covariance matrix . Under this model, the log-likelihood for the dataset reads where the mixing proportions sum to one, and denotes a Gaussian density. The classical EM-algorithm, [7], provides a convenient method for maximizing E-step: set M-step: set (1) observations times, with the -th such replication having observation weights . In [8] it is proven that the M-step can be carr...

4208 | Regression shrinkage and selection via the LASSO
- Tibshirani
- 1996
Citation Context ...chanism consists of replacing the ℓ2-penalty by an ℓ1-penalty: minimize (2) subject to . In the statistical literature, this model is known as the Least Absolute Shrinkage and Selection Operator (LASSO), [11]. In [12], [13] it has been shown that the LASSO model can be interpreted as a Bayesian inference mechanism for the following model: consider automatic relevance determination (ARD) priors over the re...

1777 | Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
- Golub, Slonim, et al.
- 1999

641 | Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
- Alizadeh
- 2000
Citation Context ...g task. Taking a machine-learning viewpoint, this class-discovery problem can be formalized as an unsupervised clustering problem with simultaneous feature selection. Early approaches to this problem [1]–[3], were semi-automatic procedures based on a combination of clustering techniques and human intervention for selecting “relevant” genes. Several shortcomings of such approaches, and also some metho...

281 | Molecular classification of cutaneous malignant melanoma by gene expression profiling
- Bittner
- 2000
Citation Context ...sk. Taking a machine-learning viewpoint, this class-discovery problem can be formalized as an unsupervised clustering problem with simultaneous feature selection. Early approaches to this problem [1]–[3], were semi-automatic procedures based on a combination of clustering techniques and human intervention for selecting “relevant” genes. Several shortcomings of such approaches, and also some methods f...

235 | Pairwise data clustering by deterministic annealing.
- Hofmann, Buhmann
- 1997
Citation Context ...(v) need some further explanation: the problem of identifying compact groups in datasets which are represented by pairwise distances can be solved by optimizing the pairwise clustering cost function, [20]. We iteratively increase the number of clusters (which is a free parameter in the pairwise clustering functional) until the average dissimilarity in each group does not exceed a predefined threshold....

234 | Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method.
- Fridlyand, Dudoit
- 2001
Citation Context ...set of 1479 genes. Then, the data were log-transformed, centered to zero mean, “squashed” through a -function for outlier-reduction, and standardized to unit variance (for each microarray). Following [22], in a next step we extracted the 200 genes with highest variance across the samples. While the number 200 might appear completely artificial, in [22] it has been shown that for both datasets we have ...

230 | Penalized discriminant analysis.
- Hastie, Buja, et al.
- 1995
Citation Context ... The feature selection mechanism can now be incorporated in the M-step by imposing a certain constraint on the linear regression [step 2) of the above algorithm]. In [9], [10] it has been proposed to use a ridge-type penalized regression. On the technical side, such a penalized regression model is obtained by substituting the covariance matrix by a penalized version of the...

214 | Discriminant analysis by Gaussian mixtures.
- Hastie, Tibshirani
- 1996
Citation Context ...ian density. The classical EM-algorithm, [7], provides a convenient method for maximizing E-step: set M-step: set (1) observations times, with the -th such replication having observation weights . In [8] it is proven that the M-step can be carried out via a weighted and augmented linear discriminant analysis (LDA). Following [9], any LDA problem can be restated as an optimal scoring problem. Let the ...

209 | The Lasso and its dual.
- Osborne, Presnell, et al.
- 2000
Citation Context ...tion, is shown in the equation at the bottom of the page. D. Optimizing the Final Model Since space here precludes a detailed discussion of ℓ1-constrained regression problems, the reader is referred to [16], where a highly efficient algorithm with guaranteed global convergence has been proposed. For our iterated EM-model we can guarantee convergence to a local maximum of the constrained likelihood. Cons...

143 | Flexible discriminant analysis by optimal scoring
- Hastie
- 1994
Citation Context ...ons times, with the -th such replication having observation weights . In [8] it is proven that the M-step can be carried out via a weighted and augmented linear discriminant analysis (LDA). Following [9], any LDA problem can be restated as an optimal scoring problem. Let the class-memberships of the data vectors be represented by a categorical response variable with levels. Let these responses be cod...

68 | Bayesian non-linear modelling for the prediction competition
- MacKay
- 1994
Citation Context ... . Returning to (3), we are now able to interpret the LASSO estimate as a Bayesian feature selection principle: for the purpose of 2 For an introduction to the ARD principle the reader is referred to [14]. (3) (4) feature selection, we would like to estimate the value of a binary selection variable for each feature: equals one, if the -th feature is considered relevant for the given task, and zero oth...

55 | Herpes virus induced proteasome-dependent degradation of the nuclear bodies-associated PML and Sp100 proteins. Oncogene
- Chelbi-Alix, Thé
- 1999
Citation Context ...rferon stimulated gene HEM45 (ISG20) among the top-scoring genes. ISG20 is one of the nuclear bodies (NBs)-associated proteins, which could play an important role in oncogenesis and viral infections, [25]. According to [26], expression of the probable G protein-coupled receptor LCR1 homolog (alias CD184 antigen) is associated with survival in familial chronic lymphocytic leukemia. The second cluster c...

50 | A class of rank test procedures for censored survival data.
- Harrington, Fleming
- 1982
Citation Context ...ase, cf. [40]. As can be seen in the Kaplan-Meier plot, the two IPI classes (low and high risk) are associated with statistically significant differences in overall survival ( in a log-rank test, see [41]). Following [1], we tested the specificity of our inferred partitioning to overall survival within the IPI low risk group (IPI score 0–2). The right panel of Fig. 8 presents the corresponding Kaplan-Me...

46 | Class discovery in gene expression data
- Ben-Dor, Friedman, et al.
- 2001
Citation Context ...iques and human intervention for selecting “relevant” genes. Several shortcomings of such approaches, and also some methods for overcoming these problems, have been discussed in the literature, e.g., [4]–[6]. The common strategy of most of these approaches is the use of a (possibly iterated) stepwise procedure, in which the first step consists of extracting a set of hypothetical partitions (the clust...

43 | Stability-based model selection.
- Shah, Lange, et al.
- 2003
Citation Context ...ced that the concept of measuring the stability of solutions as a means of model selection and model assessment has been successfully applied to several unsupervised learning problems, see, e.g. [17]–[19]. Our clustering method splits the data in two disjoint groups, and simultaneously selects features (i.e. prototypical expression patterns of gene-clusters) which support the splitting hypothesis. In ...

31 | Lymphoid malignancies: the dark side of B-cell differentiation.
- AL, Rosenwald, et al.
- 2002
Citation Context ...k differentiation, prevent apoptosis and/or promote proliferation. In DLBCLs, for instance, apoptosis can be abrogated by translocation, amplification or transcriptional activation of the BCL-2 gene, [36]. It is interesting to observe expression of BCL-2 in many of the LN-samples, see Fig. 7. Patients in this group with low IPI score had a distinctly worse overall survival than patients...

30 | Going metric: denoising pairwise data
- Roth, Laub, et al.
- 2003
Citation Context ...ition cluster of largest size. For this dominating cluster, we then select a prototypical partition. For selecting such prototypical partitions in pairwise clustering problems, we refer the reader to [21], where it is shown that the pairwise clustering problem can be equivalently restated as a k-means problem in a suitably chosen embedding space. Each partition is represented as a vector in this space....

27 | Bayesian learning of sparse classifiers
- Figueiredo, Jain
- 2001
Citation Context ...ance . Note that in the above ARD framework only the functional form of the prior (3) is fixed, whereas the parameters , which encode the “relevance” of each variable, are estimated from the data. In [15] the following Bayesian inference procedure for the prior parameters has been introduced: given exponential hyperpriors, (the variances must be nonnegative), , one can analytically integrate out the h...

15 | Secreted protein acidic and rich in cysteine promotes glioma invasion and delays tumor growth in vivo. Cancer Res
- Schultz, Lemke, et al.
- 2002
Citation Context ..., such as the matrix metalloproteinases MMP-9 and MMP-2, the metalloproteinase inhibitor TIMP-3, and SPARC. The latter appears to regulate cell growth and is capable of delaying tumor growth in vivo, [32]. The list of high-scoring genes also includes several genes which are known to be associated with tumor progression/invasion, like the CD63 antigen and human cathepsin B. The latter...

14 | Clinical importance of myeloid antigen expression in adult acute lymphoblastic leukemia
- RE, Mick, et al.
- 1987
Citation Context ...lymphoma (DLBCL) patients or in one of several control groups from different cell lines. peripheral blood leukocytes, [27]. A member of the fourth cluster is the differentiation antigen CD33. According to [28] it possesses high expression specificity to AML. On the contrary, TCF3 (alias E2A) in the fifth cluster has known expression specificity to ALL, [29]. Moreover, the fifth cluster also contains the we...

13 | The Generalized LASSO: a wrapper approach to gene selection for microarray data
- Roth
- 2002
Citation Context ...onsists of replacing the ℓ2-penalty by an ℓ1-penalty: minimize (2) subject to . In the statistical literature, this model is known as the Least Absolute Shrinkage and Selection Operator (LASSO), [11]. In [12], [13] it has been shown that the LASSO model can be interpreted as a Bayesian inference mechanism for the following model: consider automatic relevance determination (ARD) priors over the regression ...

11 | Immunophenotyping of leukemias using a cluster of differentiation antibody microarray.
- Belov, O, et al.
- 2001
Citation Context ...re ordered with respect to their membership in either the group of diffuse large B-cell lymphoma (DLBCL) patients or in one of several control groups from different cell lines. peripheral blood leukocytes, [27]. A member of the fourth cluster is the differentiation antigen CD33. According to [28] it possesses high expression specificity to AML. On the contrary, TCF3 (alias E2A) in the fifth cluster has kno...

9 | A resampling approach to estimate the stability of one- and multidimensional independent components
- Meinecke, Ziehe, et al.
Citation Context ... noticed that the concept of measuring the stability of solutions as a means of model selection and model assessment has been successfully applied to several unsupervised learning problems, see, e.g. [17]–[19]. Our clustering method splits the data in two disjoint groups, and simultaneously selects features (i.e. prototypical expression patterns of gene-clusters) which support the splitting hypothesis...

9 | D-type cyclins in adult human testis and testicular cancer: relation to cell type, proliferation, differentiation
- Bartkova, Rajpert-de, et al.
- 1999
Citation Context ... D2, a proto-oncogene belonging to the class of D-type cyclins. These genes are involved in key cellular decisions that control cell proliferation, cell-cycle arrest, quiescence, and differentiation, [35]. B. Refining the Partition: Discovery of DLBCL Subtypes Having found a stable partition of the samples into a DLBCL cluster and a non-DLBCL cluster, we now investigate further refinements of this par...

7 | Identifying splits with clear separation: a new class discovery method for gene expression data
- Heydebreck
- 2001
Citation Context ...user is left with several possible high-scoring solutions, from which he/she can only pick one by chance. Even worse, the highest-scoring labelings do not reveal the true structure of the samples. In [5], [6] a support vector machine-based class discovery algorithm has been presented and tested on this dataset. Among the ten highest scoring partitions in [6], the 4th and the 10th partition separate t...

6 | Cyclin D3 is a target gene of t(6;14)(p21.1;q32.3) of mature B-cell malignancies. Blood 98:2837–2844
- Sonoki, Harder, et al.
- 2001
Citation Context ...etation: In the first cluster, we find Cyclin D3, which has been identified as a dominant oncogene in the pathogenesis and transformation in several histologic subtypes of mature B-cell malignancies, [23]. IL7R is a member of the interleukin receptor family. Expression of interleukin receptors (in particular expression of IL2R, which appears in our list in the 12th cluster) has been examined on a wide...

5 | Allograft inflammatory factor-1 augments production of interleukin-6, -10 and -12 by a mouse macrophage line
- Watano, Iwabuchi, et al.
- 2001
Citation Context ...ble cytokine A5 (RANTES). Also in this category falls the allograft inflammatory factor-1 (IBA1), which is a bioactive macrophage factor which might play a role in macrophage activation and function, [31]. Also in accordance with the analysis in [1], we find several genes involved in remodeling the extracellular matrix, such as the matrix metalloproteinases MMP-9 and MMP-2, the metalloproteinase inhib...

5 | CXCL13 (BCA-1) is produced by follicular lymphoma cells: role in the accumulation of malignant B cells
- Husson, Freedman, et al.
- 2002
Citation Context ...stasis, [33]. We also find CXCR5 which is one component of the chemokine/chemokine receptor pair CXCL13/CXCR5 that is required for the architectural organization of B cells within lymphoid follicles, [34]. Moreover, the first gene-cluster in Table II contains cyclin D2, a proto-oncogene belonging to the class of D-type cyclins. These genes are involved in key cellular decisions that control cell proli...

4 | Antibodies against human CD63 activate transfected rat basophilic leukemia (RBL-2H3) cells. Mol. Immunol
- Smith, Monk, et al.
- 1995
Citation Context ...clusters no. 6–12, we also find the well-known marker antigens CD19 and CD63. The latter is a widely expressed glycoprotein member of the TM4SF superfamily that is present on many nonlymphoid cells, [30]. C. Comparison With Other Algorithms Some other class discovery algorithms have been tested on the AML/ALL Leukemia dataset. Ben-Dor et al., [4], report on the splitting of the dataset into two subse...

4 | TCL1 oncogene expression in B cell subsets from lymphoid hyperplasia and distinct classes of B cell lymphoma
- Said
Citation Context ...sed in the LN group. According to [38], the TCL1 proto-oncogene is overexpressed in many mature B cell lymphomas, especially from AIDS patients. For Non-AIDS-related lymphomas, it has been reported in [39] that TCL1 expression in B cell lymphoma usually reflects the stage of B cell development from which they derive. In accordance with the grouping proposed in [1], we also find some Germinal-centre B-c...

2 | Class discovery in gene expression data: characterizing splits by support vector machines
- Markowetz, Heydebreck
- 2002
Citation Context ...s and human intervention for selecting “relevant” genes. Several shortcomings of such approaches, and also some methods for overcoming these problems, have been discussed in the literature, e.g., [4]–[6]. The common strategy of most of these approaches is the use of a (possibly iterated) stepwise procedure, in which the first step consists of extracting a set of hypothetical partitions (the clusterin...

2 | The generalized LASSO
- Roth
- 2004
Citation Context ...s of replacing the ℓ2-penalty by an ℓ1-penalty: minimize (2) subject to . In the statistical literature, this model is known as the Least Absolute Shrinkage and Selection Operator (LASSO), [11]. In [12], [13] it has been shown that the LASSO model can be interpreted as a Bayesian inference mechanism for the following model: consider automatic relevance determination (ARD) priors over the regression coeffi...

2 | Injecting noise for analysing the stability
- Harmeling, Meinecke, et al.
- 2004

2 | Transcript synthesis and surface expression of the interleukin-2 receptor (α-, β-, and γ-chain) by normal and malignant myeloid cells. Blood 87:2419
- Schumann, Nakarai, et al.
- 1996
Citation Context ...IL2R, which appears in our list in the 12th cluster) has been examined on a wide range of cells of myeloid origin including bone marrow blasts obtained from acute myelogenous leukemia (AML) patients, [24]. It is also interesting to find the interferon stimulated gene HEM45 (ISG20) among the top-scoring genes. ISG20 is one of the nuclear bodies (NBs)-associated proteins, which could play an important r...

1 | CXCR4 expression is associated with survival in familial chronic lymphocytic leukemia, but CD38 expression is not
- Ishibe, Albitar, et al.
- 2002
Citation Context ...ene HEM45 (ISG20) among the top-scoring genes. ISG20 is one of the nuclear bodies (NBs)-associated proteins, which could play an important role in oncogenesis and viral infections, [25]. According to [26], expression of the probable G protein-coupled receptor LCR1 homolog (alias CD184 antigen) is associated with survival in familial chronic lymphocytic leukemia. The second cluster contains e.g. the pr...

1 | Chromosomal translocation t(1;19) results in synthesis of a homeobox fusion mRNA that codes for a potential chimeric transcription factor
- Nourse, Galili, et al.
- 1990
Citation Context ...s the differentiation antigen CD33. According to [28] it possesses high expression specificity to AML. On the contrary, TCF3 (alias E2A) in the fifth cluster has known expression specificity to ALL, [29]. Moreover, the fifth cluster also contains the well-known proto-oncogene c-MYB. Among the clusters no. 6–12, we also find the well-known marker antigens CD19 and CD63. The latter is a widely express...

1 | A polymorphic marker for the human cathepsin
- MacKenzie, Mason, et al.
- 2001
Citation Context ...AND ANNOTATED GENES WITHIN THE CLUSTERS FOR THE MOST DOMINANT SPLIT OF THE DATA SET. FIRST COLUMN: FREQUENCY SCORE OF GENE CLUSTER is a proteolytic enzyme implicated in tumor invasion and metastasis, [33]. We also find CXCR5 which is one component of the chemokine/chemokine receptor pair CXCL13/CXCR5 that is required for the architectural organization of B cells within lymphoid follicles, [34]. Moreov...

1 | Homozygous deletion of INK4a/ARF genes and overexpression of bcl-2 in relation with poor prognosis in immunocompetent patients with primary central nervous system lymphoma of the diffuse large B-cell type
- Hayashi, Iwato, et al.
- 2001
Citation Context ...o a Non-LN group. TABLE III DLBCL SUBGROUPS: HIGH-SCORING GENE CLUSTERS AND ANNOTATED GENES WITHIN THESE CLUSTERS in the Non-LN group, cf. [8]. This observation may be corroborated by Hayashi et al., [37], who report a tendency, in which patients with BCL-2 overexpression resulted in poor prognosis in the case of primary central nervous system lymphomas (PCNSLs) of the diffuse large B-cell type. Among...

1 | Dysregulated TCL1 promotes multiple classes of mature B cell lymphoma
- Hoyer, French, et al.
- 2002
Citation Context ...s, we also find the B-cell specific antigen CD23a, which has an essential role in the differentiation of B-cells. The highest scoring gene is TCL1 which is overexpressed in the LN group. According to [38], the TCL1 proto-oncogene is overexpressed in many mature B cell lymphomas, especially from AIDS patients. For Non-AIDS-related lymphomas, it has been reported in [39] that TCL1 expression in B cell ly...

1 | The International Non-Hodgkin’s Lymphoma Prognostic Factors Project: a predictive model for aggressive non-Hodgkin’s lymphoma
- Shipp
- 1993
Citation Context ...high values of the International Prognostic Indicator (IPI). This clinical indicator of prognosis takes into account the patient’s age, performance status, and the extent and location of disease, cf. [40]. As can be seen in the Kaplan-Meier plot, the two IPI classes (low and high risk) are associated with statistically significant differences in overall survival ( in a log-rank test, see [41]). Follow...