Yeung, K.; Fraley, C.; Murua, A.; Raftery, A.; and Ruzzo, W. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977--987.
....to remind that this is the same as the digit1000 database. DNA microarrays give biologists a way to study the variation of many genes together. Their use has generated a plethora of gene expression data in the community, which has created a great need for these data to be analyzed ([12]). We used one such dataset, the yeast cell cycle data, which is publicly available at [13]. It shows the fluctuation of the expression levels of over 6000 genes over two cell cycles (17 time points). The dataset is restricted to the 384 genes whose expression levels peak at ....
....to the 384 genes whose expression levels peak at different points corresponding to the five phases of the cell cycle. Given these expression levels, the objective is to group the genes into clusters corresponding to the five phases. Two kinds of preprocessing are suggested in [12]: first, take the logarithm of the expression level; second, standardize to mean zero and variance one. These data transformations were done to make the data fit the Gaussian model better (they were using mixture models to cluster the data; see [12] for more details). ....
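The two preprocessing steps described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `log_standardize`, the choice of the natural logarithm, the use of the population variance, and the decision to standardize each gene's profile separately are all assumptions made for the example.

```python
import math


def log_standardize(profile):
    """Log-transform one expression profile, then rescale it to mean 0, variance 1.

    Assumptions (not stated in the excerpt): natural log, population variance,
    and standardization applied per gene across its time points.
    """
    if any(x <= 0 for x in profile):
        raise ValueError("expression levels must be strictly positive for the log")
    logged = [math.log(x) for x in profile]
    n = len(logged)
    mean = sum(logged) / n
    variance = sum((v - mean) ** 2 for v in logged) / n
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("a constant profile cannot be standardized")
    return [(v - mean) / std for v in logged]


# Hypothetical profile of one gene over four time points.
standardized = log_standardize([1.0, 2.0, 4.0, 8.0])
```

Because the standardization divides out any constant factor, the result does not depend on the base of the logarithm.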
K. Yeung, C. Fraley, A. Murua, A. Raftery, and W. Ruzzo. Model-based clustering and data transformations for gene expression data. Technical Report UW-CSE-01-04-02, Dept. of Computer Science and Engineering, University of Washington, 2001.
....not be true in many cases. In particular, modeling gene expression data sets is an ongoing effort by many researchers. To our knowledge, there is not yet a well-established model for representing gene expression data. Although preprocessing steps, such as the Box-Cox transformation and standardization [Yeung2001a], are applied to raw gene expression data, it is still questionable to what extent gene expression data satisfy the Gaussian mixture assumption. Furthermore, the convergence of the EM algorithm can be very slow, and it may not be practical for models with very large numbers of components ....
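For readers unfamiliar with the Box-Cox transformation mentioned in this excerpt, a minimal sketch of the standard one-parameter form is given below. The function name `box_cox` and the way the parameter `lam` is passed are assumptions for illustration; the excerpt does not say which parameter values were used.

```python
import math


def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a single positive value.

    lam == 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires a strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical expression levels transformed with lam = 0.5.
transformed = [box_cox(v, 0.5) for v in [1.0, 4.0, 9.0]]
```

The log transform is the limiting case as `lam` approaches zero, which is why the two transformations are often discussed together.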
K.Y. Yeung, C. Fraley, A. Murua, A.E. Raftery and W.L. Ruzzo. "Model-based clustering and data transformations for gene expression data", University of Washington, April 2001.
....algorithms in the literature, only a few of them have been applied to analyze gene expression data in the recent half decade. These include K-means [14], hierarchical clustering [5], graph-theory-based clustering [2], naive Bayesian clustering [1], and Gaussian mixture model-based clustering [18]. Studies of neural networks for gene clustering are very rare, apart from a few applications of the self-organizing map (SOM) [10], [15]. SOM, however, is not known as an efficient clustering algorithm, due to its high computational cost in maintaining the neighborhood relationship. Adaptive ....
.... [Table residue: distribution of the members of clusters 1 to 12 over the columns N, NM, NR, PS, and DV; totals N = 112, NM = 25, NR = 39, PS = 27, DV = 21. The per-cluster rows are not fully recoverable from the extraction.] used in Yeung et al.'s recent work to evaluate their model-based clustering algorithm [18]. These evaluation measures are all based on so-called external criteria, i.e. they require a known optimal partition of the data set as the reference result. Such an optimal partition, however, may not be available without extensive human knowledge of the problem domain. Therefore in our ....
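The excerpt does not name the external criteria it refers to. As one widely used example of such a criterion, the adjusted Rand index compares a clustering against a reference partition; the sketch below is an illustration under that assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(reference, predicted):
    """Adjusted Rand index between a reference partition and a clustering."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    n = len(reference)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    # Pairs of items that share a cluster in both partitions, and in each one.
    both = sum(comb(c, 2) for c in Counter(zip(reference, predicted)).values())
    in_ref = sum(comb(c, 2) for c in Counter(reference).values())
    in_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = in_ref * in_pred / comb(n, 2)
    maximum = (in_ref + in_pred) / 2.0
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (both - expected) / (maximum - expected)


# Same grouping under different label names scores 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that splits every reference cluster scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The need for the `reference` argument is exactly the limitation the excerpt points out: without a trusted partition, a measure of this kind cannot be computed.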
K. Yeung, C. Fraley, A. Murua, A. Raftery, and W. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977-987, 2001.
.... form such as a Gaussian or as a mixture of functional forms (called a mixture model). The mixture model, in particular, uses superpositions of simpler densities (such as Gaussians defined with known means and covariance) to represent high-dimensional and high-variability regions of gene expression [Yeung et al. 2001]. The distributions modeled by these simpler densities can be viewed as clusters, which are categorically homogeneous subsets. Thus, clustering looks for regularities in the training data (the data resulting from microarray experiments) and can be used to provide a compact representation of the ....
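The idea of a mixture as a weighted superposition of Gaussian densities, with each component read as a cluster, can be sketched in one dimension as follows. The function names and the toy parameters are assumptions for illustration; real gene expression models are multivariate and their parameters are estimated from data.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def mixture_density(x, weights, means, variances):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def responsibilities(x, weights, means, variances):
    """Posterior probability that x was generated by each component (cluster)."""
    parts = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical two-component mixture with equal weights and unit variances.
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
midpoint = responsibilities(0.0, weights, means, variances)
near_right = responsibilities(2.0, weights, means, variances)
```

Assigning each point to the component with the largest responsibility turns the fitted density into a clustering. For points very far from every component the densities can underflow to zero, so a production version would work with log densities.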
Yeung, K., Fraley, C., Murua, A., Raftery, A., and Ruzzo, W. (2001). Model-based clustering and data transformations for gene expression data. Technical Report TR 2001-396, Statistics Department, University of Washington. 40 pages.
....algorithms are largely heuristically motivated, and the issues of determining the correct number of clusters and choosing a good clustering algorithm are not yet rigorously solved. (Eisen et al. 1998) and (Tamayo et al. 1999) used visual display to determine the number of clusters. (Yeung et al. 2001b) suggested clustering the data set leaving out one experiment at a time and then comparing the performance of different clustering algorithms using the left-out experiment. The gap statistic (Tibshirani et al. 2000) estimates the number of clusters by comparing within-cluster dispersion to ....
....distributed according to a mixture of Gaussian distributions, we explored the extent to which different transformations of gene expression data sets satisfy the normality assumption. Due to space limitations, data transformations and normality tests are summarized in Section 5.2, and described in (Yeung et al. 2001a) and our supplementary web site. In Section 5, we show that the existing model-based clustering implementations produce higher quality clustering results than a leading heuristic approach when the data is appropriately transformed. The existing model-based clustering methods were designed for ....
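The leave-one-experiment-out comparison described in this excerpt can be sketched as below. The excerpt does not say how the left-out experiment is scored, so the scoring rule here (root-mean-square deviation from the cluster mean in the left-out experiment, lower being better) is an assumption, as are all function names and the two toy clustering functions.

```python
import math


def left_out_score(data, cluster_fn, left_out):
    """Cluster without one experiment, then score the clusters on that experiment."""
    reduced = [[v for j, v in enumerate(row) if j != left_out] for row in data]
    labels = cluster_fn(reduced)
    by_cluster = {}
    for label, row in zip(labels, data):
        by_cluster.setdefault(label, []).append(row[left_out])
    squared = 0.0
    for values in by_cluster.values():
        mean = sum(values) / len(values)
        squared += sum((v - mean) ** 2 for v in values)
    # Root-mean-square within-cluster deviation in the left-out experiment.
    return math.sqrt(squared / len(data))


def total_score(data, cluster_fn):
    """Sum the left-out scores over every experiment (column)."""
    return sum(left_out_score(data, cluster_fn, e) for e in range(len(data[0])))


def threshold_clusterer(rows):
    """Hypothetical algorithm A: split genes on the first remaining experiment."""
    return [1 if row[0] > 2.5 else 0 for row in rows]


def single_cluster(rows):
    """Hypothetical algorithm B: put every gene in one cluster."""
    return [0 for _ in rows]


# Toy data: four genes (rows) measured in three experiments (columns).
toy = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [5.0, 5.0, 5.0]]
score_a = total_score(toy, threshold_clusterer)
score_b = total_score(toy, single_cluster)
```

On this toy data the thresholding algorithm predicts the left-out experiment perfectly, while the single-cluster baseline does not, so the first would be preferred under this scoring rule.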
Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E. and Ruzzo, W. L. (2001a) Model-based clustering and data transformations for gene expression data. Tech. Rep. UW-CSE-01-04-02, Dept. of Computer Science and Engineering, University of Washington. Available at our supplementary web site.
No context found.
Yeung, K.; Fraley, C.; Murua, A.; Raftery, A.; and Ruzzo, W. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977--987.
No context found.
K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977--987, 2001.
No context found.
K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977--987, 2001.
No context found.
Yeung, K.Y., et al. "Model-based clustering and data transformations for gene expression data", Bioinformatics 17: 977-987, (2001).
No context found.
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL. Model-based clustering and data transformations for gene expression data. Bioinformatics. 2001 Oct;17(10):977-87.
No context found.
K.Y. Yeung, C. Fraley, A. Murua, A.E. Raftery, and W.L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17:977--987, 2001.
No context found.
K.Y. Yeung, C. Fraley, A. Murua, A.E. Raftery, and W.L. Ruzzo, Model-based clustering and data transformations for gene expression data, Bioinformatics 17 (2001), 977-987.
No context found.
K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery, and W. L. Ruzzo. Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977--987, 2001.