Results 1  10
of
808
Summaries of Affymetrix GeneChip probe level data
 Nucleic Acids Res
, 2003
"... High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11±20 pairs of pr ..."
Abstract

Cited by 464 (21 self)
 Add to MetaCart
(Show Context)
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11±20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spikein studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be signi®cantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
F: Evolving gene/ transcript definitions significantly alter the interpretation of GeneChip data
 Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, Watson SJ, Meng
, 2005
"... ..."
A model based background adjustment for oligonucleotide expression arrays (May 28
, 2004
"... ..."
(Show Context)
A Benchmark for Affymetrix GeneChip Expression Measures
 Bioinformatics
, 2003
"... Motivation: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics.There are now several methods available for summariz ..."
Abstract

Cited by 137 (10 self)
 Add to MetaCart
Motivation: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics.There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. Results: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spikein study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data. Availability: In conjunction with the release of a graphics toolbox as part of the Bioconductor project
Adaptive Lasso for sparse highdimensional regression
 University of Iowa
, 2006
"... Summary. We study the asymptotic properties of adaptive LASSO estimators in sparse, highdimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive LASSO, where the L1 norms in the penalty are reweighted b ..."
Abstract

Cited by 92 (10 self)
 Add to MetaCart
(Show Context)
Summary. We study the asymptotic properties of adaptive LASSO estimators in sparse, highdimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive LASSO, where the L1 norms in the penalty are reweighted by datadependent weights. We show that, if a reasonable initial estimator is available, then under appropriate conditions, adaptive LASSO correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, the adaptive LASSO has an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In addition, under a partial orthogonality condition in which the covariates with zero coefficients are weakly correlated with the covariates with nonzero coefficients, univariate regression can be used to obtain the initial estimator. With this initial estimator, adaptive LASSO has the oracle property even when the number of covariates is greater than the sample size. Key Words and phrases. Penalized regression, highdimensional data, variable selection, asymptotic normality, oracle property, zeroconsistency. Short title. Sparse highdimensional regression
A novel signaling pathway impact analysis
 Bioinformatics
, 2009
"... Motivation: Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by t ..."
Abstract

Cited by 86 (1 self)
 Add to MetaCart
(Show Context)
Motivation: Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe. Results: We describe a novel Signaling Pathway Impact Analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance pvalue, which combines the enrichment and perturbation pvalues. We illustrate the capabilities of the novel method on 4 real data sets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods. Availability: SPIA was implemented as an R package which is available at
Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays (Extended Abstract)
 J. Comput. Biol
, 2004
"... Zhijin Wu Johns Hopkins Bloomberg School of Public Health 615 North Wolfe Street zwu@jhsph.edu Rafael A. Irizarry Johns Hopkins Bloomberg School of Public Health 615 North Wolfe Street rafa@jhu.edu ABSTRACT High density oligonucleotide expression arrays are a widely used tool for the measureme ..."
Abstract

Cited by 78 (4 self)
 Add to MetaCart
(Show Context)
Zhijin Wu Johns Hopkins Bloomberg School of Public Health 615 North Wolfe Street zwu@jhsph.edu Rafael A. Irizarry Johns Hopkins Bloomberg School of Public Health 615 North Wolfe Street rafa@jhu.edu ABSTRACT High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. A#ymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, nonspecific hybridization, probespecific e#ects, and measurement error, adhoc measures of expression, that summarize probe intensities, can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the adhoc procedure o#ered by A#ymetrix. Recently, physical models based on molecular hybridization theory, have been proposed as useful tools for prediction of, for example, nonspecific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models and we propose empirically motivated stochastic models that compliment the above mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for the data analysts.
VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS
, 2008
"... Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determin ..."
Abstract

Cited by 68 (1 self)
 Add to MetaCart
Summary. We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with Bspline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model and, the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. Following model selection, oracleefficient, asymptotically normal estimators of the nonzero components can be obtained by using existing methods. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. Key words and phrases. Adaptive group Lasso; component selection; highdimensional data; nonparametric regression; selection consistency. Short title. Nonparametric component selection AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 1
Androgen receptor regulates a distinct transcription program in androgenindependent prostate cancer. Cell 138
, 2009
"... The evolution of prostate cancer from an androgendependent state to one that is androgenindependent marks its lethal progression. The androgen receptor (AR) is essential in both, though its function ..."
Abstract

Cited by 64 (9 self)
 Add to MetaCart
(Show Context)
The evolution of prostate cancer from an androgendependent state to one that is androgenindependent marks its lethal progression. The androgen receptor (AR) is essential in both, though its function
The Subread aligner: fast, accurate and scalable read mapping by seedandvote. Nucleic Acids Res 2013;41:e108
"... Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multiseed strategy, called seedandvote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from ..."
Abstract

Cited by 48 (12 self)
 Add to MetaCart
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multiseed strategy, called seedandvote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.