The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote
- NUCLEIC ACIDS RES 2013;41:E108
, 2013
- Cited by 48 (12 self)
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
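The voting idea in this abstract can be illustrated with a minimal sketch. It is not the Subread implementation: it assumes exact hash lookups of fixed-length subreads, ignores indels and the later detailed alignment step, and the names `build_index`, `seed_and_vote`, `seed_len` and `step` are hypothetical choices made only for this example.

```python
from collections import Counter, defaultdict


def build_index(reference, seed_len):
    """Map every seed_len-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def seed_and_vote(read, index, seed_len, step):
    """Return (read_start, votes) for the best-supported location.

    Subreads are taken every `step` bases; a step smaller than seed_len
    gives overlapping subreads. Returns (None, 0) if no subread matches.
    """
    votes = Counter()
    for offset in range(0, len(read) - seed_len + 1, step):
        subread = read[offset:offset + seed_len]
        for pos in index.get(subread, ()):
            # Each hit votes for the read start implied by this subread.
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Toy data (assumed for illustration): a read copied from position 10
# of the reference, with one base changed to mimic a mismatch.
reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCGATTACGCTAGCTAGGATCCA"
read = reference[10:34]
read_mm = read[:12] + "C" + read[13:]

index = build_index(reference, seed_len=6)
location, n_votes = seed_and_vote(read_mm, index, seed_len=6, step=3)
print(location, n_votes)
```

The two subreads that cover the changed base find no hit, yet the remaining subreads still agree on the same start position, which is the sense in which no single subread has to map exactly.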
A three-component regulatory system regulates biofilm maturation and type III secretion in Pseudomonas aeruginosa
- J
, 2005
- Cited by 39 (12 self)
This article cites 83 articles, 49 of which can be accessed free
Outlier sums for differential gene expression analysis
- BIOSTATISTICS
, 2006
- Cited by 39 (0 self)
We propose a method for detecting genes that, in a disease group, exhibit unusually high or low gene expression in some but not all samples. This can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples. In real and simulated examples, the new method often exhibits lower false discovery rates than simple t-statistic thresholding. We also compare our approach to the recent COPA proposal of Tomlins et al. (2005).
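A rough sketch of an outlier-sum style statistic for a single gene follows. The abstract does not spell out the formula, so the details here are assumptions: values are standardized by the median and median absolute deviation over all samples, an outlier is any standardized value above the upper quartile plus the interquartile range, and the statistic sums the outlying values in the disease group. The function name `outlier_sum` and the toy data are hypothetical.

```python
import statistics


def outlier_sum(values, is_disease):
    """Sum of standardized disease-group values that are high outliers.

    Assumed formulation: standardize by median and MAD over all samples,
    then flag values above q75 + IQR of the standardized values.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be standardized")
    z = [(v - med) / mad for v in values]
    q25, _, q75 = statistics.quantiles(z, n=4)
    threshold = q75 + (q75 - q25)
    return sum(zi for zi, d in zip(z, is_disease) if d and zi > threshold)


# Toy expression values (assumed): 8 normal samples, then 8 disease
# samples of which only two show strongly elevated expression.
values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1,
          1.0, 0.9, 1.2, 6.0, 1.1, 7.5, 1.0, 0.9]
is_disease = [False] * 8 + [True] * 8

w_disease = outlier_sum(values, is_disease)
w_swapped = outlier_sum(values, [not d for d in is_disease])
print(w_disease, w_swapped)
```

A t-statistic averages over the whole group and would dilute the two elevated samples, whereas this statistic is driven only by them. Low outliers could be handled symmetrically with a lower threshold.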
COXPRESdb: a database of coexpressed gene networks in mammals
- Nucleic Acids Res
, 2008
Shotgun stochastic search for “large p” regression
- Journal of the American Statistical Association
, 2007
- Cited by 37 (3 self)
Model search in regression with very large numbers of candidate predictors raises challenges for both model specification and computation, and standard approaches such as Markov chain Monte Carlo (MCMC) and step-wise methods are often infeasible or ineffective. We describe a novel shotgun stochastic search (SSS) approach that explores “interesting” regions of the resulting, very high-dimensional model spaces to quickly identify regions of high posterior probability over models. We describe algorithmic and modeling aspects, priors over the model space that induce sparsity and parsimony over and above the traditional dimension penalization implicit in Bayesian and likelihood analyses, and parallel computation using cluster computers. We discuss an example from gene expression cancer genomics, comparisons with MCMC and other methods, and theoretical and simulation-based aspects of performance characteristics in large-scale regression model search. We also provide software implementing the methods.
Model-based clustering for expression data via a Dirichlet process mixture model
- In: Bayesian Inference for Gene Expression and Proteomics
, 2006
- Cited by 37 (0 self)
This chapter describes a clustering procedure for microarray expression data based on a well-defined statistical model, specifically, a conjugate Dirichlet process mixture model. The clustering algorithm groups genes whose latent variables governing expression are equal, that is, genes belonging to the same mixture component. The model is fit with Markov chain Monte Carlo and the computational burden is eased by exploiting conjugacy. This chapter introduces a method to get a point estimate of the true clustering based on least-squares distances from the posterior probability that two genes are clustered. Unlike ad hoc clustering methods, the model provides measures of uncertainty about the clustering. Further, the model automatically estimates the number of clusters and quantifies uncertainty about this important parameter. The method is compared to other clustering methods in a simulation study. Finally, the method is demonstrated with actual microarray data.
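The least-squares point estimate mentioned in this abstract can be sketched as follows. This is an illustration under assumptions, not the chapter's code: `samples` stands for a hypothetical list of cluster-label vectors drawn by MCMC, the pairwise co-clustering probabilities are estimated as simple frequencies over those draws, and the estimate is restricted to the sampled clusterings.

```python
def least_squares_clustering(samples):
    """Pick the sampled clustering closest to the pairwise probabilities.

    samples: list of label vectors, one per MCMC draw; items i and j are
    clustered together in a draw when their labels are equal.
    """
    n = len(samples[0])
    # Estimated posterior probability that items i and j share a cluster.
    prob = [[sum(s[i] == s[j] for s in samples) / len(samples)
             for j in range(n)] for i in range(n)]

    def loss(s):
        # Squared distance between the 0/1 association matrix of this
        # draw and the pairwise probability matrix.
        return sum((float(s[i] == s[j]) - prob[i][j]) ** 2
                   for i in range(n) for j in range(n))

    return min(samples, key=loss)


# Toy draws (assumed). The last draw is the same partition as the first
# with the labels swapped, which the pairwise comparison ignores.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]
estimate = least_squares_clustering(samples)
print(estimate)
```

Working with pairwise co-clustering rather than raw labels sidesteps label switching, since only whether two genes share a cluster matters.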
Relationship between gene expression and observed intensities in DNA microarrays--a modeling study
- Nucleic Acids Res
, 2006
- Cited by 36 (0 self)
A theoretical study of the physical properties which determine the variation in signal strength from probe to probe on a microarray is presented. A model which incorporates probe-target hybridization, as well as the subsequent dissociation which occurs during stringent washing of the microarray, is introduced and shown to reasonably describe publicly available spike-in experiments carried out at Affymetrix. In particular, this model suggests that probe-target dissociation during the stringent wash plays a critical role in determining the observed hybridization intensities. In addition, it is demonstrated that non-specific hybridization introduces uncertainties which significantly limit the ability of any model to accurately quantify absolute gene expression levels while, in contrast, target folding appears to have little effect on these results. Finally, for data from target spike-in experiments, our model is shown to compare favorably with an existing statistical model in determining target concentration levels.
Sequence dependence of cross-hybridization on short oligo microarrays
- Nucleic Acids Research
- Cited by 35 (0 self)
One of the critical problems in short oligo microarray technology is how to deal with cross-hybridization that produces spurious data. Little is known about the details of the cross-hybridization effect at the molecular level. Here, we report a free energy analysis of cross-hybridization on short oligo microarrays using data from a spike-in study. Our analysis revealed that cross-hybridization on the arrays is mostly caused by oligo fragments with a run of 10–16 nt complementary to the probes. Mismatches were estimated to be energetically much more costly in cross-hybridization than in gene-specific hybridization, implying that the sources of cross-hybridization must be very different between the two probes of a PM–MM pair. Consequently, it is unreliable to use the MM probe signal to track the cross-hybridizing signal on the corresponding PM probe. Our results also showed that the oligo fragments tend to bind to the 5′ ends of the probes, and are rarely seen at the 3′ ends. These results are useful for microarray design and data analysis.
Using control genes to correct for unwanted variation in microarray data
- In: Biostatistics
, 2011
- Cited by 34 (5 self)
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method “Remove Unwanted Variation, 2-step” (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise but substantial challenges remain.
A comparison of statistical methods for analysis of high density oligonucleotide array data
- Bioinformatics
, 2003
- Cited by 34 (0 self)
Motivation: Gene expression profiling has become an invaluable tool in functional genomics. A wide variety of statistical methods have been employed to analyze the data generated in experiments using Affymetrix GeneChip® microarrays. It is important to understand the relative performance of these methods in terms of accuracy in detecting and quantifying relative gene expression levels and changes in gene expression. Results: Three different analysis approaches have been compared in this work: non-parametric statistical methods implemented in Affymetrix Microarray Analysis Suite v5.0 (MAS5); an error-modeling based approach implemented in Rosetta Resolver® v3.1; and an intensity-modeling approach implemented in dChip v1.1. A Latin Square data set generated and made available by Affymetrix was used in the comparison. All three methods (Resolver, MAS5 and the version of dChip based on the difference between perfect match and mismatch intensities) perform well in quantifying gene expression. Presence calls made by MAS5 and Resolver perform well at high concentrations, but they cannot be relied upon at low concentrations. The performance of Resolver and MAS5 in detecting 2-fold changes in transcript concentration is superior to that of dChip. At a comparable false positive rate, Resolver and MAS5 are able to detect many more true changes in transcript concentration. Estimated fold changes calculated by all the methods are biased below the true values. Contact: