iNuc-PhysChem: A sequence-based predictor for identifying nucleosomes via physicochemical properties
- PLoS One, 2012
Cited by 12 (4 self)
Nucleosome positioning plays important roles in key cellular processes. Despite intensive efforts in this area, the rules defining nucleosome positioning remain elusive and debated. In this study, we carried out a systematic comparison of the profiles of twelve DNA physicochemical features between the nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth study of nucleosomes. Meanwhile, a new predictor, called iNuc-PhysChem, was developed for identifying nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. A cross-validation test on a benchmark dataset showed that the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. As a web-server, iNuc-PhysChem is freely accessible to the public at
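The incremental feature selection step named in this abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (the ranking criterion is not stated here), and `evaluate` is a hypothetical scoring callback standing in for something like a cross-validated success rate. It is an illustration of the general idea, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Add ranked features one at a time and keep the best-scoring prefix.

    ranked_features: feature indices, most relevant first (ranking assumed given).
    evaluate: hypothetical callback returning a score for a list of feature indices.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        # Strict '>' keeps the smallest prefix when several prefixes tie.
        if score > best_score:
            best_score = score
            best_subset = list(current)
    return best_subset, best_score


def toy_score(subset: List[int]) -> float:
    """Toy scorer (an assumption): features 0-2 help, all others add noise."""
    return sum(0.1 if feature < 3 else -0.02 for feature in subset)


if __name__ == "__main__":
    subset, score = incremental_feature_selection([0, 1, 2, 3, 4], toy_score)
    print(subset, round(score, 2))
```

In a setting like the one described, the ranked list would hold the 1788 feature indices and the returned prefix would be the reduced feature set. The scorer is where the classifier and the cross-validation would plug in.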
The organization of nucleosomes around splice sites
- Nucleic Acids Res, 2010
Cited by 10 (0 self)
The occupancy of nucleosomes along the chromosome is a key factor for gene regulation. However, except for promoter regions, the genome-wide properties and functions of nucleosome organization remain unclear in mammalian genomes. Using the computational model of Increment of Diversity with Quadratic Discriminant (IDQD) trained on microarray data, the nucleosome occupancy score (NOScore) was defined and applied to splice junction regions of constitutive, cassette exon, and alternative 3′ and 5′ splicing events in the human genome. We found an interesting relation between NOScore and RNA splicing: exon regions have higher NOScores than their flanking intron sequences in both constitutive and alternative splicing events, indicating the stronger nucleosome occupation potential of exon regions. In addition, NOScore valleys are present 25 bp upstream of the acceptor site in all splicing events. By defining a folding diversity-to-energy ratio to describe RNA structural flexibility, we demonstrated that primary RNA transcripts from nucleosome-occupied regions are relatively rigid and those from nucleosome-depleted regions are relatively flexible. The negative correlation between nucleosome occupation/depletion of a DNA sequence and structural flexibility/rigidity of its primary transcript around splice junctions may provide clues to a deeper understanding of the unexpected role of nucleosome organization in the regulation of RNA splicing.
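As a rough illustration of the "increment of diversity" ingredient of IDQD, the sketch below uses the commonly cited definition D(X) = N log N − Σ nᵢ log nᵢ over category counts and ID(X, Y) = D(X + Y) − D(X) − D(Y). The choice of dinucleotide counts as categories, the natural logarithm, and all function names are assumptions for illustration. The quadratic discriminant step and the NOScore itself are not reproduced.

```python
import math
from collections import Counter
from typing import Mapping


def kmer_counts(sequence: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a sequence (k=2 is an illustrative choice)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def diversity(counts: Mapping[str, int]) -> float:
    """Diversity measure D = N log N - sum(n_i log n_i) over nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = Counter(x)
    merged.update(y)
    return diversity(merged) - diversity(x) - diversity(y)


if __name__ == "__main__":
    reference = kmer_counts("AATTAATTAATTAATT")
    query = kmer_counts("AATTAATTGGCC")
    print(increment_of_diversity(reference, query))
```

Under this definition, a count profile compared with itself gives an increment of zero, and profiles with no categories in common give the largest increment for their sizes.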
Bioinformatics in China: A Personal Perspective
Cited by 3 (0 self)
In this personal perspective, we recall the history of bioinformatics and computational biology in China, review current research and education, and discuss future prospects and challenges. The field of bioinformatics in China has grown significantly in the past decade despite a delayed and patchy start at the end of the 1980s by a few scientists from other disciplines, most noticeably physics and mathematics, where China’s traditional strength has been. In the late 1990s and early 2000s, rapid expansion of the field was fueled by the Internet boom and genomics boom worldwide and in China. Today bioinformatics research in China is characterized by a great variety of biological questions addressed and the close collaborative efforts between computational scientists and biologists, with a full spectrum of focuses ranging from database building and algorithm development to hypothesis generation and biological discoveries. Although challenges remain, the future of bioinformatics in China is promising thanks to advances in both computing infrastructure and experimental biology research, a steady increase of governmental funding, and most importantly a critical mass of bioinformatics scientists consisting of not only converts from other disciplines but also formally trained overseas returnees and a new generation of domestically trained bioinformatics Ph.D.s.
Feature Generation and Analysis Applied to Sequence Classification for Splice-Site Prediction (dissertation)
Sequence classification is an important problem in many real-world applications. Sequence data often contain no explicit "signals," or features, to enable the construction of classification algorithms. Extracting and interpreting the most useful features is challenging, and hand construction of good features is the basis of many classification algorithms. In this thesis, I address this problem by developing a feature-generation algorithm (FGA). FGA is a scalable method for automatic feature generation for sequences; it identifies sequence components and uses domain knowledge, systematically constructs features, explores the space of possible features, and identifies the most useful ones. In the domain of biological sequences, splice-sites are locations in DNA sequences that signal the boundaries between genetic information and intervening non-coding regions. Only when splice-sites are identified with nucleotide precision can the genetic information be translated to produce functional proteins. In this thesis, I address this fundamental process by developing a highly accurate splice-site prediction model that employs our sequence feature-generation framework. The FGA
Fast Feature Subset Selection in Biological Sequence Analysis
- International Journal of Pattern Recognition and Artificial Intelligence, © World Scientific Publishing Company
Motivation: Biological research produces a wealth of measured data. It is not easy for biologists to postulate hypotheses about the behaviour or structure of the observed entity, because the relevant measured properties are lost in the ocean of measurements. For the same reason, it is not easy to design machine learning algorithms to classify or cluster the data items. Algorithms for automatically selecting a highly predictive subset of the measured features can help to overcome these difficulties. Results: We present an efficient feature selection strategy that can be applied to arbitrary feature selection problems. The core technique is a new method for estimating the quality of subsets from previously calculated qualities of smaller subsets by minimising the mean standard error of the estimated values with an approach common to support vector machines. This method can be integrated into many feature subset search algorithms. We have applied it with sequential search algorithms and were able to reduce the number of quality calculations needed to find accurate feature subsets by about 70%. We show these improvements by applying our approach to the problem of finding highly predictive feature subsets for transcription factor binding sites.
Chuanhua Xing. The Analysis and Identification of Protein-coding Sequences for Yeast Using a Free Energy Model. (Under direction of Dr. Donald L. Bitzer and
Biological systems are information-rich systems. This means that it is reasonable to propose signal processing techniques to detect, extract, and decode the information provided by biological systems. Free energy is used to measure the interactions of molecules in my research. If a biological process consists of molecular interactions along a time or space (position) continuum, a variable free energy pattern could be produced in which the variation is the physical manifestation of the encoded information, or signal. Signal processing techniques can possibly be used to extract information from these signals. In my dissertation, I used signal processing approaches to analyze and identify DNA sequences that encode protein-coding information and splice sites in pre-mRNA molecules. In the first part of my dissertation, I used free energy to measure the interaction of the 3′ tail of 18S rRNA and mRNA for detecting the period-3 signal in coding regions. The extraction of the period-3 free energy signal using signal processing techniques was used to analyze and identify protein-coding sequences. Two species were tested: Saccharomyces cerevisiae (S. cerevisiae) and Schizosaccharomyces pombe (S. pombe). The experiments produced
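Period-3 detection of the kind mentioned in this abstract can be sketched with a plain discrete Fourier transform. The sketch takes any numeric per-position signal. The free-energy values from the rRNA/mRNA interaction are not reproduced here, and the function names and the power-ratio criterion are assumptions for illustration rather than the dissertation's method.

```python
import cmath
import math
from typing import Sequence


def spectral_power(signal: Sequence[float], period: float) -> float:
    """Power of the mean-removed signal at the given period (in positions)."""
    n = len(signal)
    mean = sum(signal) / n
    coefficient = sum(
        (value - mean) * cmath.exp(-2j * math.pi * position / period)
        for position, value in enumerate(signal)
    )
    return abs(coefficient) ** 2 / n


def period3_strength(signal: Sequence[float]) -> float:
    """Ratio of the period-3 power to the average power over DFT bins 1..n//2."""
    n = len(signal)
    if n < 6:
        raise ValueError("signal must contain at least 6 positions")
    powers = [spectral_power(signal, n / k) for k in range(1, n // 2 + 1)]
    average = sum(powers) / len(powers)
    if average == 0:
        return 0.0  # flat signal: no periodic component at all
    return spectral_power(signal, 3.0) / average


if __name__ == "__main__":
    # Synthetic signal (an assumption): one high value every third position.
    periodic = [1.0, 0.0, 0.0] * 20
    print(period3_strength(periodic))
```

A ratio well above 1 suggests a strong period-3 component. In practice the ratio would be computed in a sliding window along the sequence, with the window length and decision threshold chosen empirically.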
unknown title, 2006
Splice site prediction using stochastic regular grammars
unknown title, 2009
Analysis and prediction of exon, intron, intergenic region and splice sites for A. thaliana and C. elegans genomes