Don’t be afraid of simpler patterns
- 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD)
, 2006
Cited by 17 (4 self)
This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. This trade-off is investigated in the context of correlated pattern mining, which is concerned with finding the k-best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining, computational cost and predictive accuracy, and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to the following questions: How does the expressive power of the language affect the computational cost? And what is the trade-off between the expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining, in which the results of mining a simpler pattern language are used as a starting point for mining a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor of 1000) when mining graphs.
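To make correlated pattern mining concrete, here is a minimal Python sketch for the simplest pattern language listed above, itemsets. It assumes chi-squared as the convex criterion and uses the common branch-and-bound idea: any superset of an itemset covers a subset of its examples, so its (positive, negative) support lies in a rectangle, and a convex score over that rectangle is bounded by its corners. The function names and data layout are hypothetical, and the paper's stepwise approach is not reproduced here.

```python
import heapq


def chi2(x, y, P, N):
    """Chi-squared statistic of the 2x2 table for a pattern covering x of P positives and y of N negatives."""
    n, m = P + N, x + y
    if m == 0 or m == n or P == 0 or N == 0:
        return 0.0
    return n * (x * (N - y) - y * (P - x)) ** 2 / (m * (n - m) * P * N)


def top_k_itemsets(transactions, labels, k):
    """Branch-and-bound search for the k itemsets with the highest chi-squared score."""
    P = sum(labels)
    N = len(labels) - P
    items = sorted({i for t in transactions for i in t})
    heap = []  # min-heap of (score, itemset); heap[0] is the current k-th best

    def threshold():
        return heap[0][0] if len(heap) == k else -1.0

    def expand(prefix, cover, start):
        for j in range(start, len(items)):
            new_cover = [t for t in cover if items[j] in transactions[t]]
            x = sum(labels[t] for t in new_cover)
            y = len(new_cover) - x
            score = chi2(x, y, P, N)
            itemset = prefix + (items[j],)
            if len(heap) < k:
                heapq.heappush(heap, (score, itemset))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, itemset))
            # Any superset covers x' <= x positives and y' <= y negatives. chi2 is convex,
            # so its maximum over that rectangle is attained at a corner: an upper bound.
            bound = max(score, chi2(x, 0, P, N), chi2(0, y, P, N))
            if new_cover and bound > threshold():
                expand(itemset, new_cover, j + 1)

    expand((), list(range(len(transactions))), 0)
    return sorted(heap, reverse=True)
```

For example, `top_k_itemsets([{"a", "b"}, {"a"}, {"b", "c"}, {"c"}], [1, 1, 0, 0], 2)` returns the itemsets `("a",)` and `("c",)`, each scoring 4.0, because each one perfectly separates the two classes; the bound prunes every branch that could not beat them.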
Integrating global proteomic and genomic expression profiles generated from islet alpha cells: opportunities and challenges to deriving reliable biological inferences
- Mol Cell Proteomics
, 2005
Cited by 4 (0 self)
Systematic profiling of expressed gene products represents a promising research strategy for elucidating the molecular phenotypes of islet cells. To this end, we have combined complementary genomic and proteomic methods to better assess the molecular composition of murine pancreatic islet glucagon-producing αTC-1 cells as a model system, with the expectation of bypassing limitations inherent to either technology alone. Gene expression was measured with an Affymetrix MG_U74Av2 oligonucleotide array, while protein expression was examined by performing high-resolution gel-free shotgun MS/MS on a nuclear-enriched cell extract. Both analyses were carried out in triplicate to control for experimental variability. Using a stringent detection p-value cutoff of 0.04, 48% of all potential mRNA transcripts were predicted to be expressed
Automation, and Statistical Learning for Proteomics
- in Focus on Robotics and Intelligent Systems Research
, 2005
Cited by 2 (2 self)
During the era of the Human Genome Project [1], the emphasis was on sequencing and annotating individual genes. At that time, the number of human genes was estimated at about 100,000. Yet, as the Human Genome Project draws to a close [2], recent work has reduced the estimate to between 20,000 and 25,000, not far from the number of genes in a simple worm, C. elegans. Thus, the complexity of a human must arise from other sources, such as the interactions of gene products, namely proteins. Given this, proteomics has quickly moved to center stage. Although biologists have long sought to study proteins, the available methods were rather primitive until recently. A sudden surge of engineering and other technical talent has caused this field and its associated research to grow dramatically in the last couple of years. In this chapter, proteomics is introduced to an engineering/technical audience, with an emphasis on the robotics and intelligent systems technologies used in the field. These include protein extraction, separation, and identification. The associated analysis algorithms and statistical learning methods are also discussed. Two case studies on these topics are then explored. Lastly, the future direction of the field and its challenges are outlined. Clinical applications of proteomics, such as cancer diagnosis and drug discovery, are discussed where relevant.
Tuning Text Classification for Hereditary Diseases with Section Weighting
- JOURNAL ON COMMUNICATIONS
, 1994
Cited by 2 (1 self)
Motivation: Information in life science publications is heterogeneously distributed across sections. Depending on the research question, different sections cover more or less of the data needed to answer it. Our approach, called section weighting, seeks to exploit the information coverage and density found in typical life science publications. We study the impact of section weighting on text classification according to hereditary diseases. Results: Our results indicate that weighting sections can improve text classification. Our systems gain 7% in F1-measure when we add section weighting. Proper composition of features is equally crucial, improving our results by 11%. Combining both techniques, the system performs 18% better than the baseline classifier. For our research question, favoring the Abstract, Introduction, and Materials and Methods sections yields the best results.
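A minimal Python sketch of how section weighting could be applied as a feature-construction step, together with the F1-measure used to report the gains. The section names and weight values are hypothetical placeholders, not the weights used by the authors; any downstream classifier could consume the resulting weighted term counts.

```python
from collections import Counter

# Hypothetical weights: the sections reported as most useful get extra weight.
SECTION_WEIGHTS = {"abstract": 2.0, "introduction": 1.5, "materials and methods": 1.5}
DEFAULT_WEIGHT = 1.0


def weighted_features(doc):
    """doc maps section name -> list of tokens; returns section-weighted term counts."""
    features = Counter()
    for section, tokens in doc.items():
        weight = SECTION_WEIGHTS.get(section.lower(), DEFAULT_WEIGHT)
        for token in tokens:
            features[token] += weight
    return features


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With these placeholder weights, a term occurring once in the Abstract and once in the Results contributes 2.0 + 1.0 = 3.0 to its feature value, so evidence from favored sections counts more without discarding the rest of the text.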
Benchmarking of linear and nonlinear approaches for quantitative structure-property relationship studies of metal complexation with ionophores
Cited by 1 (1 self)
property relationships (QSPR) of stability constants logK1 for the 1:1 (M:L) and logβ2 for 1:2 complexes of the metal cations Ag+ and Eu3+ with diverse sets of organic molecules in water at 298 K and ionic strength 0.1 M. The methods were tested on three types of descriptors: molecular descriptors including E-state values, counts of atoms determined for E-state atom types, and substructural molecular fragments (SMF). The models were compared using a 5-fold external cross-validation procedure. Robust statistical tests (bootstrap and Kolmogorov-Smirnov statistics) were employed to evaluate the significance of the calculated models. The Wilcoxon signed-rank test was used to compare the performance of the methods. Individual structure-complexation property models obtained with nonlinear methods performed significantly better than models built using multilinear regression analysis (MLRA). However, averaging several MLRA models based on SMF descriptors gave predictions as good as those of the most efficient nonlinear techniques. Support Vector Machines and Associative Neural Networks contributed the largest number of significant models. Models based on fragments (SMF descriptors and E-state counts) had higher predictive ability than those based on E-state indices. The use of SMF descriptors and E-state counts provided
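As a minimal, dependency-free sketch of the Wilcoxon signed-rank comparison mentioned above, the functions below compute W+ and W- for paired scores of two methods (for example, per-fold prediction errors) and an exact two-sided p-value when no absolute differences are tied. The function names and the way scores are paired are assumptions; the abstract does not specify them.

```python
def signed_rank_statistic(a, b):
    """Return (W_plus, W_minus) for paired samples a, b; zero differences are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Group tied absolute differences and give each the average of their ranks.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def exact_two_sided_p(w, n):
    """Exact two-sided p-value for W = min(W_plus, W_minus) with n untied, nonzero differences."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s]: number of subsets of ranks 1..n summing to s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    lower_tail = sum(counts[: int(w) + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)
```

If one method beats the other on all five folds, W = min(W+, W-) = 0 and the exact two-sided p-value is 2/2^5 = 0.0625, so five paired scores alone cannot reach p < 0.05 with this test.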
Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines
, 2007
Seeded Bayesian Networks: Constructing genetic networks from microarray data
- BMC Systems Biology (BioMed Central)
, 2008
Background: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes – often represented as networks – in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results. Results: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence
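The seeding idea can be illustrated with a small Python sketch: a greedy hill-climbing search over discrete Bayesian network structures, scored with BIC, that starts from a seed edge set (for example, edges taken from the literature) instead of an empty graph. The BIC score, the add/remove-edge moves, and all function names are assumptions chosen for illustration; the abstract does not state which score or search procedure the authors used, and expression values are assumed to be discretized already.

```python
import math
from collections import Counter
from itertools import product


def local_bic(data, child, parents):
    """BIC contribution of `child` given `parents`; data is a list of dicts of discrete values."""
    n = len(data)
    r = len({row[child] for row in data})  # number of observed child states
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    q = 1  # number of parent configurations (product of observed parent state counts)
    for p in parents:
        q *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * q * (r - 1)


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return visited == len(nodes)


def network_score(data, nodes, edges):
    return sum(local_bic(data, v, sorted(a for a, b in edges if b == v)) for v in nodes)


def seeded_hill_climb(data, nodes, seed_edges):
    """Greedy add/remove-edge search starting from an (assumed acyclic) seed edge set."""
    edges = set(seed_edges)
    current = network_score(data, nodes, edges)
    while True:
        best, best_edges = current, None
        for a, b in product(nodes, nodes):
            if a == b:
                continue
            # Toggle the edge a -> b: remove it if present, add it otherwise.
            cand = edges - {(a, b)} if (a, b) in edges else edges | {(a, b)}
            if not is_acyclic(nodes, cand):
                continue
            s = network_score(data, nodes, cand)
            if s > best + 1e-9:
                best, best_edges = s, cand
        if best_edges is None:
            return edges, current
        edges, current = best_edges, best
```

On toy data where X and Y are identical and Z is independent, an empty start yields the edge X -> Y, while the seed {Y -> X} is kept, since both orientations score the same. This shows how a seed can steer the result toward prior knowledge; seed edges are not protected, so the search removes them if the data penalize them.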
Bcl11b mutations identified in murine lymphomas increase the proliferation rate of hematopoietic progenitor cells
- BMC Cancer (BioMed Central)
, 2007
Evaluation of Combining Several Statistical Methods with a Flexible Cutoff for Identifying Differentially Expressed Genes in Pairwise Comparison of EST Sets
The detection of differentially expressed genes from EST data is important for discovering potential biological or pharmaceutical targets, especially when studying biological processes in less-characterized organisms where large-scale microarrays are not an option. We present a comparison of five statistical methods for identifying up-regulated genes through pairwise comparison of EST sets, where one set is generated from a treatment and the other serves as a control. In addition, we specifically address situations where the sets are relatively small (~2,000–10,000 ESTs) and may differ in size. The methods were tested on both simulated and experimentally derived data, and compared with a collection of cold-stress-induced genes identified by microarrays. We found that combining the method proposed by Audic and Claverie with Fisher’s exact test and a method based on the difference in relative frequency was the best combination for maximizing the detection of up-regulated genes. We also introduced a flexible cutoff that takes the size of the EST sets into account, as an alternative to a static cutoff. Finally, the detected genes showed low overlap with those identified by microarrays, which indicates, as in previous studies, low overall concordance between the two platforms.
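For readers unfamiliar with the two named tests, here is a hedged Python sketch of one-sided p-values for a single gene's EST counts: x ESTs out of n1 in the control set and y out of n2 in the treatment set. The Audic and Claverie function uses their probability p(y | x) = (n2/n1)^y (x+y)! / (x! y! (1 + n2/n1)^(x+y+1)); the Fisher function is the usual hypergeometric upper tail. The function names are hypothetical, and the paper's combination rule and flexible cutoff are not reproduced.

```python
import math


def audic_claverie_upper_p(x, y, n1, n2):
    """P(Y >= y | x) under the Audic-Claverie model (x in control of size n1, y in treatment of size n2)."""
    r = n2 / n1

    def log_p(k):
        # log of (n2/n1)^k (x+k)! / (x! k! (1 + n2/n1)^(x+k+1))
        return (k * math.log(r) + math.lgamma(x + k + 1) - math.lgamma(x + 1)
                - math.lgamma(k + 1) - (x + k + 1) * math.log1p(r))

    lower = sum(math.exp(log_p(k)) for k in range(y))  # P(Y < y | x)
    return max(0.0, 1.0 - lower)


def fisher_upper_p(x, y, n1, n2):
    """One-sided Fisher's exact test: P(treatment count >= y) given x + y ESTs in total."""
    k, n = x + y, n1 + n2
    total = math.comb(n, n2)
    tail = sum(math.comb(k, a) * math.comb(n - k, n2 - a) for a in range(y, min(k, n2) + 1))
    return tail / total
```

For equal set sizes and a gene seen 0 times in the control and 3 times in the treatment, these give 0.125 and 2/19 (about 0.105), respectively. Because `1.0 - lower` loses precision for very small p-values, summing the upper tail directly would be preferable in practice.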

