Molecular Biology for Computer Scientists, 1993
Cited by 12 (0 self)
... mind as you read this: for every generalization I make about biology, there may well be thousands of exceptions. There are a lot of living things in the world, and precious few generalizations hold true for all of them. I will try to cover the principles; try to keep the existence of exceptions in mind as you read. Another thing to remember is that an important part of understanding biology is learning its language. Biologists, like many scientists, use technical terms in order to be precise about reference. Getting a grasp on this terminology makes a great deal of the biological literature accessible to the non-specialist. The notes contain information about terminology and other basic matters. With that, let's begin at the beginning. 1. What Is Life? No simple definition of what it is to be a living thing captures our intuitions about what is alive and what is not. The central feature of life is its ability to reproduce itself. Reproductive ability alone is not enoug...
An Information Model For Genome Map Representation And Assembly, 1993
Cited by 11 (2 self)
In this paper, we focus on some of the scientific data management problems faced by the Human Genome Project. In particular, we describe the design of an information model for the physical contig map assembly task. First, we present an object-oriented data schema that captures genomic data and their relationships required for this task, including the raw experimental data and the derived data gotten through analysis. Our genome object representation efficiently supports the maintenance of unordered, partially ordered, and completely ordered sets of data based on an overlap refinement hierarchy of interval relationships. We describe operators we have developed to automate analysis steps currently performed manually by the scientists. Examples are operators for inverting local orientation frames, for combining information in different frames into another more informative one, and for inferring additional overlap information using transitivity rules. In conclusion, we provide a walk-throu...
A Constraint Based Structure Description Language for Biosequences, 1997
Cited by 9 (2 self)
We report an investigation into how constraint solving techniques can be used to search for patterns in sequences (or strings) of symbols over a finite alphabet. We define a constraint-based structure description language for biosequences, and give the definition of an algorithm to solve the structure searching problem as a CSP. The methodology which we have developed is able to describe two-dimensional structure of biosequences, such as tandem repeats, stem loops, palindromes and pseudo-knots. We also report on an implementation of the language in the constraint logic programming language clp(FD), with test results of a simple searching algorithm, and results from a preliminary implementation in C++ using consistency checking techniques from solving CSP. Keywords: constraints, biostructures, description language, searching. 1 Introduction The aim of the work described in this paper is to use constraint solving techniques to search for structural patterns in sequences (or st...
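One of the structures named in the abstract above, the tandem repeat, can be illustrated with a small search routine. This is only a minimal sketch: it uses a plain exhaustive scan rather than the constraint solving described in the paper, and the function name `find_tandem_repeats` and its parameters are hypothetical, chosen for illustration.

```python
def find_tandem_repeats(seq, min_unit=2, min_copies=2):
    """Return (start, unit, copies) for runs where `unit` repeats back to back.

    Naive scan, not a constraint solver. A unit that is itself a repeat
    (for example "ACAC" inside "ACACACAC") is reported as well.
    """
    hits = []
    n = len(seq)
    for unit_len in range(min_unit, n // min_copies + 1):
        start = 0
        while start + unit_len * min_copies <= n:
            unit = seq[start:start + unit_len]
            copies = 1
            # Count how many consecutive copies of `unit` follow `start`.
            while seq[start + copies * unit_len:
                      start + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((start, unit, copies))
                start += copies * unit_len  # skip past the whole run
            else:
                start += 1
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGACGACGACTT"))
```

A constraint-based formulation would instead state the repeat as a relation between substring variables and let the solver prune the search; the scan above only shows what a match looks like.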
Good Maps are Straight - Proc. 4th Int. Conf. on Intelligent Systems for Molecular Biology, 1996
Cited by 5 (1 self)
This paper proposes a simplified approach to the assembly of large physical genome maps. The approach focuses on two key problems: (i) the integration of diverse forms of data from numerous sources, and (ii) the detection and removal of errors and anomalies in the data. The approach simplifies map assembly by dividing it into three phases: overlap, linkage and ordering. In the first phase, all forms of overlap data are integrated into a simple abstract structure, called clusters, where each cluster is a set of mutually-overlapping DNA segments. This phase filters out many questionable overlaps in the mapping data. In the second phase, clusters are linked together into a weighted intersection graph. False links between widely separated regions of the genome show up as crooked, branching structures in the graph. Removing these false links produces graphs that are straight, reflecting the linear structure of chromosomes. From these straight graphs, the third phase constructs a physical m...
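The notion of a "straight" graph in the abstract above can be made concrete with a simple test. The sketch below assumes that "straight" means the graph is a single unbranched path; that reading, the function name `is_straight`, and the adjacency-dictionary input are assumptions for illustration, not the paper's definition or algorithm.

```python
from collections import deque


def is_straight(adj):
    """Return True if the undirected graph `adj` is one unbranched path.

    `adj` maps each node to the set of its neighbours (assumed symmetric).
    A path is a connected graph with n - 1 edges and no node of degree > 2.
    """
    if not adj:
        return False
    # A branch point has three or more neighbours.
    if any(len(nbrs) > 2 for nbrs in adj.values()):
        return False
    # Each undirected edge is listed twice, once per endpoint.
    edge_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    if edge_count != len(adj) - 1:
        return False  # a loop, or more than one component
    # Breadth-first search to confirm every node is reachable.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adj)


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(is_straight(path))
```

Under this reading, a false link between distant regions would show up as a node with three neighbours or as a loop, and the check fails.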
The Complexity of Gene Placement
Cited by 3 (0 self)
We focus on algorithmic problems related to deriving gene locations on DNA sequences of closely related species by using comparative mapping data. Conventional genetic mapping generates intervals on the genetic sequence of given species for potential gene positions. The simultaneous analysis of gene intervals in related species, e.g., man and mouse, may eliminate some of the ambiguities and lead to better estimates of gene locations. We address the problem of eliminating the ambiguities in gene orders by means of minimizing the number of conserved (synteny) regions among the species. We first show that the gene ordering problem is hard: there is no polynomial-time approximation scheme unless P = NP, even under the restrictions that: (1) the order of genes in one of the species is known, or (2) at most two intervals overlap at any location on the map of any of the species. Then we provide two polynomial-time algorithms under restriction (1) above; the first approximates the problem wit...
Using Temporal Reasoning for Genome Map Assembly - In Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, 1995
Cited by 2 (0 self)
Genomic maps are an indispensable tool for molecular biologists; their modelling has to take into account representation as well as computational issues. The algorithmic complexity of the assembly task is already huge and is even made worse when one wishes to deal with inconsistencies and provide generic tools. This work presents an algorithm tackling the assembly problem by using temporal reasoning techniques. The algorithm has to transform the initial input data, i.e. qualitative and quantitative relations between entities that appear on the maps, so that temporal reasoning algorithms can be applied successfully; this is achieved by performing a partition of these relations upon their relative orientation, creating islets of relations in which reasoning mechanisms are applied. The implementation of the algorithm is based on a temporal reasoning software, taken as is, which gives a high genericity since any improvement in this software (such as efficiency or the management of flexible...
An Active OODB System For Genome Physical Map Assembly - Information Systems, Special Issue on Scientific Databases, 1994
Cited by 2 (2 self)
In this paper, we describe the design and implementation of a scientific database for the map assembly tasks performed by the geneticists at the University of Michigan Human Genome Center. Our system not only manages complex genomic data, but it also supports the automation of the associated map assembly tasks. For the former, we present a genomic object model that integrates both experimental and derived data. For the latter, we describe operators to automate some of the analysis steps, such as inferring overlap information using transitivity rules. Strongly motivated by the need of the physical map assembly task for both inferencing capabilities as well as object modeling power, we have designed and implemented an active object-oriented database (OODB) system, called Crystal, on the GemStone OODB. Crystal seamlessly integrates rule inferencing with complex object modeling and other typical database capabilities, thus avoiding the overhead in moving data between systems for rule proce...
Revealing Hidden Interval Graph Structure in STS-Content Data - Bioinformatics, 1998
Cited by 1 (0 self)
Motivation: STS-content data for genomic mapping contain numerous errors and anomalies resulting in cross-links among distant regions of the genome. Identification of contigs within the data is an important and difficult problem. Results: This paper introduces a graph algorithm which creates a simplified view of STS-content data. The shape of the resulting structure graph provides a quality check: coherent data produce a straight line, while anomalous data produce branches and loops. In the latter case, it is sometimes possible to disentangle the various paths into subsets of the data covering contiguous regions of the genome, i.e., contigs. These straight subgraphs can then be analyzed in standard ways to construct a physical map. A theoretical basis for the method is presented along with examples of its application to current STS data from human genome centers. Availability: Freely available on request. Contact: eharley@cs.toronto.edu Introduction Two epic efforts to construct...
Physical Map Assembler: An Active OODB System for Human Genome Applications
We describe the design and implementation of a scientific database for the map assembly tasks performed by the geneticists at the University of Michigan Human Genome Center. Our system manages complex genomic data and supports the automation of the associated map assembly tasks. For the former, we present a genomic object model that integrates both experimental and derived data. For the latter, we describe operators to automate some of the analysis steps. To develop a framework for implementing our rule-based approach to physical mapping, we have designed and implemented an active object-oriented database (OODB) system, called Crystal, on GemStone. Crystal seamlessly integrates inference capabilities with complex object modeling and other typical database capabilities as required for physical mapping. We also discuss the implementation of a physical map assembly tool on top of Crystal. In conclusion, we provide a walk-through example that demonstrates how our approach can be used to ef...
Applying an Association Rule Discovery Algorithm to Multipoint Linkage Analysis
Knowledge discovery in large databases (KDD) is being performed in several application domains, for example, the analysis of sales data, and is expected to be applied to other domains. We propose a KDD approach to multipoint linkage analysis, which is a way of ordering loci on a chromosome. Strict multipoint linkage analysis based on maximum likelihood estimation is a computationally tough problem. So far various kinds of approximate methods have been implemented. Our method based on the discovery of association between genetic recombinations is so different from others that it is useful to recheck the result of them. In this paper, we describe how to apply the framework of association rule discovery to linkage analysis, and also discuss that filtering input data and interpretation of discovered rules after data mining are practically important as well as data mining process itself.
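The two standard measures used in association rule discovery, support and confidence, can be sketched briefly. The abstract above does not say how recombination data are encoded, so the example assumes, purely for illustration, that each record is the set of marker intervals in which a recombination was observed for one individual; the names `support`, `confidence`, and the interval labels are hypothetical.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the records."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so confidence is undefined
    joint = support(records, set(antecedent) | set(consequent))
    return joint / base


if __name__ == "__main__":
    # Hypothetical data: intervals with an observed recombination, per individual.
    records = [{"I1", "I2"}, {"I1", "I2", "I3"}, {"I1"}, {"I3"}]
    print(support(records, {"I1", "I2"}))
    print(confidence(records, {"I1"}, {"I2"}))
```

A rule discovery algorithm would enumerate candidate itemsets and keep only rules whose support and confidence exceed chosen thresholds; how such rules translate into a locus order is specific to the paper and is not reproduced here.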

