by Eleazar Eskin, Eran Halperin, Richard M. Karp
Journal of Bioinformatics and Computational Biology
http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-02-1196.ps
Abstract:
Each person’s genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person’s genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person’s genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results.
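The genotype/haplotype distinction and the perfect-phylogeny assumption described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It only checks whether a pair of haplotypes explains a genotype, and applies the standard four-gamete test, under which a set of binary haplotypes admits a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10, 11. The encoding (0 and 1 for homozygous sites, 2 for heterozygous sites) and the function names `resolves` and `four_gamete_ok` are assumptions made for this illustration.

```python
from itertools import combinations

def resolves(h1, h2, genotype):
    """Return True if haplotypes h1 and h2 together explain the genotype.

    Assumed encoding: haplotype entries are 0 or 1; genotype entries are
    0 (both copies 0), 1 (both copies 1), or 2 (heterozygous site).
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:          # heterozygous site needs two different alleles
                return False
        elif not (a == b == g): # homozygous site needs both copies equal to g
            return False
    return True

def four_gamete_ok(haplotypes):
    """Four-gamete test: no pair of sites may show all of 00, 01, 10, 11."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True

# Hypothetical two-site example. Two individuals are homozygous, so their
# haplotypes are known: (0, 1) twice and (1, 0) twice.
known = [(0, 1), (0, 1), (1, 0), (1, 0)]

# A third individual is heterozygous at both sites: genotype (2, 2).
# It has two possible resolutions.
option_a = [(0, 1), (1, 0)]
option_b = [(0, 0), (1, 1)]

for option in (option_a, option_b):
    assert resolves(option[0], option[1], (2, 2))
    print(option, "perfect phylogeny possible:", four_gamete_ok(known + option))
```

In this toy case both resolutions explain the ambiguous genotype, but only the first keeps the whole population consistent with a perfect phylogeny, which is the kind of constraint the abstract uses to choose among resolutions.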
Citations
512  Optimization, Approximation, and Complexity Classes – Papadimitriou, Yannakakis - 1991
511  The complexity of theorem-proving procedures – Cook
200  On the complexity of timetable and multicommodity flow problems – Even, Itai, et al. - 1976
143  A new statistical method for haplotype reconstruction from population data – Stephens, Smith, et al. - 2000
114  High resolution haplotype structure in the human genome – Daly, Rioux, et al. - 2001
112  Approximate max-flow min-(multi)cut theorems and their applications – Garg, Vazirani, et al.
93  Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution – Excoffier, Slatkin - 1995
82  Inference of haplotypes from PCR-amplified samples of diploid populations – Clark - 1990
79  On selecting a satisfying truth assignment – Papadimitriou - 1991
79  Efficient Algorithms for Inferring Evolutionary Trees – Gusfield - 1991
71  Haplotyping as perfect phylogeny: conceptual framework and efficient solutions – Gusfield
59  Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 – Patil, Berno, et al.
45  Inference of haplotypes from samples of diploid populations: complexity and algorithms – Gusfield
38  Haplotyping as perfect phylogeny: A direct approach – Bafna, Gusfield, et al. - 2003
33  A linear time algorithm for testing the truth of certain quantified boolean formulas – Aspvall, Plass, et al. - 1979
30  Haplo: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes – Hawley, Kidd - 1995
30  An E-M algorithm and testing strategy for multiple-locus haplotypes – Long, Williams, et al. - 1995
19  Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data – Schork
16  An algorithm for determining whether a given binary matroid is graphic – Tutte - 1960
14  A practical algorithm for optimal inference of haplotypes from diploid populations – Gusfield - 2000
8  SNPs problems, algorithms and complexity. European Symposium on Algorithms – Lancia, Bafna, et al. - 2001
6  An almost linear time algorithm for graph realization – Bixby, Wagner - 1988
3  Large scale recovery of haplotypes from genotype data using imperfect phylogeny – Eskin, Halperin, et al. - 2003
2  A practical solution for haplotype mapping. Unpublished manuscript – Halperin, Eskin - 2002