Finding motifs using random projections (2001)

by Jeremy Buhler, Martin Tompa
Citations:284 - 6 self

BibTeX

@INPROCEEDINGS{Buhler01findingmotifs,
    author = {Jeremy Buhler and Martin Tompa},
    title = {Finding motifs using random projections},
    booktitle = {},
    year = {2001},
    pages = {69--76}
}


Abstract

Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif M of length 15, where each planted instance differs from M in 4 positions. Whereas previous algorithms all failed to solve this (15,4)-motif problem, Pevzner and Sze introduced algorithms that succeeded. However, their algorithms failed to solve the considerably more difficult (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif discovery algorithm based on the use of random projections of the input’s substrings. Experiments on simulated data demonstrate that this algorithm performs better than existing algorithms and, in particular, typically solves the difficult (14,4)-, (16,5)-, and (18,6)-motif problems quite efficiently. A probabilistic estimate shows that the small values of d for which the algorithm fails to recover the planted (l,d)-motif are in all likelihood inherently impossible to solve. We also present experimental results on realistic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes.

1. CHALLENGING MOTIF PROBLEMS

Pevzner and Sze [23] considered a very precise version of the motif discovery problem of computational biology, which had also been considered by Sagot [26]. Based on this formulation, they issued an algorithmic challenge:

Planted (l,d)-Motif Problem: Suppose there is a fixed but unknown nucleotide sequence M (the motif) of length l. The problem is to determine M, given t nucleotide sequences each of length n, and each containing a planted variant of M. More precisely, each such planted variant is a substring that is M with exactly d point substitutions.

One instantiation that they labeled “The Challenge Problem” was parameterized as finding a planted (15,4)-motif in t = 20 sequences each of length n = 600. These values of n, t, and l are
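
The planted motif problem defined above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: the parameter names `l`, `d`, `t`, `n` (motif length, substitutions per instance, number of sequences, sequence length), all function names, and the uniform random background are choices made here. The `bucket_by_projection` helper shows one plausible reading of "random projections of the input's substrings" (hashing each length-`l` substring by its letters at `k` randomly chosen positions); the abstract does not spell out the details, so treat `k` and the bucketing scheme as hypothetical.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(motif, d, rng):
    """Return a copy of motif with exactly d point substitutions."""
    variant = list(motif)
    for pos in rng.sample(range(len(motif)), d):
        # Pick a different letter so the position really changes.
        variant[pos] = rng.choice([c for c in ALPHABET if c != motif[pos]])
    return "".join(variant)


def planted_instance(l, d, t, n, seed=0):
    """Build t random sequences of length n, each hiding one variant of a motif.

    Assumption: background letters are uniform and independent.
    """
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    sequences, starts = [], []
    for _ in range(t):
        background = [rng.choice(ALPHABET) for _ in range(n)]
        start = rng.randrange(n - l + 1)
        background[start:start + l] = mutate(motif, d, rng)
        sequences.append("".join(background))
        starts.append(start)
    return motif, sequences, starts


def bucket_by_projection(sequences, l, k, rng):
    """Hash every length-l substring by its letters at k random positions.

    Hypothetical sketch: substrings that agree on the chosen positions
    land in the same bucket as (sequence index, offset) pairs.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - l + 1):
            key = "".join(seq[j + p] for p in positions)
            buckets[key].append((i, j))
    return positions, buckets
```

Calling `planted_instance(15, 4, 20, 600)` produces an instance shaped like a (15,4)-motif problem; the sequence count and length in that call are illustrative inputs to the sketch. Every planted substring is at Hamming distance exactly 4 from the motif, which `hamming` can confirm.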

Keyphrases

random projection    motif problem    motif discovery problem    precise version    algorithmic challenge    planted variant    nucleotide sequence    simulated data demonstrate    realistic biological data    present experimental result    instance differs    novel motif discovery algorithm    small value    computational biology    input substring    probabilistic estimate show    challenge problem    unknown nucleotide sequence    point substitution    known transcriptional regulatory motif    ribosome binding site    challenging motif problem pevzner   
