On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching
2010
Cited by 5 (0 self)
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q (a “jumbled string”) in the text s requires finding a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. For example, over the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6, 3, 2), and the Parikh vector q = (2, 1, 1) occurs exactly once in s, at positions 1 to 4 (the substring abac). Like its more precise counterpart, the renowned Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, anagram checking, Scrabble playing and, allegedly, also analysis of mass spectrometry data. We consider two simple algorithms for Jumbled Pattern Matching and use very complicated data structures and analytic tools to show that they are not worse than the most obvious algorithm. We also show that we can achieve nontrivial efficient average-case behavior, but that’s less fun to describe in this abstract, so we defer the details to the main part of the article, to be read at the reader’s risk... well, at the reader’s discretion.
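To make the definitions concrete, here is a minimal Python sketch of the Parikh vector and of the obvious sliding-window search. It is not one of the two algorithms studied in the paper. The names `parikh` and `jumbled_matches` are hypothetical, and the sketch assumes every character of s belongs to the given alphabet. Since every match has length m = |q| (the sum of q's entries), a window of fixed length slides over s and updates two counters per shift.

```python
from collections import Counter


def parikh(s: str, alphabet: str) -> tuple[int, ...]:
    """Parikh vector of s: multiplicity of each character of the ordered alphabet."""
    counts = Counter(s)
    return tuple(counts[ch] for ch in alphabet)


def jumbled_matches(s: str, q: tuple[int, ...], alphabet: str) -> list[tuple[int, int]]:
    """All 1-based (start, end) positions of substrings t of s with p(t) == q.

    Naive sliding window of fixed length m = |q|: O(n * sigma) time.
    Assumes every character of s occurs in `alphabet`.
    """
    m, n = sum(q), len(s)
    if m == 0 or m > n:          # treat the empty pattern as having no match
        return []
    index = {ch: i for i, ch in enumerate(alphabet)}
    window = [0] * len(alphabet)
    for ch in s[:m]:             # Parikh vector of the first window
        window[index[ch]] += 1
    target = list(q)
    matches = []
    for start in range(n - m + 1):
        if start > 0:            # slide right: drop s[start-1], add s[start+m-1]
            window[index[s[start - 1]]] -= 1
            window[index[s[start + m - 1]]] += 1
        if window == target:
            matches.append((start + 1, start + m))
    return matches


s = "abaccbabaaa"
print(parikh(s, "abc"))                      # (6, 3, 2)
print(jumbled_matches(s, (2, 1, 1), "abc"))  # [(1, 4)]
```

The decision version is simply `bool(jumbled_matches(s, q, alphabet))`; a real implementation would stop at the first match.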
Computing Fragmentation Trees from Metabolite Multiple Mass Spectrometry Data
Cited by 2 (0 self)
Abstract. Since metabolites cannot be predicted from the genome sequence, high-throughput de novo identification of small molecules is highly sought after. Mass spectrometry (MS) in combination with a fragmentation technique is commonly used for this task. Unfortunately, automated analysis of such data is still in its infancy. Recently, fragmentation trees have been proposed as an analysis tool for such data. Additional fragmentation steps (MSⁿ) reveal more information about the molecule. We propose to use MSⁿ data for the computation of fragmentation trees, and present the Colorful Subtree Closure problem to formalize this task: we search for a colorful subtree inside a vertex-colored graph such that the weight of the transitive closure of the subtree is maximal. We give several negative results regarding the tractability and approximability of this and related problems. We then present an exact dynamic programming algorithm that is parameterized by the number of colors in the graph and is fast in practice. Evaluation of our method on a dataset of 45 reference compounds showed that using MSⁿ instead of MS² measurements improves the quality of the constructed fragmentation trees.
This is a preprint of:
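The idea of a dynamic program parameterized by the number of colors can be illustrated on the simpler Maximum Colorful Subtree objective, which sums only the weights of the tree's own edges. The sketch below does not handle the closure term of the Colorful Subtree Closure problem and is not the paper's algorithm. The function name and the graph encoding (integer colors 0..k−1, an adjacency dict of weighted edges) are hypothetical. W(v, S) is the best colorful subtree rooted at v whose colors lie in the set S, so the table size and running time are exponential only in k.

```python
from functools import lru_cache


def max_colorful_subtree(colors: list[int], edges: dict[int, list[tuple[int, float]]]) -> float:
    """Maximum edge weight of a colorful subtree (no two vertices share a color).

    colors[v] is the color (0..k-1) of vertex v; edges[v] lists (child, weight) pairs.
    W(v, mask) = best subtree rooted at v whose colors lie in the bit set `mask`.
    Work is exponential only in k (about 3^k subset pairs per vertex).
    """
    k = max(colors) + 1
    full = (1 << k) - 1

    @lru_cache(maxsize=None)
    def W(v: int, mask: int) -> float:
        own = 1 << colors[v]
        rest = mask & ~own                  # colors still available below v
        best = 0.0                          # v alone
        # Case 1: v has exactly one child u whose subtree uses colors from `rest`.
        for u, w in edges.get(v, ()):
            if rest & (1 << colors[u]):
                best = max(best, w + W(u, rest))
        # Case 2: split `rest` into two nonempty parts and merge two subtrees at v.
        sub = (rest - 1) & rest
        while sub:
            other = rest & ~sub
            if sub < other:                 # visit each unordered split once
                best = max(best, W(v, sub | own) + W(v, other | own))
            sub = (sub - 1) & rest
        return best

    return max(W(v, full) for v in range(len(colors)))


colors = [0, 1, 2, 1]
edges = {0: [(1, 2.0), (2, 3.0), (3, 6.0)], 1: [(2, 5.0)]}
print(max_colorful_subtree(colors, edges))  # 9.0 (edges 0->3 and 0->2)
```

In the toy graph, vertices 1 and 3 share color 1, so at most one of them can be used. The heavier edge 0→3 wins over the path 0→1→2.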
Faster mass decomposition
Cited by 2 (0 self)
Abstract. Metabolomics complements investigation of the genome, transcriptome, and proteome of an organism. Today, the vast majority of metabolites remain unknown, in particular for non-model organisms. Mass spectrometry is one of the predominant techniques for analyzing small molecules such as metabolites. A fundamental step in identifying a small molecule is to determine its molecular formula. Here, we present and evaluate three algorithm engineering techniques that speed up molecular formula determination. To this end, we modify an existing algorithm for decomposing the monoisotopic mass of a molecule. These techniques reduce running times fourfold and memory consumption by up to 94%. Compared to the classical search tree algorithm, our algorithm achieves a 1000-fold speedup.
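Mass decomposition asks for all non-negative integer vectors c with Σ cᵢ·aᵢ = M, where the aᵢ are element masses. Below is a minimal sketch that assumes integer masses, for example nominal masses, or real masses scaled and rounded by some precision factor. It uses a plain decomposability table plus backtracking. This is a standard textbook approach and not the engineered algorithm the abstract describes. The function name `decompose` is hypothetical.

```python
def decompose(mass: int, weights: list[int]) -> list[tuple[int, ...]]:
    """All vectors c >= 0 with sum(c[i] * weights[i]) == mass (positive integer weights).

    reachable[j][m] is True iff m is a non-negative integer combination of weights[0..j];
    backtracking only enters reachable states, so it never hits a dead end.
    """
    k = len(weights)
    reachable = [[False] * (mass + 1) for _ in range(k)]
    for j, w in enumerate(weights):
        for m in range(mass + 1):
            from_smaller = j > 0 and reachable[j - 1][m]
            reachable[j][m] = m == 0 or from_smaller or (m >= w and reachable[j][m - w])

    results: list[tuple[int, ...]] = []
    counts = [0] * k

    def backtrack(j: int, m: int) -> None:
        # Invariant: reachable[j][m] is True.
        if j == 0:
            counts[0] = m // weights[0]     # m is a multiple of weights[0] here
            results.append(tuple(counts))
            return
        c = 0
        while c * weights[j] <= m:          # try every multiplicity of weights[j]
            rest = m - c * weights[j]
            if reachable[j - 1][rest]:
                counts[j] = c
                backtrack(j - 1, rest)
            c += 1
        counts[j] = 0

    if k > 0 and reachable[k - 1][mass]:
        backtrack(k - 1, mass)
    return results


print(decompose(4, [1, 2]))  # [(4, 0), (2, 1), (0, 2)]
# Nominal masses C=12, H=1, N=14, O=16: the result for 28 includes
# (1, 0, 0, 1) = CO, (0, 0, 2, 0) = N2 and (2, 4, 0, 0) = C2H4.
chno_28 = decompose(28, [12, 1, 14, 16])
```

The table costs O(k·M) memory, which becomes the bottleneck for large scaled masses. Avoiding that cost is the kind of problem that residue-based decomposition methods address.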
Determination of Glycan Structure from Tandem Mass Spectra
Cited by 1 (1 self)
Abstract. Glycans are molecules made from simple sugars that form complex tree structures. Glycans constitute one of the most important protein modifications, and identification of glycans remains a pressing problem in biology. Unfortunately, the structure of glycans is hard to predict from the genome sequence of an organism. We consider the problem of deriving the topology of a glycan solely from tandem mass spectrometry data. We want to generate glycan tree candidates that sufficiently match the sample mass spectrum. However, the resulting problem is known to be computationally hard. We present an efficient exact algorithm for this problem, based on fixed-parameter algorithmics, that can process a spectrum in a matter of seconds. We also report some preliminary results of our method on experimental data. We show that our approach is fast enough in applications and that we can reach very good de novo identification results. Finally, we show how to compute the number of glycan topologies of a given size.
Inferring Peptide Composition from Molecular Formulas
Abstract. With the advent of novel mass spectrometry techniques such as Orbitrap MS, it is possible to determine the exact molecular formula of an unknown molecule solely from its isotope pattern. But in protein mass spectrometry, one faces the problem that many peptides have exactly the same molecular formula, even when the order of amino acids is ignored. In this work, we present an efficient method to determine the amino acid composition of an unknown peptide solely from its molecular formula. Our solution is based on efficiently enumerating all solutions of the multidimensional equality-constrained integer knapsack problem.
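The knapsack view can be sketched as follows. Each amino acid contributes its residue formula (free amino acid minus H₂O), and a linear peptide adds one H₂O in total. A composition is a vector of residue counts whose element sums hit the target formula exactly in every dimension (C, H, N, O). The sketch below is a plain depth-first enumeration with pruning on a small, illustrative subset of residues. It is not the paper's efficient method, and the names `RESIDUES` and `compositions` are hypothetical.

```python
# Element order: (C, H, N, O). Residue formula = free amino acid minus H2O.
RESIDUES = {
    "G": (2, 3, 1, 1),  # glycine    C2H5NO2   - H2O
    "A": (3, 5, 1, 1),  # alanine    C3H7NO2   - H2O
    "S": (3, 5, 1, 2),  # serine     C3H7NO3   - H2O
    "V": (5, 9, 1, 1),  # valine     C5H11NO2  - H2O
    "N": (4, 6, 2, 2),  # asparagine C4H8N2O3  - H2O
    "Q": (5, 8, 2, 2),  # glutamine  C5H10N2O3 - H2O
}
WATER = (0, 2, 0, 1)    # a linear peptide carries one extra H2O in total


def compositions(formula: tuple[int, ...]) -> list[dict[str, int]]:
    """All amino acid multisets whose residues plus one H2O give exactly `formula`."""
    target = tuple(f - w for f, w in zip(formula, WATER))
    if min(target) < 0:
        return []
    names = list(RESIDUES)
    counts: dict[str, int] = {}
    results: list[dict[str, int]] = []

    def search(i: int, remaining: tuple[int, ...]) -> None:
        if not any(remaining):                 # every element used up exactly
            results.append({n: c for n, c in counts.items() if c})
            return
        if i == len(names):
            return
        vec = RESIDUES[names[i]]
        c, rem = 0, remaining
        while min(rem) >= 0:                   # prune once any element goes negative
            counts[names[i]] = c
            search(i + 1, rem)
            c += 1
            rem = tuple(r - v for r, v in zip(rem, vec))
        counts[names[i]] = 0

    search(0, target)
    return results


# C5H10N2O3: the dipeptide Gly-Ala and glutamine share this formula.
print(compositions((5, 10, 2, 3)))  # two compositions: {'Q': 1} and {'G': 1, 'A': 1}
```

The example shows the ambiguity the abstract mentions. Even ignoring residue order, a formula can correspond to several compositions, so an enumeration must return all of them.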
The single-item uncapacitated lot-sizing problem with time-dependent batch sizes: NP-hard and polynomial cases
2013
This paper considers the uncapacitated lot-sizing problem with batch delivery, focusing on the general case of time-dependent batch sizes. We study the complexity of the problem depending on the other cost parameters, namely the setup cost, the fixed cost per batch, the unit procurement cost, and the unit holding cost. We establish that if any one of the cost parameters is allowed to be time-dependent, the problem is NP-hard. In contrast, if all the cost parameters are stationary and there is no unit holding cost, we show that the problem is solvable in polynomial time O(T³), where T denotes the number of periods in the horizon. We also show that, in the case of divisible batch sizes, the problem with time-varying setup costs, a stationary fixed cost per batch, and neither unit procurement nor unit holding costs can be solved in time O(T³ log T).
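To pin down the cost parameters listed above, here is a minimal evaluator for a given order plan. It is not the paper's O(T³) algorithm. The cost model is an assumption: an order of x_t > 0 units in period t pays the setup cost, the fixed per-batch cost for each of ⌈x_t / B_t⌉ batches of size B_t, and the unit procurement cost per unit, and holding cost is charged on end-of-period inventory with no backlogging. The function and argument names are hypothetical.

```python
def plan_cost(x: list[int], demand: list[int], batch: list[int], setup: list[int],
              per_batch: list[int], unit: list[int], hold: list[int]) -> int:
    """Total cost of ordering x[t] units in period t (t = 0..T-1), no backlogging.

    Assumed cost model: an order x[t] > 0 pays setup[t], per_batch[t] for each of
    ceil(x[t] / batch[t]) batches, and unit[t] per unit; hold[t] is charged per unit
    of inventory left at the end of period t.
    """
    inventory = total = 0
    for t in range(len(demand)):
        if x[t] > 0:
            n_batches = -(-x[t] // batch[t])        # integer ceiling division
            total += setup[t] + per_batch[t] * n_batches + unit[t] * x[t]
        inventory += x[t] - demand[t]
        if inventory < 0:
            raise ValueError(f"demand of period {t + 1} is not met")
        total += hold[t] * inventory
    return total


# Stationary costs, no holding cost (the polynomially solvable case above).
demand, batch = [3, 2, 4], [2, 2, 5]
stationary = dict(setup=[10] * 3, per_batch=[4] * 3, unit=[1] * 3, hold=[0] * 3)
print(plan_cost([5, 0, 4], demand, batch, **stationary))  # 45
print(plan_cost([9, 0, 0], demand, batch, **stationary))  # 39
```

The second plan is cheaper even though it ships more batches in period 1, because with no holding cost it saves a second setup. Time-dependent batch sizes make such trade-offs period-specific.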
SIRIUS: decomposing isotope patterns for metabolite identification
Counting glycans revisited
Abstract. We present an algorithm for counting glycan topologies of order n that improves on previously described algorithms by a factor of n in both time and space. More generally, we provide such an algorithm for counting rooted or unrooted d-ary trees with labels or masses assigned to the vertices, and we give a “recipe” to estimate the asymptotic growth of the resulting sequences. We provide constants for the asymptotic growth of d-ary trees and labeled quaternary trees (glycan topologies). Finally, we show how a classical result from enumeration theory can be used to count glycan structures where edges are labeled by bond types. Our method also improves time bounds for counting alkanes.
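As a baseline for what is being counted, the sketch below counts unlabeled rooted trees on n vertices in which every vertex has at most d unordered children, for d = 4 in the quaternary case. It uses the classical multiset (Pólya-style) recurrence: a tree is a root plus a multiset of at most d subtrees. It ignores vertex labels and masses, and it is not the improved algorithm from the paper. The function name is hypothetical.

```python
from math import comb


def count_rooted_trees(n_max: int, d: int) -> list[int]:
    """T[n] for n = 0..n_max: unlabeled rooted trees on n vertices in which every
    vertex has at most d children (children unordered)."""
    T = [0] * (n_max + 1)
    # F[k][m]: number of multisets of k trees (sizes processed so far) with m vertices
    F = [[0] * (n_max + 1) for _ in range(d + 1)]
    F[0][0] = 1
    for n in range(1, n_max + 1):
        # root plus a multiset of at most d subtrees with n - 1 vertices in total;
        # F currently holds exactly the subtree sizes 1..n-1
        T[n] = sum(F[k][n - 1] for k in range(d + 1))
        new = [row[:] for row in F]             # j = 0 trees of size n
        for k in range(1, d + 1):
            for m in range(n, n_max + 1):
                j = 1
                while j <= k and j * n <= m:
                    # multisets of j trees drawn from the T[n] shapes of size n
                    new[k][m] += F[k - j][m - j * n] * comb(T[n] + j - 1, j)
                    j += 1
        F = new
    return T


print(count_rooted_trees(5, 2))  # [0, 1, 1, 2, 3, 6]
print(count_rooted_trees(8, 4))  # unlabeled quaternary shapes for n = 0..8
```

Labeled variants would replace the single vertex type by one per monosaccharide (or mass). The per-size multiset update is where more careful bookkeeping can save work.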
Towards de novo identification of metabolites by analyzing tandem mass spectra
Vol. 24, ECCB 2008, pages i49–i55, doi:10.1093/bioinformatics/btn270