New algorithms for regular expression matching
In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, 2006
Cited by 9 (5 self)
In this paper we revisit the classical regular expression matching problem, namely, given a regular expression R and a string Q, decide if Q matches one of the strings specified by R. Let m and n be the lengths of R and Q, respectively. On a standard unit-cost RAM with word length w ≥ log n, we show that the problem can be solved in O(m) space with the following running times:

$$
\begin{cases}
O\!\left(n \frac{m \log w}{w} + m \log w\right) & \text{if } m > w \\
O(n \log m + m \log m) & \text{if } \sqrt{w} < m \le w \\
O(\min(n + m^2,\; n \log m + m \log m)) & \text{if } m \le \sqrt{w}.
\end{cases}
$$

This improves the best known time bound among algorithms using O(m) space. Whenever w ≥ log² n it improves all known time bounds regardless of how much space is used.
Faster regular expression matching
Automata, Languages and Programming, 2009
Cited by 9 (2 self)
Abstract. Regular expression matching is a key task (and often the computational bottleneck) in a variety of widely used software tools and applications, for instance, the Unix grep and sed commands, scripting languages such as awk and perl, programs for analyzing massive data streams, etc. We show how to solve this ubiquitous task in linear space and O(nm (log log n)/(log n)^{3/2} + n + m) time, where m is the length of the expression and n the length of the string. This is the first improvement for the dominant O(nm/log n) term in Myers' O(nm/log n + (n + m) log n) bound [JACM 1992]. We also get improved bounds for external memory.
String matching with variable length gaps
In Proc. 17th SPIRE, 2010
Cited by 5 (2 self)
Abstract. We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O((n + m) log k + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to the previous results, this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement.
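To make the problem statement above concrete, here is a naive reference sketch in Python. It is not the algorithm from the paper and does not achieve the stated bounds; it only illustrates what "ending positions of substrings in T that match P" means. The function names, the representation of P as a list of strings plus a list of (lo, hi) gap ranges, and the assumption that all strings are non-empty are choices made for this illustration.

```python
def find_all(text: str, piece: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of piece."""
    starts = []
    i = text.find(piece)
    while i != -1:
        starts.append(i)
        i = text.find(piece, i + 1)
    return starts


def match_with_gaps(text: str, strings: list[str],
                    gaps: list[tuple[int, int]]) -> list[int]:
    """Return 0-based positions of the last character of every match.

    strings holds the k pattern strings; gaps holds k-1 inclusive
    (lo, hi) ranges for the gap lengths between consecutive strings.
    Assumes every pattern string is non-empty (illustration only).
    """
    first = strings[0]
    prev_ends = {s + len(first) - 1 for s in find_all(text, first)}
    for piece, (lo, hi) in zip(strings[1:], gaps):
        cur_ends = set()
        for start in find_all(text, piece):
            # The gap is the number of characters strictly between the
            # previous string's last character and this string's start.
            if any(lo <= start - end - 1 <= hi for end in prev_ends):
                cur_ends.add(start + len(piece) - 1)
        prev_ends = cur_ends
    return sorted(prev_ends)
```

For example, `match_with_gaps("abxxcd", ["ab", "cd"], [(1, 3)])` reports a match ending at position 5, because the gap "xx" has length 2, which lies in the range 1 to 3.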
Realizing a Sub-Linear Time String-Matching Algorithm With a Hardware Accelerator Using Bloom Filters
Cited by 5 (2 self)
Abstract. Many network security applications rely on string matching to detect intrusions, viruses, spam, and so on. Since software implementation may not keep pace with the high-speed demand, turning to hardware-based solutions becomes promising. This work presents an innovative architecture to realize string matching in sub-linear time based on algorithmic heuristics, which come from parallel queries to a set of space-efficient Bloom filters. The algorithm allows skipping characters not in a match in the text, and in turn simultaneously inspect multiple characters in effect. The techniques to reduce the impact of certain bad situations on performance are also proposed: the bad-block heuristic, a linear worst-case time method and a non-blocking interface to hand over the verification job to a verification module. This architecture is simulated with both behavior simulation in C and timing simulation in HDL for antivirus applications. The simulation shows that the throughput of scanning Windows executable files for more than 10 000 virus signatures can achieve 5.64 Gb/s, while the worst-case performance is 1.2 Gb/s if the signatures are properly specified. Index Terms: Algorithms, field-programmable gate arrays (FPGAs), string matching.
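As a rough software illustration of the Bloom-filter idea mentioned in the abstract above, the following Python sketch uses a single Bloom filter as a prefilter and hands possible hits to an exact verification step. It is a simplification under stated assumptions: it inspects every text position, so it does not implement the character-skipping, bad-block, or hardware aspects of the paper. The class and function names, the filter size, the number of hashes, and the use of SHA-256 as the hash are all hypothetical choices for this example.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; parameters are arbitrary illustration values."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in one Python integer

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def scan(text: bytes, signatures: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (start, signature) pairs; assumes a non-empty list of
    non-empty signatures."""
    block = min(len(sig) for sig in signatures)
    bloom = BloomFilter()
    by_prefix: dict[bytes, list[bytes]] = {}
    for sig in signatures:
        bloom.add(sig[:block])
        by_prefix.setdefault(sig[:block], []).append(sig)
    hits = []
    for i in range(len(text) - block + 1):
        window = text[i:i + block]
        if not bloom.might_contain(window):
            continue  # cheap rejection, no verification needed
        # Verification step: rule out Bloom-filter false positives.
        for sig in by_prefix.get(window, []):
            if text.startswith(sig, i):
                hits.append((i, sig))
    return hits
```

The design point the sketch shares with the abstract is the split between a cheap, possibly over-approximate membership test and a separate exact verification stage.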
Methodology for Fast Pattern Matching by Deterministic Finite Automaton with Perfect Hashing Ondrej Lengal
Abstract. As the speed of current computer networks increases, it is necessary to protect networks by security systems such as firewalls and Intrusion Detection Systems operating at multi-gigabit speeds. Pattern matching is the time-critical operation of current IDS on multi-gigabit networks. Regular expressions are often used to describe malicious network patterns. This paper deals with fast regular expression matching using the Deterministic Finite Automaton (DFA) with perfect hash function. We introduce decomposition of the problem into two parts: transformation of the input alphabet and usage of a fast DFA, and usage of perfect hashing to reduce space/speed tradeoff for DFA transition table.
Fast Bit-Parallel Matching for Network and Regular Expressions
2010
In this paper, we extend the SHIFT-AND approach by Baeza-Yates and Gonnet (CACM 35(10), 1992) to the matching problem for network expressions, which are regular expressions without Kleene-closure and useful in applications such as bioinformatics and event stream processing. Following the study of Navarro (RECOMB, 2001) on the extended string matching, we introduce new operations called Scatter, Gather, and Propagate to efficiently compute ε-moves of the Thompson NFA using the Extended SHIFT-AND approach with integer addition. By using these operations and a property called the bi-monotonicity of the Thompson NFA, we present an efficient algorithm for the network expression matching that runs in O(ndm/w) time using O(dm) preprocessing and O(dm/w) space, where m and d are the length and the depth of a given network expression, n is the length of an input text, and w is the word length of the underlying computer. Furthermore, we show a modified matching algorithm for the class of regular expressions that runs in O(ndm log(m)/w) time.
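For readers unfamiliar with the base technique named above, here is a minimal Python sketch of the classical SHIFT-AND algorithm for a plain string pattern. It does not cover the paper's extension to network expressions or its Scatter, Gather, and Propagate operations. The function name and return convention are assumptions for this example, and Python's unbounded integers stand in for the w-bit machine word.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return 0-based end positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    ends = []
    for j, ch in enumerate(text):
        # Shift advances every partial match by one position, OR 1 starts
        # a new match attempt, AND keeps only those consistent with ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            ends.append(j)
    return ends
```

Each text character costs a constant number of word operations as long as the pattern fits in one machine word, which is the source of the 1/w factor in bit-parallel bounds such as the one quoted above.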
Regular Expression Matching Using Bit-Parallelism: A Review of the Fundamentals and of Two Bit-Parallel Algorithms
2006
In this master’s thesis we are concerned with the classical regular expression matching problem, where it must be decided whether some substring of a text string T consisting of n characters matches one of the patterns specified by the regular expression r consisting of m characters. First, we describe the fundamentals of the problem in a practical and historical context. Then two specific algorithms for solving the problem are described in detail. Both algorithms make use of the technique of bit-parallelism, which is then described along with the machine model used throughout the thesis. The first algorithm is an O(mn / log s) algorithm, where s is the space used. It was developed in 2004 by Navarro & Raffinot and is one of two algorithms implemented in the grep tool nrgrep; we implement our own version of the first and most basic of these two. The tacit assumption is that m ≤ w (w being the length of the machine word in bits), and we have generalized this complexity for m = O(w) as part of our work. The second algorithm is
FIRE/J: Optimizing Regular Expression Searches with Generative Programming
Regular expressions are a powerful tool for analyzing and manipulating text. Their theoretical background lies within automata theory and formal languages. The FIRE/J (Fast Implementation of Regular Expressions for Java) regular expression library is designed to provide maximum execution speed, while remaining portable across different machine architectures. To achieve that, FIRE/J transforms each regular expression into a tailor-made class file, which is compiled directly to Java virtual machine (JVM) bytecodes. The library is compatible with the POSIX standard.
Improving Regular-Expression Matching on Strings Using Negative Factors
The problem of finding matches of a regular expression (RE) on a string exists in many applications such as text editing, biosequence search, and shell commands. Existing techniques first identify candidates using substrings in the RE, then verify each of them using an automaton. These techniques become inefficient when there are many candidate occurrences that need to be verified. In this paper we propose a novel technique that prunes false negatives by utilizing negative factors, which are substrings that cannot appear in an answer. A main advantage of the technique is that it can be integrated with many existing algorithms to improve their efficiency significantly. We give a full specification of this technique. We develop an efficient algorithm that utilizes negative factors to prune candidates, then improve it by using bit operations to process negative factors in parallel. We show that negative factors, when used together with necessary factors (substrings that must appear in each answer), can achieve much better pruning power. We analyze the large number of negative factors, and develop an algorithm for finding a small number of high-quality negative factors. We conducted a thorough experimental study of this technique on real data sets, including DNA sequences, proteins, and text documents, and show the significant performance improvement when applying the technique in existing algorithms. For instance, it improved the search speed of the popular GNU Grep tool by 11 to 74 times for text documents.