Results 1 - 3 of 3
Tries for Approximate String Matching
IEEE Transactions on Knowledge and Data Engineering, 1996
Cited by 22 (1 self)
Abstract:
Tries offer text searches with costs which are independent of the size of the document being searched, and so are important for large documents requiring secondary storage. Approximate searches, in which the search pattern differs from the document by k substitutions, transpositions, insertions or deletions, have hitherto been carried out only at costs linear in the size of the document. We present a trie-based method whose cost is independent of document size. Our experiments show that this new method significantly outperforms the nearest competitor for k=0 and k=1, which are arguably the most important cases. The linear cost (in k) of the other methods begins to catch up, for our small files, only at k=2. For larger files, complexity arguments i... [...] spelling checkers), case insensitivity, and limited approximate regular [...]
H. Shang and T.H. Merrett are at the School of Computer Science, McGill University, Montréal, Québec, Canada H3A 2A7. Email: {shang, tim}@cs.mcgill.ca
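To make the idea of approximate search over a trie concrete, here is a minimal Python sketch. It is not the method of the paper above: it is the familiar technique of carrying one row of the Levenshtein table down each trie branch and pruning a branch once every entry in the row exceeds k. It assumes a trie built over a small word list rather than over the suffixes of a text, and it handles substitutions, insertions and deletions only (a transposition costs 2 here). The names `TrieNode`, `build_trie` and `approx_search` are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the child node
        self.is_word = False # True if a stored word ends at this node


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def approx_search(root, pattern, k):
    """Return sorted (word, distance) pairs with Levenshtein distance <= k."""
    results = []
    # Row for the empty prefix: distance to pattern[:j] is j insertions.
    first_row = list(range(len(pattern) + 1))

    def walk(node, prefix, prev_row):
        if node.is_word and prev_row[-1] <= k:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.children.items():
            # Extend the edit-distance table by one row for character ch.
            row = [prev_row[0] + 1]
            for j in range(1, len(pattern) + 1):
                cost = 0 if pattern[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,          # insertion
                               prev_row[j] + 1,         # deletion
                               prev_row[j - 1] + cost)) # match or substitution
            # Prune: no extension of this prefix can get back under k.
            if min(row) <= k:
                walk(child, prefix + ch, row)

    walk(root, "", first_row)
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie(["trie", "tree", "tried", "try", "tire"])
    print(approx_search(trie, "trie", 1))
```

Because of the pruning step, the work done depends on how many trie branches stay within distance k of the pattern rather than on the total number of stored words, which is the intuition behind searching a trie instead of scanning the text.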
Window-Accumulated Subsequence matching Problem is linear
In Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems: PODS 1999, ACM, 1999
Cited by 8 (0 self)
Abstract:
Given two strings, text t of length n, and pattern p = p1 ... pk of length k, and given a natural number w, the subsequence matching problem consists in finding the number of size w windows of text t which contain pattern p as a subsequence, i.e. the letters p1, ..., pk occur in the window, in the same order as in p, but not necessarily consecutively (they may be interleaved with other letters). Subsequence matching is used for finding frequent patterns and association rules in databases. We generalize the Knuth-Morris-Pratt (KMP) pattern matching algorithm; we define a non-conventional kind of RAM, the MP-RAMs, which model more closely the microprocessor operations; we design an O(n) on-line algorithm for solving the subsequence matching problem on MP-RAMs.
Keywords: Subsequence matching, algorithms, frequent patterns, episode matching, datamining.
1 Introduction
We address the following problem. Given a text t of length n and a pattern p = p1 ... pk of l...
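The problem statement above can be illustrated with a short Python sketch. This is not the O(n) MP-RAM algorithm the abstract describes; it is a plain O(nk) dynamic-programming baseline, and the function name `count_windows` is an assumption made for illustration. For each text position i it tracks, for every prefix of p, the latest start position from which that prefix occurs as a subsequence ending at or before i. The window of size w ending at i contains p exactly when that latest start for the whole pattern falls inside the window.

```python
def count_windows(text, pattern, w):
    """Count the size-w windows of text containing pattern as a subsequence."""
    k = len(pattern)
    if k == 0:
        # Every window trivially contains the empty pattern.
        return max(len(text) - w + 1, 0)
    # start[j] = largest s such that pattern[:j] is a subsequence of
    # text[s..i] for the current position i, or -1 if there is none.
    start = [-1] * (k + 1)
    count = 0
    for i, c in enumerate(text):
        # Go from long prefixes to short ones so that start[j - 1] still
        # holds the value from before position i when it is read.
        for j in range(k, 0, -1):
            if pattern[j - 1] == c:
                start[j] = i if j == 1 else start[j - 1]
        # The window text[i - w + 1 .. i] contains the pattern exactly when
        # the latest possible start lies at or after the window's left edge.
        if i >= w - 1 and start[k] >= i - w + 1:
            count += 1
    return count


if __name__ == "__main__":
    # Windows of size 3 in "abcab": "abc", "bca", "cab".
    print(count_windows("abcab", "ab", 3))
```

In the example, "abc" and "cab" contain "ab" as a subsequence while "bca" does not, so the count is 2.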
Database Structures, Based on Tries, for Text, Spatial, and General Data
In International Symposium on Cooperative Database Systems for Advanced Applications, 1996
Cited by 3 (0 self)
Abstract:
Digital trees, or tries, were introduced thirty years ago for sublinear-time retrieval of substrings from large texts. They were exploited for this, as a well-known example, by the University of Waterloo project to put the New Oxford English Dictionary onto CD-ROM. We have recently improved the performance of trie techniques for text and shown their use in searches for approximations to a given string. We have also shown that tries have excellent retrieval properties for spatial data. We have shown how to use tries to represent, without redundancy, spatial data which can be displayed to any resolution, retrieving from disk or from network only the amount of data that will finally be displayed. We have done this particularly for two-dimensional vector data, such as makes up very large maps, but have also established that the trie techniques apply to raster data and to data of other than two dimensions. These results are the basis for a claim that tries offer the best storage ...

