
  An algebra for structured text search and a framework for its implementation (1995) [92 citations — 15 self]

by Charles L. A. Clarke, G. V. Cormack, F. J. Burkowski
The Computer Journal
ftp://cs-archive.uwaterloo.ca/cs-archive/CS-94-30/structxt.ps

Abstract:

A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, followed by. The ultimate result of a query is the set of intervals that satisfy it. An implementation framework is given based on four primitive access functions. Each access function finds the solution to a query nearest to a given position in the database. Recursive definitions for the seven operators are given in terms of these access functions. Search time is at worst proportional to the time required to solve the elementary terms in the query. Inverted indices yield search times that compare favourably to those for full-text boolean searches.
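The seven operators named in the abstract can be illustrated with a small sketch. This is a naive set-based reading of the operator names, not the recursive access-function framework the paper describes. Everything in it is an assumption made for illustration: intervals are modelled as inclusive `(start, end)` word positions, the helper `minimal` (which discards any interval that contains another result), the function names, and `first_starting_at_or_after`, a hypothetical stand-in for the idea of an access function that finds the nearest solution to a given position.

```python
from bisect import bisect_left
from itertools import product
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive positions; an assumed model


def contains(a: Interval, b: Interval) -> bool:
    """True if interval a covers interval b."""
    return a[0] <= b[0] and b[1] <= a[1]


def minimal(intervals: Iterable[Interval]) -> List[Interval]:
    """Assumed reduction: drop intervals that contain another result; sort by start."""
    s = set(intervals)
    return sorted(a for a in s if not any(b != a and contains(a, b) for b in s))


def containing(A, B):
    return [a for a in A if any(contains(a, b) for b in B)]


def not_containing(A, B):
    return [a for a in A if not any(contains(a, b) for b in B)]


def contained_in(A, B):
    return [a for a in A if any(contains(b, a) for b in B)]


def not_contained_in(A, B):
    return [a for a in A if not any(contains(b, a) for b in B)]


def one_of(A, B):
    # Intervals that satisfy either operand.
    return minimal(list(A) + list(B))


def both_of(A, B):
    # Smallest spans that cover one interval from each operand, in any order.
    return minimal((min(a[0], b[0]), max(a[1], b[1])) for a, b in product(A, B))


def followed_by(A, B):
    # Spans that start with an A interval and end with a later B interval.
    return minimal((a[0], b[1]) for a, b in product(A, B) if a[1] < b[0])


def first_starting_at_or_after(intervals: List[Interval], k: int) -> Optional[Interval]:
    """Hypothetical access function: nearest interval (sorted by start) at or after k."""
    starts = [s for s, _ in intervals]
    i = bisect_left(starts, k)
    return intervals[i] if i < len(intervals) else None


# Made-up example data: two sections and the positions of two words.
sections = [(0, 9), (10, 19)]
algebra = [(3, 3), (25, 25)]
text = [(5, 5), (12, 12)]

print(containing(sections, algebra))             # sections that mention "algebra"
print(followed_by(algebra, text))                # "algebra" ... "text" spans
print(first_starting_at_or_after(sections, 4))   # next section at or after position 4
```

The sketch compares every pair of intervals, so it is quadratic in the operand sizes. The abstract's claim about search time depends on answering queries through the access functions and an inverted index, which this example does not attempt.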
