A Linear-time Algorithm
by Craig G. Nevill-Manning, Ian H. Witten
http://www.cs.washington.edu/research/jair/volume7/nevill97a.ps
Abstract:
SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real-world sequences.