Cascaded Markov Models, 1999
Cited by 42 (2 self)
This paper presents a new approach to partial parsing of context-free structures. The approach is based
A Fast Japanese Sentence Analyzer
In Proceedings of the First International Workshop on MultiMedia Annotation, 2001
Cited by 1 (0 self)
A deterministic finite state transducer is a fast device for analyzing strings: it takes O(n) time to analyze a string of length n. This paper describes an application of this technique to Japanese sentence analysis. The analysis comprises a morphological analyzer (Keitaiso-Kaiseki), a bunsetsu analyzer and a dependency analyzer (Kakariuke-Kaiseki). We achieved the speed at a small cost in accuracy. The morphological analysis is based on the Saichou-Icchi-Hou (longest matching method), a traditional method in morphological analysis, extended by registering compound words as single words. The bunsetsu analyzer uses a simple N-gram method, although we noticed that some improvement can be obtained by introducing lexical information. The dependency analysis is the crucial part, as it normally takes a long time, typically cubic in the sentence length. However, our system takes about 0.17 milliseconds to analyze one sentence (average length 10 bunsetsu, on a Pentium III 650MHz PC running Linux), and we observed the analysis time to be proportional to the sentence length. The accuracy is about 81% even though very little lexical information is used. This is about 17% and 9% better than the default and a simple system, respectively. We believe the gap between our performance and the best current performance on the same task, about 7%, can be filled by introducing lexical or semantic information.
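The longest matching method mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `longest_match_segment`, the toy lexicon, and the single-character fallback for unknown text are all assumptions made for illustration, and the sketch uses plain set lookups rather than a finite state transducer.

```python
def longest_match_segment(text, lexicon, max_len=None):
    """Greedy left-to-right longest-match segmentation (illustrative only).

    At each position, take the longest lexicon entry that starts there.
    If no entry of length >= 2 matches, fall back to a single character
    (an assumed policy for unknown text, not taken from the paper).
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        # Try candidate lengths from longest to shortest.
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy lexicon; the compound "東京都" is registered as one word.
lexicon = {"東京", "東京都", "都", "に", "住む"}
tokens = longest_match_segment("東京都に住む", lexicon)
```

Because the compound is registered as a single entry, the greedy scan prefers it over the shorter entry "東京" followed by "都", which is the effect the abstract describes for compound words.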

