Results 1 - 5 of 5
Regular Expression Acceleration on the Micron Automata Processor: Brill Tagging as a Case Study
Abstract
Brill tagging is a classic rule-based algorithm for part-of-speech (POS) tagging that assigns tags, such as noun, verb, or adjective, to input tokens. Due to the intense memory requirements of rule matching, CPU implementations of the Brill tagging algorithm have been found to be slow. We show that Micron's Automata Processor (AP), a new computing architecture that can perform massively parallel pattern matching, can greatly accelerate the second stage of Brill tagging via rule template matching. The 218 contextual rules are first converted into regular expressions (regexes). Regexes are widely used in natural language processing (NLP) tasks; thus, this case study involving Brill tagging also shows how the AP might accelerate other applications that can be framed as regexes. We compare single-threaded and multithreaded versions of regex matching on an Intel i7 CPU, an Intel Xeon Phi co-processor, and the AP. The results show a 63.90x speed-up using the AP as a regex accelerator over the fastest multithreaded CPU version. We also investigate how the performance of regex matching on both CPU architectures varies with the complexity of the regex. Taken together, these results demonstrate the potential for significant performance improvements from using accelerators for various NLP computational tasks, particularly those that involve rule-based or pattern-matching approaches.
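To make the idea of framing a contextual rule as a regex concrete, here is a minimal Python sketch. It is not the paper's encoding: the `word/TAG` text format, the helper names `rule_to_regex` and `apply_rule`, and the example rule ("change NN to VB when the previous tag is TO", a textbook Brill-style rule) are all assumptions made for illustration.

```python
import re

def rule_to_regex(from_tag, prev_tag):
    """Build a regex for the hypothetical template
    'retag from_tag when the previous token carries prev_tag'.
    Tokens are assumed to be written as word/TAG, separated by whitespace."""
    return re.compile(
        r"(?<!\S)"                                    # start of a token
        r"(\S+/" + re.escape(prev_tag) + r"\s+\S+)"   # previous token plus current word
        r"/" + re.escape(from_tag) +                  # current tag to be replaced
        r"(?!\S)"                                     # end of a token
    )

def apply_rule(tagged_text, from_tag, to_tag, prev_tag):
    """Rewrite every match of the rule, keeping the words and swapping the tag."""
    pattern = rule_to_regex(from_tag, prev_tag)
    return pattern.sub(lambda m: m.group(1) + "/" + to_tag, tagged_text)

# Illustrative input: an initial tagging in which both uses of "race" are NN.
initial = "to/TO race/NN the/DT race/NN"
retagged = apply_rule(initial, "NN", "VB", "TO")
print(retagged)
```

Only the first "race" is retagged, because only it follows a token tagged TO. A full second stage would apply every contextual rule this way, which is the pattern-matching workload the abstract proposes to offload to an accelerator.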
STATISTICAL ANALYSIS OF PART OF SPEECH (POS) TAGGING ALGORITHMS FOR ENGLISH CORPUS
Abstract
Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to each word in a sentence. In many natural language processing applications, such as word sense disambiguation, information retrieval, information extraction, parsing, question answering, and machine translation, POS tagging is regarded as one of the basic necessary tools. Identifying the ambiguities in lexical items is the challenging goal in developing an efficient and accurate POS tagger. In this paper we compare the performance of a few POS tagging methods for Bangla, e.g. statistical approaches (n-gram, HMM) and a perceptron-based approach. A supervised POS tagging approach needs a large annotated training corpus to tag properly. At this early stage of POS tagging for English, we construct a ground-truth set that contains tagged words from a sampled corpus. We also investigated the performance of POS taggers for different kinds of words.
Keywords: Part-of-speech tagging, HMM, Unigram, Perceptron.
I. PART-OF-SPEECH TAGGING
Part-of-speech tagging refers to the process of assigning part-of-speech labels to words in a corpus. Basic part-of-speech labels include such word classes as nouns, verbs and adjectives, but the labelling process usually goes beyond that. Features such as singular/plural forms, grammatical gender and case are also considered. A part-of-speech tagger begins the tagging process by consulting a machine-readable lexicon to determine whether the word it has encountered is present in the lexicon. If it is present, a list of part-of-speech tags that corresponds to the word is returned. A word could have many possible labels, but in most cases it has just one. There are also times when the tagger won't be able to find the word in the lexicon. If that's the case, the tagger will normally proceed by sending the word to the morphological analyzer. There it will attempt to decompose the word and determine whether it contains a morpheme ending. If it does, the rest of the word will be matched once more against the lexicon. If that match fails too, the final tagging will depend on the disambiguation process that follows. The disambiguation process is essential both for words that have many possible parts of speech in the lexicon and for those that haven't been found in the lexicon at all. The disambiguation process is carried out differently depending on which kind of tagging algorithm you use. There are three main kinds of tagging algorithms within part-of-speech tagging, namely stochastic tagging, rule-based tagging and transformation-based tagging. The last combines features of the two preceding ones.
II. CLASSIFICATION OF POS TAGGER
A Part-Of-Speech Tagger (POS Tagger) is described as a piece of software that assigns parts of speech to each word of a language that it reads. The approaches to POS tagging can be divided into three categories: rule-based tagging, statistical tagging and hybrid
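The lookup procedure described in section I of this excerpt (lexicon first, then a morphological fallback, then disambiguation) can be sketched in Python as follows. This is only a toy illustration under stated assumptions: the lexicon entries, the suffix list, and the function name `candidate_tags` are hypothetical, and reusing the stem's tags unchanged is a simplification of what a real morphological analyzer would do.

```python
# Hypothetical toy lexicon: word -> list of candidate POS tags.
LEXICON = {
    "the": ["DT"],
    "dog": ["NN"],
    "walk": ["NN", "VB"],
}

# Hypothetical morpheme endings tried by the fallback, in order.
SUFFIXES = ["ing", "ed", "s"]

def candidate_tags(word):
    """Return (tags, source) for a word.

    source is 'lexicon' if the word was found directly, 'morphology' if it
    was found after stripping a suffix, and 'unknown' if both lookups failed
    and the decision is left entirely to the disambiguation step."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word], "lexicon"
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if stem in LEXICON:
                # Simplification: reuse the stem's tags as the candidates.
                return LEXICON[stem], "morphology"
    return [], "unknown"

for w in ["walk", "dogs", "walked", "zebra"]:
    print(w, candidate_tags(w))
```

Words that come back with several candidates ("walk") or none ("zebra") are exactly the cases the text hands to the disambiguation step, whether that step is stochastic, rule-based or transformation-based.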
A Survey on Parts of Speech Tagging for Indian Languages
Abstract
This paper presents a survey of POS (Part of Speech) tagging for various Indian languages. It discusses the approaches that have been applied to POS tagging of sentences written in Indian languages. Indian languages are morphologically rich, so a number of problems occur while tagging sentences written in these languages. Researchers have done a lot of POS tagging work for various languages using different approaches, such as HMM (Hidden Markov Model), SVM (Support Vector Machine), and ME (Maximum Entropy).
North Maharashtra University, Jalgaon.
Abstract
Part-of-speech tagging in Marathi is a very complex task, as Marathi is a highly inflectional, free-word-order language. In this paper we demonstrate a rule-based part-of-speech tagger for Marathi. The tagger is built from hand-constructed rules learned from a corpus, together with some rules added manually after studying Marathi grammar. Disambiguation is done by analyzing the linguistic features of the word, its preceding word, its following word, etc. After testing the system on three different data sets we obtained encouraging results: the accuracy of our system averages 78.82% across them.