29 citations found.
J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.

This paper is cited in the following contexts:
Improving Stemming for Arabic Information Retrieval.. - Larkey, Ballesteros, al. (2002)   (9 citations)

....mismatch problem, in which query words do not match document words. Stemmers equate or conflate certain variant forms of the same word like (paper, papers) and (fold, folds, folded, folding). In English and many other western European languages, stemming is primarily a process of suffix removal [32, 40]. Such stemmers do not conflate irregular forms such as (goose, geese) and (swim, swam). In this work, we use the term stemming to refer to any process which conflates related forms or groups forms into equivalence classes, including but not restricted to suffix stripping. Stemming has been shown ....

Lovins, J. B. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11, pp. 22-31, 1968.


Original Articles - The Stanford Digital (1997)

....(e.g. the type of proximity operators supported). We need to be aware of the stopwords used. We need to know the vocabulary of the collection, i.e. the set of words indexed by the service. This is used, for instance, to enumerate words that match the stem of a particular word when stemming [14, 15] must be emulated because it is not a supported capability. We need to know the details of other features, e.g. for truncation, the supported truncation patterns. 2.4 Result analysis The introduction of canonical attribute models into the metadata architecture is valuable not only for query ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.


PiQASso: Pisa Question Answering System - Giuseppe Attardi Antonio (2001)   (1 citation)

....questions expecting a location as an answer, day, date, year in questions expecting a date and so on. We use about a dozen such exceptions, which correspond to cases in which the type makes these words superfluous. Stemming is performed using Linh Huynh's implementation of Lovins's stemmer [6]. In the second expansion cycle we broaden the search by adding (in or) the synonyms of the search terms. Synonyms are looked up in WordNet by means of WNSense. Synonyms are stemmed as well. In the third and fourth expansion cycles, we increase recall by dropping some search terms. During the ....

J. B. Lovins, Development of a Stemming Algorithm. Mechanical Translations and Computational Linguistics, 11: 22-31, 1968.


On Arabic Search: Improving the Retrieval Effectiveness Via.. - Aljlayl, Frieder (2002)

....Latin based languages such as English. Several different techniques were proposed for stemming English text. One of the simplest techniques is suffix stripping; it uses lists of suffixes to reduce words to their bare form. The most common stemming algorithms for English are Porter [16] and Lovins [13]. Kraaij and Pohlmann [11] concluded that stemming improves recall. A comparative evaluation performed by Hull [9] to investigate the retrieval precision using stemming found little precision improvement as compared to no stemming. Krovetz [12] proposed a different approach to stemming. The ....

Lovins, J. Development of a stemming algorithm, Mechanical Translation and Computational Linguistics, 11,22-31, 1968.


A Clustering Interface For Web Search Results In Polish And English - Weiss

....corpus would have to be built specifically for every language the weighting formula should be applied to. In Carrot we tried to use Basel term frequencies, kindly published by Andy McFarlane, but, unfortunately, to prepare the list, he extracted terms from Basel using a version of Lovins stemmer [Lovins, 68] so all not English words were destroyed. In the end we decided to use the original tfidf formula, with the idf factor taken from the distribution of terms in the query result. 3.1.3 ORDER OF WORDS IN PHRASES Suffix tree clustering was created according to an observation that language ....

....absorbability | absorbed | absorbent | absorber | absorbing; poznaniak | poznaniacy | poznaniaka | poznaniakami | poznaniakiem | poznaniakom | poznaniakowi | poznaniaku | poznaniaków | poznanianki. A number of stemmers were proposed for the English language: Paice stemmer, Lovins stemmer [Lovins, 68] or, the most popular, Porter stemmer [Porter, 80]. All of them work as finite state machines: given a set of rules (states and transitions between them) they identify a suffix of the word being processed (a state) and strip it if a matching transition is found. This example also shows how the ....

Lovins J. B.: Development of a Stemming Algorithm, Mechanical Translation and Computational Linguistics, 11 (1):23-31, March 1968.
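
The Weiss excerpt above mentions taking the idf factor from the distribution of terms in the query result rather than from a separate corpus. A minimal sketch of that idea follows. The excerpt does not give the formula, so the weighting tf * log(N / df) is an assumption, and the function name `tfidf_in_result` is hypothetical.

```python
import math

def tfidf_in_result(term, doc_tokens, result_docs):
    """Weight `term` in one document, with idf computed over the result set only.

    Assumed weighting: tf * log(N / df), where N is the number of documents
    in the query result and df is how many of them contain the term.
    """
    tf = doc_tokens.count(term)
    df = sum(1 for doc in result_docs if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(result_docs) / df)

# Three tokenized snippets standing in for a query result (made-up data).
result = [["web", "search"], ["web", "cluster", "cluster"], ["web"]]
cluster_weight = tfidf_in_result("cluster", result[1], result)
web_weight = tfidf_in_result("web", result[1], result)
```

A term that occurs in every returned document, such as "web" here, gets weight zero, so no per-language reference corpus is needed.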


An Architecture for Information Retrieval from Distributed.. - Ryan, O'Riordan

....at hand. Common approaches include: Stemming: Stemming algorithms remove common suffixes from terms occurring in the documents. The goal is to reduce similar words to a common root form by identifying morphological derivations of words. Commonly used algorithms include Lovins's stemming algorithm [14] and Porter's stemming algorithm [15]. Thesauri construction: This is often used to identify synonyms within the texts. Thesauri can be constructed via manual or automated approaches. The former is created with knowledge of the language at hand; the latter is based on calculating statistics ....

J.B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 1:22--31, March 1968.


Tequesta: The University of Amsterdam's Textual Question.. - Monz, de Rijke (2001)   (1 citation)

....article, we fixed slope at 0.2; the pivot was set to the average number of unique words occurring in the collection. To increase precision, we decided to use a lexical based stemmer, or lemmatizer, because it tends to be less aggressive than rule based stemmers such as Porter's [14] or Lovins' [9] stemmer. The lemmatizer is part of the TreeTagger part-of-speech tagger [17]. Each word is assigned its syntactic root through lexical look up. Mainly number, case, and tense information is removed, leaving other morphological processes such as nominalization intact. 2.3 Document Analysis 2.3.1 ....

J.B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1--2):22--31, 1968.


A Detailed Analysis of English Stemming Algorithms - Hull, Grefenstette (1996)   (6 citations)

....has been approached with a wide variety of different methods, as detailed in Lennon [10], including suffix removal, strict truncation of character strings, word segmentation, letter bigrams, and linguistic morphology. Two of the most popular algorithms in information retrieval, the Lovins stemmer [11] and the Porter stemmer [13], are based on suffix removal. Lovins finds the longest match from a large list of endings while Porter uses an iterative algorithm with a smaller number of suffixes and a few context sensitive recoding rules. Krovetz [9] accurately describes the problems associated ....

....of five different stemming algorithms and compare them to a baseline which consists of no stemming at all. Two stemmers are included in the SMART collection; they are a simple algorithm which removes s s from the end of the word and an extensively modified version of the Lovins algorithm [11]. In addition, we will study the Porter stemmer [13] and versions of the Xerox English inflectional and derivational analyzers [15] slightly modified for the conflation problem. 7 Experimental Results The SMART system is used to index the queries and documents separately for each stemming ....

Janet Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.


Application of K-NN and FPTC Based Text Categorization Algorithms.. - Ilhan (2001)

....[Figure 1.2: The Preprocessed News Report] 1.1.2 Wild Card Matching Lovins [22] defines stemming as a procedure to reduce all words with the same stem to a common form, usually by stripping each word of its derivational and inflectional suffixes. Stemming is generally achieved by means of suffix dictionaries that contain lists of possible word endings, and this approach has ....

Lovins, J. B., Development of a Stemming Algorithm, Mechanical Translation and Computational Linguistics, 1968.


Feature Engineering for a Symbolic Approach to Text Classification - Scott (1998)   (4 citations)

....(RIPPER is designed specifically with this in mind [COH96a]) but more typically, there is some attempt to make the features more general and thus reduce dependencies and redundancies. The most common way to generalize words is to apply a stemming algorithm such as the one developed by Lovins [LOV68] to remove suffixes from words. Stemming has the effect of mapping several morphological forms of words to a common feature. For example the words learner, learning, and learned would all map to the common root learn, and this latter string would be placed in the feature set rather than the former ....

J.B. Lovins. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics 11. 1968. 22-31.


Linguistically Motivated Information Retrieval - Arampatzis, van der Weide.. (2000)   (6 citations)

....of the number of relevant retrieved documents to the total number of relevant documents in the document collection. For an extended introduction to the IR problem, its history, widely accepted techniques, and retrieval evaluation metrics, the reader should refer to the classical books [1] and [2]; for a collection of classical articles in IR, to [3] (all in Readings for Further Study). The tremendous increase over the last decade in information in digital form has led to a new challenge in IR. A World Wide Web search today involves large amounts of information, and going through hundreds ....

....be done in a linguistic fashion, taking into account the function and the part of speech of a word, or in a nonlinguistic fashion, disregarding a word's context. Lovins and Porter developed nonlinguistic algorithms for suffix stripping based on a list of frequent suffixes to reduce words to their stems [2, 3]. It is a common belief that stemmers improve recall without losing too much precision; however, a comparison of the Lovins stemmer, the S stemmer, and the Porter stemmer with a baseline of no stemming at all concluded after detailed evaluation that none of the three stemming algorithms ....

J. B. Lovins. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics, 11(1):22-31, 1968.


Cross-language Retrieval In English and Vietnamese - Van Be Hai (1997)   (1 citation)

....to Vietnamese and one for Vietnamese to English) are those compiled by Nguyen [3]. In their raw format, they are not suitable for such language transformation, so we have filtered them, eliminating all irrelevant information. This format we call original. By using the Lovins stemming algorithm [2] (for English) to stem the entries in the English-Vietnamese dictionary, the entries with the same root can be merged; this version we call Lovins. The Vietnamese-English dictionary has 36,187 entries; the original English-Vietnamese dictionary has 25,807 entries; and the Lovins English-Vietnamese ....

J. Lovins. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics.


Linguistically-motivated Information Retrieval - Arampatzis, van der Weide.. (2000)   (6 citations)

....of the number of relevant retrieved documents to the total number of relevant documents in the document collection. For an extended introduction to the IR problem, its history, widely accepted techniques, and retrieval evaluation metrics, the reader should refer to the classical books [1] and [2]; for a collection of classical articles in IR, to [3] (all in Readings for Further Study). The tremendous increase over the last decade of information in digital form has led to a new challenge in IR. A World Wide Web search today involves large amounts of information, and going through hundreds ....

....be done in a linguistic fashion, taking into account the function and the part of speech of a word, or a non-linguistic fashion, disregarding a word's context. Lovins and Porter developed non-linguistic algorithms for suffix stripping based on a list of frequent suffixes to reduce words to their stem [2, 3]. It is a common belief that stemmers improve recall without losing too much precision; however, a comparison of the Lovins stemmer, the S stemmer, and the Porter stemmer with a baseline of no stemming at all concluded after detailed evaluation that none of the three stemming algorithms ....

J. B. Lovins. Development of a Stemming Algorithm. Mechanical Translation and Computational Linguistics, 11(1):22-31, 1968.


Conflation-based Comparison of Stemming Algorithms - Fuller, Zobel (1998)

....of affixes. Terms match if they have a common root. For example, trains might be stemmed to train, which would match train, training, and of course trains. Each word thus has a set of conflations, that is, words that have the same root. Several stemming algorithms have been proposed [8, 9, 10, 11, 13, 17, 18], based on different principles; each produces rather different sets of conflations. Stemming has usually been measured by its impact on querying: since stemming changes the documents that are retrieved in response to a query, it has the potential to change the quality of the set of answers. ....

....that surprise the user. Porter's method. The Porter stemmer [13] removes about 60 suffixes in a multi-step approach, successively removing short suffixes without exceptions. Each step results in the removal of a suffix or the transformation of the root. Lovins's method. The Lovins stemmer [11] uses a longest match algorithm and exception list to remove over 260 different suffixes. It is the most aggressive of the three algorithmic approaches to stemming. We used public domain implementations of the S stemmer, the Porter stemmer, and the Lovins stemmer for the experiments described ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, Volume 11, Number 1-2, pages 22--31, 1968.
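
The notion of a set of conflations in the Fuller and Zobel excerpt above can be sketched as grouping a vocabulary by stem. The stemmer below, `toy_stem`, is a hypothetical stand-in that strips two endings; it is not the S, Porter, or Lovins stemmer used in that paper.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in stemmer: strips 'ing' or 's' if 3+ letters remain."""
    for ending in ("ing", "s"):
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word

def conflation_sets(vocabulary, stem):
    """Group words by their stem; each group is one set of conflations."""
    groups = defaultdict(set)
    for word in vocabulary:
        groups[stem(word)].add(word)
    return dict(groups)

sets = conflation_sets(["train", "trains", "training", "trainer"], toy_stem)
# "trainer" stays in its own set because toy_stem has no rule for "er".
```

Swapping in a different `stem` function changes the grouping, which is the sense in which each algorithm yields different sets of conflations.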


Applying NLP to IR: Why and How - Hui (1998)   (1 citation)

....the Word Level Morphology can be applied to IR in four areas: (i) stemming algorithms, (ii) the development of machine readable dictionaries and thesauri, (iii) word sense indexing, and (iv) word sense disambiguation (WSD). The most popular stemming algorithms used in IR are the Lovins algorithm [Lov68] and the Porter algorithm [Por80]. There are other stemmers as well, such as the one by Paice [Pai90a], but they will not be discussed here. These algorithms attempt to identify words that have common roots. For example, connects, connected, and connecting all get stemmed into connect. ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1):22-31, 1968.


Predicate Rewriting for Translating Boolean Queries in .. - Chang, Garcia-Molina, .. (1996)   (1 citation)

....operators nN and N may be used instead. The terminal Word can be either an exact word like cat, or an expanded word such as cat (which matches any words starting with cat if truncation is supported) or stem(cat) which matches any words with the same stem as cat under some stemming algorithm [19, 20]). Phrase patterns (Figure 2, Construct 4), on the other hand, are expressions consisting of phrases (the terminal Phrase) connected by AND or OR operators. A phrase is a quoted string, in our notation, which is supposed to be the complete content of a field. Like words, if a phrase is completely ....

....if the target does apply implicit expansion (on the exact words w_i) the results may be broader than expected, as discussed in Section 5.2.1. b) Pattern e is supported at T but the interpretations are inconsistent. For example, the target may support stemming but with a different algorithm [19, 20] from that used by the front end. In this case we have the following strategies: • Tolerate the interpretation inconsistency. We may regard the target's interpretation as an approximation of the desired expansion, because users may not insist on (and usually are not aware of) the actual ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.


Boolean Query Mapping across Heterogeneous Information .. - Chang, Garcia-Molina, .. (1996)   (24 citations)

....operator is used when the distance is implicitly zero. If the order does not matter, operators (nN) and (N) may be used instead. However, these operators may not be available in other systems, e.g. Folio supports none of these. Other features where systems differ include truncation, stemming [15][8], stopwords, etc. [17][5]. To illustrate, Table III provides feature comparison from our survey of several Boolean query languages. For example, all the systems define their own sets of stopwords, except AltaVista in which all words are indexed. For systems having stopwords, if given a query ....

J. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.


SCAM: A Copy Detection Mechanism for Digital Documents - Shivakumar, Garcia-Molina (1995)   (15 citations)

....we use a new similarity measure that more accurately characterizes copy overlap, while traditional IR systems look for semantic similarity. Several schemes have been proposed to enhance IR schemes, such as use of signature files [8], lexical analysis [1], stoplists [13, 9], stemming algorithms [12, 15], thesaurus [21], and ranking algorithms [19]. Since our approach is based on IR, such schemes are orthogonal to our model, and one or more of these schemes could be used to enhance our document comparison mechanism. Our scheme is based on words, which are easier to detect than sentences, and hence ....

J.B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2), 1968.


The Stanford Digital Library Metadata Architecture - Baldonado, Chang, Gravano.. (1997)   (22 citations)

....the type of proximity operators supported). We need to be aware of the stopwords used. We need to know the vocabulary of the collection, i.e. the set of words indexed by the service. This is used, for instance, to enumerate words that match the stem of a particular word when stemming [14, 15] must be emulated because it is not a supported capability. We need to know the details of other features, e.g. for truncation, the supported truncation patterns. 2.4 Result Analysis The introduction of canonical attribute models into the metadata architecture is valuable not only for query ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.
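
The excerpt above describes emulating stemming for a service that does not support it by enumerating the vocabulary words that share a stem with the query word. A minimal sketch under stated assumptions: `toy_stem` is a hypothetical stand-in stemmer, and the `OR` syntax is an illustrative query form, not the syntax of any particular service.

```python
def toy_stem(word):
    """Hypothetical stand-in stemmer: drops a final 's' from words over 3 letters."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def expand_by_stem(term, vocabulary, stem):
    """Enumerate every indexed word whose stem matches the stem of `term`."""
    target = stem(term)
    return sorted(word for word in vocabulary if stem(word) == target)

def or_query(words):
    """Join the enumerated words into a disjunction (illustrative syntax)."""
    return " OR ".join(words)

vocabulary = ["cat", "cats", "catalog", "dog"]
expanded = expand_by_stem("cats", vocabulary, toy_stem)
query = or_query(expanded)
```

The expansion needs only the collection vocabulary, which is why the excerpt lists the vocabulary among the metadata a service should expose.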


Profiling with the INFOrmer Text Filtering Agent - Sorensen (1997)   (4 citations)

....which reduces all words with the same root to a common form, usually by stripping each word of its inflectional and derivational suffixes (e.g. the words computer, computerisation, computing are all stemmed to a common form comput). The algorithm used in this system is based on Lovins's algorithm [Lovins 1968], a longest match, context sensitive algorithm, which uses a list of ordered endings, with context sensitive rules associated with these endings. A second phase of the algorithm uses a set of respelling rules to convert stemmed words to the same root term (e.g. absorption and absorbed will be ....

J.B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 1:22--31, March 1968.
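
The two phases described in the Sorensen excerpt above (longest-match ending removal guarded by a context rule, then respelling) can be sketched as follows. The ending list, the minimum-stem-length condition, and the single respelling rule are toy values chosen for illustration; they are not the published Lovins tables or the INFOrmer implementation.

```python
# Toy (ending, minimum stem length) pairs. The length condition stands in for
# the context sensitive rules attached to each ending; values are assumptions.
ENDINGS = [("ization", 3), ("ing", 3), ("ion", 3), ("ed", 3), ("s", 3)]

# Toy respelling rules applied to the end of the stem after stripping.
RESPELL = [("rpt", "rb")]

def strip_ending(word):
    """Phase 1: remove the longest listed ending whose condition is satisfied."""
    for ending, min_stem in sorted(ENDINGS, key=lambda e: len(e[0]), reverse=True):
        stem = word[: len(word) - len(ending)]
        if word.endswith(ending) and len(stem) >= min_stem:
            return stem
    return word

def respell(stem):
    """Phase 2: rewrite the stem's final letters so variant roots coincide."""
    for old, new in RESPELL:
        if stem.endswith(old):
            return stem[: len(stem) - len(old)] + new
    return stem

def two_phase_stem(word):
    return respell(strip_ending(word))

a = two_phase_stem("absorption")  # "absorpt" after phase 1, respelled in phase 2
b = two_phase_stem("absorbed")
```

The length condition keeps short words such as "sing" intact, since stripping "ing" would leave a one-letter stem.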


Domain-Specific Keyphrase Extraction - Frank, Paynter, Witten (1999)   (8 citations)

....of these initial phrases up to length three as candidate phrases. It then eliminates those phrases that begin, or end, with a stopword. It also deletes phrases that consist merely of a proper noun. In the next step, all words are case folded and stemmed using the iterated Lovins stemmer [Lovins, 1968], and stemmed phrases that occur only once in the document are removed. 2.2 Building the Model So far we have shown how candidate phrases are generated. However, in conventional machine learning terms, phrases by themselves are useless; it is their properties, or attributes, that are ....

J.B. Lovins. Development of a stemming algorithm. Mechanical translation and computational linguistics, 11:22--31, 1968.
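
The candidate-phrase steps in the excerpt above can be sketched roughly as below. This is an illustration under assumptions, not the authors' implementation: the stopword list is made up, `toy_stem_once` is a hypothetical stand-in for a single Lovins pass, and the proper-noun filter and the initial splitting into phrases are omitted.

```python
from collections import Counter

STOPWORDS = {"the", "of", "a", "and", "in"}  # assumed, illustrative list

def toy_stem_once(word):
    """Hypothetical single stemming pass: strips 'ing' or 's' if 3+ letters remain."""
    for ending in ("ing", "s"):
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word

def iterated_stem(word):
    """Apply the single pass repeatedly until the word stops changing."""
    while True:
        stemmed = toy_stem_once(word)
        if stemmed == word:
            return word
        word = stemmed

def candidate_phrases(tokens, max_len=3):
    """Count stemmed phrases of up to max_len words; keep those seen more than once."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = [t.lower() for t in tokens[i : i + n]]  # case folding
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue  # drop phrases that begin or end with a stopword
            counts[" ".join(iterated_stem(w) for w in gram)] += 1
    return {phrase: c for phrase, c in counts.items() if c > 1}

tokens = "stemming algorithms and the stemming algorithm".split()
phrases = candidate_phrases(tokens)
```

Iterating the stemmer matters for words such as "trainings", which the toy pass reduces first to "training" and then to "train".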


Using Linguistic Knowledge in Information Retrieval - Kraaij, Pohlmann (1996)   (3 citations)

....will increase when morphological variance of terms (e.g. singular plural) is reduced. Several different techniques have been proposed to achieve this goal. One of the simplest of these techniques, suffix stripping, uses a list of frequent affixes to reduce words to their base form or stem, e.g. [15], [19]. Suffix stripping algorithms are very efficient because they do not involve dictionary look up, but due to their lack of linguistic information, e.g. about word class, they frequently introduce mistakes. Words can be reduced to the wrong stem resulting in conflation with semantically ....

....using more linguistically motivated stemming algorithms. We will discuss their approach and the approach chosen in the UPLIFT project in sections 2.1, 2.2 and 2.3 below. 2.1 Suffix stripping Harman [6] compared three well known suffixing algorithms for English: the S stemmer, the Lovins stemmer [15] and the Porter stemmer [19]. Harman contrasted these suffixing algorithms with a baseline of no stemming at all. After a detailed evaluation Harman reached the conclusion that none of the stemming algorithms consistently improve performance. The number of queries that benefit from the use of a ....

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.


D2.2.3: State of the art on ontology - Alignment Coordinator Jrme

No context found.

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.


Evaluation of N-Grams Conflation Approach in Text-Based.. - Kosinov (2001)

No context found.

J. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, (11):22--31, 1968.


Evaluating High Accuracy Retrieval Techniques - Shah, Croft (2004)

No context found.

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.


Combining Machine Learning and Hierarchical Structures for Text.. - Ruiz (2001)   (1 citation)

No context found.

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.


A Logical Model of Information Retrieval based on Propositional.. - Carril (2001)

No context found.

J. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22--31, 1968.


Evaluating High Accuracy Retrieval Techniques - Chirag Shah Bruce

No context found.

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22--31, 1968.


Query and Data Mapping across Heterogeneous Information Sources - Chang (2001)

No context found.

J. B. Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2):22-31, 1968.

CiteSeer.IST - Copyright Penn State and NEC