Results 1 - 10 of 19
Evolutionary Approach to Natural Language Word Sense Disambiguation through Global Coherence Optimization
Cited by 8 (2 self)
shore, set, etc. Automatic selection of the sense intended in a given text has crucial importance in many applications of text processing, such as information retrieval or machine translation: e.g., “(my account in the) bank” is to be translated into Spanish as “(mi cuenta en el) banco” whereas “(on the) bank (of the lake)” as “(en la) orilla (del lago).” Current methods of such disambiguation involve local maximization of coherence: for every word they select the sense that has the most in common with the surrounding words, taking into account all their senses. Since the words are processed independently, this is logically inconsistent: the choice of a sense for a word can be affected by the nearby words’ senses other than the ones chosen by the algorithm when processing those words. This leads to sub-optimal coherence between the chosen senses of close words. In this paper, we consider global optimization of such coherence and show that it can be improved as compared with the best existing approaches, leading to superior results. Due to the high dimensionality of the search space, a genetic algorithm is used to find a near-optimal combination of sense choices.
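The idea of searching for a globally coherent combination of senses with a genetic algorithm can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's actual setup: the toy `SENSES` inventory, the gloss-overlap `coherence` score, and the GA settings (tournament selection, one-point crossover, per-gene mutation, two elites).

```python
import random
from itertools import combinations

# Hypothetical toy sense inventory (illustrative data only): each word maps
# to a list of senses, each sense described by a set of gloss words.
SENSES = {
    "bank": [{"money", "account", "institution"}, {"shore", "river", "lake", "water"}],
    "account": [{"money", "institution", "deposit"}, {"story", "report", "narrative"}],
    "deposit": [{"money", "account", "institution"}, {"sediment", "river", "water"}],
}

def coherence(words, choice):
    """Global coherence (assumed measure): gloss overlap summed over all sense pairs."""
    glosses = [SENSES[w][s] for w, s in zip(words, choice)]
    return sum(len(a & b) for a, b in combinations(glosses, 2))

def evolve(words, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for a sense assignment with high global coherence (needs >= 2 words)."""
    rng = random.Random(seed)
    fitness = lambda c: coherence(words, c)
    population = [[rng.randrange(len(SENSES[w])) for w in words]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1, p2 = (max(rng.sample(population, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, len(words))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally re-draw the sense of a single word.
            child = [rng.randrange(len(SENSES[w])) if rng.random() < mutation_rate else g
                     for w, g in zip(words, child)]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

words = ["bank", "account", "deposit"]
best = evolve(words)
print(best, coherence(words, best))
```

Each individual encodes one sense index per word, so the fitness function scores the whole combination at once rather than one word at a time, which is the contrast with local maximization drawn in the abstract.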
Detecting inflection patterns in natural language by minimization of morphological model
- Proceedings of Progress in Pattern Recognition, Image Analysis and Applications, 9th Iberoamerican Congress on Pattern Recognition, CIARP ’04, volume 3287 of Lecture Notes in Computer Science, 2004
Cited by 6 (0 self)
Abstract. One of the most important steps in text processing and information retrieval is stemming: reducing words to stems expressing their base meaning, e.g., bake, baked, bakes, baking → bak-. We suggest an unsupervised method of recognizing such inflection patterns automatically, with no a priori information on the given language, based exclusively on a list of words extracted from a large text. For a given word list V we construct two sets of strings: stems S and endings E, such that each word from V is a concatenation of a stem from S and an ending from E. To select an optimal model, we minimize the total number of elements in S and E. Though such a simplistic model does not reflect many phenomena of real natural language morphology, it shows surprisingly promising results on different European languages. In addition to practical value, we believe that this can also shed light on the nature of human language.
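The objective described above (minimize |S| + |E| subject to every word being a stem plus an ending) can be made concrete with a small sketch. This is an exhaustive search over split points, workable only for tiny word lists; the function name `best_split` and the restriction to non-empty stems are assumptions for illustration, and the paper's actual optimization procedure is not shown in the abstract.

```python
from itertools import product

def best_split(words):
    """Try every way of cutting each word into (stem, ending) and return
    the combination with the smallest |S| + |E|. Stems must be non-empty;
    endings may be empty. Exponential in len(words): a toy, not a method."""
    options = [[(w[:i], w[i:]) for i in range(1, len(w) + 1)] for w in words]
    best = None
    for combo in product(*options):
        stems = {s for s, _ in combo}
        endings = {e for _, e in combo}
        cost = len(stems) + len(endings)
        if best is None or cost < best[0]:
            best = (cost, stems, endings, combo)
    return best

words = ["bake", "baked", "bakes", "baking"]
cost, stems, endings, combo = best_split(words)
print(cost, sorted(stems), sorted(endings))
```

For this single paradigm the minimum cost is 5 (one stem, four endings), but several splits reach it, including degenerate ones such as the stem b-. On a four-word list the count alone cannot single out bak-; the model relies on a large word list, where a stem and its endings are shared across many words.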
Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases
- Polibits (37) 2008, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, ed. Grigori Sidorov, Centro Innovacion y Desarrollo Tecnologico en Computo, Instituto Politecnico Nacional, Mexico, 2008
Cited by 1 (1 self)
Abstract. The selection of words chosen for a query, crucial for the quality of results obtained by the query, can be substantially improved by using various lexical resources. Thus, for example, morphological dictionaries enable morphological expansion of queries, which is very important in highly inflective languages, such as Serbian. This paper discusses issues related to improvement of queries using a rule-based procedure implemented in WS4LR, a workstation for manipulating heterogeneous lexical resources developed by the Human Language Technology Group at the University of Belgrade. The procedure is used for automatic production of lemmas for a morphological dictionary from a given list of compounds, and its evaluation on several different sets of data is given. Several examples illustrate how this procedure can be used for improvement of queries for web search engines. Results obtained for these examples show that the number of documents obtained through a query by using our approach can be remarkably increased. Index Terms: Electronic dictionary, inflection, compounds, query expansion.
Modified Makagonov’s Method for Testing Word Similarity and its Application to Constructing Word Frequency Lists
Cited by 1 (1 self)
Abstract. By (morphologically) similar wordforms we understand wordforms (strings) that have the same base meaning (roughly, the same root), such as sadly and sadden. The task of deciding whether two given strings are similar (in this sense) has numerous applications in text processing, e.g., in information retrieval, for which stemming is usually employed as an intermediate step. Makagonov has suggested a weakly supervised approach for testing word similarity, based on empirical formulae comparing the number of equal and different letters in the two strings. This method gives good results on English, Russian, and a number of Romance languages. However, his approach does not deal well with slight morphological alterations in the stem, such as Spanish pensar vs. pienso. We propose a simple modification of the method using n-grams instead of letters. We also consider four algorithms for compiling a word frequency list relying on these formulae. Examples from Spanish and English are presented.
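To illustrate what comparing strings by n-grams rather than single letters looks like, here is a minimal sketch. It does not reproduce Makagonov's empirical formulae, which the abstract does not give; the Dice coefficient over character bigram sets is a standard measure swapped in for illustration, and the function names are hypothetical.

```python
def ngrams(word, n=2):
    """Set of character n-grams of a word (bigrams by default)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def dice(a, b, n=2):
    """Dice coefficient over n-gram sets: 2|A ∩ B| / (|A| + |B|).
    A stand-in similarity score, not the formula from the paper."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

print(dice("pensar", "pienso"))  # shared bigrams: "en", "ns"
print(dice("sadly", "sadden"))   # shared bigrams: "sa", "ad"
```

Because n-grams are compared as an unordered set, the shared bigrams "en" and "ns" of pensar and pienso still match even though the inserted i shifts their positions; a decision rule would then compare such a score against a threshold chosen for the language.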
SMM: Detailed, Structured Morphological Analysis for Spanish
Cited by 1 (0 self)
Abstract—We present a morphological analyzer for Spanish called SMM. SMM is implemented in the grammar development framework Malaga, which is based on the formalism of Left-Associative Grammar. We briefly present the Malaga framework, describe the implementation decisions for some interesting morphological phenomena of Spanish, and report on the evaluation results from the analysis of corpora. SMM was originally only designed for analyzing word forms; in this article we outline two approaches for using SMM and the facilities provided by Malaga to also generate verbal paradigms. SMM can also be embedded into applications by making use of the Malaga programming interface; we briefly discuss some application scenarios. Index Terms—Natural language processing, morphology, Malaga, Spanish.
To cite this version:
, 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
unknown title
Abstract. The selection of words chosen for a query, crucial for the quality of results obtained by the query, can be substantially improved by using various lexical resources. Thus, for example, morphological dictionaries enable morphological expansion of queries, which is very important in highly inflective languages, such as Serbian. This paper discusses issues related to improvement of queries using a rule-based procedure implemented in WS4LR, a workstation for manipulating heterogeneous lexical resources developed by the Human Language Technology Group at the University of Belgrade. The procedure is used for automatic production of lemmas for a morphological dictionary from a given list of compounds, and its evaluation on several different sets of data is given. Several examples illustrate how this procedure can be used for improvement of queries for web search engines. Results obtained for these examples show that the number of documents obtained through a query by using our approach can be remarkably increased. Index Terms: Electronic dictionary, inflection, compounds,
READS
, 2008
All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately.
KOREA
discuss the requirements for the system that performs the analysis of natural language at the syntactic level. We also present the environment that allows development of context-free grammars for natural language parsers. The environment was tested for the Spanish language, resulting in the development of a Spanish morphological analyzer. The environment lets the user develop and debug grammars for new languages. It has an option of ordering different parsing variants according to their probabilities on the basis of a specialized dictionary of government patterns. Key-words: Context-free grammars, natural language parsing, language engineering, knowledge engineering.
Some Linguistic Methods of Improving of the Quality of Document Retrieval on the Internet
Abstract. One of the problems of e-Business is to find relevant documents for making correct decisions. The main problem of the Internet is the huge number of documents, which makes it difficult to find the relevant ones; hence the importance of methods that improve the quality of document retrieval. We discuss some linguistic problems of document retrieval on the Internet related to the following natural language phenomena: (1) morphological processes: e.g., takes, took, taken are grammatical forms of take, (2) polysemy and homonymy: most words have several senses, e.g., bank is a financial institution, shore, bench, etc., (3) non-linearity of syntactic relations: in the case of a query that contains word combinations, the words forming a word combination can be separated by other words in the documents. Some linguistic-based methods and strategies related to the discussed problems are proposed that improve the quality of document retrieval or show the necessity of applying linguistic methods.
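Phenomenon (1) is commonly handled by expanding a query term into all of its forms before searching. The sketch below shows the idea under stated assumptions: the `FORMS` table holds only the take example from the abstract, the `expand_query` name is hypothetical, and the parenthesized `OR` syntax is a generic convention rather than the query language of any particular search engine.

```python
# Toy morphological dictionary: lemma -> word forms. The single entry comes
# from the abstract's example; a real system would load a full dictionary.
FORMS = {"take": ["take", "takes", "took", "taken"]}

def expand_query(query):
    """Replace each known lemma in the query with an OR-group of its forms.
    Words missing from the dictionary are passed through unchanged."""
    terms = []
    for word in query.lower().split():
        forms = FORMS.get(word, [word])
        if len(forms) > 1:
            terms.append("(" + " OR ".join(forms) + ")")
        else:
            terms.append(word)
    return " ".join(terms)

print(expand_query("take decisions"))
# → (take OR takes OR took OR taken) decisions
```

A document containing only took would be missed by the literal query take but is matched by the expanded one, which is the retrieval gain that morphological processing is meant to provide.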