Automatic Word Sense Discrimination
Journal of Computational Linguistics, 1998
Cited by 272 (0 self)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
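The second-order co-occurrence idea in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names, the window size, the toy corpus, and the use of raw counts with cosine similarity are all assumptions made for illustration. Each context is represented by summing the co-occurrence vectors of the words it contains, so two contexts can be similar even when they share no words.

```python
from collections import Counter, defaultdict
import math


def word_vectors(sentences, window=2):
    """First-order co-occurrence counts for every word (hypothetical helper)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def context_vector(context_words, vectors):
    """Second-order representation: sum the word vectors of the context words."""
    total = Counter()
    for w in context_words:
        total.update(vectors.get(w, Counter()))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy training corpus (assumed data, for illustration only).
    corpus = [
        ["money", "bank", "loan"],
        ["cash", "bank", "loan"],
        ["river", "water", "shore"],
        ["stream", "water", "shore"],
    ]
    vecs = word_vectors(corpus)
    finance_a = context_vector(["money"], vecs)
    finance_b = context_vector(["cash"], vecs)
    nature = context_vector(["river"], vecs)
    print(cosine(finance_a, finance_b), cosine(finance_a, nature))
```

In this toy setup the contexts "money" and "cash" never share a word, yet their second-order vectors coincide because both words co-occur with "bank" and "loan". A clustering step over such context vectors would then group them into one sense cluster.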
Automatic Identification of Word Translations from Unrelated English and German Corpora
1999
Cited by 112 (1 self)
Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is
Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis
Journal of Artificial Intelligence Research, 2005
Cited by 73 (4 self)
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
Automatic Bilingual Lexicon Acquisition Using Random Indexing
Journal of Natural Language Engineering, Special Issue on Parallel Texts, 2004
Cited by 21 (6 self)
This paper presents a very simple and effective approach to automatic bilingual lexicon acquisition. The approach is cooccurrence-based, and uses the Random Indexing vector space methodology applied to aligned bilingual data. The approach is simple, efficient and scalable, and generates promising results when compared to a manually compiled lexicon. The paper also discusses some of the methodological problems with the preferred evaluation procedure.
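A rough sketch of how Random Indexing might be applied to aligned bilingual data follows. The abstract does not give the details, so everything here is an assumption: each aligned sentence pair is treated as one context and given a sparse random index vector with a few +1/-1 entries, each word accumulates the index vectors of the pairs it occurs in, and a translation candidate is the target word with the highest cosine similarity. The function names, dimensionality, and toy data are hypothetical.

```python
import math
import random


def index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def build_vectors(aligned_pairs, dim=64, nonzero=4, seed=0):
    """Accumulate one context vector per word from the aligned pairs it occurs in."""
    rng = random.Random(seed)
    src, tgt = {}, {}
    for src_tokens, tgt_tokens in aligned_pairs:
        idx = index_vector(dim, nonzero, rng)  # one index vector per aligned pair
        for table, tokens in ((src, src_tokens), (tgt, tgt_tokens)):
            for tok in set(tokens):
                acc = table.setdefault(tok, [0] * dim)
                for k, v in enumerate(idx):
                    acc[k] += v
    return src, tgt


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_translation(word, src, tgt):
    """Target word whose vector is closest to the source word's vector."""
    return max(tgt, key=lambda t: cosine(src[word], tgt[t]))


if __name__ == "__main__":
    # Toy aligned data (assumed, for illustration only).
    pairs = [
        (["the", "house"], ["das", "haus"]),
        (["the", "dog"], ["der", "hund"]),
        (["a", "house"], ["ein", "haus"]),
        (["a", "dog"], ["ein", "hund"]),
    ]
    src, tgt = build_vectors(pairs)
    print(best_translation("house", src, tgt))
    print(best_translation("dog", src, tgt))
```

Because the index vectors have a fixed, small dimensionality, the word vectors never grow with the number of aligned units, which is one plausible reading of the efficiency and scalability the abstract mentions.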
Learning and Inference for Clause Identification
2002
Cited by 11 (7 self)
This paper presents an approach to partial parsing of natural language sentences that makes global inference on top of the outcome of hierarchically learned local classifiers. The best decomposition of a sentence into clauses is chosen using a dynamic-programming-based scheme that takes into account previously identified partial solutions. This inference scheme applies learning at several levels: when identifying potential clauses and when scoring partial solutions. The classifiers are trained in a hierarchical fashion, building on previous classifications. The method presented significantly outperforms the best methods known so far for clause identification.
Selection Restrictions Acquisition from Corpora
In Proceedings EPIA-01, 2001
Cited by 9 (2 self)
This paper describes an automatic clustering strategy for acquiring selection restrictions. We use a knowledge-poor method based merely on word cooccurrence within basic syntactic constructions; hence, neither semantically tagged corpora nor man-made lexical resources are needed for generalising semantic restrictions. Our strategy relies on two basic linguistic assumptions. First, we assume that two syntactically related words impose semantic selectional restrictions on each other (co-specification). Second, it is also claimed that two syntactic contexts impose the same selection restrictions if they cooccur with the same words (contextual hypothesis). In order to test our learning method, preliminary experiments have been performed on a Portuguese corpus.
The Computation of Word Associations: Comparing Syntagmatic and Paradigmatic Approaches
2002
Cited by 9 (0 self)
It is shown that basic language processes such as the production of free word associations and the generation of synonyms can be simulated using statistical models that analyze the distribution of words in large text corpora. According to the law of association by contiguity, the acquisition of word associations can be explained by Hebbian learning. The free word associations as produced by subjects on presentation of single stimulus words can thus be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. The reason is that synonyms rarely occur together but appear in similar lexical neighborhoods. Both approaches are systematically compared and are validated on empirical data. It turns out that for both tasks the performance of the statistical system is comparable to the performance of human subjects.
Taxonomy Learning - Factoring the Structure of a Taxonomy Into a Semantic Classification Decision
2002
Cited by 9 (2 self)
The paper examines different possibilities to take advantage of the taxonomic organization of a thesaurus to improve the accuracy of classifying new words into its classes. The results of the study demonstrate that taxonomic similarity between nearest neighbors, in addition to their distributional similarity to the new word, may be useful evidence on which a classification decision can be based.
Syntactic-Based Methods for Measuring Word Similarity
In: Proceedings TSD-01, Springer-Verlag, 2001
Cited by 8 (4 self)
This paper explores different strategies for extracting similarity relations between words from partially parsed text corpora. The strategies we have analysed require neither supervised training nor semantic information available from general lexical resources.

