Results 1 - 10 of 5,062
Textmining and Organization in Large Corpus
, 2005
"... Nowadays a typical document corpus might contain more than 5000 documents. It is almost impossible for a reader to read through all the documents in the corpus and find the relevant information in a couple of minutes. In this master thesis project we propose text clustering as a potential soluti ..."
... solution to organizing large document corpora. As a sub-field of data mining, text mining aims to discover useful information in written resources. Text clustering is one of the topics in text mining: it finds group information in text documents and clusters these documents into the most ...
A method for disambiguating word senses in a large corpus
- Computers and the Humanities
, 1992
"... Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources, such as semantic ..."
Cited by 273 (14 self)
... for testing and training. We have achieved considerable progress recently by taking advantage of a new source of testing and training materials. Rather than depending on small amounts of hand-labeled text, we have been making use of relatively large amounts of parallel text, such as the Canadian Hansards ...
Building a Large Annotated Corpus of English: The Penn Treebank
- Computational Linguistics
, 1993
"... There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information abou ..."
Cited by 2740 (10 self)
... and comparison of the adequacy of parsing models.
In this paper, we review our experience with constructing one such large annotated corpus, the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989 ...
Large Corpus Experiments for Broadcast News Recognition
, 2003
"... This paper investigates the use of a large corpus for the training of a Broadcast News speech recognizer. A vast body of speech recognition algorithms and mathematical machinery is aimed at smoothing estimates toward accurate modeling with scant amounts of data. In most cases, this research is motiv ..."
Cited by 3 (0 self)
On the Ranking of Text Documents from Large Corpuses
"... Ranking text documents based on their relevance to a topic is of great importance in information retrieval. However, given the ever-growing avalanche of available digital documents, the size of the collection pool from which these documents are drawn makes this task more challenging. In addit ..."
... In addition, current computing infrastructure cannot handle very large corpora directly. Thus, new algorithms are needed that seek parallel solutions and use more processing power to solve this problem. In this paper we propose a new algorithm that partitions a large collection of documents (a ...
Predicting the Semantic Orientation of Adjectives
, 1997
"... We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of the same or different orientations, achiev ..."
Cited by 473 (5 self)
Experiments in identifying frozen sentences in a large corpus
"... This paper describes an experiment on the identification of frozen sentences (or verbal idioms) of European Portuguese in a large corpus of journalistic text. It aims to identify the main difficulties (or shortcomings) resulting from the intersection of linguistic information encoded in the lexic ..."
Working Knowledge
, 1998
"... While knowledge is viewed by many as an asset, it is often difficult to locate particular items within a large electronic corpus. This paper presents an agent-based framework for locating resources to resolve a specific query, and considers the associated design issues. Aspects of the work ..."
Cited by 527 (0 self)
in a large corpus of Java methods and C functions
"... Empirical analysis of the relationship between cyclomatic complexity (CC) and source lines of code (SLOC) ..."
The Penn Chinese Treebank: Phrase structure annotation of a large corpus
- Natural Language Engineering
, 2005
"... With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora wi ..."
Cited by 170 (23 self)
... with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully ...