• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations

Tools

Sorted by:
Try your query at:
Semantic Scholar Scholar Academic
Google Bing DBLP
Results 1 - 10 of 6,693
Next 10 →

Building a Large Annotated Corpus of English: The Penn Treebank

by Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz - COMPUTATIONAL LINGUISTICS , 1993
"... There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information abou ..."
Abstract - Cited by 2740 (10 self) - Add to MetaCart
for enterprises as diverse as the automatic construction of statistical models for the grammar of the written and the colloquial spoken language, the development of explicit formal theories of the differing grammars of writing and speech, the investigation of prosodic phenomena in speech, and the evaluation

A Maximum-Entropy-Inspired Parser

by Eugene Charniak , 1999
"... We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "stan- dard" se ..."
Abstract - Cited by 971 (19 self) - Add to MetaCart
" sections of the Wall Street Journal tree- bank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innova- tion is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully to test

Probabilistic Latent Semantic Indexing

by Thomas Hofmann , 1999
"... Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized ..."
Abstract - Cited by 1225 (10 self) - Add to MetaCart
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized

Query Expansion Using Local and Global Document Analysis

by Jinxi Xu, W. Bruce Croft - In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval , 1996
"... Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word re ..."
Abstract - Cited by 610 (24 self) - Add to MetaCart
Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word

A Program for Aligning Sentences in Bilingual Corpora

by William A. Gale , Kenneth W. Church , 1993
"... This paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend ..."
Abstract - Cited by 529 (5 self) - Add to MetaCart
to be translated into shorter sentences. A probabilistic score is assigned to each proposed correspondence of sentences, based on the scaled difference of lengths of the two sentences (in characters) and the variance of this difference. This probabilistic score is used in a dynamic programming framework to find

Predicting the Semantic Orientation of Adjectives

by Vasileios Hatzivassiloglou, Kathleen R. McKeown , 1997
"... We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achiev- ..."
Abstract - Cited by 473 (5 self) - Add to MetaCart
We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achiev

Statistical Parsing with a Context-free Grammar and Word Statistics

by Eugene Charniak , 1997
"... We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previou ..."
Abstract - Cited by 414 (18 self) - Add to MetaCart
previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best

Motion Graphs

by Lucas Kovar, Michael Gleicher, Frederic Pighin , 2002
"... In this paper we present a novel method for creating realistic, controllable motion. Given a corpus of motion capture data, we automatically construct a directed graph called a motion graph that encapsulates connections among the database. The motion graph consists both of pieces of original motion ..."
Abstract - Cited by 380 (6 self) - Add to MetaCart
In this paper we present a novel method for creating realistic, controllable motion. Given a corpus of motion capture data, we automatically construct a directed graph called a motion graph that encapsulates connections among the database. The motion graph consists both of pieces of original motion

Twitter as a corpus for sentiment analysis and opinion mining

by Er Pak, Patrick Paroubek - In Proceedings of the Seventh Conference on International Language Resources and Evaluation
"... Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life everyday. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared rela ..."
Abstract - Cited by 235 (2 self) - Add to MetaCart
Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life everyday. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared

The TIGER Treebank

by Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, George Smith , 2002
"... This paper reports on the TIGER Treebank, a corpus of currently 35.000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the tr ..."
Abstract - Cited by 332 (3 self) - Add to MetaCart
This paper reports on the TIGER Treebank, a corpus of currently 35.000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation
Next 10 →
Results 1 - 10 of 6,693
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University