Results 1 - 10 of 685

An Empirical Study of Smoothing Techniques for Language Modeling

by Stanley F. Chen, 1998
"... We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Br ..."
Abstract - Cited by 1224 (21 self) - Add to MetaCart
.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very
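
As a concrete illustration of the interpolation family this abstract mentions, here is a minimal sketch of Jelinek-Mercer (linearly interpolated) bigram smoothing, evaluated by cross-entropy on test data as in the study. The interpolation weight is fixed rather than estimated on held-out data, the toy corpus is invented, and words unseen in training are not handled:

```python
import math
from collections import Counter

def jelinek_mercer_bigram(tokens, lam=0.7):
    """Interpolated bigram model: P(w|h) = lam * ML-bigram + (1-lam) * ML-unigram.
    lam is fixed here; the study estimates such weights on held-out data."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(h, w):
        p_uni = unigrams[w] / total                               # unigram ML estimate
        p_bi = bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
        return lam * p_bi + (1 - lam) * p_uni                     # linear interpolation

    return prob

def cross_entropy(prob, test):
    """Bits per token on test data, the study's comparison metric."""
    logp = sum(math.log2(prob(h, w)) for h, w in zip(test, test[1:]))
    return -logp / (len(test) - 1)

train = "the cat sat on the mat and the cat slept".split()
model = jelinek_mercer_bigram(train)
print(f"{cross_entropy(model, 'the cat sat on the mat'.split()):.3f} bits/token")
```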

Attacking decipherment problems optimally with low-order n-gram models

by Sujith Ravi, Kevin Knight - In Proceedings of EMNLP 2008, 2008
"... We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy var ..."
Abstract - Cited by 12 (6 self) - Add to MetaCart
We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy
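
A sketch of the integer-programming idea under strong simplifications: the paper scores letter bigrams, while this toy scores unigrams only, so ties among equally frequent letters come out arbitrarily, which is precisely the weakness that motivates higher-order models. The ciphertext, the use of the target phrase's own letter frequencies as the "language model", and the pulp/CBC solver are all assumptions:

```python
# pip install pulp
import math
from collections import Counter
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

ciphertext = "ifmmp xpsme"           # toy cipher: shift-by-one of "hello world"
plain_freq = Counter("hello world")  # stand-in for English letter statistics

cipher_letters = sorted(set(ciphertext) - {" "})
plain_letters = sorted(plain_freq.keys() - {" "})
total = sum(v for k, v in plain_freq.items() if k != " ")
logp = {p: math.log(plain_freq[p] / total) for p in plain_letters}
counts = Counter(c for c in ciphertext if c != " ")

prob = LpProblem("decipher", LpMaximize)
x = LpVariable.dicts("x", (cipher_letters, plain_letters), cat="Binary")

# Global one-to-one key constraints: each cipher letter maps to exactly one
# plaintext letter, and no plaintext letter is used twice.
for c in cipher_letters:
    prob += lpSum(x[c][p] for p in plain_letters) == 1
for p in plain_letters:
    prob += lpSum(x[c][p] for c in cipher_letters) <= 1

# Objective: unigram log-probability of the decipherment. Because the solver
# searches the full key space under these constraints, no key is overlooked.
prob += lpSum(counts[c] * logp[p] * x[c][p]
              for c in cipher_letters for p in plain_letters)
prob.solve()

key = {c: p for c in cipher_letters for p in plain_letters if value(x[c][p]) > 0.5}
print("".join(key.get(ch, " ") for ch in ciphertext))
```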

Reranking an N-Gram Supertagger

by John Chen, Srinivas Bangalore, Michael Collins, Owen Rambow - In Proceedings of the TAG+ Workshop, 2002
"... this paper, we investigate an approach to such a choice based on reranking a set of candidate supertags and their confidence scores. RankBoost (Freund et al., 1998) is the boosting algorithm that we use in order to learn to rerank outputs. It also has been used with good effect in reranking outputs ..."
Abstract - Cited by 8 (0 self) - Add to MetaCart
of a statistical parser (Collins, 2000) and ranking sentence plans (Walker, Rambow and Rogati, 2001). RankBoost may learn to correct biases that are inherent in n-gram modeling which lead to systematic errors in supertagging (cf. (van Halteren, 1996)). RankBoost can also use a variety of local and long
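
A minimal sketch of the reranking setup, with invented supertag names, features, and weights; RankBoost's actual job is to learn such feature weights from ranked training examples, which is omitted here:

```python
# Hypothetical n-best output of an n-gram supertagger: (supertag sequence,
# confidence score). The supertag names are made-up stand-ins for tree families.
candidates = [
    (("A_NXN", "B_Vtr", "A_NXN"), 0.62),
    (("A_NXN", "B_Vintr", "A_NXN"), 0.71),  # the tagger's top-scoring choice
]

def features(tags, conf):
    # The original confidence plus indicator features over the whole sequence;
    # a real reranker combines many such local and long-distance features.
    return {
        "conf": conf,
        "transitive_then_noun": float("B_Vtr" in tags and tags[-1] == "A_NXN"),
    }

# RankBoost would learn these weights; they are assumed here for illustration.
weights = {"conf": 1.0, "transitive_then_noun": 0.5}

def score(tags, conf):
    return sum(weights[k] * v for k, v in features(tags, conf).items())

best = max(candidates, key=lambda c: score(*c))
print(best)  # the reranker overrides the tagger's top choice
```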

Growing an n-gram language model

by Vesa Siivola, Bryan L. Pellom - In Proceedings of 9th European Conference on Speech Communication and Technology, 2005
"... Traditionally, when building an n-gram model, we decide the span of the model history, collect the relevant statistics and estimate the model. The model can be pruned down to a smaller size by manipulating the statistics or the estimated model. This paper shows how an n-gram model can be built by ad ..."
Abstract - Cited by 20 (1 self) - Add to MetaCart
by adding suitable sets of n-grams to a unigram model until desired complexity is reached. Very high order n-grams can be used in the model, since the need for handling the full unpruned model is eliminated by the proposed technique. We compare our growing method to entropy based pruning. In Finnish speech
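
A toy version of the growing idea: start from a unigram model and greedily add the bigrams whose explicit storage most improves training log-likelihood, stopping at a size budget. The selection criterion and the bigram-only candidate set are simplifications of the paper's method, which adds whole sets of n-grams and supports very high orders:

```python
import math
from collections import Counter

tokens = "a rose is a rose is a rose".split()
unigrams = Counter(tokens)
total = len(tokens)
candidates = Counter(zip(tokens, tokens[1:]))   # bigrams we might add

def gain(bigram, count):
    """Training log-likelihood gained by storing this bigram explicitly
    instead of falling back to the unigram model (a crude stand-in for the
    paper's selection criterion)."""
    h, w = bigram
    p_ml = count / unigrams[h]          # ML bigram probability
    p_uni = unigrams[w] / total         # unigram fallback
    return count * (math.log(p_ml) - math.log(p_uni))

budget = 2          # target model complexity, in stored n-grams
model = set()
for bg, n in sorted(candidates.items(), key=lambda kv: -gain(*kv)):
    if len(model) >= budget:
        break
    model.add(bg)
print(sorted(model))
```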

Beyond N in N-gram Tagging

by Robbert Prins
"... The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n> 3 in order to incorporate global context is problematic as ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n> 3 in order to incorporate global context is problematic
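
For context, a minimal Viterbi decoder for a bigram-transition HMM tagger over invented counts; the paper's subject is the standard trigram version and how to move beyond such local context, neither of which this sketch attempts:

```python
# Toy probabilities; a real tagger estimates these from a treebank.
trans = {("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
         ("NOUN", "VERB"): 0.8, ("NOUN", "NOUN"): 0.2}
emit = {("DET", "the"): 1.0, ("NOUN", "dog"): 0.6, ("VERB", "dog"): 0.1,
        ("NOUN", "barks"): 0.1, ("VERB", "barks"): 0.9}
tags = ["DET", "NOUN", "VERB"]

def viterbi(words):
    # delta[t] = probability of the best tag sequence ending in tag t
    delta = {t: emit.get((t, words[0]), 1e-9) for t in tags}
    back = []
    for w in words[1:]:
        prev, delta = delta, {}
        back.append({})
        for t in tags:
            best_p, best_s = max(
                (prev[s] * trans.get((s, t), 1e-9), s) for s in tags)
            delta[t] = best_p * emit.get((t, w), 1e-9)
            back[-1][t] = best_s
    # Trace back from the best final tag.
    t = max(delta, key=delta.get)
    seq = [t]
    for bp in reversed(back):
        t = bp[t]
        seq.append(t)
    return list(reversed(seq))

print(viterbi("the dog barks".split()))  # ['DET', 'NOUN', 'VERB']
```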

An effective combination of different order n-grams

by Sen Zhang, Na Dong - In Proceedings of O-COCOSDA 2003, 2003
"... In this paper an approach is proposed to combine different order N-grams based on the discriminative estimation criterion, on which the parameters of n-gram can be optimized. To raise the power of modeling language information, we propose several schemes to combine conventional different order n-gra ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
In this paper an approach is proposed to combine different order N-grams based on the discriminative estimation criterion, on which the parameters of n-gram can be optimized. To raise the power of modeling language information, we propose several schemes to combine conventional different order n-gram
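
A sketch of combining unigram, bigram, and trigram estimates with one weight per order. The paper optimizes the combination discriminatively; here a crude grid search over held-out log-likelihood stands in, on an invented corpus:

```python
import itertools, math
from collections import Counter

train = "the cat sat on the mat the cat ate the mat".split()
heldout = "the cat sat on the mat".split()

uni = Counter(train)
bi = Counter(zip(train, train[1:]))
tri = Counter(zip(train, train[1:], train[2:]))
N = len(train)

def p(ctx, w, lams):
    """Linear combination of unigram, bigram, and trigram ML estimates."""
    l1, l2, l3 = lams
    p1 = uni[w] / N
    p2 = bi[(ctx[1], w)] / uni[ctx[1]] if uni[ctx[1]] else 0.0
    p3 = tri[(ctx[0], ctx[1], w)] / bi[(ctx[0], ctx[1])] if bi[(ctx[0], ctx[1])] else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

def heldout_logprob(lams):
    return sum(math.log(p(heldout[i-2:i], heldout[i], lams))
               for i in range(2, len(heldout)))

# Grid search over weights summing to one -- a crude stand-in for the paper's
# discriminative optimization of the combination parameters.
grid = [l for l in itertools.product([i / 10 for i in range(1, 9)], repeat=3)
        if abs(sum(l) - 1.0) < 1e-9]
best = max(grid, key=heldout_logprob)
print(best)
```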

A note on topical n-grams

by Xuerui Wang, Andrew McCallum - University of Massachusetts, 2005
"... Most of the popular topic models (such as Latent Dirichlet Allocation) have an underlying assumption: bag of words. However, text is indeed a sequence of discrete word tokens, and without considering the order of words (in another word, the nearby context where a word is located), the accurate meani ..."
Abstract - Cited by 27 (1 self) - Add to MetaCart
Most of the popular topic models (such as Latent Dirichlet Allocation) have an underlying assumption: bag of words. However, text is indeed a sequence of discrete word tokens, and without considering the order of words (in another word, the nearby context where a word is located), the accurate
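
A small illustration of what the bag-of-words assumption loses, plus the simplest workaround: treating known phrases as single tokens before counting. The topical n-gram model instead infers phrase boundaries jointly with topic assignments, token by token; the phrase inventory below is assumed:

```python
# Under bag of words, "the white house issued a statement" and
# "she painted the house white" yield overlapping multisets, and the
# phrase "white house" is lost.
docs = [
    "the white house issued a statement".split(),
    "she painted the house white".split(),
]

phrases = {("white", "house")}   # assumed inventory, not inferred

def merge_phrases(doc):
    # Greedy left-to-right pass joining adjacent words that form a phrase.
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in phrases:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out

print([merge_phrases(d) for d in docs])
# [['the', 'white_house', 'issued', 'a', 'statement'],
#  ['she', 'painted', 'the', 'house', 'white']]
```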

Restoring Punctuation and Capitalization in Transcribed Speech

by Agustín Gravano, Martin Jansche, Michiel Bacchiani
"... Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3to n = ..."
Abstract - Cited by 16 (0 self) - Add to MetaCart
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3to n
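
A sketch of the single-pass idea for the punctuation half of the task: punctuation marks are ordinary tokens in the n-gram model, and decoding picks the insertion pattern with the highest language-model score. The probabilities are invented, the search is exhaustive rather than beam-based, and capitalization is omitted:

```python
import itertools, math

# Toy bigram log-probabilities in which punctuation marks are ordinary tokens;
# such a model is trained on punctuated text. "</s>" ends the utterance.
logp = {("hello", ","): -0.5, (",", "world"): -0.7, ("hello", "world"): -2.0,
        ("world", "."): -0.3, (".", "</s>"): -0.1, ("world", "</s>"): -4.0}
DEFAULT = -8.0   # unseen-bigram penalty; a real model would smooth properly

def score(tokens):
    return sum(logp.get(bg, DEFAULT) for bg in zip(tokens, tokens[1:]))

words = ["hello", "world"]
marks = ["", ",", "."]

best, best_score = None, -math.inf
# One punctuation decision per word boundary, searched exhaustively;
# real systems decode this with beam search or finite-state machinery.
for choice in itertools.product(marks, repeat=len(words)):
    tokens = []
    for w, m in zip(words, choice):
        tokens.append(w)
        if m:
            tokens.append(m)
    s = score(tokens + ["</s>"])
    if s > best_score:
        best, best_score = tokens, s
print(best)   # ['hello', ',', 'world', '.']
```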

History (forward n-gram) or Future (backward n-gram)? Which model to consider for n-gram analysis in Bangla?

by Naira Khan, Md. Tarek Habib, Md. Jahangir Alam, Rajib Rahman, Naushad Uzzaman, Mumit Khan
"... This paper presents a directional advantage of n-gram modeling in terms of backward or forward n-gram modeling in Bangla. The most commonly used n-gram analysis is predominantly a forward n-gram. However in Bangla it appears that a backward n-gram is repeatedly more successful and yields more grammatical results than a forward n-gram. This paper hypothesizes that the rationale behind this success is the syntactic ordering of constituents in Bangla. Bangla is a head-final specifier-initial language as opposed to English, which is head-initial specifier-initial. Hence in Bangla, the head comes after ..."
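
A brief illustration of the forward/backward distinction: a backward n-gram model is simply a forward model trained on reversed sequences, conditioning each word on its successor rather than its predecessor. English toy data stands in for Bangla here:

```python
from collections import Counter

tokens = "the quick brown fox jumps".split()

forward = Counter(zip(tokens, tokens[1:]))       # predicts w_i from w_{i-1}
rev = tokens[::-1]
backward = Counter(zip(rev, rev[1:]))            # predicts w_i from w_{i+1}

# The backward model sees the same data in mirror image: the forward bigram
# ("quick", "brown") corresponds to the backward bigram ("brown", "quick").
print(forward[("quick", "brown")], backward[("brown", "quick")])   # 1 1
```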