4 citations found.
S. Sekine, "Automatic Sublanguage Identification for a New Text", Second Annual Workshop on Very Large Corpora, Kyoto, Japan, pp.109-120, August 1994.


This paper is cited in the following contexts:
Modeling Long Distance Dependence in Language: Topic Mixtures .. - Iyer, Ostendorf (1996)   (22 citations)  (Correct)

....on the assumption that an entire article comes from a single topic. Starting with single article clusters, clusters are progressively grouped by computing the similarity and grouping the most similar two clusters. The similarity measure is based on the combination of inverse document frequencies [10], specifically $S_{ij} = \sum_{w \in A_i \cap A_j} \frac{N_{ij}}{|A_w|} \cdot \frac{1}{|A_i||A_j|}$ (3) where $|A_i|$ is the number of unique words in article $i$, $|A_w|$ is the number of articles containing the word $w$ and $N_{ij} = \sqrt{\frac{N_i + N_j}{N_i \times N_j}}$ (4) is a normalization factor with $N_i$ being the number of articles in ....

S. Sekine, "Automatic Sublanguage Identification for a New Text", Second Annual Workshop on Very Large Corpora, Kyoto, Japan, pp.109-120, August 1994.
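
The clustering described in the context above can be sketched in Python. This is a minimal illustration, not the cited authors' implementation. It assumes the similarity is the sum over shared words of N_ij / |A_w|, divided by |A_i| |A_j|, with the normalization N_ij = sqrt((N_i + N_j) / (N_i * N_j)). It also assumes that a merged cluster's word set is the union of its members' words, which the excerpt does not state. The names `similarity`, `cluster`, `doc_freq`, and `target` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def similarity(words_i, words_j, n_i, n_j, doc_freq):
    """Similarity of two clusters under the assumptions stated above.

    words_i, words_j: sets of unique words in each cluster
    n_i, n_j: number of articles in each cluster
    doc_freq: maps a word to the number of articles containing it
    """
    shared = words_i & words_j
    if not shared:
        return 0.0
    # Assumed normalization factor; it shrinks as clusters grow.
    norm = sqrt((n_i + n_j) / (n_i * n_j))
    # Rare shared words (low document frequency) contribute more.
    weighted = sum(norm / doc_freq[w] for w in shared)
    return weighted / (len(words_i) * len(words_j))


def cluster(articles, target):
    """Greedily merge the two most similar clusters until `target` remain.

    articles: list of sets of words, one set per article
    Returns a list of clusters, each a sorted list of article indices.
    """
    doc_freq = {}
    for words in articles:
        for w in words:
            doc_freq[w] = doc_freq.get(w, 0) + 1

    # Each cluster is (word set, article count, member indices).
    clusters = [(set(words), 1, [idx]) for idx, words in enumerate(articles)]

    def pair_score(pair):
        (wa, na, _), (wb, nb, _) = clusters[pair[0]], clusters[pair[1]]
        return similarity(wa, wb, na, nb, doc_freq)

    while len(clusters) > max(target, 1):
        a, b = max(combinations(range(len(clusters)), 2), key=pair_score)
        wa, na, ma = clusters[a]
        wb, nb, mb = clusters[b]
        merged = (wa | wb, na + nb, ma + mb)  # union of words is an assumption
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return [sorted(members) for _, _, members in clusters]
```

The exhaustive pair search costs a quadratic number of similarity evaluations per merge, which is acceptable for a sketch but would need caching on a large corpus.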


Class-Based Language Model Adaptation Using Mixtures of.. - Moore, Young (2000)   (4 citations)  (Correct)

....merging the two articles found to be most similar according to a word cooccurrence metric, until a given number of article clusters was reached. Each of these groups of articles was then treated as a distinct topic. The article clustering method employed was that used in [2] as based on [4]. Each article is initially placed in a singleton group, and then given two article groups, $A_a$ and $A_b$, the similarity between the two groups, $S_{ab}$, is defined as $S_{ab} = \sum_{w \in A_a \cap A_b} \frac{N_{ab}}{|A_w|} \cdot \frac{1}{|A_a||A_b|}$ (1) where $|A_w|$ is the number of article groups that contain the word ....

S. Sekine, "Automatic Sublanguage Identification for a New Text"; Second Annual Workshop on Very Large Corpora, Kyoto


NYU/BBN 1994 CSR Evaluation - Sekine, Sterling, Grishman (1995)   Self-citation (Sekine)   (Correct)

....the new text. In the mixture approach, the corpus was statically clustered into a small number of very broad topics. We have previously reported on the effectiveness of sublanguage identification measured in terms of the frequency of overlapping words between the article and the mini corpus [6] [7]. This is the first report on the application of the technique to speech recognition. For speech recognition, the scores calculated by the sublanguage component are linearly combined with BBN's scores, with the result used to select the best hypothesis from the N-best sentences. We optimized ....

Satoshi Sekine: "Automatic Sublanguage Identification for a New Text" Second Annual Workshop on Very Large Corpora (1994)
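
The N-best selection step described in the context above can be sketched in Python as follows. This is an illustration only: the excerpt does not give the combination weights or the score scales, so the single `weight` parameter, the tuple layout, and the convention that higher scores are better are all assumptions, and `rescore_nbest` is a hypothetical name.

```python
def rescore_nbest(hypotheses, weight):
    """Pick the best hypothesis from an N-best list by a linear score combination.

    hypotheses: list of (sentence, recognizer_score, sublanguage_score) tuples;
                higher scores are assumed to be better (e.g. log-probabilities)
    weight: assumed interpolation weight applied to the sublanguage score
    """
    if not hypotheses:
        raise ValueError("N-best list is empty")

    def combined(hyp):
        _, recognizer_score, sublanguage_score = hyp
        return recognizer_score + weight * sublanguage_score

    return max(hypotheses, key=combined)[0]
```

With a weight of 0 the selection reduces to the recognizer's own ranking, which gives a convenient baseline when tuning the weight on held-out data.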


Language Modeling With Sentence-Level Mixtures - Iyer (1994)   (11 citations)  (Correct)

No context found.

S. Sekine, "Automatic Sublanguage Identification for a New Text", Second Annual Workshop on Very Large Corpora, Kyoto, Japan, pp.109-120, August 1994.


CiteSeer.IST - Copyright Penn State and NEC