80 citations found.
M. Hearst. 1994. Multi-paragraph segmentation of expository text. In Proc. of the ACL.

This paper is cited in the following contexts:

Topic Segmentation: Algorithms and Applications - Reynar (1998)   (11 citations)

....of a large number of first uses of open class words marking a new segment. This is a fragment of a transcript of an episode of National Public Radio's show All Things Considered. Words in bold are used for the first time. The gap separates two news stories. .... using the vector space model [Hearst, 1994b], our optimization algorithm [Reynar, 1994], and a number of other algorithms in the literature approximate the identification of simple lexical cohesion relationships by looking at patterns of word repetition. Figure 3.5 shows the number of word repetitions within an excerpt of NPR's program All ....

....repetition without a statistical model of language. In fact, assuming no preprocessing is done, we could segment text in any language which is not highly agglutinative using our optimization algorithm [Reynar, 1994] or the version of Hearst's TextTiling which does not normalize for term frequency [Hearst, 1994b]. Algorithms which rely on word frequency, such as the language modeling technique developed by Beeferman et al. [Beeferman et al. 1997b], require knowing the language of the text and make assumptions about its content as well. Such assumptions are necessary because the frequency of occurrence ....

[Article contains additional citation context not shown here]

Hearst, M. A. (1994b). Multi-paragraph segmentation of expository text. pages 9--16, Las Cruces, New Mexico.


A Statistical Information Extraction System for Turkish - Tür (2000)

....of this technique to text structuring uses word repetition information to divide a text into those regions determined to be most coherent by an optimization algorithm. The method has been successfully used to discover the document boundaries in concatenations of Wall Street Journal articles. Hearst [1994; 1997] uses cosine similarity in a word vector space as an indicator of topic similarity. This algorithm, called TextTiling, is a simple, domain independent technique that assigns a score to each topic boundary candidate (sentence boundaries). Topic boundaries are placed at the locations of ....

Hearst, M. A. 1994. Multi-paragraph Segmentation of Expository Text. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, NM. 9-16.


Beyond Broadcast - Livingston, Dredze, Hammond, Birnbaum

....them with options, we are also trying to determine the quantities and extent of information that is appropriate. Segmentation independent of the CC topic change marker must be improved. Several methods, such as discourse cues studied by Hirschberg and Litman [7] or Hearst's Text Tiling algorithm [6], will likely increase accuracy. More general goals include providing information via different forms of media, as well as integrating the user interface with the television itself. Finally, the long term focus is to continue to develop an overall theory of viewer interaction with television. For ....

Hearst, M.A. Multi-paragraph Segmentation of Expository Text. Proceedings of the ACL, (1994).


Text Segmentation by Product Partition Models and.. - Kehagias.. (2002)   (2 citations)

....segment deals with a particular subject while contiguous segments deal with different subjects. In this manner documents relevant to a query can be retrieved from a large database of unformatted (or loosely formatted) text. For an overview of the problem and various methods for its solution see [3, 4, 7, 8, 12, 13, 15, 18, 19, 20]. Department of Math. Phys. and Comp. Sciences, Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Greece. Department of Business Administration, University of Macedonia, Thessaloniki, Greece. Department of Electrical and Computer Engineering, ....

M. Hearst. "Multi-paragraph segmentation of expository text". In Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994.


Improving the search on the Internet by using WordNet and.. - Moldovan, Mihalcea (1998)

....proves to be an easy task, one could just make use of the punctuation to solve this problem. Instead, paragraph segmentation is much more difficult, and this is due first of all to the highly unstructured texts that can be found on the Web. Work developed in this direction is presented in [Hearst 1994] and [Callan 1994]. But these methods work only for structured texts, containing a priori known lexical separators (i.e. a tag, an empty line etc.). Thus, we had to use a method that covers almost all the possible paragraph separators that can occur in the texts on the web. The paragraph separators ....

Hearst, M.A. Multi-paragraph segmentation of expository text. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 9-16, Las Cruces, New Mexico, 1994.


A Clustering-Based Algorithm for Automatic Document.. - Collins-Thompson, Nickolov (2002)

....Story detection segmentation [Allan98] track has also spawned work on identifying topic boundaries in text and spoken audio. For example, Beeferman et al. [Beefer99] use an exponential model based on topicality and cue word features to partition text into coherent segments. Earlier work of Hearst [Hearst94] on TextTiling used a cosine similarity measure as part of an algorithm to subdivide texts into multi-paragraph subtopics. We are unaware of any published work on the related problem for document images: performing automatic document separation. Current document image management applications ....

....Layout features can be obtained by image segmentation techniques, and do not require full OCR, although that is how we obtain them in our implementation. If reliable text from OCR is available, we can include a simple word based cohesion measure between two pages. Similar to TextTiling [Hearst94], we use a vector space model where each page is represented by a vector of word frequencies, and the similarity measure is the normalized cosine between the word vectors of the two pages. We exclude very common words, and stem words using Porter's algorithm. Because the text from OCR may contain ....

M. A. Hearst. Multi-paragraph segmentation of expository text. Proceedings of the 32nd Meeting of the Association for Computational Linguistics (ACL '94), June 1994. Las Cruces, NM, USA.
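
The page-similarity measure described in the excerpt above (word-frequency vectors compared by normalized cosine, with very common words excluded) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the stop list, the tokenizer, and the function names `page_vector` and `cosine` are assumptions made here, and Porter stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stop list for illustration only; the excerpt does not say
# which "very common words" were excluded.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def page_vector(text):
    """Map a page's text to a word-frequency vector (stemming omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(u, v):
    """Normalized cosine between two sparse word-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    # Treat an empty page as having zero similarity to anything.
    return dot / norm if norm else 0.0
```

Two pages with the same content words score 1.0, pages with no words in common score 0.0, and a low score between consecutive pages would suggest a boundary between them.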


Effective Ranking with Arbitrary Passages - Kaszkiel, Zobel (2001)   (5 citations)

....properties of documents such as sentences, paragraphs, and sections [14, 32, 47, 52]. Each of these individual structures are considered as passages or are used as building blocks for larger passages. Other passage types are based on topics derived by segmenting documents into single topic units [2, 13, 26, 27, 28, 33, 36]. Yet other passage types are based on fixed length blocks [5, 41]. The individual results reported in the literature show that passage level access is of benefit in full text databases. One of the outcomes of this paper is an evaluation of the effectiveness of different passage types in a common ....

....Retrieved passages were used to assess documents as being either relevant or non relevant. The samples of documents judged were as accurate as the official judgments [44, 45], strongly suggesting that short passages can be used to indicate relevance. Hearst and Plaunt's TextTiling algorithm [13, 14] partitions full length documents into multi-paragraph units in order to approximate a document's subtopic structure. Such an approach is particularly useful when document structure is absent or does not reflect the text content. Passages can also be used in relevance feedback and automatic query ....

[Article contains additional citation context not shown here]

M. Hearst. Multi-paragraph segmentation of expository texts. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 9--16, Las Cruces, New Mexico, USA, June 1994.


How Small a Distinction Among Summaries Can The Evaluation Method.. - Nakao (2001)

....the same size and thus can systematically detect thematic textual segments of different sizes, ranging from segments slightly smaller than the entire text to segments of about one paragraph. The thematic hierarchy detection algorithm decomposes a text in a similar way as the TextTiling algorithm [2] does. The algorithm calculates a cohesion score at fixed width intervals in a source text. A cohesion score is calculated based on the lexical similarity of two adjacent blocks of a fixed size by the following formula: $C(b_l, b_r) = \sum_t w_{t,b_l} \ldots$ (1) where $b_l$ and $b_r$ are the textual blocks in the left ....

M. A. Hearst. Multi-paragraph segmentation of expository text. In Proc. of the 32nd Annual Meeting of Association for Computational Linguistics, pages 9-16, 1994.
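
The excerpt above describes scoring cohesion at fixed-width intervals by comparing two adjacent fixed-size blocks, but its formula is cut off in this listing. The sketch below illustrates only the sliding-block procedure and assumes a plain cosine over raw term counts as the similarity, which may differ from the weighting the cited paper actually uses. The names `block_cosine`, `cohesion_scores`, `block_size`, and `step` are hypothetical.

```python
import math
from collections import Counter


def block_cosine(left, right):
    """Cosine similarity of two token blocks (assumed measure, raw counts)."""
    a, b = Counter(left), Counter(right)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cohesion_scores(tokens, block_size, step):
    """Score each candidate position by comparing the block before it
    with the block after it, moving `step` tokens at a time."""
    scores = []
    for pos in range(block_size, len(tokens) - block_size + 1, step):
        left = tokens[pos - block_size:pos]
        right = tokens[pos:pos + block_size]
        scores.append((pos, block_cosine(left, right)))
    return scores


if __name__ == "__main__":
    toy = ["cat"] * 4 + ["dog"] * 4
    # The lowest score falls where the vocabulary changes.
    print(min(cohesion_scores(toy, block_size=2, step=2), key=lambda s: s[1]))
```

Positions where the score dips are the natural candidates for segment boundaries, since the vocabulary on either side overlaps least there.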


Using Attribute Grammars to Uniformly Represent Structured.. - Gancarski

....of the text. This measure is based on both the frequency of the term in a document and the frequency of the term in all the documents. With the emergence of structured documents, the IR evolved in two directions: (1) retrieving documents taking into account the structural parts relevance ([Wil94] [Hea94]) and (2) enriching the query formats with structural information to retrieve certain parts of documents ([NBY95] [KM93]). Recent works tried to establish some form of relevance ranking in the results ([Lal00] [WFC99] [HTK00] [SN00]) but it is still an open research area. The DASTIR ....

M. Hearst, Multi-paragraph segmentation of expository text, 32nd Annual Meeting of the Association for Computational Linguistics, pages 9-16, New Mexico State University, Las Cruces, New Mexico, 1994.


Rich Interaction in the Digital Library - Rao, Pedersen, Hearst.. (1995)   (39 citations)  Self-citation (Hearst)

....of the term sets in the document, and (iii) the distribution of the term sets with respect to the document and to one another. To facilitate display of distribution information, each document is partitioned in advance into a set of subtopical segments using an algorithm called TextTiling [7]. Figure 4 shows an example run on the query (virus) AND (vaccination protection cure) AND (illegal fbi damage police crime) with implicit ORs among the terms within each term set. Each large rectangle indicates a document, and each square within the document represents a coherent text segment ....

Hearst, M.A. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Meeting of the Association for Computational Linguistics, June 1994.


Discourse Segmentation of Multi-Party Conversation - Michel Galley Kathleen (2003)   (4 citations)

No context found.

M. Hearst. 1994. Multi-paragraph segmentation of expository text. In Proc. of the ACL.


Sentence Ordering in Multidocument Summarization - Regina Barzilay Department (2001)   (3 citations)

No context found.

M. Hearst. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994.


ASKMi: A Japanese Question Answering System based.. - Sakai, Saito.. (2004)

No context found.

Hearst, M. A. (1994). Multi-Paragraph Segmentation of Expository Text, ACL '94 Proceedings (pp. 9--16).


Unity Is Strength: Coupling Media for Thematic Segmentation - Mekhaldi, Lalanne, Ingold (2004)

No context found.

Hearst M., Multi-Paragraph Segmentation of Expository Text, In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics 1994, Las Cruces, New Mexico.


Comparing Lexical Chain-based Summarisation.. - Doran, Stokes.. (2004)

No context found.

Hearst, M. 1994, Multi-paragraph segmentation of expository text, In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 9--16. Las Cruces, New Mexico: Association for Computational Linguistics.


Toward Semantics-Based Answer Pinpointing - Hovy, Gerber, Hermjakob.. (2001)   (5 citations)

No context found.

Hearst, M.A. 1994. Multi-Paragraph Segmentation of Expository Text. Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL94) .


Thematic Alignment Of Recorded Speech With Documents - Dalila Mekhaldi Chemin (2003)   (1 citation)

No context found.

Hearst M., Multi-Paragraph Segmentation of Expository Text, In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics 1994.


Machine Learning for Anaphora Resolution - Judita Preiss August

No context found.

M. Hearst. Multi-paragraph segmentation of expository text. In 32nd Annual meeting of the association for computational linguistics, 1994.


Document Visualization Using Labeled Context Blocks - Hegde, Ramanathan, Rao.. (2002)

No context found.

M. A. Hearst. Multi-Paragraph Segmentation of Expository Texts. UC Berkeley Computer Science Technical Report Number UCB/CSD-94-790, 1994.


Identifying the Interactions of Multi-Criteria in Turkish.. - Yöndem (2001)

No context found.

Hearst, M. A., `Multi Paragraph Segmentation of Expository Text', Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 9-16, Las Cruces, New Mexico, 1994.


A Dynamic Programming Algorithm for Linear Text Segmentation - Fragkou, Petridis, Kehagias (2002)

No context found.

Hearst, M. A. (1994). "Multi-paragraph segmentation of expository texts". In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 9-16.

CiteSeer.IST - Copyright Penn State and NEC