B. Arons. Speechskimmer: Interactively skimming recorded speech. In Proceedings of ACM Symposium on User Interface Software and Technology, pages 187--196. ACM, 1993.
....interface, TattleTrail performs speech processing techniques for skimming recorded chat messages. Therefore, techniques for skimming audio documents are examined below. 2.4.1 SpeechSkimmer SpeechSkimmer was a user interface for interactively browsing through speech recordings at different speeds [2]. A user could rapidly skim recorded speech through an interactive touchpad supporting skimming both forward and backward at different levels. The system used the synchronized overlap add algorithm for time scale modification to compress speech at varying rates, while applying techniques to ....
....would not be intelligible. Instead, it plays 4 second chunks of audio at the user's desired speed, but in reverse order. Therefore, 4 second bits of audio are heard, then TattleTrail jumps back 4 seconds to play the next chunk. This technique is similar to the one implemented by SpeechSkimmer [2]. When playing backward, and the speed is decreased, the rate of compression is actually increased, which will allow the user to play backward faster. To jump to maximum speed going backward, the user can double click down. Double clicking up or down typically jumps to the maximum speed of ....
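The backward playback described in this excerpt, where fixed-length chunks are each heard forward but visited in reverse order, can be sketched as follows. This is a minimal illustration, not TattleTrail's or SpeechSkimmer's actual code: the function name `backward_chunk_order` and its parameters are hypothetical, and only the 4 second chunk length comes from the excerpt.

```python
def backward_chunk_order(total_seconds, chunk_seconds=4.0):
    """Return (start, end) times of chunks in the order they would be played.

    Each chunk is played forward, but chunks are visited from the end of
    the recording toward the beginning. The final chunk played may be
    shorter than chunk_seconds if the recording length is not a multiple
    of the chunk length.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        return []
    chunks = []
    end = total_seconds
    while end > 0:
        start = max(0.0, end - chunk_seconds)  # clamp at the recording start
        chunks.append((start, end))
        end = start  # the next chunk ends where this one began
    return chunks


# A 10 second recording is heard as 6-10 s, then 2-6 s, then 0-2 s.
print(backward_chunk_order(10.0))
```

Time compression would be applied to each chunk separately; that step is omitted here.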
B. Arons. Speechskimmer: Interactively skimming recorded speech. In Proceedings of ACM Symposium on User Interface Software and Technology, pages 187--196. ACM, 1993.
....robust against variations in the speaker and background noise, as it needs neither a language nor acoustic model in advance. In investigating the acquisition of useful information from speech without linguistic knowledge, there are methods for segmenting speech according to emphasis using pauses [1] or prosodic information [2] but none have so far used repetition. We also propose an algorithm called Incremental Reference Interval free Continuous Dynamic Programming (IRIFCDP) for detecting pairs of phonetically similar ....
B. Arons, "SpeechSkimmer: Interactively Skimming Recorded Speech", Proc. UIST '93, pp.187-196, 1993.
....information within an audio recording deemed relevant by a user, we need to identify entry points (jump locations). Further, multiple contiguous segments may form a relevant and useful news item. As a starting point both a change of speaker and long pauses can serve to identify entry points [2]. For long pause detection, we use short time energy ($E_n$) which provides a measurement for distinguishing speech from silence for a frame (consisting of a fixed number of samples) which can be calculated by the following equation [18]: $E_n = \sum_{m=n-N+1}^{n} x(m)^2$ ....
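The long-pause detection mentioned in this excerpt can be illustrated with a short sketch. It assumes the common textbook definition of short-time energy as the sum of squared samples within a frame, and it uses non-overlapping frames for simplicity. The function names, the energy threshold, and the minimum pause length are hypothetical choices made for illustration, not values from the cited work.

```python
def short_time_energy(samples, frame_len):
    """Energy of each non-overlapping frame: the sum of squared samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def long_pauses(samples, frame_len, energy_threshold, min_frames):
    """Return (first_frame, last_frame) index pairs of long pauses.

    A pause is a run of at least min_frames consecutive frames whose
    energy is below energy_threshold (both are assumed tuning parameters).
    """
    energies = short_time_energy(samples, frame_len)
    pauses = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < energy_threshold:
            if run_start is None:
                run_start = i  # a low-energy run begins here
        else:
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the signal.
    if run_start is not None and len(energies) - run_start >= min_frames:
        pauses.append((run_start, len(energies) - 1))
    return pauses


# Toy signal: loud, then silent, then loud again.
signal = [1.0] * 4 + [0.0] * 8 + [1.0] * 4
print(long_pauses(signal, frame_len=2, energy_threshold=0.5, min_frames=3))
```

The frame after each detected pause is a candidate entry point; in practice the threshold would need to adapt to the recording's noise level.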
B. Arons, "SpeechSkimmer: Interactively Skimming Recorded Speech," in Proc. of ACM Symposium on User Interface Software and Technology, pp. 187-196, Nov 1993.
....the name for the process through which concepts are associated with audio objects. The completeness and accuracy of the annotation process contributes to the success of customization. In this section, we describe techniques for automatic and manual annotation. Using long pauses or speaker changes [4] after the segmentation of broadcast audio into what we call audio objects, we need to associate these objects with concepts of ontologies. Note that broadcast audio consists of multiple news items. Automatic Annotation: Word spotting techniques can provide the selected content extraction to make ....
B. Arons. Speech Skimmer: Interactively Skimming Recorded Speech. Ph.D. Thesis, MIT Media Lab, 1994.
....than a generic model for language phenomena. Our solution to the document navigation problem is incomplete. Although C99 is significantly more accurate than existing algorithms, a 12% error rate is unacceptable for human readers. Discourse parsing [15, 7] had similar problems. The findings of [11, 2, 21, 8, 17] suggest that a more practical and robust solution would rely on the logical structure, e.g. paragraph and section hierarchy. Thus, future work will focus on the automatic extraction of logical structure from printed documents. ....
Barry Arons. Speechskimmer: Interactively skimming recorded speech. In Proceedings of UIST'93: ACM Symposium on User Interface Software and Technology, pages 187-196. ACM Press, 1993. Atlanta.
....jump. The structural description of the audio provides meaningful jump locations. Skimming the contents of a recording by listening to segments of audio following these locations (rather than random locations) has been shown to be a more effective way to get the gist of the contents of a recording [Arons]. The hand held interface is the result of an iterative process of design and usability testing. 1.5 Overview of the Document Chapter 2 reviews related work in both structured audio and audio interface design. Chapter 3 describes the speech processing algorithms which are used in NewsComm to ....
....and outlines future directions for the NewsComm system. Chapter 2 Related Work This chapter reviews several research systems which have addressed issues related to this thesis. 2.1 SpeechSkimmer SpeechSkimmer is a hand held interface for interactively skimming recorded speech [Arons]. The interface enables the user to listen to a speech recording at four levels of skimming. At the lowest level the entire recording is heard. At the second level pauses are shortened. At the third level, only short segments (called highlights in this thesis) of the recording following long ....
Arons, B. Speech Skimmer: Interactively Skimming Recorded Speech. Ph.D. thesis, MIT Media Lab, 1994.
....generation, but a number of problems need to be addressed in this effort. These problems are discussed within the context of a specific speech recognizer in Section 1.4.3. The audio conveys other information besides just dialog. Researchers have made progress in identifying pauses and silence [Arons93], as well as specialized audio parsers for music, laughter, and other highly distinct acoustic phenomena [Hawley93]. This information can supplement the other structured descriptors, and some such as pauses may be especially useful to identify natural start and stop times for video paragraphing as ....
Arons, B. "SpeechSkimmer: Interactively Skimming Recorded Speech," Proc. of ACM Symposium on User Interface Software and Technology (UIST)'93, Nov. 3-5, 1993, Atlanta, GA, pp. 187-196.
....process through speech recognition, we currently rely on manual transcription and closed captions. Techniques in speech recognition will also be used to segment video based on transitions between speakers and topics which are usually marked by silence or low energy areas in the acoustic signal [2]. 2.2 Scene Segmentation To analyze each segment as individual scenes, we must first identify frames where scene changes occur. Several techniques have been developed for detecting scene breaks [11, 4, 13]. We choose to segment video through the use of a comparative histogram difference ....
Arons, B. "SpeechSkimmer: Interactively Skimming Recorded Speech," Proc. of ACM Symposium on User Interface Software and Technology (UIST)'93, November 3-5, 1993, Atlanta, GA, pp. 187-196.
....information. The problem with audio interfaces (e.g. voice mail) is that they tend to be slow and serial, and do not allow random access. Researchers have begun to address these problems by providing the user with interactive control over audio playback and the ability to skim audio recordings [Arons 1993, Stifelman 1993]. Several researchers have proposed building audio interfaces that make use of the cocktail party effect (the human's ability to selectively attend to a single talker or stream of audio among a cacophony of others) to reduce the amount of time required to listen [Arons 1992, ....
B. Arons. SpeechSkimmer: Interactively Skimming Recorded Speech. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 187-196. ACM, 1993.
....now possible to listen to recordings in entirely new ways. For example, recordings may be played faster than real time without pitch distortion [1] although comprehension is lost after about a factor of two. It is also possible to skim speech by listening only to segments following a long silence [2]. Random access to audio allows a user to instantly skip forward or backward to a desired location within a recording, without requiring fast forward or reverse as in sequential media. Taking advantage of this capability, however, requires generation of necessary indices, as well as an interface ....
B. Arons, "SpeechSkimmer: Interactively skimming recorded speech," Proc. UIST: ACM Symposium on User Interface Software and Technology, pp. 187-196, November 1993.
....framework is also presented. The speaker indexing system is currently being incorporated into several application systems in the Speech Group at the MIT Media Lab. 1. Introduction The Speech group at the MIT Media Lab is exploring methods for accessing large amounts of recorded speech efficiently (Arons, 1994; Mullins, 1995; Schmandt, 1994) One approach we are taking is to tag salient segments of a speech recording, and then design interfaces to navigate through the speech using those tags (Arons, 1994; Mullins, 1995) Early versions of these systems relied primarily on pause and pitch information to ....
.... at the MIT Media Lab is exploring methods for accessing large amounts of recorded speech efficiently (Arons, 1994; Mullins, 1995; Schmandt, 1994) One approach we are taking is to tag salient segments of a speech recording, and then design interfaces to navigate through the speech using those tags (Arons, 1994; Mullins, 1995) Early versions of these systems relied primarily on pause and pitch information to locate salient segments of audio. For example, SpeechSkimmer plays short segments of a speech recording which directly follow long pauses as a way of skimming the entire contents of the recording ....
Arons, B. (1994). Speech Skimmer: Interactively Skimming Recorded Speech. Ph.D. thesis, MIT Media Lab.
....busy. NewsComm is aimed at mobile users who are performing some other simultaneous task such as walking, exercising, or the morning commute. Visually impaired individuals are also potential users. RELATED WORK SpeechSkimmer is a hand held interface for interactively skimming speech recordings [1]. The interface enables the user to listen to a single speech recording at four levels of skimming. At the lowest level the entire recording is heard. At the second level pauses are shortened. At the third level, only short segments (highlights) of the recording following long pauses are played. ....
....the recording. Ideally these annotations would point to semantically meaningful events such as a change of speaker, change of topic, or start of a new sentence. Arons found that simple techniques using pause and pitch analysis can be used to locate features which are often semantically significant [1]. We adopt this approach in the NewsComm system which uses pause and voice analysis to locate long pauses and changes in speaker [7] The locations and lengths of all pauses in a recording constitute a structural description of a recording. The locations of all speaker changes constitute a second ....
Arons, B. Speech Skimmer: Interactively Skimming Recorded Speech. Ph.D. thesis, MIT Media Lab, 1994.
....used to control the interface. Speech is fast for the author but slow and tedious for the listener. Speech is sequential and exists only temporally; the ear cannot browse around a set of recordings the way the eye can scan a screen of text and images. Hence techniques such as interactive skimming [2], non linear access and indexing [15, 24] and audio spatialization [10, 21] must be considered for browsing audio. Many of these audio techniques can be used to augment existing visual wearable interfaces. We stress that design for wearable audio computing requires attention to the affordances and ....
Barry Arons. "SpeechSkimmer: Interactively Skimming Recorded Speech". Proceedings of UIST' 93, November 1993.
....can be used to find pauses in recordings [Rabiner93] A variety of methods based on auto correlation based peak picking may be used to estimate the fundamental frequency of speech. Arons reports that indexing into speech recordings based on pause and pitch structure increased access efficiency [Arons93]. Speaker identification is the process of identifying which member of a previously known set of speakers is most likely to have generated a speech sample [Furui96] 1 Although we should not be too worried about this in the long run; people once thought that telephones would not succeed since no ....
Barry Arons. "SpeechSkimmer: Interactively Skimming Recorded Speech". Proceedings of UIST' 93, ACM Press, Nov. 1993.
....manually added. It is an interesting research question to ask how recorded information from the lecture (e.g. gestures gleaned from the video recording, segmenting the audio) can be processed to determine when audio links should be created and how they can meaningfully be attached to the material [2]. 5 EXPERIENCE AND INITIAL EVALUATION One of the major contributions of this work is its application in a large scale educational setting. Table 1 summarizes the variety of experiences we have had so far in applying Classroom 2000 technology. Up to this point in the paper, we have focused ....
....and video links was so time consuming that after the fourth lecture we were no longer able to devote the resources to continue the service. We see now the advantage of having tools that use natural actions of the teacher to automate the audio and video augmentation of Web pages, as described in [2, 9]. Such actions could include long pauses in their speech or mouse movements to draw attention to some area on the page. In the FCE seminar, we both videotaped and used the MessagePad outline annotator. We provided a template file to prepare the outline of the class discussion; this was a useful ....
B. Arons. SpeechSkimmer: Interactively skimming recorded speech. In Proceedings of the ACM UIST'93 Symposium, pages 187--196, 1993.
....[2] Pauses within algebraic prosody divide an expression into terms and indicate the onset and termination of other syntactic groups. Pitch also helps to group items and indicate the start of new information. Pauses are thought to afford valuable processing time for each chunk of information [1], and in this context would allow more time for performing mathematical tasks. These extra cues, the shorter utterance and a better pace for the interaction combine to make the prosodic condition much more usable. In the nothing condition, despite having no lexical cues, the recall of content was ....
Barry Arons. Speechskimmer: Interactively skimming recorded speech. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 187--196, 1993.
....build audio retrieval and indexing systems. The most popular approach has been to index audio based on content words using either large vocabulary speech recognition or keyword spotting [1, 2, 3, 4, 5, 6] Other cues including pitch contour, pause locations and speaker changes have also been used [7, 8, 9]. In one system the closed caption text of television news broadcasts was aligned to the audio track based on pause locations enabling users to perform searches on text and then access corresponding audio [10] Audio retrieval systems will continue to grow in importance as digital archives become ....
B. Arons. Speech Skimmer: Interactively Skimming Recorded Speech. Ph.D. thesis, MIT Media Laboratory, 1994.
....(discussed in section 4.7.6) Speech is fast for the author but slow and tedious for the listener. Speech is sequential and exists only temporally; the ear cannot browse around a set of recordings the way the eye can scan a screen of text and images. Hence, techniques such as interactive skimming [Arons93], non linear access and indexing [Stifleman93, Wilcox97] and audio spatialization [Mullins96] must be considered for browsing audio. Design for wearable audio computing requires (1) attention to the affordances and constraints of speech and audio in the interface (2) coupled with the physical form ....
Arons, Barry. SpeechSkimmer: Interactively Skimming Recorded Speech. Proceedings of UIST' 93, November 1993.
....unlike the tape transport controls on modern audio video playback devices. The impending end of a sentence could also be exaggerated by artificially lowering the pitch of the recording. Pitch correction techniques can be utilized to ensure that speed up playback still produces intelligible output [Aro93]. Hyper linked Audio Navigation All audio content can be conceived of as nodes within a hypertextual framework. Audio nodes can be grouped within other abstract containers and links between the audio content of individual nodes can be established. Navigational access is permitted by using a ....
Barry Arons. SpeechSkimmer: Interactively Skimming Recorded Speech. Proceedings of UIST' 93, ACM Press, Nov. 1993.
....feedback to reduce the time needed to listen. SpeechSkimmer presents a multi level structural approach to auditory skimming, and user interface techniques for interacting with recorded speech. The SpeechSkimmer user interface and a pause based technique for segmenting recordings are detailed in [7] [8] This research was performed at Interval Research Corporation and the MIT Media Laboratory, and was funded by Interval Research Corporation. This paper describes a technique for finding emphasized portions in a speech recording. The algorithm adapts to the pitch range of a talker, and then ....
B. Arons. SpeechSkimmer: Interactively Skimming Recorded Speech. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), ACM SIGGRAPH and ACM SIGCHI, ACM Press, Nov. 1993, pp. 187--196.
.... due to the path length difference to the ears to produce a virtual sound S when presented over headphones (figure 4B) It is useful to be able to present time compressed speech in a virtual acoustic display, such as in user interfaces that allow skimming or browsing of recorded audio material [18, 19], or systems that attempt to present multiple streams of recorded speech simultaneously [20, 21] Presenting speech that has been time compressed using the basic sampling or SOLA techniques in a spatial audio display system is straightforward, as it can be treated like any other audio source. ....
Arons, B. SpeechSkimmer: Interactively Skimming Recorded Speech. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), ACM SIGGRAPH and ACM SIGCHI, ACM Press, Nov. 1993, 187--196.
No context found.
B. Arons, "SpeechSkimmer: Interactively Skimming Recorded Speech," in Proc. of ACM Symposium on User Interface Software and Technology, pp. 187-196, Nov 1993.
No context found.
Arons, B. "SpeechSkimmer: Interactively Skimming Recorded Speech". Proc. UIST 1993: ACM Symposium on User Interface Software and Technology, November 1993.