CALIBRATION OF AUDIO-VIDEO SENSORS FOR MULTI-MODAL EVENT INDEXING
Cited by 7 (0 self)
This paper addresses the coordinated use of video and audio cues to capture and index surveillance events with multimodal labels. The focus of this paper is the development of a joint-sensor calibration technique that uses audio-visual observations to improve the calibration process. One significant feature of this approach is the ability to continuously check and update the calibration status of the sensor suite, making it resilient to independent drift in the individual sensors. We present scenarios in which this system is used to enhance surveillance. Index Terms — Calibration, Multimedia system
Online Diarization of Streaming Audio-Visual Data for Smart Environments
Selected Topics in Signal Processing, 2010
Cited by 5 (1 self)
Abstract—For an environment to be perceived as being smart, contextual information has to be gathered to adapt the system's behavior and its interface towards the user. Being a rich source of context information, speech can be acquired unobtrusively by microphone arrays and then processed to extract information about the user and his environment. In this paper, a system for joint temporal segmentation, speaker localization, and identification is presented, which is supported by face identification from video data obtained from a steerable camera. Special attention is paid to latency aspects and online processing capabilities, as they are important for the application under investigation, namely ambient communication. Ambient communication describes the vision of terminal-less, session-less, and multi-modal telecommunication with remote partners, where the user can move freely within his home while the communication follows him. The speaker diarization serves as a context source, which has been integrated into a service-oriented middleware architecture and provided to the application to select the most appropriate I/O device and to steer the camera towards the speaker during ambient communication. Index Terms—ambient communication, diarization, middleware.
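The abstract does not give the concrete device-selection or camera-steering logic, but the idea (route I/O to the device nearest the localized speaker and pan the camera toward that position) can be sketched minimally. The device registry, coordinates, and function names below are illustrative assumptions, not the paper's middleware API:

```python
import math

# Hypothetical device registry: (x, y) positions in meters within the home.
DEVICES = {
    "living_room_display": (0.0, 0.0),
    "kitchen_speaker": (5.0, 2.0),
    "hall_camera": (2.5, 4.0),
}

def nearest_device(speaker_pos):
    """Select the I/O device closest to the localized speaker."""
    return min(DEVICES, key=lambda name: math.dist(DEVICES[name], speaker_pos))

def pan_angle(camera_pos, speaker_pos):
    """Pan angle (degrees) a steerable camera needs to face the speaker."""
    dx = speaker_pos[0] - camera_pos[0]
    dy = speaker_pos[1] - camera_pos[1]
    return math.degrees(math.atan2(dy, dx))

# Speaker localized (e.g., by a microphone array) near the kitchen.
speaker = (4.0, 1.5)
print(nearest_device(speaker))                               # kitchen_speaker
print(round(pan_angle(DEVICES["hall_camera"], speaker), 1))  # -59.0
```

In the actual system this selection would be driven by the online diarization output and updated continuously as the speaker moves.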
Association of Sound to Motion in Video using Perceptual Organization
Traditionally, video surveillance has mostly made use of visual data. However, in light of new behavioral and physiological studies that demonstrate the existence of cross-modality effects in human perception, similar cues are being used to develop a surveillance system based on both audio and visual data. Human beings can easily associate a particular sound with an object in their surroundings. Drawing from such studies, we demonstrate a technique by which we can isolate concurrent audio and video events and associate them based on perceptual grouping principles. Simple cues from both audio and video suffice to make this association. By representing audio in the pitch-time domain, we can use image processing algorithms such as line detection to extract elementary audio events. These events are then grouped using Gestalt principles of similarity and proximity into appropriate auditory events. Properties such as time of occurrence and periodicity are easily calculated from these groups. In video, we extract motion and shape periodicities. By comparing all the periodicities in audio and video using a simple index we can easily associate audio to video.
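The abstract does not specify the "simple index" used to compare periodicities, but the final matching step can be sketched under plain assumptions: estimate each stream's dominant period by autocorrelation and associate when the relative period mismatch is small. The signals, sampling rate, and threshold below are synthetic illustrations, not the paper's data or method:

```python
import numpy as np

def dominant_period(x, rate):
    """Estimate the dominant period (seconds) of a 1-D event signal by autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    neg = np.where(ac < 0)[0]
    if neg.size == 0:
        return None  # no clear periodicity
    # Skip the central lobe, then take the lag of the strongest remaining peak.
    start = neg[0]
    lag = start + np.argmax(ac[start:])
    return lag / rate

def association_index(period_audio, period_video):
    """Relative mismatch between two periods; 0 means a perfect match."""
    return abs(period_audio - period_video) / max(period_audio, period_video)

# Two synthetic periodic event trains sharing a 0.5 s period but different shapes.
rate = 100.0  # samples per second (assumed)
t = np.arange(0, 5, 1 / rate)
audio = np.sin(2 * np.pi * 2.0 * t)           # e.g., a repeating sound event
video = np.sign(np.sin(2 * np.pi * 2.0 * t))  # e.g., on/off motion with the same period

idx = association_index(dominant_period(audio, rate), dominant_period(video, rate))
print(idx < 0.1)  # matching periods give a small index, so the events associate
```

A threshold on this index then decides whether an auditory group and a video motion are declared to belong to the same event.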
Real-Time Fine-Grained Multiple-Target Tracking on an Extensible Virtual Fab Architecture Using Multi-Agents
The concept of a “Virtual Fab” (VF) stems from the IC industry and emphasizes the idea of intangible manufacturing service provision. In addition to satisfying customers’ order fulfillment, providing remote service accessibility and real-time data granularity with gratifying performance and flexibility are among the major challenges of a VF system. In this paper, we propose an architecture for a VF based on multi-agents that provides fine-grained real-time multiple-target tracking service for both the customers and internal managerial personnel of an IC foundry. Each function module is constructed as an autonomous agent and performs dedicated tasks. To provide flexibility and allow for future enhancements, our proposed extensible architecture also can utilize currently available RFID (Radio Frequency Identification) techniques to cooperate with other tracking sensors for the purpose of improving tracking accuracy and mitigating the power consumption issues inherent in RFID systems. Finally, an integrated simulation which utilizes technologies including RFID, Web Services and embedded-systems has also been conducted to demonstrate the effectiveness of this architecture.