Bridging Vision and Commonsense for Multimodal Situation Recognition in Pervasive Systems
Cited by 1 (1 self)
Pervasive services may have to rely on multimodal classification to implement situation-recognition. However, the effectiveness of current multimodal classifiers is often not satisfactory. In this paper, we describe a novel approach to multimodal classification based on integrating a vision sensor with a commonsense knowledge base. Specifically, our approach is based on extracting the individual objects perceived by a camera and classifying them individually with nonparametric algorithms; then, using a commonsense knowledge base, classifying the overall scene with high effectiveness. Such classification results can then be fused together with other sensors, again on a commonsense basis, for both improving classification accuracy and dealing with missing labels. Experimental results are presented to assess, under different configurations, the effectiveness of our vision sensor and its integration with other kinds of sensors, proving that the approach is effective and able to correctly recognize a number of situations in open-ended environments.
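The scene-level step described above can be illustrated with a minimal sketch. It assumes that the commonsense knowledge base can be reduced to a relatedness score between a label and a candidate scene; the `RELATEDNESS` table, its values, and the `classify_scene` function are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical relatedness scores between labels and scenes.
# A real system would query a commonsense knowledge base; these values are invented.
RELATEDNESS = {
    ("stove", "kitchen"): 0.9,
    ("mug", "kitchen"): 0.6,
    ("mug", "office"): 0.5,
    ("laptop", "office"): 0.9,
    ("microwave_sound", "kitchen"): 0.8,
}


def classify_scene(labels):
    """Return the scene with the highest summed relatedness, or None."""
    scores = defaultdict(float)
    for label in labels:
        for (known_label, scene), weight in RELATEDNESS.items():
            if known_label == label:
                scores[scene] += weight
    if not scores:
        # No label matched the knowledge base, so no scene can be proposed.
        return None
    return max(scores, key=scores.get)


# Labels from the vision sensor alone, then fused with another sensor's label.
vision_labels = ["stove", "mug"]
other_sensor_labels = ["microwave_sound"]
scene_from_vision = classify_scene(vision_labels)
scene_fused = classify_scene(vision_labels + other_sensor_labels)
```

Because the scores are simply summed, a label that is missing or unknown contributes nothing and the remaining labels still determine the scene. This is only one simple way to read "fusion on a commonsense basis".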
Recognizing Activities from Context and Arm Pose using Finite State Machines
We present an activity-recognition system for assisted living applications and smart homes. While existing systems tend to rely on expensive computation of comparatively large-dimension data sets, ours leverages information from a small number of fundamentally different sensor measurements that provide context information pertaining to the person’s location, and action information by observing the motion of the body and arms. Camera nodes are placed on the ceiling to track people in the environment and place them in the context of a building map where areas and objects of interest are premarked. Additionally, a single inertial sensor node is placed on the subject’s arm to infer arm pose, heading and motion frequency using an accelerometer, gyroscope and magnetometer. These four measurements are parsed using a lightweight hierarchy of finite state machines, yielding recognition rates with high precision and recall values (0.92 and 0.93, respectively).
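A two-level hierarchy of finite state machines of the kind the abstract mentions might look like the sketch below. The states, transitions, the 1.0 Hz threshold, and the names `StateMachine`, `arm_symbol`, and `recognize` are all assumptions made for illustration, not the authors' design, and the heading measurement is left out for brevity.

```python
class StateMachine:
    """A tiny table-driven finite state machine."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # {(state, symbol): next_state}

    def step(self, symbol):
        # Stay in the current state when no transition matches.
        self.state = self.transitions.get((self.state, symbol), self.state)
        return self.state


def arm_symbol(pose, motion_hz):
    """Quantize arm pose and motion frequency into a discrete symbol."""
    if pose == "raised":
        # Hypothetical threshold separating repetitive motion from a still arm.
        return "raised_moving" if motion_hz > 1.0 else "raised_still"
    return "lowered"


def recognize(samples):
    """Map (area, arm pose, motion frequency) samples to activity labels."""
    # Lower level: arm gesture inferred from the inertial sensor.
    arm = StateMachine("rest", {
        ("rest", "raised_still"): "lifting",
        ("rest", "raised_moving"): "lifting",
        ("lifting", "raised_moving"): "repetitive",
        ("lifting", "lowered"): "rest",
        ("repetitive", "lowered"): "rest",
    })
    # Upper level: activity inferred from the map area and the arm gesture.
    activity = StateMachine("idle", {
        ("idle", ("kitchen", "repetitive")): "cooking",
        ("cooking", ("kitchen", "rest")): "idle",
    })
    labels = []
    for area, pose, motion_hz in samples:
        gesture = arm.step(arm_symbol(pose, motion_hz))
        labels.append(activity.step((area, gesture)))
    return labels


samples = [
    ("kitchen", "raised", 0.2),
    ("kitchen", "raised", 2.0),
    ("kitchen", "lowered", 0.0),
]
result = recognize(samples)
```

Each sample costs only two dictionary lookups, which is consistent with the lightweight processing the abstract emphasizes.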

