Results 1–10 of 13
Visual-Inertial Navigation, Mapping and Localization: A Scalable Real-Time Causal Approach
, 2010
Abstract

Cited by 46 (1 self)
We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable, and its parameters are identifiable. These include the unknown gravity vector, and the unknown transformation between the camera coordinate frame and the inertial unit. We show that it is possible to estimate both state and parameters as part of an online procedure, but only provided that the motion sequence is “rich enough,” a condition that we characterize explicitly. We then describe an efficient implementation of a filter to estimate the state and parameters of this model, including gravity and camera-to-inertial calibration. It runs in real-time on an embedded platform, and its performance has been tested extensively. We report experiments of continuous operation, without failures, re-initialization, or re-calibration, on paths of length up to 30 km. We also describe an integrated approach to “loop-closure,” that is, the recognition of previously-seen locations and the topological readjustment of the traveled path. It represents visual features relative to the global orientation reference provided by the gravity vector estimated by the filter, and relative to the scale provided by their known position within the map; these features are organized into “locations” defined by visibility constraints, represented in a topological graph, where loop closure can be performed without the need to recompute past trajectories or perform bundle adjustment. The software infrastructure as well as the embedded platform is described in detail in a technical report (Jones and Soatto, 2009).
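The fusion scheme sketched in this abstract can be illustrated, in grossly simplified form, with a one-dimensional Kalman filter in which the integrated inertial measurement drives the prediction step and a visual position fix drives the update. This is a hypothetical toy, not the paper's filter (which estimates full pose, gravity, and camera-to-inertial calibration); every name and number below is illustrative.

```python
# Hypothetical 1-D toy of visual-inertial fusion (NOT the paper's filter):
# the integrated inertial velocity drives prediction, a visual position
# fix drives the update.
def kalman_step(x, p, vel_dt, z, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter."""
    x_pred = x + vel_dt                 # predict: integrate inertial motion
    p_pred = p + q                      # process noise inflates uncertainty
    k = p_pred / (p_pred + r)           # Kalman gain
    x_new = x_pred + k * (z - x_pred)   # update with the visual fix z
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0                         # initial position estimate and variance
for vel_dt, z in [(0.5, 0.6), (0.5, 1.1), (0.5, 1.4)]:
    x, p = kalman_step(x, p, vel_dt, z)
print(round(x, 2))                      # fused position after three steps
```

The estimate ends up between the dead-reckoned inertial prediction and the visual fixes, with a variance that shrinks at every update — the same qualitative behavior the paper exploits at full scale.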
Invariant Scattering Convolution Networks
, 2012
Abstract

Cited by 41 (7 self)
Abstract—A wavelet scattering network computes a translation-invariant image representation, which is stable to deformations and preserves high-frequency information for classification. It cascades wavelet transform convolutions with nonlinear modulus and averaging operators. The first network layer outputs SIFT-type descriptors, whereas the next layers provide complementary invariant information which improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. A scattering representation of stationary processes incorporates higher-order moments and can thus discriminate textures having the same Fourier power spectrum. State-of-the-art classification results are obtained for handwritten digits and texture discrimination, with a Gaussian kernel SVM and a generative PCA classifier.
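The cascade of wavelet convolution, modulus, and averaging can be mimicked in one dimension. The sketch below is a hypothetical toy, not Mallat's scattering construction: a crude Haar-like high-pass stands in for the wavelet filter bank, and a single global average replaces the multi-scale averaging operators. It only demonstrates the translation-invariance property the abstract describes.

```python
# Toy 1-D "scattering" step (illustrative, not the actual construction):
# circular convolution with a Haar-like wavelet, modulus, then averaging.
def conv(signal, kernel):
    n, k = len(signal), len(kernel)
    return [sum(signal[(i + j) % n] * kernel[j] for j in range(k))
            for i in range(n)]

def scatter(signal):
    wavelet = [1.0, -1.0]                            # crude high-pass filter
    detail = [abs(v) for v in conv(signal, wavelet)]  # modulus nonlinearity
    return sum(detail) / len(detail)                  # averaging operator

x = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = x[3:] + x[:3]              # circular translation of the same signal
print(scatter(x), scatter(shifted))  # equal: the output is shift-invariant
```

Because the convolution is circular and the average is global, translating the input leaves the output unchanged, while the modulus preserves the high-frequency energy that a plain average would discard.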
Sparse Occlusion Detection with Optical Flow
Int. J. Comput. Vis.
, 2011
Abstract

Cited by 22 (9 self)
We tackle the problem of detecting occluded regions in a video stream. Under assumptions of Lambertian reflection and static illumination, the task can be posed as a variational optimization problem, and its solution approximated using convex minimization. We describe efficient numerical schemes that reach the global optimum of the relaxed cost functional, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance, in addition to optical flow.
On the set of images modulo viewpoint and contrast changes
, 2009
Abstract

Cited by 19 (16 self)
We consider regions of images that exhibit smooth statistics, and pose the question of characterizing the “essence” of these regions that matters for recognition. Ideally, this would be a statistic (a function of the image) that does not depend on viewpoint and illumination, and yet is sufficient for the task. In this manuscript, we show that such statistics exist. That is, one can compute deterministic functions of the image that contain all the “information” present in the original image, except for the effects of viewpoint and illumination. We also show that such statistics are supported on a “thin” (zero-measure) subset of the image domain, and thus the “information” in an image that is relevant for recognition is sparse. Yet, from this thin set one can reconstruct an image that is equivalent to the original up to a
Video-based descriptors for object recognition
 Image and Vision Computing, 29(10):639
Abstract

Cited by 13 (6 self)
We describe a visual recognition system operating on a handheld device, based on a video-based feature descriptor, and characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking. Severe resource constraints have prompted a re-evaluation of existing algorithms, improving their performance (accuracy and robustness) as well as computational efficiency. We motivate the design choices in the implementation with a characterization of the stability properties of local invariant detectors, and of the conditions under which a template-based descriptor is optimal. The analysis also highlights the role of time as “weak supervisor” during training, which we exploit in our implementation.
Distributed synthesis of control protocols for smart camera networks
in ACM/IEEE International Conference on Cyber-Physical Systems
, 2011
Abstract

Cited by 10 (7 self)
Abstract—We consider the problem of designing control protocols for pan-tilt-zoom (PTZ) cameras within a smart camera network, where the goal is to guarantee certain temporal logic specifications related to a given surveillance task. We first present a centralized control architecture for assigning PTZ cameras to targets so that the specification is met for any admissible behavior of the targets. Then, in order to alleviate the computational complexity associated with LTL synthesis and to enable implementation of local control protocols on individual PTZ cameras, we propose a distributed synthesis methodology. The main idea is to decompose the global specification into local specifications for each PTZ camera. These decompositions allow the protocols for each camera to be separately synthesized and locally implemented while guaranteeing that the global specifications hold. A thorough design example is presented to illustrate the steps of the proposed procedure.
Appearance-based Active, Monocular, Dense Reconstruction for Micro Aerial Vehicles
Abstract

Cited by 3 (0 self)
Abstract—In this paper, we investigate the following problem: given the image of a scene, what is the trajectory that a robot-mounted camera should follow to allow optimal dense depth estimation? The solution we propose is based on maximizing the information gain over a set of candidate trajectories. In order to estimate the information that we expect from a camera pose, we introduce a novel formulation of the measurement uncertainty that accounts for the scene appearance (i.e., texture in the reference view), the scene depth, and the vehicle pose. We successfully demonstrate our approach in the case of real-time, monocular reconstruction from a micro aerial vehicle and validate the effectiveness of our solution in both synthetic and real experiments. To the best of our knowledge, this is the first work on active, monocular dense reconstruction which chooses motion trajectories that minimize perceptual ambiguities inferred by the texture in the scene.
Learning the Irreducible Representations of Commutative Lie Groups
Abstract

Cited by 2 (1 self)
We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. To define the notion of disentangling, we borrow a fundamental principle from physics that is used to derive the elementary particles of a system from its symmetries. Our model employs a newfound Bayesian conjugacy relation that enables fully tractable probabilistic inference over compact commutative Lie groups – a class that includes the groups that describe the rotation and cyclic translation of images. We train the model on pairs of transformed image patches, and show that the learned invariant representation is highly effective for classification.
Actionable Saliency Detection: Independent Motion Detection Without Independent Motion Estimation
Abstract

Cited by 1 (0 self)
We present a model and an algorithm to detect salient regions in video taken from a moving camera. In particular, we are interested in capturing small objects that move independently in the scene, such as vehicles and people as seen from aerial or ground vehicles. Many of the scenarios of interest challenge existing schemes based on background subtraction (background motion too complex), multi-body motion estimation (insufficient parallax), and occlusion detection (uniformly textured background regions). We adopt a robust statistical inference approach to simultaneously estimate a maximally reduced regressor, and select regions that violate the null hypothesis (co-visibility under an epipolar domain deformation) as “salient”. We show that our algorithm can perform even in the absence of camera calibration information: while the resulting motion estimates would be incorrect, the partition of the domain into salient vs. non-salient is unaffected. We demonstrate our algorithm on video footage from helicopters, airplanes, and ground vehicles.
Figure 1: Detecting salient regions under camera motion. Left: tracked feature points (blue) are classified as inliers (green) or outliers (red). Right: estimated salient point density obtained by our algorithm.
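The "select regions that violate the null hypothesis" idea can be caricatured with a far simpler robust test. The sketch below is a hypothetical simplification, not the paper's epipolar model: it treats the median optical-flow vector as the camera-induced background motion and flags features with large residuals as salient; the function name, threshold, and data are all invented for illustration.

```python
# Hypothetical simplification of robust saliency selection: the median flow
# stands in for the camera-induced background motion; points whose flow
# deviates strongly are flagged as independently moving ("salient").
def detect_salient(flows, thresh=1.0):
    """flows: list of (dx, dy) optical-flow vectors for tracked features."""
    xs = sorted(f[0] for f in flows)
    ys = sorted(f[1] for f in flows)
    mx, my = xs[len(xs) // 2], ys[len(ys) // 2]   # robust background estimate
    # L1 residual against the background motion; large residual -> salient
    return [abs(dx - mx) + abs(dy - my) > thresh for dx, dy in flows]

background = [(1.0, 0.1), (1.1, 0.0), (0.9, 0.2), (1.0, 0.0)]
mover = [(4.0, 3.0)]                # one independently moving feature
print(detect_salient(background + mover))
```

As in the paper's key observation, the inlier/outlier partition survives even if the background-motion estimate itself is crude: the median need not be the true camera motion for the deviant point to stand out.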
Actionable Information in Vision
Abstract
Summary. A notion of visual information is introduced as the complexity not of the raw images, but of the images after the effects of nuisance factors such as viewpoint and illumination are discounted. It is rooted in ideas of J. J. Gibson, and stands in contrast to traditional information as entropy or coding length of the data regardless of its use, and regardless of the nuisance factors affecting it. The non-invertibility of nuisances such as occlusion and quantization induces an “information gap” that can only be bridged by controlling the data-acquisition process. Measuring visual information entails early vision operations, tailored to the structure of the nuisances so as to be “lossless” with respect to visual decision and control tasks (as opposed to the data transmission and storage tasks implicit in communications theory). The definition of visual information suggests desirable properties that a visual representation should possess to best accomplish vision-based decision and control tasks.
1.1 Preamble
This paper discusses the role visual perception plays in the “signal-to-symbol barrier” problem.