Results 1 -
3 of
3
Learning to Detect Roads in High-Resolution Aerial Images
"... Abstract. Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-roa ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Abstract. Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-road objects. Despite 30 years of work on automatic road detection, no automatic or semi-automatic road detection system is currently on the market and no published method has been shown to work reliably on large datasets of urban imagery. We propose detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task. The network is trained on massive amounts of data using a consumer GPU. We demonstrate that predictive performance can be substantially improved by initializing the feature detectors using recently developed unsupervised learning methods as well as by taking advantage of the local spatial coherence of the output labels. We show that our method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches. 1
Convolutional Networks and Applications in Vision
"... Abstract — Intelligent tasks, such as visual perception, auditory perception, and language understanding require the construction of good internal representations of the world (or ”features”), which must be invariant to irrelevant variations of the input while, preserving relevant information. A maj ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract — Intelligent tasks, such as visual perception, auditory perception, and language understanding require the construction of good internal representations of the world (or ”features”), which must be invariant to irrelevant variations of the input while, preserving relevant information. A major question for Machine Learning is how to learn such good features automatically. Convolutional Networks (ConvNets) are a biologicallyinspired trainable architecture that can learn invariant features. Each stage in a ConvNets is composed of a filter bank, some non-linearities, and feature pooling layers. With multiple stages, a ConvNet can learn multi-level hierarchies of features. While ConvNets have been successfully deployed in many commercial applications from OCR to video surveillance, they require large amounts of labeled training samples. We describe new unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples. Applications to visual object recognition and vision navigation for off-road mobile robots are described.
Large-scale FPGA-based . . .
- CHAPTER IN MACHINE LEARNING ON VERY LARGE DATA SETS
, 2011
"... Micro-robots, unmanned aerial vehicles (UAVs), imaging sensor networks, wireless phones, and other embedded vision systems all require low cost and high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene. Many successful object recognition sy ..."
Abstract
- Add to MetaCart
Micro-robots, unmanned aerial vehicles (UAVs), imaging sensor networks, wireless phones, and other embedded vision systems all require low cost and high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene. Many successful object recognition systems use dense features extracted on regularly-spaced patches over the input image. The majority of the feature extraction systems have a common structure composed of a filter bank (generally based on oriented edge detectors or 2D gabor functions), a non-linear operation (quantization, winner-take-all, sparsification, normalization, and/or point-wise saturation) and finally a pooling operation (max, average or histogramming). For example, the scale-invariant feature transform (SIFT (Lowe, 2004)) operator applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. Finally, the resulting sparse vectors are added (pooled) over a larger patch to form local orientation histogram. Some recognition systems use a single stage of feature extractors (Lazebnik

