Spatial pyramid pooling in deep convolutional networks for visual recognition. (2014)

by K He, X Zhang, S Ren, J Sun
Venue: In Proc. ECCV

Citing documents: results 1-10 of 52

Very deep convolutional networks for large-scale image recognition

by Karen Simonyan, Andrew Zisserman, 2014
Cited by 154 (1 self)
Abstract not found

Citation Context

...
Method                            top-1 val. error (%)  top-5 val. error (%)  top-5 test error (%)
VGG (2 nets)                      24.0                  7.1                   7.0
VGG (1 net)                       24.8                  7.5                   7.3
VGG (ILSVRC submission, 7 nets)   24.7                  7.5                   7.3
VGG (ILSVRC submission, 1 net)    24.9                  8.0                   -
GoogLeNet [23] (1 net)            -                     7.9 (one top-5 value given)
GoogLeNet [23] (7 nets)           -                     6.7 (one top-5 value given)
MSRA [9] (11 nets)                -                     -                     8.1
MSRA [9] (1 net)                  27.9                  9.1                   9.1
Clarifai [18] (multiple nets)     -                     -                     11.7
Clarifai [18] (1 net)             -                     -                     12.5
Zeiler & Fergus [25] (6 nets)     36.0                  14.7                  14.8
Zeiler & Fergus [25] (1 net)      37.5                  16...

Return of the Devil in the Details: Delving Deep into Convolutional Nets

by Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014
Cited by 71 (8 self)
The latest generation of Convolutional Neural Networks (CNN) has achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures and comparing them on a common ground, identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. In particular, we show that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost. Source code and models to reproduce the experiments in the paper are made publicly available.

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015
Cited by 40 (0 self)
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first¹ to surpass the reported human-level performance (5.1%, [26]) on this dataset.
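The two ingredients named in this abstract, the PReLU activation and the rectifier-aware initialization, fit in a few lines. This is an illustrative sketch, not the authors' code: it uses the form f(y) = max(0, y) + a·min(0, y) for PReLU and only the plain-ReLU case of the initialization, std = sqrt(2/n) with n the fan-in (kernel height × width × input channels); for PReLU the paper's variant also divides by (1 + a²). The fan-in of 3×3×64 in the usage is a hypothetical example. The last helper just checks the "26% relative improvement" arithmetic quoted above.

```python
import math


def prelu(y: float, a: float) -> float:
    """Parametric ReLU: identity for y > 0, slope `a` for y <= 0.

    With a = 0 this reduces to the ordinary ReLU; in PReLU `a` is learned.
    """
    return max(0.0, y) + a * min(0.0, y)


def rectifier_init_std(fan_in: int) -> float:
    """Std of a zero-mean Gaussian for weights feeding a ReLU layer:
    sqrt(2 / fan_in), where fan_in = kernel_h * kernel_w * in_channels."""
    return math.sqrt(2.0 / fan_in)


def relative_improvement(old_err: float, new_err: float) -> float:
    """Relative error reduction, e.g. 0.26 means 26% fewer errors."""
    return (old_err - new_err) / old_err


if __name__ == "__main__":
    print(prelu(-2.0, 0.25))                # -0.5
    print(prelu(3.0, 0.25))                 # 3.0
    print(rectifier_init_std(3 * 3 * 64))   # about 0.059
    # 4.94% vs. GoogLeNet's 6.66% top-5 error, as quoted in the abstract:
    print(round(relative_improvement(6.66, 4.94), 3))  # 0.258, i.e. ~26%
```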

Citation Context

...mplexity (e.g., increased depth [29, 33], enlarged width [37, 28], and the use of smaller strides [37, 28, 2, 29]), new nonlinear activations [24, 23, 38, 22, 31, 10], and sophisticated layer designs [33, 12]. On the other hand, better generalization is achieved by effective regularization techniques [13, 30, 10, 36], aggressive data augmentation [18, 14, 29, 33], and large-scale d... [¹ Reported in Feb. 2015.]

Fully convolutional networks for semantic segmentation

by Jonathan Long, Evan Shelhamer, Trevor Darrell, 2014
Cited by 37 (0 self)
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [17], the VGG net [28], and GoogLeNet [29]) into fully convolutional networks and transfer their learned representations by fine-tuning [2] to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
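Why a fully convolutional network accepts input of arbitrary size can be seen from the standard output-size rule for convolution and pooling, out = ⌊(n + 2p − k)/s⌋ + 1: no layer pins the input size, so the output map simply scales with it. The layer stack below is hypothetical (two 3×3 convolutions and two 2×2 poolings), chosen only to illustrate the arithmetic; it is not the FCN architecture.

```python
from typing import List, Tuple

Layer = Tuple[int, int, int]  # (kernel size, stride, padding); square kernels assumed


def out_size(n: int, layer: Layer) -> int:
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    k, s, p = layer
    return (n + 2 * p - k) // s + 1


def output_shape(h: int, w: int, layers: List[Layer]) -> Tuple[int, int]:
    """Propagate an input size through a purely convolutional stack."""
    for layer in layers:
        h, w = out_size(h, layer), out_size(w, layer)
    return h, w


# Hypothetical stack: conv 3x3/s1/p1, pool 2x2/s2, conv 3x3/s1/p1, pool 2x2/s2.
STACK: List[Layer] = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]

if __name__ == "__main__":
    print(output_shape(224, 224, STACK))  # (56, 56)
    print(output_shape(500, 333, STACK))  # (125, 83)
```

A fully connected layer, by contrast, expects a fixed number of inputs, which is why a classification network has to be recast in convolutional form before it can take images of any size.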

Hypercolumns for object segmentation and fine-grained localization. arXiv:1411.5752

by Bharath Hariharan, Ross Girshick, Jitendra Malik, 2014
Cited by 13 (0 self)
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [22], where we improve state-of-the-art from 49.7 mean APr [22] to 60.0, keypoint localization, where we get a 3.3 point boost over [20], and part labeling, where we show a 6.6 point gain over a strong baseline.
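The hypercolumn definition above amounts to mapping one image pixel into every layer's (usually coarser) feature grid and concatenating the activations found there. The sketch below assumes feature maps stored as nested lists indexed [channel][row][column] and uses a simple nearest-cell lookup; the paper's own upsampling scheme may differ (interpolation is a common choice), so the sampling rule is an assumption, not the authors' method.

```python
from typing import List, Sequence

FeatureMaps = List[List[List[float]]]  # one layer: [channel][row][col]


def hypercolumn(layers: Sequence[FeatureMaps], y: int, x: int,
                img_h: int, img_w: int) -> List[float]:
    """Concatenate the activations 'above' image pixel (y, x) from every layer.

    Each layer may be coarser than the image; pixel (y, x) is mapped to the
    grid cell covering it, (y * h // img_h, x * w // img_w).
    """
    column: List[float] = []
    for maps in layers:
        h, w = len(maps[0]), len(maps[0][0])
        r, c = y * h // img_h, x * w // img_w
        column.extend(channel[r][c] for channel in maps)
    return column


if __name__ == "__main__":
    # Hypothetical activations for a 4x4 image: a fine 1-channel 4x4 layer
    # and a coarse 2-channel 2x2 layer.
    fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
    coarse = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
    print(hypercolumn([fine, coarse], 3, 1, 4, 4))  # [13, 3, 30]
```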

Citation Context

...cost. Extracting features from region foregrounds is expensive and doubles the time taken. Further, while CNN-based bounding box detection [18] can be speeded up dramatically using approaches such as [23], no such speedups exist for region classification. To address these drawbacks, we propose as our second system the pipeline shown in Figure 3. This pipeline starts with bounding box detections after ...

Deformable part models are convolutional neural networks

by Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik - CoRR
Cited by 10 (2 self)
Abstract not found

Citation Context

...of CNNs, this method dates back to (at least) early work on face detection in [40], and has been used again in contemporary works, including OverFeat [35], DetectorNet [4], DenseNet [24], and SPP-net [23], a recently proposed method for speeding up R-CNNs. We follow this approach and use as our front-end CNN a network that maps an image pyramid to a feature pyramid. To do this, we use a standard singl...

PCANet: A simple deep learning baseline for image classification? arXiv preprint arXiv:1404.3606

by Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, Yi Ma, 2014
Cited by 10 (0 self)
In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets ...
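The binary-hashing and blockwise-histogram stages named in this abstract can be sketched as follows. The sketch assumes L real-valued filter outputs per image (nested lists), binarizes each with a step function H(v) = 1 if v > 0, packs the L bits into one integer code per pixel as T = sum over l of 2^(l-1)·H(O_l), and then counts codes in square tiles. Non-overlapping tiles and the tile size are simplifying assumptions; in PCANet the block layout is a tunable choice.

```python
from typing import List

Map2D = List[List[float]]  # one filter output: [row][col]


def binary_hash(maps: List[Map2D]) -> List[List[int]]:
    """Pack L binarized maps into one integer code per pixel.

    T(y, x) = sum over l = 1..L of 2**(l - 1) * H(O_l(y, x)), with H(v) = 1 if v > 0.
    """
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(1 << l for l, m in enumerate(maps) if m[y][x] > 0)
             for x in range(w)]
            for y in range(h)]


def block_histograms(codes: List[List[int]], n_bits: int, block: int) -> List[int]:
    """Concatenate 2**n_bits-bin histograms of non-overlapping block x block tiles.

    Tiles that would run past the border are skipped.
    """
    h, w = len(codes), len(codes[0])
    features: List[int] = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            hist = [0] * (1 << n_bits)
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    hist[codes[y][x]] += 1
            features.extend(hist)
    return features


if __name__ == "__main__":
    o1 = [[1.0, -1.0], [2.0, -3.0]]   # hypothetical filter outputs
    o2 = [[-1.0, -1.0], [5.0, 5.0]]
    codes = binary_hash([o1, o2])
    print(codes)                                         # [[1, 0], [3, 2]]
    print(block_histograms(codes, n_bits=2, block=2))    # [1, 1, 1, 1]
```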

Citation Context

...y seen in object databases. Another advantage of equipping SPP with the PCANet is that it can generate a fixed-length representation regardless of image size/scale. This strategy has been explored in [55] and [56], where the SPP-net clearly increases the accuracy of various no-SPP counterparts. We also observe that the PCANet with SPP essentially improves the accuracy of object recognition on CIFAR10!...
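The fixed-length property mentioned in this context comes from pooling each feature map over a fixed grid of bins at several pyramid levels, so the output length depends only on the channel count and the levels, never on the map's height or width. The sketch assumes levels (1, 2, 4), max pooling, and a bin rule spanning rows ⌊i·h/n⌋ to ⌈(i+1)·h/n⌉; SPP-net itself specifies bins through a pooling window and stride derived from the map size, so the exact bin edges here are an assumption.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def spp_max(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Spatial pyramid max pooling.

    Returns channels * sum(n * n for n in levels) values for any map size >= 1x1.
    """
    out: List[float] = []
    for n in levels:
        for channel in fmap:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # Row span of bin i: floor(i*h/n) .. ceil((i+1)*h/n), never empty.
                r0, r1 = i * h // n, -(-(i + 1) * h // n)
                for j in range(n):
                    c0, c1 = j * w // n, -(-(j + 1) * w // n)
                    out.append(max(channel[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out


if __name__ == "__main__":
    import random

    rng = random.Random(0)

    def rand_map(c: int, h: int, w: int) -> FeatureMap:
        return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]

    print(len(spp_max(rand_map(3, 13, 13))))  # 63
    print(len(spp_max(rand_map(3, 10, 17))))  # 63, despite the different size
```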

Faster R-CNN: Towards real-time object detection with region proposal networks.

by Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun - In NIPS, 2015
Cited by 6 (1 self)
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet ...

Citation Context

...5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. Code is available at https://github.com/ShaoqingRen/faster_rcnn. 1 Introduction Recent advances in object detection are driven by the success of region proposal methods (e.g., [22]) and region-based convolutional neural networks (R-CNNs) [6]. Although region-based CNNs were computationally expensive as originally developed in [6], their cost has been drastically reduced thanks to sharing convolutions across proposals [7, 5]. The latest incarnation, Fast R-CNN [5], achieves near real-time rates using very deep networks [19], when ignoring the time spent on region proposals. Now, proposals are the computational bottleneck in state-of-the-art detection systems. Region proposal methods typically rely on inexpensive features and economical inference schemes. Selective Search (SS) [22], one of the most popular methods, greedily merges superpixels based on engineered low-level features. Yet when compared to efficient detection networks [5], Selective Search is an order of magnitude slower, at 2s per image in a CPU impl...
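Sharing convolutions across proposals, as described in this context, means computing the convolutional feature map once per image and mapping every proposal box onto it. The sketch below follows the corner-projection rule given in the SPP-net paper as we understand it (left/top: ⌊x/S⌋ + 1; right/bottom: ⌈x/S⌉ − 1, with S the product of all strides before the pooled layer). The stride of 16, the (x0, y0, x1, y1) box format, the example boxes, and the clamp for tiny boxes are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1): left, top, right, bottom


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def project_box(box: Box, stride: int) -> Box:
    """Map an image-space box to feature-map cell indices (SPP-net-style rule)."""
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride + 1, y0 // stride + 1
    fx1, fy1 = ceil_div(x1, stride) - 1, ceil_div(y1, stride) - 1
    # Assumption: keep at least one cell so very small boxes do not become empty.
    return fx0, fy0, max(fx1, fx0), max(fy1, fy0)


if __name__ == "__main__":
    # Hypothetical proposals for one image; the conv map is computed once and
    # each projected box is pooled from that same shared map.
    proposals = [(0, 0, 223, 223), (100, 40, 300, 200), (48, 48, 50, 50)]
    for b in proposals:
        print(b, "->", project_box(b, stride=16))
    # (0, 0, 223, 223) -> (1, 1, 13, 13)
    # (100, 40, 300, 200) -> (7, 3, 18, 12)
    # (48, 48, 50, 50) -> (4, 4, 4, 4)
```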

Discriminative unsupervised feature learning with convolutional neural networks

by Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox - arXiv:1406.6909
Cited by 6 (1 self)
Current methods for training convolutional neural networks depend on large amounts of labeled samples for supervised training. In this paper we present an approach for training a convolutional neural network using only unlabeled data. We train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled ‘seed’ image patch. We find that this simple feature learning algorithm is surprisingly successful when applied to visual object recognition. The feature representation learned by our algorithm achieves classification results matching or outperforming the current state-of-the-art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101).
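The surrogate-class construction in this abstract can be sketched as a data generator: each randomly sampled seed patch defines one class, and that class's samples are random transformations of the patch. The transformations below (horizontal flip, a one-column wrap-around shift, and intensity scaling) are simplified stand-ins chosen for brevity; the paper's own set (translation, scaling, rotation, contrast and colour changes) is richer, and the patch size and sample counts in the usage are hypothetical.

```python
import random
from typing import List, Tuple

Patch = List[List[float]]  # grayscale patch: [row][col]


def random_transform(patch: Patch, rng: random.Random) -> Patch:
    """Apply a random flip, a small wrap-around shift and an intensity scale."""
    out = [row[:] for row in patch]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]            # horizontal flip
    shift = rng.randint(-1, 1)                       # -1, 0 or +1 columns
    if shift:
        out = [row[-shift:] + row[:-shift] for row in out]  # rotate columns
    scale = rng.uniform(0.7, 1.3)                    # contrast-like scaling
    return [[v * scale for v in row] for row in out]


def surrogate_dataset(seeds: List[Patch], samples_per_class: int,
                      rng_seed: int = 0) -> List[Tuple[Patch, int]]:
    """Label every transformed copy with the index of the seed it came from."""
    rng = random.Random(rng_seed)
    return [(random_transform(patch, rng), label)
            for label, patch in enumerate(seeds)
            for _ in range(samples_per_class)]


if __name__ == "__main__":
    seed_rng = random.Random(1)
    seeds = [[[seed_rng.random() for _ in range(4)] for _ in range(4)]
             for _ in range(3)]
    data = surrogate_dataset(seeds, samples_per_class=5)
    print(len(data))                       # 15
    print([label for _, label in data])    # five 0s, five 1s, five 2s
```

The labels carry no human annotation; a network trained to tell these classes apart has to become invariant to the applied transformations while still distinguishing the seeds.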

Citation Context

...
Algorithm                              STL-10       CIFAR-10(400)  CIFAR-10   Caltech-101   #features
Exemplar-CNN (64c5-64c5-128f)          67.1 ± 0.3   69.7 ± 0.3     75.7       79.8 ± 0.5†   256
Exemplar-CNN (64c5-128c5-256c5-512f)   72.8 ± 0.4   75.3 ± 0.2     82.0       85.5 ± 0.4‡   960
Supervised state of the art            70.1 [31]    -              91.2 [32]  91.44 [33]    -
... produced by each method before final pooling. The small network was trained on 8000 surrogate classes containing 150 samples each and the large one on 16000 classes with 100 samples each. The featu...

Deformable Part Models with CNN Features

by Pierre-André Savalle, Stavros Tsogkas, George Pap
Cited by 5 (2 self)
In this work we report on progress in integrating deep convolutional features with Deformable Part Models (DPMs). We substitute the Histogram-of-Gradient features of DPMs with Convolutional Neural Network (CNN) features, obtained from the top-most, fifth, convolutional layer of Krizhevsky’s network [8]. We demonstrate that we thereby obtain a substantial boost in performance (+14.5 mAP) when compared to the baseline HOG-based models. This only partially bridges the gap between DPMs and the currently top-performing R-CNN method of [4], suggesting that more radical changes to DPMs may be needed.

Citation Context

...ds a state-of-the-art mean average precision (mAP) of 58.5% on VOC2007, and of 31.4% on ILSVRC2013. A substantial acceleration and a (moderate) further improvement in performance have been achieved in [6] by combining R-CNNs with spatial pyramid pooling. Combining CNN features with DPMs: The region proposal strategy of R-CNN only partially captures the complexity of visual objects; in particular, for ...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University