Results 31 - 40 of 167
Saliency moments for image categorization
- In ICMR 2011, 1st ACM International Conference on Multimedia Retrieval, April 17-20, 2011
"... 2229 route des crêtes ..."
(Show Context)
Exploiting Textons Distributions on Spatial Hierarchy for Scene Classification
, 2010
"... This paper proposes a method to recognize scene categories using bags of visual words obtained by hierarchically partitioning into subregion the input images. Specifically, for each subregion the Textons distribution and the extension of the corresponding subregion are taken into account. The bags ..."
Abstract
-
Cited by 7 (4 self)
This paper proposes a method to recognize scene categories using bags of visual words obtained by hierarchically partitioning the input images into subregions. Specifically, for each subregion the Textons distribution and the extent of the corresponding subregion are taken into account. The bags of visual words computed on the subregions are weighted and used to represent the whole scene. The classification of scenes is carried out by discriminative methods (i.e., SVM, KNN). A similarity measure based on the Bhattacharyya coefficient is proposed to establish similarities between images represented as hierarchies of bags of visual words. Experimental tests, using fifteen different scene categories, show that the proposed approach performs well with respect to state-of-the-art methods.
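As an illustration of the similarity measure the abstract describes, the sketch below computes a weighted Bhattacharyya similarity between two images represented as per-subregion visual-word histograms. The helper names, the uniform weights and the toy partitioning are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two L1-normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

def hierarchical_similarity(hists_a, hists_b, weights):
    """Weighted similarity between two images, each given as a list of per-subregion
    Textons histograms produced by the same hierarchical partitioning."""
    sims = np.array([bhattacharyya(pa, pb) for pa, pb in zip(hists_a, hists_b)])
    return float(np.dot(weights, sims) / np.sum(weights))

# toy usage: whole image plus 4 quadrants (5 subregions), an 8-word vocabulary
rng = np.random.default_rng(0)
img_a = [rng.dirichlet(np.ones(8)) for _ in range(5)]
img_b = [rng.dirichlet(np.ones(8)) for _ in range(5)]
print(hierarchical_similarity(img_a, img_b, weights=np.ones(5)))
```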
From generic to specific deep representations for visual recognition
- CoRR
"... Evidence is mounting that CNNs are currently the most efficient and successful way to learn visual representations. This paper address the questions on why CNN representations are so effective and how to improve them if one wants to maximize performance for a single task or a range of tasks. We asse ..."
Abstract
-
Cited by 7 (2 self)
Evidence is mounting that CNNs are currently the most efficient and successful way to learn visual representations. This paper address the questions on why CNN representations are so effective and how to improve them if one wants to maximize performance for a single task or a range of tasks. We assess experimentally the importance of different aspects of learning and choosing a CNN representation to its performance on a diverse set of visual recognition tasks. In particular, we investigate how altering the parameters in a network’s architecture and its training impacts the representation’s ability to specialize and generalize. We also study the effect of fine-tuning a generic network towards a particular task. Extensive exper-iments indicate the trends; (a) increasing specialization increases performance on the target task but can hurt the ability to generalize to other tasks and (b) the less specialized the original network the more likely it is to benefit from fine-tuning. As by-products we have learnt several deep CNN image representations which when combined with a simple linear SVM classifier or similarity measure pro-duce the best performance on 12 standard datasets measuring the ability to solve visual recognition tasks ranging from image classification to image retrieval. 1
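A minimal sketch of the generic pipeline the abstract refers to: a pretrained CNN used as an off-the-shelf feature extractor, with a linear SVM trained on top. It assumes PyTorch/torchvision and scikit-learn are available; the paper's own networks, layers and datasets differ, and the dataset variables shown in comments are hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

# pretrained network used as a generic feature extractor (penultimate layer)
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = torch.nn.Identity()   # drop the ImageNet classifier head
net.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_images):
    """Forward a batch of PIL images through the frozen network."""
    with torch.no_grad():
        batch = torch.stack([preprocess(im) for im in pil_images])
        return net(batch).numpy()

# hypothetical target-task data: train_images (PIL images), train_labels (ints)
# X_train = extract_features(train_images)
# clf = LinearSVC(C=1.0).fit(X_train, train_labels)
```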
Place classification using visual object categorization and global information
- in Proc. Can. Conf. Comput. Robot Vision
, 2011
"... Abstract—Places in an environment are locations where activities occur, and can be described by the objects they contain. This paper discusses the completely automated integration of object detection and global image properties for place classification. We first determine object counts in various pl ..."
Abstract
-
Cited by 6 (0 self)
Places in an environment are locations where activities occur, and can be described by the objects they contain. This paper discusses the completely automated integration of object detection and global image properties for place classification. We first determine object counts in various place types based on LabelMe images, which contain annotations of places and segmented objects. We then train object detectors on some of the most frequently occurring objects. Finally, we use object detection scores as well as global image properties to perform place classification of images. We show that our object-centric method is superior and more generalizable than using global properties alone in indoor scenes. In addition, we show enhanced performance by combining both methods. We also discuss areas for improvement and the application of this work to informed visual search. Finally, through this work we demonstrate the performance of a state-of-the-art technique, trained using automatically acquired labeled object instances (i.e., bounding boxes), for place classification of realistic indoor scenes. Keywords: place classification; object recognition; scene recognition.
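A rough sketch of the feature combination the abstract outlines: per-object detection scores concatenated with a global image descriptor, then fed to a standard classifier. The helper names, the feature dimensions and the random stand-in data are purely illustrative, not the authors' detectors or global features.

```python
import numpy as np
from sklearn.svm import SVC

def place_feature(detector_scores, global_descriptor):
    """detector_scores: one max detection score per trained object class
    (e.g. bed, sink, ...); global_descriptor: global image properties
    (e.g. a GIST-like vector). Concatenated into one place feature."""
    return np.concatenate([detector_scores, global_descriptor])

# toy usage with random stand-ins for the two feature sources
rng = np.random.default_rng(1)
X = np.stack([place_feature(rng.random(20), rng.random(512)) for _ in range(100)])
y = rng.integers(0, 5, size=100)          # 5 place categories
clf = SVC(kernel="rbf").fit(X, y)
```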
Taxonomic Multi-class Prediction and Person Layout using Efficient Structured Ranking
"... Abstract. In computer vision efficient multi-class classification is becoming a key problem as the field develops and the number of object classes to be identified increases. Often objects might have some sort of structure such as a taxonomy in which the mis-classification score for object classes c ..."
Abstract
-
Cited by 6 (4 self)
In computer vision, efficient multi-class classification is becoming a key problem as the field develops and the number of object classes to be identified increases. Often objects have some sort of structure, such as a taxonomy, in which the mis-classification score for object classes close by (using tree distance within the taxonomy) should be less than for those far apart. This is an example of multi-class classification in which the loss function has a special structure. Another example in vision is the ubiquitous pictorial structure or parts-based model. In this case we would like the mis-classification score to be proportional to the number of parts misclassified. It transpires that both of these are examples of structured output ranking problems. However, so far no efficient large-scale algorithm for this problem has been demonstrated. In this work we propose an algorithm for structured output ranking that can be trained in time linear in the number of samples under a mild assumption common to many computer vision problems: that the loss function can be discretized into a small number of values. We show the feasibility of structured ranking on these two core computer vision problems and demonstrate a consistent and substantial improvement over competing techniques. Aside from this, we also achieve state-of-the-art results for the PASCAL VOC human layout problem.
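To make the taxonomy loss concrete, the sketch below computes a tree-distance mis-classification loss over a toy taxonomy; because such distances take only a handful of discrete values across class pairs, they satisfy the discretization assumption the abstract relies on. The taxonomy and the parent-map representation are invented for illustration.

```python
def tree_distance(a, b, parent):
    """Number of edges between classes a and b in a taxonomy given a parent map."""
    def ancestors(x):
        path = [x]
        while x in parent:
            x = parent[x]
            path.append(x)
        return path
    pa, pb = ancestors(a), ancestors(b)
    common = next(x for x in pa if x in set(pb))   # lowest common ancestor
    return pa.index(common) + pb.index(common)

parent = {"cat": "mammal", "dog": "mammal", "eagle": "bird",
          "mammal": "animal", "bird": "animal"}

# losses take only a few discrete values (0, 2, 4) over this taxonomy
print(tree_distance("cat", "dog", parent))    # 2: nearby classes
print(tree_distance("cat", "eagle", parent))  # 4: classes far apart
```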
NESVM: a fast gradient method for support vector machines
- In 2010 IEEE 10th International Conference on Data Mining (ICDM)
, 2010
"... Abstract—Support vector machines (SVMs) are invaluable tools for many practical applications in artificial intelligence, e.g., classification and event recognition. However, popular SVM solvers are not sufficiently efficient for applications with a great deal of samples as well as a large number of ..."
Abstract
-
Cited by 5 (2 self)
Support vector machines (SVMs) are invaluable tools for many practical applications in artificial intelligence, e.g., classification and event recognition. However, popular SVM solvers are not sufficiently efficient for applications with a large number of samples as well as a large number of features. In this paper we therefore present NESVM, a fast gradient SVM solver that can optimize various SVM models, e.g., classical SVM, linear programming SVM and least squares SVM. Compared against SVM-Perf [1][2] (whose convergence rate in solving the dual SVM is upper bounded by O(1/√k), where k is the number of iterations) and Pegasos [3] (an online SVM that converges at rate O(1/k) for the primal SVM), NESVM achieves the optimal convergence rate of O(1/k²) and linear time complexity. In particular, NESVM smoothes the non-differentiable hinge loss and ℓ1-norm in the primal SVM. The optimal gradient method, without any line search, is then adopted to solve the optimization. In each iteration round, the current gradient and historical gradients are combined to determine the descent direction, while the Lipschitz constant determines the step size. Only two matrix-vector multiplications are required in each iteration round. Therefore, NESVM is more efficient than existing SVM solvers. In addition, NESVM is available for both linear and nonlinear kernels. We also propose “homotopy NESVM” to accelerate NESVM by dynamically decreasing the smooth parameter and using the continuation method. Our experiments on census income categorization, indoor/outdoor scene classification, event recognition and scene recognition suggest the efficiency and effectiveness of NESVM. The MATLAB code of NESVM will be available on our website for further assessment. Keywords: support vector machines; smoothing; hinge loss; ℓ1 norm; Nesterov’s method; continuation method.
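The sketch below illustrates the two ingredients the abstract describes, a smoothed hinge loss and Nesterov's accelerated gradient with a fixed Lipschitz-based step size and no line search, on a toy linear SVM. It is not the released NESVM code; the smoothing, momentum schedule and constants are a simplified assumption.

```python
import numpy as np

def smoothed_hinge_grad(w, X, y, mu, C=1.0):
    """Gradient of (1/2)||w||^2 + C * sum of Huber-smoothed hinge losses."""
    margins = 1.0 - y * (X @ w)
    # surrogate slope: 0 below margin 0, margin/mu up to mu, then 1 (linear part)
    coeff = np.clip(margins / mu, 0.0, 1.0)
    return w - C * X.T @ (coeff * y)

def nesterov_svm(X, y, mu=0.1, C=1.0, iters=200):
    n, d = X.shape
    L = 1.0 + C * np.linalg.norm(X, 2) ** 2 / mu   # Lipschitz constant of the gradient
    w = z = np.zeros(d)
    for k in range(iters):
        w_next = z - smoothed_hinge_grad(z, X, y, mu, C) / L   # fixed step, no line search
        z = w_next + (k / (k + 3.0)) * (w_next - w)            # combine current and past iterates
        w = w_next
    return w

# toy usage: linearly separable 2-D data with labels in {-1, +1}
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(+2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = nesterov_svm(X, y)
print(np.mean(np.sign(X @ w) == y))   # training accuracy
```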
Learning hybrid part filters for scene recognition
- In ECCV
, 2012
"... Abstract. This paper introduces a new image representation for scene recognition, where an image is described based on the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models trained by latent ..."
Abstract
-
Cited by 5 (1 self)
This paper introduces a new image representation for scene recognition, where an image is described based on the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models trained by latent SVM [1]. Since different objects may contain similar parts, we describe a method that uses a semantic hierarchy to automatically determine and merge filters shared by multiple objects. The merged hybrid filters are then applied to new images. Our proposed representation, called Hybrid-Parts, is generated by pooling the response maps of the hybrid filters. In contrast to previous scene recognition approaches that adopted object-level detections as feature inputs, we harness filter responses of object parts, which enable a richer and finer-grained representation. The use of the hybrid filters is important for a more compact representation, compared to directly using all the original part filters. Through extensive experiments on several scene recognition benchmarks, we demonstrate that Hybrid-Parts outperforms recent state-of-the-art methods, and that combining it with standard low-level features such as the GIST descriptor can lead to further improvements.
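As a sketch of the pooling step only, the code below max-pools a stack of part-filter response maps over a coarse spatial grid to form the image feature vector. The hybrid filters themselves (merged DPM part filters) are not reproduced, and the grid size and map shapes are arbitrary choices for the example.

```python
import numpy as np

def pool_responses(response_maps, grid=(2, 2)):
    """response_maps: (num_filters, H, W) filter responses on one image.
    Returns a vector of per-filter max responses in each spatial cell."""
    f, h, w = response_maps.shape
    gy, gx = grid
    feats = []
    for i in range(gy):
        for j in range(gx):
            cell = response_maps[:, i*h//gy:(i+1)*h//gy, j*w//gx:(j+1)*w//gx]
            feats.append(cell.max(axis=(1, 2)))   # max-pool each filter within the cell
    return np.concatenate(feats)

# toy usage: 50 hybrid filters, 64x64 response maps
maps = np.random.default_rng(3).random((50, 64, 64))
print(pool_responses(maps).shape)   # (200,) = 50 filters x 4 grid cells
```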
Describing Images using Qualitative Models and Description Logics
"... Our approach describes any digital image qualitatively by detecting regions/objects inside it and describing their visual characteristics (shape and colour) and their spatial characteristics (orientation and topology) by means of qualitative models. The description obtained is translated into a desc ..."
Abstract
-
Cited by 5 (0 self)
Our approach describes any digital image qualitatively by detecting regions/objects inside it and describing their visual characteristics (shape and colour) and their spatial characteristics (orientation and topology) by means of qualitative models. The description obtained is translated into a description logic (DL) based ontology, which gives a formal and explicit meaning to the qualitative tags representing the visual features of the objects in the image and the spatial relations between them. For any image, our approach obtains a set of individuals that are classified using a DL reasoner according to the descriptions of our ontology.
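To illustrate the kind of qualitative tags the abstract mentions, the sketch below assigns a coarse colour name and topology/orientation relations to a pair of region bounding boxes. The tag vocabulary, thresholds and box format are invented for the example, and the translation of such tags into a DL ontology is not shown.

```python
def colour_tag(rgb):
    """Very coarse qualitative colour name from a mean RGB triple (illustrative vocabulary)."""
    if max(rgb) < 60:
        return "black"
    if min(rgb) > 200:
        return "white"
    return ("red", "green", "blue")[rgb.index(max(rgb))]

def topology_tag(box_a, box_b):
    """Qualitative topology between two boxes (x0, y0, x1, y1): disjoint, inside or overlapping."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "inside"
    return "overlapping"

def orientation_tag(box_a, box_b):
    """Qualitative orientation of box_a's centre relative to box_b's centre."""
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    horiz = "left_of" if cax < cbx else "right_of"
    vert = "above" if cay < cby else "below"
    return f"{horiz}_and_{vert}"

# toy usage: two detected regions in image coordinates
a, b = (10, 10, 50, 50), (40, 30, 120, 100)
print(colour_tag([200, 40, 30]), topology_tag(a, b), orientation_tag(a, b))
```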
Cross-caption coreference resolution for automatic image understanding
- In CoNLL
, 2010
"... Recent work in computer vision has aimed ..."