Creating efficient codebooks for visual recognition
- In Proceedings of the IEEE International Conference on Computer Vision
, 2005
Cited by 111 (12 self)
Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for texture analysis and scene classification. Codebooks are usually constructed by using a method such as k-means to cluster the descriptor vectors of patches sampled either densely (‘textons’) or sparsely (‘bags of features’ based on keypoints or salience measures) from a set of training images. This works well for texture analysis in homogeneous images, but the images that arise in natural object recognition tasks have far less uniform statistics. We show that for dense sampling, k-means over-adapts to this, clustering centres almost exclusively around the densest few regions in descriptor space and thus failing to code other informative regions. This gives suboptimal codes that are no better than using randomly selected centres. We describe a scalable acceptance-radius based clusterer that generates better codebooks and study its performance on several image classification tasks. We also show that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or Mutual Information based feature selection starting from a dense codebook further improves the performance.
Nearest Neighbors In High-Dimensional Spaces
, 2004
Cited by 63 (2 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., in the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
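As a point of reference for the problem stated in this abstract, the exact answer can always be found by a linear scan over P. The sketch below is only that baseline, not a method from the chapter: it assumes Euclidean distance, and the function name `nearest_neighbor` and the sample points are hypothetical. Its cost grows linearly with both the number of points and the dimension, which is the scaling the abstract says better data structures must improve on.

```python
import math

def nearest_neighbor(points, q):
    """Return (index, distance) of the point in `points` closest to `q`.

    Brute-force baseline: one Euclidean distance computation per stored point.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

# Hypothetical toy data set P and query point q.
P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
idx, dist = nearest_neighbor(P, (0.9, 1.2))
```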
A sparse texture representation using local affine regions
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 60 (11 self)
This article introduces a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations. At the feature extraction stage, a sparse set of affine Harris and Laplacian regions is found in the image. Each of these regions can be thought of as a texture element having a characteristic elliptic shape and a distinctive appearance pattern. This pattern is captured in an affine-invariant fashion via a process of shape normalization followed by the computation of two novel descriptors, the spin image and the RIFT descriptor. When affine invariance is not required, the original elliptical shape serves as an additional discriminative feature for texture recognition. The proposed approach is evaluated in retrieval and classification tasks using the entire Brodatz database and a publicly available collection of 1000 photographs of textured surfaces taken from different viewpoints.
Fast Image Search for Learned Metrics
Cited by 39 (7 self)
We introduce a method that enables scalable image search for learned metrics. Given pairwise similarity and dissimilarity constraints between some images, we learn a Mahalanobis distance function that captures the images’ underlying relationships well. To allow sub-linear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to a variety of image datasets. Our learned metrics improve accuracy relative to commonly-used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
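One way to picture hashing under a Mahalanobis metric is random-hyperplane hashing applied after a linear map. The sketch below assumes the learned matrix A has a factorisation A = GᵀG, so that the Mahalanobis distance between x and y equals the Euclidean distance between Gx and Gy, and it takes each hash bit as the sign of rᵀGx for a random Gaussian vector r. The matrix `G`, the function name `make_hash`, and the sample vectors are hypothetical, and this is an illustration of the general idea rather than a statement of the authors' exact construction.

```python
import random

def make_hash(G, n_bits, seed=0):
    """Build an n_bits-bit hash function for vectors compared under A = G^T G.

    Assumption: G is a list of rows; each bit is sign(r^T G x) for a random
    Gaussian direction r, i.e. random-hyperplane hashing in the mapped space.
    """
    rng = random.Random(seed)
    k = len(G)  # dimension of the mapped vector G x
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_bits)]

    def h(x):
        gx = [sum(g * xi for g, xi in zip(row, x)) for row in G]
        return tuple(
            1 if sum(r * v for r, v in zip(hp, gx)) >= 0 else 0
            for hp in hyperplanes
        )

    return h

# Hypothetical 2x2 factor G of a learned metric, and two example vectors.
G = [[2.0, 0.0], [0.5, 1.0]]
h = make_hash(G, n_bits=16)
x = [1.0, 0.2]
code_x = h(x)
code_scaled = h([3.0 * v for v in x])  # same direction, so same code
code_neg = h([-v for v in x])          # opposite direction, so flipped bits
```

Vectors that collide on many bits are candidates for a final exact comparison under the learned distance, which is what allows lookup without scanning the whole database.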
Mean shift is a bound optimization
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 22 (0 self)
We build on the current understanding of mean shift as an optimization procedure. We demonstrate that, in the case of piecewise constant kernels, mean shift is equivalent to Newton’s method. Further, we prove that, for all kernels, the mean shift procedure is a quadratic bound maximization. Index Terms: mean shift, bound optimization, Newton’s method, adaptive gradient descent, mode seeking.
A statistical approach to material classification using image patch exemplars
, 2006
Cited by 16 (1 self)
In this paper, we investigate material classification from single images obtained under unknown viewpoint and illumination. It is demonstrated that materials can be classified using the joint distribution of intensity values over extremely compact neighbourhoods (starting from as small as 3×3 pixels square), and that this outperforms classification using filter banks with large support. It is also shown that the performance of filter banks is inferior to that of image patches with equivalent neighbourhoods. We develop novel texton based representations which are suited to modelling this joint neighbourhood distribution for MRFs. The representations are learnt from training images, and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. Three such representations are proposed, and their performance is assessed and compared to that of filter banks. The power of the method is demonstrated by classifying 2806 images of all 61 materials present in the Columbia-Utrecht database. The classification performance surpasses that of recent state of the art filter bank based classifiers such as Leung and Malik (IJCV 01), Cula and Dana (IJCV 04), and Varma and Zisserman (IJCV 05). We also benchmark performance by classifying all the textures present in the Microsoft Textile database as well as the San Francisco outdoor dataset. We conclude with discussions on why features based on compact neighbourhoods can correctly discriminate between textures with large global structure and why the performance of filter banks is not superior to the source image patches from which they were derived.
Sketch2photo: internet image montage
- ACM SIGGRAPH Asia
, 2009
Cited by 10 (3 self)
Figure 1: A simple freehand sketch is automatically converted into a photo-realistic picture by seamlessly composing multiple images discovered online. The input sketch plus overlaid text labels is shown in (a). A composed picture is shown in (b); (c) shows two further compositions. Discovered online images used during composition are shown in (d).
We present a system that composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet. Although online image search generates many inappropriate results, our system is able to automatically select suitable photographs to generate a high quality composition, using a filtering scheme to exclude undesirable images. We also provide a novel image blending algorithm to allow seamless image composition. Each blending result is given a numeric score, allowing us to find an optimal combination of discovered images. Experimental results show the method is very successful; we also evaluate our system using the results from two user studies.
Linear Model Hashing and Batch RANSAC for Rapid and Accurate Object Recognition
- IEEE International Conference on Computer Vision and Pattern Recognition
, 2004
Cited by 9 (1 self)
This paper proposes a joint feature-based model indexing and geometric constraint based alignment pipeline for efficient and accurate recognition of 3D objects from a large model database. Traditional approaches either first prune the model database using indexing without geometric alignment or directly perform recognition based alignment. The indexing based pruning methods without geometric constraints can miss the correct models under imperfections such as noise, clutter and obscurations. Alignment based verification methods have to linearly verify each model in the database and hence do not scale up.
Acceleration strategies for Gaussian mean-shift image segmentation
- In CVPR
, 2006
Cited by 8 (2 self)
Gaussian mean-shift (GMS) is a clustering algorithm that has been shown to produce good image segmentations (where each pixel is represented as a feature vector with spatial and range components). GMS operates by defining a Gaussian kernel density estimate for the data and clustering together points that converge to the same mode under a fixed-point iterative scheme. However, the algorithm is slow, since its complexity is O(kN^2), where N is the number of pixels and k the average number of iterations per pixel. We study four acceleration strategies for GMS based on the spatial structure of images and on the fact that GMS is an expectation-maximisation (EM) algorithm: spatial discretisation, spatial neighbourhood, sparse EM and EM–Newton algorithm. We show that the spatial discretisation strategy can accelerate GMS by one to two orders of magnitude while achieving essentially the same segmentation; and that the other strategies attain speedups of less than an order of magnitude. The mean-shift algorithm is a hill-climbing algorithm that operates as follows. Given a data set {x_n}_{n=1}^N ⊂ R^D, define a kernel density estimate p(x) = 1...
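The fixed-point scheme this abstract describes can be sketched in a few lines. The code below is the plain, unaccelerated iteration, not any of the four acceleration strategies the paper studies. It assumes an isotropic Gaussian kernel with a single bandwidth `sigma`; the function names, tolerances, and toy data are hypothetical. Each update replaces x with the kernel-weighted mean of all data points, which costs O(N) per step and gives the O(kN^2) total quoted above when run from every point.

```python
import math

def gms_step(x, data, sigma):
    """One Gaussian mean-shift update: the kernel-weighted mean of the data."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xn)) / (2.0 * sigma ** 2))
        for xn in data
    ]
    total = sum(weights)
    return tuple(
        sum(w * xn[d] for w, xn in zip(weights, data)) / total
        for d in range(len(x))
    )

def find_mode(x, data, sigma, tol=1e-8, max_iter=1000):
    """Iterate the update from x until the step is smaller than tol."""
    for _ in range(max_iter):
        new_x = gms_step(x, data, sigma)
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x

def cluster(data, sigma, merge_tol=1e-3):
    """Give the same label to points whose iterations reach the same mode."""
    modes, labels = [], []
    for x in data:
        m = find_mode(x, data, sigma)
        for j, known in enumerate(modes):
            if math.dist(m, known) < merge_tol:
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Hypothetical 1-D data with two well-separated groups.
data = [(-0.2,), (0.0,), (0.2,), (4.8,), (5.0,), (5.2,)]
modes, labels = cluster(data, sigma=0.5)
```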
Fast similarity search for learned metrics
, 2007
Cited by 8 (1 self)
We propose a method to efficiently index into a large database of examples according to a learned metric. Given a collection of examples, we learn a Mahalanobis distance using an information-theoretic metric learning technique that adapts prior knowledge about pairwise distances to incorporate similarity and dissimilarity constraints. To enable sub-linear time similarity search under the learned metric, we show how to encode a learned Mahalanobis parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for sparse input vector spaces whose high dimensionality makes it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to systems and image datasets, and show that our learned metrics improve accuracy relative to commonly-used metric baselines, while our hashing construction permits efficient indexing with a learned distance and very large databases.

