From information scaling of natural images to regimes of statistical models
- Quarterly of Applied Math, 2008
Cited by 11 (5 self)
Computer vision can be considered a highly specialized data collection and data analysis problem. We need to understand the special properties of image data in order to construct statistical models for representing the wide variety of image patterns. One special property that distinguishes vision from other sensory data such as speech is that distance or scale plays a profound role in image data. More specifically, visual objects and patterns can appear at a wide range of distances or scales, and the same visual pattern appearing at different distances or scales produces different image data with different statistical properties, and thus entails different regimes of statistical models. In particular, we show that the entropy rate of the image data changes over the viewing distance (as well as the camera resolution). Moreover, the inferential uncertainty changes with viewing distance too. We call these changes information scaling. From this perspective, we examine both empirically and theoretically two prominent and yet largely isolated research themes in the image modeling literature, namely, wavelet sparse coding and Markov random fields. Our results indicate that the two models are appropriate in two different entropy regimes: sparse coding targets the
I2T: Image Parsing to Text Description
Cited by 6 (0 self)
In this paper, we present an image parsing to text generation (I2T) framework that generates natural language descriptions from image and video content. This framework converts the harder content-based image and video retrieval problem into an easier text search problem with potential applications in Internet search and visual data mining. The proposed I2T framework follows three steps. 1) Input images or video frames are decomposed into their constituent visual patterns through an image parsing engine, which outputs a scene as a parse graph representation, in a spirit similar to parsing sentences in speech and natural language. 2) The parse graphs are converted into a semantic representation using the Web Ontology Language (OWL) format, which is a formal and unambiguous knowledge representation. 3) A text generation engine converts the semantic representation into a semantically meaningful, human-readable and queryable text report. The success of the above framework relies on two knowledge bases. The first one is a visual knowledge base that provides top-down hypotheses for image parsing and serves as an image ontology for translating parse graphs into semantic representations. The core of the visual knowledge base is an And-Or graph representation. It entails vocabularies of visual elements including pixels, primitives, parts, objects and scenes, and a stochastic image grammar specifying compositional, spatial, temporal and functional relations between visual elements. We developed a large-scale ground-truth image database and interactive image annotation software to build the And-Or graph from real-world image instances. The second knowledge base is a general knowledge base that interconnects several domain-specific ontologies in the form of the Semantic Web. This knowledge base further enriches the semantic representation of visual content with domain-specific information.
Finally, we demonstrate a case study in video surveillance, an end-to-end system that automatically infers video events and generates natural language descriptions of video scenes. Experiments with maritime and urban scenes indicate the feasibility of the proposed approach.
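The three steps listed in the abstract above can be illustrated with a toy sketch. It is not the authors' system: the `Node` class, the `to_triples` and `to_text` helpers, and the relation names are all hypothetical, and plain (subject, relation, object) tuples stand in for the OWL representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-graph node: a label, child parts, and relations."""
    label: str
    children: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (relation, target label)


def to_triples(node):
    """Step 2 stand-in: flatten the parse graph into semantic triples."""
    triples = []
    for child in node.children:
        triples.append((node.label, "contains", child.label))
        triples.extend(to_triples(child))
    for relation, target in node.relations:
        triples.append((node.label, relation, target))
    return triples


def to_text(triples):
    """Step 3 stand-in: render each triple with a fixed sentence template."""
    return " ".join(
        f"The {s} {r.replace('_', ' ')} the {o}." for s, r, o in triples
    )


# Step 1 stand-in: a hand-built parse graph instead of an image parser.
scene = Node(
    "harbor scene",
    children=[Node("boat", relations=[("is_next_to", "pier")]), Node("pier")],
)
report = to_text(to_triples(scene))
print(report)
```

Because the intermediate form is a list of triples, the same list could be searched as text or queried by relation, which is the retrieval benefit the abstract describes.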
Sisley The Abstract Painter
- 2010
Cited by 3 (0 self)
We present an interactive abstract painting system named Sisley. Sisley works upon the psychological principle [Berlyne 1971] that abstract art is often characterized by greater perceptual ambiguity than photographs, which tends to invoke moderate mental effort from the audience for interpretation, accompanied by subtle aesthetic pleasure. Given an input photograph, Sisley decomposes it into a hierarchy/tree of its constituent image components (e.g., regions, objects of different categories) with interactive guidance, and generates painting images with increased ambiguities of both the scene and individual objects at desired levels. Sisley consists of three major working parts: (1) an interactive image parser executing the tasks of segmentation, labeling, and hierarchical organization, (2) a painterly rendering engine with abstract operators for transferring the image appearance, and (3) a numerical ambiguity computation and control module that acts as a servomechanism. With the help of Sisley, even an amateur user can create abstract paintings from photographs easily in minutes. We have evaluated the rendering results of Sisley using human experiments, and verified that they have similar abstract effects to original abstract paintings by artists.
Mapping Natural Image Patches by Explicit and Implicit Manifolds
Cited by 2 (1 self)
Image patches are fundamental elements for object modeling and recognition. However, there has not been a panoramic study of the structures of the whole ensemble of natural image patches in the literature. In this article, we study the structures of this ensemble by mapping natural image patches into two types of subspaces, which we call “explicit manifolds” and “implicit manifolds” respectively. On explicit manifolds, one finds simple and regular image primitives, such as edges, bars, corners and junctions. On implicit manifolds, one finds complex and stochastic image patches, such as textures and clutter. Different perceptual metrics are used on the different types of manifolds. We propose a method for learning a probability distribution on the space of patches by pursuing both types of manifolds using a common information-theoretic criterion. The connection between the two types of manifolds is realized by image scaling, which changes the entropy of the image patches. The explicit manifolds live in low-entropy regimes while the implicit manifolds live in high-entropy regimes. We study the transition between the two types of manifolds over scale and show that the complexity of the manifolds peaks in a middle entropy regime.
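The statement that image scaling changes the entropy of image patches can be probed with a small experiment. The sketch below is only an illustrative proxy, not the measure used in the paper: `downsample` and `patch_entropy` are hypothetical helpers, the image is synthetic, and the quantity computed is the Shannon entropy of coarsely quantized 2x2 patches.

```python
import math
import random


def downsample(img, k):
    """Average non-overlapping k x k blocks (a crude stand-in for viewing
    the same scene from k times farther away)."""
    h, w = len(img) // k, len(img[0]) // k
    return [
        [
            sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k))
            / (k * k)
            for j in range(w)
        ]
        for i in range(h)
    ]


def patch_entropy(img, levels=4):
    """Shannon entropy in bits of 2x2 patches after quantizing intensities
    in [0, 1] to `levels` gray levels."""
    counts = {}
    for i in range(len(img) - 1):
        for j in range(len(img[0]) - 1):
            key = tuple(
                min(levels - 1, int(img[i + a][j + b] * levels))
                for a in (0, 1)
                for b in (0, 1)
            )
            counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Synthetic test image (an assumption): sparse bright pixels on black.
random.seed(0)
image = [[1.0 if random.random() < 0.05 else 0.0 for _ in range(64)]
         for _ in range(64)]
for k in (1, 2, 4):
    print(k, round(patch_entropy(downsample(image, k)), 3))
```

Running the loop on real photographs instead of the synthetic image would be the natural next step for comparing how this proxy moves across scales.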
Learning Hybrid Image Templates (HIT) by Information Projection
- FOR REVIEW: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Cited by 1 (1 self)
This paper presents a novel framework for learning a generative image representation, the hybrid image template (HIT), from a small number (i.e., 3 to 20) of image examples. Each learned template is composed of, typically, 50 to 500 image patches whose geometric attributes (location, scale, orientation) may adapt in a local neighborhood for deformation, and whose appearances are characterized respectively by four types of descriptors: local sketch (edge or bar), texture gradients with orientations, flatness regions, and colors. These heterogeneous patches are automatically ranked and selected from a large pool according to their information gains using an information projection framework. Intuitively, a patch has a higher information gain if (i) its feature statistics are consistent within the training examples and distinctive from the statistics of negative examples (i.e., generic images or examples from other categories); and (ii) its feature statistics have less intra-class variation. The learning process pursues the most informative (for either generative or discriminative purposes) patches one at a time and stops when the information gain is within statistical fluctuation. The template is associated with a well-normalized probability model that integrates the heterogeneous feature statistics. This automated feature selection procedure allows our algorithm to scale up to a wide range of image categories, from those with regular shapes to those with stochastic texture. The learned representation captures the intrinsic characteristics of the object or scene categories. We evaluate the hybrid image templates on several public benchmarks, and demonstrate classification performance on par with state-of-the-art methods like HOG+SVM; when small training sample sizes are used, the proposed system shows a clear advantage.
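The idea of ranking patches by information gain and stopping once the gain is negligible can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's information projection procedure: each candidate is scored independently by the KL divergence between its response histogram on positive and negative examples, and `histogram`, `kl_divergence`, `select_features`, and the `min_gain` threshold are hypothetical names.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized histogram with a tiny pseudo-count to avoid log(0)."""
    counts = [1e-6] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_features(pos, neg, min_gain=0.1, bins=8, lo=0.0, hi=1.0):
    """Rank candidates by gain and keep those above `min_gain`, a stand-in
    for stopping when the gain is within statistical fluctuation."""
    gains = []
    for name in pos:
        p = histogram(pos[name], bins, lo, hi)
        q = histogram(neg[name], bins, lo, hi)
        gains.append((kl_divergence(p, q), name))
    gains.sort(reverse=True)
    return [(name, gain) for gain, name in gains if gain >= min_gain]


# Made-up responses: "edge" separates the classes, "flat" does not.
positives = {"edge": [0.9, 0.85, 0.95, 0.9], "flat": [0.1, 0.5, 0.9, 0.3]}
negatives = {"edge": [0.1, 0.2, 0.15, 0.05], "flat": [0.1, 0.5, 0.9, 0.3]}
selected = select_features(positives, negatives)
print([name for name, _ in selected])
```

A sequential pursuit, as described in the abstract, would additionally update the model after each selection so that redundant patches score lower; the independent scoring here omits that step for brevity.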
Learning Explicit and Implicit Visual Manifolds by Information Projection
Natural images have a vast number of visual patterns distributed in a wide spectrum of subspaces of varying complexities and dimensions. Understanding the characteristics of these subspaces and their compositional structures is of fundamental importance for pattern modeling, learning and recognition. In this paper, we start with small image patches and define two types of atomic subspaces: explicit manifolds of low dimensions for structural primitives and implicit manifolds of high dimensions for stochastic textures. Then we present an information-theoretic learning framework that derives common models for these manifolds through information projection, and study a manifold pursuit algorithm that clusters image patches into those atomic subspaces and ranks them according to their information gains. We further show how those atomic subspaces change over an image scaling process and how they are composed to form larger and more complex image patterns. Finally, we integrate the implicit and explicit manifolds to form a primal sketch model as a generic representation in early vision and to generate a hybrid image template representation for object category recognition in high-level vision. The study of the mathematical structures in the image space sheds light on some basic questions in human vision, such as atomic elements in visual perception, the perceptual metrics in various manifolds, and the perceptual transitions over image scales.
Mapping the Ensemble of Natural Image Patches by Explicit and Implicit Manifolds
Image patches are fundamental elements for object modeling and recognition. However, there has not been a panoramic study in the literature on the structures of the whole ensemble of natural image patches. In this article, we study the mathematical structures of the ensemble of natural image patches and map image patches into two types of subspaces, which we call “explicit manifolds” and “implicit manifolds” respectively. On explicit manifolds, one finds simple and regular image primitives, such as edges, bars, corners and junctions. On implicit manifolds, one finds complex and stochastic image patches, such as textures and clutter. Different perceptual metrics are used on these manifolds. Then we show a unified framework for learning a probabilistic model on the space of patches by pursuing both types of manifolds under a common information-theoretic principle. The connection between the two types of manifolds is realized through image scaling, which changes the entropy of the image patches. The explicit manifolds live in low-entropy regimes while the implicit manifolds live in high-entropy regimes. In experiments, we cluster the natural image patches and compare the two types of manifolds with a common information-theoretic criterion. We also study the transition of the manifolds over scales and show that the complexity peaks in a middle entropy regime where most objects and parts reside.
Deformable Template Combining Alignable and Non-alignable Sketches
This paper proposes a hybrid model for deformable templates which combines alignable and non-alignable sketches. These sketches are subject to slight or considerable translations in different images. For slight translations, Wu et al. [13] proposed the active basis model to capture them, where each sketch is allowed to shift in position and orientation. For larger translations of sketches, [13] assumed that they follow the same distribution as sketches of natural image ensembles, which need not be explicitly modeled. But in fact, for a specified object class, the unaligned sketches follow a totally different distribution from those of natural images. We summarize these sketches by their means in the foreground mask. We treat the mean value in each direction as an independent feature and fit the marginal distributions on the object ensemble and the natural image ensemble using Gaussian distributions. The marginal distributions are combined with Active Basis into a joint probability ratio to distinguish the foreground object from the natural background. Experiments are conducted on 14 object classes, most of which show considerable improvement in ROC.
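The Gaussian marginal fitting and probability ratio described above can be sketched as below. This is a minimal illustration assuming independent per-direction features; it covers only the non-alignable part and leaves out the Active Basis term. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(samples):
    """Fit a 1-D Gaussian; the floor on sigma guards against zero variance."""
    return mean(samples), max(pstdev(samples), 1e-6)


def log_gaussian(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def log_ratio(features, fg_params, bg_params):
    """Sum of per-direction log ratios: object model versus background."""
    return sum(
        log_gaussian(x, *fg) - log_gaussian(x, *bg)
        for x, fg, bg in zip(features, fg_params, bg_params)
    )


# Made-up mean sketch responses in two directions for each ensemble.
object_samples = [[0.8, 0.9, 1.0], [0.1, 0.2, 0.3]]
natural_samples = [[0.4, 0.5, 0.6], [0.4, 0.5, 0.6]]
fg_params = [fit_gaussian(s) for s in object_samples]
bg_params = [fit_gaussian(s) for s in natural_samples]

print(log_ratio([0.9, 0.2], fg_params, bg_params))  # object-like features
print(log_ratio([0.5, 0.5], fg_params, bg_params))  # background-like features
```

A positive score favors the object class and a negative one favors the natural background, so thresholding the score at different values is what would trace out an ROC curve of the kind the abstract reports.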

