Sharing Visual Features for Multiclass And Multiview Object Detection
2004
Cited by 122 (4 self)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.
Image Parsing: Unifying Segmentation, Detection, and Recognition
2005
Cited by 113 (12 self)
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches -- generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters.
Context and hierarchy in a probabilistic image model
In CVPR, 2006
Cited by 37 (0 self)
It is widely conjectured that the excellent ROC performance of biological vision systems is due in large part to the exploitation of context at each of many levels in a part/whole hierarchy. We propose a mathematical framework (a “composition machine”) for constructing probabilistic hierarchical image models, designed to accommodate arbitrary contextual relationships, and we build a demonstration system for reading Massachusetts license plates in an image set collected at Logan Airport. The demonstration system detects and correctly reads more than 98% of the plates, with a negligible rate of false detection. Unlike a formal grammar, the architecture of a composition machine does not exclude the sharing of sub-parts among multiple entities, and does not limit interpretations to single trees (e.g. a scene can have multiple license plates, or no plates at all). In this sense, the architecture is more like a general Bayesian network than a formal grammar. On the other hand, unlike a Bayesian network, the distribution is non-Markovian, and therefore more like a probabilistic context-sensitive grammar. The conceptualization and construction of a composition machine is facilitated by its formulation as the result of a series of non-Markovian perturbations of a “Markov backbone.”
Minimax bounds for active learning
In COLT, 2007
Cited by 31 (3 self)
This paper aims to shed light on achievable limits in active learning. Using minimax analysis techniques, we study the achievable rates of classification error convergence for broad classes of distributions characterized by decision boundary regularity and noise conditions. The results clearly indicate the conditions under which one can expect significant gains through active learning. Furthermore we show that the learning rates derived are tight for “boundary fragment” classes in d-dimensional feature spaces when the feature marginal density is bounded from above and below.
A Coarse-to-Fine Strategy for Multi-Class Shape Detection
2004
Cited by 30 (8 self)
Multi-class shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing a list of instantiations (shape identities and poses) is compiled constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple (“naive Bayes”) statistical model for the edge maps extracted from the original images. The key to constructing efficient “hypothesis tests” for multiple classes and poses is local OR’ing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading. These are analyzed mathematically within the model-based framework and the whole procedure is illustrated by experiments in reading license plates.
Faster rates in regression via active learning
In Proceedings of NIPS, 2005
Cited by 25 (6 self)
In this paper we address the theoretical capabilities of active sampling for estimating functions in noise. Specifically, the problem we consider is that of estimating a function from noisy point-wise samples, that is, the measurements which are collected at various points over the domain of the function. In the classical (passive) setting the sampling locations are chosen a priori, meaning that the choice of the sample locations precedes the gathering of the function observations. In the active sampling setting, on the other hand, the sample locations are chosen in an online fashion: the decision of where to sample next depends on all the observations made up to that point, in the spirit of the twenty questions game (as opposed to passive sampling where all the questions need to be asked before any answers are given). This extra degree of flexibility leads to improved signal reconstruction in comparison to the performance of classical (passive) methods. We present results characterizing the fundamental limits of active learning for various nonparametric function classes, as well as practical algorithms capable of exploiting the extra flexibility of the active setting and provably improving on classical techniques. In particular, significantly faster rates of convergence are achievable in cases involving functions whose complexity (in the Kolmogorov sense) is highly concentrated in small regions of space (e.g., piecewise constant functions). Our active learning theory and methods show promise in a number of applications, including field estimation using wireless sensor networks and fault line detection.
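The passive/active distinction in this abstract can be illustrated with a toy sketch. It is not the paper's algorithm: it assumes a noiseless one-dimensional step function, and the names `step`, `passive_estimate`, and `active_estimate` are hypothetical. With the same budget of n samples, a grid fixed in advance brackets the jump to within about 1/n, while bisection (each location chosen from earlier answers, as in twenty questions) brackets it to within 2^-n.

```python
def step(x, theta=0.3141):
    """Hypothetical noiseless step function: 0 left of theta, 1 from theta on."""
    return 1 if x >= theta else 0


def passive_estimate(f, n):
    """Passive sampling: all n locations are fixed before any value is seen.

    Returns the pair of adjacent grid points that brackets the jump,
    or None if no jump is observed on the grid.
    """
    xs = [i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    for i in range(1, n):
        if ys[i - 1] != ys[i]:
            return xs[i - 1], xs[i]
    return None


def active_estimate(f, n):
    """Active sampling by bisection: each location depends on past answers.

    Assumes f is 0 at the left end of [0, 1] and 1 at the right end.
    """
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if f(mid) == 1:
            hi = mid  # the jump is at or left of mid
        else:
            lo = mid  # the jump is right of mid
    return lo, hi


if __name__ == "__main__":
    n = 21
    p_lo, p_hi = passive_estimate(step, n)
    a_lo, a_hi = active_estimate(step, n)
    print("passive bracket width:", p_hi - p_lo)
    print("active bracket width: ", a_hi - a_lo)
```

With noisy samples, plain bisection is no longer reliable and repeated or probabilistic queries are needed; the sketch only conveys why adaptivity pays off when a function's complexity is concentrated in a small region.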
Face detection - efficient and rank deficient
Advances in Neural Information Processing Systems 17, 2005
Cited by 11 (2 self)
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems.
Coarse-to-fine manifold learning
In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP ’04), 2004
Cited by 9 (4 self)
In this paper we consider a sequential, coarse-to-fine estimation of a piecewise constant function with smooth boundaries. Accurate detection and localization of the boundary (a manifold) is the key aspect of this problem. In general, algorithms capable of achieving optimal performance require exhaustive searches over large dictionaries that grow exponentially with the dimension of the observation domain. The computational burden of the search hinders the use of such techniques in practice, and motivates our work. We consider a sequential, coarse-to-fine approach that involves first examining the data on a coarse grid, and then refining the analysis and approximation in regions of interest. Our estimators involve an almost linear-time (in two dimensions) sequential search over the dictionary, and converge at the same near-optimal rate as estimators based on exhaustive searches. Specifically, for two dimensions, our algorithm requires O(n^(7/6)) operations for an n-pixel image, much less than the traditional wedgelet approaches, which require O(n^(11/6)) operations.
A design principle for coarse-to-fine classification
In Proceedings of CVPR, 2006
Cited by 8 (2 self)
Coarse-to-fine classification is an efficient way of organizing object recognition in order to accommodate a large number of possible hypotheses and to systematically exploit shared attributes and the hierarchical nature of the visual world. The basic structure is a nested representation of the space of hypotheses and a corresponding hierarchy of (binary) classifiers. In existing work, the representation is manually crafted. Here we introduce a design principle for recursively learning the representation and the classifiers together. This also unifies previous work on cascades and tree-structured search. The criterion for deciding when a group of hypotheses should be “retested” (a cascade) versus partitioned into smaller groups (“divide-and-conquer”) is motivated by recent theoretical work on optimal search strategies. The key concept is the cost-to-power ratio of a classifier. The learned hierarchy consists of both linear cascades and branching segments and outperforms manual ones in experiments on face detection.
A Hierarchy of Support Vector Machines for Pattern Detection
Journal of Artificial Intelligence Research, 2006
Cited by 7 (0 self)
We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection.
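The traversal described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: simple interval predicates stand in for the trained SVMs, and `Node`, `interval`, and `detect` are hypothetical names. A subtree is explored only when its node responds positively, so a leaf is reported only if every classifier on the path from the root accepted the input.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One cell of the recursive partition, with its binary classifier."""
    name: str
    test: Callable[[float], bool]  # stand-in for the SVM attached to this cell
    children: List["Node"] = field(default_factory=list)


def interval(lo, hi, children=()):
    """Hypothetical helper: a cell covering [lo, hi) with a trivial classifier."""
    return Node(f"[{lo},{hi})", lambda x: lo <= x < hi, list(children))


def detect(node, x):
    """Coarse-to-fine traversal.

    Returns the names of all leaves reached through an unbroken chain of
    positive responses starting at `node`.
    """
    if not node.test(x):
        return []  # negative response: prune the whole subtree
    if not node.children:
        return [node.name]  # positive chain reached a leaf: a detection
    hits = []
    for child in node.children:
        hits.extend(detect(child, x))
    return hits


# Toy hierarchy: the root covers [0, 8) and is split twice into finer cells.
tree = interval(0, 8, [
    interval(0, 4, [interval(0, 2), interval(2, 4)]),
    interval(4, 8, [interval(4, 6), interval(6, 8)]),
])

if __name__ == "__main__":
    print(detect(tree, 5.0))  # one leaf cell contains 5.0
    print(detect(tree, 9.0))  # rejected at the root, so nothing is explored
```

Most inputs are rejected near the root, which is where the computational saving of a coarse-to-fine design comes from.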

