Results 1  10
of
43
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract

Cited by 1035 (30 self)
 Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the wellknown methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Learnability and the VapnikChervonenkis dimension
, 1989
"... Valiant’s learnability model is extended to learning classes of concepts defined by regions in Euclidean space E”. The methods in this paper lead to a unified treatment of some of Valiant’s results, along with previous results on distributionfree convergence of certain pattern recognition algorith ..."
Abstract

Cited by 727 (22 self)
 Add to MetaCart
Valiant’s learnability model is extended to learning classes of concepts defined by regions in Euclidean space E”. The methods in this paper lead to a unified treatment of some of Valiant’s results, along with previous results on distributionfree convergence of certain pattern recognition algorithms. It is shown that the essential condition for distributionfree learnability is finiteness of the VapnikChervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufftcient conditions are provided for feasible learnability.
Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research
, 1994
"... This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models ..."
Abstract

Cited by 276 (13 self)
 Add to MetaCart
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Theory of classification: A survey of some recent advances
, 2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract

Cited by 96 (3 self)
 Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Hierarchical Discriminant Analysis for Image Retrieval
 IEEE Trans. PAMI
, 1999
"... Abstract—A selforganizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The SelfOrganizing Hierarchical Optimal Subspace Learning and Inference Framework (SHOSLIF) system uses the theories of optimal linear projection for automati ..."
Abstract

Cited by 59 (4 self)
 Add to MetaCart
Abstract—A selforganizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The SelfOrganizing Hierarchical Optimal Subspace Learning and Inference Framework (SHOSLIF) system uses the theories of optimal linear projection for automatic optimal feature derivation and a hierarchical structure to achieve a logarithmic retrieval complexity. A SpaceTessellation Tree is automatically generated using the Most Expressive Features (MEFs) and the Most Discriminating Features (MDFs) at each level of the tree. The major characteristics of the proposed hierarchical discriminant analysis include: 1) avoiding the limitation of global linear features (hyperplanes as separators) by deriving a recursively betterfitted set of features for each of the recursively subdivided sets of training samples; 2) generating a smaller tree whose cell boundaries separate the samples along the class boundaries better than the principal component analysis, thereby giving a better generalization capability (i.e., better recognition rate in a disjoint test); 3) accelerating the retrieval using a tree structure for data pruning, utilizing a different set of discriminant features at each level of the tree. We allow for perturbations in the size and position of objects in the images through learning. We demonstrate the technique on a large image database of widely varying realworld objects taken in natural settings, and show the applicability of the approach for variability in position, size, and 3D orientation. This paper concentrates on the hierarchical partitioning of the feature spaces. Index Terms—Principal component analysis, discriminant analysis, hierarchical image database, image retrieval, tessellation, partitioning, object recognition, face recognition, complexity with large image databases.
Consistency of datadriven histogram methods for density estimation and classification
 Annals of Statistics
, 1996
"... We present general sufficient conditions for the almost sure L1consistency of histogram density estimates based on datadependent partitions. Analogous conditions guarantee the almostsure risk consistency of histogram classification schemes based on datadependent partitions. Multivariate data i ..."
Abstract

Cited by 54 (4 self)
 Add to MetaCart
We present general sufficient conditions for the almost sure L1consistency of histogram density estimates based on datadependent partitions. Analogous conditions guarantee the almostsure risk consistency of histogram classification schemes based on datadependent partitions. Multivariate data is considered throughout. In each case, the desired consistency requires shrinking cells, subexponential growth of a combinatorial complexity measure, and sublinear growth of the number of cells. It is not required that the cells of every partition be rectangles with sides paralles to the coordinate axis, or that each cell contain a minimum number of points. No assumptions are made concerning the common distribution of the training vectors. We apply the results to establish the consistency of several known partitioning estimates, including the knspacing density estimate, classifiers based on statistically equivalent blocks, and classifiers based on multivariate clustering schemes.
Rates of convergence in the source coding theorem, in empirical quantizer design and in universal lossy source coding
 IEEE Transactions on Information Theory
, 1994
"... AbstructRate of convergence results are established for vector quantization. Convergence rates are given for an increasing vector dimension and/or an increasing training set size. In particular, the following results are shown for memoryless realvalued sources with bounded support at transmission r ..."
Abstract

Cited by 53 (7 self)
 Add to MetaCart
(Show Context)
AbstructRate of convergence results are established for vector quantization. Convergence rates are given for an increasing vector dimension and/or an increasing training set size. In particular, the following results are shown for memoryless realvalued sources with bounded support at transmission rate R: (1) If a vector quantizer with fixed dimension k is designed to minimize the empirical meansquare error (MSE) with respect to m training vectors, then its MSE for the true source converges in expectation and almost surely to the minimum possible MSE as O(\llogm/nl; (2) The MSE of an optimal kdimensional vector quantizer for the true source converges, as the dimension grows, to the distortionrate function D(R) as O(J/~); (3) There exists a fixedrate universal lossy source coding scheme whose perletter MSE on n realvalued source samples converges in expectation and almost surely to the
Combinations of Weak Classifiers
, 1997
"... To obtain classification systems with both good generalizatìon performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A ..."
Abstract

Cited by 46 (1 self)
 Add to MetaCart
To obtain classification systems with both good generalizatìon performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They are then combined through a majority vote. As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications. Theoretical analysis on one of the test problems investigated in our experiments provides insights on when and why the proposed method works. In particular, when the strength of weak classifiers is properly chosen, combinations of weak classifiers can achieve a good generalization performance with polynomial space and timecomplexity.
Binary Partitioning, Perceptual Grouping, and Restoration with Semidefinite Programming
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2003
"... We introduce a novel optimization method based on semidefinite programming relaxations to the field of computer vision and apply it to the combinatorial problem of minimizing quadratic functionals in binary decision variables subject to linear constraints. ..."
Abstract

Cited by 38 (6 self)
 Add to MetaCart
(Show Context)
We introduce a novel optimization method based on semidefinite programming relaxations to the field of computer vision and apply it to the combinatorial problem of minimizing quadratic functionals in binary decision variables subject to linear constraints.
Adaptive model selection using empirical complexities
 Annals of Statistics
, 1999
"... Key words and phrases. Complexity regularization, classi cation, pattern recognition, regression estimation, curve tting, minimum description length. 1 Given n independent replicates of a jointly distributed pair (X; Y) 2R d R, we wish to select from a xed sequence of model classes F1; F2;:::a deter ..."
Abstract

Cited by 37 (7 self)
 Add to MetaCart
Key words and phrases. Complexity regularization, classi cation, pattern recognition, regression estimation, curve tting, minimum description length. 1 Given n independent replicates of a jointly distributed pair (X; Y) 2R d R, we wish to select from a xed sequence of model classes F1; F2;:::a deterministic prediction rule f: R d! R whose risk is small. We investigate the possibility of empirically assessing the complexity of each model class, that is, the actual di culty of the estimation problem within each class. The estimated complexities are in turn used to de ne an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The rst is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error.