Content-based multimedia information retrieval: State of the art and challenges
- ACM Trans. Multimedia Comput. Commun. Appl., 2006
Cited by 311 (12 self)
Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.
Efficient feature selection via analysis of relevance and redundancy
- Journal of Machine Learning Research, 2004
Cited by 209 (3 self)
Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness in comparison with representative methods.
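The decoupling of relevance analysis from redundancy analysis described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes absolute Pearson correlation as the correlation measure, and the function name `select_features` and the `relevance_threshold` parameter are hypothetical choices made only for this illustration.

```python
import numpy as np

def select_features(X, y, relevance_threshold=0.1):
    """Greedy relevance-then-redundancy filter (illustrative sketch only)."""
    n_features = X.shape[1]
    # Relevance step: absolute Pearson correlation of each feature with the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Keep features above the threshold, ordered from most to least relevant.
    candidates = [
        int(j) for j in np.argsort(-relevance) if relevance[j] >= relevance_threshold
    ]
    selected = []
    for j in candidates:
        # Redundancy step: drop feature j when an already selected feature
        # correlates with it at least as strongly as j correlates with the target.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= relevance[j]
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected
```

The two steps stay separate on purpose: a feature can pass the relevance test and still be discarded because a stronger, already selected feature carries the same information.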
Feature selection for unsupervised learning
- Journal of Machine Learning Research, 2004
Cited by 146 (4 self)
In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs on the dimensionality biases of these feature criteria, and present a cross-projection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need for addressing these two issues, and the effectiveness of our proposed solutions.
Simultaneous feature selection and clustering using mixture models
- IEEE Trans. Pattern Anal. Mach. Intell., 2004
Cited by 122 (1 self)
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
Video behaviour profiling and abnormality detection without manual labelling
- IEEE International Conference on Computer Vision, 2005
Cited by 57 (6 self)
A novel framework is developed for automatic behaviour profiling and abnormality sampling/detection without any manual labelling of the training dataset. Natural grouping of behaviour patterns is discovered through unsupervised model selection and feature selection on the eigenvectors of a normalised affinity matrix. Our experiments demonstrate that a behaviour model trained using an unlabelled dataset is superior to those trained using the same but labelled dataset in detecting abnormality from an unseen video.
Video behavior profiling for anomaly detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008
Cited by 57 (10 self)
This paper aims to address the problem of modeling video behavior captured in surveillance videos for the applications of online normal behavior recognition and anomaly detection. A novel framework is developed for automatic behavior profiling and online anomaly sampling/detection without any manual labeling of the training data set. The framework consists of the following key components: 1) A compact and effective behavior representation method is developed based on discrete-scene event detection. The similarity between behavior patterns is measured based on modeling each pattern using a Dynamic Bayesian Network (DBN). 2) The natural grouping of behavior patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalized affinity matrix. 3) A composite generative behavior model is constructed that is capable of generalizing from a small training set to accommodate variations in unseen normal behavior patterns. 4) A runtime accumulative anomaly measure is introduced to detect abnormal behavior, whereas normal behavior patterns are recognized when sufficient visual evidence has become available based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behavior recognition in the shortest possible time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy and sparse data sets collected from both indoor and outdoor surveillance scenarios. In particular, it is shown that a behavior model trained using an unlabeled data set is superior to those trained using the same but labeled data set in detecting anomalies from an unseen video. The experiments also suggest that our online LRT-based behavior recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behavior classes observed online.
Modeling semantic aspects for cross-media image indexing
- 2007
Cited by 45 (5 self)
To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, which then allows the automatic creation of semantic indices for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis (PLSA) model for annotated images, and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, which allows the definition of the latent space to be constrained on either the visual or the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods
Medical image categorization and retrieval for PACS using the GMM-KL framework
- IEEE Trans Inf Technol Biomed 11(2):190–202, 2007
Cited by 35 (4 self)
This work presents an image representation and matching framework for image categorization in medical image archives. Categorization makes it possible to determine automatically, based on the image content, the examined body region and imaging modality. It is a basic step in content-based image retrieval (CBIR) systems, the goal of which is to augment text-based search with visual information analysis. CBIR systems are currently being integrated with Picture-Archiving and Communication Systems (PACS) for increasing the overall search capabilities and tools available to radiologists. The proposed methodology comprises a continuous and probabilistic image representation scheme using Gaussian mixture modeling (GMM) along with information-theoretic image matching via the Kullback-Leibler (KL) measure. The GMM-KL framework is used for matching and categorizing x-ray images by body regions. A multi-dimensional feature space is used to represent the image input, including intensity, texture and spatial information. Unsupervised clustering via the GMM is used to extract coherent regions in feature space which are then used in the matching process. A dominant characteristic of the radiological images is their poor contrast and large intensity variations. This presents a challenge to matching between images and is handled via an illumination-invariant representation. The GMM-KL framework is evaluated for image categorization and image retrieval on a dataset of 1500 radiological images. A classification rate of 97.5% was achieved. The classification results compare favorably with reported global and local representation schemes. Precision vs. Recall curves indicate a strong retrieval result as compared with
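To make the KL measure mentioned in this abstract concrete, the sketch below computes the closed-form KL divergence between two single Gaussians. It is only a building block: the KL divergence between two full Gaussian mixtures has no closed form in general, and the abstract does not say how the authors approximate it, so the function `gaussian_kl` is an illustrative assumption rather than the paper's matching procedure.

```python
import numpy as np

def gaussian_kl(mean0, cov0, mean1, cov1):
    """KL(N0 || N1) between two multivariate Gaussians, in nats."""
    mean0 = np.atleast_1d(mean0).astype(float)
    mean1 = np.atleast_1d(mean1).astype(float)
    cov0 = np.atleast_2d(cov0).astype(float)
    cov1 = np.atleast_2d(cov1).astype(float)
    k = mean0.shape[0]                      # dimensionality of the feature space
    cov1_inv = np.linalg.inv(cov1)
    diff = mean1 - mean0
    # slogdet returns (sign, log|det|); it is more stable than log(det(...)).
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0)           # covariance mismatch term
        + diff @ cov1_inv @ diff            # Mahalanobis distance between means
        - k
        + logdet1 - logdet0                 # log ratio of determinants
    )
```

Note that the divergence is asymmetric: `gaussian_kl(a, A, b, B)` generally differs from `gaussian_kl(b, B, a, A)`, which matters when it is used as a matching score between a query image and an archive image.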
Spectral clustering with eigenvector selection
- Pattern Recognition, 2008
Cited by 24 (3 self)
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) How to automatically determine the number of clusters? and (2) How to perform effective clustering given noisy and sparse data? An analysis of the characteristics of eigenspace is carried out which shows that (a) Not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) Eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) The corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal natural grouping of the input data/patterns even given sparse and noisy data.
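The starting point this abstract refers to, eigenvectors of a normalized affinity matrix, can be sketched as follows. The sketch assumes a Gaussian affinity with a hypothetical bandwidth parameter `sigma` and the symmetric normalization D^(-1/2) A D^(-1/2); it deliberately stops before the paper's relevance learning step, whose details the abstract does not give.

```python
import numpy as np

def spectral_embedding(points, sigma=1.0, n_vectors=2):
    """Return the leading eigenvalues and eigenvectors of a normalized affinity matrix."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]            # treat a flat list as 1-D samples
    # Pairwise squared Euclidean distances between all samples.
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian affinity (an assumption of this sketch).
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Symmetric normalization: D^(-1/2) A D^(-1/2), with D the diagonal degree matrix.
    inv_sqrt_degree = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(normalized)
    order = np.argsort(eigenvalues)[::-1][:n_vectors]
    return eigenvalues[order], eigenvectors[:, order]
```

For two well separated groups, such as `spectral_embedding([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])`, the second eigenvector takes opposite signs on the two groups. On noisy data some eigenvectors carry no such structure, which is the situation the eigenvector selection described above is meant to handle.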
Constraint Score: A new filter method for feature selection with pairwise constraints
- Pattern Recognition, 2008
Cited by 22 (1 self)
Feature selection is an important preprocessing step in mining high-dimensional data. Generally, supervised feature selection methods with supervision information are superior to unsupervised ones without supervision information. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form of supervision information for feature selection, i.e. pairwise constraints, which specify whether a pair of data samples belong to the same class (must-link constraints) or different classes (cannot-link constraints). Pairwise constraints arise naturally in many tasks and are more practical and inexpensive than class labels. This topic has not yet been addressed in feature selection research. We call our pairwise-constraint-guided feature selection algorithm Constraint Score and compare it with the well-known Fisher Score and Laplacian Score algorithms. Experiments are carried out on several high-dimensional UCI and face data sets. Experimental results show that, with very few pairwise constraints, Constraint Score achieves similar or even higher performance than Fisher Score with full class labels on the whole training data, and significantly outperforms Laplacian Score.
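A filter score driven by pairwise constraints can be sketched as a per-feature ratio: the spread of a feature across must-link pairs divided by its spread across cannot-link pairs, so that lower values indicate better features. The abstract does not state the formula, so this ratio form, the function name `constraint_scores`, and the `eps` guard are assumptions made for illustration and may differ from the paper's definition.

```python
import numpy as np

def constraint_scores(X, must_link, cannot_link, eps=1e-12):
    """Score each feature from pairwise constraints; lower means more discriminative."""
    X = np.asarray(X, dtype=float)
    ml = np.asarray(must_link)              # rows of (i, j) sample indices, same class
    cl = np.asarray(cannot_link)            # rows of (i, j) sample indices, different classes
    # Squared per-feature differences summed over must-link pairs (should be small).
    ml_spread = ((X[ml[:, 0]] - X[ml[:, 1]]) ** 2).sum(axis=0)
    # Squared per-feature differences summed over cannot-link pairs (should be large).
    cl_spread = ((X[cl[:, 0]] - X[cl[:, 1]]) ** 2).sum(axis=0)
    # eps guards against division by zero when a feature is constant on cannot-link pairs.
    return ml_spread / np.maximum(cl_spread, eps)
```

Ranking features by ascending score and keeping the first few gives a filter-style selection that needs only a handful of constraint pairs rather than a label for every sample.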