Curvilinear Component Analysis: A Self-Organizing Neural Network for Nonlinear Mapping of Data Sets
1997
Cited by 121 (1 self)
We present a new strategy called “curvilinear component analysis” (CCA) for dimensionality reduction and representation of multidimensional data sets. The principle of CCA is a self-organized neural network performing two tasks: vector quantization (VQ) of the submanifold in the data set (input space) and nonlinear projection (P) of these quantizing vectors toward an output space, providing a revealing unfolding of the submanifold. After learning, the network has the ability to continuously map any new point from one space into another: forward mapping of new points in the input space, or backward mapping of an arbitrary position in the output space.
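The nonlinear projection stage of CCA can be sketched as follows. This is a minimal illustration that assumes the commonly cited CCA update: a randomly chosen output point y_i is pinned, and every other output point y_j inside a step neighbourhood of width λ is moved along (y_j − y_i) so that its output distance Y_ij approaches the input distance X_ij. The vector quantization stage is omitted, so the input vectors serve directly as prototypes. The linear decay schedules and the names `cca_project`, `alpha0`, and `lam0` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cca_project(X, out_dim=2, epochs=50, alpha0=0.5, lam0=None, seed=0):
    """Map input vectors X to out_dim dimensions with a CCA-style update.

    Hypothetical sketch: alpha and lambda decay linearly, and the
    neighbourhood function is the step F(d, lam) = 1 if d <= lam else 0.
    """
    rng = random.Random(seed)
    n = len(X)
    # Random initial positions in the output space.
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(n)]
    # Input-space distances never change, so compute them once.
    DX = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    if lam0 is None:
        lam0 = max(max(row) for row in DX)  # start with a global neighbourhood
    steps = epochs * n
    for t in range(steps):
        frac = t / steps
        alpha = alpha0 * (1.0 - frac)
        lam = lam0 * (1.0 - frac) + 1e-3
        i = rng.randrange(n)  # y_i stays pinned during this step
        for j in range(n):
            if j == i:
                continue
            dy = dist(Y[i], Y[j])
            if dy == 0.0 or dy > lam:  # outside the output neighbourhood
                continue
            # Move y_j along (y_j - y_i) so that its distance to y_i moves a
            # fraction alpha of the way toward the input distance X_ij.
            scale = alpha * (DX[i][j] - dy) / dy
            Y[j] = [yj + scale * (yj - yi) for yj, yi in zip(Y[j], Y[i])]
    return Y
```

Because each move sets the new distance to (1 − α)·Y_ij + α·X_ij, a single step cannot blow up even when Y_ij is tiny. Shrinking λ over time means that late updates only refine local distances, which is the property that lets CCA unfold a curved submanifold instead of preserving every long-range distance.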
SOM-Based Data Visualization Methods
Intelligent Data Analysis, 1999
Cited by 55 (3 self)
The Self-Organizing Map (SOM) is an efficient tool for visualization of multidimensional numerical data. In this paper, an overview and categorization of both old and new methods for the visualization of SOM is presented. The purpose is to give an idea of what kind of information can be acquired from different presentations and how the SOM can best be utilized in exploratory data visualization. Most of the presented methods can also be applied in the more general case of first making a vector quantization (e.g. k-means) and then a vector projection (e.g. Sammon's mapping).
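The general two-stage case the abstract mentions, vector quantization followed by vector projection, can be sketched with plain Lloyd's k-means and Sammon's mapping. This is an illustrative assumption, not the paper's code: Sammon's mapping is minimized here by ordinary batch gradient descent rather than Sammon's original pseudo-Newton step, the constant factor of the gradient is folded into the learning rate, and the prototypes are assumed to be distinct (all off-diagonal distances positive). The names `kmeans`, `sammon`, `sammon_stress`, and `lr` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(data, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k codebook (prototype) vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the previous center if its cluster is empty
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def sammon_stress(D, Y):
    """Sammon's stress of layout Y against target distances D (D[i][j] > 0)."""
    n = len(Y)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = sum(D[i][j] for i, j in pairs)
    return sum((D[i][j] - dist(Y[i], Y[j])) ** 2 / D[i][j] for i, j in pairs) / c


def sammon(D, dim=2, iters=500, lr=0.05, seed=0):
    """Batch gradient descent on Sammon's stress (constant 2/c folded into lr)."""
    rng = random.Random(seed)
    n = len(D)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        steps = []
        for p in range(n):
            step = [0.0] * dim  # negative gradient for y_p, up to a constant
            for j in range(n):
                if j == p:
                    continue
                d = max(dist(Y[p], Y[j]), 1e-9)
                w = (D[p][j] - d) / (D[p][j] * d)
                step = [s + w * (yp - yj) for s, yp, yj in zip(step, Y[p], Y[j])]
            steps.append(step)
        Y = [[y + lr * s for y, s in zip(Y[p], steps[p])] for p in range(n)]
    return Y
```

A typical call quantizes first with `protos = kmeans(data, k)`, builds `D = [[dist(a, b) for b in protos] for a in protos]`, and then lays out the prototypes with `sammon(D)`. Projecting only the k prototypes, not every sample, keeps the O(k²) projection cost small, which is the same reason SOM codebooks are visualized rather than the raw data.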
Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions
Bioinformatics, 2003
Cited by 37 (1 self)
Motivation: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the ‘curse of dimensionality’: the number of features characterizing these data is in the thousands or tens of thousands. The other is the ‘curse of dataset sparsity’: the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease. Results: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample per feature ratio is increased to the recommended 5–10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several ‘optimal’ feature sets are typically identifiable for sparse datasets, all producing perfect classification results, both for the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these ‘optimal’ feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
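The claim that sparse datasets admit many apparently "optimal" feature sets can be illustrated with a small simulation. This sketch is not from the paper. It assumes purely random features that carry no information about the label and counts how many of them split a tiny labelled sample perfectly with a single threshold. The sample sizes and the names `separates` and `count_perfect_separators` are hypothetical.

```python
import random


def separates(values, labels):
    """True if one threshold on this feature splits the two classes perfectly."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return max(a) < min(b) or max(b) < min(a)


def count_perfect_separators(n_per_class=3, n_features=10_000, seed=1):
    """Count pure-noise features that separate the training labels perfectly."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    count = 0
    for _ in range(n_features):
        # Pure noise: the feature is independent of the label.
        values = [rng.gauss(0.0, 1.0) for _ in labels]
        if separates(values, labels):
            count += 1
    return count
```

With 3 samples per class, a noise feature separates the classes perfectly with probability 2 / C(6, 3) = 0.1, so roughly 1,000 of 10,000 features are expected to look flawless on the training data. With 10 samples per class the probability drops to 2 / C(20, 10), about 1.1 × 10⁻⁵, so only about 0.1 such features are expected.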
Learning as Extraction of Low-Dimensional Representations
Mechanisms of Perceptual Learning, 1996
Cited by 23 (7 self)
Psychophysical findings accumulated over the past several decades indicate that perceptual tasks such as similarity judgment tend to be performed on a low-dimensional representation of the sensory data. Low dimensionality is especially important for learning, as the number of examples required for attaining a given level of performance grows exponentially with the dimensionality of the underlying representation space. In this chapter, we argue that, whereas many perceptual problems are tractable precisely because their intrinsic dimensionality is low, the raw dimensionality of the sensory data is normally high, and must be reduced by a nontrivial computational process, which, in itself, may involve learning. Following a survey of computational techniques for dimensionality reduction, we show that it is possible to learn a low-dimensional representation that captures the intrinsic low-dimensional nature of certain classes of visual objects, thereby facilitating further learning of tasks...
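A common back-of-the-envelope way to make the exponential growth concrete, offered as an illustrative assumption and not as the chapter's own analysis, is grid coverage. If a learner needs about `per_axis` examples along every axis to cover the space at a fixed resolution, a `dim`-dimensional grid needs `per_axis ** dim` examples. `samples_for_grid` is a hypothetical helper.

```python
def samples_for_grid(per_axis: int, dim: int) -> int:
    """Examples needed to keep `per_axis` samples along each of `dim` axes."""
    return per_axis ** dim


if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        print(d, samples_for_grid(10, d))
```

At 10 samples per axis, this prints 10, 100, 1000, and 1000000 for 1, 2, 3, and 6 dimensions. Six dimensions already call for a million examples, which is why learning on a representation of low intrinsic dimensionality matters.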
Linear Feature Extractors Based on Mutual Information
In Proceedings of the 13th International Conference on Pattern Recognition, 1996
Cited by 19 (2 self)
This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
1. Introduction
The capabilities of a classifier are ultimately limited by the quality of the features in each input vector. In particular, when the measurement space is high-dimensional but the number of samples is limited, one is faced with the "curse of dimensionality" problem during training [3]. Feature extraction is often used to alleviate this problem. Although linear feature extractors are ultimately less flexible than the more general non-linear ...
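The abstract does not specify the two extractors, so the sketch below only illustrates the underlying criterion: score a linear projection z = w·x by the mutual information between z and the class label, and prefer the projection that scores highest. It assumes a simple equal-width histogram estimate of I(Z; C) and a brute-force search over unit directions in 2-D; both are illustrative substitutes, not the paper's method. `mutual_information` and `best_direction` are hypothetical names.

```python
import math


def mutual_information(values, labels, bins=10):
    """Histogram estimate of I(Z; C) in bits for a scalar feature Z and labels C."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    n = len(values)
    joint = {}
    for v, c in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)
        joint[(b, c)] = joint.get((b, c), 0) + 1
    pz, pc = {}, {}
    for (b, c), k in joint.items():
        pz[b] = pz.get(b, 0) + k
        pc[c] = pc.get(c, 0) + k
    # p(b, c) / (p(b) p(c)) simplifies to k * n / (count_b * count_c).
    return sum(
        (k / n) * math.log2(k * n / (pz[b] * pc[c]))
        for (b, c), k in joint.items()
    )


def best_direction(X, labels, n_angles=36):
    """Grid search over 2-D unit vectors for the projection with the highest MI."""
    best = None
    for a in range(n_angles):
        # w and -w carry the same information, so half a turn suffices.
        theta = math.pi * a / n_angles
        w = (math.cos(theta), math.sin(theta))
        z = [w[0] * x[0] + w[1] * x[1] for x in X]
        mi = mutual_information(z, labels)
        if best is None or mi > best[0]:
            best = (mi, w)
    return best
```

Unlike PCA, this score depends on the labels; unlike Fisher's LDA, it does not reduce the class structure to means and covariances. A histogram estimate is crude and sensitive to `bins`, so it serves only to make the criterion concrete.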
Linear discriminant analysis for two classes via removal of classification structure
IEEE Trans. on Pattern Analysis and Machine Intelligence, 1997
Cited by 13 (0 self)
Index Terms—Exploratory data analysis, dimension reduction, linear discriminant analysis, discriminant plots, structure removal.
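For orientation, here is a sketch of the classical two-class Fisher discriminant direction, w ∝ S_W⁻¹(m₁ − m₀), where S_W is the pooled within-class scatter matrix and m₀, m₁ are the class means. This is the textbook LDA baseline, not the paper's structure-removal procedure. The code assumes 2-D inputs and a nonsingular S_W, and the helper names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    return [sum(col) / len(points) for col in zip(*points)]


def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s


def fisher_direction(class0, class1):
    """Unit vector along S_W^{-1} (m1 - m0) for 2-D data."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[r][c] + s1[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    # Closed-form inverse of a 2x2 matrix.
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]
```

When both classes are spread widely along x but tightly along y, the direction points almost entirely along y even though the means differ equally in x and y. S_W⁻¹ down-weights the noisy axis, and that is what distinguishes Fisher's direction from simply joining the class means.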
Learning Low Dimensional Representations of Visual Objects With Multiple Use of Prior Knowledge
1997
Cited by 10 (4 self)
Learning to recognize visual objects from examples requires the ability to find meaningful patterns in spaces of very high dimensionality. We present a method for dimensionality reduction which effectively biases the learning system by combining multiple constraints via the use of class labels. The use of multiple class labels steers the resulting low-dimensional representation to become invariant to those directions of variation in the input space that are irrelevant to classification; this is done merely by making class labels independent of these directions. We also show that prior knowledge of the proper dimensionality of the target representation can be imposed by training a multi-layer bottleneck network. Computational experiments involving non-trivial categorization of parameterized fractal images and of human faces indicate that the low-dimensional representation extracted by our method leads to improved generalization in the learned tasks and is likely to preserve the topolo...
A Survey of Methods for Multivariate Data Projection, Visualisation and Interactive Analysis
Cited by 9 (1 self)
In this paper, algorithms for multivariate data projection based on topology- or distance-preserving mappings, as well as tools and techniques for projection display and user interaction, are briefly reviewed and compared within a unifying approach. Advanced mapping algorithms that focus on improved preservation of data structure, following laws of perception as given by Gestalt theory, are introduced, as are advanced features for data visualisation and navigation. These methods help to exploit the remarkable human perceptive and associative capabilities in man/computer dialog, e.g. for visual exploratory data analysis.
Topographic Mappings and Feed-Forward Neural Networks
1996
Cited by 9 (2 self)
This paper represented an extension of work in the earlier paper [Webb 1992], described previously. In contrast to that implementation, the monotonic regression phase was discarded and a radial basis function network was used to effect the transformation, as suggested by Lowe [1993]. A further extension of the procedure was to include a mechanism for discrimination, similar to that employed by Koontz and Fukunaga above. Instead of minimising the standard stress measure, the author employed one of the form
Neural and Statistical Methods for the Visualization of Multidimensional Data
Dissertation, Department of Computer Methods (Katedra Metod Komputerowych), UMK, 2001
Cited by 8 (2 self)
In many fields of engineering science we have to deal with multivariate numerical data. In order to choose the technique that is best suited to a given task, it is necessary to gain insight into the data and to "understand" them. Much of the information needed to understand multivariate data, that is, a description of its global structure and of the presence and shape of clusters or outliers, can be gained through data visualization. Multivariate data visualization can be realized by reducing the data dimensionality, which is often done with well-known mathematical and statistical tools such as Principal Component Analysis or Multidimensional Scaling. Artificial neural networks have been developed and applied mainly in the last two decades, and they are now considered a mature field of research. This thesis investigates existing algorithms as applied to multivariate data visualization. First, an overview of existing neural and statistical techniques for data visualization is presented. Then two chosen algorithms are compared from the point of view of multivariate data visualization: the neural network algorithm is Kohonen's Self-Organizing Maps, and the statistical technique is Multidimensional Scaling. The theoretical and practical advantages and drawbacks of both approaches are brought to light, and the preservation of data topology by these two mapping techniques is discussed. The multidimensional scaling method was analyzed in detail, the importance of each parameter was determined, and the technique was implemented in metric and non-metric versions. Improvements to the algorithm were proposed in order to increase the performance of the mapping process. A graphic...
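Kruskal's stress-1, √(Σ(d_ij − δ_ij)² / Σ d_ij²), is one standard way to score how well a 2-D MDS layout preserves the original dissimilarities. Here d_ij is the distance between points i and j in the layout and δ_ij is the dissimilarity, or disparity, it should reproduce. The sketch below covers only the metric case, where the input dissimilarities serve directly as disparities. Non-metric MDS would first replace them with a monotone regression of the layout distances, which is omitted here. `kruskal_stress1` is a hypothetical helper, not taken from the thesis.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kruskal_stress1(delta, Y):
    """Kruskal's stress-1 of layout Y, metric case (disparities = delta)."""
    n = len(Y)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(Y[i], Y[j])
            num += (d - delta[i][j]) ** 2
            den += d ** 2
    return math.sqrt(num / den)
```

A layout that reproduces the dissimilarities exactly scores 0. Uniformly doubling that layout scores 0.5, because the numerator becomes Σδ² and the denominator 4Σδ². Raw metric stress therefore penalizes scale, which is one reason non-metric MDS compares layout distances against monotone disparities rather than against the raw values.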

