Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis
- Journal of Machine Learning Research
, 2007
Cited by 124 (12 self)
Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality-preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation, and the solution can be computed simply by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.
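The abstract above says the LFDA embedding follows from a generalized eigenvalue problem. The following is a minimal sketch of that idea, not the authors' implementation. Several details are assumptions that the listing does not give: the function name `lfda_sketch` is hypothetical, the affinity is a fixed-bandwidth Gaussian, the pairwise weights follow a commonly cited LFDA formulation, and a small ridge term `reg` is added for numerical stability.

```python
import numpy as np

def lfda_sketch(X, y, n_components=1, sigma=1.0, reg=1e-6):
    """Simplified LFDA-style projection. X has shape (n, d); y holds n class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Pairwise differences and a fixed-bandwidth Gaussian affinity (assumption).
    diff = X[:, None, :] - X[None, :, :]            # shape (n, n, d)
    A = np.exp(-(diff ** 2).sum(axis=2) / (2.0 * sigma ** 2))

    same = y[:, None] == y[None, :]                 # True for same-class pairs
    # Size of the class of each sample; for same-class pairs both sizes agree.
    n_c = np.array([np.sum(y == label) for label in y], dtype=float)[:, None]

    # Locality-weighted pair weights (assumed formulation, see lead-in).
    W_within = np.where(same, A / n_c, 0.0)
    W_between = np.where(same, A * (1.0 / n - 1.0 / n_c), 1.0 / n)

    # Scatter matrices: 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T
    S_w = 0.5 * np.einsum("ij,ijk,ijl->kl", W_within, diff, diff)
    S_b = 0.5 * np.einsum("ij,ijk,ijl->kl", W_between, diff, diff)
    S_w += reg * np.eye(d)

    # Generalized eigenproblem S_b v = lambda S_w v, solved via S_w^{-1} S_b.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_components]].real      # shape (d, n_components)

# Toy data: two classes separated along the first coordinate.
X = np.array([[0.0, 0], [0.5, 1], [0.0, 2], [0.5, 3],
              [5.0, 0], [5.5, 1], [5.0, 2], [5.5, 3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
T = lfda_sketch(X, y)
```

On this toy data the leading direction lies mostly along the first coordinate, which is the axis that separates the two classes. Projecting with `X @ T` gives the one-dimensional embedding.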
Combining multiple clusterings using evidence accumulation
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 108 (7 self)
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, that is, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: (1) applying different clustering algorithms, and (2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined, based on a voting mechanism, to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
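The voting step described in this abstract can be illustrated with a small sketch. It builds the n × n co-association matrix from a set of partitions and then extracts a final partition. The function names are hypothetical, and cutting a single-link hierarchy at a fixed similarity threshold of 0.5 is an assumption made for brevity; the abstract does not say how the hierarchy is cut.

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1
    return [[v / len(partitions) for v in row] for row in votes]

def single_link_cut(C, threshold=0.5):
    """Single-link agglomeration cut at a similarity threshold.

    This equals the connected components of the graph that links every
    pair whose co-association exceeds the threshold.
    """
    n = len(C)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Three partitions of six objects; label names differ between partitions.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
C = co_association(partitions)
consensus = single_link_cut(C)
```

Because the matrix only records whether two objects share a cluster, the partitions do not need matching label names or the same number of clusters.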
Alignment-free sequence comparison-a review
- Bioinformatics
, 2003
Cited by 102 (8 self)
Motivation: Genetic recombination and, in particular, genetic shuffling are at odds with sequence comparison by alignment, which assumes conservation of contiguity between homologous segments. A variety of theoretical foundations are being used to derive alignment-free methods that overcome this limitation. The formulation of alternative metrics for dissimilarity between sequences and their algorithmic implementations are reviewed. Results: The overwhelming majority of work on alignment-free sequence comparison has taken place in the past two decades, with most reports published in the past 5 years. Two main categories of methods have been proposed: methods based on word (oligomer) frequency, and methods that do not require resolving the sequence with fixed word length segments. The first category is based on the statistics of word frequency, on the distances defined in a Cartesian space defined by the frequency vectors, and on the information content of frequency distribution. The second category includes the use of Kolmogorov complexity and Chaos Theory. Despite their low visibility, alignment-free metrics are in fact already widely used as pre-selection filters for alignment-based querying of large applications. Recent work is furthering their usage as a scale-independent methodology that is capable of recognizing homology when loss of contiguity is beyond the possibility of alignment. Availability: Most of the alignment-free algorithms reviewed were implemented in MATLAB code and are available
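As an illustration of the first category mentioned above (distances between word-frequency vectors), here is a minimal sketch. The function names, the word length k = 2, and the choice of Euclidean distance are assumptions made for the example; the review covers several metrics, and this is not taken from its MATLAB code.

```python
from collections import Counter
import math

def word_frequencies(seq, k):
    """Relative frequency of every overlapping word of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def euclidean_word_distance(a, b, k=2):
    """Euclidean distance between the word-frequency vectors of two sequences."""
    fa, fb = word_frequencies(a, k), word_frequencies(b, k)
    words = set(fa) | set(fb)
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))

# b is a with its two halves swapped; c has the same letters in a different order.
a = "AAAACCCC"
b = "CCCCAAAA"
c = "ACACACAC"
d_shuffled = euclidean_word_distance(a, b)
d_different = euclidean_word_distance(a, c)
```

Swapping the two halves of `a` changes only the words that span the junction, so `d_shuffled` stays small, while `c` has a very different word composition and `d_different` is much larger. This is the sense in which word-frequency methods tolerate loss of contiguity.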
Validating the independent components of neuroimaging time series via clustering and visualization
- NeuroImage
, 2004
An Analysis of Recent Work on Clustering Algorithms
, 1999
Cited by 86 (0 self)
This paper describes four recent papers on clustering, each of which approaches the clustering problem from a different perspective and with different goals. It analyzes the strengths and weaknesses of each approach and describes how a user could decide which algorithm to use for a given clustering application. Finally, it concludes with ideas that could make the selection and use of clustering algorithms for data analysis less difficult.
XClust: Clustering XML Schemas for Effective Integration
, 2002
Cited by 84 (1 self)
It is increasingly important to develop scalable integration techniques for the growing number of XML data sources. A practical starting point for the integration of large numbers of Document Type Definitions (DTDs) of XML sources would be to first find clusters of DTDs that are similar in structure and semantics. Reconciling similar DTDs within such a cluster will be an easier task than reconciling DTDs that are different in structure and semantics as the latter would involve more restructuring. We introduce XClust, a novel integration strategy that involves the clustering of DTDs. A matching algorithm based on the semantics, immediate descendents and leaf-context similarity of DTD elements is developed. Our experiments to integrate real world DTDs demonstrate the effectiveness of the XClust approach.
Cluster Validation Techniques for Genome Expression Data
- Signal Processing
, 2002
Cited by 81 (12 self)
Several clustering algorithms have been suggested to analyse genome expression data, but fewer solutions have been implemented to guide the design of clustering-based experiments and assess the quality of their outcomes. A cluster validity framework provides insights into the problem of predicting the correct number of clusters. This paper presents several validation techniques for gene expression data analysis. Normalisation and validity aggregation strategies are proposed to improve the prediction about the number of relevant clusters. The results obtained indicate that this systematic evaluation approach may significantly support genome expression analyses for knowledge discovery applications.
To carve nature at its joints: On the existence of discrete classes in personality
- Psychological Review
, 1985
Cited by 79 (5 self)
In principle, units of personality may be of two varieties: dimensional variables, which involve continuously distributed differences in degree, and class variables, which involve discretely distributed differences in kind. There exists, however, a prevailing and rarely questioned assumption that the units of personality are continuous dimensions and an accompanying prejudice against class variables. We examine this prejudice, the arguments that generated it, and those that uphold it. We conclude that these arguments are applicable to class variables as they often have been explicated, in phenetic terms; by contrast, genetically explicated class variables are not vulnerable to these arguments. We propose criteria for conjecturing and present methods for corroborating the existence of class variables in personality. Specifically, we test a class model of a construct whose conceptual status makes it reasonable to evaluate whether or not the differences between individuals represented by this construct constitute discrete classes. Finally, we examine the implications for conceptualizing and investigating the nature and origins of personality. As a psychological concept, personality refers to regularities and consistencies in the behavior of individuals and to structures and processes that underlie these regularities and consistencies. Such phenomena, to the extent that they exist, ought to distinguish individuals from other individuals and to render their actions predictable. Typically, in personality theories, these distinguishing features have been treated as comparative individual differences on the assumption that one can mean-
Collaborative learning through computer-mediated communication in academic education
- Institute, University of Maastricht, Maastricht
, 2001
Cited by 75 (1 self)
Abstract: This article reports on three studies that involved undergraduate students collaboratively working on authentic discussion tasks in synchronous and asynchronous Computer Mediated Communication systems (Netmeeting, Belvédère, Allaire Forums). The purposes of the assignments were, respectively, to develop insight and understanding in a theoretical framework, to co-construct meaningful didactics for a computer-based training program, and to develop insight and understanding in educational theories in relation to technology. To examine whether the use of the CMC systems could meet these ends, students' dialogues are characterised in terms of their constructive and argumentative contributions and by their focus on the meaning and application of concepts. In addition, we compare different forms of support, from peer coaching in synchronous discussions and graphical support at the interface to reflective moderation in asynchronous discussions. Although we compare different systems and assignments, the interplay between focused argumentation and constructive contributions allows us to analyse and compare interaction and learning in different CMC systems, with and without forms of pedagogical support.