Learning a distance metric from relative comparisons
- In Proceedings of Neural Information Processing Systems
, 2004
Cited by 80 (0 self)
This paper presents a method for learning a distance metric from relative comparisons such as “A is closer to B than A is to C”. Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents.
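To make the form of such a constraint concrete, here is a minimal sketch. It assumes a squared Euclidean distance with one non-negative weight per dimension; the weights `w`, the helper names, the sample points, and the unit margin are all illustrative assumptions, not the paper's formulation or its SVM-based solver.

```python
def weighted_sq_dist(x, y, w):
    """Squared Euclidean distance with one non-negative weight per dimension."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def satisfies(a, b, c, w, margin=1.0):
    """True if 'a is closer to b than a is to c' holds by at least `margin`."""
    return weighted_sq_dist(a, c, w) - weighted_sq_dist(a, b, w) >= margin


# Hypothetical 2-D points: b differs from a in dimension 0, c in dimension 1.
a, b, c = (0.0, 0.0), (2.0, 0.0), (0.0, 1.0)

uniform = (1.0, 1.0)     # plain squared Euclidean distance
reweighted = (0.1, 2.0)  # down-weights dimension 0, up-weights dimension 1

print(satisfies(a, b, c, uniform))     # comparison violated under uniform weights
print(satisfies(a, b, c, reweighted))  # comparison satisfied after reweighting
```

Under this parametrization the distance is linear in `w`, so each relative comparison becomes a linear inequality on the weights.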
Case Study: Visualizing Sets of Evolutionary Trees
, 2002
Cited by 31 (4 self)
We describe a visualization tool which allows a biologist to explore a large set of hypothetical evolutionary trees. Interacting with such a dataset allows the biologist to identify distinct hypotheses about how different species or organisms evolved, which would not have been clear from traditional analyses. Our system integrates a point-set visualization of the distribution of hypothetical trees with detail views of an individual tree, or of a consensus tree summarizing a subset of trees. Efficient algorithms were required for the key tasks of computing distances between trees, finding consensus trees, and laying out the point-set visualization.
Fast multidimensional scaling through sampling, springs and interpolation
- Information Visualization
, 2003
Cited by 31 (6 self)
The term ‘proximity data’ refers to data sets within which it is possible to assess the similarity of pairs of objects. Multidimensional scaling (MDS) is applied to such data and attempts to map high-dimensional objects onto low-dimensional space through the preservation of these similarity relationships. Standard MDS techniques have in the past suffered from high computational complexity and, as such, could not feasibly be applied to data sets over a few thousand objects in size. Through a novel hybrid approach based upon stochastic sampling, interpolation and spring models, we have designed an algorithm running in O(N√N). Using Chalmers’ 1996 O(N²) spring model as a benchmark for the evaluation of our technique, we compare layout quality and run times using data sets of synthetic and real data. Our algorithm executes significantly faster than Chalmers’ 1996 algorithm, whilst producing superior layouts. In reducing complexity and run time, we allow the visualisation of data sets of previously infeasible size. Our results indicate that our method is a solid foundation for interactive and visual exploration of data.
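Layout quality in MDS is commonly scored with a stress function that compares pairwise distances before and after projection. The sketch below shows one common normalized form. It is an illustration only and is not necessarily the measure used in the paper; the function name and the sample points are hypothetical.

```python
import math


def stress(high, low):
    """Normalized squared error between high- and low-dimensional distances."""
    n = len(high)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # every unordered pair: quadratic in n
            d_high = math.dist(high[i], high[j])
            d_low = math.dist(low[i], low[j])
            num += (d_high - d_low) ** 2
            den += d_high ** 2
    return num / den


# Hypothetical 3-D points that all lie in the plane z = 0.
high = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
faithful = [(0, 0), (3, 4), (6, 8)]   # keeps every pairwise distance
collapsed = [(0, 0), (0, 0), (0, 0)]  # maps every point to the origin

print(stress(high, faithful))
print(stress(high, collapsed))
```

The double loop visits every pair of objects, which is the quadratic cost that sampling-based approaches try to avoid.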
Visualization Methodology for Multidimensional Scaling
- JOURNAL OF CLASSIFICATION
, 2001
"... We discuss interactive techniques for ..."
A Visual Workspace for Constructing Hybrid MDS Algorithms and Coordinating Multiple Views
- INFORMATION VISUALIZATION
, 2003
Cited by 8 (2 self)
Data can be distinguished according to volume, variable types and distribution, and each of these characteristics imposes constraints upon the choice of applicable algorithms for their visualisation. This has led to an abundance of often disparate algorithmic techniques. Previous work has shown that a hybrid algorithmic approach can be successful in addressing the impact of data volume on the feasibility of multidimensional scaling (MDS). This paper presents a system and framework in which a user can easily explore algorithms as well as their hybrid conjunctions and the data flowing through them. Visual programming and a novel algorithmic architecture let the user semi-automatically define data flows and the co-ordination of multiple views of algorithmic and visualisation components. We propose that our approach has two main benefits: significant improvements in run times of MDS algorithms can be achieved, and intermediate views of the data and the visualisation program structure can provide greater insight and control over the visualisation process.
Visualizing Clustering Results
- SIAM International Conference on Data Mining
, 2002
Cited by 7 (2 self)
Non-hierarchical clustering has a long history in numerical taxonomy [13] and machine learning [1] with many applications in fields such as data mining [2], statistical analysis [3] and information retrieval [17]. Clustering involves finding a specific number of subgroups (k) within a set of s observations (data points/objects); each described by d
Visualization and Integration of Protein-Protein Interactions
, 2002
Cited by 4 (0 self)
Contents: Introduction; Why Do We Need Visualization?; Protein Interaction Maps versus Metabolic Pathways; Protein Networks, Protein Complexes, and Dynamic Protein Interactions; Protein-Protein Interactions and Associated Information; Visualization; Relational Visualization ...
Automatic Document Categorization: Interpreting the Performance of Clustering Algorithms
- In Günter, Kruse, Neumann (Eds.): Advances in Artificial Intelligence. LNAI 2821
, 2003
Cited by 3 (0 self)
Clustering a document collection is the current approach to automatically derive underlying document categories. The categorization performance of a document clustering algorithm can be captured by the F-Measure, which quantifies how closely a human-defined categorization has been resembled. However, a bad F-Measure value tells us nothing about the reason why a clustering algorithm performs poorly. Among several possible explanations the most interesting question is the following: Are the implicit assumptions of the clustering algorithm admissible with respect to a document categorization task? Though the use of clustering algorithms for document categorization is widely accepted, no foundation or rationale has been stated for this admissibility question. The paper at hand is devoted to this gap. It presents considerations and a measure to quantify the sensitivity of a clustering process with regard to geometric distortions of the data space. Along with the method of multidimensional scaling, this measure provides an instrument for assessing a clustering algorithm’s adequacy.
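The abstract does not spell out which form of the F-Measure it uses, so the sketch below assumes a widely used one for clusterings. Each human-defined class is matched to the cluster that gives it the best F value, and the results are averaged with weights proportional to class size. The function name and input layout are hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-size-weighted best-match F-Measure of a clustering.

    `classes` and `clusters` are parallel lists of labels, one pair per document.
    """
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue  # no shared documents, so F is zero for this pair
            precision = common / clu_size
            recall = common / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += cls_size / n * best
    return total


reference = ["a", "a", "b", "b"]
print(clustering_f_measure(reference, [0, 0, 1, 1]))  # clusters match classes
print(clustering_f_measure(reference, [0, 0, 0, 0]))  # everything in one cluster
```

A low value from this function says only that the clusters and the reference categories disagree. It does not say why, which is the gap the abstract points to.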
Towards the Application of Classification Techniques to Test and Identify Faults in Multimedia Systems
- Proceedings of the 4th International Conference on Quality Software (QSIC 2004)
, 2004
Cited by 3 (1 self)
The advances in computer and graphic technologies have led to the popular use of multimedia for information exchange. However, multimedia systems are difficult to test. A major reason is that these systems generally exhibit fuzziness in their temporal behaviors. The fuzziness may be caused by the existence of non-deterministic factors in their runtime environments, such as system load and network traffic. It complicates the analysis of test results. The problem is aggravated when a test involves the synchronization of different multimedia streams as well as variations in system loading.
In this paper, we conduct an empirical study on the testing and fault identification of multimedia systems by treating the issue as a classification problem. Typical classification techniques, including Bayesian networks, k-nearest neighbor, and neural networks, are evaluated in experiments using X-Smiles, an open-source multimedia authoring tool supporting the Synchronized Multimedia Integration Language (SMIL). From these experiments, we make a few interesting observations and give plausible explanations based on the geometrical properties of the test results.
Data Visualization Through Graph Drawing
- Comput. Statist
, 2001
Cited by 2 (0 self)
In this paper the problem of visualizing categorical multivariate data sets is considered. By representing the data as the adjacency matrix of an appropriately defined bipartite graph, the problem is transformed to one of graph drawing. A general graph drawing framework is introduced, the corresponding mathematical problem defined and an algorithmic approach for solving the necessary optimization problem discussed. The new approach is illustrated through several examples.
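The abstract does not say how the bipartite graph is defined. The sketch below assumes one natural encoding: one vertex per object, one vertex per (variable, value) pair, and an edge wherever an object takes that value. It builds the object-by-category block of the adjacency matrix. The function name and the sample records are hypothetical.

```python
def bipartite_adjacency(records):
    """Return (category vertices, 0/1 matrix) with one matrix row per object."""
    categories = sorted({(var, val) for rec in records for var, val in rec.items()})
    column = {cat: k for k, cat in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in records]
    for i, rec in enumerate(records):
        for var, val in rec.items():
            matrix[i][column[(var, val)]] = 1  # edge: object i -- category vertex
    return categories, matrix


# Hypothetical categorical data set with two variables.
records = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "S"},
]
categories, matrix = bipartite_adjacency(records)
print(categories)
print(matrix)
```

Objects that share a category value are linked through the same category vertex, so a graph layout computed from this matrix can place them near each other.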

