Parallel sets: Interactive exploration and visual analysis of categorical data.
- IEEE Transactions on Visualization and Computer Graphics, 2006
The concentration of fractional distances
- IEEE Trans. on Knowledge and Data Engineering, 2007
Cited by 52 (2 self)
Abstract—Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the Euclidean distance. When data are high dimensional, however, the Euclidean distances seem to concentrate; all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the Euclidean distance has been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact from a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the Euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented, which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular, the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given. Index Terms—Nearest neighbor search, high-dimensional data, distance concentration, fractional distances.
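The concentration effect described in this abstract can be observed numerically. The following is a minimal sketch, not the paper's own estimator: it assumes i.i.d. uniform data on [0, 1] and measures concentration as the ratio of the standard deviation of the norm to its mean (smaller means more concentrated). The function names `minkowski_norm` and `relative_variance`, the sample size, and the chosen dimensions are all illustrative assumptions.

```python
import random
import statistics


def minkowski_norm(x, p):
    """Minkowski-like norm of x with exponent p.

    For p < 1 this is the 'fractional norm' of the abstract
    (it is not a true norm, since the triangle inequality fails).
    """
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_variance(dim, p, n_points=500, seed=0):
    """Std/mean of the norm over random points (assumed measure of concentration).

    Points are drawn i.i.d. uniform on [0, 1]^dim; this distribution
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    norms = [
        minkowski_norm([rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    return statistics.pstdev(norms) / statistics.mean(norms)


if __name__ == "__main__":
    # The ratio shrinks as the dimension grows, for both exponents.
    for dim in (2, 20, 200):
        for p in (0.5, 2.0):
            print(f"dim={dim:4d}  p={p:3.1f}  std/mean={relative_variance(dim, p):.4f}")
```

Running the script prints the ratio for a fractional exponent (0.5) and the Euclidean exponent (2.0) side by side. Which exponent concentrates less depends on the data distribution, which is the point the abstract makes; this sketch covers only one distribution.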
Analysis Guided Visual Exploration of Multivariate Data
Cited by 24 (4 self)
Visualization systems traditionally focus on graphical representation of information. They tend not to provide integrated analytical services that could aid users in tackling complex knowledge discovery tasks. Users’ exploration in such environments is usually impeded by several problems: 1) valuable information is hard to discover when too much data is visualized on the screen; 2) users have to manage and organize their discoveries offline, because no systematic discovery management mechanism exists; 3) their discoveries based on visual exploration alone may lack accuracy; and 4) they have no convenient access to the important knowledge learned by other users. To tackle these problems, it has been recognized that analytical tools must be introduced into visualization systems. In this paper, we present a novel analysis-guided exploration system, called the Nugget Management System (NMS). It leverages the collaborative effort of human comprehensibility and machine computations to facilitate users’ visual exploration processes. Specifically, NMS first extracts the valuable information (nuggets) hidden in datasets based on the interests of users. Given that similar nuggets may be re-discovered by different users, NMS consolidates the nugget candidate set by clustering based on their semantic similarity. To solve the problem of inaccurate discoveries, localized data mining techniques are applied to refine the nuggets to best represent the captured patterns in datasets. Lastly, the resulting well-organized nugget pool is used to guide users’ exploration. To evaluate the effectiveness of NMS, we integrated NMS into XmdvTool, a freeware multivariate visualization system. User studies were performed to compare the users’ efficiency and accuracy in finishing tasks on real datasets, with and without the help of NMS. Our user studies confirmed the effectiveness of NMS.
An Interactive 3D Integration of Parallel Coordinates and Star Glyphs
- In Proc. IEEE InfoVis, 2005
Cited by 23 (0 self)
Parallel Coordinates are a powerful method for visualizing multidimensional data; however, with large data sets they can become cluttered and difficult to read. On the other hand, a Star Glyph can be used to display either the attributes of a data item or the values across all items for a single attribute. Star Glyphs may readily provide a quick impression; however, since the full data set will require multiple glyphs, overall readings are more difficult. We present an interactive integration of the visual representations of Parallel Coordinates and Star Glyphs that utilizes the advantages of both representations to offset the disadvantages they have separately. We discuss the role of uniform and stepped color scales in the visual comparison of non-adjacent items and Star Glyphs. Our visualization provides capabilities for focus-in-context exploration of the data using two types of lenses, and interactions specific to the 3D space.
DimStiller: Workflows for dimensional analysis and reduction
Cited by 19 (5 self)
DimStiller is a system for dimensionality reduction and analysis. It frames the task of understanding and transforming input dimensions as a series of analysis steps where users transform data tables by chaining together different techniques, called operators, into pipelines of expressions. The individual operators have controls and views that are linked together based on the structure of the expression. Users interact with the operator controls to tune parameter choices, with immediate visual feedback guiding the exploration of local neighborhoods of the space of possible data tables. DimStiller also provides global guidance for navigating data-table space through expression templates called workflows, which permit re-use of common patterns of analysis.
Subspace search and visualization to make sense of alternative clusterings in high-dimensional data
- In Proc. IEEE Symp. on Visual Analytics Science and Technology (VAST), 2012
Cited by 16 (2 self)
In explorative data analysis, the data under consideration often resides in a high-dimensional (HD) data space. Currently many methods are available to analyze this type of data. So far, proposed automatic approaches include dimensionality reduction and cluster analysis, whereby visual-interactive methods aim to provide effective visual mappings to show, relate, and navigate HD data. Furthermore, almost all of these methods conduct the analysis from a singular perspective, meaning that they consider the data in either the original HD data space, or a reduced version thereof. Additionally, HD data spaces often consist of combined features that measure different properties, in which case the particular relationships between the various properties may not be clear to the analysts a priori since it can only be revealed if appropriate feature combinations (subspaces) of the data are taken into consideration.
Value and relation display: Interactive visual exploration of large datasets with hundreds of dimensions
- Submitted to IEEE Transactions on Visualization and Computer Graphics, Visual Analytics Special Issue
Cited by 16 (2 self)
Abstract — Few existing visualization systems can handle large datasets with hundreds of dimensions, since high dimensional datasets cause clutter on the display and long response times in interactive exploration. In this paper, we present a significantly improved multi-dimensional visualization approach named Value and Relation (VaR) display that allows users to effectively and efficiently explore large datasets with several hundred dimensions. In the VaR display, data values and dimension relationships are explicitly visualized in the same display by using dimension glyphs to explicitly represent values in dimensions and glyph layout to explicitly convey dimension relationships. In particular, pixel-oriented techniques and density-based scatterplots are used to create dimension glyphs to convey values. Multi-dimensional scaling, Jigsaw map hierarchy visualization techniques, and an animation metaphor named Rainfall are used to convey relationships among dimensions. A rich set of interaction tools has been provided to allow users to interactively detect patterns of interest in the VaR display. A prototype of the VaR display has been fully implemented. The case studies presented in this paper show how the prototype supports interactive exploration of datasets of several hundred dimensions. A user study evaluating the prototype is also reported in this paper. Index Terms — Multi-dimensional visualization, high dimensional datasets, visual analytics.
Semantic image browser: Bridging information visualization with automated intelligent image analysis
- Proc. IEEE Symposium on Visual Analytics Science and Technology, 2006
Cited by 14 (4 self)
Browsing and retrieving images from large image collections are becoming common and important activities. Recent semantic image analysis techniques, which automatically detect high level semantic contents of images for annotation, are promising solutions to this problem. However, few efforts have been made to convey the annotation results to users in an intuitive manner to enable effective image browsing and retrieval. Methods to monitor and evaluate the automatic image analysis algorithms are also lacking, due to the high dimensional nature of image data, features, and contents. In this paper, we propose a novel, scalable semantic image browser by applying existing information visualization techniques to semantic image analysis. This browser not only allows users to effectively browse and search in large image databases according to semantic content of images, but also allows analysts to evaluate their annotation process through interactive visual exploration. The major visualization components of this browser are Multi-Dimensional Scaling (MDS) based image layout, the Value and Relation (VaR) display that allows effective high dimensional visualization without dimension reduction, and a rich set of interaction tools such as search by sample images and content relationship detection. Our preliminary user study showed that the browser was easy to use and understand, and effective in supporting image browsing and retrieval tasks.
Quantifying and comparing features in high-dimensional datasets
- In IV, 2008
Cited by 11 (4 self)
Linking and brushing is a proven approach to analyzing multi-dimensional datasets in the context of multiple coordinated views. Nevertheless, most of the respective visualization techniques only offer qualitative visual results. Many user tasks, however, also require precise quantitative results as, for example, offered by statistical analysis. As a successor to the useful Rank-by-Feature Framework, this paper describes a joint visual and statistical approach for guiding the user through a high-dimensional dataset by ranking dimensions (1D case) and pairs of dimensions (2D case) according to statistical summaries. While the original Rank-by-Feature Framework is limited to global features, the most important novelty here is the consideration of local features, i.e., data subsets defined by brushing in linked views. The ability to compare subsets to other subsets and subsets to the whole dataset in the context of a large number of dimensions significantly extends the benefits of the approach, especially in later stages of an exploratory data analysis. A case study illustrates the workflow by analyzing counts of keywords for classifying e-mails as spam or no-spam.
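The 1D case of ranking dimensions by a local feature can be sketched as follows. This is an illustration only, not the paper's method: the score used here (shift of the brushed subset's mean from the global mean, in units of the global standard deviation) is an assumed statistic, and the function name `rank_dimensions`, the keyword names, and the counts are hypothetical.

```python
import statistics


def rank_dimensions(rows, selected, names):
    """Rank dimensions by how strongly a brushed subset differs from the whole dataset.

    rows     -- list of records, one list of numbers per record
    selected -- indices of the brushed records (the local feature)
    names    -- one name per dimension
    The score is |mean(subset) - mean(all)| / std(all); this statistic is an
    assumption made for illustration.
    """
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        subset = [rows[i][j] for i in selected]
        spread = statistics.pstdev(column)
        shift = abs(statistics.mean(subset) - statistics.mean(column))
        # A constant dimension cannot separate the subset, so it scores zero.
        scores.append((name, shift / spread if spread > 0 else 0.0))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical keyword counts per e-mail; rows 0 and 1 are the brushed subset.
    names = ["free", "meeting", "offer"]
    rows = [[5, 0, 1], [4, 1, 0], [0, 1, 1], [1, 0, 0]]
    for name, score in rank_dimensions(rows, selected=[0, 1], names=names):
        print(f"{name:8s} {score:.3f}")
```

In this toy data only the "free" count separates the brushed e-mails from the rest, so it ranks first. Comparing one subset against another subset, as the abstract also mentions, would replace the global mean and spread with those of the second subset.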