Results 1 - 3 of 3
ParallelTopics: A Probabilistic Approach to Exploring Document Collections
Abstract
Cited by 1 (0 self)
Scalable and effective analysis of large text corpora remains a challenging problem as our ability to collect textual data continues to increase at an exponential rate. To help users make sense of large text corpora, we present a novel visual analytics system, ParallelTopics, which integrates a state-of-the-art probabilistic topic model, Latent Dirichlet Allocation (LDA), with interactive visualization. To describe a corpus of documents, ParallelTopics first extracts a set of semantically meaningful topics using LDA. Unlike most traditional clustering techniques, in which a document is assigned to a specific cluster, the LDA model accounts for different topical aspects of each individual document. This permits effective full-text analysis of larger documents that may contain multiple topics. To highlight this property of the model, ParallelTopics utilizes the parallel coordinate metaphor to present the probabilistic distribution of a document across topics. Such a representation allows users to discover single-topic vs. multi-topic documents and the relative importance of each topic to a document of interest. In addition, since
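The distinction between single-topic and multi-topic documents mentioned in this abstract can be illustrated with a small sketch. It assumes a document-topic probability matrix is already available (for example, from a fitted LDA model). The `dominance` cutoff, the function names, and the sample numbers are hypothetical and are not taken from the ParallelTopics paper.

```python
import math


def topic_entropy(dist):
    """Shannon entropy (in bits) of one document's topic distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def classify_documents(doc_topic, dominance=0.8):
    """Label each document 'single-topic' or 'multi-topic'.

    doc_topic: list of per-document topic probabilities (each row sums to 1).
    dominance: hypothetical cutoff (an assumption, not from the paper); a
               document counts as single-topic when its largest topic
               probability is at least this value.
    """
    labels = []
    for dist in doc_topic:
        if abs(sum(dist) - 1.0) > 1e-6:
            raise ValueError("each row must sum to 1")
        labels.append("single-topic" if max(dist) >= dominance else "multi-topic")
    return labels


# Made-up distributions over three topics, for illustration only.
doc_topic = [
    [0.90, 0.05, 0.05],  # mass concentrated on one topic
    [0.40, 0.35, 0.25],  # mass spread over several topics
]
labels = classify_documents(doc_topic)
entropies = [topic_entropy(d) for d in doc_topic]
```

Entropy gives a threshold-free alternative to the cutoff: a document whose probability mass is spread across several topics has higher entropy than one dominated by a single topic.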
Visualizing Topic Models
Abstract
Cited by 1 (0 self)
Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method that learns the underlying themes in a large collection of otherwise unorganized documents. This discovered structure summarizes and organizes the documents. However, topic models are high-level statistical tools—a user must scrutinize numerical distributions to understand and explore their results. In this paper, we present a method for visualizing topic models. Our method creates a navigator of the documents, allowing users to explore the hidden structure that a topic model discovers. These browsing interfaces reveal
MDS-Based Multiresolution Nonlinear Dimensionality Reduction Model for Color Image Segmentation
Abstract
In this paper, we present an efficient coarse-to-fine multiresolution framework for multidimensional scaling and demonstrate its performance on a large-scale nonlinear dimensionality reduction and embedding problem in a texture feature extraction step for the unsupervised image segmentation problem. We demonstrate both the efficiency of our multiresolution algorithm and its usefulness for learning a nonlinear low-dimensional representation of the texture feature set of an image, which can subsequently be exploited in a simple clustering-based segmentation algorithm. The resulting segmentation procedure has been successfully applied to the Berkeley image database, demonstrating its efficiency compared to the best existing state-of-the-art segmentation methods recently proposed in the literature.
Index Terms: Berkeley image database, color textured image, multidimensional scaling, multiresolution optimization, nonlinear dimensionality reduction, probability rand index, unsupervised image segmentation.
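For readers unfamiliar with multidimensional scaling, the sketch below computes the raw stress of an embedding, a common objective that MDS methods try to minimize. The abstract does not state the exact objective or the multiresolution optimization used in the paper, so this is only a generic illustration. The function name and the sample data are assumptions.

```python
import math


def raw_stress(dissimilarities, embedding):
    """Raw MDS stress: sum over pairs i < j of (d_ij - ||x_i - x_j||)^2.

    dissimilarities: symmetric n x n matrix of target dissimilarities.
    embedding: list of n low-dimensional points (lists of floats).
    """
    n = len(embedding)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(embedding[i], embedding[j])
            total += (dissimilarities[i][j] - dist) ** 2
    return total


# Illustrative data: three points lying on a line at positions 0, 1 and 3.
d = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
perfect = [[0.0], [1.0], [3.0]]    # reproduces every dissimilarity exactly
collapsed = [[0.0], [0.0], [0.0]]  # all points mapped to the same place

stress_perfect = raw_stress(d, perfect)
stress_collapsed = raw_stress(d, collapsed)
```

An embedding that preserves all pairwise dissimilarities has zero stress, while a degenerate embedding that collapses every point to one location is penalized by the sum of the squared dissimilarities.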

