Results 1 - 10 of 20
Rapid understanding of scientific paper collections: integrating statistics, text analysis, and visualization
2011
Cited by 16 (5 self)
Keeping up with rapidly growing research fields, especially when there are multiple interdisciplinary sources, requires substantial effort for researchers, program managers, or venture capital investors. Current theories and tools are directed at finding a paper or website, not gaining an understanding of the key papers, authors, controversies, and hypotheses. This report presents an effort to integrate statistics, text analytics, and visualization in a multiple coordinated window environment that supports exploration. Our prototype system, Action Science Explorer (ASE), provides an environment for demonstrating principles of coordination and conducting iterative usability tests of them with interested and knowledgeable users. We developed an understanding of the value of reference management, statistics, citation text extraction, natural language summarization for single and multiple documents, filters to interactively select key papers, and network visualization to see citation patterns and identify clusters. A three-phase
Bayesian Text Segmentation for Index Term Identification and Keyphrase Extraction
Cited by 7 (0 self)
Automatically extracting terminology and index terms from scientific literature is useful for a variety of digital library, indexing and search applications. This task is non-trivial, complicated by domain-specific terminology and a steady introduction of new terminology. Correctly identifying nested terminology further adds to the challenge. We present a Dirichlet Process (DP) model of word segmentation where multiword segments are either retrieved from a cache or newly generated. We show how this DP-Segmentation model can be used to successfully extract nested terminology, outperforming previous methods for solving this problem.
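The cache-or-generate choice described above matches the Chinese restaurant process view of a Dirichlet Process. The following minimal sketch assumes a plain DP with concentration `alpha` and a caller-supplied base distribution `base_prob` (both hypothetical names). It is not necessarily the paper's exact formulation. A segment's predictive probability combines its cached count with the base probability of generating it afresh.

```python
from collections import Counter


def segment_probability(segment, cache, alpha, base_prob):
    """Predictive probability of a segment under a Dirichlet Process.

    Assumption: a plain DP (Chinese restaurant process) with concentration
    `alpha` and base distribution `base_prob`; the paper's model may differ.
    """
    total = sum(cache.values())              # segments observed so far
    reused = cache[segment]                  # Counter returns 0 for unseen segments
    generated = alpha * base_prob(segment)   # mass for generating a fresh copy
    return (reused + generated) / (total + alpha)


def uniform_base(segment):
    """Hypothetical toy base distribution: every segment gets probability 0.1."""
    return 0.1


cache = Counter({"neural network": 3, "network": 1})
print(segment_probability("neural network", cache, 1.0, uniform_base))  # approximately 0.62
print(segment_probability("deep learning", cache, 1.0, uniform_base))   # approximately 0.02
```

Segments that recur, such as a frequent multiword term, are reused more readily as their counts grow. The parameter `alpha` controls how much probability mass remains for generating new segments.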
Generating Extractive Summaries of Scientific Paradigms
Cited by 6 (1 self)
Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how citations can be used to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient, information-rich sentences. Next, we extend our experiments to summarize a set of papers that cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences, and show that citations have unique information amenable to creating a summary.
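C-LexRank builds on LexRank-style sentence centrality. The following sketch covers only a generic LexRank-like step: power iteration over a thresholded cosine-similarity graph of citation sentences. It omits the community-detection stage that C-LexRank adds. The `threshold`, `damping`, and `iterations` values are assumptions, not the authors' settings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sentences as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """LexRank-style centrality via power iteration on a thresholded similarity graph."""
    n = len(sentences)
    # Self-loops are kept, so every non-empty sentence has at least one edge.
    adj = [[1.0 if cosine(sentences[i], sentences[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence shares its score equally among its neighbours.
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * adj[i][j] / degree[i] for i in range(n) if degree[i])
            for j in range(n)
        ]
    return scores


citations = ["graph summarization citation", "graph models", "citation networks"]
print(lexrank(citations))
```

In this toy input, the first sentence overlaps with both of the others, which do not overlap with each other. The first sentence therefore receives the highest score. An extractive summary would select the top-scoring sentences.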
Edge Weight Regularization Over Multiple Graphs For Similarity Learning
Cited by 6 (2 self)
The growth of the web has directly increased the availability of relational data. One of the key problems in mining such data is computing the similarity between objects with heterogeneous feature types. For example, publications have many heterogeneous features, such as text, citations, authorship information, and venue information. In most approaches, similarity is estimated using each feature type in isolation, and the resulting scores are then combined linearly. However, this approach does not take advantage of the dependencies between the different feature spaces. In this paper, we propose a novel approach to combine the different sources of similarity using a regularization framework over edges in multiple graphs. We show that the objective function induced by the framework is convex. We also propose an efficient algorithm using coordinate descent [1] to solve the optimization problem. We extrinsically evaluate the performance of the proposed unified similarity measure on two different tasks, clustering and classification. The proposed similarity measure outperforms three baselines and a state-of-the-art classification algorithm on a variety of standard, large data sets.
TopicViz: Interactive Topic Exploration in Document Collections
Making the most of bag of words: Sentence regularization with alternating direction method of multipliers.
In Proc. of ICML, 2014
Cited by 4 (1 self)
In many high-dimensional learning problems, only some parts of an observation are important to the prediction task; for example, the cues for correctly categorizing a document may lie in a handful of its sentences. We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. Specifically, we apply the sparse overlapping group lasso with one group for every bundle of features occurring together in a training-data sentence, leading to thousands to millions of overlapping groups. We show how to efficiently solve the resulting optimization problem using the alternating direction method of multipliers. We find that the resulting method significantly outperforms competitive baselines (standard ridge, lasso, and elastic net regularizers) on a suite of real-world text categorization problems.
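One common way to apply ADMM to overlapping groups is to give each group its own copy of the weights it covers. The per-group update then reduces to a proximal step. The following minimal sketch shows that step for the sparse group lasso penalty on a single group copy. It uses the standard composition of elementwise soft-thresholding followed by group-wise shrinkage. The names `lam_l1` and `lam_group` are hypothetical. In an ADMM loop, they would be the penalty weights divided by the ADMM parameter rho. This is not the authors' implementation.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def sparse_group_prox(v, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||z||_1 + lam_group * ||z||_2 for one group.

    Soft-threshold each coordinate first, then shrink the whole group
    toward zero; a group whose norm falls below lam_group is switched off.
    """
    shrunk = soft_threshold(v, lam_l1)
    norm = math.sqrt(sum(x * x for x in shrunk))
    if norm <= lam_group:
        return [0.0] * len(v)
    scale = 1.0 - lam_group / norm
    return [scale * x for x in shrunk]


# Hypothetical ADMM z-update for one sentence group: prox of (weights + scaled dual).
print(sparse_group_prox([3.0, -4.0], 0.0, 1.0))
print(sparse_group_prox([0.5, -0.5], 0.0, 1.0))
```

Zeroing whole groups is what lets the regularizer discard features from uninformative sentences. The L1 term additionally sparsifies features within the groups that are kept.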
TopicViz: Semantic navigation of document collections
In CHI Work-in-Progress Paper (Supplemental Proceedings), 2012
Cited by 3 (1 self)
When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text only at the surface level. In this paper we show how topic modeling, a technique for identifying latent themes across large collections of documents, can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered on a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections.
Modeling Scientific Impact with Topical Influence Regression
Cited by 3 (1 self)
When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.
Understanding evolution of research themes: a probabilistic generative model for citations
In KDD’13, 2013
Cited by 3 (1 self)
Understanding how research themes evolve over time in a research community is useful in many ways (e.g., revealing important milestones and discovering emerging major research trends). In this paper, we propose a novel way of analyzing literature citations to explore research topics and theme evolution by modeling article citation relations with a probabilistic generative model. The key idea is to represent a research paper by a “bag of citations” and model such a “citation document” with a probabilistic topic model. We explore the extension of a particular topic model, Latent Dirichlet Allocation (LDA), for citation analysis, and show that such a Citation-LDA can facilitate the discovery of individual research topics as well as the theme evolution from multiple related topics, both of which in turn lead to the construction of evolution graphs for characterizing research themes. We test the proposed Citation-LDA on two datasets, the ACL Anthology Network (AAN) of natural language research literature and the PubMed Central (PMC) archive of biomedical and life sciences literature, and demonstrate that Citation-LDA can effectively discover the evolution of research themes, with better-formed topics than (conventional) Content-LDA.
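The “bag of citations” idea replaces a paper's words with the identifiers of the papers it cites. The following minimal sketch shows that preprocessing step. It assumes the citation graph is available as a hypothetical `{paper_id: [cited_ids]}` dictionary. The output is a sparse list of `(term_id, count)` pairs per paper, similar to the corpus format that topic-modeling libraries such as gensim accept. An LDA implementation could consume these lists directly. The Citation-LDA extensions for tracking theme evolution are not shown.

```python
from collections import Counter


def bag_of_citations(citation_graph):
    """Build "citation documents" from a {paper_id: [cited_ids]} mapping.

    Each cited paper plays the role of a vocabulary word, so every citing
    paper becomes a sparse list of (term_id, count) pairs.
    """
    vocab = sorted({cited for refs in citation_graph.values() for cited in refs})
    index = {cited: i for i, cited in enumerate(vocab)}
    docs = {
        paper: sorted((index[cited], n) for cited, n in Counter(refs).items())
        for paper, refs in citation_graph.items()
    }
    return vocab, docs


graph = {"P1": ["A", "B"], "P2": ["B", "C"]}  # hypothetical toy citation graph
vocab, docs = bag_of_citations(graph)
print(vocab)        # ['A', 'B', 'C']
print(docs["P2"])   # [(1, 1), (2, 1)]
```

Because topics are then distributions over cited papers rather than words, two papers can share a topic even when they use different terminology.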
Large-scale examination of academic publications using statistical models
In International Working Conference on Advanced Visual Interfaces (AVI): Workshop on Supporting Asynchronous Collaboration in Visual Analytics Systems, 2012
Cited by 1 (1 self)
We describe our experiences in three collaborative visual analytics projects. The projects center on large-scale examination of academic publications using statistical models. Each project involves a multidisciplinary team of social scientists,