Results 1 - 10 of 12
Probabilistic Author-Topic Models for Information Discovery
- The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2004
Cited by 85 (8 self)
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of each author's topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.
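The two-stage generative story described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it only samples synthetic documents and does not implement the Markov chain Monte Carlo learning step. The uniform choice of a co-author for each word, the symmetric Dirichlet priors, the sizes, and the names `sample_dirichlet` and `generate_document` are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(authors, theta, phi, length, rng):
    """Sample one document from an author-topic generative story (illustrative sketch).

    authors: ids of the paper's co-authors
    theta:   author id -> probability distribution over topics
    phi:     one probability distribution over word ids per topic
    Returns (author, topic, word) triples, one per word position.
    """
    doc = []
    for _ in range(length):
        author = rng.choice(authors)  # assumption: each co-author equally likely per word
        topic = rng.choices(range(len(phi)), weights=theta[author])[0]  # stage 1: author -> topic
        word = rng.choices(range(len(phi[topic])), weights=phi[topic])[0]  # stage 2: topic -> word
        doc.append((author, topic, word))
    return doc


if __name__ == "__main__":
    rng = random.Random(0)
    num_topics, vocab_size = 3, 8  # hypothetical sizes for illustration
    theta = {a: sample_dirichlet(0.5, num_topics, rng) for a in ("alice", "bob")}
    phi = [sample_dirichlet(0.1, vocab_size, rng) for _ in range(num_topics)]
    print(generate_document(["alice", "bob"], theta, phi, length=10, rng=rng))
```

Because each word first picks a co-author and then a topic from that author's distribution, the topic distribution of a multi-author paper is an equal-weight mixture of its authors' topic distributions under this assumption.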
CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature
- Journal of the American Society for Information Science and Technology
, 2006
Cited by 53 (14 self)
This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science – research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature – an evolving network of scientific publications cited by research front concepts. Kleinberg’s burst detection algorithm is adapted to identify emergent research front concepts. Freeman’s betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are: 1) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, 2) the value of a co-citation cluster is explicitly interpreted in terms of research front concepts and 3) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
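CiteSpace II uses Freeman's betweenness centrality to flag potential pivotal points. As a rough sketch only, the score for an unweighted, undirected network can be computed with Brandes's algorithm, one standard way to obtain Freeman's measure; the abstract does not say how CiteSpace II computes it, and the function name and adjacency-dict input format below are assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized Freeman betweenness for an unweighted, undirected graph (Brandes-style).

    adj: dict mapping each node to its neighbours (every edge listed in both directions).
    Returns a dict node -> betweenness score.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes in order of non-increasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


if __name__ == "__main__":
    # Two triangles {a, b, c} and {d, e, f} joined by the single edge c-d.
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    print(betweenness_centrality(adj))  # c and d score 6.0; every other node scores 0.0
```

In the toy network, the two nodes on the bridge between the clusters are the only ones with non-zero scores, which is the intuition behind using betweenness to highlight pivotal points between co-citation clusters.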
The centrality of pivotal points in the evolution of scientific networks
- In Proceedings of the International Conference on Intelligent User Interfaces (IUI 2005)
, 2005
Cited by 11 (4 self)
Learning Author-Topic Models from Text Corpora
- ACM Transactions on Information Systems
, 2008
Cited by 7 (2 self)
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems (NIPS) Conferences, and 121,000 emails from the Enron corporation. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based on perplexity scores for test documents and precision-recall for document retrieval are used to illustrate systematic differences between the proposed author-topic model and a number of alternatives. Extensions to the model, allowing (for example) generalizations of the notion of an author, are also briefly discussed.
Learning Author Topic Models from Text Corpora
, 2005
Cited by 5 (0 self)
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems Conference (NIPS), and 121,000 emails from a large corporation. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based ...
Constrained Simultaneous and Near-simultaneous Embeddings
, 2007
Cited by 5 (1 self)
A geometric simultaneous embedding of two graphs G1 = (V1, E1) and G2 = (V2, E2) with a bijective mapping of their vertex sets γ: V1 → V2 is a pair of planar straight-line drawings Γ1 of G1 and Γ2 of G2 such that each vertex v2 = γ(v1) is mapped in Γ2 to the same point where v1 is mapped in Γ1, where v1 ∈ V1 and v2 ∈ V2. In this paper we examine several constrained versions and a relaxed version of the geometric simultaneous embedding problem. We show that assuming the input graphs share no common edges does not seem to yield large classes of graphs that can be simultaneously embedded. Further, if a prescribed combinatorial embedding must be preserved for each input graph, then we can answer some of the problems that are still open for geometric simultaneous embedding. Finally, we present some positive and negative results on the near-simultaneous embedding problem, in which vertices are not forced to be placed at exactly the same point in the different drawings, but only at “near” points.
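The definition above can be turned into a checker for a candidate placement. The sketch below is a hypothetical helper, not taken from the paper: it draws G2 by placing γ(v) at the same point as v, then verifies that both straight-line drawings are planar (distinct vertex positions, no vertex lying on a non-incident edge, no two non-adjacent edges touching or crossing). It assumes simple graphs and integer coordinates so the orientation tests are exact, and it only verifies a given placement; it does not search for one.

```python
from itertools import combinations


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def on_segment(p, q, r):
    """True if point r lies on the closed segment p-q."""
    return (orient(p, q, r) == 0
            and min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if the closed segments p1-p2 and p3-p4 share at least one point."""
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Remaining cases: an endpoint of one segment touches or overlaps the other.
    return (on_segment(p3, p4, p1) or on_segment(p3, p4, p2)
            or on_segment(p1, p2, p3) or on_segment(p1, p2, p4))


def is_planar_straight_line(edges, pos):
    """True if drawing each edge (u, v) of a simple graph as segment pos[u]-pos[v] is planar."""
    if len(set(pos.values())) != len(pos):
        return False  # two vertices share a point
    for u, v in edges:
        for x in pos:
            if x not in (u, v) and on_segment(pos[u], pos[v], pos[x]):
                return False  # edge passes through a non-incident vertex
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b}.intersection((c, d)):
            continue  # adjacent edges meet at their shared endpoint; overlaps were caught above
        if segments_intersect(pos[a], pos[b], pos[c], pos[d]):
            return False
    return True


def is_geometric_simultaneous_embedding(edges1, edges2, gamma, pos1):
    """Check placement pos1 of V1, drawing gamma[v] at pos1[v] in the drawing of G2."""
    pos2 = {gamma[v]: p for v, p in pos1.items()}
    return is_planar_straight_line(edges1, pos1) and is_planar_straight_line(edges2, pos2)


if __name__ == "__main__":
    pos = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
    gamma = {1: "a", 2: "b", 3: "c", 4: "d"}
    cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
    print(is_geometric_simultaneous_embedding(cycle, [("a", "c")], gamma, pos))  # True
    print(is_geometric_simultaneous_embedding(cycle, [("a", "c"), ("b", "d")], gamma, pos))  # False
```

In the usage example the same four points serve both graphs; adding the second diagonal to G2 makes its drawing non-planar, so the shared placement is rejected even though G1's drawing is still planar.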
Measuring the Movement of a Research Paradigm
Cited by 4 (0 self)
A system for visualizing and analyzing the evolution of the Web with a time series of graphs
- In International Workshop on Automatic Face and Gesture Recognition
, 2005
Cited by 3 (0 self)
We propose WebRelievo, a system for visualizing and analyzing the evolution of the web structure based on a large Web archive with a series of snapshots. It visualizes the evolution with a time series of graphs, in which nodes are web pages and edges are relationships between pages. Graphs can be clustered to give an overview of changes in graphs. WebRelievo aligns these graphs according to their time and automatically determines their layout, keeping the positions of nodes synchronized over time so that the user can keep track of pages and clusters. This visualization enables us to understand when pages appeared, how their relationships have evolved, and how clusters are merged and split over time. The current implementation of WebRelievo is based on six Japanese web archives crawled from 1999 to 2003. The user can interactively browse those graphs by changing the focused page and by changing the layouts of graphs. Using WebRelievo, we can answer historical questions and investigate changes in trends on the Web. We show the feasibility of WebRelievo by applying it to tracking trends in P2P systems and search engines for mobile phones, and to investigating link spamming.
Interactively Visualizing Dynamic Social Networks with
Cited by 2 (0 self)
The dynamic social network visualizer “DySoN” (Dynamic Social Networks) aims at understanding patterns and structural changes in dynamic social networks that evolve over time via an interactive visualization approach. As an alternative and supplement to the numerous other approaches to visualizing social network data, and as an attempt to overcome some of their drawbacks, DySoN visualizes streaming event data of social interactions as an interactive three-dimensional model of interpolated NURBS “tubes”, representing activity and social proximity within a given set of actors during a given time period through three dimensions of temporal information mapping: spatial density (tube distance), tube color, and tube diameter. To evaluate the approach, we use a self-assembled large collaboration network of jazz musicians with a straightforward semantics for computing relation strengths. We also discuss applications of the concept to awareness services in mobile peer-to-peer social networks, which exhibit vivid, measurable social micro-dynamics in time and space.
INCREASING UNDERGRADUATE INVOLVEMENT IN
Current undergraduate Computer Science curricula are generally built around a set of traditional lecture-oriented courses in which the student is a passive recipient of knowledge. While easy to implement, such a model has the drawback of presenting the field as a static corpus of facts and techniques. It does little to challenge and engage the brightest students or to prepare them to participate directly and actively in a highly dynamic and rapidly evolving field. Nor does it give them a sense of engagement, belonging, and ownership in this body of knowledge. This paper describes our experiences addressing this situation with a model that aims to get undergraduates exposed to, interested in, and involved with research early in their academic careers. We use a set of closely related research-oriented courses, starting with research seminars suitable for freshmen and sophomores and leading up to advanced projects for juniors and seniors. These courses engage talented undergraduates in research early in their college careers. This approach has led to a dramatic increase in undergraduate involvement in academic Computer Science research in our department over the last few years, and has resulted in numerous research publications and awards.

