Results 1 - 10 of 13
Recommending Citations: Translating Papers into References
Cited by 16 (6 self)
When we write or prepare to write a research paper, we always have appropriate references in mind. However, there are most likely references we have missed that should have been read and cited. As such, a good citation recommendation system would improve not only our paper but, overall, the efficiency and quality of literature search. Usually, a citation’s context contains explicit words explaining the citation. Using this, we propose a method that “translates” research papers into references. By considering the citations and their contexts from existing papers as parallel data written in two different “languages”, we adopt the translation model to create a relationship between these two “vocabularies”. Experiments on both the CiteSeer and CiteULike datasets show that our approach outperforms other baseline methods and increases precision, recall and F-measure by at least 5% to 10%, respectively. In addition, our approach runs much faster in both the training and recommending stages, which demonstrates the effectiveness and scalability of our work.
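To make the "translation" idea concrete, here is a minimal sketch of how learned word-translation probabilities could be used to rank candidate references for a citation context. The function names, the probability table p(context word | reference word), and the ranking rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch: translation-model scoring for citation recommendation.
# Assumes word-translation probabilities trans_prob[(context_word, reference_word)]
# have already been estimated (e.g., IBM Model 1 style EM) from
# (citation context, cited paper) pairs. Illustrative only.

import math
from collections import Counter

def score_reference(context_words, reference_words, trans_prob, smoothing=1e-6):
    """Log-probability of 'translating' a candidate reference into the citation context."""
    ref_counts = Counter(reference_words)
    ref_len = max(sum(ref_counts.values()), 1)
    log_score = 0.0
    for w in context_words:
        # p(w | reference) = sum over reference words r of p(w | r) * p(r | reference)
        p = sum(trans_prob.get((w, r), 0.0) * (count / ref_len)
                for r, count in ref_counts.items())
        log_score += math.log(p + smoothing)
    return log_score

def recommend(context_words, candidate_references, trans_prob, top_k=10):
    """Rank candidate references (id -> list of words) for one citation context."""
    ranked = sorted(candidate_references.items(),
                    key=lambda item: score_reference(context_words, item[1], trans_prob),
                    reverse=True)
    return [ref_id for ref_id, _ in ranked[:top_k]]
```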
CiteSeerX: A scholarly big dataset
- Proceedings of the 36th European Conference on Information Retrieval, 2014
Cited by 6 (3 self)
The CiteSeerX digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerX has proven to be a powerful resource in many data mining, machine learning and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, reference lists, etc. The metadata extraction in CiteSeerX is done using automated techniques. Although fairly accurate, these techniques still result in noisy metadata. Since the performance of models trained on these data highly depends on the quality of the data, we propose an approach to CiteSeerX metadata cleaning that incorporates information from an external data source. The result is a subset of CiteSeerX which is substantially cleaner than the entire set. Our goal is to make the new dataset available to the research community to facilitate future work in Information Retrieval.
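A minimal sketch of the kind of cleaning step described here: match noisy records against an external bibliographic source by normalized title and keep only the matches. The matching rule, field names and merge behaviour are assumptions for illustration, not the paper's exact pipeline.

```python
# Sketch: metadata cleaning by matching noisy records against an external source.
# The exact-title matching rule and the record fields are illustrative assumptions.

import re

def normalize_title(title):
    """Lowercase, drop punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", title.lower())).strip()

def clean_subset(noisy_records, external_records):
    """Keep only noisy records whose normalized title appears in the external source,
    overwriting their fields with the (presumably cleaner) external metadata."""
    external_by_title = {normalize_title(r["title"]): r for r in external_records}
    cleaned = []
    for record in noisy_records:
        match = external_by_title.get(normalize_title(record["title"]))
        if match is not None:
            cleaned.append({**record, **match})
    return cleaned

# Toy example with made-up records:
noisy = [{"title": "CiteSeerX: A Scholarly Big Dataset!!", "year": "204"}]
external = [{"title": "CiteSeerX: A scholarly big dataset", "year": "2014"}]
print(clean_subset(noisy, external))
```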
Discovering Coherent Topics Using General Knowledge.
- In Proceedings of CIKM, 2013
Cited by 3 (2 self)
Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed to incorporate prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case because in most real-life applications the user wants to find what they do not know. In this paper, we propose a framework to leverage general knowledge in topic models. Such knowledge is domain independent. Specifically, we use one form of general knowledge, i.e., lexical semantic relations of words such as synonyms, antonyms and adjective attributes, to help produce more coherent topics. However, there is a major obstacle: a word can have multiple meanings/senses, and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain, and wrong knowledge can result in poor-quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.
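The domain-independent knowledge described here is sets of lexical relations keyed by word sense. A rough sketch of extracting such synonym/antonym sets from WordNet follows; this is only the knowledge-extraction step, not GK-LDA's inference, and the helper name is illustrative.

```python
# Sketch: building per-sense lexical-relation sets (synonyms / antonyms) from WordNet,
# the kind of domain-independent dictionary knowledge a model like GK-LDA consumes.

from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def lexical_relation_sets(word):
    """Return one (synonym_set, antonym_set) pair per WordNet sense of `word`."""
    relation_sets = []
    for synset in wn.synsets(word):
        synonyms = {lemma.name().replace("_", " ") for lemma in synset.lemmas()}
        antonyms = {ant.name().replace("_", " ")
                    for lemma in synset.lemmas() for ant in lemma.antonyms()}
        relation_sets.append((synonyms, antonyms))
    return relation_sets

# Example: 'light' has several senses; not every sense's relation set is correct
# for a given domain, which is exactly the wrong-knowledge problem the model handles.
for synonyms, antonyms in lexical_relation_sets("light")[:3]:
    print(sorted(synonyms), "|", sorted(antonyms))
```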
Online Egocentric Models for Citation Networks
Cited by 2 (2 self)
With the emergence of large-scale evolving (time-varying) networks, dynamic network analysis (DNA) has become a very hot research topic in recent years. Although a lot of DNA methods have been proposed by researchers from different communities, most of them can only model snapshot data recorded at a very rough temporal granularity. Recently, some models have been proposed for DNA which can be used to model large-scale citation networks at a fine temporal granularity. However, they suffer from a significant decrease of accuracy over time because the learned parameters or node features are static (fixed) during the prediction process for evolving citation ...
Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach
Cited by 2 (0 self)
Given the large amounts of online textual documents available these days, e.g., news articles, weblogs, and scientific papers, effective methods for extracting keyphrases, which provide a high-level topic description of a document, are greatly needed. In this paper, we propose a supervised model for keyphrase extraction from research papers, which are embedded in citation networks. To this end, we design novel features based on citation network information and use them in conjunction with traditional features for keyphrase extraction to obtain remarkable improvements in performance over strong baselines.
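As an illustration of combining citation-network information with traditional keyphrase features in a supervised setup, the sketch below pairs TF-IDF and position features with a citation-context feature and a generic classifier. The specific features and classifier choice are assumptions, not the paper's actual feature set.

```python
# Sketch: supervised keyphrase extraction with a citation-network feature.
# Feature definitions are illustrative; the paper's feature set may differ.

import math
from sklearn.linear_model import LogisticRegression

def candidate_features(phrase, doc_text, doc_freq, n_docs, citation_contexts):
    """Feature vector for one candidate phrase of one paper."""
    phrase = phrase.lower()
    text = doc_text.lower()
    tf = text.count(phrase) / max(len(text.split()), 1)
    idf = math.log(n_docs / (1 + doc_freq.get(phrase, 0)))
    first_position = max(text.find(phrase), 0) / max(len(text), 1)
    # Citation-network feature: fraction of the paper's citation contexts
    # (sentences around citations in citing/cited papers) that mention the phrase.
    mentions = sum(phrase in ctx.lower() for ctx in citation_contexts)
    citation_ratio = mentions / max(len(citation_contexts), 1)
    return [tf * idf, first_position, citation_ratio]

# Train on labelled candidates (y = 1 if the candidate is a gold keyphrase),
# then rank test candidates by predicted probability:
classifier = LogisticRegression(max_iter=1000)
# classifier.fit(X_train, y_train)
# scores = classifier.predict_proba(X_test)[:, 1]
```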
Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization
Cited by 2 (1 self)
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
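For reference, the core ARTM formulation combines the log-likelihood with a weighted sum of regularizers and modifies the M-step accordingly. The equations below are reconstructed from the Additive Regularization literature and may differ in notation from the tutorial itself.

```latex
% ARTM criterion: log-likelihood plus weighted regularizers R_i(\Phi,\Theta)
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \varphi_{wt}\,\theta_{td}
  \;+\; \sum_{i} \tau_i R_i(\Phi,\Theta) \;\longrightarrow\; \max_{\Phi,\Theta}

% Regularized EM-like iterations, with (x)_+ = \max(x, 0):
p_{tdw} = \frac{\varphi_{wt}\,\theta_{td}}{\sum_{s \in T}\varphi_{ws}\,\theta_{sd}}, \qquad
\varphi_{wt} \propto \Bigl(n_{wt} + \varphi_{wt}\,\tfrac{\partial R}{\partial \varphi_{wt}}\Bigr)_{+}, \qquad
\theta_{td} \propto \Bigl(n_{td} + \theta_{td}\,\tfrac{\partial R}{\partial \theta_{td}}\Bigr)_{+}

% where n_{wt} = \sum_{d} n_{dw}\,p_{tdw}, \; n_{td} = \sum_{w} n_{dw}\,p_{tdw},
% and R = \sum_i \tau_i R_i.
```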
Academic Network Analysis: A Joint Topic Modeling Approach
Cited by 1 (0 self)
We propose a novel probabilistic topic model that jointly models authors, documents, cited authors, and venues simultaneously in one integrated framework, as compared to previous work which embeds fewer components. This model is designed for three typical applications in academic network analysis: expert ranking, cited author prediction, and venue prediction. Experiments based on two real-world data sets demonstrate that the model is effective and outperforms several state-of-the-art algorithms in all three applications.
EMNLP versus ACL: Analyzing NLP Research Over Time
The conferences ACL (Association for Computational Linguistics) and EMNLP (Empirical Methods in Natural Language Processing) rank among the premier venues that track research developments in Natural Language Processing and Computational Linguistics. In this paper, we present a study of approximately two decades of research papers from these two NLP conferences. We apply keyphrase extraction and corpus analysis tools to the proceedings from these venues and propose probabilistic and vector-based representations of the topics published in a venue in a given year. Next, similarity metrics are studied over pairs of venue representations to capture the progress of the two venues with respect to each other and over time.
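A small sketch of the kind of comparison described here: each venue-year is represented as a distribution over its extracted keyphrases, and pairs of representations are compared with cosine similarity. The representation, the metric, and the toy keyphrases are illustrative choices, not necessarily those used in the paper.

```python
# Sketch: comparing venue-year representations built from extracted keyphrases.

import math
from collections import Counter

def keyphrase_distribution(keyphrases):
    """Probability distribution over the keyphrases extracted for one venue-year."""
    counts = Counter(keyphrases)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse keyphrase distributions."""
    dot = sum(p[w] * q[w] for w in set(p) & set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

# Toy example (keyphrases are made up): how close were the two venues in one year,
# and how far has one venue drifted from its own past?
acl_2005 = keyphrase_distribution(["machine translation", "parsing", "parsing"])
emnlp_2005 = keyphrase_distribution(["machine translation", "word sense disambiguation"])
acl_2015 = keyphrase_distribution(["neural networks", "machine translation", "embeddings"])
print(cosine_similarity(acl_2005, emnlp_2005), cosine_similarity(acl_2005, acl_2015))
```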