Making Sense of Massive Amounts of Scientific Publications: the Scientific Knowledge Miner Project
Citations
199 | Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status.
- Teufel, Moens
- 2002
Citation Context ...-Layered Annotated Corpus of Scientific Papers [5], we aim to explore new approaches to citation purpose and polarity classification, paying special attention to their robustness across domains and to the limited availability of manually annotated data. 4.2 Scientific document summarization. Nowadays, the ability to automatically identify the most relevant contents across a set of scientific publications is essential for screening the huge number of articles currently available online. Several approaches to scientific paper summarization have been proposed [3, 8, 11]. Most of them extend general-purpose document summarization methodologies by considering information facets that are characteristic of scientific publications. In particular, the sentences in which other papers cite the article to be summarized provide valuable material for improving the quality of scientific summarization. Considering the rhetorical structure (background, approach, future work, etc.) of the different excerpts of a paper to be summarized also helps generate summaries whose contents are better balanced across the sections... |
57 | Automatic classification of citation function. In:
- Teufel, Siddharthan, et al.
- 2006
Citation Context ...levant connection between the two works. The number of citations that a paper receives constitutes the basis of the most common metrics used to evaluate the scientific production of papers, journals and researchers (e.g. the h-index). Citation-based research evaluation metrics would be more effective if they took into account not only the number of citations a paper receives but also the purpose and polarity of each one. Several classification schemata and approaches have been proposed to characterize the purpose and polarity of citations [2, 12]. By relying on and extending the set of annotated citations included in the Dr. Inventor Multi-Layered Annotated Corpus of Scientific Papers [5], we aim to explore new approaches to citation purpose and polarity classification, paying special attention to their robustness across domains and to the limited availability of manually annotated data. 4.2 Scientific document summarization. Nowadays, the ability to automatically identify the most relevant contents across a set of scientific publications is essential for screening the huge number of articles currentl... |
31 | Citances: Citation sentences for semantic analysis of bioscience text. In:
- Nakov, Schwartz, et al.
- 2004
Citation Context ...and time-consuming task. The availability of text mining tools able to extract, aggregate and turn unstructured scientific text into well-organized and interconnected knowledge is fundamental. However, scientific publications are characterized by several structural (title, abstract, figures, citations...), linguistic and semantic peculiarities that make them difficult to analyze with general-purpose text mining tools. One of the special features of scientific papers is their network of citations, which is starting to be exploited in several contexts, including opinion mining [2, 7] and scientific text summarization [3, 8]. Besides citations, interpreting the semantics of the actual textual contents of scientific papers usually requires knowledge repositories with adequate coverage of scientific concepts and relations that cannot be found in global domain knowledge resources like WordNet, DBpedia, Freebase or BabelNet. [Footnotes: 1 http://www.ncbi.nlm.nih.gov/pubmed 2 http://www.scopus.com 3 http://www.webofknowledge.com; page footer: BIRNDL 2016 Joint Workshop on Bibliometric-enhanced Information Retrieval and NLP for Digital Libraries, 36] Considering both the pec... |
16 | Citation recommendation without author supervision. In:
- He, Kifer, et al.
- 2011
Citation Context ...rent strategies to improve content and graph-based summarization approaches by considering typed citation networks and by relying on the automated characterization of the rhetorical structure of scientific publications implemented by the DRI Framework. 4.3 Recommender system for citations. Citation recommendation is a complex task because of the difficulty of matching excerpts of the source paper to the contents of a huge number of candidate articles to be cited. Many of the proposed approaches rely on text classification as well as on question answering and query ranking [4, 6]. The goal of the SKM Project is to develop a recommender system for citations that helps authors find relevant articles by relying both on the semantic information extracted by the DRI Framework and on the data aggregated across corpora of papers crawled from the Web. To test our system, we will define a prediction task in which, learning from the past, we try to predict which citations a given article will contain. [Footnote 8: http://sempub.taln.upf.edu/dricorpus/] 5 Conclusions We ... |
14 | Coherent citation-based summarization of scientific papers.
- Abu-Jbara, Radev
- 2011
Citation Context ...of text mining tools able to extract, aggregate and turn unstructured scientific text into well-organized and interconnected knowledge is fundamental. However, scientific publications are characterized by several structural (title, abstract, figures, citations...), linguistic and semantic peculiarities that make them difficult to analyze with general-purpose text mining tools. One of the special features of scientific papers is their network of citations, which is starting to be exploited in several contexts, including opinion mining [2, 7] and scientific text summarization [3, 8]. Besides citations, interpreting the semantics of the actual textual contents of scientific papers usually requires knowledge repositories with adequate coverage of scientific concepts and relations that cannot be found in global domain knowledge resources like WordNet, DBpedia, Freebase or BabelNet. Considering both the peculiar structural and semantic features of... |
7 | Multi-step classification approaches to cumulative citation recommendation. In:
- Balog, Ramampiaro, et al.
- 2013
Citation Context ...rent strategies to improve content and graph-based summarization approaches by considering typed citation networks and by relying on the automated characterization of the rhetorical structure of scientific publications implemented by the DRI Framework. 4.3 Recommender system for citations. Citation recommendation is a complex task because of the difficulty of matching excerpts of the source paper to the contents of a huge number of candidate articles to be cited. Many of the proposed approaches rely on text classification as well as on question answering and query ranking [4, 6]. The goal of the SKM Project is to develop a recommender system for citations that helps authors find relevant articles by relying both on the semantic information extracted by the DRI Framework and on the data aggregated across corpora of papers crawled from the Web. To test our system, we will define a prediction task in which, learning from the past, we try to predict which citations a given article will contain. 5 Conclusions We ... |
6 | Summa: A robust and adaptable summarization tool. In: Traitement Automatique des Langues.
- Saggion
- 2008
Citation Context ... and ease the analysis of the textual contents of scientific publications, both in PDF and JATS XML format. For more information on the framework, the interested reader can refer to [9]. In the SKM Project we will extend the Framework by implementing semi-supervised or unsupervised methods for citation classification (polarity and purpose) and semantically aware relation extraction (e.g. causal inference), both features useful to support information extraction and automated semantic enrichment of scientific texts; – enrichment with new features of the SUMMA document summarization Java library [10]. In particular, SUMMA will be able to support the summarization of scientific texts by relying on citation-based summarization approaches (both sentence and paper assessment based on peers' opinion). We will also implement state-of-the-art multi-document summarization customized to scientific papers, based on the extraction and aggregation of relevant sentences across publications in order to automatically create surveys; – implementation of new methodologies for semantic enrichment, interlinking, indexing and navigation of corpora of scientific papers. We will develop Web crawling approaches ... |
5 | Purpose and polarity of citation: Towards nlp-based bibliometrics. In:
- Abu-Jbara, Ezra, et al.
- 2013
Citation Context ...and time-consuming task. The availability of text mining tools able to extract, aggregate and turn unstructured scientific text into well-organized and interconnected knowledge is fundamental. However, scientific publications are characterized by several structural (title, abstract, figures, citations...), linguistic and semantic peculiarities that make them difficult to analyze with general-purpose text mining tools. One of the special features of scientific papers is their network of citations, which is starting to be exploited in several contexts, including opinion mining [2, 7] and scientific text summarization [3, 8]. Besides citations, interpreting the semantics of the actual textual contents of scientific papers usually requires knowledge repositories with adequate coverage of scientific concepts and relations that cannot be found in global domain knowledge resources like WordNet, DBpedia, Freebase or BabelNet. Considering both the pec... |
2 | A multi-layered annotated corpus of scientific papers. In:
- Fisas, Ronzano, et al.
- 2016
Citation Context ...aluate the scientific production of papers, journals and researchers (e.g. the h-index). Citation-based research evaluation metrics would be more effective if they took into account not only the number of citations a paper receives but also the purpose and polarity of each one. Several classification schemata and approaches have been proposed to characterize the purpose and polarity of citations [2, 12]. By relying on and extending the set of annotated citations included in the Dr. Inventor Multi-Layered Annotated Corpus of Scientific Papers [5], we aim to explore new approaches to citation purpose and polarity classification, paying special attention to their robustness across domains and to the limited availability of manually annotated data. 4.2 Scientific document summarization. Nowadays, the ability to automatically identify the most relevant contents across a set of scientific publications is essential for screening the huge number of articles currently available online. Several approaches to scientific paper summarization have been proposed [3, 8, 11]. Most of them extend general-purpose do... |
2 | Knowledge extraction and modeling from scientific publications. In: Semantics, Analytics, Visualisation:
- Ronzano, Saggion
- 2016
Citation Context ...tic indexing, search and content aggregation approaches are required in order to take full advantage of the knowledge exposed by scientific articles. In this context, we present the Scientific Knowledge Miner (SKM) Project. It aims at developing both knowledge resources and a complex scientific knowledge mining infrastructure that will be exploited to support fine-grained semantic analysis and large-scale studies of scientific document collections. In the context of the SKM Project, we are going to analyze publications by relying on and extending the Dr. Inventor Scientific Text Mining Framework [9] (DRI Framework), a freely available Java-based library. The DRI Framework enables the automated analysis and characterization of several facets of publications, including the identification of the scientific discourse category of sentences (Approach, Background, Future Work, etc.), the characterization of the purpose of citations and the annotation of Named Entities that occur inside the textual contents of a paper. By performing fine-grained semantic analysis of articles and aggregating and merging this information across collections of papers, the scientific literature analysis supported by... |
1 | Taking advantage of citances: citation scope identification and citation-based summarization. In: Text Analytics Conference
- Ronzano, Saggion
- 2014
Citation Context ...of text mining tools able to extract, aggregate and turn unstructured scientific text into well-organized and interconnected knowledge is fundamental. However, scientific publications are characterized by several structural (title, abstract, figures, citations...), linguistic and semantic peculiarities that make them difficult to analyze with general-purpose text mining tools. One of the special features of scientific papers is their network of citations, which is starting to be exploited in several contexts, including opinion mining [2, 7] and scientific text summarization [3, 8]. Besides citations, interpreting the semantics of the actual textual contents of scientific papers usually requires knowledge repositories with adequate coverage of scientific concepts and relations that cannot be found in global domain knowledge resources like WordNet, DBpedia, Freebase or BabelNet. Considering both the peculiar structural and semantic features of... |