Results 1 - 5 of 5
Searching with Style: Authorship Attribution in Classic Literature
Cited by 4 (0 self)
It is a truism of literature that certain authors have a highly recognizable style. The concept of style underlies the authorship attribution techniques that have been applied to tasks such as identifying which of several authors wrote a particular news article. In this paper, we explore whether the works of authors of classic literature can be correctly identified with either of two approaches to attribution, using a collection of 634 texts by 55 authors. Our results show that these methods can be highly accurate, with errors primarily for authors where it might be argued that style is lacking. And did Marlowe write the works of Shakespeare? Our preliminary evidence suggests not.
Application of Information Retrieval Techniques for Source Code Authorship Attribution
Cited by 2 (0 self)
Authorship attribution assigns works of contentious authorship to their rightful owners, solving cases of theft, plagiarism and authorship disputes in academia and industry. In this paper we investigate the application of information retrieval techniques to attribution of authorship of C source code. In particular, we explore novel methods for converting C code into documents suitable for retrieval systems, experimenting with 1,597 student programming assignments. We investigate several possible program derivations, partition attribution results by original program length to measure effectiveness of modest and lengthy programs separately, and evaluate three different methods for interpreting document rankings as authorship attribution. The best of our methods achieves an average of 76.78% classification accuracy for a one-in-ten classification problem, which is competitive against six existing baselines. The techniques that we present can be the basis of practical software to support source code authorship investigations.
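The abstract above does not say which three methods were used to turn a document ranking into an attribution. As a rough illustration only, the sketch below shows two plausible interpretations: taking the author of the top-ranked document, and summing retrieval scores per author over the top k results. The function names, the value of k, and the example ranking are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def attribute_top1(ranking):
    """Attribute the query to the author of the single best-ranked document.

    `ranking` is a list of (author, score) pairs sorted best first.
    """
    return ranking[0][0]

def attribute_by_score_sum(ranking, k=5):
    """Attribute the query to the author with the largest total score
    among the top-k ranked documents (a hypothetical voting scheme)."""
    totals = defaultdict(float)
    for author, score in ranking[:k]:
        totals[author] += score
    return max(totals, key=totals.get)

# Hypothetical ranking returned by a retrieval system for one query program.
ranking = [("alice", 0.91), ("bob", 0.90), ("bob", 0.88), ("carol", 0.40)]

top1_author = attribute_top1(ranking)              # best single document wins
sum_author = attribute_by_score_sum(ranking, k=3)  # combined evidence wins
```

With this made-up ranking the two interpretations disagree: the top document belongs to one author, while another author has more combined evidence in the top three. That is the kind of difference a comparison of ranking interpretations would measure.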
Source Code Authorship Attribution using n-grams
Cited by 1 (1 self)
Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection. Given several samples of work from several authors, we attempt to correctly identify the author of work presented as a query. On a collection of 1,640 documents written by 100 authors, we show that we can attribute authorship in up to 67% of cases. This work can be a valuable additional indicator for the more difficult plagiarism investigations.
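To make the task in the abstract above concrete (several samples from several authors, one query to attribute), here is a minimal sketch of n-gram based attribution. It is not the paper's method: the tokenizer, the choice of token trigrams, the cosine similarity measure, and the sample programs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def ngrams(text, n=3):
    """Count token n-grams; identifiers, numbers and single symbols are tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def attribute(query, samples, n=3):
    """Return the author whose pooled n-gram profile is most similar to the query.

    `samples` is a list of (author, source_text) pairs.
    """
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(ngrams(text, n))
    q = ngrams(query, n)
    return max(profiles, key=lambda author: cosine(q, profiles[author]))

# Hypothetical samples: one author favours for-loops, the other while-loops.
samples = [
    ("alice", "for (i = 0; i < n; i++) { sum += a[i]; }"),
    ("bob", "while (k != 0) { k--; total = total + b[k]; }"),
]
query = "for (j = 0; j < m; j++) { acc += x[j]; }"
predicted = attribute(query, samples)
```

Here the query shares more trigrams with the first sample (the loop header and increment idiom) than with the second, so it is attributed to that author even though every variable name differs.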
Authorship Classification: A Syntactic Tree Mining Approach
In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators of original authorship. From among the hundreds of features which were developed, syntactic features were best able to reflect an author’s writing style. However, due to the high computational complexity of extracting and computing syntactic features, only simple variations of basic syntactic features of function words and part-of-speech tags were considered. In this paper, we propose a novel approach to mining discriminative k-embedded-edge subtree patterns from a given set of syntactic trees that reduces the computational burden of using complex syntactic structures as a feature set. This method is shown to increase the classification accuracy. We also design a new kernel based on these features. Comprehensive experiments on real datasets of news articles and movie reviews demonstrate that our approach is reliable and more accurate than previous studies.
Authorship Classification: A Discriminative Syntactic Tree Mining Approach
In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators for original authorship. From among the hundreds of features which were developed, syntactic features were best able to reflect an author’s writing style. However, due to the high computational complexity of extracting and computing syntactic features, only simple variations of basic syntactic features such as function words, part-of-speech (POS) tags, and rewrite rules were considered. In this paper, we propose a new feature set of k-embedded-edge subtree patterns that holds more syntactic information than previous feature sets. We also propose a novel approach to directly mining them from a given set of syntactic trees. We show that this approach reduces the computational burden of using complex syntactic structures as the feature set. Comprehensive experiments on real-world datasets demonstrate that our approach is reliable and more accurate than previous studies.

