CAG : stylometric authorship attribution of multi-author documents using a co-authorship graph

Sarwar, R
Urailertprasert, N
Vannaboot, N
Yu, C
Rakthanmanon, T
Chuangsuwanich, E
Nutanong, S
Abstract
Stylometry has been successfully applied to perform authorship identification of single-author documents (AISD). The AISD task is concerned with identifying the original author of an anonymous document from a group of candidate authors. However, AISD techniques are not applicable to the authorship identification of multi-author documents (AIMD). Unlike AISD, where each document is written by one single author, AIMD focuses on handling multi-author documents. Due to the combinatoric nature of documents, AIMD lacks the ground truth information - that is, information on writing and non-writing authors in a multi-author document - which makes this problem more challenging to solve. Previous AIMD solutions have a number of limitations: (i) the best stylometry-based AIMD solution has a low accuracy, less than 30%; (ii) increasing the number of co-authors of papers adversely affects the performance of AIMD solutions; and (iii) AIMD solutions were not designed to handle the non-writing authors (NWAs). However, NWAs exist in real-world cases - that is, there are papers for which not every co-author listed has contributed as a writer. This paper proposes an AIMD framework called the Co-Authorship Graph that can be used to (i) capture the stylistic information of each author in a corpus of multi-author documents and (ii) make a multi-label prediction for a multi-author query document. We conducted extensive experimental studies on one synthetic and three real-world corpora. Experimental results show that our proposed framework (i) significantly outperformed competitive techniques; (ii) can effectively handle a larger number of co-authors in comparison with competitive techniques; and (iii) can effectively handle NWAs in multi-author documents.
Citation
Sarwar, R., Urailertprasert, N., Vannaboot, N. et al. (2020) CAG : stylometric authorship attribution of multi-author documents using a co-authorship graph, IEEE Access, 8, pp. 18374-18393. doi: 10.1109/ACCESS.2020.2967449
Type
Journal article
Language
en
Description
© 2020 The Authors. Published by IEEE. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://ieeexplore.ieee.org/document/8962080
ISSN
2169-3536
EISSN
2169-3536
Sponsors
This work was supported in part by the Digital Economy Promotion Agency under Project MP-62-0003, and in part by the Thailand Research Fund and Office of the Higher Education Commission under Grant MRG6180266.