Self-attentive generative adversarial network-based authorship attribution in historical texts
Issue Date
2025
Subjects
authorship attribution
stylometric representation learning
generative adversarial networks
self-attentive generative adversarial networks
textual disguise and deception
reverse authorship attribution technique
digital humanities
historical text analysis
semantic graph modelling in NLP
transformer-based language models
Abstract
Authorship attribution is the task of identifying the author of a text whose authorship is unknown. Given the large number of disputed texts and the widespread use of pen names in the humanities, especially in historical and literary texts, this study takes a deep-learning-based approach to authorship attribution in historical texts.
This thesis introduces a novel approach to authorship attribution in historical texts, addressing the complexities of authorship disguise and deception. Using deep learning, specifically a Self-Attentive Generative Adversarial Network (GAN), this research proposes the Reverse Authorship Attribution Technique (RAAT), a methodology to identify and mitigate attempts to hide or mimic authorial style. When authors deliberately hide their identity, RAAT generates impostor documents that augment the disguised writing styles, enhancing the model’s ability to detect such hidden authorial styles.
The core contribution of this research is the development of RAAT, which generates training datasets with deceptive and disguised text samples, significantly improving authorship attribution accuracy. To quantify and analyse authorial style, the study introduces StyleQuant, a unified and extensible representation framework for capturing authentic and obfuscated authorial styles, including those produced through disguise or deception, as generated via RAAT. The results demonstrate that RAAT, in combination with self-attentive GANs and StyleQuant, significantly improves authorship attribution models, making them remarkably robust towards such obfuscation attempts.
The experimental results show that the proposed methods achieve, or come close to, state-of-the-art performance on curated datasets from the Project Gutenberg corpus, covering literary works from the 15th to the 19th century. The evaluation metrics, including accuracy, precision, recall, and F1 score, demonstrate the effectiveness of RAAT in identifying true authorship, even in the presence of concealed impostor writing styles. In some experiments, the quality of the explicitly generated impostor documents was evaluated by comparing them with the original text using similarity metrics such as ROUGE-1, Jaccard similarity, overlap coefficient, and cosine similarity. This research contributes to authorship attribution through its practical relevance to the digital humanities, particularly for analysing historical literary texts, and the evaluation on these curated datasets showcases the methods' applicability to real-world textual analysis. Overall, the thesis establishes a robust framework for identifying true authorship in historical documents and offers a foundation for future studies exploring broader applications of computational authorship attribution in literary scholarship.
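The similarity metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the whitespace tokeniser, the function names, and the choice of the recall variant of ROUGE-1 are all assumptions made here for illustration, and the thesis may compute these measures differently.

```python
import math
from collections import Counter


def tokens(text):
    # Assumption: lowercase whitespace tokenisation; the thesis may tokenise differently.
    return text.lower().split()


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    # |A ∩ B| / min(|A|, |B|) over the sets of distinct tokens.
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


def cosine(a, b):
    # Cosine of the angle between raw term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rouge1_recall(reference, candidate):
    # Clipped unigram overlap divided by the number of reference unigrams.
    cr, cc = Counter(reference), Counter(candidate)
    overlap = sum((cr & cc).values())
    return overlap / len(reference) if reference else 0.0


if __name__ == "__main__":
    # Hypothetical sentences, not drawn from the thesis datasets.
    original = tokens("the quick brown fox")
    impostor = tokens("the quick red fox")
    print(jaccard(original, impostor))
    print(overlap_coefficient(original, impostor))
    print(cosine(original, impostor))
    print(rouge1_recall(original, impostor))
```

All four scores lie between 0 and 1, with higher values meaning the generated document shares more of its vocabulary with the original. Jaccard and the overlap coefficient ignore word frequency, whereas cosine similarity and ROUGE-1 take repeated words into account.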
Citation
Silva, K. (2025) Self-attentive generative adversarial network-based authorship attribution in historical texts. University of Wolverhampton. https://wlv.openrepository.com/handle/2436/626015
Type
Thesis or dissertation
Language
en
Description
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Sponsors
RIF-4 project Responsible Digital Humanities Lab (RIGHT) at University of Wolverhampton.