dc.contributor.author: Sarwar, Raheem
dc.contributor.author: Hassan, Saeed-Ul
dc.date.accessioned: 2021-07-21T14:08:09Z
dc.date.available: 2021-07-21T14:08:09Z
dc.date.issued: 2021-12-31
dc.identifier.issn: 2375-4699 [en]
dc.identifier.uri: http://hdl.handle.net/2436/624212
dc.description: This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing (in press). The accepted version of the publication may differ from the final published version. [en]
dc.description.abstract: The authorship identification task aims to identify the original author of an anonymous text sample from a set of candidate authors. It has several application domains, such as digital text forensics and information retrieval. These application domains are not limited to a specific language. However, most authorship identification studies focus on English, and limited attention has been paid to Urdu. Moreover, existing Urdu authorship identification solutions lose accuracy as the number of training samples per candidate author decreases and as the number of candidate authors increases. Consequently, these solutions are inapplicable to real-world cases. To overcome these limitations, we formulate a stylometric feature space. Based on this feature space, we use an authorship identification solution that transforms each text sample into a point set, retrieves candidate text samples, and relies on the nearest neighbour classifier to predict the original author of the anonymous text sample. To evaluate our method, we create a significantly larger corpus than those used in existing studies and conduct several experimental studies, which show that our solution can overcome the limitations of existing studies and reports an accuracy level of 94.03%, which is higher than that of all previous authorship identification works. [en]
dc.format: application/pdf [en]
dc.language.iso: en [en]
dc.publisher: Association for Computing Machinery [en]
dc.relation.url: https://dl.acm.org/toc/tallip/2021/20/3 [en]
dc.subject: stylometry [en]
dc.subject: Urdu [en]
dc.subject: authorship attribution [en]
dc.title: Urdu AI: writeprints for Urdu authorship identification [en]
dc.type: Journal article [en]
dc.identifier.journal: ACM Transactions on Asian and Low-Resource Language Information Processing [en]
dc.date.updated: 2021-07-15T11:53:12Z
dc.date.accepted: 2021-07-14
rioxxterms.funder: University of Wolverhampton [en]
rioxxterms.identifier.project: UOW21072021RS [en]
rioxxterms.version: AM [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
rioxxterms.licenseref.startdate: 2021-12-31 [en]
refterms.dateFCD: 2021-07-21T14:02:57Z
refterms.versionFCD: AM
refterms.dateFOA: 2021-07-21T14:08:09Z


Files in this item

Name:
Sarwar_Urdu_AI_2021.pdf
Embargo:
2021-12-31
Size:
566.5Kb
Format:
PDF


Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/