
DC Field | Value | Language
dc.contributor.author | Sarwar, R |
dc.contributor.author | Yu, C |
dc.contributor.author | Tungare, N |
dc.contributor.author | Chitavisutthivong, K |
dc.contributor.author | Sriratanawilai, S |
dc.contributor.author | Xu, Y |
dc.contributor.author | Chow, D |
dc.contributor.author | Rakthanmanon, T |
dc.contributor.author | Nutanong, S |
dc.date.accessioned | 2020-10-12T10:14:26Z |
dc.date.available | 2020-10-12T10:14:26Z |
dc.date.issued | 2018-09-10 |
dc.identifier.citation | Sarwar, R., Yu, C., Tungare, N., Chitavisutthivong, K., Sriratanawilai, S., Xu, Y., Chow, D., Rakthanmanon, T. and Nutanong, S. (2018) An effective and scalable framework for authorship attribution query processing, IEEE Access, 6, pp. 50030-50048. | en
dc.identifier.issn | 2169-3536 | en
dc.identifier.doi | 10.1109/ACCESS.2018.2869198 | en
dc.identifier.uri | http://hdl.handle.net/2436/623705 |
dc.description | © 2018 The Authors. Published by IEEE. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://ieeexplore.ieee.org/document/8457490 | en
dc.description.abstract | Authorship attribution aims to identify the original author of an anonymous text from a given set of candidate authors, and it has a wide range of applications. The main challenge in the authorship attribution problem is that real-world applications tend to have hundreds of authors, while each author may have only a small number of text samples, e.g., 5-10 texts per author. As a result, building a predictive model that can accurately identify the author of an anonymous text is a challenging task. In fact, existing authorship attribution solutions based on long texts focus on application scenarios where the number of candidate authors is limited to 50, and these solutions generally report a significant performance reduction as the number of authors increases. To overcome this challenge, we propose a novel data representation model that captures stylistic variations within each document and thereby transforms the problem of authorship attribution into a similarity search problem. Based on this data representation model, we also propose a similarity query processing technique that can effectively handle outliers. We assess the accuracy of our proposed method against state-of-the-art authorship attribution methods using real-world data sets extracted from Project Gutenberg. Our data set contains 3000 novels from 500 authors. Experimental results show that our method significantly outperforms all competitors. Specifically, for both the closed-set and open-set authorship attribution problems, our method achieved higher than 95% accuracy. | en
dc.description.sponsorship | This work was supported by the CityU Project under Grant 7200387 and Grant 6000511. | en
dc.format | application/pdf | en
dc.language.iso | en | en
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en
dc.relation.url | https://ieeexplore.ieee.org/document/8457490 | en
dc.subject | large scale database | en
dc.subject | similarity search | en
dc.subject | query processing | en
dc.subject | stylometry | en
dc.title | An effective and scalable framework for authorship attribution query processing | en
dc.type | Journal article | en
dc.identifier.eissn | 2169-3536 |
dc.identifier.journal | IEEE Access | en
dc.date.updated | 2020-10-07T17:10:58Z |
dc.date.accepted | 2018-08-28 |
rioxxterms.funder | CityU Project | en
rioxxterms.identifier.project | 7200387 | en
rioxxterms.identifier.project | 6000511 | en
rioxxterms.version | VoR | en
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by/4.0/ | en
rioxxterms.licenseref.startdate | 2020-10-12 | en
dc.source.volume | 6 |
dc.source.beginpage | 50030 |
dc.source.endpage | 50048 |
dc.description.version | Published version |
refterms.dateFCD | 2020-10-12T10:12:54Z |
refterms.versionFCD | VoR |
refterms.dateFOA | 2020-10-12T10:14:27Z |


Files in this item

Name: IEEE ACCESS 1.pdf
Size: 11.17 MB
Format: PDF



Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/