A scalable framework for stylometric analysis query processing
dc.contributor.author | Nutanong, Sarana | |
dc.contributor.author | Yu, Chenyun | |
dc.contributor.author | Sarwar, Raheem | |
dc.contributor.author | Xu, Peter | |
dc.contributor.author | Chow, Dickson | |
dc.date.accessioned | 2020-10-12T09:38:13Z | |
dc.date.available | 2020-10-12T09:38:13Z | |
dc.date.issued | 2017-02-02 | |
dc.identifier.citation | Nutanong, S., Yu, C., Sarwar, R., Xu, P. and Chow, D. (2016) A scalable framework for stylometric analysis query processing, 2016 IEEE 16th International Conference on Data Mining (ICDM). 10.1109/ICDM.2016.0147 | en |
dc.identifier.issn | 2374-8486 | en |
dc.identifier.doi | 10.1109/icdm.2016.0147 | en |
dc.identifier.uri | http://hdl.handle.net/2436/623704 | |
dc.description | This is an accepted manuscript of an article published by IEEE in 2016 IEEE 16th International Conference on Data Mining (ICDM) on 02/02/2017, available online: https://ieeexplore.ieee.org/document/7837960 The accepted version of the publication may differ from the final published version. | en |
dc.description.abstract | Stylometry is the statistical analysis of variations in the author's literary style. The technique has been used in many linguistic analysis applications, such as author profiling, authorship identification, and authorship verification. Over the past two decades, authorship identification has been extensively studied by researchers in the area of natural language processing. However, these studies are generally limited to (i) a small number of candidate authors, and (ii) documents with similar lengths. In this paper, we propose a novel solution by modeling authorship attribution as a set similarity problem to overcome the two stated limitations. We conducted extensive experimental studies on a real dataset collected from an online book archive, Project Gutenberg. Experimental results show that in comparison to existing stylometry studies, our proposed solution can handle a larger number of documents of different lengths written by a larger pool of candidate authors with a high accuracy. | en |
dc.format | application/pdf | en |
dc.language.iso | en | en |
dc.publisher | IEEE | en |
dc.relation.url | https://ieeexplore.ieee.org/document/7837960 | en |
dc.subject | stylometry | en |
dc.title | A scalable framework for stylometric analysis query processing | en |
dc.type | Conference contribution | en |
dc.identifier.journal | 2016 IEEE 16th International Conference on Data Mining (ICDM) | en |
dc.date.updated | 2020-10-07T19:39:16Z | |
dc.conference.name | 2016 IEEE 16th International Conference on Data Mining (ICDM) | |
pubs.finish-date | 2016-12-15 | |
pubs.start-date | 2016-12-12 | |
dc.date.accepted | 2016-10-09 | |
rioxxterms.funder | City University of Hong Kong | en |
rioxxterms.identifier.project | 7200387 | en |
rioxxterms.identifier.project | 6000511 | en |
rioxxterms.version | AM | en |
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | en |
rioxxterms.licenseref.startdate | 2020-10-12 | en |
dc.description.version | Published version | |
refterms.dateFCD | 2020-10-12T09:35:14Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2020-10-12T09:38:13Z |