A scalable framework for stylometric analysis query processing

Nutanong, Sarana
Yu, Chenyun
Sarwar, Raheem
Xu, Peter
Chow, Dickson
Issue Date
2017-02-02
Abstract
Stylometry is the statistical analysis of variations in an author's literary style. The technique has been used in many linguistic analysis applications, such as author profiling, authorship identification, and authorship verification. Over the past two decades, authorship identification has been extensively studied by researchers in the area of natural language processing. However, these studies are generally limited to (i) a small number of candidate authors and (ii) documents of similar lengths. In this paper, we propose a novel solution that overcomes these two limitations by modeling authorship attribution as a set similarity problem. We conducted extensive experimental studies on a real dataset collected from an online book archive, Project Gutenberg. Experimental results show that, in comparison to existing stylometry studies, our proposed solution can handle a larger number of documents of different lengths written by a larger pool of candidate authors with high accuracy.
Citation
Nutanong, S., Yu, C., Sarwar, R., Xu, P. and Chow, D. (2016) A scalable framework for stylometric analysis query processing, 2016 IEEE 16th International Conference on Data Mining (ICDM). 10.1109/ICDM.2016.0147
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of an article published by IEEE in 2016 IEEE 16th International Conference on Data Mining (ICDM) on 02/02/2017, available online: https://ieeexplore.ieee.org/document/7837960 The accepted version of the publication may differ from the final published version.
ISSN
2374-8486