dc.contributor.author: Mohamed, Emad
dc.contributor.author: Sarwar, Raheem
dc.date.accessioned: 2021-09-09T15:54:45Z
dc.date.available: 2021-09-09T15:54:45Z
dc.date.issued: 2021-11-13
dc.identifier.citation: Mohamed, E. and Sarwar, R. (2022) Linguistic features evaluation for hadith authenticity through automatic machine learning. Digital Scholarship in the Humanities, 37(3), pp.830-843, https://doi.org/10.1093/llc/fqab092 (en)
dc.identifier.issn: 2055-7671 (en)
dc.identifier.doi: 10.1093/llc/fqab092
dc.identifier.uri: http://hdl.handle.net/2436/624329
dc.description: This is an accepted manuscript of an article published by OUP in Digital Scholarship in the Humanities on 13/11/2021. The accepted version of the publication may differ from the final published version. (en)
dc.description.abstract: There has not been any research that provides an evaluation of the linguistic features extracted from the matn (text) of a Hadith. Moreover, none of the fairly large corpora are publicly available as a benchmark corpus for Hadith authenticity, and there is a need to build a “gold standard” corpus for good practices in Hadith authentication. We write a scraper in Python programming language and collect a corpus of 3651 authentic prophetic traditions and 3593 fake ones. We process the corpora with morphological segmentation and perform extensive experimental studies using a variety of machine learning algorithms, mainly through Automatic Machine Learning, to distinguish between these two categories. With a feature set including words, morphological segments, characters, top N words, top N segments, function words and several vocabulary richness features, we analyse the results in terms of both prediction and interpretability to explain which features are more characteristic of each class. Many experiments have produced good results and the highest accuracy (i.e., 78.28%) is achieved using word n-grams as features using the Multinomial Naive Bayes classifier. Our extensive experimental studies conclude that, at least for Digital Humanities, feature engineering may still be desirable due to the high interpretability of the features. The corpus and software (scripts) will be made publicly available to other researchers in an effort to promote progress and replicability. (en)
dc.format: application/pdf (en)
dc.language.iso: en (en)
dc.publisher: Oxford University Press (en)
dc.relation.url: https://academic.oup.com/dsh/article-abstract/37/3/830/6427308?redirectedFrom=fulltext (en)
dc.subject: features evaluation (en)
dc.subject: hadith authenticity (en)
dc.title: Linguistic features evaluation for hadith authenticity through automatic machine learning (en)
dc.type: Journal article (en)
dc.identifier.journal: Digital Scholarship in the Humanities (en)
dc.date.updated: 2021-09-08T20:21:09Z
dc.date.accepted: 2021-09-08
rioxxterms.funder: University of Wolverhampton (en)
rioxxterms.identifier.project: UOW09092021RS (en)
rioxxterms.version: AM (en)
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ (en)
rioxxterms.licenseref.startdate: 2023-11-13 (en)
dc.source.volume: 37
dc.source.issue: 3
dc.source.beginpage: 830
dc.source.endpage: 843
refterms.dateFCD: 2021-09-09T15:50:06Z
refterms.versionFCD: AM


Files in this item

Name: Mohamed_Sarwar_Linguistic_feat ...
Size: 408.0Kb
Format: PDF


Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/