dc.contributor.author: Shardlow, Matthew
dc.contributor.author: Evans, Richard
dc.contributor.author: Zampieri, Marcos
dc.date.accessioned: 2022-04-08T10:02:03Z
dc.date.available: 2022-04-08T10:02:03Z
dc.date.issued: 2022-03-23
dc.identifier.citation: Shardlow, M., Evans, R. & Zampieri, M. (2022) Predicting lexical complexity in English texts: the Complex 2.0 dataset. Lang Resources & Evaluation. https://doi.org/10.1007/s10579-022-09588-2 [en]
dc.identifier.issn: 1574-020X [en]
dc.identifier.doi: 10.1007/s10579-022-09588-2 [en]
dc.identifier.uri: http://hdl.handle.net/2436/624697
dc.description: © 2022 The Authors. Published by Springer. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1007/s10579-022-09588-2 [en]
dc.description.abstract: Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred to as complex word identification (CWI) and is often modelled as a supervised classification problem. For training such systems, annotated datasets in which words and sometimes multi-word expressions are labelled regarding complexity are required. In this paper we analyze previous work carried out in this task and investigate the properties of CWI datasets for English. We develop a protocol for the annotation of lexical complexity and use this to annotate a new dataset, CompLex 2.0. We present experiments using both new and old datasets to investigate the nature of lexical complexity. We found that a Likert-scale annotation protocol provides an objective setting that is superior for identifying the complexity of words compared to a binary annotation protocol. We release a new dataset using our new protocol to promote the task of Lexical Complexity Prediction. [en]
dc.format: application/pdf [en]
dc.language: English
dc.language.iso: en [en]
dc.publisher: Springer [en]
dc.relation.url: https://link.springer.com/article/10.1007/s10579-022-09588-2 [en]
dc.subject: complex word identification [en]
dc.subject: lexical complexity [en]
dc.subject: text simplification [en]
dc.title: Predicting lexical complexity in English texts: the Complex 2.0 dataset [en]
dc.type: Journal article [en]
dc.identifier.journal: Language Resources and Evaluation [en]
dc.date.updated: 2022-04-07T11:11:07Z
dc.date.accepted: 2022-03-07
rioxxterms.funder: University of Wolverhampton [en]
rioxxterms.identifier.project: UOW08042022RE [en]
rioxxterms.version: VoR [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/ [en]
rioxxterms.licenseref.startdate: 2022-04-08 [en]
refterms.dateFCD: 2022-04-08T10:00:50Z
refterms.versionFCD: VoR
refterms.dateFOA: 2022-04-08T10:02:04Z


Files in this item

Name: Shardlow_Predicting_Lexical_Co ...
Size: 1.017Mb
Format: PDF

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/