Abstract: Text Simplification (TS) has emerged as an important Natural Language Processing (NLP) application with a potentially significant societal impact, since it can benefit users with limited language comprehension skills, such as children, non-native speakers without a good command of the language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question arises of how to evaluate their performance automatically, given that access to target users may be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features that might be seen as reading obstacles for various target populations, indicating that these indices could be used to measure the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values across four different corpora.
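The abstract notes that the readability indices must be adapted so they can be computed automatically. As a minimal sketch of what such an adaptation involves, the Python below computes the Szigriszt-Pazos perspicuity index (206.835 - 62.3 * syllables/words - words/sentences), a well-known Spanish readability formula that is not necessarily one of the three indices evaluated in the chapter. The function names, the regex tokenizer, the sentence splitter and the vowel-run syllable heuristic are all hypothetical illustrations; some heuristic of this kind is needed because the formula counts syllables, which plain tokenization does not provide.

```python
import re
import unicodedata

# Hypothetical helpers for illustration only; not the chapter's implementation.
WORD = re.compile(r"[a-záéíóúüñ]+", re.IGNORECASE)   # Spanish word tokens
VOWEL_RUN = re.compile(r"[aeiouáéíóúü]+")             # maximal vowel sequences
STRONG = set("aeoáéó")                                # open vowels
ACCENTED_WEAK = set("íú")                             # stressed close vowels break diphthongs


def count_syllables(word: str) -> int:
    """Heuristic Spanish syllable count: one syllable per vowel run, plus one
    for every hiatus inside the run ('y' is treated as a consonant)."""
    word = unicodedata.normalize("NFC", word).lower()
    syllables = 0
    for run in VOWEL_RUN.findall(word):
        syllables += 1
        for a, b in zip(run, run[1:]):
            # Two open vowels, or an accented í/ú next to any vowel, form a hiatus.
            if (a in STRONG and b in STRONG) or a in ACCENTED_WEAK or b in ACCENTED_WEAK:
                syllables += 1
    return max(syllables, 1)  # e.g. the conjunction "y" still counts as one syllable


def count_sentences(text: str) -> int:
    """Count segments delimited by ., ! or ? that contain at least one word."""
    return sum(1 for seg in re.split(r"[.!?]+", text) if WORD.search(seg))


def szigriszt_pazos(text: str) -> float:
    """Szigriszt-Pazos perspicuity; higher values indicate easier text."""
    text = unicodedata.normalize("NFC", text)
    words = WORD.findall(text)
    n_sentences = count_sentences(text)
    if not words or n_sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    n_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 62.3 * n_syllables / len(words) - len(words) / n_sentences


if __name__ == "__main__":
    original = "La administración municipal implementó medidas extraordinariamente controvertidas."
    simplified = "El ayuntamiento tomó medidas. Muchas personas no estaban de acuerdo."
    print(f"original:   {szigriszt_pazos(original):.2f}")
    print(f"simplified: {szigriszt_pazos(simplified):.2f}")
```

The two-sentence simplified version scores higher (easier) than the single dense original, which is the kind of original-versus-simplified contrast the abstract describes at corpus level. A real evaluation would substitute a proper Spanish syllabifier and sentence splitter for these regex heuristics.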
Citation: Štajner S., Mitkov R., Corpas Pastor G. (2015) Simple or Not Simple? A Readability Question. In: Gala N., Rapp R., Bel-Enguix G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham.
Type: Chapter in book
Description: Chapter from Language Production, Cognition, and the Lexicon, edited by Núria Gala, Reinhard Rapp and Gemma Bel-Enguix; part of the Text, Speech and Language Technology book series (TLTB, volume 48).
Sponsors: University of Malaga and European Commission, TRADICOR (Ref. no.: PIE 13-054), EXPERT (Ref. no.: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref. no.: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref. no.: 287607-FP7-ICT-2011-7).
Licence: The following licence governs the copyright and re-use of this item. Except where otherwise noted, this item is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States.