Simple or not simple? A readability question

Štajner, Sanja
Mitkov, Ruslan
Corpas Pastor, Gloria
Abstract
Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application which promises a significant societal impact, in that it can be employed to the benefit of users with limited language comprehension skills, such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question we are faced with is how to automatically evaluate their performance, given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted in a way that allows them to be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus has been compiled as part of the Simplext project, targeting people with Down syndrome, and the second corpus as part of the FIRST project, where the users are people with autism spectrum disorder. The experiments show that there is a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, thus indicating the possibility of using those indices as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values when applied to four different corpora.
Citation
Štajner S., Mitkov R., Corpas Pastor G. (2015) Simple or Not Simple? A Readability Question. In: Gala N., Rapp R., Bel-Enguix G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham
Type
Chapter in book
Language
en
Description
Chapter from Language Production, Cognition, and the Lexicon, edited by Núria Gala, Reinhard Rapp and Gemma Bel-Enguix. Part of the Text, Speech and Language Technology book series (TLTB, volume 48).
ISBN
9783319080437
Sponsors
University of Malaga and European Commission, TRADICOR (Ref. no.: PIE 13-054), EXPERT (Ref. no.: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref. no.: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref. no.: 287607-FP7-ICT-2011-7).
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States