Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities
Issue Date
2017-09-08
Abstract
Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier which is evaluated using two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, is developed for the purposes of this study and made freely available. We explore the effects of the size and type of the training data used on the performance of the classifiers, and the effects of the type of the unseen test datasets on the classification performance.
Additional Links
http://www.cs.rochester.edu/~tetreaul/bea12.html
Type
Conference contribution
Language
en
Description
The 12th Workshop on Innovative Use of NLP for Building Educational Applications, 8th September 2017 Copenhagen, Denmark.
The following licence applies to the copyright and re-use of this item:
- Creative Commons
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/