Predicting the difficulty of multiple choice questions in a high-stakes medical exam
Abstract
Predicting the construct-relevant difficulty of Multiple-Choice Questions (MCQs) has the potential to reduce cost while maintaining the quality of high-stakes exams. In this paper, we propose a method for estimating the difficulty of MCQs from a high-stakes medical exam, where all questions were deliberately written to a common reading level. To accomplish this, we extract a large number of linguistic features and embedding types, as well as features quantifying the difficulty of the items for an automatic question-answering system. The results show that the proposed approach outperforms various baselines with a statistically significant difference. Best results were achieved when using the full feature set, where embeddings had the highest predictive power, followed by linguistic features. An ablation study of the various types of linguistic features suggested that information from all levels of linguistic processing contributes to predicting item difficulty, with features related to semantic ambiguity and the psycholinguistic properties of words having a slightly higher importance. Owing to its generic nature, the presented approach has the potential to generalize to other exams containing MCQs.
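As a rough sketch of the kind of pipeline the abstract describes, the following combines hand-crafted linguistic features with item embeddings and fits a regressor to predict empirical difficulty. This is not the authors' code: the toy features, the placeholder embeddings, the example items with their difficulty values, and the choice of RandomForestRegressor are all illustrative assumptions.

# Illustrative sketch of a feature-based difficulty-prediction pipeline:
# linguistic features are concatenated with item embeddings and fed to a
# regressor. Everything below (features, embeddings, data, model choice)
# is an assumption for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def linguistic_features(item_text):
    """Toy surface features standing in for the paper's much larger set
    (readability, semantic ambiguity, psycholinguistic word properties)."""
    words = item_text.split()
    return [
        len(words),                                       # item length
        sum(len(w) for w in words) / max(len(words), 1),  # mean word length
        item_text.count(","),                             # complexity proxy
    ]

# Hypothetical MCQ stems with made-up empirical difficulty targets.
items = [
    "A 45-year-old man presents with chest pain radiating to the left arm ...",
    "Which enzyme is deficient in classic phenylketonuria?",
]
difficulties = np.array([0.62, 0.35])

# Embeddings would come from a pretrained encoder; random placeholders here.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(items), 32))

X = np.hstack([np.array([linguistic_features(t) for t in items]), embeddings])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, difficulties)
print(model.predict(X[:1]))  # predicted difficulty for the first item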
Citation
Ha, L. A., Yaneva, V., Baldwin, P. and Mee, J. (2019) Predicting the difficulty of multiple choice questions in a high-stakes medical exam, Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. Florence, Italy: Association for Computational Linguistics, pp. 11–20.
Additional Links
https://aclweb.org/anthology/papers/W/W19/W19-4402/
Type
Conference contribution
Language
en
ISBN
9781950737345
Except where otherwise noted, this item's license is described at https://creativecommons.org/licenses/by/4.0/