Abstract
This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination. In this exam, examinees interact with standardized patients - people trained to portray simulated scenarios called clinical cases. For each encounter, an examinee writes a PN, which is then scored by physician raters using a rubric of clinical concepts, expressions of which should be present in the PN. The corpus features PNs from 10 clinical cases, as well as the clinical concepts from the case rubrics. A subset of 2,840 PNs were annotated by 10 physician experts such that all 143 concepts from the case rubrics (e.g., shortness of breath) were mapped to 34,660 PN phrases (e.g., dyspnea, difficulty breathing). The corpus is available via a data sharing agreement with NBME and can be requested at https://www.nbme.org/services/data-sharing.
Citation
Yaneva, V., Mee, J., Ha, L.A., Harik, P., Jodoin, M. and Mechaber, A. (2022) The USMLE® Step 2 Clinical Skills Patient Note Corpus. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2880–2886, Seattle, United States. Association for Computational Linguistics.
Additional Links
https://aclanthology.org/2022.naacl-main.208/
Type
Conference contribution
Language
en
Description
© 2022 The Authors. Published by ACL. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://aclanthology.org/2022.naacl-main.208
ISBN
9781955917711
DOI
10.18653/v1/2022.naacl-main.208
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/