
Can ChatGPT evaluate research environments? Evidence from REF2021

Thelwall, Mike
Gadd, Elizabeth
Abstract
UK academic departments are evaluated partly on the statements that they write about the value of their research environments for the Research Excellence Framework (REF) periodic assessments. These statements mix qualitative narratives and quantitative data, typically requiring time-consuming and difficult expert judgements to assess. This article investigates whether Large Language Models (LLMs) can support the process or validate the results, using the UK REF2021 unit-level environment statements as a test case. Based on prompts mimicking the REF guidelines, ChatGPT-4o mini scores correlated positively with expert scores in almost all 34 (field-based) Units of Assessment (UoAs). ChatGPT’s scores had moderate to strong positive Spearman correlations with REF expert scores in 32 out of 34 UoAs: 14 UoAs above 0.7 and a further 13 between 0.6 and 0.7. Only two UoAs had weak or no significant associations (Classics and Clinical Medicine). From further tests for UoA34, multiple LLMs had significant positive correlations with REF2021 environment scores (all p < .001), with ChatGPT-5 performing best (r=0.81; ρ=0.82), followed by ChatGPT-4o mini (r=0.68; ρ=0.67) and Gemini Flash 2.5 (r=0.67; ρ=0.69). If LLM-generated scores for environment statements are used in future to help reduce workload, support more consistent interpretation, and complement human review, where acceptable, then caution must be exercised because of the potential for biases, inaccuracy in some cases, and unwanted systemic effects. Even the strong correlations found here seem unlikely to be judged close enough to expert scores to fully delegate the assessment task to LLMs.
Citation
Kousha, K., Thelwall, M., Gadd, E. (in press) Can ChatGPT evaluate research environments? Evidence from REF2021. Scientometrics.
Type
Journal article
Language
en
Description
This is an author's accepted manuscript of an article due to be published by Springer. The published article can be viewed here: [insert link]. For re-use please see Springer's terms and conditions.
ISSN
0138-9130
EISSN
1588-2861
Sponsors
Mike Thelwall is funded by the Economic and Social Research Council (ESRC), UK (APP43146).