
How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis

Abstract
This study investigates how Large Language Models (LLMs) are influencing the language of academic papers by tracking 12 LLM-associated terms across six major scholarly databases (Scopus, Web of Science, PubMed, PubMed Central (PMC), Dimensions, and OpenAlex) from 2015 to 2024. Using over 2.4 million PMC open-access publications (2021–July 2025), we also analysed full texts to assess changes in the frequency and co-occurrence of these terms before and after ChatGPT’s initial public release. Across databases, delve (+1,500%), underscore (+1,000%), and intricate (+700%) had the largest increases between 2022 and 2024. Growth in LLM-term usage was much higher in STEM fields than in social sciences and arts and humanities. In PMC full texts, the proportion of papers using underscore six or more times increased by over 10,000% from 2022 to 2025, followed by intricate (+5,400%) and meticulous (+2,800%). Nearly half of all 2024 PMC papers using any LLM term also included underscore, compared with only 3%–14% of papers before ChatGPT in 2022. Papers using one LLM term are now much more likely to include other terms. For example, in 2024, underscore strongly correlated with pivotal (0.449) and delve (0.311), compared with very weak associations in 2022 (0.032 and 0.018, respectively). These findings provide the first large-scale evidence based on full-text publications and multiple databases that some LLM-associated terms are now being used much more frequently and together in academic writing. However, the results do not provide direct causal evidence and cannot distinguish between LLM-generated text, LLM-edited text, or broader adoption of LLM-associated writing or publishing styles. The rapid uptake of LLMs to support scholarly publishing is a welcome development reducing the language barrier to academic publishing for non-English speakers.
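The percentage increases and term-to-term correlations reported in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure was used, so the phi coefficient (Pearson correlation on binary "term present / term absent" indicators) is an assumption here. The function names and the toy paper sets are hypothetical and are not taken from the study's data or code.

```python
from math import sqrt


def percent_change(old, new):
    """Relative change from a baseline value to a new value, in percent."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100.0


def phi_coefficient(papers, term_a, term_b):
    """Phi coefficient between the presence of two terms across papers.

    `papers` is an iterable of sets, each holding the terms found in one
    paper. This is an assumed measure, not necessarily the one the
    authors used.
    """
    n11 = n10 = n01 = n00 = 0
    for terms in papers:
        has_a = term_a in terms
        has_b = term_b in terms
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # With no variation in one of the terms the coefficient is undefined;
    # returning 0.0 is a simplifying choice for this sketch.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Toy data only: each set stands for the tracked terms found in one paper.
toy_papers = [
    {"underscore", "pivotal"},
    {"underscore", "pivotal"},
    set(),
    set(),
]
toy_phi = phi_coefficient(toy_papers, "underscore", "pivotal")
```

Under this reading, a term whose share of papers rises from 1 in 1,000 to 16 in 1,000 has increased by 1,500%, and a phi value near 0 means the two terms appear largely independently of each other.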
Citation
Kousha, K., Thelwall, M. (in press) How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis, Scientometrics.
Type
Journal article
Language
en
Description
This is an accepted manuscript of an article due to be published by Springer Nature in Scientometrics on [dd/mm/yyyy], available online: [link to online copy]. The accepted version of the publication may differ from the final published version. This version of the article has been accepted for publication, after peer review (when applicable), and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements or any corrections. The Version of Record is available online at: http://dx.doi.org/[insert DOI]
ISSN
0138-9130
EISSN
1588-2861
Sponsors
No funding was provided for this study.