Bridging the “gApp”: improving neural machine translation systems for multiword expression detection
Abstract
The present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) is carried out to evaluate to what extent gApp can optimise the performance of the two main freely available NMT systems, Google Translate and DeepL, when faced with MWE discontinuity in the Spanish-to-English direction. In light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.
Citation
Hidalgo-Ternero, C. and Corpas Pastor, G. (2020) Bridging the “gApp”: improving neural machine translation systems for multiword expression detection, Yearbook of Phraseology, 11(1), pp. 61–80. DOI: https://doi.org/10.1515/phras-2020-0005
Publisher
Walter de Gruyter GmbH
Journal
Yearbook of Phraseology
Additional Links
https://www.degruyter.com/view/journals/yop/11/1/article-p61.xml
Type
Journal article
Language
en
Description
This is the published version of an article published by De Gruyter in Yearbook of Phraseology on 25/11/2020, available online: https://doi.org/10.1515/phras-2020-0005
ISSN
1868-632X
EISSN
1868-6338
10.1515/phras-2020-0005
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/
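The abstract describes gApp only at a high level: it identifies discontinuous MWEs and rewrites them in continuous form before the text reaches an NMT system. As a rough illustration of that idea, the following Python sketch reorders Spanish verb–noun idiomatic combinations such as meter la pata ('to put one's foot in it'). It is not gApp's actual implementation. The toy lexicon VNIC_LEXICON, the MAX_GAP window and the functions tokenize, detokenize and make_continuous are hypothetical assumptions. A real system would presumably rely on lemmatisation and a much larger phraseological resource rather than a hand-listed set of inflected verb forms.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption, not gApp's resource): inflected verb
# forms mapped to the fixed nominal part of the VNIC they can head.
VNIC_LEXICON: Dict[str, Tuple[str, ...]] = {
    "metió": ("la", "pata"),   # meter la pata: 'to put one's foot in it'
    "tomó": ("el", "pelo"),    # tomar el pelo: 'to pull someone's leg'
    "echó": ("una", "mano"),   # echar una mano: 'to lend a hand'
}

# Assumed maximum number of tokens allowed between the verb and the noun part.
MAX_GAP = 3


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def detokenize(tokens: List[str]) -> str:
    """Join tokens with spaces, then remove the space before punctuation."""
    return re.sub(r" ([^\w\s])", r"\1", " ".join(tokens))


def make_continuous(tokens: List[str]) -> List[str]:
    """Move material inserted inside a known VNIC to just after the expression."""
    out = list(tokens)
    for i in range(len(out)):
        fixed = VNIC_LEXICON.get(out[i].lower())
        if fixed is None:
            continue
        n = len(fixed)
        # Gap 0 means the VNIC is already continuous, so start the search at 1.
        for gap in range(1, MAX_GAP + 1):
            start = i + 1 + gap
            if tuple(t.lower() for t in out[start:start + n]) == fixed:
                inserted = out[i + 1:start]
                expression = out[start:start + n]
                # Same-length slice assignment, so the list length never changes.
                out[i + 1:start + n] = expression + inserted
                break
    return out


if __name__ == "__main__":
    print(detokenize(make_continuous(tokenize("Juan metió otra vez la pata."))))
```

Run as a script, this prints "Juan metió la pata otra vez.": the intervening "otra vez" is moved after the expression rather than deleted, so the sentence keeps all of its content while the idiom becomes a contiguous string of the kind the abstract aims to present to the NMT system.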