Bridging the “gApp”: improving neural machine translation systems for multiword expression detection
dc.contributor.author | Hidalgo-Ternero, Carlos Manuel | |
dc.contributor.author | Corpas Pastor, Gloria | |
dc.date.accessioned | 2021-01-14T09:57:03Z | |
dc.date.available | 2021-01-14T09:57:03Z | |
dc.date.issued | 2020-11-25 | |
dc.identifier.citation | Hidalgo-Ternero, C. and Corpas Pastor, G. (2020) Bridging the “gApp”: improving neural machine translation systems for multiword expression detection, Yearbook of Phraseology, 11(1), pp. 61–80. DOI: https://doi.org/10.1515/phras-2020-0005 | en |
dc.identifier.issn | 1868-632X | en |
dc.identifier.doi | 10.1515/phras-2020-0005 | en |
dc.identifier.uri | http://hdl.handle.net/2436/623878 | |
dc.description | This is the published version of an article published by De Gruyter in Yearbook of Phraseology on 25/11/2020, available online: https://doi.org/10.1515/phras-2020-0005 | en |
dc.description.abstract | The present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) will be carried out in order to evaluate to what extent gApp can optimise the performance of the two main free open-source NMT systems —Google Translate and DeepL— under the challenge of MWE discontinuity in the Spanish into English directionality. In the light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems. | en |
dc.format | application/pdf | en |
dc.language.iso | en | en |
dc.publisher | Walter de Gruyter GmbH | en |
dc.relation.url | https://www.degruyter.com/view/journals/yop/11/1/article-p61.xml | en |
dc.subject | text preprocessing system | en |
dc.subject | neural machine translation (NMT) | en |
dc.subject | multiword expression (MWE) | en |
dc.subject | verb–noun idiomatic combinations (VNICs) | en |
dc.subject | discontinuity | en |
dc.title | Bridging the “gApp”: improving neural machine translation systems for multiword expression detection | en |
dc.type | Journal article | en |
dc.identifier.eissn | 1868-6338 | |
dc.identifier.journal | Yearbook of Phraseology | en |
dc.date.updated | 2021-01-11T09:20:34Z | |
rioxxterms.funder | Spanish Ministry of Education | en |
rioxxterms.identifier.project | FPU16/02032 | en |
rioxxterms.version | VoR | en |
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | en |
rioxxterms.licenseref.startdate | 2021-11-25 | en |
dc.source.volume | 11 | |
dc.source.issue | 1 | |
dc.source.beginpage | 61 | |
dc.source.endpage | 80 | |
dc.description.version | Published version | |
refterms.dateFCD | 2021-01-14T09:56:09Z | |
refterms.versionFCD | VoR |
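
The abstract describes gApp as converting discontinuous multiword expressions into their continuous form before the text is passed to an NMT system. The following is a minimal sketch of that general idea only, not the authors' implementation: the lexicon `VNICS`, the gap limit `MAX_GAP`, the function name `make_continuous`, and the example sentence are all hypothetical and chosen for illustration. The sketch assumes a simple regular-expression approach in which the words separating the verb from the nominal part of the idiom are moved after the idiom.

```python
import re

# Hypothetical mini-lexicon (not taken from the article):
# regex for inflected verb forms -> fixed nominal part of the idiom.
VNICS = {
    r"tom\w+": "el pelo",  # tomar el pelo ("to pull someone's leg")
}

# Assumed upper bound on the number of words allowed between verb and noun part.
MAX_GAP = 3


def make_continuous(sentence: str) -> str:
    """Move words that split a verb-noun idiom to the position after the idiom."""
    for verb_pattern, noun_part in VNICS.items():
        pattern = re.compile(
            rf"\b({verb_pattern})\s+((?:\w+\s+){{1,{MAX_GAP}}}?)({re.escape(noun_part)})\b",
            flags=re.IGNORECASE,
        )
        # group 1: verb form, group 2: intervening words, group 3: nominal part
        sentence = pattern.sub(
            lambda m: f"{m.group(1)} {m.group(3)} {m.group(2).strip()}",
            sentence,
        )
    return sentence


if __name__ == "__main__":
    print(make_continuous("Le tomaron constantemente el pelo."))
```

A sentence in which the idiom is already continuous is returned unchanged, since the pattern requires at least one intervening word. A pattern this loose would also match unrelated words beginning with the same letters, so a real system would need proper morphological analysis rather than a bare regular expression.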