
DC Field | Value | Language
dc.contributor.author | Trijakwanich, Nattapol |
dc.contributor.author | Limkonchotiwat, Peerat |
dc.contributor.author | Sarwar, Raheem |
dc.contributor.author | Phatthiyaphaibun, Wannaphong |
dc.contributor.author | Chuangsuwanich, Ekapol |
dc.contributor.author | Nutanong, Sarana |
dc.contributor.editor | Moens, Marie-Francine | en
dc.contributor.editor | Huang, Xuanjing | en
dc.contributor.editor | Specia, Lucia | en
dc.contributor.editor | Yih, Scott Wen-tau | en
dc.date.accessioned | 2021-09-10T09:28:08Z |
dc.date.available | 2021-09-10T09:28:08Z |
dc.date.issued | 2021-11-01 |
dc.identifier.citation | Trijakwanich, N., Limkonchotiwat, P., Sarwar, R., Phatthiyaphaibun, W., Chuangsuwanich, E. and Nutanong, S. (2021) Robust fragment-based framework for cross-lingual sentence retrieval. Findings of the Association for Computational Linguistics: EMNLP 2021. Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih (Editors). Association for Computational Linguistics, pp. 935–944. | en
dc.identifier.uri | http://hdl.handle.net/2436/624330 |
dc.description | © 2021 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://aclanthology.org/2021.findings-emnlp.80 | en
dc.description.abstract | Cross-lingual Sentence Retrieval (CLSR) aims to retrieve parallel sentence pairs, i.e. sentences that are translations of each other, from a multilingual set of comparable documents. The retrieved parallel sentence pairs can be used in downstream NLP tasks such as machine translation and cross-lingual word sense disambiguation. We propose a CLSR framework called Robust Fragment-level Representation (RFR) to address Out-of-Domain (OOD) CLSR problems. In particular, we improve sentence retrieval robustness by representing each sentence as a collection of fragments, which changes the retrieval granularity from the sentence level to the fragment level. We performed CLSR experiments on three OOD datasets and four language pairs, using three well-known base sentence encoders: m-USE, LASER, and LaBSE. Experimental results show that RFR significantly improves the base encoders’ performance in more than 85% of the cases. | en
dc.format | application/pdf | en
dc.language.iso | en | en
dc.publisher | Association for Computational Linguistics | en
dc.relation.url | https://aclanthology.org/2021.findings-emnlp.80/ | en
dc.subject | Sentence Retrieval | en
dc.subject | Cross-lingual | en
dc.title | Robust fragment-based framework for cross-lingual sentence retrieval | en
dc.type | Conference contribution | en
dc.date.updated | 2021-09-09T08:15:47Z |
dc.identifier.conference | Conference on Empirical Methods in Natural Language Processing | en
dc.conference.name | Empirical Methods in Natural Language Processing |
dc.date.accepted | 2021-08-26 |
rioxxterms.funder | University of Wolverhampton | en
rioxxterms.identifier.project | UOW10092021RS | en
rioxxterms.version | VoR | en
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by/4.0/ | en
rioxxterms.licenseref.startdate | 2022-09-10 | en
dc.source.booktitle | Findings of the Association for Computational Linguistics: EMNLP 2021 |
dc.source.beginpage | 935 |
dc.source.endpage | 944 |
refterms.dateFCD | 2021-09-10T09:27:27Z |
refterms.versionFCD | VoR |
refterms.dateFOA | 2021-11-19T09:58:30Z |


Files in this item

Name: 2021.findings-emnlp.80.pdf
Size: 1001 KB
Format: PDF



Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/