Robust fragment-based framework for cross-lingual sentence retrieval
Authors
Trijakwanich, Nattapol
Limkonchotiwat, Peerat
Sarwar, Raheem
Phatthiyaphaibun, Wannaphong
Chuangsuwanich, Ekapol
Nutanong, Sarana
Editors
Moens, Marie-Francine
Huang, Xuanjing
Specia, Lucia
Yih, Scott Wen-tau
Issue Date
2021-11-01
Abstract
Cross-lingual Sentence Retrieval (CLSR) aims to retrieve parallel sentence pairs, that is, sentences that are translations of each other, from a multilingual set of comparable documents. The retrieved pairs can be used in downstream NLP tasks such as machine translation and cross-lingual word sense disambiguation. We propose a CLSR framework called Robust Fragment-level Representation (RFR) to address Out-of-Domain (OOD) CLSR problems. In particular, we improve the robustness of sentence retrieval by representing each sentence as a collection of fragments, which changes the retrieval granularity from the sentence level to the fragment level. We performed CLSR experiments on three OOD datasets, four language pairs, and three well-known base sentence encoders: m-USE, LASER, and LaBSE. Experimental results show that RFR significantly improves the base encoders’ performance in more than 85% of the cases.
Citation
Trijakwanich, N., Limkonchotiwat, P., Sarwar, R., Phatthiyaphaibun, W., Chuangsuwanich, E. and Nutanong, S. (2021) Robust fragment-based framework for cross-lingual sentence retrieval. Findings of the Association for Computational Linguistics: EMNLP 2021. Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih (Editors). Association for Computational Linguistics, pp. 935–944.
Additional Links
https://aclanthology.org/2021.findings-emnlp.80/
Type
Conference contribution
Language
en
Description
© 2021 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://aclanthology.org/2021.findings-emnlp.80
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/