Robust fragment-based framework for cross-lingual sentence retrieval

Trijakwanich, Nattapol
Limkonchotiwat, Peerat
Sarwar, Raheem
Phatthiyaphaibun, Wannaphong
Chuangsuwanich, Ekapol
Nutanong, Sarana
Abstract
Cross-lingual Sentence Retrieval (CLSR) aims to retrieve parallel sentence pairs, that is, sentences that are translations of each other, from a multilingual set of comparable documents. The retrieved pairs can be used in downstream NLP tasks such as machine translation and cross-lingual word sense disambiguation. We propose a CLSR framework called Robust Fragment-level Representation (RFR) to address Out-of-Domain (OOD) CLSR problems. In particular, we improve sentence retrieval robustness by representing each sentence as a collection of fragments, which changes the retrieval granularity from the sentence level to the fragment level. We performed CLSR experiments on three OOD datasets, four language pairs, and three well-known base sentence encoders: m-USE, LASER, and LaBSE. Experimental results show that RFR significantly improves the base encoders’ performance in more than 85% of the cases.
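The abstract does not specify how fragments are formed or how fragment matches are combined, so the following is only a minimal sketch of fragment-level retrieval under stated assumptions, not the authors' RFR implementation. It assumes fragments are sliding windows of whitespace tokens and that a candidate is scored by averaging, over the query's fragments, the best cosine similarity to any candidate fragment. `toy_encode` and `TOY_LEXICON` are hypothetical stand-ins for a real multilingual encoder such as m-USE, LASER, or LaBSE.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

def split_into_fragments(sentence: str, size: int = 2) -> List[str]:
    """Slide a window of `size` tokens over the sentence (stride 1).
    Assumption: the paper's actual fragmentation scheme may differ."""
    tokens = sentence.lower().split()
    if len(tokens) <= size:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fragment_score(query: str, candidate: str,
                   encode: Callable[[str], Vector], size: int = 2) -> float:
    """Mean over query fragments of the best-matching candidate fragment.
    Assumption: this aggregation rule is illustrative only."""
    q_vecs = [encode(f) for f in split_into_fragments(query, size)]
    c_vecs = [encode(f) for f in split_into_fragments(candidate, size)]
    return sum(max(cosine(q, c) for c in c_vecs) for q in q_vecs) / len(q_vecs)

def retrieve(query: str, candidates: Sequence[str],
             encode: Callable[[str], Vector], size: int = 2) -> str:
    """Return the candidate with the highest fragment-level score."""
    return max(candidates, key=lambda s: fragment_score(query, s, encode, size))

# Hypothetical toy encoder: maps English and French words to shared concept
# slots so the example runs without downloading a real multilingual model.
TOY_LEXICON = {"the": 0, "le": 0, "black": 1, "noir": 1, "cat": 2, "chat": 2,
               "sleeps": 3, "dort": 3, "dog": 4, "chien": 4, "runs": 5, "court": 5}

def toy_encode(text: str) -> Vector:
    vec = [0.0] * 6
    for token in text.lower().split():
        if token in TOY_LEXICON:
            vec[TOY_LEXICON[token]] += 1.0
    return vec

best = retrieve("the black cat sleeps",
                ["le chien court", "le chat noir dort"], toy_encode)
print(best)
```

With a real encoder, `toy_encode` would be replaced by a function that returns the encoder's embedding for a fragment; the rest of the sketch is independent of that choice.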
Citation
Trijakwanich, N., Limkonchotiwat, P., Sarwar, R., Phatthiyaphaibun, W., Chuangsuwanich, E. and Nutanong, S. (2021) Robust fragment-based framework for cross-lingual sentence retrieval. Findings of the Association for Computational Linguistics: EMNLP 2021. Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih (Editors): Association for Computational Linguistics. pp. 935–944.
Type
Conference contribution
Language
en
Description
© 2021 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://aclanthology.org/2021.findings-emnlp.80