Simple item record

dc.contributor.author: Gajbhiye, Amit
dc.contributor.author: Fomicheva, Marina
dc.contributor.author: Alva-Manchego, Fernando
dc.contributor.author: Blain, Frederic
dc.contributor.author: Obamuyide, Abiola
dc.contributor.author: Aletras, Nikolaos
dc.contributor.author: Specia, Lucia
dc.date.accessioned: 2021-06-08T09:45:56Z
dc.date.available: 2021-06-08T09:45:56Z
dc.date.issued: 2021-12-31
dc.identifier.citation: Gajbhiye, A., Fomicheva, M., Alva-Manchego, F., Blain, F., Obamuyide, A., Aletras, N. and Specia, L. (in press) Knowledge distillation for quality estimation. The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), 2nd-4th August, 2021. Online. [en]
dc.identifier.uri: http://hdl.handle.net/2436/624102
dc.description: This is an accepted manuscript of an article due to be published by ACL in The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). The accepted version of the publication may differ from the final published version. [en]
dc.description.abstract: Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters. [en]
dc.format: application/pdf [en]
dc.language.iso: en [en]
dc.publisher: Association for Computational Linguistics [en]
dc.relation.url: https://2021.aclweb.org/ [en]
dc.subject: quality estimation [en]
dc.subject: machine translation [en]
dc.subject: knowledge distillation [en]
dc.title: Knowledge distillation for quality estimation [en]
dc.type: Conference contribution [en]
dc.date.updated: 2021-06-07T13:44:55Z
dc.conference.name: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
pubs.finish-date: 2021-08-04
pubs.start-date: 2021-08-02
dc.date.accepted: 2021-05-06
rioxxterms.funder: University of Wolverhampton [en]
rioxxterms.identifier.project: UOW08062021FB [en]
rioxxterms.version: AM [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
rioxxterms.licenseref.startdate: 2021-12-31 [en]
refterms.dateFCD: 2021-06-08T09:43:44Z
refterms.versionFCD: AM
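
The abstract above outlines the teacher-student setup used for knowledge distillation in QE. As a rough, illustrative sketch only (assuming PyTorch; the StudentQE class, the distill() helper, the BiLSTM architecture and all hyper-parameters are hypothetical stand-ins, not the authors' actual models or training recipe), sentence-level distillation can be framed as regressing a small student onto the teacher's predicted quality scores over unlabeled, possibly augmented source-translation pairs:

    # Minimal sketch, assuming PyTorch. The teacher is any pretrained QE model
    # that maps a (source, translation) pair to a quality score; everything here
    # is illustrative and does not reproduce the paper's setup.
    import torch
    import torch.nn as nn

    class StudentQE(nn.Module):
        """Small bidirectional-LSTM regressor over pre-tokenised id sequences."""
        def __init__(self, vocab_size=32000, emb_dim=128, hidden=256):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, 1)   # sentence-level quality score

        def forward(self, ids):
            x = self.emb(ids)
            _, (h, _) = self.encoder(x)
            h = torch.cat([h[0], h[1]], dim=-1)    # concat fwd/bwd final states
            return self.head(h).squeeze(-1)

    def distill(student, teacher_scores_fn, unlabeled_batches, epochs=3, lr=1e-3):
        """Train the student to mimic the teacher's sentence-level scores (MSE)."""
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for ids in unlabeled_batches:          # augmented, unlabeled src-mt pairs
                with torch.no_grad():
                    targets = teacher_scores_fn(ids)  # teacher's predicted quality
                opt.zero_grad()
                loss = loss_fn(student(ids), targets)
                loss.backward()
                opt.step()
        return student

    # Toy usage: random token ids stand in for real (source, translation) pairs,
    # and a random-score function stands in for the large pretrained teacher.
    if __name__ == "__main__":
        torch.manual_seed(0)
        batches = [torch.randint(1, 32000, (16, 40)) for _ in range(8)]
        fake_teacher = lambda ids: torch.rand(ids.size(0))
        distill(StudentQE(), fake_teacher, batches)

Training against the teacher's continuous scores rather than gold labels is what allows unlabeled, augmented data to be used, and the student's capacity, not the teacher's, determines the inference cost at deployment time.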


Files in this item

Name: Gajbhiye_et_al_Knowledge_disti ...
Embargo: 2021-12-31
Size: 279.5 KB
Format: PDF


Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/