• MLQE-PE: A multilingual quality estimation and post-editing dataset

      Fomicheva, Marina; Sun, Shuo; Fonseca, Erick; Zerva, Chrysoula; Blain, Frédéric; Chaudhary, Vishrav; Guzmán, Francisco; Lopatina, Nina; Specia, Lucia; Martins, André FT (arXiv, 2020-10-11)
      We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.