• Automatic translation of scientific documents in the HAL archive

      Lambert, Patrik; Schwenk, Holger; Blain, Frederic (European Language Resources Association (ELRA), 2012-05-31)
      This paper describes the development of a statistical machine translation system between French and English for scientific papers. This system will be closely integrated into the French HAL open archive, a collection of more than 100,000 scientific papers. We describe the creation of in-domain parallel and monolingual corpora, the development of a domain-specific translation system with the created resources, and its adaptation using monolingual resources only. These techniques allowed us to improve on a generic system by more than 10 BLEU points.
    • Collaborative machine translation service for scientific texts

      Lambert, Patrik; Senellart, Jean; Romary, Laurent; Schwenk, Holger; Zipser, Florian; Lopez, Patrice; Blain, Frederic (Association for Computational Linguistics, 2012-04-30)
      French researchers are frequently required to translate into French descriptions of their work published in English. At the same time, the need for French speakers to access articles in English, or for international researchers to access theses or papers in French, is poorly met by generic translation tools. We propose a demonstration of an end-to-end tool integrated into the HAL open archive that enables efficient translation of scientific texts. This tool can give translation suggestions adapted to the scientific domain, improving the BLEU score of a generic system by more than 10 points. It also provides a post-editing service that captures user post-editing data, which can be used to incrementally improve the translation engines. It is thus helpful for users who need to translate or access scientific texts.
    • Continuous adaptation to user feedback for statistical machine translation

      Blain, Frédéric; Bougares, Fethi; Hazem, Amir; Barrault, Loïc; Schwenk, Holger (Association for Computational Linguistics, 2015-06-30)
      This paper reports detailed experimental feedback on different approaches to adapting a statistical machine translation system to a targeted translation project, using only small amounts of parallel in-domain data. The experiments were performed by professional translators under realistic working conditions using a computer-assisted translation tool. We analyze the influence of these adaptations on translator productivity and on the overall post-editing effort. We show that significant improvements can be obtained with the presented adaptation techniques.
    • Incremental adaptation using translation informations and post-editing analysis

      Blain, Frederic; Schwenk, Holger; Senellart, Jean (IWSLT, 2012-12-06)
      It is well known that statistical machine translation systems perform best when they are adapted to the task. In this paper we propose new methods to quickly perform incremental adaptation without the need to obtain word-by-word alignments from GIZA or similar tools. The main idea is to use an automatic translation as a pivot to infer alignments between the source sentence and the reference translation or user correction. We compared our approach to the standard method for incremental retraining. We achieve similar BLEU scores using fewer computational resources. Fast retraining is particularly interesting when we want to integrate user feedback almost instantly, for instance in a post-editing context or in a machine-translation-assisted CAT tool. We also explore several methods for combining the translation models.
    • Project adaptation over several days

      Blain, Frederic; Hazem, Amir; Bougares, Fethi; Barrault, Loic; Schwenk, Holger (Johannes Gutenberg University of Mainz, 2015-01-30)
    • Qualitative analysis of post-editing for high quality machine translation

      Blain, Frédéric; Senellart, Jean; Schwenk, Holger; Plitt, Mirko; Roturier, Johann; AAMT, Asia-Pacific Association for Machine Translation (Asia-Pacific Association for Machine Translation, 2011-09-30)
      In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high-quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort based on the Post-Editing Action (PEA), for which we provide human evaluation guidelines and propose a process to evaluate these PEAs automatically. We applied this methodology to data sets from two technologically different MT systems. In that context, we show that more than 35% of the remaining effort can be saved by introducing global PEAs and edit propagation.