A MACHINE LEARNING APPROACH TO THE IDENTIFICATION OF TRANSLATIONAL LANGUAGE: AN INQUIRY INTO TRANSLATIONESE LEARNING MODELS
Advisors: Mitkov, R., Corpas, G., Inkpen, D.
Abstract: In the field of Descriptive Translation Studies, translationese refers to the specific traits that characterise the language used in translations. While translationese has often been investigated to illustrate that translational language is different from non-translational language, scholars have also proposed a set of hypotheses which may characterise such differences. In the quest for the validation of these hypotheses, embracing corpus-based techniques had a well-known impact in the domain, leading to several advances in the past twenty years. Despite extensive research, however, there are no universally recognised characteristics of translational language, nor universally recognised patterns likely to occur within translational language. This thesis addresses these issues, with a less used approach in the field of Descriptive Translation Studies, by investigating the nature of translational language from a machine learning perspective. While the main focus is on analysing translationese, this thesis investigates two related sub-hypotheses: simplification and explicitation. To this end, a multilingual learning framework is designed and implemented for the identification of translational language. The framework is modelled as a categorisation task, the learning techniques having the major goal to automatically learn to distinguish between translated and non-translated texts. The second and third major goals of this research are the retrieval of the recurring patterns that are revealed in the process of solving the task of categorisation, as well as the ranking of the most influential characteristics used to accomplish the learning task. These aims are fulfilled by implementing a system that adopts the machine learning methodology proposed in this research.
The learning framework proves to be an adaptable multilingual framework for the investigation of the nature of translational language, its adaptability being illustrated in this thesis by applying it to the investigation of two languages: Spanish and Romanian. In this thesis, different research scenarios and learning models are experimented with in order to assess to what extent translated texts can be differentiated from non-translated texts in certain contexts. The findings show that machine learning algorithms, aggregating a large set of potentially discriminative characteristics for translational language, are able to differentiate translated texts from non-translated ones with high scores. The evaluation experiments report performance values such as accuracy, precision, recall, and F-measure on two datasets. The present research is situated at the confluence of three areas, more precisely: Descriptive Translation Studies, Machine Learning and Natural Language Processing, justifying the need to combine these fields for the investigation of translationese and translational hypotheses.
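The evaluation measures named in the abstract (accuracy, precision, recall, and F-measure) can be illustrated with a minimal sketch for a binary translated versus non-translated categorisation task. This is not the system described in the thesis: the function name `evaluate`, the label strings, and the toy data are assumptions made only for illustration, and the F-measure shown is the balanced F1 variant.

```python
def evaluate(gold, predicted, positive="translated"):
    """Compute accuracy, precision, recall and F1 for a binary labelling.

    `gold` and `predicted` are equal-length lists of label strings.
    `positive` names the class treated as the positive one (an assumption
    here: the translated class).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    pairs = list(zip(gold, predicted))
    # True positives: translated texts correctly labelled as translated.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: non-translated texts labelled as translated.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: translated texts labelled as non-translated.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (hypothetical, not taken from the thesis datasets).
gold = ["translated", "translated", "translated", "original", "original"]
predicted = ["translated", "translated", "original", "original", "translated"]
scores = evaluate(gold, predicted)
print(scores)
```

On the toy data, three of the five labels match, and two of the three texts predicted as translated are actually translated, so accuracy is 0.6 while precision, recall, and F1 are each 2/3.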
Publisher: University of Wolverhampton
Type: Thesis or dissertation
Description: A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy
Showing items related by title, author, creator and subject.
A Dynamic Programming Approach to Improving Translation Memory Matching and Retrieval Using Paraphrases. Gupta, Rohit; Orăsan, Constantin; Liu, Qun; Mitkov, Ruslan; Sojka, Petr; Horak, Ales; Kopecek, Ivan (Springer, 2016-09). Translation memory tools lack semantic knowledge like paraphrasing when they perform matching and retrieval. As a result, paraphrased segments are often not retrieved. One of the primary reasons for this is the lack of a simple and efficient algorithm to incorporate paraphrasing in the TM matching process. Gupta and Orăsan proposed an algorithm which incorporates paraphrasing based on greedy approximation and dynamic programming. However, because of greedy approximation, their approach does not make full use of the paraphrases available. In this paper we propose an efficient method for incorporating paraphrasing in matching and retrieval based on dynamic programming only. We tested our approach on English-German, English-Spanish and English-French language pairs and retrieved better results for all three language pairs compared to the earlier approach.
The first Automatic Translation Memory Cleaning Shared Task. Barbu, Eduard; Parra Escartín, Carla; Bentivogli, Luisa; Negri, Matteo; Turchi, Marco; Orasan, Constantin; Federico, Marcello (Springer, 2017-01-21). This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow up of the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.