• Introduction

      Corpas Pastor, Gloria; Colson, Jean-Pierre (John Benjamins Publishing Company, 2020-05-08)
    • Las tecnologías de interpretación a distancia en los servicios públicos: uso e impacto

      Gaber, Mahmoud; Corpas Pastor, Gloria; Postigo Pinazo, Encarnación (Peter Lang, 2020-02-27)
      This chapter deals with the use of distance interpreting technologies and their impact on public service interpreters. Remote (or distance) interpreting offers a wide range of solutions to meet the pressing need for language services in both the public and private sectors. This study focuses on telephone-mediated and video-mediated interpreting, presenting their advantages and disadvantages. We have designed a survey to gather data about the psychological and physiological impact that remote interpreting technologies have on community interpreters. Our main aim is to ascertain interpreters’ general view on technology, so as to detect deficiencies and suggest improvements. This study is a first contribution towards optimising distance interpreting technologies. Current demand reveals the enormous potential of distance interpreting, its rapid evolution and the widespread presence that this modality will have in the future.
    • Laughing one's head off in Spanish subtitles: a corpus-based study on diatopic variation and its consequences for translation

      Corpas Pastor, Gloria; Mogorrón, Pedro; Martines, Vicent (John Benjamins, 2018-11-08)
      Looking for phraseological information is common practice among translators. When rendering idioms, information is mostly needed to find the appropriate equivalent, but also to check usage and diasystemic restrictions. One of the most complex issues in this respect is diatopic variation. English and Spanish are transnational languages that are spoken in several countries around the globe. Cross-variety differences as regards idiomaticity range from the actual choice of phraseological units to different lexical or grammatical variants, usage preferences and differential distribution. In this respect, translators are severely underequipped as regards information found in dictionaries. While some diatopic marks are generally used to indicate geographical restrictions, not all idioms are clearly identified and very little information is provided about preferences and/or crucial differences that occur when the same idiom is used in various national varieties. In translation, source language textemes usually turn into target language repertoremes, i.e. established units within the target system. Toury’s law of growing standardisation helps explain why translated texts tend to be simpler, more conventional and more prototypical than non-translated texts, among other characteristic features. Given that a substantial part of translational Spanish is composed of textual repertoremes, any source textemes are bound to be ‘dissolved’ into typical modes of expression in ‘standard’ Spanish. This means filtering source idiomatic diatopy through the ‘neutral, standard sieve’. This paper delves into the rendering into Spanish of the English idiom to laugh one’s head off. After a cursory look at the notions of transnational and translational Spanish(es) in Section 2, Section 3 analyses the translation strategies deployed in a giga-token parallel subcorpus of Spanish-English subtitles. In Section 4, dictionary and textual equivalents retrieved from the parallel corpus are studied against the background of two sets of synonymous idioms for ‘laughing out loud’ in 19 giga-token comparable subcorpora of Spanish national varieties. Corpas Pastor’s (2015) corpus-based research protocol will be adopted in order to uncover varietal differences, detect diatopic configurations and derive consequences for contrastive studies and translation, as summarised in Section 5. This is the first study, to the best of our knowledge, investigating the translation of to laugh one’s head off and also analysing the Spanish equivalent idioms in national and transnational corpora.
    • Leveraging large corpora for translation using the Sketch Engine

      Moze, Sarah; Krek, Simon (Cambridge University Press, 2018)
    • Linguistic features of genre and method variation in translation: A computational perspective

      Lapshinova-Koltunski, Ekaterina; Zampieri, Marcos; Legallois, Dominique; Charnois, Thierry; Larjavaara, Meri (Mouton De Gruyter, 2018-04-09)
      In this contribution we describe the use of text classification methods to investigate genre and method variation in an English-German translation corpus. For this purpose we use linguistically motivated features, representing texts as combinations of part-of-speech tags arranged in bigrams, trigrams, and 4-grams. The classification method used in this study is a Bayesian classifier with Laplace smoothing. We use the output of the classifiers to carry out an extensive feature analysis of the main differences between genres and methods of translation.
    • Multiword units in machine translation and translation technology

      Mitkov, Ruslan; Monti, Johanna; Corpas Pastor, Gloria; Seretan, Violeta (John Benjamins, 2018-07-20)
      The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention but we believe that there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.
    • Natural language processing for mental disorders: an overview

      Calixto, Iacer; Yaneva, Viktoriya; Cardoso, Raphael (CRC Press, 2022-12-31)
    • New directions in the study of family names

      Hanks, Patrick; Boullón Agrelo, Ana Isabel (Consello da Cultura Galega, 2018-12-28)
      This paper explores and explains recent radical developments in resources and methodology for studying the origins, cultural associations, and histories of family names (also called ‘surnames’). It summarizes the current state of the art and outlines new resources and procedures that are now becoming available. It shows how such innovations can enable the correction of errors in previous work and improve the accuracy of dictionaries of family names, with a focus on the English-speaking world. Developments such as the digitization of archives are having a profound effect, not only on the interpretation and understanding of traditional, ‘established’ family names and their histories, but also of names in other languages and other cultures. There are literally millions of different family names in the world today, many of which have never been studied at all. What are good criteria for selection of entries in a dictionary of family names, and what can be said about them? What is the nature of the evidence? How stable (or how variable) are family names over time? What are the effects of factors such as migration? What is the relationship between family names and geographical locations, given that people can and do move around? What is the relationship between traditional philological and historical approaches to the subject and statistical analysis of newly available digitized data? The paper aims to contribute to productive discussion of such questions.
    • NLP-enhanced self-study learning materials for quality healthcare in Europe

      Urbano Mendaña, Míriam; Corpas Pastor, Gloria; Seghiri Domínguez, Míriam; Aguado de Cea, G; Aussenac-Gilles, N; Nazarenko, A; Szulman, S (Université Paris 13, 2013-10)
      In this paper we present an overview of the TELL-ME project, which aims to develop innovative e-learning tools and self-study materials for teaching vocationally-specific languages to healthcare professionals, helping them to communicate at work. The TELL-ME e-learning platform incorporates a variety of NLP techniques to provide an array of diverse work-related exercises, self-assessment tools and an interactive dictionary of key vocabulary and concepts aimed at medics for Spanish, English and German. A prototype of the e-learning platform is currently under evaluation.
    • Recursos documentales para la traducción de seguros turísticos en el par de lenguas inglés-español

      Corpas Pastor, Gloria; Seghiri Domínguez, Miriam; Postigo Pinazo, Encarnación (Universidad de Málaga, 2007-04-05)
      The following pages summarise part of the research carried out within an interdisciplinary, inter-university R&D project on Translation Technologies called TURICOR (BFF2003-04616, MCYT), whose main objectives are the virtual compilation of a multilingual corpus of tourism contracts from electronic resources and the development of a natural language generation (NLG) system, also multilingual. The Turicor corpus thus contains various types of documents relating to tourism contracts in the four languages involved (Spanish, English, German and Italian). Specifically, the text typology that has guided the selection of the documents making up Turicor’s subcorpora covers the following: tourism legislation (international, EU and national, for each of the countries included); general terms and conditions, forms and tourism contracts.
    • Size Matters: A Quantitative Approach to Corpus Representativeness

      Corpas Pastor, Gloria; Seghiri Domínguez, Míriam; Rabadán, Rosa (Publicaciones Universidad de León, 2010-06-01)
      We should always bear in mind that the assumption of representativeness ‘must be regarded largely as an act of faith’ (Leech 1991: 2), as at present we have no means of ensuring it, or even evaluating it objectively. (Tognini-Bonelli 2001: 57) Corpus Linguistics (CL) has not yet come of age. It does not make any difference whether we consider it a full-fledged linguistic discipline (Tognini-Bonelli 2000: 1) or else a set of analytical techniques that can be applied to any discipline (McEnery et al. 2006: 7). The truth is that CL is still striving to solve thorny, central issues such as optimum size, balance and representativeness of corpora (of the language as a whole or of some subset of the language). Corpus-driven/based studies rely on the quality and representativeness of each corpus as their true foundation for producing valid results. This entails deciding on valid external and internal criteria for corpus design and compilation. A basic tenet is that corpus representativeness determines the kinds of research questions that can be addressed and the generalizability of the results obtained (cf. Biber et al. 1988: 246). Unfortunately, faith and beliefs do not seem to ensure quality. In this paper we will attempt to deal with these key questions. Firstly, we will give a brief description of the R&D projects which originally served as the main framework for this research. Secondly, we will focus on the complex notion of corpus representativeness and ideal size, from both a theoretical and an applied perspective. Finally, we will describe a computer application which has been developed as part of the research. This software will be used to verify whether a sample bilingual comparable corpus could be deemed representative.
    • Teaching idioms for translation purposes: a trilingual corpus-based glossary applied to phraseodidactics (ES/EN/DE)

      Corpas Pastor, Gloria; Hidalgo Ternero, Carlos Manuel; Bautista Zambrada, María Rosario; Martínez, Florentina Mena; Strohschen, Carola (Peter Lang, 2020)
      Phraseology plays a pivotal role in the development of translation competence as well as in translation quality assessment. Thus far, however, there remains a paucity of research on how best to teach idioms for translation purposes. Against such a background, this study aims to shed some light on the multiple applications of phraseodidactics to translation training. We will follow a corpus-based methodology and, for the sake of the argument, the focus will be on somatisms in Spanish, English and German. The overall structure of this paper takes the form of four sections. Section one begins by laying out the theoretical dimensions of phraseology and its convergence with translation. In section two we examine the main components of a corpus-based glossary of somatisms, named Glossomatic, and how it can be employed to establish ad hoc phraseological equivalences in those cases (analysed in section three) where the manipulation of idioms and the absence of one-to-one phraseological correspondence may pose some problems for translation. In this regard, given the importance of accurately conveying the pragmatic, semantic and discursive load of an idiom into a TT and, concomitantly, conveying the manipulation depicted in the ST, section four presents a teaching proposal in which students are prompted with a set of strategies and steps to be implemented with the aid of the glossary in order to solve these issues. Overall, the insights gained from this research will prove useful not only in developing trainees’ phraseological competence but also in giving centre stage to phraseodidactics in Translation Studies.
    • Translationese and register variation in English-to-Russian professional translation

      Kunilovskaya, Maria; Corpas Pastor, Gloria; Wang, Vincent; Lim, Lily; Li, Defeng (Springer Singapore, 2021-10-12)
      This study explores the impact of register on the properties of translations. We compare sources, translations and non-translated reference texts to describe the linguistic specificity of translations common to and unique among four registers. Our approach includes bottom-up identification of translationese effects that can be used to define translations in relation to contrastive properties of each register. The analysis is based on an extended set of features that reflect morphological, syntactic and text-level characteristics of translations. We also experiment with lexis-based features from n-gram language models estimated on large bodies of originally authored texts from the included registers. Our parallel corpora are built from published English-to-Russian professional translations of general domain mass-media texts, popular-scientific books, fiction and analytical texts on political and economic news. The number of observations and the data sizes for parallel and reference components are comparable within each register and range from 166 (fiction) to 525 (media) text pairs; from 300,000 to 1 million tokens. Methodologically, the research relies on a series of supervised and unsupervised machine learning techniques, including those that facilitate visual data exploration. We train a number of text classification models and study their performance to assess our hypotheses. Further on, we analyse the usefulness of the features for these classifications to detect the best translationese indicators in each register. The multivariate analysis via text classification is complemented by univariate statistical analysis which helps to explain the observed deviation of translated registers through a number of translationese effects and detect the features that contribute to them. Our results demonstrate that each register generates a unique form of translationese that can only be partially explained by cross-linguistic factors. Translated registers differ in the amount and type of prevalent translationese. The same translationese tendencies in different registers are manifested through different features. In particular, the notorious shining-through effect is more noticeable in general media texts and news commentary and is less prominent in fiction.
    • What matters more: the size of the corpora or their quality? The case of automatic translation of multiword expressions using comparable corpora.

      Mitkov, Ruslan; Taslimipoor, Shiva (John Benjamins, 2020-05-08)
      This study investigates (and compares) the impact of the size and the similarity/quality of comparable corpora on the specific task of extracting translation equivalents of verb-noun collocations from such corpora. The comprehensive evaluation of different configurations of English and Spanish corpora sheds some light on the more general and perennial question: what matters more – the quantity or quality of corpora?