• A flexible framework for collocation retrieval and translation from parallel and comparable corpora

      Rivera, Oscar Mendoza; Mitkov, Ruslan; Corpas Pastor, Gloria (John Benjamins, 2018)
      This paper outlines a methodology and a system for collocation retrieval and translation from parallel and comparable corpora. The methodology was developed with translators and language learners in mind. It is based on a phraseology framework, applies statistical techniques, and employs source tools and online resources. The collocation retrieval and translation has proved successful for English and Spanish and can be easily adapted to other languages. The evaluation results are promising and future goals are proposed. Furthermore, conclusions are drawn on the nature of comparable corpora and how they can be better exploited to suit particular needs of target users.
    • A framework for named entity recognition in the open domain

      Evans, Richard (John Benjamins Publishing Company, 2004)
    • A High Precision Information Retrieval Method for WiQA

      Orasan, Constantin; Puşcaşu, Georgiana (Springer, 2007)
      This paper presents Wolverhampton University’s participation in the WiQA competition. The method chosen for this task combines a high precision, but low recall information retrieval approach with a greedy sentence ranking algorithm. The high precision retrieval is ensured by querying the search engine with the exact topic, in this way obtaining only sentences which contain the topic. In one of the runs, the set of retrieved sentences is expanded using coreferential relations between sentences. The greedy algorithm used for ranking selects one sentence at a time, always the one which adds most information to the set of sentences without repeating the existing information too much. The evaluation revealed that it achieves a performance similar to other systems participating in the competition and that the run which uses coreference obtains the highest MRR score among all the participants.
    • A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method

      Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2002)
      This paper describes a new, advanced and completely revamped version of Mitkov's knowledge-poor approach to pronoun resolution. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of non-nominal anaphora (including pleonastic pronouns) and for recognition of animacy, and employs genetic algorithms to achieve optimal performance. The paper features extensive evaluation and discusses important evaluation issues in anaphora resolution.
    • Anaphora Resolution: To What Extent Does It Help NLP Applications?

      Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2007)
    • Computer-based assessment in numeracy and data analysis

      Binns, Ray; Thelwall, Mike (University of Wolverhampton, 2001)
    • Corpus, Tecnología y Traducción

      Corpas Pastor, Gloria; Casas, M; García Antuña, M (Servicio de Publicaciones de la Universidad de Cádiz, 2012-04-25)
      It is no coincidence that Corpus Linguistics flourished especially in the European context. Research in language technologies (or "language industries") has been the hallmark of European science policy (for an overview of European science policy on language technologies, see Corpas Pastor 2008). This policy has promoted research in language technologies as a way of safeguarding Europe's cultural diversity and multilingualism while, at the same time, overcoming the barriers and difficulties these pose for achieving the goals shared by all Europeans. Multilingualism, multiculturalism, translation and technology are inherent features of present-day European society. Moreover, these defining characteristics have contributed decisively to the development of language applications and resources aimed at supporting European social policies and their ramifications in trade, education and research. Although language technologies and corpora made their way into the theoretical and applied branches of Linguistics very early on, it has taken several decades for translators and interpreters to finally get on board this bandwagon, which was already full of researchers from other related disciplines. In this paper we give a brief overview of what the incorporation of such resources and tools has meant for the field of translation and interpreting, with special reference to the technologies specific to the sec-
    • Corpus-based multilingual lexicographic resources for translators: an overview

      Corpas Pastor, Gloria; Durán Muñoz, Isabel; Mirazo Balsa, Mónica; Valcárcel Riveiro, Carlos; Domínguez Vázquez, María José (De Gruyter, 2019-12-16)
    • Detecting semantic difference: a new model based on knowledge and collocational association

      Taslimipoor, Shiva; Corpas Pastor, Gloria; Rohanian, Omid; Corpas Pastor, Gloria; Colson, Jean-Pierre (John Benjamins Publishing Company, 2020-05-08)
      Semantic discrimination among concepts is a daily exercise for humans when using natural languages. For example, given the words airplane and car, the word flying can easily be thought of and used as an attribute to differentiate them. In this study, we propose a novel automatic approach to detect whether an attribute word represents the difference between two given words. We exploit a combination of knowledge-based and co-occurrence features (collocations) to capture the semantic difference between two words in relation to an attribute. The features are scores that are defined for each pair of words and an attribute, based on association measures, n-gram counts, word similarity, and ConceptNet relations. Based on these features, we designed a system and ran several experiments on a SemEval-2018 dataset. The experimental results indicate that the proposed model performs better than, or at least comparably with, other systems evaluated on the same data for this task.
    • El EEES y la competencia tecnológica: los nuevos grados en Traducción

      Corpas Pastor, Gloria; Muñoz, María (Universidad de Las Palmas de Gran Canaria, Servicio de Publicaciones y Difusión Científica, 2015-04-23)
      This paper takes as its starting point the research described in Muñoz Ramos (2012). We give a brief summary of the origin and evolution of the European Higher Education Area (EHEA; EEES in Spanish) up to the present day and its impact on Translation studies. We describe the interweaving of the founding principles of the Bologna Process and Information and Communication Technologies (ICT), which are positioned as the ideal companions for achieving the objectives of the Bologna Declaration. Finally, we show how these two points converge in the new Spanish degrees in Translation, which conform to the EHEA and find in translation technology subjects the cornerstone of their raison d'être.
    • El hablar y el discurso repetido: la fraseología

      Mellado, Carmen; Corpas, Gloria; Berty, Katrin; Loureda, Óscar; Schrott, Angela (De Gruyter, 2021-01-18)
      This chapter shows the interrelation between fixedness and variability in phraseological units from different points of view. First, we carry out a detailed analysis of Coseriu's concept of «discurso repetido» ("repeated discourse"), which from its origin already considers the idea of creative change, and then offer an overview of the evolution of phraseology in relation to text linguistics. Second, we present a classification of the types of phraseological variation, illustrated with examples from linguistic corpora and centred on the levels of system and speech, as well as on the speaker's intentionality. Third, we address phraseological variability and the turn that the notion of "fixedness" has taken since massive corpus data became available. In this context, absolute frequency, normalised frequency and statistical significance play a fundamental role in determining the degree of fixedness.
    • Estrategias heurísticas con corpus para la enseñanza de la fraseología orientada a la traducción

      Corpas Pastor, Gloria; Hidalgo Ternero, Carlos Manuel; Seghiri, Miriam (Peter Lang, 2020)
      This work presents a didactic proposal carried out in the subject Lengua y cultura “B” aplicadas a la Traducción e Interpretación (II) – inglés, taught in the first year of the Bachelor’s Degree in Translation and Interpreting at the University of Malaga. The main objective of this proposal is to teach the possibilities that both monolingual and bilingual corpora can provide for the correct identification and interpretation of phraseological units with regard to their translation, paying special attention to those cases where the ambiguity of phraseological sequences may lead to multiple interpretations. We will focus on somatisms and will mainly use two Spanish monolingual corpora (CORPES XXI and esEuTenTen), an English monolingual corpus (enTenTen) and two parallel corpora (Europarl and Linguee, more specifically its English-Spanish subcorpus). Against this background, the proposal is divided into several learning activities. After a first seminar where the concepts of corpus, phraseology and translation are introduced, in the second learning activity we will use parallel corpora to find translation pairings that contain translation mistakes caused by problems with phraseological ambiguity. Then, in the third learning activity, we will teach some disambiguating elements that facilitate the correct identification and interpretation of the phraseological unit, in order to convey its pragmatic and semantic weight in the target text. It is in this step that corpora can play a decisive role as documentation tools. Nevertheless, the localisation and interpretation of phraseological units is not problem-free. Given the need to develop techniques that enable a more effective detection of phraseological units, in the fourth learning activity students will learn an array of heuristic strategies to refine their searches in the consulted corpora and to select adequate equivalences after a correct interpretation of the results produced by these corpora.
    • Exploiting Data-Driven Hybrid Approaches to Translation in the EXPERT Project

      Orăsan, Constantin; Escartín, Carla Parra; Torres, Lianet Sepúlveda; Barbu, Eduard; Ji, Meng; Oakes, Michael (Cambridge University Press, 2019-06-13)
      Technologies have transformed the way we work, and this is also applicable to the translation industry. In the past thirty to thirty-five years, professional translators have experienced an increased technification of their work. Barely thirty years ago, a professional translator would not have received a translation assignment attached to an e-mail or via an FTP and yet, for the younger generation of professional translators, receiving an assignment by electronic means is the only reality they know. In addition, as pointed out in several works such as Folaron (2010) and Kenny (2011), professional translators now have a myriad of tools available to use in the translation process.
    • Grammatical annotation of historical Portuguese: Generating a corpus-based diachronic dictionary

      Bick, Eckhard; Zampieri, Marcos (Springer, 2016-09-03)
      In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our method allows the creation of tailor-made standardization dictionaries for historical Portuguese with optional period or author frequencies.
    • Identification of multiword expressions: A fresh look at modelling and evaluation

      Taslimipoor, Shiva; Rohanian, Omid; Mitkov, Ruslan; Fazly, Afsaneh; Markantonatou, Stella; Ramisch, Carlos; Savary, Agata; Vincze, Veronika (Language Science Press, 2018-10-25)
    • Inteliterm: in search of efficient terminology lookup tools for translators

      Corpas Pastor, G.; Durán-Muñoz, Isabel; Domínguez Vázquez, María José; Mirazo Balsa, Mónica; Valcárcel Riveiro, Carlos (De Gruyter, 2019-12-16)
    • Intelligent Natural Language Processing: Trends and Applications

      Orăsan, Constantin; Evans, Richard; Mitkov, Ruslan (Springer, 2017)
      Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they are not less readable than those generated more slowly as a result of onerous unaided conversion and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD.
    • Intelligent text processing to help readers with autism

      Orăsan, C; Evans, R; Mitkov, R (Springer International Publishing, 2017-11-18)
      Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they are not less readable than those generated more slowly as a result of onerous unaided conversion and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD.
    • Introduction

      Corpas Pastor, Gloria; Colson, Jean-Pierre (John Benjamins Publishing Company, 2020-05-08)
    • Las tecnologías de interpretación a distancia en los servicios públicos: uso e impacto

      Gaber, Mahmoud; Corpas Pastor, Gloria; Postigo Pinazo, Encarnación (Peter Lang, 2020-02-27)
      This chapter deals with the use of distance interpreting technologies and their impact on public service interpreters. Remote (or distance) interpreting offers a wide range of solutions to satisfy the pressing need for language services in both the public and private sectors. This study focuses on telephone-mediated and video-mediated interpreting, presenting their advantages and disadvantages. We have designed a survey to gather data about the psychological and physiological impact that remote interpreting technologies have on community interpreters. Our main aim is to ascertain interpreters’ general view of technology, so as to detect deficiencies and suggest improvements. This study is a first contribution towards optimising distance interpreting technologies. Current demand reveals the enormous potential of distance interpreting, its rapid evolution, and the widespread presence that this modality will have in the future.