Recent Submissions

  • Translationese and register variation in English-to-Russian professional translation

    Kunilovskaya, Maria; Corpas Pastor, Gloria; Wang, Vincent; Lim, Lily; Li, Defeng (Springer Singapore, 2021-10-12)
    This study explores the impact of register on the properties of translations. We compare sources, translations and non-translated reference texts to describe the linguistic specificity of translations common and unique between four registers. Our approach includes bottom-up identification of translationese effects that can be used to define translations in relation to contrastive properties of each register. The analysis is based on an extended set of features that reflect morphological, syntactic and text-level characteristics of translations. We also experiment with lexis-based features from n-gram language models estimated on large bodies of originally-authored texts from the included registers. Our parallel corpora are built from published English-to-Russian professional translations of general domain mass-media texts, popular-scientific books, fiction and analytical texts on political and economic news. The number of observations and the data sizes for parallel and reference components are comparable within each register and range from 166 (fiction) to 525 (media) text pairs; from 300,000 to 1 million tokens. Methodologically, the research relies on a series of supervised and unsupervised machine learning techniques, including those that facilitate visual data exploration. We learn a number of text classification models and study their performance to assess our hypotheses. Further on, we analyse the usefulness of the features for these classifications to detect the best translationese indicators in each register. The multivariate analysis via text classification is complemented by univariate statistical analysis which helps to explain the observed deviation of translated registers through a number of translationese effects and detect the features that contribute to them. Our results demonstrate that each register generates a unique form of translationese that can be only partially explained by cross-linguistic factors. Translated registers differ in the amount and type of prevalent translationese. The same translationese tendencies in different registers are manifested through different features. In particular, the notorious shining-through effect is more noticeable in general media texts and news commentary and is less prominent in fiction.
  • Source language difficulties in learner translation: Evidence from an error-annotated corpus

    Kunilovskaia, Mariia; Ilyushchenya, Tatyana; Morgoun, Natalia; Mitkov, Ruslan (John Benjamins Publishing, 2022-06-30)
    This study uses an error-annotated, mass-media subset of a sentence-aligned, multi-parallel learner translator corpus to reveal source language items that are challenging in English-to-Russian translation. Our data includes multiple translations of the most challenging source sentences, distilled from a large collection of student translations on the basis of error statistics. This sample was subjected to manual contrastive-comparative analysis, which resulted in a list of English items that were difficult for students. The outcome of the analysis was compared to the topics discussed in dozens of translation textbooks that are recommended to BA and specialist-degree students in Russia at the initial stage of professional education. We discuss items that deserve more prominence in training as well as items that call for improvements to traditional learning activities. This study presents evidence that a more empirically motivated design of the practical translation syllabus as part of translator education is required.
  • Findings of the WMT 2021 shared task on quality estimation

    Specia, Lucia; Blain, Frederic; Fomicheva, Marina; Zerva, Chrysoula; Li, Zhenhao; Chaudhary, Vishrav; Martins, André (Association for Computational Linguistics, 2021-12-31)
    We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels. This edition focused on two main novel additions: (i) prediction for unseen languages, i.e. zero-shot settings, and (ii) prediction of sentences with catastrophic errors. In addition, new data was released for a number of languages, especially post-edited data. Participating teams from 19 institutions submitted altogether 1263 systems to different task variants and language pairs.
  • Using linguistic features to predict the response process complexity associated with answering clinical MCQs

    Yaneva, Victoria; Jurich, Daniel; Ha, Le An; Baldwin, Peter (Association for Computational Linguistics, 2021-04-30)
    This study examines the relationship between the linguistic characteristics of a test item and the complexity of the response process required to answer it correctly. Using data from a large-scale medical licensing exam, clustering methods identified items that were similar with respect to their relative difficulty and relative response-time intensiveness to create low response process complexity and high response process complexity item classes. Interpretable models were used to investigate the linguistic features that best differentiated between these classes from a descriptive and predictive framework. Results suggest that nuanced features such as the number of ambiguous medical terms help explain response process complexity beyond superficial item characteristics such as word count. Yet, although linguistic features carry signal relevant to response process complexity, the classification of individual items remains challenging.
  • An exploratory analysis of multilingual word-level quality estimation with cross-lingual transformers

    Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, Ruslan (Association for Computational Linguistics, 2021-08-31)
    Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models. The obvious disadvantages of these approaches are the need for labelled data for each language pair and the high cost required to maintain several language-specific models. To overcome these problems, we explore different approaches to multilingual, word-level QE. We show that these QE models perform on par with the current language-specific models. In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs. Our findings suggest that the word-level QE models based on powerful pre-trained transformers that we propose in this paper generalise well across languages, making them more useful in real-world scenarios.
  • deepQuest-py: large and distilled models for quality estimation

    Alva-Manchego, Fernando; Obamuyide, Abiola; Gajbhiye, Amit; Blain, Frederic; Fomicheva, Marina; Specia, Lucia (Association for Computational Linguistics, 2021-12-31)
    We introduce deepQuest-py, a framework for training and evaluation of large and lightweight models for Quality Estimation (QE). deepQuest-py provides access to (1) state-of-the-art models based on pre-trained Transformers for sentence-level and word-level QE; (2) lightweight and efficient sentence-level models implemented via knowledge distillation; and (3) a web interface for testing models and visualising their predictions. deepQuest-py is available at sheffieldnlp/deepQuest-py under a CC BY-NC-SA licence.
  • Pushing the right buttons: adversarial evaluation of quality estimation

    Kanojia, Diptesh; Fomicheva, Marina; Ranasinghe, Tharindu; Blain, Frederic; Orasan, Constantin; Specia, Lucia (Association for Computational Linguistics, 2021-12-31)
    Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be useful, QE systems should be able to detect such errors. However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite a high correlation with human judgements achieved by the recent SOTA, certain types of meaning errors are still problematic for QE to detect. Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, thus potentially allowing for comparing QE systems without relying on manual quality annotation.
  • Robust fragment-based framework for cross-lingual sentence retrieval

    Trijakwanich, Nattapol; Limkonchotiwat, Peerat; Sarwar, Raheem; Phatthiyaphaibun, Wannaphong; Chuangsuwanich, Ekapol; Nutanong, Sarana (Association for Computational Linguistics, 2021-12-31)
    Cross-lingual Sentence Retrieval (CLSR) aims at retrieving parallel sentence pairs that are translations of each other from a multilingual set of comparable documents. The retrieved parallel sentence pairs can be used in other downstream NLP tasks such as machine translation and cross-lingual word sense disambiguation. We propose a CLSR framework called Robust Fragment-level Representation (RFR) to address Out-of-Domain (OOD) CLSR problems. In particular, we improve the sentence retrieval robustness by representing each sentence as a collection of fragments. In this way, we change the retrieval granularity from the sentence to the fragment level. We performed CLSR experiments based on three OOD datasets, four language pairs, and three well-known base sentence encoders: m-USE, LASER, and LaBSE. Experimental results show that RFR significantly improves the base encoders’ performance in more than 85% of the cases.
  • Linguistic features evaluation for hadith authenticity through automatic machine learning

    Mohamed, Emad; Sarwar, Raheem (Oxford University Press, 2021-12-31)
    There has not been any research that provides an evaluation of the linguistic features extracted from the matn (text) of a Hadith. Moreover, none of the fairly large corpora are publicly available as a benchmark corpus for Hadith authenticity, and there is a need to build a “gold standard” corpus for good practices in Hadith authentication. We write a scraper in the Python programming language and collect a corpus of 3651 authentic prophetic traditions and 3593 fake ones. We process the corpora with morphological segmentation and perform extensive experimental studies using a variety of machine learning algorithms, mainly through Automatic Machine Learning, to distinguish between these two categories. With a feature set including words, morphological segments, characters, top N words, top N segments, function words and several vocabulary richness features, we analyse the results in terms of both prediction and interpretability to explain which features are more characteristic of each class. Many experiments have produced good results, and the highest accuracy (i.e., 78.28%) is achieved using word n-grams as features with the Multinomial Naive Bayes classifier. Our extensive experimental studies conclude that, at least for Digital Humanities, feature engineering may still be desirable due to the high interpretability of the features. The corpus and software (scripts) will be made publicly available to other researchers in an effort to promote progress and replicability.
  • A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases

    Parodi, Giovanni; Evans, Richard; Ha, Le An; Mitkov, Ruslan; Julio, Cristóbal; Olivares-López, Raúl Ignacio (Springer, 2021-09-04)
    Encapsulators are linguistic units which establish coherent referential connections to the preceding discourse in a text. In this paper, we address the challenge of automatically analysing the pronominal encapsulator ello in Spanish text. Our method identifies, for each occurrence, the antecedent of the pronoun (including its grammatical type), the connective phrase which combines with the pronoun to express a discourse relation linking the antecedent text segment to the following text segment, and the type of semantic relation expressed by the complex discourse marker formed by the connective phrase and pronoun. We describe our annotation of a corpus to inform the development of our method and to fine-tune an automatic analyser based on Bidirectional Encoder Representations from Transformers (BERT). On testing our method, we find that it performs with greater accuracy than three baselines (0.76 for the resolution task), and sets a promising benchmark for the automatic annotation of occurrences of the pronoun ello, their antecedents, and the semantic relations between the two text segments linked by the connective in combination with the pronoun.
  • Exploiting tweet sentiments in altmetrics large-scale data

    Hassan, Saeed-Ul; Aljohani, Naif Radi; Iqbal Tarar, Usman; Safder, Iqra; Sarwar, Raheem; Alelyani, Salem; Nawaz, Raheel (SAGE, 2021-12-31)
    This article aims to exploit social exchanges on scientific literature, specifically tweets, to analyse social media users' sentiments towards publications within a research field. First, we employ the SentiStrength tool, extended with newly created lexicon terms, to classify the sentiments of 6,482,260 tweets associated with 1,083,535 publications provided by Then, we propose harmonic means-based statistical measures to generate a specialized lexicon, using positive and negative sentiment scores and frequency metrics. Next, we adopt a novel article-level summarization approach to domain-level sentiment analysis to gauge the opinion of social media users on Twitter about the scientific literature. Last, we propose and employ an aspect-based analytical approach to mine users' expressions relating to various aspects of the article, such as tweets on its title, abstract, methodology, conclusion, or results section. We show that research communities exhibit dissimilar sentiments towards their respective fields. The analysis of the field-wise distribution of article aspects shows that in Medicine, Economics, Business & Decision Sciences, tweet aspects are focused on the results section. In contrast, in Physics & Astronomy, Materials Sciences, and Computer Science, these aspects are focused on the methodology section. Overall, the study helps us to understand the sentiments of online social exchanges of the scientific community on scientific literature. Specifically, such a fine-grained analysis may help research communities improve their social media exchanges about scientific articles, disseminate their scientific findings effectively, and further increase their societal impact.
  • SemEval-2021 task 1: Lexical complexity prediction

    Shardlow, Matthew; Evans, Richard; Paetzold, Gustavo Henrique; Zampieri, Marcos (Association for Computational Linguistics, 2021-08-01)
    This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al. 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five-point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.
  • Natural language processing for mental disorders: an overview

    Calixto, Iacer; Yaneva, Viktoriya; Cardoso, Raphael (CRC Press, 2021-12-31)
  • Remote interpreting in public service settings: Technology, perceptions and practice

    Corpas Pastor, Gloria; Gaber, Mahmoud (The Slovak Association for the Study of English, 2020-12-31)
    Remote interpretation technology is developing extremely fast, enabling affordable and instant access to interpreting services worldwide. This paper focuses on the subjective perceptions of public service interpreters about the psychological and physical impact of using remote interpreting, and the effects on their own performance. To this end, a survey study has been conducted by means of an on-line questionnaire. Both structured and unstructured questions have been used to tap into interpreters’ view on technology, elicit information about perceived effects, and identify pitfalls and prospects.
  • Management of 201 individuals with emotionally unstable personality disorders: A naturalistic observational study in real-world inpatient setting

    Shahpesandy, Homayun; Mohammed-Ali, Rosemary; Oakes, Michael; Al-Kubaisy, Tarik; Cheetham, Anna; Anene, Moses; The Hartsholme Centre, Long Leys Road, Lincoln, LN1 1FS, Lincolnshire NHS Foundation Trust, UK. (Maghira & Maas Publications, 2021-06-03)
    BACKGROUND: Emotionally unstable personality disorder (EUPD) is a challenging condition with a prevalence of 20% in inpatient services. Psychotherapy is the preferred treatment; nevertheless, off-license medications are widely used. OBJECTIVES: To identify socio-demographic, clinical and service-delivery characteristics of people with EUPD admitted to inpatient services between 1st January 2017 and 31st December 2018. METHODS: A retrospective review using data from patients' records. Individuals aged 18-65 were included. Statistical analysis was conducted by the Mann-Whitney-Wilcoxon test and Chi-squared test with Yates's continuity correction. RESULTS: Of 1646 inpatients, 201 (12.2%) had the diagnosis of EUPD; 133 (66.0%) women, 68 (44.0%) men. EUPD was significantly (P < .001) more prevalent in women (18.2%) than men (7.4%). EUPD patients were significantly (P < .001) younger (32.2 years) than patients without EUPD (46 years), and had significantly (P < .001) more admissions (1.74) than patients without EUPD (1.2 admissions). 70.5% of patients had one and 17.0% two Axis-I psychiatric co-morbidities. Substance use was significantly (P < .001) more frequent in men (57.3%) than in women (28.5%). Significantly (P = 0.047) more women (68.4%) than men (53.0%) reported sexual abuse. 87.5% used polypharmacy. Antidepressants were significantly (P = 0.02) more often prescribed to women (76.6%) than men (69.1%). Significantly (P = 0.02) more women (83.5%) than men (67.6%) were on antipsychotics. 57.2% of the patients were on anxiolytics, 40.0% on hypnotics and 25.8% on mood stabilisers. CONCLUSION: EUPD is a complex condition with widespread comorbidity. The term EUPD, or Borderline Personality Disorder, is unsuitable, stigmatising and too simplistic to reflect the nature, gravity and psychopathology of this syndrome.
  • Urdu AI: writeprints for Urdu authorship identification

    Sarwar, Raheem; Hassan, Saeed-Ul (Association for Computing Machinery, 2021-12-31)
    The authorship identification task aims at identifying the original author of an anonymous text sample from a set of candidate authors. It has several application domains such as digital text forensics and information retrieval. These application domains are not limited to a specific language. However, most authorship identification studies focus on English, and limited attention has been paid to Urdu. Moreover, existing Urdu authorship identification solutions lose accuracy as the number of training samples per candidate author decreases and as the number of candidate authors increases. Consequently, these solutions are inapplicable to real-world cases. To overcome these limitations, we formulate a stylometric feature space. Based on this feature space, we use an authorship identification solution that transforms each text sample into a point set, retrieves candidate text samples, and relies on the nearest neighbour classifier to predict the original author of the anonymous text sample. To evaluate our method, we create a significantly larger corpus than existing studies and conduct several experimental studies which show that our solution can overcome the limitations of existing studies and report an accuracy level of 94.03%, which is higher than all previous authorship identification works.
  • Combining text and images for film age appropriateness classification

    Ha, Le; Mohamed, Emad (Elsevier, 2021-07-14)
    We combine textual information from a corpus of film scripts and the images of important scenes from IMDB that correspond to these films to create a bimodal dataset (the dataset and scripts can be obtained from ) for film age appropriateness classification, with the objective of improving the prediction of age appropriateness for parents and children. We use state-of-the-art Deep Learning image feature extraction, including DenseNet, ResNet, Inception, and NASNet. We have tested several machine learning algorithms and have found XGBoost to yield the best results. Previously reported classification accuracies, using only textual features, were 79.1% and 65.3% for American MPAA and British BBFC classification respectively. Using images alone, we achieve 64.8% and 56.7% classification accuracy. The most consistent combination of textual features and image features achieves 81.1% and 66.8%, both statistically significant improvements over the use of text only.
  • Decálogo de características de la literatura poscolonial: propuesta de una taxonomía para la crítica literaria y los estudios de literatura comparada

    Fernández Ruiz, María Remedios; Corpas Pastor, Gloria; Seghiri, Míriam (Editorial CSIC, 2021-06-22)
    The aim of this article is to propose a classification of the features present, to a greater or lesser extent, in postcolonial literature in any language. Although this taxonomy takes as its starting point earlier theoretical definitions of the key concepts related to postcolonial literature (Edwards 2008, Nayar 2008 and Ramone 2011), it appears to be the first formal classification developed on the subject. Accordingly, the article analyses established concepts, introduces the new notion of plasticity of literary genres, and explores current trends in research on intersectionality. As a result, we provide a decalogue of characteristics of postcolonial literature that will support literary criticism and comparative literature studies.
  • Handling cross and out-of-domain samples in Thai word segmentation

    Limkonchotiwat, Peerat; Phatthiyaphaibun, Wannaphong; Sarwar, Raheem; Chuangsuwanich, Ekapol; Nutanong, Sarana (Association for Computational Linguistics, 2021-08-01)
    While word segmentation is a solved problem in many languages, it is still a challenge in continuous-script or low-resource languages. Like other NLP tasks, word segmentation is domain-dependent, which can be a challenge in low-resource languages like Thai and Urdu since there can be domains with insufficient data. This investigation proposes a new solution to adapt an existing domain-generic model to a target domain, as well as a data augmentation technique to combat the low-resource problems. In addition to domain adaptation, we also propose a framework to handle out-of-domain inputs using an ensemble of domain-specific models called Multi-Domain Ensemble (MDE). To assess the effectiveness of the proposed solutions, we conducted extensive experiments on domain adaptation and out-of-domain scenarios. Moreover, we also proposed a multiple-task dataset for Thai text processing, including word segmentation. For domain adaptation, we compared our solution to the state-of-the-art Thai word segmentation (TWS) method and obtained improvements from 93.47% to 98.48% at the character level and 84.03% to 96.75% at the word level. For out-of-domain scenarios, our MDE method significantly outperformed the state-of-the-art TWS and multi-criteria methods. Furthermore, to demonstrate our method’s generalizability, we also applied our MDE framework to other languages, namely Chinese, Japanese, and Urdu, and obtained improvements similar to Thai’s.
  • Sentiment analysis for Urdu online reviews using deep learning models

    Safder, Iqra; Mehmood, Zainab; Sarwar, Raheem; Hassan, Saeed-Ul; Zaman, Farooq; Adeel Nawab, Rao Muhammad; Bukhari, Faisal; Ayaz Abbasi, Rabeeh; Alelyani, Salem; Radi Aljohani, Naif; et al. (Wiley, 2021-06-28)
    Most existing studies focus on popular languages such as English, Spanish, Chinese, and Japanese; however, limited attention has been paid to Urdu despite its having more than 60 million native speakers. In this paper, we develop a deep learning model for the sentiments expressed in this under-resourced language. We develop an open-source corpus of 10,008 reviews from 566 online threads on the topics of sports, food, software, politics, and entertainment. The objectives of this work are twofold: (1) the creation of a human-annotated corpus for research on sentiment analysis in Urdu; and (2) measurement of up-to-date model performance using the corpus. For this assessment, we performed binary and ternary classification studies utilizing several models, namely RCNN, rule-based, N-gram, SVM, CNN, and LSTM. The RCNN model surpasses standard models with 84.98% accuracy for binary classification and 68.56% accuracy for ternary classification. To facilitate other researchers working in the same domain, we have open-sourced the corpus and code developed for this research.
