• Are classic references cited first? An analysis of citation order within article sections

      Thelwall, Mike (Springer, 2020-12-31)
      Early citations within an article section may have an agenda-setting role but contribute little to the new research. To investigate whether this practice may be common, this article assesses whether the average impact of cited references is influenced by the order in which they are cited within article sections. This is tested on 1,683,299,868 citations to 41,068,375 unique journal articles from 1,470,209 research articles in the PubMed Open Access collection, split into 22 fields. The results show that the first cited article in the Introduction and Background has a much higher average citation impact than later articles, and the same is true to a lesser extent for the Discussion and Conclusion in most fields, but not the Methods and Results. The findings do not prove that early citations are less central to the citing article but nevertheless add to previous evidence suggesting that this practice may be widespread. It may therefore be useful to treat initial introductory citations separately when evaluating citation impact, or to use impact indicators that implicitly or explicitly give less weight to the citation counts of highly cited articles.
    • A report on the Third VarDial evaluation campaign

      Zampieri, Marcos; Malmasi, Shervin; Scherrer, Yves; Samardžić, Tanja; Tyers, Francis; Silfverberg, Miikka; Klyueva, Natalia; Pan, Tung-Le; Huang, Chu-Ren; Ionescu, Radu Tudor; et al. (Association for Computational Linguistics, 2019-12-31)
      In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019. This year, the campaign comprised five shared tasks: one task re-run – German Dialect Identification (GDI) – and four new tasks – Cross-lingual Morphological Analysis (CMA), Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT), Moldavian vs. Romanian Cross-dialect Topic identification (MRC), and Cuneiform Language Identification (CLI). A total of 22 teams submitted runs across the five shared tasks. After the end of the competition, we received 14 system description papers, which are published in the VarDial workshop proceedings and referred to in this report.
    • RGCL at IDAT: deep learning models for irony detection in Arabic language

      Ranasinghe, Tharindu; Saadany, Hadeel; Plum, Alistair; Mandhari, Salim; Mohamed, Emad; Orasan, Constantin; Mitkov, Ruslan (IDAT, 2019-12-12)
      This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved by one of the submissions was 0.818, placing the RGCL team 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
    • RGCL at GermEval 2019: offensive language detection with deep learning

      Plum, Alistair; Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, Ruslan (German Society for Computational Linguistics & Language Technology, 2019-10-08)
      This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify Tweets in terms of offensive language. By means of comparative evaluation, we select the best-performing architecture for each of the three subtasks. Overall, we demonstrate that, using only minimal preprocessing, we are able to obtain competitive results.
    • Automatic summarisation: 25 years on

      Orăsan, Constantin (Cambridge University Press (CUP), 2019-09-19)
      Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
    • Do online resources give satisfactory answers to questions about meaning and phraseology?

      Hanks, Patrick; Franklin, Emma (Springer, 2019-09-18)
      In this paper we explore some aspects of the differences between printed paper dictionaries and online dictionaries in the ways in which they explain meaning and phraseology. After noting the importance of the lexicon as an inventory of linguistic items and the neglect in both linguistics and lexicography of phraseological aspects of that inventory, we investigate the treatment in online resources of phraseology – in particular, the phrasal verbs wipe out and put down – and we go on to investigate a word, dope, that has undergone some dramatic meaning changes during the 20th century. In the course of discussion, we mention the new availability of corpus evidence and the technique of Corpus Pattern Analysis, which is important for linking phraseology and meaning and distinguishing normal phraseology from rare and unusual phraseology. The online resources that we discuss include Google, the Urban Dictionary (UD), and Wiktionary.
    • Profiling idioms: a sociolexical approach to the study of phraseological patterns

      Moze, Sara; Mohamed, Emad (Springer, 2019-09-18)
      This paper introduces a novel approach to the study of lexical and pragmatic meaning called ‘sociolexical profiling’, which aims at correlating the use of lexical items with author-attributed demographic features, such as gender, age, profession, and education. The approach was applied to a case study of a set of English idioms derived from the Pattern Dictionary of English Verbs (PDEV), a corpus-driven lexical resource which defines verb senses in terms of the phraseological patterns in which a verb typically occurs. For each selected idiom, a gender profile was generated based on data extracted from the Blog Authorship Corpus (BAC) in order to establish whether any statistically significant differences can be detected in the way men and women use idioms in every-day communication. A quantitative and qualitative analysis of the gender profiles was subsequently performed, enabling us to test the validity of the proposed approach. If performed on a large scale, we believe that sociolexical profiling will have important implications for several areas of research, including corpus lexicography, translation, creative writing, forensic linguistics, and natural language processing.
    • The reading background of Goodreads book club members: A female fiction canon?

      Thelwall, Mike; Bourrier, Karen (Emerald, 2019-09-09)
      Purpose – Despite the social, educational and therapeutic benefits of book clubs, little is known about which books participants are likely to have read. In response, this article investigates the public bookshelves of those that have joined a group within the Goodreads social network site. Design/methodology/approach – Books listed as read by members of fifty large English language Goodreads groups – with a genre focus or other theme – were compiled by author and title. Findings – Recent and youth-oriented fiction dominate the fifty books most read by book club members, while almost half are works of literature frequently taught at the secondary and postsecondary level (literary classics). Whilst JK Rowling is almost ubiquitous (at least 63% as frequently listed as other authors in any group, including groups for other genres), most authors, including Shakespeare (15%), Golding (6%) and Hemingway (9%), are little read by some groups. Nor are individual recent literary prize-winners or works in languages other than English frequently read. Research limitations/implications – Although these results are derived from a single popular website, knowing more about what book club members are likely to have read should help participants, organisers and moderators. For example, recent literary prize winners might be a good choice, given that few members may have read them. Originality/value – This is the first large scale study of book group members’ reading patterns. Whilst typical reading is likely to vary by group theme and average age, there seems to be a mainly female canon of about 14 authors and 19 books that Goodreads book club members are likely to have read.
    • Large-scale data harvesting for biographical data

      Plum, Alistair; Zampieri, Marcos; Orasan, Constantin; Wandl-Vogt, Eveline; Mitkov, Ruslan (CEUR, 2019-09-05)
      This paper explores automatic methods to identify relevant biography candidates in large databases, and extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, extract relevant information about them. Even though many consider them to be identical resources, our results show that the data from Wikipedia and Wikidata differ in some cases and that the two can be used in a complementary way, providing more data for the compilation of biographies.
    • Automatic question answering for medical MCQs: Can it go further than information retrieval?

      Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)
      We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
    • Toponym detection in the bio-medical domain: A hybrid approach with deep learning

      Plum, Alistair; Ranasinghe, Tharindu; Orăsan, Constantin (RANLP, 2019-09-02)
      This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts. We compare five different state-of-the-art machine learning classifiers in order to predict whether a sentence contains a location or not. Following this classification task, we use a string matching algorithm with a gazetteer to identify the exact index of a toponym within the sentence. We evaluate different approaches in terms of machine learning classifiers, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation.
    • Sentence simplification for semantic role labelling and information extraction

      Evans, Richard; Orasan, Constantin (RANLP, 2019-09-02)
      In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic evaluation of these systems with respect to other NLP tasks. We describe the two NLP systems and the test data used in the extrinsic evaluation, and present arguments and evidence motivating the integration of a sentence simplification step as a means of improving the accuracy of these systems. Our evaluation reveals that their performance is improved by the simplification step: the SRL system is better able to assign semantic roles to the majority of the arguments of verbs and the IE system is better able to identify fillers for all IE template slots.
    • Enhancing unsupervised sentence similarity methods with deep contextualised word representations

      Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, Ruslan (RANLP, 2019-09-02)
      Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state-of-the-art STS methods rely on word embeddings in one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares the results with those of existing supervised and unsupervised STS methods on different datasets in different languages and different domains.
    • Semantic textual similarity with siamese neural networks

      Orasan, Constantin; Mitkov, Ruslan; Ranasinghe, Tharindu (RANLP, 2019-09-02)
      Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods.
    • A survey of the perceived text adaptation needs of adults with autism

      Yaneva, Viktoriya; Orasan, Constantin; Ha, Le An; Ponomareva, Natalia (RANLP, 2019-09-02)
      NLP approaches to automatic text adaptation often rely on user-need guidelines which are generic and do not account for the differences between various types of target groups. One such group are adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus in a stronger need of automatic adaptation. To address this problem, we present a mixed-methods survey conducted with 24 adult web users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey is to identify whether the group with autism experiences any barriers when reading online reviews, what these potential barriers are, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, identifying the intention of the author, trustworthiness, and the use of irony, sarcasm and exaggeration.
    • Predicting the difficulty of multiple choice questions in a high-stakes medical exam

      Ha, Le An; Yaneva, Viktoriya; Baldwin, Peter; Mee, Janet (Association for Computational Linguistics, 2019-08-02)
      Predicting the construct-relevant difficulty of Multiple-Choice Questions (MCQs) has the potential to reduce cost while maintaining the quality of high-stakes exams. In this paper, we propose a method for estimating the difficulty of MCQs from a high-stakes medical exam, where all questions were deliberately written to a common reading level. To accomplish this, we extract a large number of linguistic features and embedding types, as well as features quantifying the difficulty of the items for an automatic question-answering system. The results show that the proposed approach outperforms various baselines with a statistically significant difference. Best results were achieved when using the full feature set, where embeddings had the highest predictive power, followed by linguistic features. An ablation study of the various types of linguistic features suggested that information from all levels of linguistic processing contributes to predicting item difficulty, with features related to semantic ambiguity and the psycholinguistic properties of words having a slightly higher importance. Owing to its generic nature, the presented approach has the potential to generalize over other exams containing MCQs.
    • Computing Happiness from Textual Data

      Mohamed, Emad; Mostafa, Safa (MDPI, 2019-07-03)
      In this paper, we use a corpus of about 100,000 happy moments written by people of different genders, marital statuses, parenthood statuses, and ages to explore the following questions: Are there differences between men and women, married and unmarried individuals, parents and non-parents, and people of different age groups in terms of their causes of happiness and how they express happiness? Can gender, marital status, parenthood status and/or age be predicted from textual data expressing happiness? The first question is tackled in two steps: first, we transform the happy moments into a set of topics, lemmas, part of speech sequences, and dependency relations; then, we use each set as predictors in multi-variable binary and multinomial logistic regressions to rank these predictors in terms of their influence on each outcome variable (gender, marital status, parenthood status and age). For the prediction task, we use character, lexical, grammatical, semantic, and syntactic features in a machine learning document classification approach. The classification algorithms used include logistic regression, gradient boosting, and fastText. Our results show that textual data expressing moments of happiness can be quite beneficial in understanding the “causes of happiness” for different social groups, and that social characteristics like gender, marital status, parenthood status, and, to some extent, age can be successfully predicted from such textual data. This research aims to bring together elements from philosophy and psychology to be examined by computational corpus linguistics methods in a way that promotes the use of Natural Language Processing for the Humanities.
    • Exploiting Data-Driven Hybrid Approaches to Translation in the EXPERT Project

      Orăsan, Constantin; Escartín, Carla Parra; Torres, Lianet Sepúlveda; Barbu, Eduard; Ji, Meng; Oakes, Michael (Cambridge University Press, 2019-06-13)
      Technologies have transformed the way we work, and this is also applicable to the translation industry. In the past thirty to thirty-five years, professional translators have experienced an increased technification of their work. Barely thirty years ago, a professional translator would not have received a translation assignment attached to an e-mail or via an FTP and yet, for the younger generation of professional translators, receiving an assignment by electronic means is the only reality they know. In addition, as pointed out in several works such as Folaron (2010) and Kenny (2011), professional translators now have a myriad of tools available to use in the translation process.
    • RGCL-WLV at SemEval-2019 Task 12: Toponym Detection

      Plum, Alistair; Ranasinghe, Tharindu; Calleja, Pablo; Orasan, Constantin; Mitkov, Ruslan (ACL, 2019-06-07)
      This article describes the system submitted by the RGCL-WLV team to SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved by one of the submissions was 89%, albeit at a relatively low recall of 49%.
    • GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks

      Taslimipoor, Shiva; Rohanian, Omid; Može, Sara (Association for Computational Linguistics, 2019-06-06)
      This paper describes the system submitted to the SemEval 2019 shared task 1 ‘Cross-lingual Semantic Parsing with UCCA’. We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CoNLL-U format of the input data and is best suited for semantic dependency parsing.