• Verbal multiword expressions for identification of metaphor

      Rohanian, Omid; Rei, Marek; Taslimipoor, Shiva; Ha, Le (ACL, 2020-07-06)
      Metaphor is a linguistic device in which a concept is expressed by mentioning another. Identifying metaphorical expressions, therefore, requires a non-compositional understanding of semantics. Multiword Expressions (MWEs), on the other hand, are linguistic phenomena with varying degrees of semantic opacity and their identification poses a challenge to computational models. This work is the first attempt at analysing the interplay of metaphor and MWE processing through the design of a neural architecture whereby classification of metaphors is enhanced by informing the model of the presence of MWEs. To the best of our knowledge, this is the first “MWE-aware” metaphor identification system, paving the way for further experiments on the complex interactions of these phenomena. The results and analyses show that the proposed architecture reaches state-of-the-art results on two different established metaphor datasets.
    • A first dataset for film age appropriateness investigation

      Mohamed, Emad; Ha, Le An (LREC, 2020-05-13)
    • Detecting high-functioning autism in adults using eye tracking and machine learning

      Yaneva, Victoria; Ha, Le An; Eraslan, Sukru; Yesilada, Yeliz; Mitkov, Ruslan (Institute of Electrical and Electronics Engineers (IEEE), 2020-04-30)
      The purpose of this study is to test whether visual processing differences between adults with and without high-functioning autism captured through eye tracking can be used to detect autism. We record the eye movements of adult participants with and without autism while they look for information within web pages. We then use the recorded eye-tracking data to train machine learning classifiers to detect the condition. The data was collected as part of two separate studies involving a total of 71 unique participants (31 with autism and 40 control), which enabled the evaluation of the approach on two separate groups of participants, using different stimuli and tasks. We explore the effects of a number of gaze-based and other variables, showing that autism can be detected automatically with around 74% accuracy. These results confirm that eye-tracking data can be used for the automatic detection of high-functioning autism in adults and that visual processing differences between the two groups exist when processing web pages.
    • Autism detection based on eye movement sequences on the web: a scanpath trend analysis approach

      Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Harper, Simon; Duarte, Carlos; Drake, Ted; Hwang, Faustina; Lewis, Clayton (ACM, 2020-04-20)
      The autism diagnostic procedure is subjective, challenging and expensive, and relies on behavioral, historical and parental report information. In our previous work, we proposed a machine learning classifier to be used as a potential screening tool or in conjunction with other diagnostic methods, thus aiding established diagnostic methods. The classifier uses eye movements of people on web pages but it only considers non-sequential data. It achieves the best accuracy by combining data from several web pages and it has varying levels of accuracy on different web pages. In this paper, we investigate whether it is possible to detect autism based on eye-movement sequences and achieve stable accuracy across different web pages, so that detection does not depend on specific web pages. We used Scanpath Trend Analysis (STA), which is designed for identifying a trending path of a group of users on a web page based on their eye movements. We first identify trending paths of people with autism and neurotypical people. To detect whether or not a person has autism, we calculate the similarity of his/her path to the trending paths of people with autism and neurotypical people. If the path is more similar to the trending path of neurotypical people, we classify the person as a neurotypical person. Otherwise, we classify her/him as a person with autism. We systematically evaluate our approach with an eye-tracking dataset of 15 verbal and highly-independent people with autism and 15 neurotypical people on six web pages. Our evaluation shows that the STA approach performs better on individual web pages and provides more stable accuracy across different pages.
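      The classification rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: scanpaths are represented as sequences of visited page-area labels, and similarity is a normalised edit distance. The abstract names neither the representation nor the similarity measure, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of area labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed measure: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def classify(path, trend_autism, trend_neurotypical):
    """Neurotypical only if strictly closer to that trending path."""
    if similarity(path, trend_neurotypical) > similarity(path, trend_autism):
        return "neurotypical"
    # Ties and all other cases fall through, mirroring the "Otherwise" rule.
    return "autism"
```

      Here `classify(list("ABCD"), list("ACBDE"), list("ABCE"))` compares one individual path against two trending paths; how the trending paths themselves are derived is outside this sketch.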
    • Using natural language processing to predict item response times and improve test construction

      Baldwin, Peter; Yaneva, Victoria; Mee, Janet; Clauser, Brian E; Ha, Le An (Wiley, 2020-02-24)
      In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information‐retrieval‐based automatic question‐answering system finds an item challenging, and (c) through dense word representations (word embeddings). Using a random forests algorithm, these data then are used to train a prediction model for item response times and predicted response times then are used to assemble test forms. Using empirical data from the United States Medical Licensing Examination, we show that timing demands are more consistent across these specially assembled forms than across forms comprising randomly‐selected items. Because an exam's timing conditions affect examinee performance, this result has implications for exam fairness whenever examinees are compared with each other or against a common standard.
    • “Keep it simple!”: an eye-tracking study for exploring complexity and distinguishability of web pages for people with autism

      Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Ha, Le An (Springer Science and Business Media LLC, 2020-02-03)
      A major limitation of the internationally well-known standard web accessibility guidelines for people with cognitive disabilities is that they have not been empirically evaluated by using relevant user groups. Instead, they aim to anticipate issues that may arise following the diagnostic criteria. In this paper, we address this problem by empirically evaluating two of the most popular guidelines related to the visual complexity of web pages and the distinguishability of web-page elements. We conducted a comparative eye-tracking study with 19 verbal and highly independent people with autism and 19 neurotypical people on eight web pages with varying levels of visual complexity and distinguishability, with synthesis and browsing tasks. Our results show that people with autism have a higher number of fixations and make more transitions with synthesis tasks. When we consider the number of elements which are not related to given tasks, our analysis shows that they look at more irrelevant elements while completing the synthesis task on visually complex pages or on pages whose elements are not easily distinguishable. To the best of our knowledge, this is the first empirical behavioural study which evaluates these guidelines by showing that the high visual complexity of pages or the low distinguishability of page elements causes non-equivalent experience for people with autism.
    • A report on the Third VarDial evaluation campaign

      Zampieri, Marcos; Malmasi, Shervin; Scherrer, Yves; Samardžić, Tanja; Tyers, Francis; Silfverberg, Miikka; Klyueva, Natalia; Pan, Tung-Le; Huang, Chu-Ren; Ionescu, Radu Tudor; et al. (Association for Computational Linguistics, 2019-12-31)
      In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019. This year, the campaign included five shared tasks, including one task re-run – German Dialect Identification (GDI) – and four new tasks – Cross-lingual Morphological Analysis (CMA), Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT), Moldavian vs. Romanian Cross-dialect Topic identification (MRC), and Cuneiform Language Identification (CLI). A total of 22 teams submitted runs across the five shared tasks. After the end of the competition, we received 14 system description papers, which are published in the VarDial workshop proceedings and referred to in this report.
    • RGCL at IDAT: deep learning models for irony detection in Arabic language

      Ranasinghe, Tharindu; Saadany, Hadeel; Plum, Alistair; Mandhari, Salim; Mohamed, Emad; Orasan, Constantin; Mitkov, Ruslan (IDAT, 2019-12-12)
      This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved for one of the submissions was 0.818, placing the RGCL team 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
    • RGCL at GermEval 2019: offensive language detection with deep learning

      Plum, A; Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, R (German Society for Computational Linguistics & Language Technology, 2019-10-08)
      This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify Tweets in terms of offensive language. By means of comparative evaluation, we select the best-performing architecture for each of the three subtasks. Overall, we demonstrate that using only minimal preprocessing we are able to obtain competitive results.
    • Automatic summarisation: 25 years On

      Orăsan, Constantin (Cambridge University Press (CUP), 2019-09-19)
      Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
    • Do online resources give satisfactory answers to questions about meaning and phraseology?

      Hanks, Patrick; Franklin, Emma (Springer, 2019-09-18)
      In this paper we explore some aspects of the differences between printed paper dictionaries and online dictionaries in the ways in which they explain meaning and phraseology. After noting the importance of the lexicon as an inventory of linguistic items and the neglect in both linguistics and lexicography of phraseological aspects of that inventory, we investigate the treatment in online resources of phraseology – in particular, the phrasal verbs wipe out and put down – and we go on to investigate a word, dope, that has undergone some dramatic meaning changes during the 20th century. In the course of discussion, we mention the new availability of corpus evidence and the technique of Corpus Pattern Analysis, which is important for linking phraseology and meaning and distinguishing normal phraseology from rare and unusual phraseology. The online resources that we discuss include Google, the Urban Dictionary (UD), and Wiktionary.
    • Profiling idioms: a sociolexical approach to the study of phraseological patterns

      Moze, Sara; Mohamed, Emad (Springer, 2019-09-18)
      This paper introduces a novel approach to the study of lexical and pragmatic meaning called ‘sociolexical profiling’, which aims at correlating the use of lexical items with author-attributed demographic features, such as gender, age, profession, and education. The approach was applied to a case study of a set of English idioms derived from the Pattern Dictionary of English Verbs (PDEV), a corpus-driven lexical resource which defines verb senses in terms of the phraseological patterns in which a verb typically occurs. For each selected idiom, a gender profile was generated based on data extracted from the Blog Authorship Corpus (BAC) in order to establish whether any statistically significant differences can be detected in the way men and women use idioms in every-day communication. A quantitative and qualitative analysis of the gender profiles was subsequently performed, enabling us to test the validity of the proposed approach. If performed on a large scale, we believe that sociolexical profiling will have important implications for several areas of research, including corpus lexicography, translation, creative writing, forensic linguistics, and natural language processing.
    • The reading background of Goodreads book club members: A female fiction canon?

      Thelwall, Mike; Bourrier, Karen (Emerald, 2019-09-09)
      Purpose – Despite the social, educational and therapeutic benefits of book clubs, little is known about which books participants are likely to have read. In response, this article investigates the public bookshelves of those that have joined a group within the Goodreads social network site. Design/methodology/approach – Books listed as read by members of fifty large English language Goodreads groups - with a genre focus or other theme - were compiled by author and title. Findings – Recent and youth-oriented fiction dominate the fifty books most read by book club members, while almost half are works of literature frequently taught at the secondary and postsecondary level (literary classics). Whilst JK Rowling is almost ubiquitous (at least 63% as frequently listed as other authors in any group, including groups for other genres), most authors, including Shakespeare (15%), Golding (6%) and Hemingway (9%), are little read by some groups. Nor are individual recent literary prize-winners or works in languages other than English frequently read. Research limitations/implications – Although these results are derived from a single popular website, knowing more about what book club members are likely to have read should help participants, organisers and moderators. For example, recent literary prize winners might be a good choice, given that few members may have read them. Originality/value – This is the first large scale study of book group members’ reading patterns. Whilst typical reading is likely to vary by group theme and average age, there seems to be a mainly female canon of about 14 authors and 19 books that Goodreads book club members are likely to have read.
    • Large-scale data harvesting for biographical data

      Plum, Alistair; Zampieri, Marcos; Orasan, Constantin; Wandl-Vogt, Eveline; Mitkov, R (CEUR, 2019-09-05)
      This paper explores automatic methods to identify relevant biography candidates in large databases, and extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, extract relevant information about them. Even though they are considered by many as an identical resource, our results show that the data from Wikipedia and Wikidata differs in some cases and they can be used in a complementary way providing more data for the compilation of biographies.
    • Automatic question answering for medical MCQs: Can it go further than information retrieval?

      Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)
      We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
    • Sentence simplification for semantic role labelling and information extraction

      Evans, Richard; Orasan, Constantin (RANLP, 2019-09-02)
      In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic evaluation of these systems with respect to other NLP tasks. We describe the two NLP systems and the test data used in the extrinsic evaluation, and present arguments and evidence motivating the integration of a sentence simplification step as a means of improving the accuracy of these systems. Our evaluation reveals that their performance is improved by the simplification step: the SRL system is better able to assign semantic roles to the majority of the arguments of verbs and the IE system is better able to identify fillers for all IE template slots.
    • Semantic textual similarity with siamese neural networks

      Orasan, Constantin; Mitkov, Ruslan; Ranasinghe, Tharindu (RANLP, 2019-09-02)
      Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural networks, which are used here to measure STS. Several variants of the architecture are compared with existing methods.
    • Enhancing unsupervised sentence similarity methods with deep contextualised word representations

      Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, Ruslan (RANLP, 2019-09-02)
      Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state-of-the-art STS methods rely on word embeddings in one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares them with existing supervised and unsupervised STS methods for different datasets in different languages and different domains.
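      As a rough illustration of what an unsupervised STS method built on word embeddings can look like, the sketch below averages per-token vectors and compares two sentences by cosine similarity. This is a common baseline and is not necessarily one of the methods the paper evaluates. The token vectors are assumed to come from some embedding model, contextualised or not, that is not shown, and all function names are hypothetical.

```python
import math


def mean_vector(token_vectors):
    """Average a list of equal-length token vectors; None if the list is empty."""
    if not token_vectors:
        return None
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sts_score(token_vectors_a, token_vectors_b):
    """Similarity of two sentences, each given as a list of token vectors."""
    u = mean_vector(token_vectors_a)
    v = mean_vector(token_vectors_b)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)
```

      Because the function only sees vectors, swapping a static embedding table for a contextualised model changes the input but not the scoring step.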
    • A survey of the perceived text adaptation needs of adults with autism

      Yaneva, Viktoriya; Orasan, Constantin; Ha, L; Ponomareva, Natalia (RANLP, 2019-09-02)
      NLP approaches to automatic text adaptation often rely on user-need guidelines which are generic and do not account for the differences between various types of target groups. One such group are adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus in a stronger need of automatic adaptation. To address this problem, we present a mixed-methods survey conducted with 24 adult web users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey is to identify whether the group with autism experiences any barriers when reading online reviews, what these potential barriers are, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, identifying the intention of the author, trustworthiness, and the use of irony, sarcasm and exaggeration.
    • Toponym detection in the bio-medical domain: A hybrid approach with deep learning

      Plum, Alistair; Ranasinghe, Tharindu; Orăsan, Constantin (RANLP, 2019-09-02)
      This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts. We compare five different state-of-the-art machine learning classifiers in order to predict whether a sentence contains a location or not. Following this classification task, we use a string matching algorithm with a gazetteer to identify the exact index of a toponym within the sentence. We evaluate different approaches in terms of machine learning classifiers, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation.
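      The string-matching stage described above, which locates the exact index of a toponym once a sentence has been flagged as containing a location, might look roughly like the sketch below. The gazetteer contents, the function name and the longest-match-first policy are assumptions made for illustration; the abstract does not specify the matching algorithm, and the sentence classifier is not shown.

```python
import re


def find_toponyms(sentence, gazetteer):
    """Return (start, end, name) for each gazetteer entry found in the sentence."""
    # Longer names go first so that "New York" is preferred over "York".
    names = sorted(gazetteer, key=len, reverse=True)
    if not names:
        return []
    alternatives = "|".join(re.escape(name) for name in names)
    # Word boundaries prevent matches inside longer words.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(sentence)]


# Hypothetical gazetteer and sentence, for illustration only.
sentence = "Samples were collected in New York and Vienna."
matches = find_toponyms(sentence, {"York", "New York", "Vienna"})
```

      Each returned tuple gives character offsets into the sentence, so `sentence[start:end]` recovers the matched name.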