Recent Submissions

  • Autism detection based on eye movement sequences on the web: a scanpath trend analysis approach

    Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Harper, Simon; Duarte, Carlos; Drake, Ted; Hwang, Faustina; Lewis, Clayton (ACM, 2020-04-20)
    The autism diagnostic procedure is subjective, challenging and expensive, and relies on behavioral, historical and parental-report information. In our previous work, we proposed a machine learning classifier that could serve as a potential screening tool or be used in conjunction with established diagnostic methods. That classifier uses people's eye movements on web pages but considers only non-sequential data. It achieves its best accuracy by combining data from several web pages, and its accuracy varies across pages. In this paper, we investigate whether autism can be detected from eye-movement sequences with stable accuracy across web pages, so that detection does not depend on specific pages. We use Scanpath Trend Analysis (STA), which is designed to identify the trending path of a group of users on a web page from their eye movements. We first identify the trending paths of people with autism and of neurotypical people. To decide whether a person has autism, we calculate the similarity of his/her path to each of these two trending paths. If the path is more similar to the trending path of neurotypical people, we classify the person as neurotypical; otherwise, we classify her/him as a person with autism. We systematically evaluate our approach with an eye-tracking dataset of 15 verbal and highly independent people with autism and 15 neurotypical people on six web pages. Our evaluation shows that, compared with our previous classifier, the STA approach performs better on individual web pages and provides more stable accuracy across different pages.
  • Detecting high-functioning autism in adults using eye tracking and machine learning

    Yaneva, Victoria; Ha, Le An; Eraslan, Sukru; Yesilada, Yeliz; Mitkov, Ruslan (Institute of Electrical and Electronics Engineers (IEEE), 2020-04-30)
    The purpose of this study is to test whether visual processing differences between adults with and without high-functioning autism, captured through eye tracking, can be used to detect autism. We record the eye movements of adult participants with and without autism while they look for information within web pages. We then use the recorded eye-tracking data to train machine learning classifiers to detect the condition. The data were collected as part of two separate studies involving a total of 71 unique participants (31 with autism and 40 controls), which enabled the evaluation of the approach on two separate groups of participants, using different stimuli and tasks. We explore the effects of a number of gaze-based and other variables, showing that autism can be detected automatically with around 74% accuracy. These results confirm that eye-tracking data can be used for the automatic detection of high-functioning autism in adults and that visual processing differences between the two groups exist when processing web pages.
  • Verbal multiword expressions for identification of metaphor

    Rohanian, Omid; Rei, Marek; Taslimipoor, Shiva; Ha, Le (ACL, 2020-07-06)
    Metaphor is a linguistic device in which a concept is expressed by mentioning another. Identifying metaphorical expressions therefore requires a non-compositional understanding of semantics. Multiword Expressions (MWEs), on the other hand, are linguistic phenomena with varying degrees of semantic opacity, and their identification poses a challenge to computational models. This work is the first attempt at analysing the interplay of metaphor and MWE processing through the design of a neural architecture in which metaphor classification is enhanced by informing the model of the presence of MWEs. To the best of our knowledge, this is the first “MWE-aware” metaphor identification system, paving the way for further experiments on the complex interactions of these phenomena. The results and analyses show that the proposed architecture reaches state-of-the-art performance on two established metaphor datasets.
  • Cross-lingual transfer learning and multitask learning for capturing multiword expressions

    Taslimipoor, Shiva; Rohanian, Omid; Ha, Le An (Association for Computational Linguistics, 2019-08-31)
    Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance than standard neural approaches.
  • Crossing the border between postcolonial reality and the ‘outer world’: Translation and representation of the third space into a fourth space

    Fernández Ruiz, María Remedios; Corpas Pastor, Gloria; Seghiri, Míriam (Universitat Jaume I, 2019-03-31)
    Stemming from poststructuralist interpretations of space and following Bhabha’s third space enunciation, in this paper we have coined the term fourth space and used this concept as a heuristic tool to address the need to establish a coherent standpoint for the analysis of postcolonial literature reception within a society with no immediate relation to the specific decolonisation process of the author’s country. We explore this concept through the case of the Spanish reception of African postcolonial literature. In Spain, this perspective has remained under-theorised in an era when representation of hybridity is at a vital point, since such representation will provide the social scaffolding for each person’s identity construction. Under these circumstances, literature can be transformative and the role of translation as a decolonising tool can help to create unbiased knowledge through an ethical interpretation of the original texts. We will analyse how those differentiating elements affect the translational process.
  • NLP-enhanced self-study learning materials for quality healthcare in Europe

    Urbano Mendaña, Míriam; Corpas Pastor, Gloria; Seghiri Domínguez, Míriam; Aguado de Cea, G; Aussenac-Gilles, N; Nazarenko, A; Szulman, S (Université Paris 13, 2013-10)
    In this paper we present an overview of the TELL-ME project, which aims to develop innovative e-learning tools and self-study materials for teaching vocationally specific languages to healthcare professionals, helping them to communicate at work. The TELL-ME e-learning platform incorporates a variety of NLP techniques to provide an array of diverse work-related exercises, self-assessment tools and an interactive dictionary of key vocabulary and concepts aimed at medics, in Spanish, English and German. A prototype of the e-learning platform is currently under evaluation.
  • Object and subject Heavy-NP shift in Arabic

    Mohamed, Emad (Research in Corpus Linguistics, 2014-12-31)
    In order to examine whether Arabic has Heavy Noun Phrase Shifting (HNPS), I have extracted from the Prague Arabic Dependency Treebank a data set in which a verb governs either an object NP and an Adjunct Phrase (PP or AdvP) or a subject NP and an Adjunct Phrase. I have used binary logistic regression in which the criterion variable is whether the subject/object NP shifts, with the following predictor variables: heaviness (the number of tokens in the NP and in the adjunct), part-of-speech tag, verb disposition (i.e. whether the verb has a history of taking double objects or sentential objects), NP number, NP definiteness, and the presence of referring pronouns in either the NP or the adjunct. The results show that only object heaviness and adjunct heaviness are useful predictors of object HNPS, while subject heaviness, adjunct heaviness, subject part-of-speech tag, definiteness, and adjunct head POS tags are active predictors of subject HNPS. I also show that HNPS can in principle be predicted from sentence structure.
  • Arabic-SOS: Segmentation, stemming, and orthography standardization for classical and pre-modern standard Arabic

    Mohamed, Emad; Sayed, Zeeshan (ACM, 2019-05-31)
    While morphological segmentation has always been a hot topic in Arabic, owing to the morphological complexity of the language and its orthography, most effort has focused on Modern Standard Arabic (MSA). In this paper, we focus on pre-MSA texts. We use the Gradient Boosting algorithm to train a morphological segmenter on a corpus derived from Al-Manar, a late 19th/early 20th century magazine that focused on the Arabic and Islamic heritage. Since most of the available cultural-heritage Arabic text suffers from substandard orthography, we have also trained a machine learner to standardize the text. Our segmentation accuracy reaches 98.47%, and orthography standardization reaches a macro-averaged F-score of 0.98 and a micro-averaged F-score of 0.99. We also produce stemming as a by-product of segmentation.
  • Using natural language processing to predict item response times and improve test construction

    Baldwin, Peter; Yaneva, Victoria; Mee, Janet; Clauser, Brian E; Ha, Le An (Wiley, 2020-02-24)
    In this article, we show how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information‐retrieval‐based automatic question‐answering system finds an item challenging, and (c) dense word representations (word embeddings). Using a random forests algorithm, these data are then used to train a prediction model for item response times, and the predicted response times are then used to assemble test forms. Using empirical data from the United States Medical Licensing Examination, we show that timing demands are more consistent across these specially assembled forms than across forms comprising randomly selected items. Because an exam's timing conditions affect examinee performance, this result has implications for exam fairness whenever examinees are compared with each other or against a common standard.
  • A first dataset for film age appropriateness investigation

    Mohamed, Emad; Ha, Le An (LREC, 2020-05-13)
  • “Keep it simple!”: an eye-tracking study for exploring complexity and distinguishability of web pages for people with autism

    Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Ha, Le An (Springer Science and Business Media LLC, 2020-02-03)
    A major limitation of the well-known international web accessibility guidelines for people with cognitive disabilities is that they have not been empirically evaluated with the relevant user groups. Instead, they aim to anticipate issues that may arise, based on diagnostic criteria. In this paper, we address this problem by empirically evaluating two of the most popular guidelines, related to the visual complexity of web pages and the distinguishability of web-page elements. We conducted a comparative eye-tracking study with 19 verbal and highly independent people with autism and 19 neurotypical people on eight web pages with varying levels of visual complexity and distinguishability, using synthesis and browsing tasks. Our results show that people with autism have a higher number of fixations and make more transitions in synthesis tasks. When we consider the number of elements that are not related to the given tasks, our analysis shows that they look at more irrelevant elements while completing the synthesis task on visually complex pages or on pages whose elements are not easily distinguishable. To the best of our knowledge, this is the first empirical behavioural study that evaluates these guidelines, showing that high visual complexity of pages or low distinguishability of page elements causes a non-equivalent experience for people with autism.
  • Trouble on the road: Finding reasons for commuter stress from tweets

    Gopalakrishna Pillai, Reshmi; Thelwall, Mike; Orasan, Constantin (Association for Computational Linguistics, 2018-11-30)
    Intelligent Transportation Systems could benefit from harnessing social media content to get continuous feedback. In this work, we implement a system to identify reasons for stress in tweets related to traffic using a word vector strategy to select a reason from a predefined list generated by topic modeling and clustering. The proposed system, which performs better than standard machine learning algorithms, could provide inputs to warning systems for commuters in the area and feedback for the authorities.
  • Three kinds of semantic resonance

    Hanks, Patrick (Ivane Javakhishvili Tbilisi University Press, 2016-09-06)
    This presentation suggests some reasons why lexicographers of the future will need to pay more attention to phraseology and non-literal meaning. It argues that words not only have literal meaning, but that much meaning is non-literal, being lexical (i.e. metaphorical or figurative), experiential, or intertextual.
  • RGCL at GermEval 2019: offensive language detection with deep learning

    Plum, A; Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, R (German Society for Computational Linguistics & Language Technology, 2019-10-08)
    This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures to classify tweets in terms of offensive language. By means of comparative evaluation, we select the best-performing architecture for each of the three subtasks. Overall, we demonstrate that with only minimal preprocessing we are able to obtain competitive results.
  • RGCL at IDAT: deep learning models for irony detection in Arabic language

    Ranasinghe, Tharindu; Saadany, Hadeel; Plum, Alistair; Mandhari, Salim; Mohamed, Emad; Orasan, Constantin; Mitkov, Ruslan (IDAT, 2019-12-12)
    This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score, achieved by one of the submissions, was 0.818, ranking the RGCL team 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
  • Large-scale data harvesting for biographical data

    Plum, Alistair; Zampieri, Marcos; Orasan, Constantin; Wandl-Vogt, Eveline; Mitkov, R (CEUR, 2019-09-05)
    This paper explores automatic methods to identify relevant biography candidates in large databases and to extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, extract relevant information about them. Even though many consider them to be the same resource, our results show that the data from Wikipedia and Wikidata differ in some cases and that the two can be used in a complementary way, providing more data for the compilation of biographies.
  • Automatic question answering for medical MCQs: Can it go further than information retrieval?

    Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)
    We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
  • Automatic summarisation: 25 years On

    Orăsan, Constantin (Cambridge University Press (CUP), 2019-09-19)
    Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
  • A survey of the perceived text adaptation needs of adults with autism

    Yaneva, Viktoriya; Orasan, Constantin; Ha, L; Ponomareva, Natalia (RANLP, 2019-09-02)
    NLP approaches to automatic text adaptation often rely on user-need guidelines that are generic and do not account for the differences between various types of target groups. One such group is adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus in greater need of automatic adaptation. To address this problem, we present a mixed-methods survey conducted with 24 adult web users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey is to identify whether the group with autism experiences any barriers when reading online reviews, what these potential barriers are, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, identifying the intention of the author, trustworthiness, and the use of irony, sarcasm and exaggeration.
  • Semantic textual similarity with siamese neural networks

    Orasan, Constantin; Mitkov, Ruslan; Ranasinghe, Tharindu (RANLP, 2019-09-02)
    Calculating Semantic Textual Similarity (STS) is an important research area in natural language processing that plays a significant role in many applications, such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods
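The decision rule described in the scanpath trend analysis abstract (first entry above) can be illustrated with a short sketch. This is not the authors' implementation: it assumes that each scanpath is encoded as a string of area-of-interest (AOI) labels, that the two trending paths have already been produced by STA, and that similarity is a normalised Levenshtein (string-edit) score. The abstract does not name the similarity measure, and the function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - edit distance / length of the longer AOI string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def classify(scanpath: str, autism_trend: str, neurotypical_trend: str) -> str:
    """Hypothetical helper: neurotypical only if strictly closer to that trend."""
    if similarity(scanpath, neurotypical_trend) > similarity(scanpath, autism_trend):
        return "neurotypical"
    return "autism"  # "otherwise" branch, which also covers ties


if __name__ == "__main__":
    # Made-up AOI strings, for illustration only.
    print(classify("ABCD", autism_trend="ABDC", neurotypical_trend="ABCD"))
    print(classify("AEEB", autism_trend="AEB", neurotypical_trend="ACDB"))
```

With these made-up inputs the first path matches the neurotypical trend exactly and is labelled "neurotypical", while the second is one edit away from the autism trend and two from the neurotypical one, so it is labelled "autism". Ties fall to the "otherwise" branch, mirroring the wording of the abstract.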
