• Adults with High-functioning Autism Process Web Pages With Similar Accuracy but Higher Cognitive Effort Compared to Controls

      Yaneva, Victoria; Ha, Le; Eraslan, Sukru; Yesilada, Yeliz (ACM, 2019-05-31)
      To accommodate the needs of web users with high-functioning autism, a designer's only option at present is to rely on guidelines that: i) have not been empirically evaluated and ii) do not account for the different levels of autism severity. Before designing effective interventions, we need to obtain an empirical understanding of the aspects that specific user groups need support with. This has not yet been done for web users at the high ends of the autism spectrum, as often they appear to execute tasks effortlessly, without facing barriers related to their neurodiverse processing style. This paper investigates the accuracy and efficiency with which high-functioning web users with autism and a control group of neurotypical participants obtain information from web pages. Measures include answer correctness and a number of eye-tracking features. The results indicate similar levels of accuracy for the two groups at the expense of efficiency for the autism group, showing that the autism group invests more cognitive effort in order to achieve the same results as their neurotypical counterparts.
    • Arabic-SOS: Segmentation, stemming, and orthography standardization for classical and pre-modern standard Arabic

      Mohamed, Emad; Sayed, Zeeshan (ACM, 2019-05-31)
      While morphological segmentation has always been a hot topic in Arabic, due to the morphological complexity of the language and the orthography, most effort has focused on Modern Standard Arabic. In this paper, we focus on pre-MSA texts. We use the Gradient Boosting algorithm to train a morphological segmenter with a corpus derived from Al-Manar, a late 19th/early 20th century magazine that focused on the Arabic and Islamic heritage. Since most of the cultural heritage Arabic available suffers from substandard orthography, we have trained a machine learner to standardize the text. Our segmentation accuracy reaches 98.47%, and the orthography standardization an F-macro of 0.98 and an F-micro of 0.99. We also produce stemming as a by-product of segmentation.
    • Autism detection based on eye movement sequences on the web: a scanpath trend analysis approach

      Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Harper, Simon; Duarte, Carlos; Drake, Ted; Hwang, Faustina; Lewis, Clayton (ACM, 2020-04-20)
      The autism diagnostic procedure is subjective, challenging and expensive, and relies on behavioral, historical and parental report information. In our previous work, we proposed a machine learning classifier to be used as a potential screening tool or in conjunction with other diagnostic methods, thus aiding established diagnostic methods. The classifier uses eye movements of people on web pages but it only considers non-sequential data. It achieves the best accuracy by combining data from several web pages and it has varying levels of accuracy on different web pages. In the present paper, we investigate whether it is possible to detect autism based on eye-movement sequences and achieve stable accuracy across different web pages, so that detection does not depend on specific web pages. We used Scanpath Trend Analysis (STA), which is designed for identifying a trending path of a group of users on a web page based on their eye movements. We first identify trending paths of people with autism and neurotypical people. To detect whether or not a person has autism, we calculate the similarity of their path to the trending paths of people with autism and neurotypical people. If the path is more similar to the trending path of neurotypical people, we classify the person as a neurotypical person. Otherwise, we classify them as a person with autism. We systematically evaluate our approach with an eye-tracking dataset of 15 verbal and highly-independent people with autism and 15 neurotypical people on six web pages. Our evaluation shows that the STA approach performs better on individual web pages and provides more stable accuracy across different pages.
    • A cascaded unsupervised model for PoS tagging

      Bölücü, Necva; Can, Burcu (ACM, 2021-03-31)
      Part of speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing (NLP). It assigns a syntactic category (such as noun, verb or adjective) to each word within a given sentence or context. Those syntactic categories can be used to further analyze the sentence-level syntax (e.g. dependency parsing) and thereby extract the meaning of the sentence (e.g. semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting without using any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem. Initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize the parameters of the model in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, where a log-linear model benefits from a Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparably morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.
    • Scientific web intelligence: finding relationships in university webs

      Thelwall, Mike (ACM, 2005)
      Methods for analyzing university Web sites demonstrate strong patterns that can reveal interconnections between research fields.
    • Tuning language representation models for classification of Turkish news

      Tokgöz, Meltem; Turhan, Fatmanur; Bölücü, Necva; Can, Burcu (ACM, 2021-02-19)
      Pre-trained language representation models are very effective at learning language representations independently of the natural language processing task to be performed. Language representation models such as BERT and DistilBERT have achieved strong results in many language understanding tasks. Studies on text classification in the literature are generally carried out for the English language. This study aims to classify Turkish-language news using pre-trained language representation models. We tune both BERT and DistilBERT for the text classification task to learn the categories of Turkish news with different tokenization methods. We provide a quantitative analysis of the performance of BERT and DistilBERT on the Turkish news dataset by comparing the models in terms of their representation capability in the text classification task. The highest performance is obtained with DistilBERT, with an accuracy of 97.4%.
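
The decision rule described in the scanpath trend analysis entry above (compare a person's path with the two trending paths and pick the closer group) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that scanpaths are sequences of area-of-interest labels, that the two trending paths have already been computed by STA, and that similarity is a normalized edit distance; the abstract does not state which similarity measure is used, and the function names below are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; an assumed measure, not taken from the paper."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def classify(path: Sequence[str],
             trend_autism: Sequence[str],
             trend_neurotypical: Sequence[str]) -> str:
    """Label a path by the trending path it resembles more.

    As in the abstract, a path that is not strictly more similar to the
    neurotypical trend is labelled as belonging to the autism group.
    """
    if similarity(path, trend_neurotypical) > similarity(path, trend_autism):
        return "neurotypical"
    return "autism"


if __name__ == "__main__":
    # Hypothetical trending paths over areas of interest A-D.
    print(classify(list("ABCD"),
                   trend_autism=list("ACBD"),
                   trend_neurotypical=list("ABCD")))
```

The choice of similarity measure is the main design decision here; any sequence similarity could be substituted for `similarity` without changing the decision rule.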