• Management of 201 individuals with emotionally unstable personality disorders: A naturalistic observational study in real-world inpatient setting

      Shahpesandy, Homayun; Mohammed-Ali, Rosemary; Oakes, Michael; Al-Kubaisy, Tarik; Cheetham, Anna; Anene, Moses; The Hartsholme Centre, Long Leys Road, Lincoln, LN1 1FS, Lincolnshire NHS Foundation Trust, UK. (Maghira & Maas Publications, 2021-06-03)
      BACKGROUND: Emotionally unstable personality disorder (EUPD) is a challenging condition with a prevalence of 20% in inpatient services. Psychotherapy is the preferred treatment; nevertheless, off-licence medications are widely used. OBJECTIVES: To identify the socio-demographic, clinical and service-delivery characteristics of people with EUPD admitted to inpatient services between 1st January 2017 and 31st December 2018. METHODS: A retrospective review using data from patients' records. Individuals aged 18-65 were included. Statistical analysis was conducted with the Mann-Whitney-Wilcoxon test and the Chi-squared test with Yates's continuity correction. RESULTS: Of 1646 inpatients, 201 (12.2%) had a diagnosis of EUPD; 133 (66.0%) were women and 68 (34.0%) men. EUPD was significantly (P < .001) more prevalent in women (18.2%) than men (7.4%). EUPD patients were significantly (P < .001) younger (32.2 years) than patients without EUPD (46 years) and had significantly (P < .001) more admissions (1.74) than patients without EUPD (1.2). 70.5% of patients had one and 17.0% had two Axis-I psychiatric co-morbidities. Substance use was significantly (P < .001) more frequent in men (57.3%) than in women (28.5%). Significantly (P = 0.047) more women (68.4%) than men (53.0%) reported sexual abuse. 87.5% used polypharmacy. Antidepressants were prescribed significantly (P = 0.02) more often to women (76.6%) than to men (69.1%). Significantly (P = 0.02) more women (83.5%) than men (67.6%) were on antipsychotics. 57.2% of patients were on anxiolytics, 40.0% on hypnotics and 25.8% on mood stabilisers. CONCLUSION: EUPD is a complex condition with widespread comorbidity. The term EUPD (Borderline Personality Disorder) is unsuitable, stigmatising and too simplistic to reflect the nature, gravity and psychopathology of this syndrome.
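The group comparisons above rely on the chi-squared test with Yates's continuity correction. A minimal sketch of that statistic follows; the cell counts are reconstructed approximately from the reported percentages (133 EUPD cases among roughly 731 women, 68 among roughly 919 men) and are an assumption, not figures taken from the paper.

```python
def yates_chi_squared(table):
    """Chi-squared statistic for a 2x2 table with Yates's continuity
    correction: sum of (|observed - expected| - 0.5)^2 / expected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (abs(obs - expected) - 0.5) ** 2 / expected
    return chi2

# Approximate counts reconstructed from the reported prevalences
# (an assumption for illustration): [EUPD, non-EUPD] per sex.
table = [[133, 731 - 133], [68, 919 - 68]]
chi2 = yates_chi_squared(table)
```

The resulting statistic is far above the 10.83 critical value for P < .001 at one degree of freedom, consistent with the reported sex difference in prevalence.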
    • The Matecat Tool

      Federico, Marcello; Bertoldi, Nicola; Cettolo, Mauro; Negri, Matteo; Turchi, Marco; Trombetti, Marco; Cattelan, Alessandro; Farina, Antonio; Lupinetti, Domenico; Marines, Andrea; et al. (Dublin City University and Association for Computational Linguistics, 2014-08-31)
      We present a new web-based CAT tool providing translators with a professional work environment, integrating translation memories, terminology bases, concordancers, and machine translation. The tool is developed entirely as open-source software and has already been successfully deployed for business, research and education. Today, the MateCat Tool is probably the best available open-source platform for investigating, integrating, and evaluating the impact of new machine translation technology on human post-editing under realistic conditions.
    • Mendeley readership altmetrics for medical articles: An analysis of 45 fields

      Wilson, Paul; Thelwall, Mike; Statistical Cybermetrics Research Group; School of Mathematics and Computer Science; University of Wolverhampton; Wulfruna Street, Wolverhampton WV1 1LY, UK (Wiley Blackwell, 2015-05)
      ISSN: 2330-1643
    • Methodologies for crawler based Web surveys.

      Thelwall, Mike (MCB UP Ltd, 2002)
      There have been many attempts to study the content of the Web, either through human or automatic agents. Describes five different previously used Web survey methodologies, each justifiable in its own right, but presents a simple experiment that demonstrates concrete differences between them. The concept of crawling the Web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well-known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. Concludes that any scientific attempt to crawl the Web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. Also introduces a new hybrid random page selection methodology.
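The paper's conclusion, that a scientific Web crawl must publish the parameters under which it operates, can be made concrete as a small configuration object. The field names below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, asdict

@dataclass
class CrawlParameters:
    """Parameters a crawler-based Web survey should report so that
    others can replicate the experiment or account for methodological
    differences. Field names here are hypothetical examples."""
    seed_urls: list
    scope: str = "same-domain"        # which pages are eligible to crawl
    max_depth: int = 5                # link distance from the seeds
    fetch_method: str = "GET"         # how each page is accessed and indexed
    duplicate_rule: str = "checksum"  # algorithm for identifying duplicate pages
    obey_robots_txt: bool = True

    def report(self):
        """Return the parameters as a plain dict for publication."""
        return asdict(self)
```

A survey would then publish `CrawlParameters(seed_urls=[...]).report()` alongside its results, so that differences between methodologies are visible rather than implicit.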
    • Methods and algorithms for unsupervised learning of morphology

      Can, Burcu; Manandhar, Suresh (Springer, 2014-12-31)
      This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods covering methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided along with a summary of evaluation results on the Morpho Challenge evaluations.
    • Microsoft Academic automatic document searches: accuracy for journal articles and suitability for citation analysis

      Thelwall, Mike (Elsevier, 2017-11-22)
      Microsoft Academic is a free academic search engine and citation index that is similar to Google Scholar but can be automatically queried. Its data is potentially useful for bibliometric analysis if it is possible to search effectively for individual journal articles. This article compares different methods to find journal articles in its index by searching for a combination of title, authors, publication year and journal name and uses the results for the widest published correlation analysis of Microsoft Academic citation counts for journal articles so far. Based on 126,312 articles from 323 Scopus subfields in 2012, the optimal strategy to find articles with DOIs is to search for them by title and filter out those with incorrect DOIs. This finds 90% of journal articles. For articles without DOIs, the optimal strategy is to search for them by title and then filter out matches with dissimilar metadata. This finds 89% of journal articles, with an additional 1% incorrect matches. The remaining articles seem to be mainly not indexed by Microsoft Academic or indexed with a different language version of their title. From the matches, Scopus citation counts and Microsoft Academic counts have an average Spearman correlation of 0.95, with the lowest for any single field being 0.63. Thus, Microsoft Academic citation counts are almost universally equivalent to Scopus citation counts for articles that are not recent but there are national biases in the results.
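The two-stage matching strategy described above (search by title, then filter by DOI or by metadata similarity) can be sketched as follows. The function and field names, similarity measure, and threshold are illustrative assumptions, not the paper's exact procedure:

```python
from difflib import SequenceMatcher

def match_article(record, candidates, sim_threshold=0.75):
    """Pick the search hit that matches a known journal article.

    If the record has a DOI, accept only a candidate with the same DOI;
    otherwise fall back to comparing the remaining metadata (authors,
    year, journal name) and filter out dissimilar matches.
    """
    for cand in candidates:
        if record.get("doi"):
            if cand.get("doi") == record["doi"]:
                return cand
        else:
            meta_rec = " ".join(
                [record["authors"], str(record["year"]), record["journal"]]
            ).lower()
            meta_cand = " ".join(
                [cand.get("authors", ""), str(cand.get("year", "")), cand.get("journal", "")]
            ).lower()
            if SequenceMatcher(None, meta_rec, meta_cand).ratio() >= sim_threshold:
                return cand
    return None
```

Returning `None` when no candidate survives the filter corresponds to the residual ~10% of articles that appear unindexed or indexed under a different-language title.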
    • Modeling morpheme triplets with a three-level hierarchical Dirichlet process

      Kumyol, Serkan; Can, Burcu (IEEE, 2017-03-13)
      Morphemes are not independent units; they attach to each other according to morphotactics. However, most models in the literature assume they are independent of each other in order to cope with complexity. We introduce a language-independent model for unsupervised morphological segmentation using a hierarchical Dirichlet process (HDP). We model morpheme dependencies in terms of morpheme trigrams in each word. Trigrams, bigrams and unigrams are modeled within a three-level HDP, where the trigram Dirichlet process (DP) uses the bigram DP, and the bigram DP uses the unigram DP, as its base distribution. The results show that modeling morpheme dependencies improves the F-measure noticeably in English, Turkish and Finnish.
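The chaining of base distributions can be sketched with a simplified CRP-style predictive probability, where each level mixes its observed counts with the probability supplied by the level below. This is a minimal illustration of the three-level back-off structure, not the paper's full HDP sampler; the vocabulary size and concentration parameter are assumptions.

```python
from collections import defaultdict

class DPLevel:
    """One level of a hierarchical DP back-off model. The predictive
    probability interpolates observed counts with a base distribution
    supplied by the level below, mimicking how the trigram DP uses the
    bigram DP (and the bigram DP the unigram DP) as its base."""

    def __init__(self, base, alpha=1.0):
        self.base = base      # callable (context, morpheme) -> probability
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def observe(self, context, morpheme):
        self.counts[context][morpheme] += 1
        self.totals[context] += 1

    def prob(self, context, morpheme):
        c = self.counts[context][morpheme]
        n = self.totals[context]
        return (c + self.alpha * self.base(context, morpheme)) / (n + self.alpha)

VOCAB = 100  # assumed morpheme vocabulary size for the uniform bottom base
unigram = DPLevel(lambda ctx, m: 1.0 / VOCAB)
bigram = DPLevel(lambda ctx, m: unigram.prob((), m))
trigram = DPLevel(lambda ctx, m: bigram.prob(ctx[-1:], m))

# Observe the morpheme trigram (fail, -ure, -s), as in "failures".
for level, ctx in ((unigram, ()), (bigram, ("-ure",)), (trigram, ("fail", "-ure"))):
    level.observe(ctx, "-s")
```

After these observations, a seen continuation like "-s" receives a much higher trigram probability than an unseen one, which is exactly the dependency information the independence assumption throws away.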
    • Monitoring Twitter strategies to discover resonating topics: The case of the UNDP

      Thelwall, Mike; Cugelman, Brian (EPI - El Profesional de la información., 2017-08-02)
      Many organizations use social media to attract supporters, disseminate information and advocate change. Services like Twitter can theoretically deliver messages to a huge audience that would be difficult to reach by other means. This article introduces a method to monitor an organization’s Twitter strategy and applies it to tweets from United Nations Development Programme (UNDP) accounts. The Resonating Topic Method uses automatic analyses with free software to detect successful themes within the organization’s tweets, categorizes the most successful tweets, and analyses a comparable organization to identify new successful strategies. In the case of UNDP tweets from November 2014 to March 2015, the results confirm the importance of official social media accounts as well as those of high profile individuals and general supporters. Official accounts seem to be more successful at encouraging action, which is a critical aspect of social media campaigning. An analysis of Oxfam found a successful social media approach that the UNDP had not adopted, showing the value of analyzing other organizations to find potential strategy gaps.
    • Motivations for academic web site interlinking: evidence for the Web as a novel source of information on informal scholarly communication

      Wilkinson, David; Harries, Gareth; Thelwall, Mike; Price, Liz (Sage, 2003)
      The need to understand authors’ motivations for creating links between university web sites is addressed by a survey of a random collection of 414 such links from the ac.uk domain. A classification scheme was created and applied to this collection. Obtaining inter-classifier agreement as to the single main link creation cause was very difficult because of multiple potential motivations and the fluidity of genre on the Web. Nevertheless, it was clear that, whilst the vast majority, over 90%, was created for broadly scholarly reasons, only two were equivalent to journal citations. It is concluded that academic web link metrics will be dominated by a range of informal types of scholarly communication. Since formal communication can be extensively studied through citation analysis, this provides an exciting new window through which to investigate a facet of a previously obscured type of communication activity.
    • Multi-document summarization of news articles using an event-based framework

      Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Emerald, 2006)
      Purpose – The purpose of this research is to develop a method for automatic construction of multi-document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query. Design/methodology/approach – Based on the cross-document discourse analysis, an event-based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information given at the lower levels. A tree-view interface was implemented for displaying a multi-document summary based on the framework. A preliminary user evaluation was performed by comparing the framework-based summaries against the sentence-based summaries. Findings – In a small evaluation, all the human subjects preferred the framework-based summaries to the sentence-based summaries. It indicates that the event-based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events. Research limitations/implications – Limited to event-based news articles only, not applicable to news critiques and other kinds of news articles. A summarization system based on the event-based framework is being implemented. Practical implications – Multi-document summarization of news articles can adopt the proposed event-based framework. Originality/value – An event-based framework for summarizing sets of news articles was developed and evaluated using a tree-view interface for displaying such summaries.
    • Multimodal quality estimation for machine translation

      Okabe, Shu; Blain, Frédéric; Specia, Lucia (Association for Computational Linguistics, 2020-07)
      We propose approaches to Quality Estimation (QE) for Machine Translation that explore both text and visual modalities for Multimodal QE. We compare various multimodality integration and fusion strategies. For both sentence-level and document-level predictions, we show that state-of-the-art neural and feature-based QE frameworks obtain better results when using the additional modality.
    • Multiword units in machine translation and translation technology

      Mitkov, Ruslan; Monti, Johanna; Corpas Pastor, Gloria; Seretan, Violeta (John Benjamins, 2018-07-20)
      The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention but we believe that there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.
    • Mutual terminology extraction using a statistical framework

      Ha, Le An; Mitkov, Ruslan; Pastor, Gloria Corpas (Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN), 2008-06-16)
      In this paper, we explore a statistical framework for mutual bilingual terminology extraction. We propose three probabilistic models to assess the proposition that automatic alignment can play an active role in bilingual terminology extraction and translate it into mutual bilingual terminology extraction. The results indicate that such models are valid and can show that mutual bilingual terminology extraction is indeed a viable approach.
    • National Scientific Performance Evolution Patterns: Retrenchment, Successful Expansion, or Overextension

      Thelwall, Mike; Levitt, Jonathan M. (Wiley-Blackwell, 2017-11-17)
      National governments would like to preside over an expanding and increasingly high impact science system but are these two goals largely independent or closely linked? This article investigates the relationship between changes in the share of the world’s scientific output and changes in relative citation impact for 2.6 million articles from 26 fields in the 25 countries with the most Scopus-indexed journal articles from 1996 to 2015. There is a negative correlation between expansion and relative citation impact but their relationship varies. China, Spain, Australia, and Poland were successful overall across the 26 fields, expanding both their share of the world’s output and its relative citation impact, whereas Japan, France, Sweden and Israel had decreased shares and relative citation impact. In contrast, the USA, UK, Germany, Italy, Russia, Netherlands, Switzerland, Finland, and Denmark all enjoyed increased relative citation impact despite a declining share of publications. Finally, India, South Korea, Brazil, Taiwan, and Turkey all experienced sustained expansion but a recent fall in relative citation impact. These results may partly reflect changes in the coverage of Scopus and the selection of fields.
    • Native language identification of fluent and advanced non-native writers

      Sarwar, Raheem; Rutherford, Attapol T; Hassan, Saeed-Ul; Rakthanmanon, Thanawin; Nutanong, Sarana (Association for Computing Machinery (ACM), 2020-04-30)
      Native Language Identification (NLI) aims to identify the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific/social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of language-usage patterns within a text sample, and (iii) are not associated with any outlier-handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages due to economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of language-usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply a probabilistic k-nearest-neighbors classifier to the identified top-k SSTs to predict the native languages of the authors. To conduct experiments, we create three new corpora, each written in a different language, namely English, French, and German. Our experimental studies show that our solution outperforms competitive methods and achieves more than 80% accuracy across languages.
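The top-k SST retrieval followed by a probabilistic k-NN vote can be sketched as follows. The feature representation (e.g. function-word frequencies as a topic-independent proxy), the distance metric, and the inverse-distance weighting are illustrative assumptions, not the paper's exact formulation:

```python
import math
from collections import Counter

def top_k_sst(query_vec, corpus, k=3):
    """Rank corpus samples by stylistic similarity (Euclidean distance
    over topic-independent feature vectors) and return the k most
    similar text samples."""
    return sorted(corpus, key=lambda s: math.dist(query_vec, s["features"]))[:k]

def predict_native_language(query_vec, corpus, k=3):
    """Probabilistic k-NN vote over the top-k SSTs: each neighbour votes
    for its author's native language, weighted by 1 / (1 + distance)."""
    votes = Counter()
    for s in top_k_sst(query_vec, corpus, k):
        weight = 1.0 / (1.0 + math.dist(query_vec, s["features"]))
        votes[s["native_language"]] += weight
    return votes.most_common(1)[0][0]
```

Restricting the vote to the top-k stylistically similar samples is what gives the outlier-mitigation effect: samples far from the query's style simply never enter the vote.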
    • Natural language processing for mental disorders: an overview

      Calixto, Iacer; Yaneva, Viktoriya; Cardoso, Raphael (CRC Press, 2021-12-31)
    • Neural sentiment analysis of user reviews to predict user ratings

      Gezici, Bahar; Bolucu, Necva; Tarhan, Ayca; Can, Burcu (IEEE, 2019-11-21)
      The significance of user satisfaction is increasing in the competitive open source software (OSS) market. Application stores let users send feedback for applications in the form of user reviews or ratings. Developers are informed about bugs or additional requirements through this feedback and use it to increase the quality of the software. Moreover, potential users rely on this information as a success indicator when deciding whether to download the applications. Since it is usually costly to read all the reviews and evaluate their content, the ratings are taken as the basis for assessment. This makes the consistency of the review contents with the ratings important for a sound evaluation of the applications. In this study, we use recurrent neural networks to analyze the reviews automatically and thereby predict user ratings based on the reviews. We apply transfer learning from a large gold-standard dataset of Amazon Customer Reviews. We evaluate the performance of our model on three mobile OSS applications in the Google Play Store and compare the predicted ratings with the original ratings of the users. The predicted ratings reach an accuracy of 87.61% against the original user ratings, which is promising for deriving ratings from reviews, especially when ratings are absent or their consistency with the reviews is weak.
    • Neural text normalization for Turkish social media

      Goker, Sinan; Can, Burcu (IEEE, 2018-12-10)
      Social media has become a rich data source for natural language processing tasks with its worldwide use; however, it is hard to process social media data due to its informal nature. Text normalization is the task of transforming the noisy text into its canonical form. It generally serves as a preprocessing task in other NLP tasks that are applied to noisy text. In this study, we apply two approaches for Turkish text normalization: Contextual Normalization approach using distributed representations of words and Sequence-to-Sequence Normalization approach using neural encoder-decoder models. As the approaches applied to Turkish and also other languages are mostly rule-based, additional rules are required to be added to the normalization model in order to detect new error patterns arising from the change of the language use in social media. In contrast to rule-based approaches, the proposed approaches provide the advantage of normalizing different error patterns that change over time by training with a new dataset and updating the normalization model. Therefore, the proposed methods provide a solution to language change dependency in social media by updating the normalization model without defining new rules.
    • New directions in the study of family names

      Hanks, Patrick; Boullón Agrelo, Ana Isabel (Consello da Cultura Galega, 2018-12-28)
      This paper explores and explains recent radical developments in resources and methodology for studying the origins, cultural associations, and histories of family names (also called ‘surnames’). It summarizes the current state of the art and outlines new resources and procedures that are now becoming available. It shows how such innovations can enable the correction of errors in previous work and improve the accuracy of dictionaries of family names, with a focus on the English-speaking world. Developments such as the digitization of archives are having a profound effect, not only on the interpretation and understanding of traditional, ‘established’ family names and their histories, but also of names in other languages and other cultures. There are literally millions of different family names in the world today, many of which have never been studied at all. What are good criteria for selection of entries in a dictionary of family names, and what can be said about them? What is the nature of the evidence? How stable (or how variable) are family names over time? What are the effects of factors such as migration? What is the relationship between family names and geographical locations, given that people can and do move around? What is the relationship between traditional philological and historical approaches to the subject and statistical analysis of newly available digitized data? The paper aims to contribute to productive discussion of such questions.
    • New versions of PageRank employing alternative Web document models

      Thelwall, Mike; Vaughan, Liwen (Emerald Group Publishing Limited, 2004)
      Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects’ rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
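The aggregation idea, collapsing the page-level link graph to directory- or domain-level nodes before ranking, can be sketched with a standard power-iteration PageRank. The helper names and the simple dangling-node handling are illustrative assumptions, not the paper's exact algorithm variants:

```python
from collections import defaultdict

def aggregate_links(page_links, unit):
    """Collapse a page-level link graph into a coarser Web document
    model (e.g. directory or domain) by mapping each page through
    `unit` and dropping the resulting self-links."""
    agg = defaultdict(set)
    for src, targets in page_links.items():
        for tgt in targets:
            a, b = unit(src), unit(tgt)
            if a != b:
                agg[a].add(b)
    return {k: sorted(v) for k, v in agg.items()}

def pagerank(links, d=0.85, iters=50):
    """Standard PageRank power iteration over the (aggregated) graph."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        # Dangling nodes: redistribute their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += d * dangling / n
        rank = new
    return rank
```

Running `pagerank` on the output of `aggregate_links` with a domain-extracting `unit` gives a domain-level ranking; swapping in a directory-extracting `unit` gives the directory-based alternative, which is the kind of comparison the article carries out against human rankings.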