    • FGFR1 expression and role in migration in low and high grade pediatric gliomas

      Egbivwie, Naomi; Cockle, Julia V.; Humphries, Matthew; Ismail, Azzam; Esteves, Filomena; Taylor, Claire; Karakoula, Katherine; Morton, Ruth; Warr, Tracy; Short, Susan C.; et al. (Frontiers Media, 2019-03-13)
      The heterogeneous and invasive nature of pediatric gliomas poses significant treatment challenges, highlighting the importance of identifying novel chemotherapeutic targets. Recently, recurrent Fibroblast growth factor receptor 1 (FGFR1) mutations in pediatric gliomas have been reported. Here, we explored the clinical relevance of FGFR1 expression, cell migration in low and high grade pediatric gliomas and the role of FGFR1 in cell migration/invasion as a potential chemotherapeutic target. A high density tissue microarray (TMA) was used to investigate associations between FGFR1 and activated phosphorylated FGFR1 (pFGFR1) expression and various clinicopathologic parameters. Expression of FGFR1 and pFGFR1 was measured by immunofluorescence and by immunohistochemistry (IHC) in 3D spheroids in five rare patient-derived pediatric low-grade glioma (pLGG) and two established high-grade glioma (pHGG) cell lines. Two-dimensional (2D) and three-dimensional (3D) migration assays were performed for migration and inhibitor studies with three FGFR1 inhibitors. High FGFR1 expression was associated with age, malignancy, tumor location and tumor grade among astrocytomas. Membranous pFGFR1 was associated with malignancy and tumor grade. All glioma cell lines exhibited varying levels of FGFR1 and pFGFR1 expression and migratory phenotypes. There were significant anti-migratory effects on the pHGG cell lines with inhibitor treatment and anti-migratory or pro-migratory responses to FGFR1 inhibition in the pLGGs. Our findings support further research to target FGFR1 signaling in pediatric gliomas.
    • Figshare: A universal repository for academic resource sharing?

      Thelwall, Mike; Kousha, Kayvan (Emerald Group Publishing Limited, 2015-12-18)
      Purpose: A number of subject-orientated and general websites have emerged to host academic resources. It is important to evaluate the uptake of such services in order to decide which depositing strategies are effective and should be encouraged. Design/methodology/approach: This article evaluates the views and shares of resources in the generic repository Figshare by subject category and resource type. Findings: Figshare use and common resource types vary substantially by subject category but resources can be highly viewed even in subjects with few members. Subject areas with more resources deposited do not tend to have higher viewing or sharing statistics. Practical implications: Limited uptake of Figshare within a subject area should not be a barrier to its use. Several highly successful innovative uses for Figshare show that it can reach beyond a purely academic audience. Originality/value: This is the first analysis of the uptake and use of a generic academic resource sharing repository.
    • GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks

      Taslimipoor, Shiva; Rohanian, Omid; Može, Sara (Association for Computational Linguistics, 2019-06-06)
      This paper describes the system submitted to the SemEval 2019 shared task 1 ‘Cross-lingual Semantic Parsing with UCCA’. We rely on the semantic dependency parse trees provided in the shared task which are converted from the original UCCA files and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CONLLU format of the input data and is best suited for semantic dependency parsing.
    • Gender and image sharing on Facebook, Twitter, Instagram, Snapchat and WhatsApp in the UK: Hobbying alone or filtering for friends?

      Thelwall, Mike; Vis, Farida (Emerald, 2017-10-01)
      Purpose: Despite the ongoing shift from text-based to image-based communication in the social web, supported by the affordances of smartphones, little is known about the new image sharing practices. Both gender and platform type seem likely to be important, but it is unclear how. Design/methodology/approach: This article surveys an age-balanced sample of UK Facebook, Twitter, Instagram, Snapchat and WhatsApp image sharers with a range of exploratory questions about platform use, privacy, interactions, technology use and profile pictures. Findings: Females shared photos more often overall and shared images more frequently on Snapchat, but males shared more images on Twitter, particularly for hobbies. Females also tended to have more privacy-related concerns but were more willing, in principle, to share pictures of their children. Females also interacted more through others’ images by liking and commenting on them. Both genders used supporting apps but in different ways: females applied filters and posted to albums whereas males retouched photos and used photo organising apps. Finally, males were more likely to be alone in their profile pictures. Practical implications: Those designing visual social web communication strategies to reach out to users should consider the different ways in which platforms are used by males and females to optimise their message for their target audience. Social implications: There are clear gender and platform differences in visual communication strategies. Overall, males may tend to have more informational, and females more relationship-based, skills or needs. Originality/value: This is the first detailed survey of electronic image sharing practices and the first to systematically compare the current generation of platforms.
    • Gender and research Publishing in India: Uniformly high inequality?

      Thelwall, Mike; Bailey, Carol; Makita, Meiko; Sud, Pardeep; Madalli, Devika P. (Elsevier, 2018-12-10)
      Gender inequalities have been a persistent feature of all modern societies. Although employment-related gender discrimination in various forms is legally prohibited, prejudice and violence against females have not been eradicated. Moreover, gendered social expectations can constrain the career choices of both males and females. Within academia, continuing gender imbalances have been found in many countries (Larivière, Ni, Gingras, Cronin, & Sugimoto, 2013), and particularly at senior levels (e.g., Ucal, O'Neil, & Toktas, 2015; Weisshaar, 2017; Winchester & Browning, 2015). India was the fifth largest research producer in 2017, according to Scopus, but has the highest United Nations Development Programme (UNDP) gender inequality index of the 30 largest research producers in Scopus (/hdr.undp.org/en/data) and so is an important case for global science. Moreover, the complex web of influences that have led to women being underrepresented in science in India is not well understood (Gupta, 2015). The absence of basic information about gender inequalities is a serious limitation because gender issues in India differ from the better researched case of the USA, due to economic conditions, probably stronger family influences (Vindhya, 2007), greater female safety concerns (Vindhya, 2007), and differing cultural expectations (Chandrakar, 2014).
    • Gender bias in machine learning for sentiment analysis

      Thelwall, Mike (Emerald, 2018-01-01)
      Purpose: This paper investigates whether machine learning induces gender biases in the sense of results that are more accurate for male authors than for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This article uses ratings-balanced sets of reviews of restaurants and hotels (3 sets) to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed gender datasets will overrepresent the opinions of women. Training on same gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.
    • Gender bias in sentiment analysis

      Thelwall, Mike (Emerald, 2018-02-14)
      Purpose: To test if there are biases in lexical sentiment analysis accuracy between reviews authored by males and females. Design: This paper uses datasets of TripAdvisor reviews of hotels and restaurants in the UK written by UK residents to contrast the accuracy of lexical sentiment analysis for males and females. Findings: Male sentiment is harder to detect because it is less explicit. There was no evidence that this problem could be solved by gender-specific lexical sentiment analysis. Research limitations: Only one lexical sentiment analysis algorithm was used. Practical implications: Care should be taken when drawing conclusions about gender differences from automatic sentiment analysis results. When comparing opinions for product aspects that appeal differently to men and women, female sentiments are likely to be overrepresented, biasing the results. Originality/value: This is the first evidence that lexical sentiment analysis is less able to detect the opinions of one gender than another.
    • Gender differences in research areas, methods and topics: Can people and thing orientations explain the results?

      Thelwall, Mike; Bailey, Carol; Tobin, Catherine; Bradshaw, Noel-Ann (Elsevier, 2018-12-26)
      Although the gender gap in academia has narrowed, females are underrepresented within some fields in the USA. Prior research suggests that the imbalances between science, technology, engineering and mathematics fields may be partly due to greater male interest in things and greater female interest in people, or to off-putting masculine cultures in some disciplines. To seek more detailed insights across all subjects, this article compares practising US male and female researchers between and within 285 narrow Scopus fields inside 26 broad fields from their first-authored articles published in 2017. The comparison is based on publishing fields and the words used in article titles, abstracts, and keywords. The results cannot be fully explained by the people/thing dimensions. Exceptions include greater female interest in veterinary science and cell biology and greater male interest in abstraction, patients, and power/control fields, such as politics and law. These may be due to other factors, such as the ability of a career to provide status or social impact or the availability of alternative careers. As a possible side effect of the partial people/thing relationship, females are more likely to use exploratory and qualitative methods and males are more likely to use quantitative methods. The results suggest that the necessary steps of eliminating explicit and implicit gender bias in academia are insufficient and might be complemented by measures to make fields more attractive to minority genders.
    • Goodreads Reviews to Assess the Wider Impacts of Books

      Kousha, Kayvan; Thelwall, Mike; Abdoli, Mahshid (John Wiley & Sons, 2017-07-17)
      Although peer-review and citation counts are commonly used to help assess the scholarly impact of published research, informal reader feedback might also be exploited to help assess the wider impacts of books, such as their educational or cultural value. The social website Goodreads seems to be a reasonable source for this purpose because it includes a large number of book reviews and ratings by many users inside and outside of academia. To check this, Goodreads book metrics were compared with different book-based impact indicators for 15,928 academic books across broad fields. Goodreads engagements were numerous enough in the Arts (85% of books had at least one), Humanities (80%) and Social Sciences (67%) for use as a source of impact evidence. Low and moderate correlations between Goodreads book metrics and scholarly or non-scholarly indicators suggest that reader feedback in Goodreads reflects the many purposes of books rather than a single type of impact. Although Goodreads book metrics can be manipulated they could be used guardedly by academics, authors, and publishers in evaluations.
    • Goodreads: A social network site for book readers

      Thelwall, Mike; Kousha, Kayvan (John Wiley & Sons, Inc., 2016-12-21)
      Goodreads is an Amazon‐owned book‐based social web site for members to share books, read, review books, rate books, and connect with other readers. Goodreads has tens of millions of book reviews, recommendations, and ratings that may help librarians and readers to select relevant books. This article describes a first investigation of the properties of Goodreads users, using a random sample of 50,000 members. The results suggest that about three quarters of members with a public profile are female, and that there is little difference between male and female users in patterns of behavior, except for females registering more books and rating them less positively. Goodreads librarians and super‐users engage extensively with most features of the site. The absence of strong correlations between book‐based and social usage statistics (e.g., numbers of friends, followers, books, reviews, and ratings) suggests that members choose their own individual balance of social and book activities and rarely ignore one at the expense of the other. Goodreads is therefore neither primarily a book‐based website nor primarily a social network site but is a genuine hybrid, social navigation site.
    • Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories

      Martín-Martín, Alberto; Orduna-Malea, Enrique; Thelwall, Mike; Delgado López-Cózar, Emilio (Elsevier, 2018-10-05)
      Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%–96%), far ahead of Scopus (35%–77%) and WoS (27%–73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%–65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%–38%), and they tended to be much less cited than citing sources that were also in Scopus or WoS. Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78–0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
    • Grammatical annotation of historical Portuguese: Generating a corpus-based diachronic dictionary

      Bick, Eckhard; Zampieri, Marcos (Springer, 2016-09-03)
      In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our method allows the creation of tailor-made standardization dictionaries for historical Portuguese with optional period or author frequencies.
    • Graph structure in three national academic Webs: Power laws with anomalies

      Thelwall, Mike; Wilkinson, David (Wiley, 2003)
      The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
    • Guideline references and academic citations as evidence of the clinical value of health research

      Thelwall, Mike; Maflahi, Nabeil; Statistical Cybermetrics Research Group, School of Mathematics and Computer Science, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, United Kingdom (2016-03-14)
      This article introduces a new source of evidence of the value of medical-related research: citations from clinical guidelines. These give evidence that research findings have been used to inform the day-to-day practice of medical staff. To identify whether citations from guidelines can give different information from that of traditional citation counts, this article assesses the extent to which references in clinical guidelines tend to be highly cited in the academic literature and highly read in Mendeley. Using evidence from the United Kingdom, references associated with the UK's National Institute of Health and Clinical Excellence (NICE) guidelines tended to be substantially more cited than comparable articles, unless they had been published in the most recent 3 years. Citation counts also seemed to be stronger indicators than Mendeley readership altmetrics. Hence, although presence in guidelines may be particularly useful to highlight the contributions of recently published articles, for older articles citation counts may already be sufficient to recognize their contributions to health in society.
    • How quickly do publications get read? The evolution of Mendeley reader counts for new articles

      Maflahi, Nabeil; Thelwall, Mike (Wiley-Blackwell, 2017-08-29)
      Within science, citation counts are widely used to estimate research impact but publication delays mean that they are not useful for recent research. This gap can be filled by Mendeley reader counts, which are valuable early impact indicators for academic articles because they appear before citations and correlate strongly with them. Nevertheless, it is not known how Mendeley readership counts accumulate within the year of publication, and so it is unclear how soon they can be used. In response, this paper reports a longitudinal weekly study of the Mendeley readers of articles in six library and information science journals from 2016. The results suggest that Mendeley readers accrue from when articles are first available online and continue to steadily build. For journals with large publication delays, articles can already have substantial numbers of readers by their publication date. Thus, Mendeley reader counts may even be useful as early impact indicators for articles before they have been officially published in a journal issue. If field normalised indicators are needed, then these can be generated when journal issues are published using the online first date.
    • Hybrid Arabic–French machine translation using syntactic re-ordering and morphological pre-processing

      Mohamed, Emad; Sadat, Fatiha (Elsevier BV, 2014-11-08)
      Arabic is a highly inflected language and a morpho-syntactically complex language with many differences compared to several languages that are heavily studied. It may thus require good pre-processing as it presents significant challenges for Natural Language Processing (NLP), specifically for Machine Translation (MT). This paper aims to examine how Statistical Machine Translation (SMT) can be improved using rule-based pre-processing and language analysis. We describe a hybrid translation approach coupling an Arabic–French statistical machine translation system using the Moses decoder with additional morphological rules that reduce the morphology of the source language (Arabic) to a level that makes it closer to that of the target language (French). Moreover, we introduce additional swapping rules for a structural matching between the source language and the target language. Two structural changes involving the positions of the pronouns and verbs in both the source and target languages have been attempted. The results show an improvement in the quality of translation and a gain in terms of BLEU score after introducing a pre-processing scheme for Arabic and applying these rules based on morphological variations and verb re-ordering (VS into SV constructions) in the source language (Arabic) according to their positions in the target language (French). Furthermore, a learning curve shows the improvement in terms of BLEU score under scarce- and large-resource conditions. The proposed approach is completed without increasing the amount of training data or radically changing the algorithms that can affect the translation or training engines.
    • Hyperlinks as a data source for science mapping

      Harries, Gareth; Wilkinson, David; Price, Liz; Fairclough, Ruth; Thelwall, Mike (Sage, 2004)
      Hyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to be different in character to links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.
    • Identification of multiword expressions: A fresh look at modelling and evaluation

      Taslimipoor, Shiva; Rohanian, Omid; Mitkov, Ruslan; Fazly, Afsaneh; Markantonatou, Stella; Ramisch, Carlos; Savary, Agata; Vincze, Veronika (Language Science Press, 2018-10-25)
    • Identification of translationese: a machine learning approach

      Ilisei, Iustina; Inkpen, Diana; Corpas Pastor, Gloria; Mitkov, Ruslan; Gelbukh, A (Springer, 2010)
      This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach up to 97.62% success rate on a technical dataset. Moreover, the SVM classifier consistently reports a statistically significant improved accuracy when the learning system benefits from the addition of simplification features to the basic translational classifier system. Therefore, these findings may be considered an argument for the existence of the Simplification Universal.
    • Identifying Signs of Syntactic Complexity for Rule-Based Sentence Simplification

      Evans, Richard; Orasan, Constantin (Cambridge University Press, 2018-10-31)
      This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.