• Gender differences in research areas, methods and topics: Can people and thing orientations explain the results?

      Thelwall, Mike; Bailey, Carol; Tobin, Catherine; Bradshaw, Noel-Ann (Elsevier, 2019-12-31)
      Although the gender gap in academia has narrowed, females are underrepresented within some fields in the USA. Prior research suggests that the imbalances between science, technology, engineering and mathematics fields may be partly due to greater male interest in things and greater female interest in people, or to off-putting masculine cultures in some disciplines. To seek more detailed insights across all subjects, this article compares practising US male and female researchers between and within 285 narrow Scopus fields inside 26 broad fields from their first-authored articles published in 2017. The comparison is based on publishing fields and the words used in article titles, abstracts, and keywords. The results cannot be fully explained by the people/thing dimensions. Exceptions include greater female interest in veterinary science and cell biology and greater male interest in abstraction, patients, and power/control fields, such as politics and law. These may be due to other factors, such as the ability of a career to provide status or social impact or the availability of alternative careers. As a possible side effect of the partial people/thing relationship, females are more likely to use exploratory and qualitative methods and males are more likely to use quantitative methods. The results suggest that the necessary steps of eliminating explicit and implicit gender bias in academia are insufficient and might be complemented by measures to make fields more attractive to minority genders.
    • She’s Reddit: A source of statistically significant gendered interest information

      Thelwall, Mike; Stuart, Emma (Elsevier, 2018-12-31)
      Information about gender differences in interests is necessary to disentangle the effects of discrimination and choice when gender inequalities occur, such as in employment. This article assesses gender differences in interests within the popular social news and entertainment site Reddit. A method to detect terms that are statistically significantly used more by males or females in 181 million comments in 100 subreddits shows that gender affects both the selection of subreddits and activities within most of them. The method avoids the hidden gender biases of topic modelling for this task. Although the method reveals statistically significant gender differences in interests for topics that are extensively discussed on Reddit, it cannot give definitive causes, and imitation and sharing within the site mean that additional checking is needed to verify the results. Nevertheless, with care, Reddit can serve as a useful source of insights into gender differences in interests.
    • Gender and research publishing in India: Uniformly high inequality?

      Thelwall, Mike; Bailey, Carol; Makita, Meiko; Sud, Pardeep; Madalli, Devika P. (Elsevier, 2018-12-10)
      Gender inequalities have been a persistent feature of all modern societies. Although employment-related gender discrimination in various forms is legally prohibited, prejudice and violence against females have not been eradicated. Moreover, gendered social expectations can constrain the career choices of both males and females. Within academia, continuing gender imbalances have been found in many countries (Larivière, Ni, Gingras, Cronin, & Sugimoto, 2013), and particularly at senior levels (e.g., Ucal, O'Neil, & Toktas, 2015; Weisshaar, 2017; Winchester & Browning, 2015). India was the fifth largest research producer in 2017, according to Scopus, but has the highest United Nations Development Programme (UNDP) gender inequality index of the 30 largest research producers in Scopus (/hdr.undp.org/en/data) and so is an important case for global science. Moreover, the complex web of influences that have led to women being underrepresented in science in India is not well understood (Gupta, 2015). The absence of basic information about gender inequalities is a serious limitation because gender issues in India differ from the better researched case of the USA, due to economic conditions, probably stronger family influences (Vindhya, 2007), greater female safety concerns (Vindhya, 2007), and differing cultural expectations (Chandrakar, 2014).
    • Assessing the teaching value of non-English academic books: The case of Spain

      Mas Bleda, Amalia; Thelwall, Mike (Consejo Superior de Investigaciones Científicas, 2018-12-01)
    • Identifying Signs of Syntactic Complexity for Rule-Based Sentence Simplification

      Evans, Richard; Orasan, Constantin (Cambridge University Press, 2018-10-31)
    • Do prestigious Spanish scholarly book publishers have more teaching impact?

      Mas-Bleda, Amalia; Thelwall, Mike (Emerald Publishing Limited, 2018-10-10)
      Purpose: The purpose of this paper is to assess the educational value of prestigious and productive Spanish scholarly publishers based on mentions of their books in online scholarly syllabi. Design/methodology/approach: Syllabus mentions of 15,117 books from 27 publishers were searched for, manually checked and compared with Microsoft Academic (MA) citations. Findings: Most books published by Ariel, Síntesis, Tecnos and Cátedra have been mentioned in at least one online syllabus, indicating that their books have consistently high educational value. In contrast, few books published by the most productive publishers were mentioned in online syllabi. Prestigious publishers have both the highest educational impact based on syllabus mentions and the highest research impact based on MA citations. Research limitations/implications: The results might be different for other publishers. The online syllabus mentions found may be a small fraction of the syllabus mentions of the sampled books. Practical implications: Authors of Spanish-language social sciences and humanities books should consider general prestige when selecting a publisher if they want educational uptake for their work. Originality/value: This is the first study assessing book publishers based on syllabus mentions.
    • Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories

      Martín-Martín, Alberto; Orduna-Malea, Enrique; Thelwall, Mike; Delgado López-Cózar, Emilio (Elsevier, 2018-10-05)
      Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%–96%), far ahead of Scopus (35%–77%) and WoS (27%–73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%–65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%–38%), and they tended to be much less cited than citing sources that were also in Scopus or WoS. Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78-0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
    • Does female-authored research have more educational impact than male-authored research?

      Thelwall, Mike (Levy Library Press, 2018-10-04)
      Female academics are more likely to be in teaching-related roles in some countries, including the USA. As a side effect of this, female-authored journal articles may tend to be more useful for students. This study assesses this hypothesis by investigating whether female first-authored research has more uptake in education than male first-authored research. Based on an analysis of Mendeley readers of articles from 2014 in five countries and 100 narrow Scopus subject categories, the results show that female-authored articles attract more student readers than male-authored articles in Spain, Turkey, the UK and USA but not India. They also attract fewer professorial readers in Spain, the UK and the USA, but not India and Turkey, and tend to be less popular with senior academics. Because the results are based on analysis of differences within narrow fields they cannot be accounted for by females working in more education-related disciplines. The apparent additional educational impact for female-authored research could be due to selecting more accessible micro-specialisms, however, such as health-related instruments within the instrumentation narrow field. Whatever the cause, the results suggest that citation-based research evaluations may undervalue the wider impact of female researchers.
    • Do gendered citation advantages influence field participation? Four unusual fields in the USA 1996-2017

      Thelwall, Mike (Springer, 2018-09-29)
      Gender inequalities in science are an ongoing concern, but their current causes are not well understood. This article investigates four fields with unusual proportions of female researchers in the USA for their subject matter, according to some current theories. It assesses how their gender composition and gender differences in citation rates have changed over time. All fields increased their share of female first-authored research, but at varying rates. The results give no evidence of the importance of citations, despite their unusual gender characteristics. For example, the field with the highest share of female-authored research and the most rapid increase had the largest male citation advantage. Differing micro-specialisms seems more likely than bias to be a cause of gender differences in citation rates, when present.
    • Do females create higher impact research? Scopus citations and Mendeley readers for articles from five countries

      Thelwall, Mike (Elsevier, 2018-09-01)
      There are known gender imbalances in participation in scientific fields, from female dominance of nursing to male dominance of mathematics. It is not clear whether there is also a citation imbalance, with some claiming that male-authored research tends to be more cited. No previous study has assessed gender differences in the readers of academic research on a large scale, however. In response, this article assesses whether there are gender differences in the average citations and/or Mendeley readers of academic publications. Field normalised logged Scopus citations and Mendeley readers from mid-2018 for articles published in 2014 were investigated for articles with first authors from India, Spain, Turkey, the UK and the USA in up to 251 fields with at least 50 male and female authors. Although female-authored research is less cited in Turkey (−4.0%) and India (−3.6%), it is marginally more cited in Spain (0.4%), the UK (0.4%), and the USA (0.2%). Female-authored research has fewer Mendeley readers in India (−1.1%) but more in Spain (1.4%), Turkey (1.1%), the UK (2.7%) and the USA (3.0%). Thus, whilst there may be little practical gender difference in citation impact in countries with mature science systems, the higher female readership impact suggests a wider audience for female-authored research. The results also show that the conclusions from a gender analysis depend on the field normalisation method. A theoretically informed decision must therefore be made about which normalisation to use. The results also suggest that arithmetic mean-based field normalisation is favourable to males.
    • Are Mendeley Reader Counts Useful Impact Indicators in all Fields?

      Thelwall, Mike (Springer, 2018-08)
      Reader counts from the social reference sharing site Mendeley are known to be valuable for early research evaluation. They have strong correlations with citation counts for journal articles but appear about a year before them. There are disciplinary differences in the value of Mendeley reader counts but systematic evidence is needed at the level of narrow fields to reveal its extent. In response, this article compares Mendeley reader counts with Scopus citation counts for journal articles from 2012 in 325 narrow Scopus fields. Despite strong positive correlations in most fields, averaging 0.671, the correlations in some fields are as weak as 0.255. Technical reasons explain most weaker correlations, suggesting that the underlying relationship is almost always strong. The exceptions are caused by unusually high educational or professional use or topics of interest within countries that avoid Mendeley. The findings suggest that if care is taken then Mendeley reader counts can be used for early citation impact evidence in almost all fields and for related impact in some of the remainder. As an additional application of the results, cross-checking with Mendeley data can be used to identify indexing anomalies in citation databases.
    • Multiword units in machine translation and translation technology

      Mitkov, Ruslan; Monti, Johanna; Corpas Pastor, Gloria; Seretan, Violeta (John Benjamins, 2018-07-20)
      The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention but we believe that there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.
    • Which US and European Higher Education Institutions are visible in ResearchGate and what affects their RG Score?

      Lepori, Benedetto; Thelwall, Michael; Hoorani, Bareerah Hafeez (Elsevier, 2018-07-19)
      While ResearchGate has become the most popular academic social networking site in terms of regular users, not all institutions have joined and the scores it assigns to academics and institutions are controversial. This paper assesses the presence in ResearchGate of higher education institutions in Europe and the US in 2017, and the extent to which institutional ResearchGate Scores reflect institutional academic impact. Most of the 2258 European and 4355 US higher educational institutions included in the sample had an institutional ResearchGate profile, with near universal coverage for PhD-awarding institutions found in the Web of Science (WoS). For non-PhD awarding institutions that did not publish, size (number of staff members) was most associated with presence in ResearchGate. For PhD-awarding institutions in WoS, presence in RG was strongly related to the number of WoS publications. In conclusion, a) institutional RG scores reflect research volume more than visibility and b) this indicator is highly correlated to the number of WoS publications. Hence, the value of RG Scores for institutional comparisons is limited.
    • Differences between journals and years in the proportions of students, researchers and faculty registering Mendeley articles

      Thelwall, Mike (Springer, 2018-07)
      This article contains two investigations into Mendeley reader counts with the same dataset. Mendeley reader counts provide evidence of early scholarly impact for journal articles, but reflect the reading of a relatively young subset of all researchers. To investigate whether this age bias is constant or varies by narrow field and publication year, this article compares the proportions of student, researcher and faculty readers for articles published 1996-2016 in 36 large monodisciplinary journals. In these journals, undergraduates recorded the newest research and faculty the oldest, with large differences between journals. The existence of substantial differences in the composition of readers between related fields points to the need for caution when using Mendeley readers as substitutes for citations for broad fields. The second investigation shows, with the same data, that there are substantial differences between narrow fields in the time taken for Scopus citations to be as numerous as Mendeley readers. Thus, even narrow field differences can impact on the relative value of Mendeley compared to citation counts.
    • Aggressive language identification using word embeddings and sentiment features

      Orasan, Constantin (Association for Computational Linguistics, 2018-06-25)
      This paper describes our participation in the First Shared Task on Aggression Identification. The method proposed relies on machine learning to identify social media texts which contain aggression. The main features employed by our method are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried. The official submissions used Support Vector Machines and Random Forests. The official evaluation showed that for texts similar to the ones in the training dataset Random Forests work best, whilst for texts which are different SVMs are a better choice. The evaluation also showed that despite its simplicity the method performs well when compared with more elaborated methods.
    • Dissecting tweets in search of irony

      Rohanian, Omid; Taslimipoor, Shiva; Evans, Richard; Mitkov, Ruslan (Association for Computational Linguistics, 2018-06-05)
      This paper describes the systems submitted to SemEval 2018 Task 3 “Irony detection in English tweets” for both subtasks A and B. The first system leveraging a combination of sentiment, distributional semantic, and text surface features is ranked third among 44 teams according to the official leaderboard of the subtask A. The second system with slightly different representation of the features ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast which leads to a vast coverage in detecting ironic content.
    • Semantic discrimination based on knowledge and association

      Taslimipoor, Shiva; Rohanian, Omid; Ha, Le An; Corpas Pastor, Gloria; Mitkov, Ruslan (Association for Computational Linguistics, 2018-06)
      This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Discriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, ngram counts, word similarity, and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.
    • Academic information on Twitter: A user survey

      Mohammadi, Ehsan; Thelwall, Mike; Kwasny, Mary; Holmes, Kristi L. (PLOS, 2018-05-17)
      Although counts of tweets citing academic papers are used as an informal indicator of interest, little is known about who tweets academic papers and who uses Twitter to find scholarly information. Without knowing this, it is difficult to draw useful conclusions from a publication being frequently tweeted. This study surveyed 1,912 users that have tweeted journal articles to ask about their scholarly-related Twitter uses. Almost half of the respondents (45%) did not work in academia, despite the sample probably being biased towards academics. Twitter was used most by people with a social science or humanities background. People tend to leverage social ties on Twitter to find information rather than searching for relevant tweets. Twitter is used in academia to acquire and share real-time information and to develop connections with others. Motivations for using Twitter vary by discipline, occupation, and employment sector, but not much by gender. These factors also influence the sharing of different types of academic information. This study provides evidence that Twitter plays a significant role in the discovery of scholarly information and cross-disciplinary knowledge spreading. Most importantly, the large numbers of non-academic users support the claims of those using tweet counts as evidence for the non-academic impacts of scholarly research.
    • Co-saved, co-tweeted, and co-cited networks

      Didegah, Fereshteh; Thelwall, Mike (Wiley-Blackwell, 2018-05-14)
      Counts of tweets and Mendeley user libraries have been proposed as altmetric alternatives to citation counts for the impact assessment of articles. Although both have been investigated to discover whether they correlate with article citations, it is not known whether users tend to tweet or save (in Mendeley) the same kinds of articles that they cite. In response, this article compares pairs of articles that are tweeted, saved to a Mendeley library, or cited by the same user, but possibly a different user for each source. The study analyzes 1,131,318 articles published in 2012, with minimum tweeted (10), saved to Mendeley (100), and cited (10) thresholds. The results show surprisingly minor overall overlaps between the three phenomena. The importance of journals for Twitter and the presence of many bots at different levels of activity suggest that this site has little value for impact altmetrics. The moderate differences between patterns of saving and citation suggest that Mendeley can be used for some types of impact assessments, but sensitivity is needed for underlying differences.
    • Linguistic features of genre and method variation in translation: A computational perspective

      Lapshinova-Koltunski, Ekaterina; Zampieri, Marcos (Mouton De Gruyter, 2018-04-09)
      In this contribution we describe the use of text classification methods to investigate genre and method variation in an English-German translation corpus. For this purpose we use linguistically motivated features representing texts using a combination of part-of-speech tags arranged in bigrams, trigrams, and 4-grams. The classification method used in this study is a Bayesian classifier with Laplace smoothing. We use the output of the classifiers to carry out an extensive feature analysis on the main difference between genres and methods of translation.
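
The last abstract above mentions a Bayesian classifier with Laplace smoothing over part-of-speech bigrams, trigrams, and 4-grams. As a rough illustration only, the following Python sketch shows one way such a classifier could look, assuming a multinomial naive Bayes model with add-one smoothing. The class name `LaplaceNaiveBayes`, the helper `pos_ngrams`, the tag sequences, and the genre labels are all hypothetical and are not taken from the paper; the authors' actual features, data, and implementation may differ.

```python
import math
from collections import Counter, defaultdict


def pos_ngrams(tags, n_values=(2, 3, 4)):
    """Turn a sequence of POS tags into bigram, trigram and 4-gram features."""
    feats = []
    for n in n_values:
        for i in range(len(tags) - n + 1):
            feats.append(" ".join(tags[i:i + n]))
    return feats


class LaplaceNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                           # all n-grams seen in training
        for tags, label in zip(docs, labels):
            feats = pos_ngrams(tags)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, tags):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)    # log prior
            for feat in pos_ngrams(tags):
                if feat in self.vocab:               # skip n-grams unseen in training
                    # Add-one smoothing keeps zero counts from zeroing the product.
                    score += math.log((counts[feat] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tags):
        scores = self.log_scores(tags)
        return max(scores, key=scores.get)


# Hypothetical toy data: POS tag sequences with made-up genre labels.
train_docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "DET", "NOUN", "ADV"],
    ["NOUN", "NOUN", "VERB", "NUM", "NOUN"],
    ["VERB", "DET", "NOUN", "NOUN", "NUM"],
]
train_labels = ["fiction", "fiction", "manual", "manual"]

model = LaplaceNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["DET", "NOUN", "VERB", "DET", "NOUN"]))
```

Per-class log scores from `log_scores` can be compared feature by feature to see which n-grams separate the classes most, which is loosely in the spirit of the feature analysis the abstract describes.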