• Dimensions: A Competitor to Scopus and the Web of Science?

      Thelwall, Mike (Elsevier, 2018-08)
      Dimensions is a partly free scholarly database launched by Digital Science in January 2018. Dimensions includes journal articles and citation counts, making it a potential new source of impact data. This article explores the value of Dimensions from an impact assessment perspective with an examination of Food Science research 2008-2018 and a random sample of 10,000 Scopus articles from 2012. The results include high correlations between citation counts from Scopus and Dimensions (0.96 by narrow field in 2012) as well as similar average counts. Almost all Scopus articles with DOIs were found in Dimensions (97% in 2012). Thus, the scholarly database component of Dimensions seems to be a plausible alternative to Scopus and the Web of Science for general citation analyses and for citation data in support of some types of research evaluations.
    • Are Mendeley Reader Counts Useful Impact Indicators in all Fields?

      Thelwall, Mike (Springer, 2018-08)
      Reader counts from the social reference sharing site Mendeley are known to be valuable for early research evaluation. They have strong correlations with citation counts for journal articles but appear about a year before them. There are disciplinary differences in the value of Mendeley reader counts, but systematic evidence at the level of narrow fields is needed to reveal their extent. In response, this article compares Mendeley reader counts with Scopus citation counts for journal articles from 2012 in 325 narrow Scopus fields. Despite strong positive correlations in most fields, averaging 0.671, the correlations in some fields are as weak as 0.255. Technical reasons explain most of the weaker correlations, suggesting that the underlying relationship is almost always strong. The exceptions are caused by unusually high educational or professional use, or by topics of interest within countries that avoid Mendeley. The findings suggest that, with care, Mendeley reader counts can be used as early citation impact evidence in almost all fields, and as evidence of related impact in some of the remainder. As an additional application of the results, cross-checking with Mendeley data can be used to identify indexing anomalies in citation databases.
    • Differences between journals and years in the proportions of students, researchers and faculty registering Mendeley articles

      Thelwall, Mike (Springer, 2018-07)
      This article contains two investigations into Mendeley reader counts using the same dataset. Mendeley reader counts provide evidence of early scholarly impact for journal articles, but reflect the reading of a relatively young subset of all researchers. To investigate whether this age bias is constant or varies by narrow field and publication year, this article compares the proportions of student, researcher and faculty readers for articles published 1996-2016 in 36 large monodisciplinary journals. In these journals, undergraduates recorded the newest research and faculty the oldest, with large differences between journals. The existence of substantial differences in the composition of readers between related fields points to the need for caution when using Mendeley readers as substitutes for citations for broad fields. The second investigation shows, with the same data, that there are substantial differences between narrow fields in the time taken for Scopus citations to become as numerous as Mendeley readers. Thus, even narrow field differences can affect the relative value of Mendeley reader counts compared to citation counts.
    • Academic information on Twitter: A user survey

      Mohammadi, Ehsan; Thelwall, Mike; Kwasny, Mary; Holmes, Kristi L. (PLOS, 2018-05-17)
      Although counts of tweets citing academic papers are used as an informal indicator of interest, little is known about who tweets academic papers and who uses Twitter to find scholarly information. Without knowing this, it is difficult to draw useful conclusions from a publication being frequently tweeted. This study surveyed 1,912 users who had tweeted journal articles to ask about their scholarly-related uses of Twitter. Almost half of the respondents (45%) did not work in academia, despite the sample probably being biased towards academics. Twitter was used most by people with a social science or humanities background. People tend to leverage social ties on Twitter to find information rather than searching for relevant tweets. Twitter is used in academia to acquire and share real-time information and to develop connections with others. Motivations for using Twitter vary by discipline, occupation, and employment sector, but not much by gender. These factors also influence the sharing of different types of academic information. This study provides evidence that Twitter plays a significant role in the discovery of scholarly information and in cross-disciplinary knowledge spreading. Most importantly, the large number of non-academic users supports the claims of those using tweet counts as evidence for the non-academic impacts of scholarly research.
    • Co-saved, co-tweeted, and co-cited networks

      Didegah, Fereshteh; Thelwall, Mike (Wiley-Blackwell, 2018-05-14)
      Counts of tweets and Mendeley user libraries have been proposed as altmetric alternatives to citation counts for the impact assessment of articles. Although both have been investigated to discover whether they correlate with article citations, it is not known whether users tend to tweet or save (in Mendeley) the same kinds of articles that they cite. In response, this article compares pairs of articles that are tweeted, saved to a Mendeley library, or cited by the same user, although possibly a different user for each source. The study analyzes 1,131,318 articles published in 2012, applying minimum thresholds of 10 tweets, 100 Mendeley saves, and 10 citations. The results show surprisingly small overall overlaps between the three phenomena. The importance of journals for Twitter and the presence of many bots at different levels of activity suggest that this site has little value for impact altmetrics. The moderate differences between patterns of saving and citation suggest that Mendeley can be used for some types of impact assessment, but with sensitivity to the underlying differences.
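
      The comparison above rests on building, for each source, the set of article pairs that share a user, and then measuring how much the three pair sets overlap. A minimal Python sketch of that construction, with invented (user, article) records and Jaccard similarity as one plausible overlap measure (the paper's exact thresholds and measures are not reproduced here):

        from itertools import combinations

        def co_pairs(events):
            """Set of unordered article pairs that share at least one user.
            events: iterable of (user, article) records."""
            by_user = {}
            for user, article in events:
                by_user.setdefault(user, set()).add(article)
            pairs = set()
            for articles in by_user.values():
                pairs.update(combinations(sorted(articles), 2))
            return pairs

        # Toy (user, article) records for each source (invented).
        tweeted = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C")]
        saved = [("m1", "A"), ("m1", "B"), ("m2", "C"), ("m2", "D")]
        cited = [("c1", "A"), ("c1", "C"), ("c2", "B"), ("c2", "C")]

        nets = {"co-tweeted": co_pairs(tweeted),
                "co-saved": co_pairs(saved),
                "co-cited": co_pairs(cited)}

        # Jaccard overlap between each pair of co-occurrence networks.
        for a, b in combinations(nets, 2):
            jac = len(nets[a] & nets[b]) / len(nets[a] | nets[b])
            print(a, "vs", b, f"Jaccard = {jac:.2f}")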
    • Early Mendeley readers correlate with later citation counts

      Thelwall, Mike (Springer, 2018-03)
      Counts of the number of readers registered in the social reference manager Mendeley have been proposed as an early impact indicator for journal articles. Although previous research has shown that Mendeley reader counts for articles tend to have a strong positive correlation with synchronous citation counts after a few years, no previous studies have compared early Mendeley reader counts with later citation counts. In response, this first diachronic analysis compares reader counts within a month of publication with citation counts after 20 months for ten fields. There were moderate or strong correlations in eight out of ten fields, with the two exceptions being the smallest categories (n=18, 36) with wide confidence intervals. The correlations are higher than the correlations between later citations and early citations, showing that Mendeley reader counts are more useful early impact indicators than citation counts.
    • Can Microsoft Academic be used for citation analysis of preprint archives? The case of the Social Science Research Network

      Thelwall, Mike (Springer, 2018-03)
      Preprint archives play an important scholarly communication role within some fields. The impact of archives and of individual preprints is difficult to analyse because online repositories are not indexed by the Web of Science or Scopus. In response, this article assesses whether the new Microsoft Academic can be used for citation analysis of preprint archives, focusing on the Social Science Research Network (SSRN). Although Microsoft Academic seems to index SSRN comprehensively, it groups only a small fraction of SSRN papers into an easily retrievable set, and the character of that set varies over time, making any field normalisation or citation comparison untrustworthy. A brief parallel analysis of arXiv suggests that similar results would occur for other online repositories. Systematic analyses of preprint archives are nevertheless possible with Microsoft Academic when complete lists of archive publications are available from other sources, because of its promising coverage and citation results.
    • A comparison of title words for journal articles and Wikipedia pages: Coverage and stylistic differences?

      Thelwall, Mike; Sud, Pardeep (La Fundación Española para la Ciencia y la Tecnología (FECYT), 2018-02-12)
      This article assesses whether there are gaps in Wikipedia’s coverage of academic information and whether there are non-obvious stylistic differences from academic journal articles that Wikipedia users and editors should be aware of. For this, it analyses terms in the titles of journal articles that are absent from all English Wikipedia page titles for each of 27 Scopus subject categories. The results show that English Wikipedia has lower coverage of issues of interest to non-English nations and there are gaps probably caused by a lack of willing subject specialist editors in some areas. There were also stylistic disciplinary differences in the results, with some fields using synonyms of “analysing” that were ignored in Wikipedia, and others using the present tense in titles to emphasise research outcomes. Since Wikipedia is broadly effective at covering academic research topics from all disciplines, it might be relied upon by non-specialists. Specialists should therefore check for coverage gaps within their areas for useful topics and librarians should caution users that important topics may be missing.
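
      The analysis described above reduces to set differences over title vocabularies: collect the words used in journal article titles for a field and subtract every word that appears in any English Wikipedia page title. A minimal sketch with invented titles (the real study used all English Wikipedia page titles across 27 Scopus categories, and would also need stopword filtering):

        import re

        # Invented examples: article titles from one field and a tiny
        # sample of Wikipedia page titles.
        article_titles = ["Analysing soil erosion in semi-arid regions",
                          "Erosion control with vegetative cover"]
        wikipedia_titles = ["Soil erosion", "Vegetation", "Arid climate"]

        def words(titles):
            """Lower-cased word set extracted from a list of titles."""
            return {w for t in titles for w in re.findall(r"[a-z]+", t.lower())}

        # Words used in article titles but absent from every Wikipedia title
        # point to potential coverage gaps or stylistic differences.
        print(sorted(words(article_titles) - words(wikipedia_titles)))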
    • Microsoft Academic automatic document searches: accuracy for journal articles and suitability for citation analysis

      Thelwall, Mike (Elsevier, 2018-02)
      Microsoft Academic is a free academic search engine and citation index that is similar to Google Scholar but can be queried automatically. Its data is potentially useful for bibliometric analysis if individual journal articles can be found in it effectively. This article compares different methods to find journal articles in its index by searching for combinations of title, authors, publication year and journal name, and uses the results for the widest published correlation analysis of Microsoft Academic citation counts for journal articles so far. Based on 126,312 articles from 323 Scopus subfields in 2012, the optimal strategy for articles with DOIs is to search by title and filter out matches with incorrect DOIs. This finds 90% of journal articles. For articles without DOIs, the optimal strategy is to search by title and then filter out matches with dissimilar metadata. This finds 89% of journal articles, with an additional 1% incorrect matches. The remaining articles seem mainly to be either not indexed by Microsoft Academic or indexed under a different language version of their title. From the matches, Scopus citation counts and Microsoft Academic citation counts have an average Spearman correlation of 0.95, with the lowest for any single field being 0.63. Thus, for articles that are not recent, Microsoft Academic citation counts are almost universally equivalent to Scopus citation counts, although there are national biases in the results.
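
      The headline figure is an average of per-field Spearman correlations between the two databases' citation counts for matched articles. A minimal sketch for one hypothetical subfield, with invented counts (scipy's spearmanr is one standard implementation; the matching step itself is described in the article):

        from scipy.stats import spearmanr

        # Invented citation counts for the same matched articles in one
        # Scopus subfield, one value per article from each database.
        scopus_counts = [12, 0, 3, 45, 7, 1, 0, 22, 5, 9]
        microsoft_academic_counts = [15, 1, 2, 50, 9, 1, 0, 30, 4, 11]

        # Spearman's rho compares rank orders, so it tolerates the highly
        # skewed distributions that citation counts typically follow.
        rho, p_value = spearmanr(scopus_counts, microsoft_academic_counts)
        print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")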
    • Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis

      Kousha, Kayvan; Abdoli, Mahshid; Thelwall, Mike (Elsevier, 2018-02)
      Many journals post accepted articles online before they are formally published in an issue. Early citation impact evidence for these articles could be helpful for timely research evaluation and to identify potentially important articles that quickly attract many citations. This article investigates whether Microsoft Academic can help with this task. For over 65,000 Scopus in-press articles from 2016 and 2017 across 26 fields, Microsoft Academic found 2-5 times as many citations as Scopus, depending on year and field. From manual checks of 1,122 Microsoft Academic citations not found in Scopus, Microsoft Academic’s citation indexing was faster but not much wider than Scopus for journals. It achieved this by associating citations to preprints with their subsequent in-press versions and by extracting citations from in-press articles. In some fields its coverage of scholarly digital libraries, such as arXiv.org, was also an advantage. Thus, Microsoft Academic seems to be a more comprehensive automatic source of citation counts for in-press articles than Scopus.
    • Could scientists use Altmetric.com scores to predict longer term citation counts?

      Thelwall, Mike; Nevill, Tamara (Elsevier, 2018-02)
      Altmetrics from Altmetric.com are widely used by publishers and researchers to give earlier evidence of attention than citation counts. This article assesses whether Altmetric.com scores are reliable early indicators of likely future impact and whether they may also reflect non-scholarly impacts. A preliminary factor analysis suggests that the main altmetric indicator of scholarly impact is Mendeley reader counts, with weaker news, informational and social network discussion/promotion dimensions in some fields. Based on a regression analysis of Altmetric.com data from November 2015 and Scopus citation counts from October 2017 for articles in 30 narrow fields, only Mendeley reader counts are consistent predictors of future citation impact. Most other Altmetric.com scores can help predict future impact in some fields. Overall, the results confirm that early Altmetric.com scores can predict later citation counts, although less well than journal impact factors, and the optimal strategy is to consider both Altmetric.com scores and journal impact factors. Altmetric.com scores can also reflect dimensions of non-scholarly impact in some fields.
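
      The paper's exact regression specification is not reproduced in the abstract, but the general approach of predicting later citation counts from early indicator values can be sketched as an ordinary least squares fit on log-transformed counts (all values and variable choices below are invented for illustration):

        import numpy as np
        from sklearn.linear_model import LinearRegression

        # Invented data: early indicator values and citation counts
        # observed roughly two years later for the same articles.
        readers = np.array([5, 0, 12, 3, 40, 8, 1, 25])  # Mendeley readers
        tweets = np.array([2, 1, 30, 0, 5, 4, 0, 60])    # tweet counts
        later_citations = np.array([8, 1, 10, 4, 55, 9, 2, 20])

        # log(1 + x) tames the heavy skew of count data before fitting.
        X = np.column_stack([np.log1p(readers), np.log1p(tweets)])
        y = np.log1p(later_citations)

        model = LinearRegression().fit(X, y)
        print(dict(zip(["readers", "tweets"], model.coef_.round(2))))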
    • Confidence intervals for normalised citation counts: Can they delimit underlying research capability?

      Thelwall, Mike (Elsevier, 2018-02)
      Normalised citation counts are routinely used to assess the average impact of research groups or nations. There is controversy over whether confidence intervals for them are theoretically valid or practically useful. In response, this article introduces the concept of a group’s underlying research capability to produce impactful research. It then investigates whether confidence intervals could delimit the underlying capability of a group in practice. From 123,120 confidence interval comparisons for the average citation impact of the national outputs of ten countries within 36 individual large monodisciplinary journals, moderately fewer than 95% of subsequent indicator values fall within 95% confidence intervals from prior years, with the percentage declining over time. This is consistent with confidence intervals effectively delimiting the research capability of a group, although it does not prove that this is the cause of the results. The results are unaffected by whether internationally collaborative articles are included.
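
      The abstract does not specify how the intervals were built, but a common way to construct a 95% confidence interval for a group's mean citation impact, given the heavy skew of citation data, is to bootstrap the mean. A minimal sketch with invented normalised scores (an assumption, not necessarily the paper's exact interval construction):

        import numpy as np

        rng = np.random.default_rng(42)

        # Invented field-normalised citation scores for one country's
        # articles in a single journal.
        scores = np.array([0.2, 1.5, 0.8, 3.1, 0.0, 0.9, 2.2, 0.4, 1.1, 0.7])

        # Bootstrap the mean: resample with replacement and take the 2.5th
        # and 97.5th percentiles of the resampled means as a 95% interval.
        boot_means = [rng.choice(scores, size=len(scores), replace=True).mean()
                      for _ in range(10_000)]
        low, high = np.percentile(boot_means, [2.5, 97.5])
        print(f"mean = {scores.mean():.2f}, 95% CI = ({low:.2f}, {high:.2f})")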
    • Assessing the teaching value of non-English academic books: The case of Spain

      Mas Bleda, Amalia; Thelwall, Mike (Consejo Superior de Investigaciones Científicas, 2018)
    • Gender bias in machine learning for sentiment analysis

      Thelwall, Mike (Emerald, 2017-12)
      Purpose: This paper investigates whether machine learning induces gender biases in the sense of results that are more accurate for male authors than for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This article uses three ratings-balanced sets of reviews of restaurants and hotels to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender datasets will overrepresent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.
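
      The evaluation design can be sketched independently of the paper's specific algorithms and datasets: train a single classifier, then report its accuracy separately for male- and female-authored test reviews. A toy illustration with invented reviews and a generic TF-IDF plus logistic regression pipeline (not the classifiers used in the paper):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score

        # Invented reviews; 1 = positive, 0 = negative.
        train_texts = ["loved the food", "terrible service", "great stay",
                       "awful room", "fantastic experience", "would not return"]
        train_labels = [1, 0, 1, 0, 1, 0]
        test_texts = ["wonderful meal", "poor food", "nice stay", "awful food"]
        test_labels = [1, 0, 1, 0]
        test_gender = ["f", "f", "m", "m"]  # author gender of each test review

        vec = TfidfVectorizer()
        clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)
        preds = clf.predict(vec.transform(test_texts))

        # Report accuracy separately per author gender to expose any bias.
        for g in ("f", "m"):
            idx = [i for i, gg in enumerate(test_gender) if gg == g]
            acc = accuracy_score([test_labels[i] for i in idx], preds[idx])
            print(g, f"accuracy = {acc:.2f}")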
    • Gender bias in sentiment analysis

      Thelwall, Mike (Emerald, 2017-12)
      Purpose: To test if there are biases in lexical sentiment analysis accuracy between reviews authored by males and females. Design: This paper uses datasets of TripAdvisor reviews of hotels and restaurants in the UK written by UK residents to contrast the accuracy of lexical sentiment analysis for males and females. Findings: Male sentiment is harder to detect because it is less explicit. There was no evidence that this problem could be solved by gender-specific lexical sentiment analysis. Research limitations: Only one lexical sentiment analysis algorithm was used. Practical implications: Care should be taken when drawing conclusions about gender differences from automatic sentiment analysis results. When comparing opinions for product aspects that appeal differently to men and women, female sentiments are likely to be overrepresented, biasing the results. Originality/value: This is the first evidence that lexical sentiment analysis is less able to detect the opinions of one gender than another.
    • Can Social News Websites Pay for Content and Curation? The SteemIt Cryptocurrency Model

      Thelwall, Mike (Sage, 2017-12)
      SteemIt is a Reddit-like social news site that pays members for posting and curating content. It uses micropayments backed by a tradeable currency, exploiting the Bitcoin cryptocurrency generation model to finance content provision in conjunction with advertising. If successful, this paradigm might change the way in which volunteer-based sites operate. This paper investigates 925,092 new members’ first posts for insights into what drives financial success on the site. Initial blog posts received $0.01 on average, although the maximum accrued was $20,680.83. Longer, more sentiment-rich or more positive comments with personal information received the greatest financial reward, in contrast to more informational or topical content. Thus, there is a clear financial value in starting with a friendly introduction rather than immediately attempting to provide useful content, despite the latter being the ultimate site goal. Follow-up posts also tended to be more successful when more personal, suggesting that interpersonal communication rather than quality content provision has driven the site so far. It remains to be seen whether the model of small typical rewards, combined with the possibility that a post might generate substantially more, is enough to incentivise long-term participation or a greater focus on informational posts.
    • A decade of Garfield readers

      Thelwall, Mike (Springer, 2017-11-30)
      This brief note discusses Garfield’s continuing influence from the perspective of the Mendeley readers of his articles. These reflect the direct impact of his work since the launch of Mendeley in August 2008. Over the last decade, his work has continued to be read extensively by younger scientists, especially in the computer and information sciences and the social sciences, and with a broad international spread. His work on citation indexes, impact factors and science history tracking seems to have the most contemporary relevance.
    • National Scientific Performance Evolution Patterns: Retrenchment, Successful Expansion, or Overextension

      Thelwall, Mike; Levitt, Jonathan M. (Wiley-Blackwell, 2017-11)
      National governments would like to preside over an expanding and increasingly high-impact science system, but are these two goals largely independent or closely linked? This article investigates the relationship between changes in the share of the world’s scientific output and changes in relative citation impact for 2.6 million articles from 26 fields in the 25 countries with the most Scopus-indexed journal articles from 1996 to 2015. There is a negative correlation between expansion and relative citation impact, but their relationship varies. China, Spain, Australia, and Poland were successful overall across the 26 fields, expanding both their share of the world’s output and their relative citation impact, whereas Japan, France, Sweden and Israel had decreasing shares and relative citation impact. In contrast, the USA, UK, Germany, Italy, Russia, Netherlands, Switzerland, Finland, and Denmark all enjoyed increased relative citation impact despite a declining share of publications. Finally, India, South Korea, Brazil, Taiwan, and Turkey all experienced sustained expansion but a recent fall in relative citation impact. These results may partly reflect changes in the coverage of Scopus and the selection of fields.
    • The research production of nations and departments: A statistical model for the share of publications

      Thelwall, Mike (Elsevier, 2017-11)
      Policy makers and managers sometimes assess the share of research produced by a group (country, department, or institution), in the form of the percentage of publications in a journal, field or broad area that has been published by the group. This quantity is affected by essentially random influences that obscure underlying changes over time and differences between groups. A model of research production is needed to help identify whether differences between two shares reflect underlying differences. This article introduces a simple production model for indicators that report the share of the world’s output in a journal or subject category, assuming that every new article has the same probability of being authored by a given group. Under this assumption, confidence limits can be calculated for the underlying production capability (i.e., the probability to publish). The results of a time series analysis of national contributions to 36 large monodisciplinary journals 1996-2016 are broadly consistent with this hypothesis. Follow-up tests of countries and institutions in 26 Scopus subject categories support the conclusions but highlight the importance of ensuring consistent subject category coverage.
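
      Under the model's assumption that every new article has the same probability of coming from the group, a group's share of a journal's output is a binomial proportion, so standard binomial intervals bound the underlying production capability. A minimal sketch with invented counts, using statsmodels' Wilson interval (the paper's exact interval construction may differ):

        from statsmodels.stats.proportion import proportion_confint

        # Invented counts: a country authored 240 of the 3,000 articles a
        # journal published in one year.
        group_articles, total_articles = 240, 3000

        # Wilson score interval for the binomial proportion, interpreted
        # here as bounds on the group's underlying probability to publish.
        low, high = proportion_confint(group_articles, total_articles,
                                       alpha=0.05, method="wilson")
        print(f"share = {group_articles / total_articles:.3f}, "
              f"95% CI = ({low:.3f}, {high:.3f})")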
    • Are Mendeley reader counts high enough for research evaluations when articles are published?

      Thelwall, Mike (Emerald, 2017-10-27)
      Purpose – Mendeley reader counts have been proposed as early indicators for the impact of academic publications. In response, this article assesses whether there are enough Mendeley readers for research evaluation purposes during the month when an article is first published. Design/methodology/approach – Average Mendeley reader counts were compared to average Scopus citation counts for 104,520 articles from ten disciplines during the second half of 2016. Findings – Articles attracted, on average, between 0.1 and 0.8 Mendeley readers per article in the month in which they first appeared in Scopus. This is about ten times more than the average Scopus citation count. Research limitations/implications – Other subjects may use Mendeley more or less than the ten investigated here. The results are dependent on Scopus’s indexing practices, and Mendeley reader counts can be manipulated and have national and seniority biases. Practical implications – Mendeley reader counts during the month of publication are more powerful than Scopus citations for comparing the average impacts of groups of documents but are not high enough to differentiate between the impacts of typical individual articles. Originality/value – This is the first multi-disciplinary and systematic analysis of Mendeley reader counts from the publication month of an article.