    • A comparison of sources of links for academic Web impact factor calculations

      Thelwall, Mike (MCB UP Ltd, 2002)
      There has been much recent interest in extracting information from collections of Web links. One tool that has been used is Ingwersen’s Web impact factor. It has been demonstrated that several versions of this metric can produce results that correlate with research ratings of British universities, showing that, despite being a measure of a purely Internet phenomenon, the results are susceptible to a wider interpretation. This paper addresses the question of which is the best possible domain to count backlinks from, if research is the focus of interest. WIFs for British universities calculated from several different source domains are compared, primarily the .edu, .ac.uk and .uk domains, and the entire Web. The results show that all four areas produce WIFs that correlate strongly with research ratings, but that none produce incontestably superior figures. It was also found that the WIF was less able to differentiate in more homogeneous subsets of universities, although positive results are still possible.
    • A comparison of title words for journal articles and Wikipedia pages: Coverage and stylistic differences?

      Thelwall, Mike; Sud, Pardeep (La Fundación Española para la Ciencia y la Tecnología (FECYT), 2018-02-12)
      This article assesses whether there are gaps in Wikipedia’s coverage of academic information and whether there are non-obvious stylistic differences from academic journal articles that Wikipedia users and editors should be aware of. For this, it analyses terms in the titles of journal articles that are absent from all English Wikipedia page titles for each of 27 Scopus subject categories. The results show that English Wikipedia has lower coverage of issues of interest to non-English nations and there are gaps probably caused by a lack of willing subject specialist editors in some areas. There were also stylistic disciplinary differences in the results, with some fields using synonyms of “analysing” that were ignored in Wikipedia, and others using the present tense in titles to emphasise research outcomes. Since Wikipedia is broadly effective at covering academic research topics from all disciplines, it might be relied upon by non-specialists. Specialists should therefore check for coverage gaps within their areas for useful topics and librarians should caution users that important topics may be missing.
    • A decade of Garfield readers

      Thelwall, Mike (Springer, 2017-11-30)
      This brief note discusses Garfield’s continuing influence from the perspective of the Mendeley readers of his articles. This reflects the direct impact of his work since the launch of Mendeley in August 2008. In the last decade, his work is still extensively read by younger scientists, especially in computer and information sciences and the social sciences, and with a broad international spread. His work on citation indexes, impact factors and science history tracking seems to have the most contemporary relevance.
    • A Free Database of University Web Links: Data Collection Issues

      Thelwall, Mike (2003)
      This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, as created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links by information and computer scientists, this is an attempt to make available raw data for research that is not reliant upon the opaque techniques of commercial search engines. Basic tools for querying are also provided. The key issues concerning running an accurate web crawler are also discussed. Access is also given to the normally hidden crawler stop list with the aim of making the crawl process more transparent. The necessity of having such a list is discussed, with the conclusion that fully automatic crawling is not socially or empirically desirable because of the existence of database-generated areas of the web and the proliferation of the phenomenon of mirroring.
    • A layered approach for investigating the topological structure of communities in the Web

      Thelwall, Mike (MCB UP Ltd, 2003)
      A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and of applying the technique directly to partially automate Web metrics tasks such as finding all pages related to a given subject hosted by a single country's universities.
    • A research and institutional size-based model for national university Web site interlinking

      Thelwall, Mike (MCB UP Ltd, 2002)
      Web links are a phenomenon of interest to bibliometricians by analogy with citations, and to others because of their use in Web navigation and search engines. It is known that very few links on university Web sites are targeted at scholarly expositions and yet, at least in the UK and Australia, a correlation has been established between link count metrics for universities and measures of institutional research. This paper operates on a finer-grained level of detail, focussing on counts of links between pairs of universities. It provides evidence of an underlying linear relationship with the quadruple product of the size and research quality of both source and target institution. This simple model is proposed as applying generally to national university systems, subject to a series of constraints to identify cases where it is unlikely to be applicable. It is hoped that the model, if confirmed by studies of other countries, will open the door to deeper mining of academic Web link data.
    • Academic information on Twitter: A user survey

      Mohammadi, Ehsan; Thelwall, Mike; Kwasny, Mary; Holmes, Kristi L. (PLOS, 2018-05-17)
      Although counts of tweets citing academic papers are used as an informal indicator of interest, little is known about who tweets academic papers and who uses Twitter to find scholarly information. Without knowing this, it is difficult to draw useful conclusions from a publication being frequently tweeted. This study surveyed 1,912 users that have tweeted journal articles to ask about their scholarly-related Twitter uses. Almost half of the respondents (45%) did not work in academia, despite the sample probably being biased towards academics. Twitter was used most by people with a social science or humanities background. People tend to leverage social ties on Twitter to find information rather than searching for relevant tweets. Twitter is used in academia to acquire and share real-time information and to develop connections with others. Motivations for using Twitter vary by discipline, occupation, and employment sector, but not much by gender. These factors also influence the sharing of different types of academic information. This study provides evidence that Twitter plays a significant role in the discovery of scholarly information and cross-disciplinary knowledge spreading. Most importantly, the large numbers of non-academic users support the claims of those using tweet counts as evidence for the non-academic impacts of scholarly research.
    • The Accuracy of Confidence Intervals for Field Normalised Indicators

      Thelwall, Mike; Fairclough, Ruth (Elsevier, 2017-04-07)
      When comparing the average citation impact of research groups, universities and countries, field normalisation reduces the influence of discipline and time. Confidence intervals for these indicators can help with attempts to infer whether differences between sets of publications are due to chance factors. Although both bootstrapping and formulae have been proposed for these, their accuracy is unknown. In response, this article uses simulated data to systematically compare the accuracy of confidence limits in the simplest possible case, a single field and year. The results suggest that the MNLCS (Mean Normalised Log-transformed Citation Score) confidence interval formula is conservative for large groups but almost always safe, whereas bootstrap MNLCS confidence intervals tend to be accurate but can be unsafe for smaller world or group sample sizes. In contrast, bootstrap MNCS (Mean Normalised Citation Score) confidence intervals can be very unsafe, although their accuracy increases with sample sizes.
    • An evaluation of syntactic simplification rules for people with autism

      Evans, Richard; Orasan, Constantin; Dornescu, Iustin (Association for Computational Linguistics, 2014)
      Syntactically complex sentences constitute an obstacle for some people with Autistic Spectrum Disorders. This paper evaluates a set of simplification rules specifically designed for tackling complex and compound sentences. In total, 127 different rules were developed for the rewriting of complex sentences and 56 for the rewriting of compound sentences. The evaluation assessed the accuracy of these rules individually and revealed that fully automatic conversion of these sentences into a more accessible form is not very reliable.
    • An initial exploration of the link relationship between UK university Web sites

      Thelwall, Mike (MCB UP Ltd, 2002)
      Aggregates of links are of interest to information scientists in the same way as citation counts are: as potential sources of data from which new knowledge can be mined. This paper builds on the recent discovery of a correlation between a Web link count measure and the research quality of British universities by applying a range of multivariate statistical techniques to counts of links between pairs of universities. This represents an initial attempt at developing an understanding of this phenomenon. The techniques extract plausible results and also identify outliers in the data, some of which were verified by being traced to identifiable Web phenomena. This is an important outcome because successful anomaly identification is a precondition for more effective analysis of this kind of data. The identification of groupings is encouraging evidence that Web links between universities can be mined for significant results, although it is clear that more methodological development is needed if any but the simplest patterns are to be extracted. Finally, based upon the types of patterns extracted, the paper argues that none of the methods used is capable of fully analysing link structures on its own.
    • An investigation of the online presence of UK universities on Instagram

      Stuart, Emma; Stuart, David; Thelwall, Mike (Emerald, 2017-08)
      Purpose – Rising tuition fees and the growing importance of league tables have meant that university branding is becoming more of a necessity to attract prospective staff, students, and funding. Whilst university websites are an important branding tool, academic institutions are also beginning to exploit social media. Image-based social media services such as Instagram are particularly popular at the moment, so it is logical for universities to have a presence on them. This paper provides an initial investigation of the online presence of UK universities on Instagram. Design/Methodology/Approach – This study uses webometric data collection and content analysis. Findings – The results indicate that at the time of data analysis for this investigation (Spring, 2015), UK universities had a limited presence on Instagram for general university accounts, with only 51 out of 128 institutions having an account. The most common types of images posted were humanizing (31.0%), showcasing (28.8%), and orienting (14.3%). Orienting images were more likely to receive likes than other image types, and crowdsourcing images were more likely to receive comments. Originality/Value – This paper gives a valuable insight into the image posting practices of UK universities on Instagram. The findings are of value to heads of marketing, online content creators, social media campaign managers, and anyone who is responsible for the marketing, branding, and promoting of a university’s services.
    • Are citations from clinical trials evidence of higher impact research? An analysis of ClinicalTrials.gov

      Thelwall, Mike; Kousha, Kayvan (Springer, 2016-09)
      An important way in which medical research can translate into improved health outcomes is by motivating or influencing clinical trials that eventually lead to changes in clinical practice. Citations from clinical trials records to academic research may therefore serve as an early warning of the likely future influence of the cited articles. This paper partially assesses this hypothesis by testing whether prior articles referenced in ClinicalTrials.gov records are more highly cited than average for the publishing journal. The results from four high profile general medical journals support the hypothesis, although there may not be a cause-and-effect relationship. Nevertheless, it is reasonable for researchers to use citations to their work from clinical trials records as partial evidence of the possible long-term impact of their research.
    • Are Mendeley reader counts high enough for research evaluations when articles are published?

      Thelwall, Mike (Emerald, 2017-10-27)
      Purpose – Mendeley reader counts have been proposed as early indicators for the impact of academic publications. In response, this article assesses whether there are enough Mendeley readers for research evaluation purposes during the month when an article is first published. Design/methodology/approach – Average Mendeley reader counts were compared to average Scopus citation counts for 104,520 articles from ten disciplines during the second half of 2016. Findings – Articles attracted, on average, between 0.1 and 0.8 Mendeley readers per article in the month in which they first appeared in Scopus. This is about ten times more than the average Scopus citation count. Research limitations/implications – Other subjects may use Mendeley more or less than the ten investigated here. The results are dependent on Scopus’s indexing practices, and Mendeley reader counts can be manipulated and have national and seniority biases. Practical implications – Mendeley reader counts during the month of publication are more powerful than Scopus citations for comparing the average impacts of groups of documents but are not high enough to differentiate between the impacts of typical individual articles. Originality/value – This is the first multi-disciplinary and systematic analysis of Mendeley reader counts from the publication month of an article.
    • Are Mendeley Reader Counts Useful Impact Indicators in all Fields?

      Thelwall, Mike (Springer, 2018-08)
      Reader counts from the social reference sharing site Mendeley are known to be valuable for early research evaluation. They have strong correlations with citation counts for journal articles but appear about a year before them. There are disciplinary differences in the value of Mendeley reader counts but systematic evidence is needed at the level of narrow fields to reveal their extent. In response, this article compares Mendeley reader counts with Scopus citation counts for journal articles from 2012 in 325 narrow Scopus fields. Despite strong positive correlations in most fields, averaging 0.671, the correlations in some fields are as weak as 0.255. Technical reasons explain most weaker correlations, suggesting that the underlying relationship is almost always strong. The exceptions are caused by unusually high educational or professional use or topics of interest within countries that avoid Mendeley. The findings suggest that if care is taken then Mendeley reader counts can be used for early citation impact evidence in almost all fields and for related impact in some of the remainder. As an additional application of the results, cross-checking with Mendeley data can be used to identify indexing anomalies in citation databases.
    • Are raw RSS feeds suitable for broad issue scanning? A science concern case study

      Thelwall, Mike; Prabowo, Rudy; Fairclough, Ruth (Wiley InterScience, 2006)
      Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). An attempt to identify genuine science concern debates from the corpus through investigating the top 1,000 burst words found only two genuine debates, however. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
    • Assessing the teaching value of non-English academic books: The case of Spain

      Mas Bleda, Amalia; Thelwall, Mike (Consejo Superior de Investigaciones Científicas, 2018-12-01)
    • Automated Web issue analysis: A nurse prescribing case study

      Thelwall, Mike; Thelwall, Saheeda; Fairclough, Ruth (Elsevier, 2006)
      Web issue analysis, a new automated technique designed to rapidly give timely management intelligence about a topic from an automated large-scale analysis of relevant pages from the Web, is introduced and demonstrated. The technique includes hyperlink and URL analysis to identify common direct and indirect sources of Web information. In addition, text analysis through natural language processing techniques is used to identify relevant common nouns and noun phrases. A case study approach is taken, applying Web issue analysis to the topic of nurse prescribing. The results are presented in descriptive form and a qualitative analysis is used to argue that new information has been found. The nurse prescribing results demonstrate interesting new findings, such as the parochial nature of the topic in the UK, an apparent absence of similar concepts internationally, at least in the English-speaking world, and a significant concern with mental health issues. These demonstrate that automated Web issue analysis is capable of quickly delivering new insights into a problem. General limitations are that the success of Web issue analysis is dependent upon the particular topic chosen and the ability to find a phrase that accurately captures the topic and is not used in other contexts, as well as being language-specific.
    • Avoiding obscure topics and generalising findings produces higher impact research

      Thelwall, Mike (Springer, 2016-10-11)
      Much academic research is never cited and may be rarely read, indicating wasted effort from the authors, referees and publishers. One reason that an article could be ignored is that its topic is, or appears to be, too obscure to be of wide interest, even if excellent scholarship produced it. This paper reports a word frequency analysis of 874,411 English article titles from 18 different Scopus natural, formal, life and health sciences categories 2009-2015 to assess the likelihood that research on obscure (rarely researched) topics is less cited. In all categories examined, unusual words in article titles associate with below average citation impact research. Thus, researchers considering obscure topics may wish to reconsider, generalise their study, or choose a title that reflects the wider lessons that can be drawn. Authors should also consider including multiple concepts and purposes within their titles in order to attract a wider audience.
    • Blog Searching: The First General-Purpose Source of Retrospective Public Opinion in the Social Sciences?

      Thelwall, Mike (Emerald, 2007)
      Purpose – To demonstrate how blog searching can be used as a retrospective source of public opinion. Design/methodology/approach – In this paper a variety of blog searching techniques are described and illustrated with a case study of the Danish cartoons affair. Findings – A time series analysis of related blog postings suggests that the Danish cartoons issue attracted little attention in the English-speaking world for four months after the initial publication of the cartoons, exploding only after the simultaneous start of diplomatic sanctions and a commercial boycott. Research limitations/implications – Blogs only reveal the opinions of bloggers, and blog analysis is language-specific. Sections of the world and the population of individual countries that do not have access to the internet will not be adequately represented in blogspace. Moreover, bloggers are self-selected and probably not representative of internet users. Originality/value – The existence of blog search engines now allows researchers to search blogspace for posts relating to any given debate, seeking either the opinions of blogging pundits or casual mentions in personal journals. It is possible to use blogs to examine topics before they first attract mass media attention, as well as to dissect ongoing discussions. This gives a retrospective source of public opinion that is unique to blog search engines.
    • Book genre and author gender: romance>paranormal-romance to autobiography>memoir

      Thelwall, Mike (Wiley-Blackwell, 2016-05-20)
      Although gender differences are known to exist in the publishing industry and in reader preferences, there is little public systematic evidence about them. This article uses evidence from the book-based social website Goodreads to provide a large scale analysis of 50 major English book genres based on author genders. The results show gender differences in authorship in almost all categories and gender differences in the level of interest in, and ratings of, books in a minority of categories. Perhaps surprisingly in this context, there is not a clear gender-based relationship between the success of an author and their prevalence within a genre. The unexpected, almost universal, gender differences in authorship should give new impetus to investigations of the importance of gender in fiction, and the success of minority genders in some genres should encourage publishers and librarians to take their work seriously, except perhaps for most male-authored chick-lit.
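The first entry above compares Web impact factors (WIFs) computed from backlinks in different source domains (.edu, .ac.uk, .uk and the whole Web). The following is a minimal Python sketch of that kind of calculation, under stated assumptions rather than as the paper's actual method. It assumes a hypothetical input of (source page URL, target university domain) pairs and a caller-supplied size figure per university (for example a page count or a staff count). It counts each linking page once per target, ignores internal site links, and optionally keeps only sources whose host ends with a given domain suffix. The function name `web_impact_factors` and the sample data are illustrative only.

```python
from urllib.parse import urlparse


def web_impact_factors(links, sizes, source_suffix=None):
    """Hypothetical WIF sketch: linking pages from a source domain / size.

    links: iterable of (source_page_url, target_domain) pairs (assumed format).
    sizes: target_domain -> denominator (e.g. page count or staff count).
    source_suffix: e.g. ".ac.uk", ".edu" or ".uk"; None means the whole Web.
    """
    linking_pages = {}
    # set() makes each source page count at most once per target domain
    for source_url, target in set(links):
        host = (urlparse(source_url).hostname or "").lower()
        if host == target or host.endswith("." + target):
            continue  # internal site link, not a backlink
        if source_suffix is not None and not host.endswith(source_suffix):
            continue  # source page lies outside the chosen source domain
        linking_pages[target] = linking_pages.get(target, 0) + 1
    # Divide by the size figure; universities with no backlinks score 0.0
    return {t: linking_pages.get(t, 0) / s for t, s in sizes.items() if s > 0}


# Illustrative toy data only (not from any study)
links = [
    ("http://www.cs.ox.ac.uk/a.html", "wlv.ac.uk"),
    ("http://www.cs.ox.ac.uk/a.html", "wlv.ac.uk"),  # same page, counted once
    ("http://www.mit.edu/x", "wlv.ac.uk"),
    ("http://www.wlv.ac.uk/internal", "wlv.ac.uk"),  # internal, ignored
    ("http://example.com/p", "ox.ac.uk"),
]
sizes = {"wlv.ac.uk": 2, "ox.ac.uk": 4}

print(web_impact_factors(links, sizes, ".ac.uk"))  # {'wlv.ac.uk': 0.5, 'ox.ac.uk': 0.0}
print(web_impact_factors(links, sizes))            # {'wlv.ac.uk': 1.0, 'ox.ac.uk': 0.25}
```

Changing only `source_suffix` while holding the denominator fixed is what allows WIFs from different source areas to be compared. Each resulting set of scores could then be correlated with research ratings, as the first entry describes.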