• A layered approach for investigating the topological structure of communities in the Web.

      Thelwall, Mike (MCB UP Ltd, 2003)
      A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
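The layering described in the abstract above (pages aggregated into directories, domains and whole sites, with internal site links removed) can be illustrated with a minimal sketch. The function names `adm_unit` and `aggregate_links`, and the rule that a "site" is the last three labels of the hostname (e.g. `wlv.ac.uk`), are assumptions made for illustration only; they are not taken from the paper.

```python
from urllib.parse import urlsplit


def adm_unit(url: str, level: str, site_labels: int = 3) -> str:
    """Map a URL to its unit of analysis at the given aggregation level.

    Assumption: a "site" is identified by the last `site_labels` labels of
    the hostname, which suits names such as wlv.ac.uk but not every country.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if level == "page":
        return host + parts.path
    if level == "directory":
        # Keep everything up to and including the final slash of the path.
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if level == "domain":
        return host
    if level == "site":
        return ".".join(host.split(".")[-site_labels:])
    raise ValueError(f"unknown level: {level}")


def aggregate_links(links, level):
    """Return the distinct inter-site links at the chosen aggregation level."""
    edges = set()
    for source_url, target_url in links:
        # Drop links whose two ends belong to the same site.
        if adm_unit(source_url, "site") == adm_unit(target_url, "site"):
            continue
        edges.add((adm_unit(source_url, level), adm_unit(target_url, level)))
    return edges


example_links = [
    ("http://www.scit.wlv.ac.uk/a/x.html", "http://www.ox.ac.uk/b/y.html"),
    ("http://www.scit.wlv.ac.uk/a/z.html", "http://www.ox.ac.uk/b/w.html"),
    ("http://www.wlv.ac.uk/i.html", "http://www.scit.wlv.ac.uk/j.html"),
]
page_edges = aggregate_links(example_links, "page")
directory_edges = aggregate_links(example_links, "directory")
```

In this toy input the two page-level links between the same pair of directories collapse into one directory-level link, which is the kind of reduction in repeated links that aggregation is meant to provide.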
    • A Longitudinal Study of Academic Web Links: Identifying and Explaining Change

      Payne, Nigel (University of Wolverhampton, 2007)
      A problem common to all current web link analyses is that, as the web is continuously evolving, any web-based study may be out of date by the time it is published in academic literature. It is therefore important to know how web link analysis results vary over time, with a low rate of variation lengthening the amount of time corresponding to a tolerable loss in quality. Moreover, given the lack of research on how academic web spaces change over time, from an information science perspective it would be interesting to see what patterns and trends could be identified by longitudinal research, and the study of university web links seems to provide a convenient means by which to do so. The aim of this research is to identify and track changes in three academic webs (UK, Australia and New Zealand) over time, tracking various aspects of academic webs including site size and overall linking characteristics, and to provide theoretical explanations of the changes found. This should therefore provide some insight into the stability of previous and future webometric analyses. Alternative Document Models (ADMs), created with the purpose of reducing the extent to which anomalies occur in counts of web links at the page level, have been used extensively within webometrics as an alternative to using the web page as the basic unit of analysis. This research carries out a longitudinal study of ADMs in an attempt to ascertain which model gives the most consistent results when applied to the UK, Australia and New Zealand academic web spaces over the last six years. The results show that the domain ADM gives the most consistent results with the directory ADM also giving more reliable results than are evident when using the standard page model.
Aggregating at the site (or university) level appears to provide less consistent results than using the page as the standard unit of measure, and this finding holds true over all three academic webs and for each time period examined over the last six years. The question of whether university web sites publish the same kind of information and use the same kind of hyperlinks year on year is important from the perspective of interpreting the results of academic link analyses, because changes in link types over time would also force interpretations of link analyses to change over time. This research uses a link classification exercise to identify temporal changes in the distribution of different types of academic web links, using three academic web spaces in the years 2000 and 2006. Significant increases in ‘research oriented’, ‘social/leisure’ and ‘superficial’ links were identified as well as notable decreases in the ‘technical’ and ‘personal’ links. Some of the changes identified may be explained by general changes in the management of university web sites and some by more widespread Internet trends, e.g., dynamic pages, blogs and social networking. The increase in the proportion of research-oriented links is particularly hopeful for future link analysis research. Identifying quantitative trends in the UK, Australian and New Zealand academic webs from 2000 to 2005 revealed that the number of static pages and links in each of the three academic webs appears to have stabilised as far back as 2001. This stabilisation may be partly due to an increase in dynamic pages which are normally excluded from webometric analyses. In response to the problem for webometricians due to the constantly changing nature of the Internet, the results presented here are encouraging evidence that webometrics for academic spaces may have a longer-term validity than would have been previously assumed.
The relationship between university inlinks and research activity indicators over time was examined, as well as the reasons for individual universities experiencing significant increases and decreases in inlinks over the last six years. The findings indicate that between 66% and 70% of outlinks remain the same year on year for all three academic web spaces, although this stability conceals large individual differences. Moreover, there is evidence of a level of stability over time for university site inlinks when measured against research. Surprisingly, however, inlink counts can vary significantly from year to year for individual universities, for reasons unrelated to research, underlining that webometric results should be interpreted cautiously at the level of individual universities. Therefore, on average since 2001 the university web sites of the UK, Australia and New Zealand have been relatively stable in terms of size and linking patterns, although this hides a constant renewing of old pages and areas of the sites. In addition, the proportion of research-related links seems to be slightly increasing. Whilst the former suggests that webometric results are likely to have a surprisingly long shelf-life, perhaps closer to five years than one year, the latter suggests that webometrics is going to be increasingly useful as a tool to track research online. While there have already been many studies involving academic web spaces, and much work has been carried out on the web from a longitudinal perspective, this thesis concentrates on filling a critical gap in current webometric research by combining the two and undertaking a longitudinal study of academic webs. In comparison with previous web-related longitudinal studies this thesis makes a number of novel contributions.
Some of these stem from extending established webometric results, either by introducing a longitudinal aspect (looking at how various academic web metrics such as research activity indicators, site size or inlinks change over time) or by their application to other countries. Other contributions are made by combining traditional webometric methods (e.g. combining topical link classification exercises with longitudinal study) or by identifying and examining new areas for research (for example, dynamic pages and non-HTML documents). No previous web-based longitudinal studies have focused on academic links and so the main findings that (for UK, Australian and New Zealand academic webs between 2000 and 2006) certain academic link types exhibit changing patterns over time, approximately two-thirds of outlinks remain the same year on year and the number of static pages and links appears to have stabilised are both significant and novel.
    • Dimensions of web site credibility and their relation to active trust and behavioural impact

      Cugelman, Brian; Thelwall, Mike; Dawes, Philip L. (Association for Information Systems (AIS), 2009)
      This paper discusses two trends that threaten to undermine the effectiveness of online social marketing interventions: growing mistrust and competition. As a solution, this paper examines the relationships between Web site credibility, target audiences’ active trust and behaviour. Using structural equation modelling to evaluate two credibility models, this study concludes that Web site credibility is best considered a three-dimensional construct composed of expertise, trustworthiness and visual appeal, and that trust plays a partial mediating role between Web site credibility and behavioural impacts. The paper examines theoretical implications of conceptualizing Web sites according to a human credibility model, and factoring trust into Internet-based behavioural change interventions. Practical guidelines suggest ways to address these findings when planning online social marketing interventions.
    • Disciplinary Differences in Academic Web Presence – A Statistical Study of the UK

      Thelwall, Mike; Price, Liz (Walter de Gruyter, 2003)
      The Web has become an important tool for scholars to publicise their activities and disseminate their findings. In the information age, those who do not use it risk being bypassed. In this paper we introduce a statistical technique to assess the extent to which the broad spectrum of research areas is visible online in UK universities. Five broad subject categories are used for research, and inlink counts are used as indicators of online visibility or impact. The approach is designed to give more complete subject coverage than previous studies and to avoid the conceptual difficulties of a page classification approach, although one is used for triangulation. The results suggest that Science and Engineering dominate university Web presences, but with Humanities and Arts also achieving a high presence relative to its size, showing that high Web impact does not have to be restricted to the sciences. Research funding bodies should now consider whether action needs to be taken to ensure that opportunities are not being missed in the lower Web impact areas.
    • Do the Web sites of higher rated scholars have significantly more online impact?

      Thelwall, Mike; Harries, Gareth (Wiley, 2004)
      The quality and impact of academic Web sites are of interest to many audiences, including the scholars who use them and Web educators who need to identify best practice. Several large-scale European Union research projects have been funded to build new indicators for online scientific activity, reflecting recognition of the importance of the Web for scholarly communication. In this paper we address the key question of whether higher rated scholars produce higher impact Web sites, using the United Kingdom as a case study and measuring scholars' quality in terms of university-wide average research ratings. Methodological issues concerning the measurement of the online impact are discussed, leading to the adoption of counts of links to a university's constituent single domain Web sites from an aggregated counting metric. The findings suggest that universities with higher rated scholars produce significantly more Web content but with a similar average online impact. Higher rated scholars therefore attract more total links from their peers, but only by being more prolific, refuting earlier suggestions. It can be surmised that general Web publications are very different from scholarly journal articles and conference papers, for which scholarly quality does associate with citation impact. This has important implications for the construction of new Web indicators, for example that online impact should not be used to assess the quality of small groups of scholars, even within a single discipline.
    • Evidence for the existence of geographic trends in university web site interlinking

      Thelwall, Mike (MCB UP Ltd, 2002)
      The Web is an important medium for scholarly communication of various types, perhaps eventually to replace entirely some traditional mechanisms such as print journals. Yet hyperlinks, the Web analogue of citations, are much more varied in use, and existing citation techniques are difficult to generalise to the new medium. In this context, one new challenging object of study is the modern multi-faceted, multi-genre, partly unregulated university Web site. This paper develops a methodology to analyse the patterns of interlinking between university Web sites and uses it to indicate that the degree of interlinking decreases with distance, at least in the UK. This is perhaps not in itself a surprising result, despite claims of a paradigm shift from the traditional virtual college towards collaboratories, but the methodology developed can also be used to refine existing Web link metrics to produce more powerful tools for comparing groups of sites.
    • Finding similar academic Web sites with links, bibliometric couplings and colinks

      Thelwall, Mike; Wilkinson, David (Elsevier, 2004)
      A common task in both Webometrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage in terms of a much larger number of pairs of sites being connected by these measures, instead of increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
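The three measures compared in the abstract above can be sketched for a single pair of sites. This is an illustration only, assuming the usual definitions: a direct link joins the two sites, a colink is a third site that links to both, and a coupling is a third site that both link to. The function name `similarity_signals` and the dictionary-of-sets input format are hypothetical.

```python
def similarity_signals(outlinks, a, b):
    """Count direct links, colinks and couplings for the site pair (a, b).

    `outlinks` maps each site to the set of sites it links to.
    """
    out_a = outlinks.get(a, set())
    out_b = outlinks.get(b, set())
    # Direct links in either direction (0, 1 or 2).
    links = int(b in out_a) + int(a in out_b)
    # Couplings: third-party sites that both a and b link to.
    couplings = len((out_a & out_b) - {a, b})
    # Colinks: third-party sites that link to both a and b.
    colinks = sum(
        1
        for site, targets in outlinks.items()
        if site not in (a, b) and a in targets and b in targets
    )
    return {"links": links, "colinks": colinks, "couplings": couplings}


toy_web = {
    "a": {"b", "x", "y"},
    "b": {"x", "y", "z"},
    "c": {"a", "b"},
    "d": {"a"},
}
signals = similarity_signals(toy_web, "a", "b")
```

A pair with no direct link can still have non-zero colink or coupling counts, which is consistent with the abstract's point that these two measures mainly add coverage (recall).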
    • Graph structure in three national academic Webs: Power laws with anomalies

      Thelwall, Mike; Wilkinson, David (Wiley, 2003)
      The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies.
    • Hyperlinks as a data source for science mapping

      Harries, Gareth; Wilkinson, David; Price, Liz; Fairclough, Ruth; Thelwall, Mike (Sage, 2004)
      Hyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to be different in character to links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.
    • Language evolution and the spread of ideas on the Web: A procedure for identifying emergent hybrid word family members

      Thelwall, Mike; Price, Liz (Wiley, 2006)
      Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
    • Linguistic patterns of academic Web use in Western Europe

      Thelwall, Mike; Tang, Rong; Price, Liz (Springer, 2003)
      A survey of linguistic dimensions of Web site hosting and interlinking of the universities of sixteen European countries is described. The results show that English is the dominant language both for linking pages and for all pages. In a typical country approximately half the pages were in English and half in one or more national languages. Normalised interlinking patterns showed three trends: 1) international interlinking throughout Europe in English, and additionally in Swedish in Scandinavia; 2) linking between countries sharing a common language; and 3) countries extensively hosting international links in their own major languages. This provides evidence for the multilingual character of academic use of the Web in Western Europe, at least outside the UK and Eire. Evidence was found that Greece was significantly linguistically isolated from the rest of the EU but that outsiders Norway and Switzerland were not.
    • The pros and cons of the use of altmetrics in research assessment

      Thelwall, Michael (Levy Library Press, 2020-05-12)
      Many indicators derived from the web have been proposed to supplement citation-based indicators in support of research assessments. These indicators, often called altmetrics, are available commercially from Altmetric.com and Elsevier’s Plum Analytics or can be collected directly. These organisations can also deliver altmetrics to support institutional self-evaluations. The potential advantages of altmetrics for research evaluation are that they may reflect important non-academic impacts and may appear before citations when an article is published, thus providing earlier impact evidence. Their disadvantages often include susceptibility to gaming, data sparsity, and difficulties translating the evidence into specific types of impact. Despite these limitations, altmetrics have been widely adopted by publishers, apparently to give authors, editors and readers insights into the level of interest in recently published articles. This article summarises evidence for and against extending the adoption of altmetrics to research evaluations. It argues that whilst systematically-gathered altmetrics are inappropriate for important formal research evaluations, they can play a role in some other contexts. They can be informative when evaluating research units that rarely produce journal articles, when seeking to identify evidence of novel types of impact during institutional or other self-evaluations, and when selected by individuals or groups to support narrative-based non-academic claims. In addition, Mendeley reader counts are uniquely valuable as early (mainly) scholarly impact indicators to replace citations when gaming is not possible and early impact evidence is needed. Organisations using alternative indicators need to recruit or develop in-house expertise to ensure that they are not misused, however.
    • Three practical field normalised alternative indicator formulae for research evaluation

      Thelwall, Mike (Elsevier, 2017-01-04)
      Although altmetrics and other web-based alternative indicators are now commonplace in publishers’ websites, they can be difficult for research evaluators to use because of the time or expense of the data, the need to benchmark in order to assess their values, the high proportion of zeros in some alternative indicators, and the time taken to calculate multiple complex indicators. These problems are addressed here by (a) a field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) that allows simple confidence limits to be calculated and is similar to a proposal of Lundberg, (b) field normalisation formulae for the proportion of cited articles in a set, the Equalised Mean-based Normalised Proportion Cited (EMNPC) and the Mean-based Normalised Proportion Cited (MNPC), to deal with mostly uncited data sets, (c) a sampling strategy to minimise data collection costs, and (d) free unified software to gather the raw data, implement the sampling strategy, and calculate the indicator formulae and confidence limits. The approach is demonstrated (but not fully tested) by comparing the Scopus citations, Mendeley readers and Wikipedia mentions of research funded by Wellcome, NIH, and MRC in three large fields for 2013–2016. Within the results, statistically significant differences in both citation counts and Mendeley reader counts were found even for sets of articles that were less than six months old. Mendeley reader counts were more precise than Scopus citations for the most recent articles and all three funders could be demonstrated to have an impact in Wikipedia that was significantly above the world average.
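The abstract above names the MNLCS but does not state its formula. The sketch below assumes a common reading of the name: each count c is transformed to ln(1 + c), divided by the world mean of ln(1 + c) for the same field and year, and the results are averaged over the group. Treat that definition, the function name `mnlcs` and the input layout as assumptions rather than the paper's own specification; the confidence limits and the proportion-cited indicators are not covered here.

```python
import math
from statistics import mean


def mnlcs(group, world):
    """Mean Normalised Log-transformed Citation Score (assumed definition).

    `group` is a list of (field_year, count) pairs for the articles assessed.
    `world` maps each field_year key to the counts of all articles in it.
    A field_year whose world counts are all zero would divide by zero.
    """
    world_mean = {
        key: mean(math.log(1 + c) for c in counts)
        for key, counts in world.items()
    }
    return mean(math.log(1 + c) / world_mean[key] for key, c in group)


world_counts = {"medicine-2015": [0, 0, 3]}
funder_articles = [("medicine-2015", 3)]
score = mnlcs(funder_articles, world_counts)
```

Under this reading a score of 1 means the group matches the world average for its fields and years, and the log transformation limits the influence of a few very highly cited articles.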
    • Three target document range metrics for university web sites

      Thelwall, Mike; Wilkinson, David (Wiley, 2003)
      Three new metrics are introduced that measure the range of use of a university Web site by its peers through different heuristics for counting links targeted at its pages. All three give results that correlate significantly with the research productivity of the target institution. The directory range model, which is based upon summing the number of distinct directories targeted by each other university, produces the most promising results of any link metric yet. Based upon an analysis of changes between models, it is suggested that range models measure essentially the same quantity as their predecessors but are less susceptible to spurious causes of multiple links and are therefore more robust.
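The directory range model described above, summing the number of distinct directories targeted by each other university, can be sketched as follows. The function name `directory_range`, the `(source_university, target_url)` input format and the hostname-suffix test for membership of the target site are assumptions for illustration; the caller is assumed to have excluded the target university's own links already.

```python
from urllib.parse import urlsplit


def directory_range(links, target_site):
    """Sum, over source universities, the distinct target directories linked to.

    `links` is an iterable of (source_university, target_url) pairs.
    `target_site` is a domain suffix such as "wlv.ac.uk" (assumed convention).
    """
    directories_by_source = {}
    for source, url in links:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Ignore links that do not point into the target site.
        if host != target_site and not host.endswith("." + target_site):
            continue
        directory = host + parts.path.rsplit("/", 1)[0] + "/"
        directories_by_source.setdefault(source, set()).add(directory)
    return sum(len(dirs) for dirs in directories_by_source.values())


sample_links = [
    ("ox", "http://www.wlv.ac.uk/a/1.html"),
    ("ox", "http://www.wlv.ac.uk/a/2.html"),
    ("ox", "http://www.wlv.ac.uk/b/1.html"),
    ("cam", "http://www.wlv.ac.uk/a/1.html"),
    ("cam", "http://www.ox.ac.uk/a/1.html"),
]
wlv_range = directory_range(sample_links, "wlv.ac.uk")
```

Because repeated links from one source into the same directory count once, a single page that links many times to one area of the target site cannot inflate the score, which matches the robustness argument in the abstract.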
    • Web Manifestations of Knowledge-Based Innovation Systems in the UK

      Thelwall, Mike; Musgrove, Peter; Wilkinson, David; Stuart, David (University of Wolverhampton, 2008)
      Innovation is widely recognised as essential to the modern economy. The term knowledge-based innovation system has been used to refer to innovation systems which recognise the importance of an economy’s knowledge base and the efficient interactions between important actors from the different sectors of society. Such interactions are thought to enable greater innovation by the system as a whole. Whilst it may not be possible to fully understand all the complex relationships involved within knowledge-based innovation systems, within the field of informetrics bibliometric methodologies have emerged that allow us to analyse some of the relationships that contribute to the innovation process. However, due to the limitations in traditional bibliometric sources it is important to investigate new potential sources of information. The web is one such source. This thesis documents an investigation into the potential of the web to provide information about knowledge-based innovation systems in the United Kingdom. Within this thesis the link analysis methodologies that have previously been successfully applied to investigations of the academic community (Thelwall, 2004a) are applied to organisations from different sections of society to determine whether link analysis of the web can provide a new source of information about knowledge-based innovation systems in the UK. This study makes the case that data may be collected ethically to provide information about the interconnections between web sites of various different sizes and from within different sectors of society, that there are significant differences in the linking practices of web sites within different sectors, and that reciprocal links provide a better indication of collaboration than uni-directional web links. Most importantly the study shows that the web provides new information about the relationships between organisations, rather than just a repetition of the same information from an alternative source.
Whilst the study has shown that there is a lot of potential for the web as a source of information on knowledge-based innovation systems, the same richness that makes it such a potentially useful source makes applications of large scale studies very labour intensive.