    • Design and development of a concept-based multi-document summarization system for research abstracts

      Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Sage, 2008)
      This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps — (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
    • Detecting high-functioning autism in adults using eye tracking and machine learning

      Yaneva, Victoria; Ha, Le An; Eraslan, Sukru; Yesilada, Yeliz; Mitkov, Ruslan (Institute of Electrical and Electronics Engineers (IEEE), 2020-04-30)
      The purpose of this study is to test whether visual processing differences between adults with and without high-functioning autism captured through eye tracking can be used to detect autism. We record the eye movements of adult participants with and without autism while they look for information within web pages. We then use the recorded eye-tracking data to train machine learning classifiers to detect the condition. The data was collected as part of two separate studies involving a total of 71 unique participants (31 with autism and 40 control), which enabled the evaluation of the approach on two separate groups of participants, using different stimuli and tasks. We explore the effects of a number of gaze-based and other variables, showing that autism can be detected automatically with around 74% accuracy. These results confirm that eye-tracking data can be used for the automatic detection of high-functioning autism in adults and that visual processing differences between the two groups exist when processing web pages.
    • Differences between journals and years in the proportions of students, researchers and faculty registering Mendeley articles

      Thelwall, Mike (Springer, 2018-02-20)
      This article contains two investigations into Mendeley reader counts with the same dataset. Mendeley reader counts provide evidence of early scholarly impact for journal articles, but reflect the reading of a relatively young subset of all researchers. To investigate whether this age bias is constant or varies by narrow field and publication year, this article compares the proportions of student, researcher and faculty readers for articles published 1996-2016 in 36 large monodisciplinary journals. In these journals, undergraduates recorded the newest research and faculty the oldest, with large differences between journals. The existence of substantial differences in the composition of readers between related fields points to the need for caution when using Mendeley readers as substitutes for citations for broad fields. The second investigation shows, with the same data, that there are substantial differences between narrow fields in the time taken for Scopus citations to be as numerous as Mendeley readers. Thus, even narrow field differences can impact on the relative value of Mendeley compared to citation counts.
    • Dimensions of web site credibility and their relation to active trust and behavioural impact

      Cugelman, Brian; Thelwall, Mike; Dawes, Philip L. (Association for Information Systems (AIS), 2009)
      This paper discusses two trends that threaten to undermine the effectiveness of online social marketing interventions: growing mistrust and competition. As a solution, this paper examines the relationships between Web site credibility, target audiences’ active trust and behaviour. Using structural equation modelling to evaluate two credibility models, this study concludes that Web site credibility is best considered a three-dimensional construct composed of expertise, trustworthiness and visual appeal, and that trust plays a partial mediating role between Web site credibility and behavioural impacts. The paper examines theoretical implications of conceptualizing Web sites according to a human credibility model, and factoring trust into Internet-based behavioural change interventions. Practical guidelines suggest ways to address these findings when planning online social marketing interventions.
    • Dimensions: A Competitor to Scopus and the Web of Science?

      Thelwall, Mike (Elsevier, 2018-03-26)
      Dimensions is a partly free scholarly database launched by Digital Science in January 2018. Dimensions includes journal articles and citation counts, making it a potential new source of impact data. This article explores the value of Dimensions from an impact assessment perspective with an examination of Food Science research 2008-2018 and a random sample of 10,000 Scopus articles from 2012. The results include high correlations between citation counts from Scopus and Dimensions (0.96 by narrow field in 2012) as well as similar average counts. Almost all Scopus articles with DOIs were found in Dimensions (97% in 2012). Thus, the scholarly database component of Dimensions seems to be a plausible alternative to Scopus and the Web of Science for general citation analyses and for citation data in support of some types of research evaluations.
    • Disciplinary Differences in Academic Web Presence – A Statistical Study of the UK

      Thelwall, Mike; Price, Liz (Walter de Gruyter, 2003)
      The Web has become an important tool for scholars to publicise their activities and disseminate their findings. In the information age, those who do not use it risk being bypassed. In this paper we introduce a statistical technique to assess the extent to which the broad spectrum of research areas is visible online in UK universities. Five broad subject categories are used for research, and inlink counts are used as indicators of online visibility or impact. The approach is designed to give more complete subject coverage than previous studies and to avoid the conceptual difficulties of a page classification approach, although one is used for triangulation. The results suggest that Science and Engineering dominate university Web presences, but with Humanities and Arts also achieving a high presence relative to their size, showing that high Web impact does not have to be restricted to the sciences. Research funding bodies should now consider whether action needs to be taken to ensure that opportunities are not being missed in the lower Web impact areas.
    • Discovery of event entailment knowledge from text corpora

      Pekar, Viktor (Elsevier, 2008)
      Event entailment is knowledge that may prove useful for a variety of applications dealing with inferencing over events described in natural language texts. In this paper, we propose a method for automatic discovery of pairs of verbs related by entailment, such as X buy Y ⇒ X own Y and appoint X as Y ⇒ X become Y. In contrast to previous approaches that make use of lexico-syntactic patterns and distributional evidence, the underlying assumption of our method is that the implication of one event by another manifests itself in the regular co-occurrence of the two corresponding verbs within locally coherent text. Based on the analogy with the problem of learning selectional preferences, Resnik’s [Resnik, P., 1993. Selection and information: a class-based approach to lexical relationships, Ph.D. Thesis, University of Pennsylvania] association strength measure is used to score the extracted verb pairs for asymmetric association in order to discover the direction of entailment in each pair. In our experimental evaluation, we examine the effect that various local discourse indicators produce on the accuracy of this model of entailment. After that we carry out a direct evaluation of the verb pairs against human subjects’ judgements and extrinsically evaluate the pairs on the task of noun phrase coreference resolution.
    • Disseminating research with web CV hyperlinks

      Kousha, Kayvan; Thelwall, Mike (John Wiley & Sons Ltd, 2014-07-03)
      Some curricula vitae (web CVs) of academics on the web, including homepages and publication lists, link to open‐access (OA) articles, resources, abstracts in publishers' websites, or academic discussions, helping to disseminate research. To assess how common such practices are and whether they vary by discipline, gender, and country, the authors conducted a large‐scale e‐mail survey of astronomy and astrophysics, public health, environmental engineering, and philosophy across 15 European countries and analyzed hyperlinks from web CVs of academics. About 60% of the 2,154 survey responses reported having a web CV or something similar, and there were differences between disciplines, genders, and countries. A follow‐up outlink analysis of 2,700 web CVs found that a third had at least one outlink to an OA target, typically a public eprint archive or an individual self‐archived file. This proportion was considerably higher in astronomy (48%) and philosophy (37%) than in environmental engineering (29%) and public health (21%). There were also differences in linking to publishers' websites, resources, and discussions. Perhaps most important, however, the amount of linking to OA publications seems to be much lower than allowed by publishers and journals, suggesting that many opportunities for disseminating full‐text research online are being missed, especially in disciplines without established repositories. Moreover, few academics seem to be exploiting their CVs to link to discussions, resources, or article abstracts, which seems to be another missed opportunity for publicizing research.
    • Do females create higher impact research? Scopus citations and Mendeley readers for articles from five countries

      Thelwall, Mike (Elsevier, 2018-09-01)
      There are known gender imbalances in participation in scientific fields, from female dominance of nursing to male dominance of mathematics. It is not clear whether there is also a citation imbalance, with some claiming that male-authored research tends to be more cited. No previous study has assessed gender differences in the readers of academic research on a large scale, however. In response, this article assesses whether there are gender differences in the average citations and/or Mendeley readers of academic publications. Field normalised logged Scopus citations and Mendeley readers from mid-2018 for articles published in 2014 were investigated for articles with first authors from India, Spain, Turkey, the UK and the USA in up to 251 fields with at least 50 male and female authors. Although female-authored research is less cited in Turkey (−4.0%) and India (−3.6%), it is marginally more cited in Spain (0.4%), the UK (0.4%), and the USA (0.2%). Female-authored research has fewer Mendeley readers in India (−1.1%) but more in Spain (1.4%), Turkey (1.1%), the UK (2.7%) and the USA (3.0%). Thus, whilst there may be little practical gender difference in citation impact in countries with mature science systems, the higher female readership impact suggests a wider audience for female-authored research. The results also show that the conclusions from a gender analysis depend on the field normalisation method. A theoretically informed decision must therefore be made about which normalisation to use. The results also suggest that arithmetic mean-based field normalisation is favourable to males.
    • Do gendered citation advantages influence field participation? Four unusual fields in the USA 1996-2017

      Thelwall, Mike (Springer, 2018-09-29)
      Gender inequalities in science are an ongoing concern, but their current causes are not well understood. This article investigates four fields with unusual proportions of female researchers in the USA for their subject matter, according to some current theories. It assesses how their gender composition and gender differences in citation rates have changed over time. All fields increased their share of female first-authored research, but at varying rates. The results give no evidence of the importance of citations, despite their unusual gender characteristics. For example, the field with the highest share of female-authored research and the most rapid increase had the largest male citation advantage. Differing micro-specialisms seem more likely than bias to be a cause of gender differences in citation rates, when present.
    • Do journal data sharing mandates work? Life sciences evidence from Dryad

      Thelwall, Mike; Kousha, Kayvan (Emerald, 2017-01-01)
      Purpose: Data sharing is widely thought to help research quality and efficiency. Since data sharing mandates are increasingly adopted by journals this paper assesses whether they work. Design/methodology: This study examines two evolutionary biology journals, Evolution and Heredity, that have data sharing mandates and make extensive use of Dryad. It uses a quantitative analysis of presence in Dryad, downloads and citations. Findings: Within both journals, data sharing seems to be complete showing that the mandates work on a technical level. Low correlations (0.15-0.18) between data downloads and article citation counts for articles published in 2012 within these journals indicate a weak relationship between data sharing and research impact. An average of 40-55 data downloads per article after a few years suggests that some use is found for shared life sciences data. Research limitations: The value of shared data uses is unclear. Practical implications: Data sharing mandates should be encouraged as an effective strategy. Originality/value: This is the first analysis of the effectiveness of data sharing mandates.
    • Do Mendeley reader counts indicate the value of arts and humanities research?

      Thelwall, Mike (Sage, 2017-09-19)
      Mendeley reader counts are a good source of early impact evidence for the life and natural sciences articles because they are abundant, appear before citations, and correlate moderately or strongly with citations in the long term. Early studies have found less promising results for the humanities and this article assesses whether the situation has now changed. Using Mendeley reader counts for articles in twelve arts and humanities Scopus subcategories, the results show that Mendeley reader counts reflect Scopus citation counts in most arts and humanities as strongly as in other areas of scholarship. Thus, Mendeley can be used as an early citation impact indicator in the arts and humanities, although it is unclear whether reader or citation counts reflect the underlying value of arts and humanities research.
    • Do Mendeley reader counts reflect the scholarly impact of conference papers? An investigation of computer science and engineering

      Aduku, Kuku Joseph; Thelwall, Mike; Kousha, Kayvan (Springer, 2017-04-13)
      Counts of Mendeley readers may give useful evidence about the impact of published research. Although previous studies have found significant positive correlations between counts of Mendeley readers and citation counts for journal articles, it is not known if this is equally true for conference papers. To fill this gap, Mendeley readership data and Scopus citation counts were extracted for both journal articles and conference papers published in 2011 in four fields for which conferences are important: Computer Science Applications; Computer Software; Building & Construction Engineering; and Industrial & Manufacturing Engineering. Mendeley readership counts correlated moderately with citation counts for both journal articles and conference papers in Computer Science Applications and Computer Software. The correlations were much lower between Mendeley readers and citation counts for conference papers than for journal articles in Building & Construction Engineering and Industrial & Manufacturing Engineering. Hence, there seem to be disciplinary differences in the usefulness of Mendeley readership counts as impact indicators for conference papers, even between fields for which conferences are important.
    • Do online resources give satisfactory answers to questions about meaning and phraseology?

      Hanks, Patrick; Franklin, Emma (Springer, 2019-09-18)
      In this paper we explore some aspects of the differences between printed paper dictionaries and online dictionaries in the ways in which they explain meaning and phraseology. After noting the importance of the lexicon as an inventory of linguistic items and the neglect in both linguistics and lexicography of phraseological aspects of that inventory, we investigate the treatment in online resources of phraseology – in particular, the phrasal verbs wipe out and put down – and we go on to investigate a word, dope, that has undergone some dramatic meaning changes during the 20th century. In the course of discussion, we mention the new availability of corpus evidence and the technique of Corpus Pattern Analysis, which is important for linking phraseology and meaning and distinguishing normal phraseology from rare and unusual phraseology. The online resources that we discuss include Google, the Urban Dictionary (UD), and Wiktionary.
    • Do prestigious Spanish scholarly book publishers have more teaching impact?

      Mas-Bleda, Amalia; Thelwall, Mike (Emerald Publishing Limited, 2018-10-10)
      Purpose: The purpose of this paper is to assess the educational value of prestigious and productive Spanish scholarly publishers based on mentions of their books in online scholarly syllabi. Design/methodology/approach: Syllabus mentions of 15,117 books from 27 publishers were searched for, manually checked and compared with Microsoft Academic (MA) citations. Findings: Most books published by Ariel, Síntesis, Tecnos and Cátedra have been mentioned in at least one online syllabus, indicating that their books have consistently high educational value. In contrast, few books published by the most productive publishers were mentioned in online syllabi. Prestigious publishers have both the highest educational impact based on syllabus mentions and the highest research impact based on MA citations. Research limitations/implications: The results might be different for other publishers. The online syllabus mentions found may be a small fraction of the syllabus mentions of the sampled books. Practical implications: Authors of Spanish-language social sciences and humanities books should consider general prestige when selecting a publisher if they want educational uptake for their work. Originality/value: This is the first study assessing book publishers based on syllabus mentions.
    • Do the Web sites of higher rated scholars have significantly more online impact?

      Thelwall, Mike; Harries, Gareth (Wiley, 2004)
      The quality and impact of academic Web sites are of interest to many audiences, including the scholars who use them and Web educators who need to identify best practice. Several large-scale European Union research projects have been funded to build new indicators for online scientific activity, reflecting recognition of the importance of the Web for scholarly communication. In this paper we address the key question of whether higher rated scholars produce higher impact Web sites, using the United Kingdom as a case study and measuring scholars' quality in terms of university-wide average research ratings. Methodological issues concerning the measurement of the online impact are discussed, leading to the adoption of counts of links to a university's constituent single domain Web sites from an aggregated counting metric. The findings suggest that universities with higher rated scholars produce significantly more Web content but with a similar average online impact. Higher rated scholars therefore attract more total links from their peers, but only by being more prolific, refuting earlier suggestions. It can be surmised that general Web publications are very different from scholarly journal articles and conference papers, for which scholarly quality does associate with citation impact. This has important implications for the construction of new Web indicators, for example that online impact should not be used to assess the quality of small groups of scholars, even within a single discipline.
    • Does Astronomy research become too dated for the public? Wikipedia citations to Astronomy and Astrophysics journal articles 1996-2014

      Thelwall, Mike (Fundación Española para la Ciencia y la Tecnología, 2016-11-14)
      Astronomy is a natural science attracting substantial public interest. On a human scale, most individual celestial objects are essentially unchanging but is the same true for interest in astronomy research? This article uses the popular online encyclopedia Wikipedia as a proxy for public interest in academic research and assesses the extent to which it cites astronomy and astrophysics articles published between 1996 and 2014. Automatic Bing searches in Webometric Analyst were used to count the number of citations to astronomy and astrophysics articles from Wikipedia. The results show that older papers from before 2008 are increasingly less likely to be cited. This is true overall and in most of the major language versions of Wikipedia, although it may reflect editors’ interests rather than the public’s interests. This is consistent with a moderate tendency towards obsolescence in public interest in research, although it is probably affected by the dates on which most Wikipedia content on the topic was created. Papers may become obsolete if they report evidence that is later superseded by improved data or if they propose a model that is later replaced.
    • Does female-authored research have more educational impact than male-authored research?

      Thelwall, Mike (Levy Library Press, 2018-10-04)
      Female academics are more likely to be in teaching-related roles in some countries, including the USA. As a side effect of this, female-authored journal articles may tend to be more useful for students. This study assesses this hypothesis by investigating whether female first-authored research has more uptake in education than male first-authored research. Based on an analysis of Mendeley readers of articles from 2014 in five countries and 100 narrow Scopus subject categories, the results show that female-authored articles attract more student readers than male-authored articles in Spain, Turkey, the UK and USA but not India. They also attract fewer professorial readers in Spain, the UK and the USA, but not India and Turkey, and tend to be less popular with senior academics. Because the results are based on analysis of differences within narrow fields they cannot be accounted for by females working in more education-related disciplines. The apparent additional educational impact for female-authored research could be due to selecting more accessible micro-specialisms, however, such as health-related instruments within the instrumentation narrow field. Whatever the cause, the results suggest that citation-based research evaluations may undervalue the wider impact of female researchers.
    • Does Mendeley provide evidence of the educational value of journal articles?

      Thelwall, Mike (Wiley-Blackwell, 2016-12-07)
      Research articles seem to have direct value for students in some subject areas, even though scholars may be their target audience. If this can be proven to be true, then subject areas with this type of educational impact could justify claims for enhanced funding. To seek evidence of disciplinary differences in the direct educational uptake of journal articles, but ignoring books, conference papers, and other scholarly outputs, this paper assesses the total number and proportions of student readers of academic articles in Mendeley across 12 different subjects. The results suggest that whilst few students read mathematics research articles, in other areas, the number of student readers is broadly proportional to the number of research readers. Although the average number of undergraduate readers of articles varies by up to 50 times between subjects, this could be explained by the differing levels of uptake of Mendeley rather than the differing educational value of disciplinary research. Overall, then, the results do not support the claim that journal articles in some areas have substantially more educational value than average for academia, compared with their research value.
    • Does Microsoft Academic find early citations?

      Thelwall, Mike (Springer, 2017-10-27)
      This article investigates whether Microsoft Academic can use its web search component to identify early citations to recently published articles to help solve the problem of delays in research evaluations caused by the need to wait for citation counts to accrue. The results for 44,398 articles in Nature, Science and seven library and information science journals 1996-2017 show that Microsoft Academic and Scopus citation counts are similar for all years, with no early citation advantage for either. In contrast, Mendeley reader counts are substantially higher for more recent articles. Thus, Microsoft Academic appears to be broadly like Scopus for citation count data, and is apparently not more able to take advantage of online preprints to find early citations.