• An initial exploration of the link relationship between UK university Web sites.

      Thelwall, Mike (MCB UP Ltd, 2002)
      Aggregates of links are of interest to information scientists in the same way as citation counts are: as potential sources of data from which new knowledge can be mined. This paper builds on the recent discovery of a correlation between a Web link count measure and the research quality of British universities by applying a range of multivariate statistical techniques to counts of links between pairs of universities. It represents an initial attempt at developing an understanding of this phenomenon, and the techniques extract plausible results. They also identify outliers in the data, some of which were verified by tracing them to identifiable Web phenomena. This is an important outcome because successful anomaly identification is a precondition for more effective analysis of this kind of data. The identification of groupings is encouraging evidence that Web links between universities can be mined for significant results, although it is clear that more methodological development is needed if anything but the simplest patterns are to be extracted. Finally, based upon the types of patterns extracted, the paper argues that none of the methods used is capable of fully analysing link structures on its own.
    • An intelligible implementation of FastSLAM2.0 on a low-power embedded architecture

      Jiménez Serrata, Albert A.; Yang, Shufan; Li, Renfa (Springer, 2017-03-02)
      The simultaneous localisation and mapping (SLAM) algorithm has drawn increasing interest in autonomous robotic systems. However, SLAM has not yet been widely explored in embedded system design spaces because of the limited processing resources of embedded systems. Especially when landmarks are not identifiable, the amount of computer processing increases dramatically due to unknown data association. In this work, we propose an intelligible SLAM solution for an embedded processing platform that reduces computer processing time using a low-variance resampling technique. Our prototype includes a low-cost Pixy camera, a robot kit with an L298N motor board, and a Raspberry Pi V2.0. It is able to recognise artificial landmarks in a real environment, identifying on average 75% of landmarks in corner detection and corridor detection while consuming only 1.14 W on average.
    • An investigation of the online presence of UK universities on Instagram

      Stuart, Emma; Stuart, David; Thelwall, Mike (Emerald, 2017-08-01)
      Purpose – Rising tuition fees and the growing importance of league tables have meant that university branding is becoming more of a necessity to attract prospective staff, students, and funding. Whilst university websites are an important branding tool, academic institutions are also beginning to exploit social media. Image-based social media services such as Instagram are particularly popular at the moment, so it is logical for universities to have a presence on them. This paper investigates the online presence of UK universities on Instagram as an initial investigation of use. Design/Methodology/Approach – This study uses webometric data collection and content analysis. Findings – The results indicate that at the time of data analysis for this investigation (Spring 2015), UK universities had a limited presence on Instagram for general university accounts, with only 51 out of 128 institutions having an account. The most common types of images posted were humanizing (31.0%), showcasing (28.8%), and orienting (14.3%). Orienting images were more likely to receive likes than other image types, and crowdsourcing images were more likely to receive comments. Originality/Value – This paper gives a valuable insight into the image posting practices of UK universities on Instagram. The findings are of value to heads of marketing, online content creators, social media campaign managers, and anyone who is responsible for the marketing, branding, and promotion of a university’s services.
    • Anaphora Resolution

      Mitkov, Ruslan (Longman, 2002)
    • Anaphora Resolution: To What Extent Does It Help NLP Applications?

      Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2007)
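    • Análisis de necesidades documentales y terminológicas de médicos y traductores médicos como base para el diseño de un diccionario multilingüe de nueva generación [Analysis of the documentary and terminological needs of physicians and medical translators as a basis for the design of a new-generation multilingual dictionary]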

      Corpas Pastor, Gloria; Roldán Juárez, Marina (Universitat Jaume I, 2014)
      This work proposes the design of a multilingual lexicographic resource aimed at physicians and medical translators. At present, no resource satisfies both groups equally, because their needs are very different. However, we start from the premise that a single, modular, adaptable and flexible tool could be created that meets their diverse expectations, needs and preferences. To this end, we begin with a needs analysis that follows the empirical method of online data collection through a trilingual survey.
    • Are citations from clinical trials evidence of higher impact research? An analysis of ClinicalTrials.gov

      Thelwall, Mike; Kousha, Kayvan (Springer, 2016-09-03)
      An important way in which medical research can translate into improved health outcomes is by motivating or influencing clinical trials that eventually lead to changes in clinical practice. Citations from clinical trial records to academic research may therefore serve as an early warning of the likely future influence of the cited articles. This paper partially assesses this hypothesis by testing whether prior articles referenced in ClinicalTrials.gov records are more highly cited than average for the publishing journal. The results from four high-profile general medical journals support the hypothesis, although there may not be a cause-and-effect relationship. Nevertheless, it is reasonable for researchers to use citations to their work from clinical trial records as partial evidence of the possible long-term impact of their research.
    • Are classic references cited first? An analysis of citation order within article sections

      Thelwall, Mike (Springer, 2020-12-31)
      Early citations within an article section may have an agenda-setting role but contribute little to the new research. To investigate whether this practice may be common, this article assesses whether the average impact of cited references is influenced by the order in which they are cited within article sections. This is tested on 1,683,299,868 citations to 41,068,375 unique journal articles from 1,470,209 research articles in the PubMed Open Access collection, split into 22 fields. The results show that the first articles cited in the Introduction and Background sections have much higher average citation impacts than later ones, and the same is true to a lesser extent for the Discussion and Conclusion in most fields, but not for the Methods and Results. The findings do not prove that early citations are less central to the citing article but nevertheless add to previous evidence suggesting that this practice may be widespread. It may therefore be useful to distinguish initial introductory citations from other citations when evaluating citation impact, or to use impact indicators that implicitly or explicitly give less weight to the citation counts of highly cited articles.
    • Are Mendeley reader counts high enough for research evaluations when articles are published?

      Thelwall, Mike (Emerald, 2017-10-27)
      Purpose – Mendeley reader counts have been proposed as early indicators for the impact of academic publications. In response, this article assesses whether there are enough Mendeley readers for research evaluation purposes during the month when an article is first published. Design/methodology/approach – Average Mendeley reader counts were compared to average Scopus citation counts for 104,520 articles from ten disciplines during the second half of 2016. Findings – Articles attracted, on average, between 0.1 and 0.8 Mendeley readers per article in the month in which they first appeared in Scopus. This is about ten times more than the average Scopus citation count. Research limitations/implications – Other subjects may use Mendeley more or less than the ten investigated here. The results are dependent on Scopus’s indexing practices, and Mendeley reader counts can be manipulated and have national and seniority biases. Practical implications – Mendeley reader counts during the month of publication are more powerful than Scopus citations for comparing the average impacts of groups of documents but are not high enough to differentiate between the impacts of typical individual articles. Originality/value – This is the first multi-disciplinary and systematic analysis of Mendeley reader counts from the publication month of an article.
    • Are Mendeley Reader Counts Useful Impact Indicators in all Fields?

      Thelwall, Mike (Springer, 2017-10-27)
      Reader counts from the social reference sharing site Mendeley are known to be valuable for early research evaluation. They have strong correlations with citation counts for journal articles but appear about a year before them. There are disciplinary differences in the value of Mendeley reader counts, but systematic evidence at the level of narrow fields is needed to reveal their extent. In response, this article compares Mendeley reader counts with Scopus citation counts for journal articles from 2012 in 325 narrow Scopus fields. Despite strong positive correlations in most fields, averaging 0.671, the correlations in some fields are as weak as 0.255. Technical reasons explain most of the weaker correlations, suggesting that the underlying relationship is almost always strong. The exceptions are caused by unusually high educational or professional use, or by topics of interest within countries that avoid Mendeley. The findings suggest that, if care is taken, Mendeley reader counts can be used for early citation impact evidence in almost all fields and for related impact in some of the remainder. As an additional application of the results, cross-checking with Mendeley data can be used to identify indexing anomalies in citation databases.
    • Are raw RSS feeds suitable for broad issue scanning? A science concern case study

      Thelwall, Mike; Prabowo, Rudy; Fairclough, Ruth (Wiley InterScience, 2006)
      Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). An attempt to identify genuine science concern debates from the corpus through investigating the top 1,000 burst words found only two genuine debates, however. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
    • Are Wikipedia citations important evidence of the impact of scholarly articles and books?

      Thelwall, Mike; Kousha, Kayvan (Wiley-Blackwell, 2016-06-13)
      Individual academics and research evaluators often need to assess the value of published research. Whilst citation counts are a recognised indicator of scholarly impact, alternative data is needed to provide evidence of other types of impact, including within education and wider society. Wikipedia is a logical choice for both of these because the role of a general encyclopaedia is to be an understandable repository of facts about a diverse array of topics and hence it may cite research to support its claims. To test whether Wikipedia could provide new evidence about the impact of scholarly research, this article counted citations to 302,328 articles and 18,735 monographs in English indexed by Scopus in the period 2005 to 2012. The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields. In contrast, a third of monographs have at least one citation from Wikipedia, with the most in the arts and humanities. Hence, Wikipedia citations can provide extra impact evidence for academic monographs. Nevertheless, the results may be relatively easily manipulated and so Wikipedia is not recommended for evaluations affecting stakeholder interests.
    • Assessing the teaching value of non-English academic books: The case of Spain

      Mas Bleda, Amalia; Thelwall, Mike (Consejo Superior de Investigaciones Científicas, 2018-12-01)
      This study examines the educational value of 15,117 Spanish-language books published by Spanish publishers in social sciences and humanities fields in the period 2002–2011, based on mentions of them extracted automatically from online course syllabi. A method was developed to collect syllabus mentions and filter out false matches. Manual checks of the 52,716 syllabus mentions found gave an estimated accuracy of 99.5% for filtering out false mentions and 74.7% for identifying correct mentions. A fifth of the sampled books (2,849; 19%) were mentioned at least once in online syllabi, and almost all of these (95%) were from a third of the publishers included in the study. An in-depth analysis of the 23 books recommended most often in online syllabi showed that they are mostly single-authored humanities monographs that were originally written in Spanish. The syllabus mentions originated from 379 domains, but mostly from Spanish university websites. In conclusion, it is possible to make indicators from online syllabus mentions to assess the teaching value of Spanish-language books, although manual checks are needed if the values are used for assessing individual books.
    • Automated Web issue analysis: A nurse prescribing case study

      Thelwall, Mike; Thelwall, Saheeda; Fairclough, Ruth (Elsevier, 2006)
      Web issue analysis, a new automated technique designed to rapidly give timely management intelligence about a topic from an automated large-scale analysis of relevant pages from the Web, is introduced and demonstrated. The technique includes hyperlink and URL analysis to identify common direct and indirect sources of Web information. In addition, text analysis through natural language processing techniques is used to identify relevant common nouns and noun phrases. A case study approach is taken, applying Web issue analysis to the topic of nurse prescribing. The results are presented in descriptive form and a qualitative analysis is used to argue that new information has been found. The nurse prescribing results demonstrate interesting new findings, such as the parochial nature of the topic in the UK, an apparent absence of similar concepts internationally, at least in the English-speaking world, and a significant concern with mental health issues. These demonstrate that automated Web issue analysis is capable of quickly delivering new insights into a problem. General limitations are that the success of Web issue analysis is dependent upon the particular topic chosen and the ability to find a phrase that accurately captures the topic and is not used in other contexts, as well as being language-specific.
    • Automatic multidocument summarization of research abstracts: Design and user evaluation

      Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Wiley, 2007)
      The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
    • Automatic question answering for medical MCQs: Can it go further than information retrieval?

      Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)
      We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
    • Automatic summarisation: 25 years On

      Orăsan, Constantin (Cambridge University Press (CUP), 2019-09-19)
      Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
    • Avoiding obscure topics and generalising findings produces higher impact research

      Thelwall, Mike (Springer, 2016-10-11)
      Much academic research is never cited and may be rarely read, indicating wasted effort from the authors, referees and publishers. One reason that an article could be ignored is that its topic is, or appears to be, too obscure to be of wide interest, even if excellent scholarship produced it. This paper reports a word frequency analysis of 874,411 English article titles from 2009–2015 in 18 different Scopus natural, formal, life and health sciences categories to assess the likelihood that research on obscure (rarely researched) topics is less cited. In all categories examined, unusual words in article titles are associated with below-average citation impact. Thus, researchers considering obscure topics may wish to reconsider, to generalise their study, or to choose a title that reflects the wider lessons that can be drawn. Authors should also consider including multiple concepts and purposes within their titles in order to attract a wider audience.
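    • BDAFRICA: diseño e implementación de una base de datos de la literatura poscolonial africana publicada en España [BDAFRICA: design and implementation of a database of postcolonial African literature published in Spain]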

      Fernández Ruiz, MR; Corpas Pastor, G; Seghiri, M (Universidad de Valladolid, 2016-01-10)
      This work shows that there is no repository that includes the postcolonial African authors published to date in Spain and that would therefore allow quantitative and qualitative studies of the impact of this literature with the desired precision. This is a gap both for academic research and for the publishing sector when analysing selection and reception trends in the market. Given this situation, the main objective of this work is to design and implement a database, based on MySQL and delimited by very specific parameters, that records all works by African authors published in Spanish in Spain between 1972 (the year Spain joined the ISBN system) and 2014. After establishing design criteria and specific compilation protocols, the methodological development was divided into four phases: data collection, storage, processing and dissemination. The BDÁFRICA database thus achieves a dual objective: on the one hand, it provides researchers with reliable data on which to base their studies and, on the other, it would make it possible to offer, for the first time, statistical data on the evolution of the publication of works by African authors in Spain over the last 42 years.
    • Bilingual contexts from comparable corpora to mine for translations of collocations

      Taslimipoor, Shiva; Mitkov, Ruslan; Corpas Pastor, Gloria; Fazly, Afsaneh (Springer, 2018-03-21)
      Due to the limited availability of parallel data in many languages, we propose a methodology that benefits from comparable corpora to find translation equivalents for collocations (as a specific type of difficult-to-translate multi-word expressions). Finding translations is known to be more difficult for collocations than for words. We propose a method based on bilingual context extraction and build a word (distributional) representation model drawing on these bilingual contexts (bilingual English-Spanish contexts in our case). We show that the bilingual context construction is effective for the task of translation equivalent learning and that our method outperforms a simplified distributional similarity baseline in finding translation equivalents.