• An initial exploration of the link relationship between UK university Web sites.

      Thelwall, Mike (MCB UP Ltd, 2002)
      Aggregates of links are of interest to information scientists in the same way as citation counts are: as potential sources of data from which new knowledge can be mined. This paper builds on the recent discovery of a correlation between a Web link count measure and the research quality of British universities by applying a range of multivariate statistical techniques to counts of links between pairs of universities, representing an initial attempt at developing an understanding of this phenomenon. Plausible results are extracted, and the techniques also identify outliers in the data, some of which were verified by being tracked down to identifiable Web phenomena. This is an important outcome because successful anomaly identification is a precondition for more effective analysis of this kind of data. The identification of groupings is encouraging evidence that Web links between universities can be mined for significant results, although it is clear that more methodological development is needed if any but the simplest patterns are to be extracted. Finally, based upon the types of patterns extracted, it is argued that none of the methods used are capable of fully analysing link structures on their own.
    • An intelligible implementation of FastSLAM2.0 on a low-power embedded architecture

      Jiménez Serrata, Albert A.; Yang, Shufan; Li, Renfa (Springer, 2017-03-02)
      The simultaneous localisation and mapping (SLAM) algorithm has drawn increasing interest in autonomous robotic systems. However, SLAM has not yet been widely explored in embedded system design spaces due to the limited processing resources of embedded systems. Especially when landmarks are not identifiable, the amount of computer processing increases dramatically due to unknown data association. In this work, we propose an intelligible SLAM solution for an embedded processing platform that reduces computer processing time using a low-variance resampling technique. Our prototype includes a low-cost Pixy camera, a robot kit with an L298N motor board and a Raspberry Pi V2.0. The prototype is able to recognise artificial landmarks in a real environment, identifying on average 75% of landmarks in corner and corridor detection while consuming only 1.14 W on average.
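      A minimal sketch of the low-variance (systematic) resampling idea referred to above, in Python; the function name and example weights are illustrative and are not taken from the paper's implementation:

```python
import random

def low_variance_resample(weights, rng=random.Random(0)):
    """Resample particle indices using a single random offset.

    One random draw per resampling step instead of N independent draws,
    which lowers sampling variance and runs in O(N) -- the property that
    makes this technique attractive for a low-power embedded platform.
    """
    n = len(weights)
    total = sum(weights)
    step = total / n
    r = rng.uniform(0, step)           # single random offset
    indices, cumulative, i = [], weights[0], 0
    for m in range(n):
        u = r + m * step               # evenly spaced pointers
        while u > cumulative:          # walk to the particle covering u
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices
```

      High-weight particles are duplicated and low-weight ones dropped, while the evenly spaced pointers keep the selection deterministic apart from the one initial draw.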
    • An investigation of the online presence of UK universities on Instagram

      Stuart, Emma; Stuart, David; Thelwall, Mike (Emerald, 2017-08-01)
      Purpose – Rising tuition fees and the growing importance of league tables mean that branding is becoming more of a necessity for universities to attract prospective staff, students, and funding. Whilst university websites are an important branding tool, academic institutions are also beginning to exploit social media, and it is therefore logical for universities to have a presence on popular image-based services such as Instagram. This paper investigates the online presence of UK universities on Instagram in an initial investigation of use. Design/Methodology/Approach – This study utilizes webometric data collection and content analysis methodology. Findings – The results indicate that at the time of data analysis for this investigation (Spring 2015), UK universities had a limited presence on Instagram for general university accounts, with only 51 out of 128 institutions having an account. The most common types of images posted were humanizing (31.0%), showcasing (28.8%), and orienting (14.3%). Orienting images were more likely to receive likes than other image types, and crowdsourcing images were more likely to receive comments. Originality/Value – This paper gives a valuable insight into the image posting practices of UK universities on Instagram. The findings are of value to heads of marketing, online content creators, social media campaign managers, and anyone responsible for the marketing, branding, and promotion of a university’s services.
    • Anaphora Resolution

      Mitkov, Ruslan (Longman, 2002)
    • Anaphora Resolution: To What Extent Does It Help NLP Applications?

      Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2007)
    • Análisis de necesidades documentales y terminológicas de médicos y traductores médicos como base para el diseño de un diccionario multilingüe de nueva generación

      Corpas Pastor, Gloria; Roldán Juárez, Marina (Universitat Jaume I, 2014)
      This paper proposes the design of a multilingual lexicographic resource aimed at physicians and medical translators. At present, no resource satisfies both groups equally, because their needs are very different. However, we start from the premise that a single modular, adaptable and flexible tool could be created that responds to their diverse expectations, needs and preferences. The design is grounded in a needs analysis carried out using an empirical method of online data collection through a trilingual survey.
    • Arabic-SOS: Segmentation, stemming, and orthography standardization for classical and pre-modern standard Arabic

      Mohamed, Emad; Sayed, Zeeshan (ACM, 2019-05-31)
      While morphological segmentation has always been a hot topic in Arabic, due to the morphological complexity of the language and its orthography, most effort has focused on Modern Standard Arabic. In this paper, we focus on pre-MSA texts. We use the Gradient Boosting algorithm to train a morphological segmenter on a corpus derived from Al-Manar, a late 19th/early 20th century magazine that focused on the Arabic and Islamic heritage. Since most of the available Arabic cultural heritage text suffers from substandard orthography, we have also trained a machine learner to standardize the text. Our segmentation accuracy reaches 98.47%, and orthography standardization achieves an F-macro of 0.98 and an F-micro of 0.99. We also produce stemming as a by-product of segmentation.
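      A segmenter of the kind described above typically classifies, for each character, whether a segment boundary follows it. A hypothetical sketch of the character-window features such a gradient-boosted classifier can be trained on (the feature template and names are illustrative, not the paper's):

```python
def boundary_features(word, i, window=2):
    """Features for deciding whether a segment boundary follows word[i].

    Pads the word and extracts the characters around the candidate
    boundary -- the kind of local context a gradient-boosted classifier
    can learn from. Illustrative only; the paper's feature set may differ.
    """
    pad = "_" * window
    s = pad + word + pad
    j = i + window                        # index of word[i] in padded string
    return {
        "char": s[j],                     # character before the boundary
        "left": s[j - window:j],          # characters to the left
        "right": s[j + 1:j + 1 + window], # characters to the right
        "pos": i,                         # position within the word
    }
```

      Each (word, position) pair becomes one training instance, labelled 1 if the gold segmentation places a boundary there and 0 otherwise.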
    • Are citations from clinical trials evidence of higher impact research? An analysis of ClinicalTrials.gov

      Thelwall, Mike; Kousha, Kayvan (Springer, 2016-09-03)
      An important way in which medical research can translate into improved health outcomes is by motivating or influencing clinical trials that eventually lead to changes in clinical practice. Citations from clinical trials records to academic research may therefore serve as an early warning of the likely future influence of the cited articles. This paper partially assesses this hypothesis by testing whether prior articles referenced in ClinicalTrials.gov records are more highly cited than average for the publishing journal. The results from four high profile general medical journals support the hypothesis, although there may not be a cause-and-effect relationship. Nevertheless, it is reasonable for researchers to use citations to their work from clinical trials records as partial evidence of the possible long-term impact of their research.
    • Are classic references cited first? An analysis of citation order within article sections

      Thelwall, Mike (Springer, 2019-06-07)
      Early citations within an article section may have an agenda-setting role but contribute little to the new research. To investigate whether this practice may be common, this article assesses whether the average impact of cited references is influenced by the order in which they are cited within article sections. This is tested on 1,683,299,868 citations to 41,068,375 unique journal articles from 1,470,209 research articles in the PubMed Open Access collection, split into 22 fields. The results show that the first cited article in the Introduction and Background has a much higher average citation impact than later articles, and the same is true to a lesser extent for the Discussion and Conclusion in most fields, but not for the Methods and Results. The findings do not prove that early citations are less central to the citing article but nevertheless add to previous evidence suggesting that this practice may be widespread. It may therefore be useful to distinguish initial introductory citations from later ones when evaluating citation impact, or to use impact indicators that implicitly or explicitly give less weight to the citation counts of highly cited articles.
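      The comparison described above amounts to averaging the citation counts of references by the order in which they are first cited within a section. A minimal illustrative version in Python (the data layout is assumed, not the study's):

```python
from statistics import mean

def impact_by_position(articles):
    """Mean citation impact of cited references at each citation position.

    `articles` is a list of lists: for each citing article, the citation
    counts of its references in the order they are first cited within a
    section. Returns {position: mean citation count}, the quantity the
    study compares across positions.
    """
    by_pos = {}
    for refs in articles:
        for pos, citations in enumerate(refs):
            by_pos.setdefault(pos, []).append(citations)
    return {pos: mean(counts) for pos, counts in sorted(by_pos.items())}
```

      A first-position mean that is markedly higher than the later-position means would reproduce the pattern the study reports for Introduction sections.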
    • Are Mendeley reader counts high enough for research evaluations when articles are published?

      Thelwall, Mike (Emerald Publishing Limited, 2017-10-27)
      Purpose – Mendeley reader counts have been proposed as early indicators of the impact of academic publications. In response, this article assesses whether there are enough Mendeley readers for research evaluation purposes during the month when an article is first published. Design/methodology/approach – Average Mendeley reader counts were compared to average Scopus citation counts for 104,520 articles from ten disciplines during the second half of 2016. Findings – Articles attracted, on average, between 0.1 and 0.8 Mendeley readers per article in the month in which they first appeared in Scopus. This is about ten times more than the average Scopus citation count. Research limitations/implications – Other subjects may use Mendeley more or less than the ten investigated here. The results depend on Scopus’s indexing practices, and Mendeley reader counts can be manipulated and have national and seniority biases. Practical implications – Mendeley reader counts during the month of publication are more powerful than Scopus citations for comparing the average impacts of groups of documents but are not high enough to differentiate between the impacts of typical individual articles. Originality/value – This is the first multi-disciplinary and systematic analysis of Mendeley reader counts from the publication month of an article.
    • Are Mendeley Reader Counts Useful Impact Indicators in all Fields?

      Thelwall, Mike (Springer, 2017-10-27)
      Reader counts from the social reference sharing site Mendeley are known to be valuable for early research evaluation. They have strong correlations with citation counts for journal articles but appear about a year before them. There are disciplinary differences in the value of Mendeley reader counts, but systematic evidence at the level of narrow fields is needed to reveal their extent. In response, this article compares Mendeley reader counts with Scopus citation counts for journal articles from 2012 in 325 narrow Scopus fields. Despite strong positive correlations in most fields, averaging 0.671, the correlations in some fields are as weak as 0.255. Technical reasons explain most weaker correlations, suggesting that the underlying relationship is almost always strong. The exceptions are caused by unusually high educational or professional use or topics of interest within countries that avoid Mendeley. The findings suggest that if care is taken then Mendeley reader counts can be used for early citation impact evidence in almost all fields and for related impact in some of the remainder. As an additional application of the results, cross-checking with Mendeley data can be used to identify indexing anomalies in citation databases.
    • Are raw RSS feeds suitable for broad issue scanning? A science concern case study

      Thelwall, Mike; Prabowo, Rudy; Fairclough, Ruth (Wiley InterScience, 2006)
      Broad issue scanning is the task of identifying important public debates arising in a given broad issue; really simple syndication (RSS) feeds are a natural information source for investigating broad issues. RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). However, an attempt to identify genuine science concern debates from the corpus by investigating the top 1,000 burst words found only two genuine debates. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue.
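      A toy illustration of word-burst detection in the spirit of the feed-monitoring method described above; the smoothing and thresholds are assumptions for the sketch, not the paper's coword frequency method:

```python
from collections import Counter

def burst_words(recent_docs, baseline_docs, min_ratio=3.0, min_count=5):
    """Flag words whose rate in a recent window of feed items greatly
    exceeds their rate in a baseline window -- a simple burst heuristic.

    Returns {word: rate ratio} for words whose recent-to-baseline rate
    ratio is at least min_ratio and whose recent count is at least
    min_count (to suppress noise from very rare words).
    """
    recent = Counter(w for d in recent_docs for w in d.lower().split())
    base = Counter(w for d in baseline_docs for w in d.lower().split())
    r_total = sum(recent.values()) or 1
    b_total = sum(base.values()) or 1
    bursts = {}
    for word, count in recent.items():
        if count < min_count:
            continue
        rate = count / r_total
        base_rate = (base[word] + 1) / b_total  # add-one smoothing for unseen words
        if rate / base_rate >= min_ratio:
            bursts[word] = rate / base_rate
    return bursts
```

      As the abstract notes, a few pathological feeds can dominate such counts, which is why data cleansing before this step matters.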
    • Are Wikipedia citations important evidence of the impact of scholarly articles and books?

      Thelwall, Mike; Kousha, Kayvan (Wiley-Blackwell, 2016-06-13)
      Individual academics and research evaluators often need to assess the value of published research. Whilst citation counts are a recognised indicator of scholarly impact, alternative data is needed to provide evidence of other types of impact, including within education and wider society. Wikipedia is a logical choice for both of these because the role of a general encyclopaedia is to be an understandable repository of facts about a diverse array of topics and hence it may cite research to support its claims. To test whether Wikipedia could provide new evidence about the impact of scholarly research, this article counted citations to 302,328 articles and 18,735 monographs in English indexed by Scopus in the period 2005 to 2012. The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields. In contrast, a third of monographs have at least one citation from Wikipedia, with the most in the arts and humanities. Hence, Wikipedia citations can provide extra impact evidence for academic monographs. Nevertheless, the results may be relatively easily manipulated and so Wikipedia is not recommended for evaluations affecting stakeholder interests.
    • Assessing the teaching value of non-English academic books: The case of Spain

      Mas Bleda, Amalia; Thelwall, Mike (Consejo Superior de Investigaciones Científicas, 2018-12-01)
      This study examines the educational value of 15,117 Spanish-language books published by Spanish publishers in social sciences and humanities fields in the period 2002-2011, based on mentions of them extracted automatically from online course syllabi. A method was developed to collect syllabus mentions and filter out false matches. Manual checks of the 52,716 syllabus mentions found estimated an accuracy of 99.5% for filtering out false mentions and 74.7% for identifying correct mentions. A fifth of the sampled books (2,849; 19%) were mentioned at least once in online syllabi and almost all (95%) were from a third of the publishers included in the study. An in-depth analysis of the 23 books recommended most often in online syllabi showed that they are mostly single-authored humanities monographs that were originally written in Spanish. The syllabus mentions originated from 379 domains, but mostly from Spanish university websites. In conclusion, it is possible to make indicators from online syllabus mentions to assess the teaching value of Spanish-language books, although manual checks are needed if the values are used for assessing individual books.
    • Attention: there is an inconsistency between android permissions and application metadata!

      Alecakir, Huseyin; Can, Burcu; Sen, Sevil (Springer Science and Business Media LLC, 2021-01-07)
      Since mobile applications make our lives easier, there are large numbers of applications customized for our needs in the application markets. While the application markets provide us with a platform for downloading applications, they are also used by malware developers to distribute malicious applications. In Android, permissions are used to prevent users from installing applications that might violate their privacy, by raising their awareness. From the privacy and security point of view, if the functionality of an application is described in sufficient detail in its description, then the need for the requested permissions can be well understood. This is defined as description-to-permission fidelity in the literature. In this study, we propose two novel models that address inconsistencies between application descriptions and the requested permissions. The proposed models are based on state-of-the-art neural architectures called attention mechanisms. Here, we aim to find the permission statement words or sentences in app descriptions by using the attention mechanism along with recurrent neural networks. The lack of such permission statements in an application description creates suspicion. Hence, the proposed approach could assist static analysis techniques in finding suspicious apps and prioritizing apps for more resource-intensive analysis. The experimental results show that the proposed approach achieves high accuracy.
    • Autism and the web: using web-searching tasks to detect autism and improve web accessibility

      Yaneva, Victoria (Association for Computing Machinery (ACM), 2018-08-02)
      People with autism consistently exhibit different attention-shifting patterns compared to neurotypical people. Research has shown that these differences can be successfully captured using eye tracking. In this paper, we summarise our recent research on using gaze data from web-related tasks to address two problems: improving web accessibility for people with autism and detecting autism automatically. We first examine the way a group of participants with autism and a control group process the visual information from web pages and provide empirical evidence of different visual searching strategies. We then use these differences in visual attention to train a machine learning classifier that can successfully use the gaze data to distinguish between the two groups with an accuracy of 0.75. At the end of this paper we review the way forward for improving web accessibility and automatic autism detection, as well as the practical implications of, and alternatives to, using eye tracking in these research areas.
    • Autism detection based on eye movement sequences on the web: a scanpath trend analysis approach

      Eraslan, Sukru; Yesilada, Yeliz; Yaneva, Victoria; Harper, Simon; Duarte, Carlos; Drake, Ted; Hwang, Faustina; Lewis, Clayton (ACM, 2020-04-20)
      The autism diagnostic procedure is subjective, challenging and expensive, and relies on behavioural, historical and parental report information. In our previous work, we proposed a machine learning classifier to be used as a potential screening tool, or in conjunction with other diagnostic methods, thus aiding established diagnostic procedures. The classifier uses people's eye movements on web pages but considers only non-sequential data. It achieves its best accuracy by combining data from several web pages, and its accuracy varies across different web pages. In the present paper, we investigate whether it is possible to detect autism based on eye-movement sequences and achieve stable accuracy across different web pages, so that the approach does not depend on specific pages. We used Scanpath Trend Analysis (STA), which is designed to identify the trending path of a group of users on a web page based on their eye movements. We first identify the trending paths of people with autism and of neurotypical people. To detect whether or not a person has autism, we calculate the similarity of his/her path to the trending paths of people with autism and of neurotypical people. If the path is more similar to the trending path of neurotypical people, we classify the person as neurotypical; otherwise, we classify him/her as a person with autism. We systematically evaluate our approach with an eye-tracking dataset of 15 verbal and highly-independent people with autism and 15 neurotypical people on six web pages. Our evaluation shows that the STA approach performs better on individual web pages and provides more stable accuracy across different pages.
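      The classification step described above is a nearest-trending-path rule. In this illustrative Python sketch, normalised longest-common-subsequence length stands in for the paper's scanpath similarity measure (an assumption), and ties fall to the autism class, mirroring the "otherwise" branch in the abstract:

```python
def path_similarity(path, trend):
    """Longest common subsequence between two scanpaths (sequences of
    fixated elements), normalised by the longer path. A stand-in
    similarity score; the paper defines its own measure."""
    if not path or not trend:
        return 0.0
    m, n = len(path), len(trend)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if path[i] == trend[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(m, n)

def classify(path, trend_autism, trend_neurotypical):
    """Label a scanpath with the group whose trending path it is more
    similar to; ties go to the autism class."""
    if path_similarity(path, trend_neurotypical) > path_similarity(path, trend_autism):
        return "neurotypical"
    return "autism"
```

      Because the rule depends only on the two per-page trending paths, its accuracy can be stable across pages, which is the property the evaluation examines.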
    • Automated Web issue analysis: A nurse prescribing case study

      Thelwall, Mike; Thelwall, Saheeda; Fairclough, Ruth (Elsevier, 2006)
      Web issue analysis, a new automated technique designed to rapidly give timely management intelligence about a topic from an automated large-scale analysis of relevant pages from the Web, is introduced and demonstrated. The technique includes hyperlink and URL analysis to identify common direct and indirect sources of Web information. In addition, text analysis through natural language processing techniques is used to identify relevant common nouns and noun phrases. A case study approach is taken, applying Web issue analysis to the topic of nurse prescribing. The results are presented in descriptive form and a qualitative analysis is used to argue that new information has been found. The nurse prescribing results demonstrate interesting new findings, such as the parochial nature of the topic in the UK, an apparent absence of similar concepts internationally, at least in the English-speaking world, and a significant concern with mental health issues. These demonstrate that automated Web issue analysis is capable of quickly delivering new insights into a problem. General limitations are that the success of Web issue analysis is dependent upon the particular topic chosen and the ability to find a phrase that accurately captures the topic and is not used in other contexts, as well as being language-specific.
    • Automatic multidocument summarization of research abstracts: Design and user evaluation

      Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Wiley, 2007)
      The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method - with or without the use of a taxonomy - were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
    • Automatic question answering for medical MCQs: Can it go further than information retrieval?

      Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)
      We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.