• $1 Internet - exploiting limited bandwidth in developing countries

      Heinz, Ignatz; Dennett, Christopher Paul (2008)
      INFONET-BioVision.org is a freely available, internet-based knowledge-management system, funded by the Liechtenstein Development Service (LDS) and the BioVision Foundation for Environment and Development, that offers Kenyan farmers information on affordable, effective and ecologically sound technologies in crop and livestock production as well as environmental and human health. One of the challenges faced by the project is the secure provision of information to the rural areas that would most benefit from advice on crop pests and productivity [Avallain, 2008]. Bandwidth is sometimes available in these areas, but is limited, unmanaged and relatively expensive. This paper discusses current work in the development of a novel system that brings together hardware and software to make better use of available bandwidth, whilst offering a financially viable and sustainable method of extending internet provision to these hard-to-reach areas, providing rural farmers with access to the INFONET-BioVision platform and other internet-based sources of information. The system currently in development is premised on the fact that some internet-based applications require more bandwidth than others. Moreover, their real-time requirements differ greatly. Although it is conceivable that a number of users can share a low-bandwidth connection, the multiple bandwidth requests created can easily overwhelm the connection because of the way in which these are managed by protocols developed for bandwidth-rich countries. This results in virtually no bandwidth availability for the user applications themselves. It is clear, therefore, that to maximise the number of users on one low-bandwidth connection, allocation should take place before applications actually make the bandwidth requests. 
Indeed, similar bandwidth management exists on a larger scale, with domestic broadband providers controlling the amount of data provided through existing channels to a home user at the exchange, based on a tariff system. The system effectively applies a scaled-down version of this scenario to the available connectivity, be that GPRS, satellite or wired. An inexpensive single-board computer acts as a hub between users and the internet, allowing software management of bandwidth and connectivity to users' mobile devices through Wi-Fi, Bluetooth and wired LAN. The allocation of bandwidth to each user is based on a voucher system that effectively splits the cost of the connection. Users purchase these vouchers, which are priced according to usage, ranging from very low-rate, non-real-time e-mail access to more expensive web browsing, prior to accessing the system. The proposed system is intended to provide communities with inexpensive connectivity through shared costs, and is scalable: should there be a requirement for extra bandwidth or additional users, subsequent devices can be added or moved simply, easily and at low cost. The system is scheduled for testing later in 2008, at which point a full evaluation will be undertaken.
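The per-voucher allocation described above can be sketched as a token-bucket rate limiter; the tier names, rates and class below are hypothetical illustrations, not part of the INFONET-BioVision system:

```python
import time

# Hypothetical voucher tiers: allowed bytes per second for each tariff.
VOUCHER_TIERS = {
    "email": 2_000,      # cheap, non-real-time e-mail access
    "browsing": 20_000,  # more expensive web browsing
}

class TokenBucket:
    """One bucket per purchased voucher: tokens refill at the tier's rate."""

    def __init__(self, rate_bps, burst=None):
        self.rate = rate_bps               # refill rate, bytes/second
        self.capacity = burst or rate_bps  # maximum burst size
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self, nbytes):
        """Return True if nbytes may be sent now under this voucher."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(VOUCHER_TIERS["email"])
print(bucket.allow(1_500))  # within the initial burst allowance: True
```

Allocating a bucket per user before traffic flows mirrors the paper's point that bandwidth should be assigned before applications make their requests, rather than letting protocols contend for it.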
    • A comparison of sources of links for academic Web impact factor calculations

      Thelwall, Mike (MCB UP Ltd, 2002)
      There has been much recent interest in extracting information from collections of Web links. One tool that has been used is Ingwersen's Web impact factor. It has been demonstrated that several versions of this metric can produce results that correlate with research ratings of British universities, showing that, despite being a measure of a purely Internet phenomenon, the results are susceptible to a wider interpretation. This paper addresses the question of which is the best possible domain to count backlinks from, if research is the focus of interest. WIFs for British universities calculated from several different source domains are compared, primarily the .edu, .ac.uk and .uk domains, and the entire Web. The results show that all four areas produce WIFs that correlate strongly with research ratings, but that none produce incontestably superior figures. It was also found that the WIF was less able to differentiate in more homogeneous subsets of universities, although positive results are still possible.
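As a rough illustration, a Web impact factor from a given source domain is a count of backlinks divided by the number of pages at the target site; every count below is invented for the example:

```python
# Invented counts: backlinks to each university site from several source
# domains, and the number of pages hosted at each site.
backlinks = {
    "uni-a.ac.uk": {".edu": 120, ".ac.uk": 300, ".uk": 450, "web": 900},
    "uni-b.ac.uk": {".edu": 30, ".ac.uk": 90, ".uk": 140, "web": 300},
}
site_pages = {"uni-a.ac.uk": 3000, "uni-b.ac.uk": 1500}

def wif(site, source):
    """WIF for one source domain: backlinks from it divided by site pages."""
    return backlinks[site][source] / site_pages[site]

print(round(wif("uni-a.ac.uk", ".ac.uk"), 3))  # 0.1
```

Comparing the WIFs computed from each source domain against research ratings, as the paper does, is then a matter of correlating the per-domain columns with the rating list.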
    • A comparison of title words for journal articles and Wikipedia pages: Coverage and stylistic differences?

      Thelwall, Mike; Sud, Pardeep (La Fundación Española para la Ciencia y la Tecnología (FECYT), 2018-02-12)
      This article assesses whether there are gaps in Wikipedia’s coverage of academic information and whether there are non-obvious stylistic differences from academic journal articles that Wikipedia users and editors should be aware of. For this, it analyses terms in the titles of journal articles that are absent from all English Wikipedia page titles for each of 27 Scopus subject categories. The results show that English Wikipedia has lower coverage of issues of interest to non-English nations and there are gaps probably caused by a lack of willing subject specialist editors in some areas. There were also stylistic disciplinary differences in the results, with some fields using synonyms of “analysing” that were ignored in Wikipedia, and others using the present tense in titles to emphasise research outcomes. Since Wikipedia is broadly effective at covering academic research topics from all disciplines, it might be relied upon by non-specialists. Specialists should therefore check for coverage gaps within their areas for useful topics and librarians should caution users that important topics may be missing.
    • A decade of Garfield readers

      Thelwall, Mike (Springer, 2017-11-30)
      This brief note discusses Garfield’s continuing influence from the perspective of the Mendeley readers of his articles. This reflects the direct impact of his work since the launch of Mendeley in August 2008. In the last decade, his work is still extensively read by younger scientists, especially in computer and information sciences and the social sciences, and with a broad international spread. His work on citation indexes, impact factors and science history tracking seems to have the most contemporary relevance.
    • A flexible framework for collocation retrieval and translation from parallel and comparable corpora

      Rivera, Oscar Mendoza; Mitkov, Ruslan; Corpas Pastor, Gloria (John Benjamins, 2018)
      This paper outlines a methodology and a system for collocation retrieval and translation from parallel and comparable corpora. The methodology was developed with translators and language learners in mind. It is based on a phraseology framework, applies statistical techniques, and employs open-source tools and online resources. The collocation retrieval and translation has proved successful for English and Spanish and can be easily adapted to other languages. The evaluation results are promising and future goals are proposed. Furthermore, conclusions are drawn on the nature of comparable corpora and how they can be better exploited to suit particular needs of target users.
    • A framework for named entity recognition in the open domain

      Evans, Richard (John Benjamins Publishing Company, 2004)
    • A Free Database of University Web Links: Data Collection Issues

      Thelwall, Mike (2003)
      This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, as created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links by information and computer scientists this is an attempt to make available raw data for research that is not reliant upon the opaque techniques of commercial search engines. Basic tools for querying are also provided. The key issues concerning running an accurate web crawler are also discussed. Access is also given to the normally hidden crawler stop list with the aim of making the crawl process more transparent. The necessity of having such a list is discussed, with the conclusion that fully automatic crawling is not socially or empirically desirable because of the existence of database-generated areas of the web and the proliferation of the phenomenon of mirroring.
    • A High Precision Information Retrieval Method for WiQA

      Orasan, Constantin; Puşcaşu, Georgiana (Springer, 2007)
      This paper presents Wolverhampton University’s participation in the WiQA competition. The method chosen for this task combines a high precision, but low recall information retrieval approach with a greedy sentence ranking algorithm. The high precision retrieval is ensured by querying the search engine with the exact topic, in this way obtaining only sentences which contain the topic. In one of the runs, the set of retrieved sentences is expanded using coreferential relations between sentences. The greedy algorithm used for ranking selects one sentence at a time, always the one which adds most information to the set of sentences without repeating the existing information too much. The evaluation revealed that it achieves a performance similar to other systems participating in the competition and that the run which uses coreference obtains the highest MRR score among all the participants.
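The greedy ranking step described can be sketched with unseen-word counts as a crude proxy for "adds most information without repeating the existing information"; the actual WiQA system's similarity measure may differ:

```python
def greedy_rank(sentences):
    """Repeatedly pick the sentence contributing the most unseen words,
    approximating 'adds most information without repeating existing'."""
    selected, seen = [], set()
    remaining = list(sentences)
    while remaining:
        best = max(remaining, key=lambda s: len(set(s.lower().split()) - seen))
        selected.append(best)
        seen |= set(best.lower().split())
        remaining.remove(best)
    return selected

ranked = greedy_rank([
    "the topic is a city",
    "the topic is a large city in the north",
    "its economy depends on fishing",
])
print(ranked[0])  # the sentence with the most new words comes first
```

Once a sentence's words are marked as seen, near-duplicates contribute little and sink to the bottom of the ranking, which is the behaviour the abstract describes.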
    • A layered approach for investigating the topological structure of communities in the Web.

      Thelwall, Mike (MCB UP Ltd, 2003)
      A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of the communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
    • A neuro-inspired visual tracking method based on programmable system-on-chip platform

      Yang, Shufan; Wong-Lin, KongFatt; Andrew, James; Mak, Terrence; McGinnity, T. Martin (Springer, 2017-01-20)
      Using programmable system-on-chip platforms to implement computer vision functions poses many challenges due to highly constrained resources in cost, size and power consumption. In this work, we propose a new neuro-inspired image processing model and implement it on a system-on-chip Xilinx Z702c board. By using an attractor neural network model to store the object's contour information, we eliminate the computationally expensive re-initialisation of the curve evolution at every new iteration or frame. Our experimental results demonstrate that this integrated approach achieves accurate and robust object tracking even when objects are partially or completely occluded in the scene. Importantly, the system is able to process 640×480 video streams in real time at 30 frames per second using only one low-power Xilinx Zynq-7000 system-on-chip platform. This proof-of-concept work demonstrates the advantage of incorporating neuro-inspired features in solving image processing problems during occlusion.
    • A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method

      Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2002)
      This paper describes a new, advanced and completely revamped version of Mitkov's knowledge-poor approach to pronoun resolution. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of non-nominal anaphora (including pleonastic pronouns) and for recognition of animacy, and employs genetic algorithms to achieve optimal performance. The paper features extensive evaluation and discusses important evaluation issues in anaphora resolution.
    • A research and institutional size-based model for national university Web site interlinking

      Thelwall, Mike (MCB UP Ltd, 2002)
      Web links are a phenomenon of interest to bibliometricians by analogy with citations, and to others because of their use in Web navigation and search engines. It is known that very few links on university Web sites are targeted at scholarly expositions and yet, at least in the UK and Australia, a correlation has been established between link count metrics for universities and measures of institutional research. This paper operates on a finer-grained level of detail, focussing on counts of links between pairs of universities. It provides evidence of an underlying linear relationship with the quadruple product of the size and research quality of both source and target institution. This simple model is proposed as applying generally to national university systems, subject to a series of constraints to identify cases where it is unlikely to be applicable. It is hoped that the model, if confirmed by studies of other countries, will open the door to deeper mining of academic Web link data.
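The proposed model can be written as link(A→B) ≈ k · size_A · quality_A · size_B · quality_B. A minimal sketch, with invented sizes, ratings and link counts:

```python
# Invented sizes (staff numbers) and research quality ratings.
unis = {"A": (100, 5.0), "B": (60, 4.0)}

def model_product(src, dst):
    """Quadruple product of source and target size and research quality."""
    s_src, q_src = unis[src]
    s_dst, q_dst = unis[dst]
    return s_src * q_src * s_dst * q_dst

# With a single observed pair, least squares through the origin reduces
# to a ratio, giving the model's one free constant k.
observed_links_ab = 240  # invented count of links from A's site to B's
k = observed_links_ab / model_product("A", "B")
print(k)  # 0.002
```

With k fitted over many pairs, the residuals between observed and predicted counts would highlight the exceptional cases that the paper's constraints are meant to exclude.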
    • A Single Chip System for Sensor Data Fusion Based on a Drift-diffusion Model

      Yang, Shufan; Wong-Lin, Kongfatt; Rano, Inaki; Lindsay, Anthony (IEEE, 2017-09-07)
      Current multisensory systems face data communication overhead when integrating disparate sensor data to build a coherent and accurate global picture. We present here a novel hardware and software co-design platform for a heterogeneous data fusion solution based on a perceptual decision-making approach (the drift-diffusion model). It provides a convenient infrastructure for sensor data acquisition and data integration, and uses only a single-chip Xilinx ZYNQ-7000 XC7Z020 AP SoC. A case study of controlling the moving speed of a single ground-based robot, according to the physiological state of the operator based on heart rate, is conducted and demonstrates the possibility of an integrated sensor data fusion architecture. The results of our DDM-based data integration show a better correlation coefficient with the raw ECG signal compared with a simple piecewise approach.
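For illustration, the drift-diffusion model at the core of this fusion approach accumulates noisy evidence until a decision boundary is crossed; this is a generic textbook sketch, not the paper's FPGA implementation:

```python
import random

def ddm_decide(drift, threshold=1.0, dt=0.001, noise=0.1, seed=42):
    """Integrate noisy evidence x until it crosses +threshold or -threshold.
    Returns the choice (+1 or -1) and the elapsed decision time."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * rng.gauss(0, 1) * dt ** 0.5
        t += dt
    return (1 if x > 0 else -1), t

# A strong positive drift (e.g. a clearly elevated heart-rate signal)
# drives the decision to the upper boundary.
choice, rt = ddm_decide(drift=2.0)
```

In a fusion setting, each sensor stream contributes to the drift term, so noisy individual readings are smoothed into a single accumulated decision variable.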
    • Academia.edu: Social network or Academic Network?

      Thelwall, Mike; Kousha, Kayvan (Wiley, 2014-03-12)
      Academic social network sites Academia.edu and ResearchGate, and reference sharing sites Mendeley, Bibsonomy, Zotero, and CiteULike, give scholars the ability to publicize their research outputs and connect with each other. With millions of users, these are a significant addition to the scholarly communication and academic information-seeking eco-structure. There is thus a need to understand the role that they play and the changes, if any, that they can make to the dynamics of academic careers. This article investigates attributes of philosophy scholars on Academia.edu, introducing a median-based, time-normalizing method to adjust for time delays in joining the site. In comparison to students, faculty tend to attract more profile views, but female philosophers did not attract more profile views than did males, suggesting that academic capital drives philosophy uses of the site more than does friendship and networking. Secondary analyses of law, history, and computer science confirmed the faculty advantage (in terms of higher profile views) except for females in law and females in computer science. There was also a female advantage for both faculty and students in law and computer science as well as for history students. Hence, Academia.edu overall seems to reflect a hybrid of scholarly norms (the faculty advantage) and a female advantage that is suggestive of general social networking norms. Finally, traditional bibliometric measures did not correlate with any Academia.edu metrics for philosophers, perhaps because more senior academics use the site less extensively or because of the range of informal scholarly activities that cannot be measured by bibliometric methods.
    • Academic information on Twitter: A user survey

      Mohammadi, Ehsan; Thelwall, Mike; Kwasny, Mary; Holmes, Kristi L. (PLOS, 2018-05-17)
      Although counts of tweets citing academic papers are used as an informal indicator of interest, little is known about who tweets academic papers and who uses Twitter to find scholarly information. Without knowing this, it is difficult to draw useful conclusions from a publication being frequently tweeted. This study surveyed 1,912 users who had tweeted journal articles to ask about their scholarly-related Twitter uses. Almost half of the respondents (45%) did not work in academia, despite the sample probably being biased towards academics. Twitter was used most by people with a social science or humanities background. People tend to leverage social ties on Twitter to find information rather than searching for relevant tweets. Twitter is used in academia to acquire and share real-time information and to develop connections with others. Motivations for using Twitter vary by discipline, occupation, and employment sector, but not much by gender. These factors also influence the sharing of different types of academic information. This study provides evidence that Twitter plays a significant role in the discovery of scholarly information and cross-disciplinary knowledge spreading. Most importantly, the large numbers of non-academic users support the claims of those using tweet counts as evidence for the non-academic impacts of scholarly research.
    • The Accuracy of Confidence Intervals for Field Normalised Indicators

      Thelwall, Mike; Fairclough, Ruth (Elsevier, 2017-04-07)
      When comparing the average citation impact of research groups, universities and countries, field normalisation reduces the influence of discipline and time. Confidence intervals for these indicators can help with attempts to infer whether differences between sets of publications are due to chance factors. Although both bootstrapping and formulae have been proposed for these, their accuracy is unknown. In response, this article uses simulated data to systematically compare the accuracy of confidence limits in the simplest possible case, a single field and year. The results suggest that the MNLCS (Mean Normalised Log-transformed Citation Score) confidence interval formula is conservative for large groups but almost always safe, whereas bootstrap MNLCS confidence intervals tend to be accurate but can be unsafe for smaller world or group sample sizes. In contrast, bootstrap MNCS (Mean Normalised Citation Score) confidence intervals can be very unsafe, although their accuracy increases with sample sizes.
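A minimal sketch of the MNLCS indicator and a percentile bootstrap confidence interval for it, assuming the ln(1 + citations) transformation; the citation counts are invented:

```python
import math
import random

def mnlcs(group, world):
    """Mean Normalised Log-transformed Citation Score: the group's mean
    ln(1 + citations) divided by the world mean for the same field/year."""
    mean_log = lambda cs: sum(math.log(1 + c) for c in cs) / len(cs)
    return mean_log(group) / mean_log(world)

def bootstrap_ci(group, world, n=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the MNLCS."""
    rng = random.Random(seed)
    stats = sorted(
        mnlcs(rng.choices(group, k=len(group)),
              rng.choices(world, k=len(world)))
        for _ in range(n)
    )
    return stats[int(alpha / 2 * n)], stats[int((1 - alpha / 2) * n) - 1]

# Invented citation counts for a group and its world reference set.
low, high = bootstrap_ci([0, 1, 2, 5, 10], [1, 1, 2, 3, 5, 8, 13])
```

Resampling both the group and the world set, as here, reflects the article's point that accuracy depends on both sample sizes, not just the group's.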
    • Adults with High-functioning Autism Process Web Pages With Similar Accuracy but Higher Cognitive Effort Compared to Controls

      Yaneva, Victoria; Ha, Le; Eraslan, Sukru; Yesilada, Yeliz (ACM, 2019-05-13)
      To accommodate the needs of web users with high-functioning autism, a designer's only option at present is to rely on guidelines that: i) have not been empirically evaluated and ii) do not account for the different levels of autism severity. Before designing effective interventions, we need to obtain an empirical understanding of the aspects that specific user groups need support with. This has not yet been done for web users at the high end of the autism spectrum, as often they appear to execute tasks effortlessly, without facing barriers related to their neurodiverse processing style. This paper investigates the accuracy and efficiency with which high-functioning web users with autism and a control group of neurotypical participants obtain information from web pages. Measures include answer correctness and a number of eye-tracking features. The results indicate similar levels of accuracy for the two groups at the expense of efficiency for the autism group, showing that the autism group invests more cognitive effort in order to achieve the same results as their neurotypical counterparts.
    • Aggressive language identification using word embeddings and sentiment features

      Orasan, Constantin (Association for Computational Linguistics, 2018-06-25)
      This paper describes our participation in the First Shared Task on Aggression Identification. The method proposed relies on machine learning to identify social media texts which contain aggression. The main features employed by our method are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried. The official submissions used Support Vector Machines and Random Forests. The official evaluation showed that for texts similar to the ones in the training dataset Random Forests work best, whilst for texts which are different SVMs are a better choice. The evaluation also showed that despite its simplicity the method performs well when compared with more elaborate methods.
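The feature construction described (word embeddings plus a sentiment signal) can be sketched as follows; the toy vectors and lexicon are invented stand-ins for a real embedding model and sentiment analyser:

```python
# Toy word vectors and sentiment lexicon: invented stand-ins for a real
# embedding model and sentiment analyser.
VECS = {"hate": [1.0, 0.0], "you": [0.2, 0.2], "love": [0.0, 1.0]}
SENTIMENT = {"hate": -1.0, "love": 1.0}
DIMS = 2

def features(text):
    """Mean word embedding concatenated with an aggregate sentiment score."""
    words = text.lower().split()
    mean = [sum(VECS.get(w, [0.0] * DIMS)[i] for w in words) / len(words)
            for i in range(DIMS)]
    sentiment = sum(SENTIMENT.get(w, 0.0) for w in words)
    return mean + [sentiment]

print(features("hate you"))  # [0.6, 0.1, -1.0]
```

Vectors built this way can then be fed to any off-the-shelf classifier, such as the SVMs and Random Forests used in the official submissions.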
    • All that Glitters is not Gold when Translating Phraseological Units

      Corpas Pastor, Gloria; Monti, Johanna; Mitkov, Ruslan; Corpas Pastor, Gloria; Seretan, Violeta (European Association for Machine Translation (EAMT), 2013-09-02)
      Phraseological unit is an umbrella term which covers a wide range of multi-word units (collocations, idioms, proverbs, routine formulae, etc.). Phraseological units (PUs) are pervasive in all languages and exhibit a peculiar combinatorial nature. PUs are usually frequent, cognitively salient, syntactically frozen and/or semantically opaque. Besides, their creative manipulations in discourse can be anything but predictable, straightforward or easy to process. And when it comes to translating, problems multiply exponentially. It goes without saying that cultural differences and linguistic anisomorphisms go hand in hand with issues arising from varying degrees of equivalence at the levels of system and text. No wonder PUs have been considered a pain in the neck within the NLP community. This presentation will focus on contrastive and translational features of phraseological units. It will consist of three parts. As a convenient background, the first part will contrast two similar concepts: multi-word unit (the preferred term within the NLP community) versus phraseological unit (the preferred term in phraseology). The second part will deal with phraseological systems in general, their structure and functioning. Finally, the third part will adopt a contrastive approach, with especial reference to translators’ strategies, procedures and choices. For good or for bad, when it comes to rendering phraseological units, human translation and computer-assisted translation appear to share the same garden path.
    • An evaluation of syntactic simplification rules for people with autism

      Evans, Richard; Orasan, Constantin; Dornescu, Iustin (Association for Computational Linguistics, 2014)
      Syntactically complex sentences constitute an obstacle for some people with Autistic Spectrum Disorders. This paper evaluates a set of simplification rules specifically designed for tackling complex and compound sentences. In total, 127 different rules were developed for the rewriting of complex sentences and 56 for the rewriting of compound sentences. The evaluation assessed the accuracy of these rules individually and revealed that fully automatic conversion of these sentences into a more accessible form is not very reliable.