Browsing Research Institute in Information and Language Processing by Subjects
Now showing items 1-4 of 4
A layered approach for investigating the topological structure of communities in the Web.A layered approach for identifying communities in the Web is presented and explored by applying the flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully-automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified was dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
Evidence for the existence of geographic trends in university web site interlinkingThe Web is an important medium for scholarly communication of various types, perhaps eventually to replace entirely some traditional mechanisms such as print journals. Yet the Web analogy of citations, hyperlinks, are much more varied in use and existing citation techniques are difficult to generalise to the new medium. In this context, one new challenging object of study is the modern multi-faceted, multi-genre, partly unregulated university Web site. This paper develops a methodology to analyse the patterns of interlinking between university Web sites and uses it to indicate that the degree of interlinking decreases with distance, at least in the UK. This is perhaps not in itself a surprising result, despite claims of a paradigm shift from the traditional virtual college towards collaboratories, but the methodology developed can also be used to refine existing Web link metrics to produce more powerful tools for comparing groups of sites.
Hyperlinks as a data source for science mappingHyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to be different in character to links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.
Motivations for academic web site interlinking: evidence for the Web as a novel source of information on informal scholarly communicationThe need to understand authors’ motivations for creating links between university web sites is addressed by a survey of a random collection of 414 such links from the ac.uk domain. A classification scheme was created and applied to this collection. Obtaining inter-classifier agreement as to the single main link creation cause was very difficult because of multiple potential motivations and the fluidity of genre on the Web. Nevertheless, it was clear that, whilst the vast majority, over 90%, was created for broadly scholarly reasons, only two were equivalent to journal citations. It is concluded that academic web link metrics will be dominated by a range of informal types of scholarly communication. Since formal communication can be extensively studied through citation analysis, this provides an exciting new window through which to investigate a facet of a previously obscured type of communication activity.