    • Custom interfaces for advanced queries in search engines

      Thelwall, Mike; Binns, Ray; Harries, Gareth; Page-Kennedy, Theresa; Price, Liz; Wilkinson, David (MCB UP Ltd, 2001)
      Those seeking information from the Internet often start from a search engine, using either its organised directory structure or its text query facility. Because it can be difficult to identify the most relevant pages for some information needs, many search engines offer Boolean text matching, and some, including Google, AltaVista and HotBot, allow additional information to be integrated into a more advanced request. It is known, however, that complex queries are far from universal amongst web users, with very short queries being the norm. It is demonstrated that, for specific information needs, the gap between the provision of advanced search facilities and their use can be bridged by a simple interface, in the form of a website, that automatically formulates the necessary requests. It is argued that this kind of resource, perhaps incorporating additional domain-specific knowledge, could be useful for the websites or portals of common interest groups. The approach is illustrated by a website that enables a user to search the individual websites of university-level institutions in countries associated with the European Union.
    • Finding similar academic Web sites with links, bibliometric couplings and colinks

      Thelwall, Mike; Wilkinson, David (Elsevier, 2004)
      A common task in both Webmetrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken, and human assessments of site similarity, based upon content type, were compared against ratings from the three measures. The results show that a combination of all three gives the highest probability of identifying similar sites but, surprisingly, this was only a marginal improvement over using links alone. Another unexpected result was that high colink or coupling counts were associated with only a small increase in the likelihood of similarity. The principal advantage of couplings and colinks was found to be greater coverage, in that many more pairs of sites are connected by these measures, rather than an increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
    • Graph structure in three national academic Webs: Power laws with anomalies

      Thelwall, Mike; Wilkinson, David (Wiley, 2003)
      The graph structures of the publicly indexable university Webs of three countries, Australia, New Zealand and the UK, were analyzed. Strong scale-free regularities were evident for page indegrees, outdegrees and connected component sizes, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were traced to their root causes. As a result, resource-driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from techniques that minimize or eliminate the impact of such anomalies.
    • Hyperlinks as a data source for science mapping

      Harries, Gareth; Wilkinson, David; Price, Liz; Fairclough, Ruth; Thelwall, Mike (Sage, 2004)
      Hyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to differ in character from links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.
    • Motivations for academic web site interlinking: evidence for the Web as a novel source of information on informal scholarly communication

      Wilkinson, David; Harries, Gareth; Thelwall, Mike; Price, Liz (Sage, 2003)
      The need to understand authors’ motivations for creating links between university web sites is addressed by a survey of a random collection of 414 such links from the ac.uk domain. A classification scheme was created and applied to this collection. Inter-classifier agreement on the single main cause of link creation was very difficult to obtain because of multiple potential motivations and the fluidity of genre on the Web. Nevertheless, it was clear that, whilst the vast majority (over 90%) were created for broadly scholarly reasons, only two were equivalent to journal citations. It is concluded that academic web link metrics will be dominated by a range of informal types of scholarly communication. Since formal communication can be studied extensively through citation analysis, this provides an exciting new window through which to investigate a facet of a previously obscured type of communication activity.
    • Three target document range metrics for university web sites

      Thelwall, Mike; Wilkinson, David (Wiley, 2003)
      Three new metrics are introduced that measure the range of use of a university Web site by its peers, using different heuristics for counting the links that target its pages. All three give results that correlate significantly with the research productivity of the target institution. The directory range model, which sums the number of distinct directories targeted by each other university, produces the most promising results of any link metric to date. Based upon an analysis of changes between models, it is suggested that range models measure essentially the same quantity as their predecessors but are less susceptible to spurious causes of multiple links and are therefore more robust.
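The directory range model described in the last abstract (for a target university, count the distinct directories each other university links to, then sum those counts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes links have already been attributed to institutions as (source, target, URL) triples, and it treats a "directory" as the lower-cased host plus the URL path up to its last slash. The function names `directory_of` and `directory_range` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def directory_of(url: str) -> str:
    """Assumed notion of a directory: lower-cased host plus the path up to the last '/'."""
    parsed = urlparse(url)
    path = parsed.path or "/"          # a bare host counts as its root directory
    path = path[: path.rfind("/") + 1]  # drop the file name, keep the trailing slash
    return parsed.netloc.lower() + path


def directory_range(links, target):
    """Sum, over every other university, the number of distinct target directories it links to.

    `links` is an iterable of (source_university, target_university, target_url) triples.
    Links from the target to itself are ignored, since the metric concerns use by peers.
    """
    dirs_by_source = defaultdict(set)
    for source, tgt, url in links:
        if tgt == target and source != target:
            dirs_by_source[source].add(directory_of(url))
    return sum(len(dirs) for dirs in dirs_by_source.values())


# Hypothetical example data: four inter-university links and one internal link.
links = [
    ("uniA", "uniT", "http://www.uniT.ac.uk/maths/staff/a.html"),
    ("uniA", "uniT", "http://www.uniT.ac.uk/maths/staff/b.html"),  # same directory as above
    ("uniA", "uniT", "http://www.uniT.ac.uk/physics/index.html"),
    ("uniB", "uniT", "http://www.uniT.ac.uk/maths/staff/a.html"),
    ("uniT", "uniT", "http://www.uniT.ac.uk/library/"),            # internal link, ignored
]

print(directory_range(links, "uniT"))  # 2 directories from uniA + 1 from uniB = 3
```

A plain inlink count would give 4 for the same data, whereas the directory range gives 3, because uniA's two links into the same directory are counted once. This illustrates, on toy data only, why a range count is less sensitive to repeated links than a raw count.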