University of Wolverhampton


Please use this identifier to cite or link to this item: http://hdl.handle.net/2436/15919



Title: A Free Database of University Web Links: Data Collection Issues
Authors: Thelwall, Mike
Citation: International Journal of Scientometrics, Informetrics and Bibliometrics, 2002/3, 6/7(1): Paper 2
Issue Date: 2003
URI: http://hdl.handle.net/2436/15919
Additional Links: http://www.cindoc.csic.es/cybermetrics/articles/v6i1p2.html
Abstract: This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links among information and computer scientists, it is an attempt to make available raw data for research that does not rely on the opaque techniques of commercial search engines. Basic tools for querying the database are provided, and the key issues in running an accurate web crawler are discussed. Access is also given to the normally hidden crawler stop list, with the aim of making the crawl process more transparent. The necessity of having such a list is discussed, with the conclusion that fully automatic crawling is not socially or empirically desirable because of database-generated areas of the web and the proliferation of mirroring.
Type: Article
Language: en
Description: Metadata only. Full text available at above link.
Keywords: Web impact factors
Search engines
Web crawlers
Weblinks
Websites
Universities
Academic websites
ISSN: 1137-5019
Appears in Collections: Statistical Cybermetrics Research Group

Files in This Item:

There are no files associated with this item.



All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.

 
