Handle:
http://hdl.handle.net/2436/15919
Title:
A Free Database of University Web Links: Data Collection Issues
Authors:
Thelwall, Mike
Abstract:
This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, as created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links by information and computer scientists, this is an attempt to make available raw data for research that is not reliant upon the opaque techniques of commercial search engines. Basic tools for querying are also provided. The key issues concerning running an accurate web crawler are also discussed. Access is also given to the normally hidden crawler stop list, with the aim of making the crawl process more transparent. The necessity of having such a list is discussed, with the conclusion that fully automatic crawling is not socially or empirically desirable because of the existence of database-generated areas of the web and the proliferation of the phenomenon of mirroring.

This paper describes a free set of databases of the link structures of the university websites from a selection of countries, as created by a specialist information science web crawler.
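The abstract notes that the crawler depends on a manually maintained stop list to keep it out of database-generated areas of the web and mirrored sites. The following is a minimal Python sketch of how such a list might be applied to a crawl frontier. It assumes, hypothetically, that stop-list entries are URL prefixes; this record does not describe the paper's actual stop-list format or crawler, and the `example.ac.uk` URLs and function names are invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical stop list: URL prefixes the crawler should not fetch,
# e.g. a database-generated search interface or a mirrored copy of another site.
STOP_LIST = (
    "http://www.example.ac.uk/cgi-bin/",
    "http://www.example.ac.uk/mirrors/",
)


def normalise(url: str) -> str:
    """Lower-case scheme and host (which are case-insensitive) and drop any fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def is_blocked(url: str, stop_list=STOP_LIST) -> bool:
    """Return True if the normalised URL starts with any normalised stop-list prefix."""
    u = normalise(url)
    return any(u.startswith(normalise(prefix)) for prefix in stop_list)


def filter_frontier(urls, stop_list=STOP_LIST):
    """Keep only URLs not covered by the stop list, preserving their order."""
    return [u for u in urls if not is_blocked(u, stop_list)]


if __name__ == "__main__":
    frontier = [
        "http://www.example.ac.uk/index.html",
        "http://WWW.Example.ac.uk/cgi-bin/search?q=1",
        "http://www.example.ac.uk/mirrors/java/api.html",
    ]
    print(filter_frontier(frontier))
```

Prefix matching is the simplest design: one entry excludes a whole directory, such as a search interface that can generate an unbounded number of pages, at the cost of possibly blocking legitimate pages that share the same prefix.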
Citation:
International Journal of Scientometrics, Informetrics and Bibliometrics, 2002/3, 6/7(1): Paper 2
Issue Date:
2003
URI:
http://hdl.handle.net/2436/15919
Additional Links:
http://www.cindoc.csic.es/cybermetrics/articles/v6i1p2.html
Type:
Article
Language:
en
Description:
Metadata only. Full text available at the link above.
ISSN:
1137-5019
Appears in Collections:
Statistical Cybermetrics Research Group

Full metadata record

DC Field | Value | Language
dc.contributor.author | Thelwall, Mike | -
dc.date.accessioned | 2008-01-10T12:55:31Z | -
dc.date.available | 2008-01-10T12:55:31Z | -
dc.date.issued | 2003 | -
dc.identifier.citation | International Journal of Scientometrics, Informetrics and Bibliometrics, 2002/3, 6/7(1): Paper 2 | en
dc.identifier.issn | 1137-5019 | -
dc.identifier.uri | http://hdl.handle.net/2436/15919 | -
dc.description | Metadata only. Full text available at the link above. | en
dc.description.abstract | This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, as created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links by information and computer scientists, this is an attempt to make available raw data for research that is not reliant upon the opaque techniques of commercial search engines. Basic tools for querying are also provided. The key issues concerning running an accurate web crawler are also discussed. Access is also given to the normally hidden crawler stop list, with the aim of making the crawl process more transparent. The necessity of having such a list is discussed, with the conclusion that fully automatic crawling is not socially or empirically desirable because of the existence of database-generated areas of the web and the proliferation of the phenomenon of mirroring. This paper describes a free set of databases of the link structures of the university websites from a selection of countries, as created by a specialist information science web crawler. | en
dc.language.iso | en | en
dc.relation.url | http://www.cindoc.csic.es/cybermetrics/articles/v6i1p2.html | en
dc.subject | Web impact factors | en
dc.subject | Search engines | en
dc.subject | Web crawlers | en
dc.subject | Weblinks | en
dc.subject | Websites | -
dc.subject | Universities | -
dc.subject | Academic websites | -
dc.title | A Free Database of University Web Links: Data Collection Issues | en
dc.type | Article | en
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.