dc.contributor.author: Thelwall, Mike
dc.date.accessioned: 2006-08-23T14:47:27Z
dc.date.available: 2006-08-23T14:47:27Z
dc.date.issued: 2003
dc.identifier.citation: Thelwall, M. (2003), "A layered approach for investigating the topological structure of communities in the Web", Journal of Documentation, Vol. 59 No. 4, pp. 410-429. https://doi.org/10.1108/00220410310485703
dc.identifier.issn: 0022-0418
dc.identifier.doi: 10.1108/00220410310485703
dc.identifier.uri: http://hdl.handle.net/2436/4009
dc.description: This is an accepted manuscript of an article published by MCB UP Ltd in Journal of Documentation on 01/08/2003, available online at https://doi.org/10.1108/00220410310485703. The accepted version of the publication may differ from the final published version.
dc.description.abstract: A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed through three changes: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder in one huge connected component, the exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of the communities identified depended on the parameter of the algorithm, with very different results obtained for different parameter values. In conclusion, the UK academic Web contains layers of non-trivial communities. If it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and of applying the technique directly to partially automate Web metrics tasks, such as finding all pages on a given subject hosted by a single country's universities.
dc.format: application/pdf
dc.format.extent: 337348 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: MCB UP Ltd
dc.relation.url: http://www.emeraldinsight.com/10.1108/00220410310485703
dc.subject: Collaborative working
dc.subject: Information retrieval
dc.subject: Webometrics
dc.subject: Modelling
dc.subject: UK
dc.subject: Academic websites
dc.title: A layered approach for investigating the topological structure of communities in the Web
dc.type: Journal article
dc.identifier.journal: Journal of Documentation
dc.format.dig: YES
rioxxterms.version: AM
dc.source.volume: 59
dc.source.issue: 4
dc.source.beginpage: 410
dc.source.endpage: 429
refterms.dateFCD: 2020-06-09T13:15:45Z
refterms.versionFCD: AM
refterms.dateFOA: 2018-08-21T11:55:51Z
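The layered document model described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names, the rule that a UK university site is the last three labels of the host name, the choice to treat links as undirected, and the example URLs are all hypothetical. The sketch maps each page-to-page link onto directory, domain or site nodes, drops links between pages of the same site, and then measures connected-component sizes, from which isolated points and the largest component can be counted.

```python
from collections import deque
from urllib.parse import urlsplit


def site_of(url):
    # Hypothetical heuristic: a UK university "site" is the last three host
    # labels, e.g. www.scit.wlv.ac.uk -> wlv.ac.uk.
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-3:])


def domain_of(url):
    # Domain layer: the full host name.
    return urlsplit(url).hostname or ""


def directory_of(url):
    # Directory layer: host plus the path up to and including the last "/".
    parts = urlsplit(url)
    path = parts.path or "/"
    return (parts.hostname or "") + path[: path.rfind("/") + 1]


def build_layer(page_links, node_of):
    """Aggregate (source_url, target_url) page links into an undirected
    layer graph, dropping links whose endpoints share a site."""
    graph = {}
    for src, dst in page_links:
        a, b = node_of(src), node_of(dst)
        # Keep both endpoints as nodes, so nodes that only had internal
        # links survive as isolated points.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if site_of(src) == site_of(dst):
            continue  # internal site link: removed
        graph[a].add(b)
        graph[b].add(a)
    return graph


def component_sizes(graph):
    """Sizes of the connected components, largest first (breadth-first search)."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical example links, not data from the paper.
links = [
    ("http://www.wlv.ac.uk/a/1.html", "http://www.wlv.ac.uk/b/2.html"),  # internal
    ("http://www.wlv.ac.uk/a/1.html", "http://www.ox.ac.uk/x/3.html"),
    ("http://www.ox.ac.uk/x/3.html", "http://www.cam.ac.uk/y/4.html"),
]

for name, node_of in [("directory", directory_of), ("domain", domain_of), ("site", site_of)]:
    print(f"{name}: component sizes {component_sizes(build_layer(links, node_of))}")
```

Run as is, it prints component sizes [3, 1] for the directory layer, because the /b/ directory on www.wlv.ac.uk had only an internal link and is left isolated, and [3] for the domain and site layers. Nodes appear only if they occur in at least one link; a real crawl would also add pages that have no links at all.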

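Community identification from a single seed can be illustrated with a max-flow formulation in the style of Flake et al.: each undirected link gets unit capacity, an artificial sink is joined to every node except the seed with capacity `alpha`, and the nodes left on the seed's side of a minimum seed-to-sink cut form its community. This is a hedged sketch; whether it matches the exact variant and parameter used in the paper is an assumption, and `flake_community`, `clique`, `alpha` and the toy graph are hypothetical. The cut is found with Edmonds-Karp augmenting paths, and the seed side is read off as the nodes still reachable from the seed in the residual graph.

```python
from collections import deque


def flake_community(graph, seed, alpha):
    """Community of `seed`: the seed side of a minimum cut between the seed and
    an artificial sink joined to every other node with capacity `alpha`.

    `graph` maps each node to the set of its neighbours (undirected, unit
    capacity per link). Hypothetical sketch, not the paper's code.
    """
    sink = object()  # artificial sink, distinct from every real node
    residual = {seed: {}}  # residual[u][v]: remaining capacity on arc u -> v

    def add_arc(u, v, capacity):
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)  # reverse arc for undoing flow
        residual[u][v] += capacity

    for u, neighbours in graph.items():
        for v in neighbours:
            add_arc(u, v, 1)  # visited once from each end: 1 unit each way
        if u != seed:
            add_arc(u, sink, alpha)

    # Edmonds-Karp: repeatedly augment along shortest residual paths.
    while True:
        parent = {seed: None}
        queue = deque([seed])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

    # The last search reached exactly the nodes on the seed side of a minimum cut.
    return {node for node in parent if node is not sink}


def clique(nodes):
    """All pairs of distinct nodes: the edges of a complete graph."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


# Hypothetical toy graph: two four-node cliques joined by the single link D-E.
graph = {}
for u, v in clique("ABCD") + clique("EFGH") + [("D", "E")]:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

for alpha in (2, 0.5, 0.125):
    print(alpha, sorted(flake_community(graph, "A", alpha)))

# Every node as a single starting point, keeping the distinct communities.
communities = {frozenset(flake_community(graph, s, 0.5)) for s in graph}
print(sorted(sorted(c) for c in communities))
```

On this toy graph, seed A's community is {A} with `alpha = 2`, the clique {A, B, C, D} with `alpha = 0.5`, and all eight nodes with `alpha = 0.125`, so here a larger `alpha` yields smaller communities. The final loop uses every node as a starting point and finds two distinct communities, {A, B, C, D} and {E, F, G, H}, in a graph that forms a single connected component. The values 0.5 and 0.125 were chosen because they are exact in binary floating point, which keeps the cut comparisons free of rounding.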

Files in this item

Name: 2003 A layered approach prepri ...
Size: 329.4 KB
Format: PDF
