
dc.contributor.author: Thelwall, Mike
dc.contributor.author: Wilkinson, David
dc.date.accessioned: 2008-05-21T10:59:55Z
dc.date.available: 2008-05-21T10:59:55Z
dc.date.issued: 2004
dc.identifier.citation: Information Processing & Management, 40 (3): 515-526
dc.identifier.issn: 03064573
dc.identifier.doi: 10.1016/S0306-4573(03)00042-6
dc.identifier.uri: http://hdl.handle.net/2436/27375
dc.description.abstract: A common task in both Webometrics and Web information retrieval is to identify a set of Web pages or sites that are similar in content. In this paper we assess the extent to which links, colinks and couplings can be used to identify similar Web sites. As an experiment, a random sample of 500 pairs of domains from the UK academic Web was taken, and human assessments of site similarity, based upon content type, were compared against ratings for the three concepts. The results show that using a combination of all three gives the highest probability of identifying similar sites, but surprisingly this was only a marginal improvement over using links alone. Another unexpected result was that high values for either colink counts or couplings were associated with only a small increased likelihood of similarity. The principal advantage of using couplings and colinks was found to be greater coverage, in that a much larger number of pairs of sites are connected by these measures, rather than an increased probability of similarity. In information retrieval terminology, this is improved recall rather than improved precision.
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.url: http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VC8-48WPPGX-1&_user=1644469&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000054077&_version=1&_urlVersion=0&_userid=1644469&md5=3426871bfba523a853ec684de16a9ecf
dc.subject: Academic websites
dc.subject: Document clustering
dc.subject: Webometrics
dc.subject: Information retrieval
dc.title: Finding similar academic Web sites with links, bibliometric couplings and colinks
dc.type: Journal article
dc.identifier.journal: Information Processing & Management