Collaboration between UK Universities: A machine-learning based webometric analysis

Hdl Handle:
http://hdl.handle.net/2436/338261
Title:
Collaboration between UK Universities: A machine-learning based webometric analysis
Authors:
Kenekayoro, Patrick
Abstract:
Collaboration is essential for some types of research, which is why some agencies include collaboration among the requirements for funding research projects. Studying collaborative relationships is important because analyses of collaboration networks can give insights into knowledge-based innovation systems, the roles that different organisations play in a research field and the relationships between scientific disciplines. Co-authored publication data is widely used to investigate collaboration between organisations, but this data is not free and thus may not be accessible to some researchers. Hyperlinks have some similarities with citations, so hyperlink data may be used as an indicator to estimate the extent of collaboration between academic institutions and may be able to show types of relationships that are not present in co-authorship data. However, it has been shown that using raw hyperlink counts for webometric research can sometimes produce unreliable results, so researchers have attempted to find alternative counting methods and have tried to identify the reasons why hyperlinks may have been created in academic websites. This thesis uses machine learning techniques, an approach that has not previously been widely used in webometric research, to automatically classify hyperlinks and text in university websites in an attempt to filter out irrelevant hyperlinks when investigating collaboration between academic institutions. Supervised machine learning methods were used to automatically classify the web page types that can be found in Higher Education Institutions’ websites. The results were assessed to see whether automatically filtered hyperlink data gave better results than raw hyperlink data in terms of identifying patterns of collaboration between UK universities.
Unsupervised learning methods were used to automatically identify groups of university departments that are collaborating or that may benefit from collaborating, based on their co-appearance in research clusters. Results show that the machine learning methods used in this thesis can automatically identify both the source and target web page categories of hyperlinks in university websites with up to 78% accuracy. This increases the possibility of more effective hyperlink classification, or of identifying the reasons why hyperlinks may have been created in university websites, if those reasons can be inferred from the relationship between the source and target page types. When machine learning techniques were used to filter out of the hyperlink data those hyperlinks that may not have been created because of collaboration, there was an increased correlation between hyperlink data and other collaboration indicators. This emphasises the possibility of using machine learning methods to make hyperlink data a more reliable data source for webometric research. The reasons for university name mentions in the different web page types found in an academic institution’s website are broadly the same as the reasons for link creation, which means that classification based on inter-page relationships may also be used to improve name mention data for webometric research. Clustering research groups based on the text in their homepages may be useful for identifying research groups or departments with similar research interests. This may be valuable for policy makers in monitoring research fields, based on the sizes of identified clusters, and for identifying future collaborators, based on co-appearances in clusters, if identical research interests are a factor that can influence the choice of a future collaborator.
In conclusion, this thesis shows that machine learning techniques can be used to significantly improve the quality of hyperlink data for webometric research, and can also be used to analyse other web-based data to give additional insights that may be beneficial for webometric studies.
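The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the page type labels, the set of "relevant" source/target type pairs, the toy link list, the co-authorship counts and the choice of Pearson correlation are all assumptions made here for illustration. The sketch counts inter-university links per university pair, once over all links and once over only those links whose page types suggest collaboration, and compares how each set of counts correlates with a second collaboration indicator.

```python
from collections import Counter
from math import sqrt

# Hypothetical (source page type, target page type) pairs assumed to signal collaboration.
RELEVANT_PAIRS = {("research", "research"), ("staff", "research")}

# Made-up links: (source university, target university, source page type, target page type).
# In practice the page types would come from a trained classifier.
LINKS = (
    [("A", "B", "research", "research")] * 3
    + [("A", "C", "admin", "homepage")] * 5
    + [("C", "A", "research", "research")]
    + [("B", "C", "staff", "research")] * 2
    + [("C", "B", "news", "homepage")] * 2
)

# Made-up second indicator, e.g. co-authored paper counts per university pair.
COAUTHORSHIP = {("A", "B"): 10, ("A", "C"): 2, ("B", "C"): 5}


def count_links(links, keep=None):
    """Count links per unordered university pair, optionally keeping only given type pairs."""
    counts = Counter()
    for src, dst, src_type, dst_type in links:
        if src == dst:
            continue  # ignore links within one university
        if keep is not None and (src_type, dst_type) not in keep:
            continue  # drop links whose page types are not in the kept set
        counts[tuple(sorted((src, dst)))] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def indicator_correlation(link_counts, indicator):
    """Correlate link counts with another indicator over the indicator's university pairs."""
    pairs = sorted(indicator)
    return pearson([link_counts.get(p, 0) for p in pairs], [indicator[p] for p in pairs])


raw_r = indicator_correlation(count_links(LINKS), COAUTHORSHIP)
filtered_r = indicator_correlation(count_links(LINKS, keep=RELEVANT_PAIRS), COAUTHORSHIP)
print(f"raw: {raw_r:.3f}  filtered: {filtered_r:.3f}")
```

On this toy data the filtered counts track the second indicator more closely than the raw counts do, which mirrors the kind of comparison the abstract reports. The numbers themselves carry no empirical meaning.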
Issue Date:
Sep-2014
URI:
http://hdl.handle.net/2436/338261
Type:
Thesis
Language:
en
Description:
A thesis submitted
Appears in Collections:
E-Theses

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kenekayoro, Patrick | en_GB
dc.date.accessioned | 2015-01-14T15:34:45Z | -
dc.date.available | 2015-01-14T15:34:45Z | -
dc.date.issued | 2014-09 | -
dc.identifier.uri | http://hdl.handle.net/2436/338261 | -
dc.description | A thesis submitted | en_GB
dc.description.abstract | Collaboration is essential for some types of research, which is why some agencies include collaboration among the requirements for funding research projects. Studying collaborative relationships is important because analyses of collaboration networks can give insights into knowledge-based innovation systems, the roles that different organisations play in a research field and the relationships between scientific disciplines. Co-authored publication data is widely used to investigate collaboration between organisations, but this data is not free and thus may not be accessible to some researchers. Hyperlinks have some similarities with citations, so hyperlink data may be used as an indicator to estimate the extent of collaboration between academic institutions and may be able to show types of relationships that are not present in co-authorship data. However, it has been shown that using raw hyperlink counts for webometric research can sometimes produce unreliable results, so researchers have attempted to find alternative counting methods and have tried to identify the reasons why hyperlinks may have been created in academic websites. This thesis uses machine learning techniques, an approach that has not previously been widely used in webometric research, to automatically classify hyperlinks and text in university websites in an attempt to filter out irrelevant hyperlinks when investigating collaboration between academic institutions. Supervised machine learning methods were used to automatically classify the web page types that can be found in Higher Education Institutions’ websites. The results were assessed to see whether automatically filtered hyperlink data gave better results than raw hyperlink data in terms of identifying patterns of collaboration between UK universities. Unsupervised learning methods were used to automatically identify groups of university departments that are collaborating or that may benefit from collaborating, based on their co-appearance in research clusters. Results show that the machine learning methods used in this thesis can automatically identify both the source and target web page categories of hyperlinks in university websites with up to 78% accuracy. This increases the possibility of more effective hyperlink classification, or of identifying the reasons why hyperlinks may have been created in university websites, if those reasons can be inferred from the relationship between the source and target page types. When machine learning techniques were used to filter out of the hyperlink data those hyperlinks that may not have been created because of collaboration, there was an increased correlation between hyperlink data and other collaboration indicators. This emphasises the possibility of using machine learning methods to make hyperlink data a more reliable data source for webometric research. The reasons for university name mentions in the different web page types found in an academic institution’s website are broadly the same as the reasons for link creation, which means that classification based on inter-page relationships may also be used to improve name mention data for webometric research. Clustering research groups based on the text in their homepages may be useful for identifying research groups or departments with similar research interests. This may be valuable for policy makers in monitoring research fields, based on the sizes of identified clusters, and for identifying future collaborators, based on co-appearances in clusters, if identical research interests are a factor that can influence the choice of a future collaborator. In conclusion, this thesis shows that machine learning techniques can be used to significantly improve the quality of hyperlink data for webometric research, and can also be used to analyse other web-based data to give additional insights that may be beneficial for webometric studies. | en_GB
dc.language.iso | en | en
dc.subject | collaboration | en_GB
dc.subject | university | en_GB
dc.subject | machine learning | en_GB
dc.subject | supervised learning | en_GB
dc.subject | unsupervised learning | en_GB
dc.subject | webometrics | en_GB
dc.subject | link analysis | en_GB
dc.subject | co-word analysis | en_GB
dc.subject | classification | en_GB
dc.subject | clustering | en_GB
dc.title | Collaboration between UK Universities: A machine-learning based webometric analysis | en
dc.type | Thesis | en
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.