Unsupervised Relation Extraction for E-Learning Applications

2.50
Hdl Handle:
http://hdl.handle.net/2436/299064
Title:
Unsupervised Relation Extraction for E-Learning Applications
Authors:
Afzal, Naveed
Abstract:
In this modern era many educational institutes and business organisations are adopting the e-Learning approach as it provides an effective method for educating and testing their students and staff. The continuous development in the area of information technology and increasing use of the internet has resulted in a huge global market and rapid growth for e-Learning. Multiple Choice Tests (MCTs) are a popular form of assessment and are quite frequently used by many e-Learning applications as they are well adapted to assessing factual, conceptual and procedural information. In this thesis, we present an alternative to the lengthy and time-consuming activity of developing MCTs by proposing a Natural Language Processing (NLP) based approach that relies on semantic relations extracted using Information Extraction to automatically generate MCTs. Information Extraction (IE) is an NLP field used to recognise the most important entities present in a text, and the relations between those concepts, regardless of their surface realisations. In IE, text is processed at a semantic level that allows the partial representation of the meaning of a sentence to be produced. IE has two major subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). In this work, we present two unsupervised RE approaches (surface-based and dependency-based). The aim of both approaches is to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. In the surface-based approach, we examined different surface pattern types, each implementing different assumptions about the linguistic expression of semantic relations between named entities while in the dependency-based approach we explored how dependency relations based on dependency trees can be helpful in extracting relations between named entities. Our findings indicate that the presented approaches are capable of achieving high precision rates. Our experiments make use of traditional, manually compiled corpora along with similar corpora automatically collected from the Web. We found that an automatically collected web corpus is still unable to ensure the same level of topic relevance as attained in manually compiled traditional corpora. Comparison between the surface-based and the dependency-based approaches revealed that the dependency-based approach performs better. Our research enabled us to automatically generate questions regarding the important concepts present in a domain by relying on unsupervised relation extraction approaches as extracted semantic relations allow us to identify key information in a sentence. The extracted patterns (semantic relations) are then automatically transformed into questions. In the surface-based approach, questions are automatically generated from sentences matched by the extracted surface-based semantic pattern which relies on a certain set of rules. Conversely, in the dependency-based approach questions are automatically generated by traversing the dependency tree of extracted sentence matched by the dependency-based semantic patterns. The MCQ systems produced from these surface-based and dependency-based semantic patterns were extrinsically evaluated by two domain experts in terms of questions and distractors readability, usefulness of semantic relations, relevance, acceptability of questions and distractors and overall MCQ usability. The evaluation results revealed that the MCQ system based on dependency-based semantic relations performed better than the surface-based one. A major outcome of this work is an integrated system for MCQ generation that has been evaluated by potential end users.
Advisors:
Mitkov, Ruslan
Publisher:
University of Wolverhampton
Issue Date:
2011
URI:
http://hdl.handle.net/2436/299064
Type:
Thesis or dissertation
Language:
en
Description:
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy
Appears in Collections:
E-Theses

Full metadata record

DC FieldValue Language
dc.contributor.advisorMitkov, Ruslanen_GB
dc.contributor.authorAfzal, Naveeden_GB
dc.date.accessioned2013-08-19T11:37:57Z-
dc.date.available2013-08-19T11:37:57Z-
dc.date.issued2011-
dc.identifier.urihttp://hdl.handle.net/2436/299064-
dc.descriptionA thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophyen_GB
dc.description.abstractIn this modern era many educational institutes and business organisations are adopting the e-Learning approach as it provides an effective method for educating and testing their students and staff. The continuous development in the area of information technology and increasing use of the internet has resulted in a huge global market and rapid growth for e-Learning. Multiple Choice Tests (MCTs) are a popular form of assessment and are quite frequently used by many e-Learning applications as they are well adapted to assessing factual, conceptual and procedural information. In this thesis, we present an alternative to the lengthy and time-consuming activity of developing MCTs by proposing a Natural Language Processing (NLP) based approach that relies on semantic relations extracted using Information Extraction to automatically generate MCTs. Information Extraction (IE) is an NLP field used to recognise the most important entities present in a text, and the relations between those concepts, regardless of their surface realisations. In IE, text is processed at a semantic level that allows the partial representation of the meaning of a sentence to be produced. IE has two major subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). In this work, we present two unsupervised RE approaches (surface-based and dependency-based). The aim of both approaches is to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. In the surface-based approach, we examined different surface pattern types, each implementing different assumptions about the linguistic expression of semantic relations between named entities while in the dependency-based approach we explored how dependency relations based on dependency trees can be helpful in extracting relations between named entities. Our findings indicate that the presented approaches are capable of achieving high precision rates. Our experiments make use of traditional, manually compiled corpora along with similar corpora automatically collected from the Web. We found that an automatically collected web corpus is still unable to ensure the same level of topic relevance as attained in manually compiled traditional corpora. Comparison between the surface-based and the dependency-based approaches revealed that the dependency-based approach performs better. Our research enabled us to automatically generate questions regarding the important concepts present in a domain by relying on unsupervised relation extraction approaches as extracted semantic relations allow us to identify key information in a sentence. The extracted patterns (semantic relations) are then automatically transformed into questions. In the surface-based approach, questions are automatically generated from sentences matched by the extracted surface-based semantic pattern which relies on a certain set of rules. Conversely, in the dependency-based approach questions are automatically generated by traversing the dependency tree of extracted sentence matched by the dependency-based semantic patterns. The MCQ systems produced from these surface-based and dependency-based semantic patterns were extrinsically evaluated by two domain experts in terms of questions and distractors readability, usefulness of semantic relations, relevance, acceptability of questions and distractors and overall MCQ usability. The evaluation results revealed that the MCQ system based on dependency-based semantic relations performed better than the surface-based one. A major outcome of this work is an integrated system for MCQ generation that has been evaluated by potential end users.en_GB
dc.language.isoenen
dc.publisherUniversity of Wolverhamptonen
dc.subjectE-Learningen_GB
dc.subjectAutomatic Generation of Multiple-Choice Questionsen_GB
dc.subjectInformation Extractionen_GB
dc.subjectRelation Extractionen_GB
dc.subjectBiomedical Domainen_GB
dc.subjectDependency Treeen_GB
dc.titleUnsupervised Relation Extraction for E-Learning Applicationsen_GB
dc.typeThesis or dissertationen
dc.type.qualificationnamePhDen
dc.type.qualificationlevelDoctoralen
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.