| Title: | Encyclopaedic Question Answering |
| Authors: | Dornescu, Iustin |
| Advisors: | Orasan, Constantin Dr.; Mitkov, Ruslan Prof.; Navigli, Roberto Prof. |
| Publisher: | University of Wolverhampton |
| Issue Date: | Feb-2012 |
| URI: | http://hdl.handle.net/2436/254613 |
| Abstract: | Open-domain question answering (QA) is an established NLP task which enables users
to search for specific pieces of information in large collections of texts. Instead of
using keyword-based queries and a standard information retrieval engine, QA systems
allow the use of natural language questions and return the exact answer (or a list of
plausible answers) with supporting snippets of text. In the past decade, open-domain QA
research has been dominated by evaluation fora such as TREC and CLEF, where shallow
techniques relying on information redundancy have achieved very good performance.
However, this performance is generally limited to simple factoid and definition questions
because the answer is usually explicitly present in the document collection. Current
approaches are much less successful in finding implicit answers and are difficult to adapt
to more complex question types which are likely to be posed by users.
In order to advance the field of QA, this thesis proposes a shift in focus from simple factoid
questions to encyclopaedic questions: list questions composed of several constraints.
These questions have more than one correct answer which usually cannot be extracted
from one small snippet of text. To correctly interpret the question, systems need to
combine classic knowledge-based approaches with advanced NLP techniques. To find
and extract answers, systems need to aggregate atomic facts from heterogeneous sources
as opposed to simply relying on keyword-based similarity. Encyclopaedic questions
promote QA systems which use basic reasoning, making them more robust and easier
to extend with new types of constraints and new types of questions. A novel semantic
architecture is proposed which represents a paradigm shift in open-domain QA system
design, using semantic concepts and knowledge representation instead of words and
information retrieval. The architecture consists of two phases, analysis – responsible for
interpreting questions and finding answers, and feedback – responsible for interacting
with the user.
This architecture provides the basis for EQUAL, a semantic QA system developed
as part of the thesis, which uses Wikipedia as a source of world knowledge and
employs simple forms of open-domain inference to answer encyclopaedic questions.
EQUAL combines the output of a syntactic parser with semantic information from
Wikipedia to analyse questions. To address natural language ambiguity, the system
builds several formal interpretations containing the constraints specified by the user
and addresses each interpretation in parallel. To find answers, the system then tests
these constraints individually for each candidate answer, considering information from
different documents and/or sources. The correctness of an answer is not proved using a
logical formalism; instead, a confidence-based measure is employed. This measure reflects
the validation of constraints from raw natural language, automatically extracted entities,
relations and available structured and semi-structured knowledge from Wikipedia and the
Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia
link graph to find relevant information. This method achieves good precision and allows
only pages of a certain type to be considered, but is affected by the incompleteness of the
existing markup targeted towards human readers. In order to address this, a semantic
analysis module which disambiguates entities is developed to enrich Wikipedia articles
with additional links to other pages. The module increases recall, enabling the system
to rely more on the link structure of Wikipedia than on word-based similarity between
pages. It also allows authoritative information from different sources to be linked to the
encyclopaedia, further enhancing the coverage of the system.
The viability of the proposed approach was evaluated in an independent setting by
participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL
outperformed standard textual QA systems as well as semi-automatic approaches. Having
established a feasible way forward for the design of open-domain QA systems, future
work will attempt to further improve performance by taking advantage of recent advances
in information extraction and knowledge representation, as well as by experimenting
with formal reasoning and inferencing capabilities. |
| Type: | Thesis or dissertation |
| Language: | en |
| Keywords: | question answering; semantic architecture; semantic question answering; named entity disambiguation; Natural Language Processing; question interpretation; question decomposition; wikification; wikipedia |
| Appears in Collections: | E-Theses
|
| Files in This Item: |
| File | Description | Size | Format | View/Open |
| dornescu_PhDtheses.pdf | | 3420Kb | Adobe PDF | View/Open |
|
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.