Average rating
Cast your vote
You can rate an item by clicking the amount of stars they wish to award to this item.
When enough users have cast their vote on this item, the average rating will also be shown.
Star rating
Your vote was cast
Thank you for your feedback
Thank you for your feedback
Authors
Dornescu, IustinAdvisors
Orasan, Constantin Dr.Mitkov, Ruslan Prof.
Navigli, Roberto Prof.
Issue Date
2012-02
Metadata
Show full item recordAbstract
Open-domain question answering (QA) is an established NLP task which enables users to search for speciVc pieces of information in large collections of texts. Instead of using keyword-based queries and a standard information retrieval engine, QA systems allow the use of natural language questions and return the exact answer (or a list of plausible answers) with supporting snippets of text. In the past decade, open-domain QA research has been dominated by evaluation fora such as TREC and CLEF, where shallow techniques relying on information redundancy have achieved very good performance. However, this performance is generally limited to simple factoid and deVnition questions because the answer is usually explicitly present in the document collection. Current approaches are much less successful in Vnding implicit answers and are diXcult to adapt to more complex question types which are likely to be posed by users. In order to advance the Veld of QA, this thesis proposes a shift in focus from simple factoid questions to encyclopaedic questions: list questions composed of several constraints. These questions have more than one correct answer which usually cannot be extracted from one small snippet of text. To correctly interpret the question, systems need to combine classic knowledge-based approaches with advanced NLP techniques. To Vnd and extract answers, systems need to aggregate atomic facts from heterogeneous sources as opposed to simply relying on keyword-based similarity. Encyclopaedic questions promote QA systems which use basic reasoning, making them more robust and easier to extend with new types of constraints and new types of questions. A novel semantic architecture is proposed which represents a paradigm shift in open-domain QA system design, using semantic concepts and knowledge representation instead of words and information retrieval. The architecture consists of two phases, analysis – responsible for interpreting questions and Vnding answers, and feedback – responsible for interacting with the user. This architecture provides the basis for EQUAL, a semantic QA system developed as part of the thesis, which uses Wikipedia as a source of world knowledge and iii employs simple forms of open-domain inference to answer encyclopaedic questions. EQUAL combines the output of a syntactic parser with semantic information from Wikipedia to analyse questions. To address natural language ambiguity, the system builds several formal interpretations containing the constraints speciVed by the user and addresses each interpretation in parallel. To Vnd answers, the system then tests these constraints individually for each candidate answer, considering information from diUerent documents and/or sources. The correctness of an answer is not proved using a logical formalism, instead a conVdence-based measure is employed. This measure reWects the validation of constraints from raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge from Wikipedia and the Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia link graph to Vnd relevant information. This method achieves good precision and allows only pages of a certain type to be considered, but is aUected by the incompleteness of the existing markup targeted towards human readers. In order to address this, a semantic analysis module which disambiguates entities is developed to enrich Wikipedia articles with additional links to other pages. The module increases recall, enabling the system to rely more on the link structure of Wikipedia than on word-based similarity between pages. It also allows authoritative information from diUerent sources to be linked to the encyclopaedia, further enhancing the coverage of the system. The viability of the proposed approach was evaluated in an independent setting by participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL outperformed standard textual QA systems as well as semi-automatic approaches. Having established a feasible way forward for the design of open-domain QA systems, future work will attempt to further improve performance to take advantage of recent advances in information extraction and knowledge representation, as well as by experimenting with formal reasoning and inferencing capabilities.Publisher
University of WolverhamptonType
Thesis or dissertationLanguage
enCollections
Related items
Showing items related by title, author, creator and subject.
-
Critical questions for WOLF: an evaluation of the use of a VLE in the teaching and assessment of English StudiesMiles, Rosie; Colbert, Benjamin; Wilson, Frank (Centre of Excellence in Learning and Teaching, 2005)
-
How common are explicit research questions in journal articles?Thelwall, Michael; Mas-Bleda, Amalia (MIT Press, 2020-12-01)Although explicitly labelled research questions seem to be central to some fields, others do not need them. This may confuse authors, editors, readers and reviewers of multidisciplinary research. This article assesses the extent to which research questions are explicitly mentioned in 17 out of 22 areas of scholarship from 2000 to 2018 by searching over a million full-text open access journal articles. Research questions were almost never explicitly mentioned (under 2%) by articles in engineering, physical, life and medical sciences, and were the exception (always under 20%) for the broad fields in which they were least rare: computing, philosophy, theology and social sciences. Nevertheless, research questions were increasingly mentioned explicitly in all fields investigated, despite a rate of 1.8% overall (1.1% after correcting for irrelevant matches). Other terminology for an article’s purpose may be more widely used instead, including aims, objectives, goals, hypotheses, and purposes, although no terminology occurs in a majority of articles in any broad field tested. Authors, editors, readers and reviewers should therefore be aware that the use of explicitly labelled research questions or other explicit research purpose terminology is non-standard in most or all broad fields, although it is becoming less rare. Keywords: Research purpose statements; research article structures; research questions; research aims; research goals.
-
Automatic question answering for medical MCQs: Can it go further than information retrieval?Ha, Le An; Yaneva, Viktoriya (RANLP, 2019-09-04)We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.