Recent Submissions

  • Multiword units in machine translation and translation technology

    Mitkov, Ruslan; Monti, Johanna; Corpas Pastor, Gloria; Seretan, Violeta (John Benjamins, 2018)
    The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing (NLP) but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but we believe that much more remains to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.
  • What matters more: the size of the corpora or their quality? The case of automatic translation of multiword expressions using comparable corpora

    Mitkov, Ruslan; Taslimipoor, Shiva (John Benjamins, 2016)
    This study investigates (and compares) the impact of the size and the similarity/quality of comparable corpora on the specific task of extracting translation equivalents of verb-noun collocations from such corpora. The comprehensive evaluation of different configurations of English and Spanish corpora sheds some light on the more general and perennial question: what matters more – the quantity or quality of corpora?
  • Intelligent Natural Language Processing: Trends and Applications

    Orăsan, Constantin; Evans, Richard; Mitkov, Ruslan (Springer, 2017)
    Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to an inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they were no less readable than texts produced more slowly through onerous unaided conversion, and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD.
  • A Single Chip System for Sensor Data Fusion Based on a Drift-diffusion Model

    Yang, Shufan; Wong-Lin, Kongfatt; Rano, Inaki; Lindsay, Anthony (IEEE, 2017-09-07)
    Current multisensory systems face data communication overhead when integrating disparate sensor data into a coherent and accurate global picture. We present here a novel hardware and software co-design platform for a heterogeneous data fusion solution based on a perceptual decision-making approach (the drift-diffusion model, DDM). It provides a convenient infrastructure for sensor data acquisition and data integration and uses only a single Xilinx Zynq-7000 XC7Z020 AP SoC chip. A case study of controlling the moving speed of a single ground-based robot according to the physiological state of the operator, based on heart rate, demonstrates the feasibility of an integrated sensor data fusion architecture. The results of our DDM-based data integration show a better correlation coefficient with the raw ECG signal than a simple piecewise approach.
  • A neuro-inspired visual tracking method based on programmable system-on-chip platform

    Yang, Shufan; Wong-Lin, KongFatt; Andrew, James; Mak, Terrence; McGinnity, T. Martin (Springer, 2017-01-20)
    Using a programmable system-on-chip to implement computer vision functions poses many challenges due to highly constrained resources in cost, size and power consumption. In this work, we propose a new neuro-inspired image processing model and implement it on a Xilinx ZC702 system-on-chip board. By using an attractor neural network model to store the object's contour information, we eliminate the computationally expensive curve-evolution re-initialisation step at every new iteration or frame. Our experimental results demonstrate that this integrated approach achieves accurate and robust object tracking even when objects are partially or completely occluded in the scene. Importantly, the system is able to process 640 × 480 video streams in real time at 30 frames per second using only one low-power Xilinx Zynq-7000 system-on-chip platform. This proof-of-concept work demonstrates the advantage of incorporating neuro-inspired features in solving image processing problems involving occlusion.
  • An intelligible implementation of FastSLAM2.0 on a low-power embedded architecture

    Jiménez Serrata, Albert A.; Yang, Shufan; Li, Renfa (Springer, 2017-03-02)
    The simultaneous localisation and mapping (SLAM) algorithm has drawn increasing interest in autonomous robotic systems. However, SLAM has not yet been widely explored in embedded system design spaces due to the limited processing resources of embedded systems. Especially when landmarks are not identifiable, the amount of computer processing increases dramatically due to unknown data association. In this work, we propose an intelligible SLAM solution for an embedded processing platform that reduces computer processing time using a low-variance resampling technique. Our prototype includes a low-cost Pixy camera, a robot kit with an L298N motor board and a Raspberry Pi V2.0. The prototype is able to recognise artificial landmarks in a real environment, identifying on average 75% of landmarks in corner detection and corridor detection while consuming on average only 1.14 W.
  • Reuse of scientific data in academic publications

    He, Lin; Nahar, Vinita (Emerald Group Publishing Limited, 2016-07-18)
    Purpose – In recent years, a large number of data repositories have been built and used. However, the extent to which scientific data is reused in academic publications is still unknown. This article explores the functions of reused scientific data in scholarly publications in different fields. Design/methodology/approach – To address these questions, we identified 827 publications citing resources in the Dryad Digital Repository (DDR) indexed by Scopus from 2010 to 2015. Findings – The results show that: (i) the number of citations to scientific data increased sharply over the years, but mainly from data-intensive disciplines, such as Agricultural and Biological Sciences, Environmental Science and Medicine; (ii) the majority of citations are from the originating articles; (iii) researchers tend to reuse data produced by their own research groups. Research limitations/implications – Data may be reused without being formally cited. Originality/value – The conservatism in data sharing suggests that more should be done to encourage researchers to reuse others' data.
  • Discovery of event entailment knowledge from text corpora

    Pekar, Viktor (Elsevier, 2008)
    Event entailment is knowledge that may prove useful for a variety of applications dealing with inferencing over events described in natural language texts. In this paper, we propose a method for automatic discovery of pairs of verbs related by entailment, such as X buy Y → X own Y and appoint X as Y → X become Y. In contrast to previous approaches that make use of lexico-syntactic patterns and distributional evidence, the underlying assumption of our method is that the implication of one event by another manifests itself in the regular co-occurrence of the two corresponding verbs within locally coherent text. Based on the analogy with the problem of learning selectional preferences, Resnik's [Resnik, P., 1993. Selection and information: a class-based approach to lexical relationships, Ph.D. Thesis, University of Pennsylvania] association strength measure is used to score the extracted verb pairs for asymmetric association in order to discover the direction of entailment in each pair. In our experimental evaluation, we examine the effect that various local discourse indicators have on the accuracy of this model of entailment. We then carry out a direct evaluation of the verb pairs against human subjects' judgements and evaluate the pairs extrinsically on the task of noun phrase coreference resolution.
  • Design and development of a concept-based multi-document summarization system for research abstracts

    Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Sage, 2008)
    This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps: (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
  • Automatic multidocument summarization of research abstracts: Design and user evaluation

    Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Wiley, 2007)
    The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
  • Multi-document summarization of news articles using an event-based framework

    Ou, Shiyan; Khoo, Christopher S.G.; Goh, Dion H. (Emerald, 2006)
    Purpose – The purpose of this research is to develop a method for automatic construction of multi-document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query. Design/methodology/approach – Based on cross-document discourse analysis, an event-based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information is given at the lower levels. A tree-view interface was implemented for displaying a multi-document summary based on the framework. A preliminary user evaluation was performed by comparing the framework-based summaries against the sentence-based summaries. Findings – In a small evaluation, all the human subjects preferred the framework-based summaries to the sentence-based summaries. This indicates that the event-based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events. Research limitations/implications – The framework is limited to event-based news articles and is not applicable to news critiques and other kinds of news articles. A summarization system based on the event-based framework is being implemented. Practical implications – Multi-document summarization of news articles can adopt the proposed event-based framework. Originality/value – An event-based framework for summarizing sets of news articles was developed and evaluated using a tree-view interface for displaying such summaries.
  • NP animacy identification for anaphora resolution

    Orasan, Constantin; Evans, Richard (American Association for Artificial Intelligence, 2007)
    In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in the cases where animate entities are identified with high precision.
  • A High Precision Information Retrieval Method for WiQA

    Orasan, Constantin; Puşcaşu, Georgiana (Springer, 2007)
    This paper presents the University of Wolverhampton's participation in the WiQA competition. The method chosen for this task combines a high-precision but low-recall information retrieval approach with a greedy sentence ranking algorithm. High precision in retrieval is ensured by querying the search engine with the exact topic, thereby obtaining only sentences which contain the topic. In one of the runs, the set of retrieved sentences is expanded using coreferential relations between sentences. The greedy ranking algorithm selects one sentence at a time, always choosing the one that adds the most information to the set of sentences without repeating too much of the existing information. The evaluation revealed that the method achieves a performance similar to that of other systems participating in the competition and that the run which uses coreference obtains the highest MRR (mean reciprocal rank) score among all the participants.
  • Anaphora Resolution: To What Extent Does It Help NLP Applications?

    Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2007)
  • Anaphora Resolution

    Mitkov, Ruslan (Longman, 2002)
  • A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method

    Mitkov, Ruslan; Evans, Richard; Orasan, Constantin (Springer, 2002)
    This paper describes a new, advanced and completely revamped version of Mitkov's knowledge-poor approach to pronoun resolution. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of non-nominal anaphora (including pleonastic pronouns) and for recognition of animacy, and employs genetic algorithms to achieve optimal performance. The paper features extensive evaluation and discusses important evaluation issues in anaphora resolution.
  • Refined Salience Weighting and Error Analysis in Anaphora Resolution

    Evans, Richard (The Research Group in Computational Linguistics, 2002)
    In this paper, the behaviour of an existing pronominal anaphora resolution system is modified so that different types of pronoun are treated in different ways. A genetic algorithm is used to derive weights for the outcomes of the tests applied by this branching algorithm. Detailed evaluation and error analysis are undertaken. Proposals for future research are put forward.
  • A framework for named entity recognition in the open domain

    Evans, Richard (John Benjamins Publishing Company, 2004)
  • A computer-aided environment for construction of multiple-choice tests

    Mitkov, Ruslan; Ha, Le An; Bernardes, Jon (University of Wolverhampton, 2005)
    Multiple-choice tests have proved to be an efficient tool for measuring students' achievement and are used on a daily basis both for assessment and diagnostics worldwide. The objective of this project was to provide an alternative to the lengthy and demanding activity of developing multiple-choice tests and to propose a new Natural Language Processing (NLP) based approach to generating tests from instructional texts (textbooks, encyclopaedias). Work on the pilot project has shown that the semi-automatic procedure is up to 3.8 times quicker than a completely manual one.
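The FastSLAM2.0 entry above attributes its reduced processing time to a low-variance resampling technique. The following Python sketch shows the textbook low-variance (systematic) resampling step of a particle filter, for illustration only: it is not the authors' embedded implementation, and the function name `low_variance_resample` and its parameters are assumptions.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def low_variance_resample(particles: Sequence[T], weights: Sequence[float],
                          rng: Optional[random.Random] = None) -> List[T]:
    """Low-variance (systematic) resampling for a particle filter.

    Illustrative sketch, not the FastSLAM2.0 authors' code. One random offset
    r is drawn from [0, 1/M); the M samples are then taken at the evenly
    spaced points r, r + 1/M, ..., r + (M-1)/M of the cumulative normalised
    weight distribution.
    """
    m = len(particles)
    if m != len(weights):
        raise ValueError("particles and weights must have the same length")
    if m == 0:
        return []
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    rng = rng or random.Random()
    step = 1.0 / m
    r = rng.random() * step          # single random draw for the whole pass
    i = 0
    c = weights[0] / total           # cumulative normalised weight up to particle i
    resampled: List[T] = []
    for k in range(m):
        u = r + k * step             # k-th evenly spaced pointer
        # Advance to the particle whose cumulative-weight interval contains u;
        # a zero-weight particle never stops the scan because c does not grow.
        while u >= c and i < m - 1:
            i += 1
            c += weights[i] / total
        resampled.append(particles[i])
    return resampled
```

The design choice that matters is the single random draw: because the M pointers are evenly spaced, a particle with normalised weight w is copied either ⌊Mw⌋ or ⌈Mw⌉ times, which keeps the variance introduced by resampling low while costing only O(M) time.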
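The single-chip sensor fusion entry above builds on the drift-diffusion model, in which noisy evidence is accumulated over time until it reaches one of two decision boundaries. The sketch below simulates one generic two-boundary trial with an Euler-Maruyama update; it illustrates the model only, not the authors' hardware/software design, and the names `simulate_ddm`, `drift`, `noise` and `threshold` are assumptions. In a fusion setting one might, hypothetically, derive the drift from combined sensor evidence such as heart-rate readings.

```python
import math
import random
from typing import Optional, Tuple


def simulate_ddm(drift: float, noise: float, threshold: float,
                 dt: float = 0.001, max_time: float = 10.0,
                 rng: Optional[random.Random] = None) -> Tuple[int, float]:
    """Simulate one trial of a generic two-boundary drift-diffusion process.

    Evidence x starts at 0 and is updated by the Euler-Maruyama step
        x <- x + drift * dt + noise * sqrt(dt) * N(0, 1)
    until it reaches +threshold (choice +1) or -threshold (choice -1).
    Returns (choice, decision_time); choice is 0 if neither boundary is
    reached within max_time.
    """
    rng = rng or random.Random()
    x = 0.0
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_time:
        # Deterministic drift plus Gaussian diffusion noise for this time step.
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return 1, t
        if x <= -threshold:
            return -1, t
    return 0, t
```

With `noise=0.0`, `drift=1.0` and `threshold=1.0`, a trial ends with choice `1` after roughly one time unit. Raising the threshold trades slower decisions for fewer noise-driven errors, which is the speed-accuracy trade-off the model is usually used to capture.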
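The WiQA entry above reports its best run in terms of MRR (mean reciprocal rank). A minimal Python sketch of the standard MRR definition follows; the official WiQA scoring may have differed in detail (for example, in how relevant sentences were judged), and the function name `mean_reciprocal_rank` is an assumption.

```python
from typing import Hashable, Sequence, Set


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         relevant: Sequence[Set[Hashable]]) -> float:
    """Mean reciprocal rank over a set of queries (standard definition).

    For each query, the reciprocal rank is 1/rank of the first relevant item
    in the system's ranking (ranks start at 1), or 0 if no relevant item is
    returned. MRR is the mean of these values over all queries.
    """
    if len(rankings) != len(relevant):
        raise ValueError("need one set of relevant items per ranking")
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, gold in zip(rankings, relevant):
        for rank, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(rankings)
```

For example, rankings `[["s1", "s2", "s3"], ["s4", "s5"], ["s6"]]` with relevant sets `[{"s2"}, {"s4"}, {"s9"}]` give (1/2 + 1 + 0) / 3 = 0.5.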
