Loading...
Thumbnail Image
Item

Automatic identification and translation of multiword expressions

Taslimipoor, Shiva
Editors
Other contributors
Affiliation
Epub Date
Issue Date
2018-06-30
Submitted date
Subjects
Alternative
Abstract
Multiword Expressions (MWEs) belong to a class of phraseological phenomena that is ubiquitous in the study of language. They are heterogeneous lexical items consisting of more than one word and feature lexical, syntactic, semantic and pragmatic idiosyncrasies. Scholarly research on MWEs benefits both natural language processing (NLP) applications and end users. This thesis involves designing new methodologies to identify and translate MWEs. In order to deal with MWE identification, we first develop datasets of annotated verb-noun MWEs in context. We then propose a method which employs word embeddings to disambiguate between literal and idiomatic usages of the verb-noun expressions. Existence of expression types with various idiomatic and literal distributions leads us to re-examine their modelling and evaluation. We propose a type-aware train and test splitting approach to prevent models from overfitting and avoid misleading evaluation results. Identification of MWEs in context can be modelled with sequence tagging methodologies. To this end, we devise a new neural network architecture, which is a combination of convolutional neural networks and long-short term memories with an optional conditional random field layer on top. We conduct extensive evaluations on several languages demonstrating a better performance compared to the state-of-the-art systems. Experiments show that the generalisation power of the model in predicting unseen MWEs is significantly better than previous systems. In order to find translations for verb-noun MWEs, we propose a bilingual distributional similarity approach derived from a word embedding model that supports arbitrary contexts. The technique is devised to extract translation equivalents from comparable corpora which are an alternative resource to costly parallel corpora. We finally conduct a series of experiments to investigate the effects of size and quality of comparable corpora on automatic extraction of translation equivalents.
Citation
Publisher
Journal
Research Unit
DOI
PubMed ID
PubMed Central ID
Embedded videos
Additional Links
Type
Thesis or dissertation
Language
en
Description
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Series/Report no.
ISSN
EISSN
ISBN
ISMN
Gov't Doc #
Sponsors
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
Research Projects
Organizational Units
Journal Issue
Embedded videos