USE OF LANGUAGE TECHNOLOGY TO IMPROVE MATCHING AND RETRIEVAL IN TRANSLATION MEMORY

Handle:
http://hdl.handle.net/2436/620338
Title:
USE OF LANGUAGE TECHNOLOGY TO IMPROVE MATCHING AND RETRIEVAL IN TRANSLATION MEMORY
Authors:
Gupta, Rohit
Abstract:
Current Translation Memory (TM) tools lack semantic knowledge when matching. Most TM tools compute similarity at the string level, which ignores the semantic aspects of a match. As a result, segments that are semantically similar but differ in surface form are often not retrieved. In this thesis, we present five novel and efficient approaches to incorporate advanced semantic knowledge in translation memory matching and retrieval. Two efficient approaches that use a paraphrase database to improve translation memory matching and retrieval are presented. Both automatic and human evaluations are conducted, and the results of both show that paraphrasing improves matching and retrieval. An approach based on manually designed features extracted using NLP systems and resources is presented, in which a Support Vector Machine (SVM) regression model is trained to calculate the similarity between two segments. This approach did not retrieve better matches than simple edit distance. Two approaches for retrieving segments from a TM using deep learning are investigated: the first is based on Long Short-Term Memory (LSTM) networks, and the second on Tree-Structured Long Short-Term Memory (Tree-LSTM) networks. Eight models are trained using different datasets and settings. The results are comparable to a baseline that uses simple edit distance.
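The string-level matching that the abstract takes as its baseline can be illustrated with a minimal sketch. The abstract does not specify the exact metric, so the code assumes a common formulation: word-level Levenshtein distance normalised by the length of the longer segment. The function names `edit_distance` and `fuzzy_score` and the example segments are hypothetical and are not taken from the thesis.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the tokens of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(source, candidate):
    """Assumed fuzzy-match score in [0, 1]: 1 - distance / longer length."""
    a, b = source.lower().split(), candidate.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Hypothetical segments: same meaning, different surface form.
    s1 = "the period laid down in article 4"
    s2 = "the period referred to in article 4"
    print(round(fuzzy_score(s1, s2), 2))
```

Under these assumptions the two example segments differ by two word substitutions out of seven tokens, so they score 5/7 (about 0.71) even though they mean the same thing. A tool with a retrieval threshold above that value would not return the match, which is the gap the paraphrase-based approaches are meant to close.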
Issue Date:
2016
URI:
http://hdl.handle.net/2436/620338
Type:
Thesis
Language:
en
Description:
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy
Appears in Collections:
E-Theses

Full metadata record

DC Field | Value | Language
dc.contributor.author | Gupta, Rohit | en
dc.date.accessioned | 2017-01-17T16:22:03Z | -
dc.date.available | 2017-01-17T16:22:03Z | -
dc.date.issued | 2016 | -
dc.identifier.uri | http://hdl.handle.net/2436/620338 | -
dc.description | A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy | en
dc.language.iso | en | en
dc.title | USE OF LANGUAGE TECHNOLOGY TO IMPROVE MATCHING AND RETRIEVAL IN TRANSLATION MEMORY | en
dc.type | Thesis | en
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.