Definition modelling for English and Portuguese: a comparison between models and settings
Authors: Dimas Furtado, Anna Beatriz
Abstract: Definitions are key to many areas of knowledge. They convey meaning, relate to the conceptualization and naming of products, facilitate communication, provide clarity, and pervade all areas of human activity. Access to definitions is therefore essential for many professions, and it is crucial for translation and interpreting. Definition Modelling (DM) is the task of automatically generating definitions from embeddings. While most approaches to DM cover only English, this project aims to generate definitions for both Portuguese and English. In this research, DM is tackled as a sequence-to-sequence task with two deep-learning models. Experiments are performed in three settings (monolingual, cross-lingual, and multilingual) using various corpora and different embeddings. Given the lack of resources, the first dataset for Portuguese DM is developed. Both intrinsic and extrinsic evaluations are conducted. Results show that the pre-trained MT5 model yields better results than non-pre-trained models in monolingual settings. In addition, among the embeddings used with the non-pre-trained models, Flair embeddings fare better than both character-based and transformer-based embeddings. Human evaluation suggests that automatically generated glosses are useful for translators, although post-editing may be required to achieve optimal quality.
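As a rough illustration of what framing DM as a sequence-to-sequence task can look like, the sketch below turns (word, context, gloss) records into source/target string pairs of the kind a text-to-text model could be trained on. This is a minimal sketch under stated assumptions: the DMExample class, the to_seq2seq_pair helper, the input template, the "<en>"/"<pt>" language tags and the example glosses are all hypothetical and invented for illustration; they are not the dataset format or models used in this dissertation.

```python
from dataclasses import dataclass


@dataclass
class DMExample:
    """Hypothetical definition-modelling record: headword, usage context, gloss."""
    word: str
    context: str
    gloss: str
    lang: str  # assumed language code, e.g. "en" or "pt"


def to_seq2seq_pair(ex: DMExample) -> tuple[str, str]:
    # Hypothetical input template: a language tag, the headword and its context
    # are joined into one source string; the gloss is the target string.
    source = f"<{ex.lang}> define: {ex.word} | context: {ex.context}"
    return source, ex.gloss


# Invented toy records, one English and one Portuguese.
examples = [
    DMExample("bank", "She sat on the bank of the river.",
              "the land alongside a river", "en"),
    DMExample("banco", "Ele sentou-se no banco da praça.",
              "assento comprido para várias pessoas", "pt"),
]

pairs = [to_seq2seq_pair(ex) for ex in examples]
for src, tgt in pairs:
    print(src, "->", tgt)
```

Putting a language tag in the source string is one simple way to let a single model serve a multilingual setting; a monolingual setup could drop it.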
Citation: Dimas Furtado, A.B. (2022) Definition modelling for English and Portuguese: a comparison between models and settings. University of Wolverhampton. http://hdl.handle.net/2436/625071
Publisher: University of Wolverhampton
Type: Thesis or dissertation
Description: A report submitted in partial fulfilment of the requirements for the Masters in Technology for Translation and Interpreting degree.
The following licence applies to the copyright and re-use of this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International