Characters or morphemes: how to represent words?
dc.contributor.author | Üstün, Ahmet | |
dc.contributor.author | Kurfalı, Murathan | |
dc.contributor.author | Can, Burcu | |
dc.date.accessioned | 2020-09-03T10:18:12Z | |
dc.date.available | 2020-09-03T10:18:12Z | |
dc.date.issued | 2018 | |
dc.identifier.citation | Üstün, A., Kurfalı, M. and Can, B. (2018) Characters or morphemes: how to represent words? In, Proceedings of The Third Workshop on Representation Learning for NLP, Augenstein, I., Cao, K., He, H., Hill, F. et al. Stroudsburg, PA: Association for Computational Linguistics, pp. 144-153. | en |
dc.identifier.isbn | 9781948087438 | en |
dc.identifier.doi | 10.18653/v1/w18-3019 | en |
dc.identifier.uri | http://hdl.handle.net/2436/623576 | |
dc.description | © 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W18-3019 | en |
dc.description.abstract | In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic subword units affects the quality of the word representations positively. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations of a word and learns the representation of the word based on different segmentations that are weighted by an attention mechanism. We performed experiments on Turkish as a morphologically rich language and on English, which has a comparatively poorer morphology. The results show that morpheme-based models are better at learning word representations of morphologically complex languages than character-based and character n-gram level models, since the morphemes help to incorporate more syntactic knowledge in learning, which makes morpheme-based models better at syntactic tasks. | en |
dc.description.sponsorship | This research was supported by TUBITAK (The Scientific and Technological Research Council of Turkey) grant number 115E464. | en |
dc.format | application/pdf | en |
dc.language.iso | en | en |
dc.publisher | Association for Computational Linguistics | en |
dc.relation.url | https://www.aclweb.org/anthology/W18-3019/ | en |
dc.title | Characters or morphemes: how to represent words? | en |
dc.type | Conference contribution | en |
dc.date.updated | 2020-08-26T08:20:54Z | |
dc.conference.name | Proceedings of The Third Workshop on Representation Learning for NLP | |
pubs.finish-date | 2018-07 | |
pubs.start-date | 2018-07 | |
rioxxterms.funder | TUBITAK | en |
rioxxterms.identifier.project | 115E464 | en |
rioxxterms.version | VoR | en |
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by/4.0/ | en |
rioxxterms.licenseref.startdate | 2020-09-03 | en |
dc.description.version | Published version | |
refterms.dateFCD | 2020-09-03T10:16:49Z | |
refterms.versionFCD | VoR | |
refterms.dateFOA | 2020-09-03T00:00:00Z |
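
The abstract describes a model that takes several candidate segmentations of a word and combines them with attention weights. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the toy vectors, the dot-product scoring against `attention_query`, and the composition of a segmentation by summing its morpheme vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segmentation_vector(morphemes, morpheme_vectors):
    """Compose one segmentation by summing its morpheme vectors (assumed composition)."""
    dim = len(next(iter(morpheme_vectors.values())))
    vec = [0.0] * dim
    for morpheme in morphemes:
        for i, value in enumerate(morpheme_vectors[morpheme]):
            vec[i] += value
    return vec


def word_vector(segmentations, morpheme_vectors, attention_query):
    """Weight each candidate segmentation by attention and average them."""
    seg_vecs = [segmentation_vector(s, morpheme_vectors) for s in segmentations]
    # Assumed scoring: dot product between a query vector and each segmentation vector.
    scores = [sum(q * x for q, x in zip(attention_query, v)) for v in seg_vecs]
    weights = softmax(scores)
    dim = len(attention_query)
    word = [sum(w * v[i] for w, v in zip(weights, seg_vecs)) for i in range(dim)]
    return word, weights


# Toy, hand-picked vectors for the Turkish word "evlerde" ("in the houses").
morpheme_vectors = {
    "ev": [1.0, 0.0],
    "ler": [0.0, 1.0],
    "de": [0.5, 0.5],
    "evler": [1.0, 1.0],
    "evlerde": [0.2, 0.1],
}
segmentations = [["ev", "ler", "de"], ["evler", "de"], ["evlerde"]]
word, weights = word_vector(segmentations, morpheme_vectors, [1.0, 1.0])
```

In a trained model the morpheme vectors and the attention parameters would be learned rather than fixed by hand; here they are chosen so that the two segmentations with the same summed vector receive equal weight and the unsegmented form receives less.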