dc.contributor.author: Üstün, Ahmet
dc.contributor.author: Can, Burcu
dc.date.accessioned: 2020-09-08T13:02:45Z
dc.date.available: 2020-09-08T13:02:45Z
dc.date.issued: 2020-07-10
dc.identifier.citation: Üstün, A., & Can, B. (2020). Incorporating word embeddings in unsupervised morphological segmentation. Natural Language Engineering, 1-21. doi:10.1017/S1351324920000406 [en]
dc.identifier.issn: 1351-3249 [en]
dc.identifier.doi: 10.1017/S1351324920000406 [en]
dc.identifier.uri: http://hdl.handle.net/2436/623615
dc.description: This is an accepted manuscript of an article published by Cambridge University Press in Natural Language Engineering on 10/07/2020, available online: https://doi.org/10.1017/S1351324920000406 The accepted version of the publication may differ from the final published version. [en]
dc.description.abstract: © The Author(s), 2020. Published by Cambridge University Press. We investigate the use of semantic information for morphological segmentation, since words that are derived from each other remain semantically related. We use mathematical models such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, incorporating semantic information obtained from dense word vector representations. Our approach does not require any annotated data, which makes it fully unsupervised, and requires only a small amount of raw data together with pretrained word embeddings for training. The results show that using dense vector representations helps in morphological segmentation, especially for low-resource languages. We present results for Turkish, English, and German. Our semantic MLE model outperforms other unsupervised models for Turkish. Our proposed models could also be used for any other low-resource language with concatenative morphology. [en]
dc.description.sponsorship: This research was supported by TUBITAK (The Scientific and Technological Research Council of Turkey) with grant number 115E464. [en]
dc.format: application/pdf [en]
dc.language: en
dc.language.iso: en [en]
dc.publisher: Cambridge University Press (CUP) [en]
dc.relation.url: https://www.cambridge.org/core/journals/natural-language-engineering/article/incorporating-word-embeddings-in-unsupervised-morphological-segmentation/3737B441F4322D6A0CCD7F7D29B9D47D [en]
dc.subject: morphological segmentation [en]
dc.subject: unsupervised learning [en]
dc.subject: Bayesian learning [en]
dc.subject: low-resource language [en]
dc.title: Incorporating word embeddings in unsupervised morphological segmentation [en]
dc.type: Journal article [en]
dc.identifier.eissn: 1469-8110
dc.identifier.journal: Natural Language Engineering [en]
dc.date.updated: 2020-08-26T08:24:54Z
dc.date.accepted: 2020-06-15
rioxxterms.funder: TUBITAK [en]
rioxxterms.identifier.project: 115E464 [en]
rioxxterms.version: AM [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
rioxxterms.licenseref.startdate: 2021-01-10 [en]
dc.source.volume: 2020
dc.source.beginpage: 1
dc.source.endpage: 21
dc.description.version: Published version
refterms.dateFCD: 2020-09-08T13:01:34Z
refterms.versionFCD: AM
refterms.dateFOA: 2020-09-08T00:00:00Z


Files in this item

Name: Can_Incorporating_Neural_Word_ ...
Size: 911.8Kb
Format: PDF

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/