Unsupervised morphological segmentation using neural word embeddings
Abstract
We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. In order to capture word meanings, word embeddings are obtained from a two-level neural network [11]. We compute the semantic similarity between words using the neural word embeddings, which forms our baseline segmentation model. We model morphotactics with a bigram language model based on maximum likelihood estimates by using the initial segmentations from the baseline. Results show that using semantic features helps to improve morphological segmentation especially in agglutinating languages like Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
Citation
Üstün A., Can B. (2016) Unsupervised Morphological Segmentation Using Neural Word Embeddings. In: Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_4
Publisher
Springer International Publishing
Additional Links
https://link.springer.com/chapter/10.1007%2F978-3-319-45925-7_4
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of an article published by Springer in Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918, on 21/09/2016, available online: https://doi.org/10.1007/978-3-319-45925-7_4. The accepted version of the publication may differ from the final published version.
Series/Report no.
Lecture Notes in Computer Science, vol 9918
ISSN
0302-9743
EISSN
1611-3349
ISBN
9783319459240
DOI
10.1007/978-3-319-45925-7_4
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/