Unsupervised morphological segmentation using neural word embeddings
Abstract: We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. To capture word meanings, we obtain word embeddings from a two-level neural network. We compute the semantic similarity between words using these neural word embeddings, which forms our baseline segmentation model. We then model morphotactics with a bigram language model based on maximum likelihood estimates, trained on the initial segmentations produced by the baseline. Results show that semantic features help to improve morphological segmentation, especially in agglutinating languages such as Turkish. Our method achieves performance competitive with other unsupervised morphological segmentation systems.
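The two components named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the suffix-peeling strategy, the similarity threshold of 0.5, the toy three-dimensional vectors, and the function names `baseline_segment`, `train_bigram`, and `sequence_logprob` are all hypothetical. The baseline accepts a split only when the candidate stem is in the vocabulary and its embedding is cosine-similar to the full form; the bigram model is then estimated by maximum likelihood from those initial segmentations, i.e. P(m_i | m_{i-1}) = c(m_{i-1}, m_i) / c(m_{i-1}).

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def baseline_segment(word, emb, threshold=0.5):
    """Hypothetical semantic baseline: repeatedly strip a suffix when the
    remaining stem is in the vocabulary and similar to the current form."""
    suffixes = []
    current = word
    while current in emb:
        accepted = False
        # Try the longest candidate stem first (i.e. the shortest suffix).
        for i in range(len(current) - 1, 0, -1):
            stem = current[:i]
            if stem in emb and cosine(emb[current], emb[stem]) >= threshold:
                suffixes.insert(0, current[i:])
                current = stem
                accepted = True
                break
        if not accepted:
            break
    return [current] + suffixes


def train_bigram(segmentations):
    """Maximum likelihood bigram model over morphemes with boundary markers."""
    counts = defaultdict(Counter)
    for morphs in segmentations:
        seq = ["<s>"] + morphs + ["</s>"]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {nxt: c / total for nxt, c in nexts.items()}
    return probs


def sequence_logprob(morphs, probs):
    """Log-probability of a morpheme sequence; -inf for unseen bigrams."""
    seq = ["<s>"] + morphs + ["</s>"]
    logp = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        p = probs.get(prev, {}).get(nxt, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
    return logp


# Toy embeddings (hypothetical values, not trained vectors) for Turkish
# "ev" (house), "evler" (houses), "evlerde" (in houses), plus an unrelated "e".
EMB = {
    "ev": [1.0, 0.0, 0.2],
    "evler": [0.9, 0.1, 0.3],
    "evlerde": [0.8, 0.2, 0.4],
    "e": [0.0, 1.0, 0.0],
}

segs = [baseline_segment(w, EMB) for w in ["evlerde", "evler", "ev"]]
PROBS = train_bigram(segs)
print(segs)
```

With these toy vectors the baseline keeps "ev" intact because "e" is dissimilar to it, and splits "evlerde" into stem plus suffixes; the bigram model then assigns zero probability to morpheme orders never seen in the baseline output. A real system would use trained embeddings, a tuned threshold, and smoothing for unseen bigrams.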
Citation: Üstün A., Can B. (2016) Unsupervised Morphological Segmentation Using Neural Word Embeddings. In: Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_4
Publisher: Springer International Publishing
Description: This is an accepted manuscript of an article published by Springer in Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918 on 21/09/2016, available online: https://doi.org/10.1007/978-3-319-45925-7_4. The accepted version of the publication may differ from the final published version.
Series/Report no.: Lecture Notes in Computer Science, vol 9918
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/