Unsupervised morphological segmentation using neural word embeddings

Can, Burcu
Üstün, Ahmet
Abstract
We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. To capture word meanings, word embeddings are obtained from a two-level neural network [11]. Semantic similarity between words, computed from these neural word embeddings, forms our baseline segmentation model. We then model morphotactics with a bigram language model whose maximum likelihood estimates are derived from the initial segmentations produced by the baseline. Results show that using semantic features helps to improve morphological segmentation, especially in agglutinating languages like Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
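The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix-stripping loop, the 0.25 similarity threshold, the toy three-dimensional vectors, and the function names `baseline_segment` and `bigram_mle` are all assumptions made for illustration, since this record does not describe the actual procedure. The first function splits a word wherever a shorter prefix is a known word with a similar embedding; the second estimates bigram probabilities over morphs by maximum likelihood from such segmentations.

```python
import math
from collections import Counter

# Toy embeddings (hypothetical values); real vectors would come from a trained model.
TOY_VECTORS = {
    "evlerde": [0.9, 0.1, 0.2],   # "in the houses"
    "evler":   [0.8, 0.2, 0.1],   # "houses"
    "ev":      [0.7, 0.3, 0.1],   # "house"
    "evet":    [-0.5, 0.8, -0.3], # "yes": shares the prefix "ev" but not its meaning
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def baseline_segment(word, vectors, threshold=0.25):
    """Strip characters from the right and split wherever the shorter form
    is a known word that is semantically similar to the current form.
    The threshold is an assumed value, not taken from the record."""
    if word not in vectors:
        return [word]
    suffixes = []
    current = word
    for end in range(len(word) - 1, 0, -1):
        candidate = word[:end]
        if candidate in vectors and cosine(vectors[current], vectors[candidate]) >= threshold:
            suffixes.insert(0, current[end:])
            current = candidate
    return [current] + suffixes

def bigram_mle(segmentations):
    """Maximum likelihood bigram estimates P(m_i | m_{i-1}) over morph sequences."""
    context_counts, bigram_counts = Counter(), Counter()
    for morphs in segmentations:
        seq = ["<s>"] + morphs + ["</s>"]
        context_counts.update(seq[:-1])          # every token that has a successor
        bigram_counts.update(zip(seq, seq[1:]))  # adjacent morph pairs
    return {pair: n / context_counts[pair[0]] for pair, n in bigram_counts.items()}

if __name__ == "__main__":
    segmented = [baseline_segment(w, TOY_VECTORS) for w in TOY_VECTORS]
    print(segmented)
    print(bigram_mle(segmented))
```

In this sketch "evet" stays unsegmented because its vector is dissimilar to that of "ev", which is the kind of case where a semantic check can differ from a purely orthographic one.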
Citation
Üstün A., Can B. (2016) Unsupervised Morphological Segmentation Using Neural Word Embeddings. In: Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_4
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of an article published by Springer in Král P., Martín-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918 on 21/09/2016, available online: https://doi.org/10.1007/978-3-319-45925-7_4
The accepted version of the publication may differ from the final published version.
Series/Report no.
Lecture Notes in Computer Science, vol 9918
ISSN
0302-9743
EISSN
1611-3349
ISBN
9783319459240