Clustering word roots syntactically

Original (Turkish) title: Sözcük Köklerinin Sözdizimsel Olarak Kümelenmesi
Authors: Ozturk, Mustafa Burak; Can, Burcu
Type: Conference contribution
Citation: Öztürk, M.B. and Can, B. (2016) Clustering word roots syntactically, 2016 24th Signal Processing and Communication Application Conference (SIU), 16-19 May 2016, Zonguldak, Turkey.
Date issued: 2016-06-23
Repository record dates: 2020-10-09, 2020-10-23
ISBN: 9781509016792
ISSN: 2165-0608
DOI: 10.1109/siu.2016.7496026
Handle: http://hdl.handle.net/2436/623733
Keywords: Turkish syntax; clustering; morphology

This is an accepted manuscript of an article published by IEEE in 2016 24th Signal Processing and Communication Application Conference (SIU) on 23/06/2016, available online at https://ieeexplore.ieee.org/document/7496026
The accepted version of the publication may differ from the final published version.

Abstract: Distributional representations of words are used for both syntactic and semantic tasks. This paper presents two methods for clustering word roots. In the first method, the distributional model word2vec [1], which is generally applied to full word forms, is used to cluster word roots: the distributional similarities of roots are modeled, and the roots are divided into syntactic categories (noun, verb, etc.). In the second method, two models are proposed: an information-theoretic model and a probabilistic model. Similarities between word roots are computed with a metric [8] based on mutual information and with a metric based on Jensen-Shannon divergence, and clustering is performed using these metrics. Clustering word roots plays a significant role in natural language processing applications such as machine translation and question answering, as well as in other applications that involve language generation. We obtained a purity of 0.92 on the resulting clusters.
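The abstract does not spell out how the Jensen-Shannon metric or the purity score are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. Each root is represented as a probability distribution over hypothetical suffix-category contexts (the counts in `contexts` are invented toy data), pairwise distances are Jensen-Shannon divergences with base-2 logarithms, and roots are grouped with average-linkage agglomerative clustering. That clustering procedure is a swapped-in choice because the paper's own procedure is not described here. Purity is then measured against hand-assigned gold tags. The names `normalize`, `js_divergence`, `average_linkage`, `cluster` and `purity` are illustrative helpers, not part of any published code.

```python
import math
from collections import Counter
from itertools import combinations


def normalize(counts):
    """Turn raw context-feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feat: c / total for feat, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    feats = set(p) | set(q)
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in feats}

    def kl_to_m(a):
        # KL(a || m); features with zero probability in `a` contribute nothing.
        return sum(a[f] * math.log2(a[f] / m[f]) for f in a if a[f] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def average_linkage(a, b, dists):
    """Mean pairwise JS divergence between the roots of two clusters."""
    total = sum(js_divergence(dists[x], dists[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(dists, k):
    """Greedy average-linkage agglomerative clustering down to k clusters."""
    clusters = [[root] for root in dists]
    while len(clusters) > k:
        # Merge the pair of clusters whose context distributions are closest.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], dists))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


def purity(clusters, gold):
    """Share of roots that carry their cluster's majority gold tag."""
    n = sum(len(c) for c in clusters)
    hits = sum(Counter(gold[r] for r in c).most_common(1)[0][1] for c in clusters)
    return hits / n


# Hypothetical toy data: counts of abstract suffix categories observed after each root.
contexts = {
    "ev":    Counter({"PLURAL": 8, "LOCATIVE": 6, "ACCUSATIVE": 5}),
    "kitap": Counter({"PLURAL": 7, "LOCATIVE": 3, "ACCUSATIVE": 6}),
    "okul":  Counter({"PLURAL": 4, "LOCATIVE": 9}),
    "gel":   Counter({"PAST": 9, "PROGRESSIVE": 7, "INFINITIVE": 3}),
    "git":   Counter({"PAST": 6, "PROGRESSIVE": 8, "INFINITIVE": 4}),
    "oku":   Counter({"PAST": 5, "PROGRESSIVE": 4, "INFINITIVE": 6}),
}
gold = {"ev": "noun", "kitap": "noun", "okul": "noun",
        "gel": "verb", "git": "verb", "oku": "verb"}

if __name__ == "__main__":
    dists = {root: normalize(c) for root, c in contexts.items()}
    result = cluster(dists, k=2)
    print([sorted(c) for c in result])
    print("purity:", purity(result, gold))  # purity: 1.0 on this separable toy data
```

With base-2 logarithms the divergence is bounded by 1, which is reached exactly when two roots share no contexts, so the toy nouns and verbs here end up in separate clusters. This deliberately separable data says nothing about the 0.92 purity reported above, and neither the word2vec model nor the mutual-information metric [8] is reproduced in this sketch.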