Clustering word roots syntactically
Ozturk, Mustafa Burak ; Can, Burcu
Authors
Issue Date
2016-06-23
Alternative
Sözcük Köklerinin Sözdizimsel Olarak Kümelenmesi
Abstract
Distributional representations of words are used for both syntactic and semantic tasks. This paper presents two methods for clustering word roots. The first method applies the distributional model word2vec [1] to word roots, although distributional approaches are generally applied to full words: the distributional similarities of roots are modeled, and the roots are divided into syntactic categories (noun, verb, etc.). The second method proposes two models, an information-theoretic model and a probabilistic model. Similarities between word roots are computed with a metric [8] based on mutual information and with another metric based on Jensen-Shannon divergence, and the roots are clustered using these metrics. Clustering word roots plays a significant role in natural language processing applications such as machine translation and question answering, and in other applications that involve language generation. The resulting clusters reach a purity of 0.92.
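As a rough illustration of two quantities named in the abstract, the sketch below computes the Jensen-Shannon divergence between two context distributions and the purity of a clustering. It is not the authors' implementation: the function names, the dictionary-based representation of distributions, and the toy roots and labels are all assumptions made for this example, and the record does not specify how the paper builds its distributions or clusters.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as dicts mapping outcome -> probability."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(A || M); terms with zero probability contribute nothing
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def purity(clusters, gold):
    """Purity of a clustering: the fraction of items that carry the
    majority gold label of their own cluster.

    clusters: list of lists of items
    gold: dict mapping item -> gold label
    """
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(gold[x] for x in c).values()) for c in clusters if c
    )
    return majority / total


# Hypothetical toy data: Turkish roots with assumed syntactic labels.
gold_labels = {"ev": "noun", "kitap": "noun", "gel": "verb",
               "git": "verb", "oku": "verb"}
toy_clusters = [["ev", "kitap", "gel"], ["git", "oku"]]
toy_purity = purity(toy_clusters, gold_labels)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with disjoint support), so a smaller value indicates that two roots occur in more similar contexts. In the toy data, four of the five roots match the majority label of their cluster, so `toy_purity` is 0.8.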
Citation
Öztürk, M.B. and Can, B. (2016) Clustering word roots syntactically, 2016 24th Signal Processing and Communication Application Conference (SIU), 16-19 May 2016, Zonguldak, Turkey.
Type
Conference contribution
Language
other
Description
This is an accepted manuscript of an article published by IEEE in 2016 24th Signal Processing and Communication Application Conference (SIU) on 23/06/2016, available online: https://ieeexplore.ieee.org/document/7496026
The accepted version of the publication may differ from the final published version.
ISSN
2165-0608
ISBN
9781509016792