Tuning language representation models for classification of Turkish news
Abstract
Pre-trained language representation models are very effective at learning language representations independently of the natural language processing tasks they are later applied to. Language representation models such as BERT and DistilBERT have achieved strong results on many language understanding tasks. Text classification studies in the literature have generally been carried out for English. This study aims to classify Turkish news using pre-trained language representation models. We tune both BERT and DistilBERT for the text classification task to learn the categories of Turkish news, using different tokenization methods. We provide a quantitative analysis of the performance of BERT and DistilBERT on a Turkish news dataset, comparing the models in terms of their representation capability in the text classification task. The highest performance, an accuracy of 97.4%, is obtained with DistilBERT.
Citation
Tokgöz, M., Turhan, F., Bölücü, N. and Can, B. (2021) Tuning language representation models for classification of Turkish news. ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering, pp. 402–407. https://doi.org/10.1145/3459104.3459170
Description
This is an accepted manuscript of a paper published by ACM in the proceedings of the 2021 International Symposium on Electrical, Electronics and Information Engineering on 19/02/2021, available online at https://doi.org/10.1145/3459104.3459170. The accepted manuscript may differ from the final published version.
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/