Tuning language representation models for classification of Turkish news
Abstract
Pre-trained language representation models are highly effective at learning language representations independently of the downstream natural language processing task. Language representation models such as BERT and DistilBERT have achieved remarkable results in many language understanding tasks. Studies on text classification in the literature are generally carried out for the English language. This study aims to classify Turkish-language news using pre-trained language representation models. We fine-tune both BERT and DistilBERT for the text classification task to learn the categories of Turkish news with different tokenization methods. We provide a quantitative analysis of the performance of BERT and DistilBERT on the Turkish news dataset, comparing the models in terms of their representation capability on the text classification task. The highest performance is obtained with DistilBERT, with an accuracy of 97.4%.
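The abstract describes fine-tuning pre-trained BERT/DistilBERT models for Turkish news category classification. The paper's exact pipeline is not reproduced here; the sketch below is a minimal illustration of that general approach using the Hugging Face Transformers library, where the checkpoint name, the category labels, and all hyperparameters are illustrative assumptions rather than the authors' settings.

```python
# Minimal sketch (not the authors' exact pipeline): fine-tuning a pre-trained
# DistilBERT checkpoint for Turkish news classification. Checkpoint, labels,
# and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dbmdz/distilbert-base-turkish-cased"  # assumed Turkish checkpoint
LABELS = ["economy", "sports", "politics"]          # hypothetical categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# Toy examples standing in for the Turkish news dataset.
texts = ["Borsa güne yükselişle başladı.", "Takım sezonun ilk maçını kazandı."]
labels = torch.tensor([0, 1])
enc = tokenizer(texts, padding=True, truncation=True, max_length=128,
                return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    out = model(**enc, labels=labels)  # cross-entropy loss over categories
    out.loss.backward()
    optimizer.step()

# Inference: predict the category of an unseen headline.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("Merkez bankası faiz kararını açıkladı.",
                               return_tensors="pt")).logits
print(LABELS[int(logits.argmax(dim=-1))])
```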
Citation
Tokgöz, M., Turhan, F., Bölücü, N. and Can, B. (2021) Tuning language representation models for classification of Turkish news. ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering, pp. 402–407. https://doi.org/10.1145/3459104.3459170
Publisher
ACM
Additional Links
https://dl.acm.org/doi/10.1145/3459104.3459170
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of a paper published by ACM in the 2021 International Symposium on Electrical, Electronics and Information Engineering proceedings on 19/02/2021, available online at https://doi.org/10.1145/3459104.3459170. The accepted manuscript may differ from the final published version.
ISBN
9781450389839
DOI
10.1145/3459104.3459170
Except where otherwise noted, this item is licensed under https://creativecommons.org/licenses/by-nc-nd/4.0/