Tuning language representation models for classification of Turkish news
Tokgöz, Meltem ; Turhan, Fatmanur ; Bölücü, Necva ; Can, Burcu
Issue Date
2021-02-19
Abstract
Pre-trained language representation models are highly effective at learning language representations independently of the downstream natural language processing task to be performed. Models such as BERT and DistilBERT have achieved impressive results in many language understanding tasks. However, text classification studies in the literature are generally carried out for English. This study aims to classify Turkish-language news using pre-trained language representation models. We fine-tune both BERT and DistilBERT for the text classification task to learn the categories of Turkish news, using different tokenization methods. We provide a quantitative analysis of the performance of BERT and DistilBERT on the Turkish news dataset, comparing the models in terms of their representation capability in the text classification task. The highest performance is obtained with DistilBERT, with an accuracy of 97.4%.
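For illustration, the following is a minimal sketch of the kind of fine-tuning setup the abstract describes, written with the Hugging Face transformers library. The checkpoint name, the number of news categories, and the example sentence and label are assumptions for demonstration, not the paper's exact configuration.

# A minimal sketch of fine-tuning a Turkish DistilBERT model for news
# classification. Checkpoint, label count, and hyperparameters are
# illustrative assumptions, not the authors' exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A publicly available Turkish DistilBERT checkpoint (assumed; the paper's
# pre-trained model may differ).
MODEL_NAME = "dbmdz/distilbert-base-turkish-cased"
NUM_CATEGORIES = 5  # assumed number of news categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CATEGORIES
)

# Tokenize one news sentence; during fine-tuning, passing `labels` makes
# the model also return a classification loss.
inputs = tokenizer(
    "Galatasaray, Süper Lig'de liderliğini sürdürdü.",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
labels = torch.tensor([0])  # e.g. index of a "sport" category (assumed)
outputs = model(**inputs, labels=labels)

outputs.loss.backward()  # gradients for one optimization step
predicted = outputs.logits.argmax(dim=-1)
print(predicted.item())

In practice the same loop would run over the full Turkish news dataset with an optimizer and several epochs; swapping MODEL_NAME for a Turkish BERT checkpoint yields the BERT variant being compared.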
Citation
Tokgöz, M., Turhan, F., Bölücü, N. and Can, B. (2021) Tuning language representation models for classification of Turkish news. ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering, pp. 402–407. https://doi.org/10.1145/3459104.3459170
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of a paper published by ACM in the 2021 International Symposium on Electrical, Electronics and Information Engineering proceedings on 19/02/2021, available online at https://doi.org/10.1145/3459104.3459170. The accepted manuscript may differ from the final published version.
ISBN
9781450389839