Neural text normalization for Turkish social media
Authors
Goker, Sinan; Can, Burcu
Issue Date
2018-12-10
Abstract
Social media, with its worldwide use, has become a rich data source for natural language processing (NLP) tasks; however, its informal nature makes the data hard to process. Text normalization is the task of transforming noisy text into its canonical form. It generally serves as a preprocessing step for other NLP tasks applied to noisy text. In this study, we apply two approaches to Turkish text normalization: a Contextual Normalization approach using distributed representations of words, and a Sequence-to-Sequence Normalization approach using neural encoder-decoder models. Because the approaches previously applied to Turkish and other languages are mostly rule-based, new rules must be added to the normalization model to detect new error patterns that arise as language use on social media changes. In contrast, the proposed approaches can normalize error patterns that change over time by training on a new dataset and updating the normalization model. The proposed methods therefore address the dependency on language change in social media by updating the normalization model without defining new rules.
Citation
Göker, S. and Can, B. (2018) Neural text normalization for Turkish social media, 2018 3rd International Conference on Computer Science and Engineering (UBMK), 20-23 September, 2018, Sarajevo, Bosnia-Herzegovina.
Type
Conference contribution
Language
en
Description
This is an accepted manuscript of an article published by IEEE in the 2018 3rd International Conference on Computer Science and Engineering (UBMK) on 10 December 2018, available online: https://ieeexplore.ieee.org/document/8566406
The accepted version of the publication may differ from the final published version.
ISBN
9781538678930