
A cascaded unsupervised model for PoS tagging

Bölücü, Necva
Can, Burcu
Issue Date
2021-03-31
Abstract
Part-of-speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing (NLP): it assigns a syntactic category (such as noun, verb, or adjective) to each word within a given sentence or context. These syntactic categories can be used to further analyze sentence-level syntax (e.g. dependency parsing) and thereby extract the meaning of the sentence (e.g. semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting, without using any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem. The initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize its parameters in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, with the log-linear model benefiting from the Bayesian model’s expertise. We present results for Turkish, a morphologically rich language, and for English, a comparatively morphologically poor language, in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.
Citation
Bölücü, N. and Can, B. (2021) A Cascaded Unsupervised Model for PoS Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(1), Article 17.
Type
Journal article
Language
en
Description
This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in March 2021. The accepted version of the publication may differ from the final published version.
ISSN
2375-4699