Issue Date
2021-12-31
Abstract
Part-of-speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing (NLP). It assigns a syntactic category (such as noun, verb, or adjective) to each word in a given sentence or context. These syntactic categories can be used to further analyze sentence-level syntax (e.g. dependency parsing) and thereby extract the meaning of the sentence (e.g. semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting, without any annotated corpora. Log-linear models are one of the widely used methods for the tagging problem, and the initialization of their parameters is crucial for inference. Different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging whose parameters are initialized by another fully unsupervised Bayesian model in a cascaded framework. In this way, we transfer knowledge between two different unsupervised models to improve PoS tagging results, with the log-linear model benefiting from the Bayesian model’s expertise. We present results in a fully unsupervised framework for Turkish, a morphologically rich language, and for English, a comparatively morphologically poor language. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.
Citation
Bölücü, N. and Can, B. (in press) A cascaded unsupervised model for PoS tagging, ACM Transactions on Asian and Low-Resource Language Information Processing.
Publisher
ACM
Journal
ACM Transactions on Asian and Low-Resource Language Information Processing
Type
Journal article
Language
en
Description
This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing (in press). The accepted version of the publication may differ from the final published version.
ISSN
2375-4699
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/