
dc.contributor.author: Öztürk, Burak
dc.contributor.author: Can, Burcu
dc.date.accessioned: 2020-10-12T11:50:18Z
dc.date.available: 2020-10-12T11:50:18Z
dc.date.issued: 2019-03-22
dc.identifier.citation: Öztürk, B. and Can, B. (2019) Turkish lexicon expansion by using finite state automata, Turkish Journal of Electrical Engineering & Computer Sciences, 27, pp. 1012–1027. [en]
dc.identifier.issn: 1300-0632 [en]
dc.identifier.doi: 10.3906/elk-1804-10 [en]
dc.identifier.uri: http://hdl.handle.net/2436/623708
dc.description: © 2019 The Authors. Published by The Scientific and Technological Research Council of Turkey. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://journals.tubitak.gov.tr/elektrik/issues/elk-19-27-2/elk-27-2-25-1804-10.pdf [en]
dc.description.abstract: Turkish is an agglutinative language with rich morphology. A Turkish verb can have thousands of different word forms. Therefore, sparsity becomes an issue in many Turkish natural language processing (NLP) applications. This article presents a model for Turkish lexicon expansion. We aimed to expand the lexicon with a morphological segmentation system, reversing the segmentation task into a generation task. Our model uses finite-state automata (FSA) to incorporate orthographic features and morphotactic rules. We extracted orthographic features by capturing the phonological operations that are applied to words whenever a suffix is added. Each FSA state corresponds to either a stem or a suffix category. Stems are clustered based on their parts of speech (i.e. noun, verb, or adjective) and suffixes are clustered based on their allomorphic features. We generated approximately 1 million word forms by using only a few thousand Turkish stems with an accuracy of 82.36%, which will help to reduce the out-of-vocabulary size in other NLP applications. Although our experiments are performed on the Turkish language, the same model is also applicable to other agglutinative languages such as Hungarian and Finnish. [en]
dc.format: application/pdf [en]
dc.language: en
dc.language.iso: en [en]
dc.publisher: Scientific and Technological Research Council of Turkey [en]
dc.relation.url: https://journals.tubitak.gov.tr/elektrik/issues/elk-19-27-2/elk-27-2-25-1804-10.pdf [en]
dc.subject: morphology [en]
dc.subject: lexicon expansion [en]
dc.subject: morphological generation [en]
dc.subject: finite-state automata [en]
dc.title: Turkish lexicon expansion by using finite state automata [en]
dc.type: Journal article [en]
dc.identifier.eissn: 1303-6203
dc.identifier.journal: Turkish Journal of Electrical Engineering & Computer Sciences [en]
dc.date.updated: 2020-10-09T11:04:11Z
dc.date.accepted: 2018-12-10
rioxxterms.funder: Hacettepe University, Ankara [en]
rioxxterms.identifier.project: UOW12102020BC [en]
rioxxterms.version: VoR [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/ [en]
rioxxterms.licenseref.startdate: 2020-10-12 [en]
dc.source.volume: 27
dc.source.issue: 2
dc.source.beginpage: 1012
dc.source.endpage: 1027
dc.description.version: Published version
refterms.dateFCD: 2020-10-12T11:49:02Z
refterms.versionFCD: VoR
refterms.dateFOA: 2020-10-12T11:50:19Z


Files in this item

Name: elk-27-2-25-1804-10.pdf
Size: 234.8Kb
Format: PDF



Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0/