dc.contributor.author: Sivakumar, Jasivan
dc.contributor.author: Muga, Jake
dc.contributor.author: Spadavecchia, Flavio
dc.contributor.author: White, Daniel
dc.contributor.author: Can Buglalilar, Burcu
dc.date.accessioned: 2021-11-08T11:38:54Z
dc.date.available: 2021-11-08T11:38:54Z
dc.date.issued: 2022-01-20
dc.identifier.citation: Sivakumar, J., Muga, J., Spadavecchia, F., White, D. and Can, B. (2022) A GRU-based pipeline approach for word-sentence segmentation and punctuation restoration in English. 2021 International Conference on Asian Language Processing (IALP), pp.268-273. [en]
dc.identifier.isbn: 9781665483117
dc.identifier.doi: 10.1109/IALP54817.2021.9675269
dc.identifier.uri: http://hdl.handle.net/2436/624438
dc.description: This is an accepted manuscript of an article published by IEEE in Proceedings of 2021 International Conference on Asian Language Processing (IALP) on 20 Jan 2022. Available online at https://doi.org/10.1109/IALP54817.2021.9675269 The accepted version of the publication may differ from the final published version. [en]
dc.description.abstract: In this study, we propose a Gated Recurrent Unit (GRU) model to restore the following features: word and sentence boundaries, periods, commas, and capitalisation for unformatted English text. We approach feature restoration as a binary classification task where the model learns to predict whether a feature should be restored or not. A pipeline approach is proposed, in which only one feature (word boundary, sentence boundary, punctuation, capitalisation) is restored in each component of the pipeline model. To optimise the model, we conducted a grid search on the parameters. The effect of changing the order of the pipeline is also investigated experimentally; PERIODS > COMMAS > SPACES > CASING yielded the best result. Our findings highlight several specific action points with optimisation potential to be targeted in follow-up research. [en]
dc.format: application/pdf [en]
dc.language.iso: en [en]
dc.publisher: IEEE [en]
dc.relation.url: https://ieeexplore.ieee.org/abstract/document/9675269 [en]
dc.subject: deep learning [en]
dc.subject: graph neural networks [en]
dc.subject: semantics [en]
dc.title: A GRU-based pipeline approach for word-sentence segmentation and punctuation restoration in English [en]
dc.type: Conference contribution [en]
dc.date.updated: 2021-11-08T10:20:27Z
dc.conference.name: 2021 International Conference on Asian Language Processing
dc.conference.location: Singapore
pubs.finish-date: 2021-12-13
pubs.start-date: 2021-12-11
dc.date.accepted: 2021-08-31
rioxxterms.funder: University of Wolverhampton [en]
rioxxterms.identifier.project: UOW08112021BC [en]
rioxxterms.version: AM [en]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
rioxxterms.licenseref.startdate: 2022-01-20 [en]
refterms.dateFCD: 2021-11-08T11:38:17Z
refterms.versionFCD: AM
refterms.dateFOA: 2022-01-20T00:00:00Z


Files in this item

Name: Sivakumar_et_al_GRU-based_pipe ...
Size: 2.045Mb
Format: PDF

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/