A GRU-based pipeline approach for word-sentence segmentation and punctuation restoration in English
dc.contributor.author | Sivakumar, Jasivan | |
dc.contributor.author | Muga, Jake | |
dc.contributor.author | Spadavecchia, Flavio | |
dc.contributor.author | White, Daniel | |
dc.contributor.author | Can Buglalilar, Burcu | |
dc.date.accessioned | 2021-11-08T11:38:54Z | |
dc.date.available | 2021-11-08T11:38:54Z | |
dc.date.issued | 2022-01-20 | |
dc.identifier.citation | Sivakumar, J., Muga, J., Spadavecchia, F., White, D. and Can, B. (2022) A GRU-based pipeline approach for word-sentence segmentation and punctuation restoration in English. 2021 International Conference on Asian Language Processing (IALP), pp.268-273. | en |
dc.identifier.isbn | 9781665483117 | |
dc.identifier.doi | 10.1109/IALP54817.2021.9675269 | |
dc.identifier.uri | http://hdl.handle.net/2436/624438 | |
dc.description | This is an accepted manuscript of an article published by IEEE in the Proceedings of the 2021 International Conference on Asian Language Processing (IALP) on 20 Jan 2022. Available online at https://doi.org/10.1109/IALP54817.2021.9675269. The accepted version of the publication may differ from the final published version. | en |
dc.description.abstract | In this study, we propose a Gated Recurrent Unit (GRU) model to restore the following features: word and sentence boundaries, periods, commas, and capitalisation for unformatted English text. We approach feature restoration as a binary classification task where the model learns to predict whether a feature should be restored or not. A pipeline approach is proposed, in which only one feature (word boundary, sentence boundary, punctuation, capitalisation) is restored in each component of the pipeline model. To optimise the model, we conducted a grid search on the parameters. The effect of changing the order of the pipeline is also investigated experimentally; PERIODS > COMMAS > SPACES > CASING yielded the best result. Our findings highlight several specific action points with optimisation potential to be targeted in follow-up research. | en |
dc.format | application/pdf | en |
dc.language.iso | en | en |
dc.publisher | IEEE | en |
dc.relation.url | https://ieeexplore.ieee.org/abstract/document/9675269 | en |
dc.subject | deep learning | en |
dc.subject | graph neural networks | en |
dc.subject | semantics | en |
dc.title | A GRU-based pipeline approach for word-sentence segmentation and punctuation restoration in English | en |
dc.type | Conference contribution | en |
dc.date.updated | 2021-11-08T10:20:27Z | |
dc.conference.name | 2021 International Conference on Asian Language Processing | |
dc.conference.location | Singapore | |
pubs.finish-date | 2021-12-13 | |
pubs.start-date | 2021-12-11 | |
dc.date.accepted | 2021-08-31 | |
rioxxterms.funder | University of Wolverhampton | en |
rioxxterms.identifier.project | UOW08112021BC | en |
rioxxterms.version | AM | en |
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | en |
rioxxterms.licenseref.startdate | 2022-01-20 | en |
refterms.dateFCD | 2021-11-08T11:38:17Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2022-01-20T00:00:00Z |