HDL Handle:
http://hdl.handle.net/2436/620690
Title:
Gender bias in machine learning for sentiment analysis
Authors:
Thelwall, Mike (ORCID: 0000-0001-6065-205X)
Abstract:
Purpose: This paper investigates whether machine learning induces gender biases, in the sense of producing results that are more accurate for male authors than for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This article uses three ratings-balanced sets of restaurant and hotel reviews to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all datasets, so applications of sentiment analysis using mixed-gender datasets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it, and should apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.
Publisher:
Emerald
Journal:
Online Information Review
Issue Date:
Dec-2017
URI:
http://hdl.handle.net/2436/620690
Additional Links:
http://www.emeraldinsight.com/loi/oir
Type:
Article
Language:
en
ISSN:
1468-4527
Appears in Collections:
Statistical Cybermetrics Research Group

Full metadata record

DC Field | Value | Language
dc.contributor.author | Thelwall, Mike | en
dc.date.accessioned | 2017-09-22T14:17:57Z | -
dc.date.available | 2017-09-22T14:17:57Z | -
dc.date.issued | 2017-12 | -
dc.identifier.issn | 1468-4527 | en
dc.identifier.uri | http://hdl.handle.net/2436/620690 | -
dc.description.abstract | Purpose: This paper investigates whether machine learning induces gender biases, in the sense of producing results that are more accurate for male authors than for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This article uses three ratings-balanced sets of restaurant and hotel reviews to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all datasets, so applications of sentiment analysis using mixed-gender datasets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it, and should apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis. | en
dc.language.iso | en | en
dc.publisher | Emerald | en
dc.relation.url | http://www.emeraldinsight.com/loi/oir | en
dc.subject | Sentiment analysis | en
dc.subject | opinion mining | en
dc.subject | social media | en
dc.subject | online customer relations management | en
dc.title | Gender bias in machine learning for sentiment analysis | en
dc.type | Article | en
dc.identifier.journal | Online Information Review | en
dc.date.accepted | 2017-09 | -
rioxxterms.funder | Internal | en
rioxxterms.identifier.project | UoW220917MT | en
rioxxterms.version | AM | en
rioxxterms.licenseref.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | en
rioxxterms.licenseref.startdate | 2018-06-01 | en
This item is licensed under a Creative Commons License (CC BY-NC-ND 4.0).
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.