Graph-based approaches for semi-supervised and cross-domain sentiment analysis

Hdl Handle:
http://hdl.handle.net/2436/323990
Title:
Graph-based approaches for semi-supervised and cross-domain sentiment analysis
Authors:
Ponomareva, Natalia
Abstract:
The rapid development of Internet technologies has resulted in a sharp increase in the number of Internet users who create content online. User-generated content often represents people's opinions, thoughts, speculations and sentiments and is a valuable source of information for companies, organisations and individual users. This has led to the emergence of the field of sentiment analysis, which deals with the automatic extraction and classification of sentiments expressed in texts. Sentiment analysis has been intensively researched over the last ten years, but there are still many issues to be addressed. One of the main problems is the lack of labelled data necessary to carry out precise supervised sentiment classification. In response, research has moved towards developing semi-supervised and cross-domain techniques. Semi-supervised approaches still need some labelled data and their effectiveness is largely determined by the amount of these data, whereas cross-domain approaches usually perform poorly if training data are very different from test data. The majority of research on sentiment classification deals with the binary classification problem, although for many practical applications this rather coarse sentiment scale is not sufficient. Therefore, it is crucial to design methods which are able to perform accurate multiclass sentiment classification.
The aims of this thesis are to address the problem of limited availability of data in sentiment analysis and to advance research in semi-supervised and cross-domain approaches for sentiment classification, considering both binary and multiclass sentiment scales. We adopt graph-based learning as our main method and explore the most popular and widely used graph-based algorithm, label propagation.
We investigate various ways of designing sentiment graphs and propose a new similarity measure which is unsupervised, easy to compute, does not require deep linguistic analysis and, most importantly, provides a good estimate for sentiment similarity as proved by intrinsic and extrinsic evaluations. The main contribution of this thesis is the development and evaluation of a graph-based sentiment analysis system that a) can cope with the challenges of limited data availability by using semi-supervised and cross-domain approaches b) is able to perform multiclass classification and c) achieves highly accurate results which are superior to those of most state-of-the-art semi-supervised and cross-domain systems. We systematically analyse and compare semi-supervised and cross-domain approaches in the graph-based framework and propose recommendations for selecting the most pertinent learning approach given the data available. Our recommendations are based on two domain characteristics, domain similarity and domain complexity, which were shown to have a significant impact on semi-supervised and cross-domain performance.
Advisors:
Thelwall, Mike
Publisher:
University of Wolverhampton
Issue Date:
2014
URI:
http://hdl.handle.net/2436/323990
Type:
Thesis or dissertation
Language:
en
Description:
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy
Appears in Collections:
E-Theses

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Thelwall, Mike | en_GB
dc.contributor.author | Ponomareva, Natalia | en_GB
dc.date.accessioned | 2014-07-30T13:14:35Z | en
dc.date.available | 2014-07-30T13:14:35Z | en
dc.date.issued | 2014 | en
dc.identifier.uri | http://hdl.handle.net/2436/323990 | en
dc.description | A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy | en_GB
dc.description.abstract | The rapid development of Internet technologies has resulted in a sharp increase in the number of Internet users who create content online. User-generated content often represents people's opinions, thoughts, speculations and sentiments and is a valuable source of information for companies, organisations and individual users. This has led to the emergence of the field of sentiment analysis, which deals with the automatic extraction and classification of sentiments expressed in texts. Sentiment analysis has been intensively researched over the last ten years, but there are still many issues to be addressed. One of the main problems is the lack of labelled data necessary to carry out precise supervised sentiment classification. In response, research has moved towards developing semi-supervised and cross-domain techniques. Semi-supervised approaches still need some labelled data and their effectiveness is largely determined by the amount of these data, whereas cross-domain approaches usually perform poorly if training data are very different from test data. The majority of research on sentiment classification deals with the binary classification problem, although for many practical applications this rather coarse sentiment scale is not sufficient. Therefore, it is crucial to design methods which are able to perform accurate multiclass sentiment classification. The aims of this thesis are to address the problem of limited availability of data in sentiment analysis and to advance research in semi-supervised and cross-domain approaches for sentiment classification, considering both binary and multiclass sentiment scales. We adopt graph-based learning as our main method and explore the most popular and widely used graph-based algorithm, label propagation. We investigate various ways of designing sentiment graphs and propose a new similarity measure which is unsupervised, easy to compute, does not require deep linguistic analysis and, most importantly, provides a good estimate for sentiment similarity as proved by intrinsic and extrinsic evaluations. The main contribution of this thesis is the development and evaluation of a graph-based sentiment analysis system that a) can cope with the challenges of limited data availability by using semi-supervised and cross-domain approaches b) is able to perform multiclass classification and c) achieves highly accurate results which are superior to those of most state-of-the-art semi-supervised and cross-domain systems. We systematically analyse and compare semi-supervised and cross-domain approaches in the graph-based framework and propose recommendations for selecting the most pertinent learning approach given the data available. Our recommendations are based on two domain characteristics, domain similarity and domain complexity, which were shown to have a significant impact on semi-supervised and cross-domain performance. | en_GB
dc.language.iso | en | en
dc.publisher | University of Wolverhampton | en
dc.subject | sentiment analysis | en_GB
dc.subject | semi-supervised learning | en_GB
dc.subject | cross-domain learning | en_GB
dc.subject | graph-based learning | en_GB
dc.title | Graph-based approaches for semi-supervised and cross-domain sentiment analysis | en_GB
dc.type | Thesis or dissertation | en
dc.type.qualificationname | PhD | en
dc.type.qualificationlevel | Doctoral | en
All Items in WIRE are protected by copyright, with all rights reserved, unless otherwise indicated.