Modified mean and quantile regression models for citation analysis
Affiliation: Faculty of Science and Engineering
Abstract: Modeling citation counts is an important subject because of its impact on providing a fair assessment for journals, researchers, articles, universities and countries. When statistical models are fitted to citation count data, the number of uncited articles (0s) frequently differs from that expected under the best-fitting model. This problem might be remedied by fitting a zero-modified, i.e. zero-inflated or zero-deflated, distribution that allows the predicted number of zeros to approximate more closely the number of zeros in a dataset. Whilst previous scientometric studies have fitted zero-inflated distributions to citation count data, none have fitted zero-deflated or zero-modified distributions. In this thesis, a new procedure for fitting zero-modified models is proposed, with the base distribution set separately as a discretised lognormal, hooked power-law or Weibull distribution. The procedure yields comprehensive statistical inferences, including estimates, confidence intervals and p-values for all parameters of the models, including the zero-modification parameter. It enables us to estimate both positive and negative zero-modification parameters, corresponding to zero-inflation and zero-deflation (fewer uncited articles than expected), respectively. Based on real citation count datasets, it is shown that the estimated zero-modification can change when the base distribution is altered. In addition, it is illustrated that the nature of the distribution of the observed citation counts is an important indicator for determining the distribution that fits best. We also focus on quantile regression (QR) for citation analysis. Unlike linear regression, where only the conditional mean of a dependent variable is modeled, QR models different conditional quantiles of the dependent variable, such as the median, based on a set of independent variables, giving a fuller description of the relationship between the independent variables and the dependent variable.
QR is a useful technique for analysing the entire citation count distribution. In this thesis we address two challenges for the analysis of citation counts by QR: discontinuity and substantial mass points at lower counts, such as zero, one, two, and three. To address these challenges, an updated version of the Bayesian two-part hurdle QR model introduced by King and Song (2019a) is proposed. The original Bayesian two-part QR model, with its hurdle at zero, was introduced for count data with a mass point at zero. For citation counts, there are also substantial mass points at one, two, and three, which influence the estimates of the model parameters. In our updated model, the hurdle point is shifted forward to minimise the effect of these mass points on estimation, resulting in more precise estimates for the QR part of the model. The new model enables the citation counts of low-cited articles to be analysed simultaneously with, but separately from, those of moderately and highly cited articles.
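The zero-modification idea described in the abstract can be illustrated with a minimal sketch. It assumes one common parameterisation, P(Y = 0) = ω + (1 − ω)p₀ and P(Y = k) = (1 − ω)pₖ for k ≥ 1, where pₖ is the base probability of k citations. The thesis's exact parameterisation is not given here and may differ. A Poisson distribution is used only as a stand-in base; it is not one of the base distributions named in the abstract. The function names and the symbol `omega` are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k); a stand-in base distribution (assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zero_modified_pmf(k, omega, base_pmf):
    """Zero-modified probability of count k under the assumed parameterisation.

    omega > 0 adds zeros (zero-inflation), omega < 0 removes zeros
    (zero-deflation), and omega = 0 returns the base distribution unchanged.
    """
    p0 = base_pmf(0)
    lower = -p0 / (1.0 - p0)  # smallest omega that keeps P(Y = 0) >= 0
    if not lower <= omega <= 1.0:
        raise ValueError(f"omega must lie in [{lower:.4f}, 1]")
    if k == 0:
        return omega + (1.0 - omega) * p0
    return (1.0 - omega) * base_pmf(k)


def base(k):
    """Base pmf with an illustrative mean of 2 citations (made-up value)."""
    return poisson_pmf(k, 2.0)


# Probability of an uncited article under deflation, no modification, inflation.
for omega in (-0.1, 0.0, 0.3):
    print(omega, round(zero_modified_pmf(0, omega, base), 4))
```

In this sketch a negative `omega` lowers the probability of zero citations below the base value and rescales the remaining probabilities so that the distribution still sums to one, which is the zero-deflation case that a zero-inflated model alone cannot represent.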
Citation: Shahmandi Hounejani, M. (2021) Modified mean and quantile regression models for citation analysis. University of Wolverhampton. http://hdl.handle.net/2436/624970
Publisher: University of Wolverhampton
Type: Thesis or dissertation
Description: A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
The following licence applies to the copyright and re-use of this item:
Except where otherwise noted, this item's licence is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International