• Exploiting tweet sentiments in altmetrics large-scale data

      Hassan, Saeed-Ul; Aljohani, Naif Radi; Iqbal Tarar, Usman; Safder, Iqra; Sarwar, Raheem; Alelyani, Salem; Nawaz, Raheel (SAGE, 2022-12-31)
      This article aims to exploit social exchanges on scientific literature, specifically tweets, to analyse social media users' sentiments towards publications within a research field. First, we employ the SentiStrength tool, extended with newly created lexicon terms, to classify the sentiments of 6,482,260 tweets associated with 1,083,535 publications provided by Altmetric.com. Then, we propose harmonic-mean-based statistical measures to generate a specialized lexicon, using positive and negative sentiment scores and frequency metrics. Next, we adopt a novel article-level summarization approach to domain-level sentiment analysis to gauge the opinion of social media users on Twitter about the scientific literature. Last, we propose and employ an aspect-based analytical approach to mine users' expressions relating to various aspects of the article, such as tweets on its title, abstract, methodology, conclusion, or results section. We show that research communities exhibit dissimilar sentiments towards their respective fields. The analysis of the field-wise distribution of article aspects shows that in Medicine, Economics, and Business & Decision Sciences, tweet aspects are focused on the results section. In contrast, in Physics & Astronomy, Materials Sciences, and Computer Science, these aspects are focused on the methodology section. Overall, the study helps us to understand the sentiments of online social exchanges of the scientific community on scientific literature. Specifically, such a fine-grained analysis may help research communities improve their social media exchanges about scientific articles, disseminate their scientific findings effectively, and further increase their societal impact.
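The abstract above does not spell out its harmonic-mean-based measures, so the following is only an illustrative sketch of how a harmonic mean could combine a sentiment signal with a frequency signal when building a lexicon. The function names, the `"pos"`/`"neg"`/`"neu"` labels, and the two ratios being combined are all assumptions made for this example, not the authors' formulation.

```python
from collections import Counter


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0.0 if either is 0)."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2 * a * b / (a + b)


def term_scores(tweets, polarity):
    """Score candidate lexicon terms for one polarity (hypothetical measure).

    tweets: list of (tokens, label) pairs, label in {"pos", "neg", "neu"}.
    For each term, two assumed ratios are combined:
      purity   = share of tweets containing the term that have this polarity
      coverage = share of tweets with this polarity that contain the term
    """
    in_class = Counter()  # tweets of the target polarity containing the term
    overall = Counter()   # all tweets containing the term
    n_class = 0
    for tokens, label in tweets:
        unique = set(tokens)  # count each term once per tweet
        overall.update(unique)
        if label == polarity:
            n_class += 1
            in_class.update(unique)

    scores = {}
    for term, count in in_class.items():
        purity = count / overall[term]
        coverage = count / n_class
        scores[term] = harmonic_mean(purity, coverage)
    return scores


# Toy, made-up data purely to exercise the function.
sample = [
    (["great", "paper"], "pos"),
    (["great", "results"], "pos"),
    (["flawed", "paper"], "neg"),
]
positive_scores = term_scores(sample, "pos")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a term scores highly here only when it is both frequent in the target class and rare outside it.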
    • Parsing AUC result-figures in machine learning specific scholarly documents for semantically-enriched summarization

      Safder, Iqra; Batool, Hafsa; Sarwar, Raheem; Zaman, Farooq; Aljohani, Naif Radi; Nawaz, Raheel; Gaber, Mohamed; Hassan, Saeed-Ul (Taylor & Francis, 2021-11-14)
      Machine learning specific scholarly full-text documents contain a number of result-figures expressing valuable data, including experimental results, evaluations, and cross-model comparisons. Scholarly search systems often overlook this vital information while indexing important terms using conventional text-based content extraction approaches. In this paper, we propose creating semantically enriched document summaries by extracting meaningful data from the result-figures specific to the evaluation metric of the area under the curve (AUC) and their associated captions from full-text documents. First, we classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model with a ResNet-50 pre-trained on 1.2 million images from ImageNet. Next, we extract information from the result-figures specific to AUC by approximating the region under the function's graph as a trapezoid and calculating its area, i.e., the trapezoidal rule. Using over 12,000 figures extracted from 1,000 scholarly documents, we show that figure-specialized summaries contain more enriched terms about figure semantics. Furthermore, we empirically show that the trapezoidal rule can calculate the area under the curve by dividing the curve into multiple intervals. Finally, we measure the quality of specialized summaries using ROUGE, Edit distance, and Jaccard Similarity metrics. Overall, we observed that figure-specialized summaries are more comprehensive and semantically enriched. The applications of our research are enormous, including improved document searching, figure searching, and figure-focused plagiarism. The data and code used in this paper can be accessed at the following URL: https://github.com/slab-itu/fig-ir/.
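The trapezoidal rule mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code (which is in the linked repository); the function name `trapezoid_auc` and the sample points are assumptions, standing in for (x, y) coordinates that would have been recovered from a figure's data plot.

```python
def trapezoid_auc(xs, ys):
    """Approximate the area under a curve from sampled points.

    The curve is split into intervals between consecutive x values, and
    each interval is treated as a trapezoid with area
    (x1 - x0) * (y0 + y1) / 2.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two points are required")

    # Sort by x so the intervals run left to right.
    points = sorted(zip(xs, ys))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC-style points (false positive rate, true positive rate).
fpr = [0.0, 0.5, 1.0]
tpr = [0.0, 0.8, 1.0]
auc = trapezoid_auc(fpr, tpr)
```

Using more sample points gives narrower intervals and a closer approximation, which matches the abstract's remark about dividing the curve into multiple intervals.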
    • Sentiment analysis for Urdu online reviews using deep learning models

      Safder, Iqra; Mehmood, Zainab; Sarwar, Raheem; Hassan, Saeed-Ul; Zaman, Farooq; Adeel Nawab, Rao Muhammad; Bukhari, Faisal; Ayaz Abbasi, Rabeeh; Alelyani, Salem; Radi Aljohani, Naif; et al. (Wiley, 2021-06-28)
      Most existing studies focus on popular languages such as English, Spanish, Chinese, and Japanese; limited attention has been paid to Urdu, despite its more than 60 million native speakers. In this paper, we develop a deep learning model for the sentiments expressed in this under-resourced language. We develop an open-source corpus of 10,008 reviews from 566 online threads on the topics of sports, food, software, politics, and entertainment. The objectives of this work are twofold: (1) the creation of a human-annotated corpus for sentiment analysis research in Urdu; and (2) the measurement of up-to-date model performance using the corpus. For this assessment, we performed binary and ternary classification studies using several models, namely RCNN, Rule-Based, N-gram, SVM, CNN, and LSTM. The RCNN model surpasses the standard models, with 84.98% accuracy for binary classification and 68.56% accuracy for ternary classification. To facilitate other researchers working in the same domain, we have open-sourced the corpus and code developed for this research.