• Aggressive language identification using word embeddings and sentiment features

      Orasan, Constantin (Association for Computational Linguistics, 2018-06-25)
      This paper describes our participation in the First Shared Task on Aggression Identification. The method proposed relies on machine learning to identify social media texts which contain aggression. The main features employed by our method are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried. The official submissions used Support Vector Machines and Random Forests. The official evaluation showed that for texts similar to the ones in the training dataset Random Forests work best, whilst for texts which are different SVMs are a better choice. The evaluation also showed that despite its simplicity the method performs well when compared with more elaborated methods.
    • RGCL at GermEval 2019: offensive language detection with deep learning

      Plum, A; Ranasinghe, Tharindu; Orasan, Constantin; Mitkov, R (German Society for Computational Linguistics & Language Technology, 2019-10-08)
      This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify Tweets in terms of offensive language. By means of comparative evaluation, we select the best performing for each of the three subtasks. Overall, we demonstrate that using only minimal preprocessing we are able to obtain competitive results.