Browsing Research Institute in Information and Language Processing by Subjects
Now showing items 1-4 of 4
Attention: there is an inconsistency between android permissions and application metadata!Since mobile applications make our lives easier, there is a large number of mobile applications customized for our needs in the application markets. While the application markets provide us a platform for downloading applications, it is also used by malware developers in order to distribute their malicious applications. In Android, permissions are used to prevent users from installing applications that might violate the users’ privacy by raising their awareness. From the privacy and security point of view, if the functionality of applications is given in sufficient detail in their descriptions, then the requirement of requested permissions could be well-understood. This is defined as description-to-permission fidelity in the literature. In this study, we propose two novel models that address the inconsistencies between the application descriptions and the requested permissions. The proposed models are based on the current state-of-art neural architectures called attention mechanisms. Here, we aim to find the permission statement words or sentences in app descriptions by using the attention mechanism along with recurrent neural networks. The lack of such permission statements in application descriptions creates a suspicion. Hence, the proposed approach could assist in static analysis techniques in order to find suspicious apps and to prioritize apps for more resource intensive analysis techniques. The experimental results show that the proposed approach achieves high accuracy.
Sarcasm target identification with LSTM networksGeçmi¸s yıllarda, kinayeli metinler üzerine yapılan çalı¸smalarda temel hedef metinlerin kinaye içerip içermediginin ˘ tespit edilmesiydi. Sosyal medya kullanımı ile birlikte siber zorbalıgın yaygınla¸sması, metinlerin sadece kinaye içerip içer- ˘ mediginin tespit edilmesinin yanısıra kinayeli metindeki hedefin ˘ belirlenmesini de gerekli kılmaya ba¸slamı¸stır. Bu çalı¸smada, kinayeli metinlerde hedef tespiti için bir derin ögrenme modeli ˘ kullanılarak hedef tespiti yapılmı¸s ve elde edilen sonuçlar literatürdeki ˙Ingilizce üzerine olan benzer çalı¸smalarla kıyaslanmı¸stır. Sonuçlar, önerdigimiz modelin kinaye hedef tespitinde benzer ˘ çalı¸smalara göre daha iyi çalı¸stıgını göstermektedir. The earlier work on sarcastic texts mainly concentrated on detecting the sarcasm on a given text. With the spread of cyber-bullying with the use of social media, it becomes also essential to identify the target of the sarcasm besides detecting the sarcasm. In this study, we propose a deep learning model for target identification on sarcastic texts and compare it with other work on English. The results show that our model outperforms the related work on sarcasm target identification.
Transfer learning for Turkish named entity recognition on noisy text© Cambridge University Press 2020. In this article, we investigate using deep neural networks with different word representation techniques for named entity recognition (NER) on Turkish noisy text. We argue that valuable latent features for NER can, in fact, be learned without using any hand-crafted features and/or domain-specific resources such as gazetteers and lexicons. In this regard, we utilize character-level, character n-gram-level, morpheme-level, and orthographic character-level word representations. Since noisy data with NER annotation are scarce for Turkish, we introduce a transfer learning model in order to learn infrequent entity types as an extension to the Bi-LSTM-CRF architecture by incorporating an additional conditional random field (CRF) layer that is trained on a larger (but formal) text and a noisy text simultaneously. This allows us to learn from both formal and informal/noisy text, thus improving the performance of our model further for rarely seen entity types. We experimented on Turkish as a morphologically rich language and English as a relatively morphologically poor language. We obtained an entity-level F1 score of 67.39% on Turkish noisy data and 45.30% on English noisy data, which outperforms the current state-of-art models on noisy text. The English scores are lower compared to Turkish scores because of the intense sparsity in the data introduced by the user writing styles. The results prove that using subword information significantly contributes to learning latent features for morphologically rich languages.
Using morpheme-level attention mechanism for Turkish sequence labellingWith deep learning being used in natural language processing problems, there have been serious improvements in the solution of many problems in this area. Sequence labeling is one of these problems. In this study, we examine the effects of character, morpheme, and word representations on sequence labelling problems by proposing a model for the Turkish language by using deep neural networks. Modeling the word as a whole in agglutinative languages such as Turkish causes sparsity problem. Therefore, rather than handling the word as a whole, expressing a word through its characters or considering the morpheme and morpheme label information gives more detailed information about the word and mitigates the sparsity problem. In this study, we applied the existing deep learning models using different word or sub-word representations for Named Entity Recognition (NER) and Part-of-Speech Tagging (POS Tagging) in Turkish. The results show that using morpheme information of words improves the Turkish sequence labelling.