Combining text and images for film age appropriateness classification
Abstract
We combine textual information from a corpus of film scripts with images of important scenes from IMDB that correspond to these films to create a bimodal dataset (the dataset and scripts can be obtained from https://tinyurl.com/se9tlmr) for film age appropriateness classification, with the objective of improving the prediction of age appropriateness for parents and children. We use state-of-the-art deep learning image feature extraction models, including DenseNet, ResNet, Inception, and NASNet. We tested several machine learning algorithms and found XGBoost to yield the best results. Previously reported classification accuracies, using only textual features, were 79.1% and 65.3% for the American MPAA and British BBFC classifications respectively. Using images alone, we achieve 64.8% and 56.7% classification accuracy. The most consistent combination of textual and image features achieves 81.1% and 66.8%, both statistically significant improvements over the use of text only.
Citation
Ha, L.A. and Mohamed, E. (2021) Combining text and images for film age appropriateness classification. Procedia Computer Science, 189, pp. 242-249.
Publisher
Elsevier
Journal
Procedia Computer Science
Additional Links
https://www.sciencedirect.com/science/article/pii/S1877050921012060
Type
Conference contribution
Language
en
Description
© 2021 The Authors. Published by Elsevier. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1016/j.procs.2021.05.087
ISSN
1877-0509
DOI
10.1016/j.procs.2021.05.087
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/
Related items
Showing items related by title, author, creator and subject.
-
Feature weighting for cooccurrence-based classification of words
Pekar, Viktor; Krkoska, Michael; Staab, Steffen (ACM Digital Library, 2004)
The paper comparatively studies methods of feature weighting in application to the task of cooccurrence-based classification of words according to their meaning. We explore parameter optimization of several weighting methods frequently used for similar problems such as text classification. We find that successful application of all the methods crucially depends on a number of parameters; only a carefully chosen weighting procedure allows one to obtain consistent improvement on a classifier learned from non-weighted data.
-
Does grading method influence honours degree classification?
Yorke, Mantz; Barnett, Greg; Bridges, Paul; Evanson, Peter; Haines, Chris; Jenkins, Don; Knight, Peter; Scurry, David; Stowell, Marie; Woolf, Harvey (Routledge (Taylor & Francis), 2002)
Variation in mark-spread is very evident in degree classification data provided by the Higher Education Statistics Agency (HESA). Previous empirical investigations suggested that, at the level of the module, the spread of results might, in some subjects, be influenced by the method of grading (percentage marking or shorter grade-point scale). The availability of degree classification data from HESA made it possible to test whether the effect perceived at module level carried through to the honours degree classification. The empirically-generated hypothesis was that subjects characterised by a relatively narrow spread under percentage marking would show a wider spread when a grade-point scale of around 20 divisions was used, with the effect being detectable in honours degree classification data. The hypothesis was tested, using HESA data for academic years 1994-95 to 1998-99, on those new universities in England and Wales for which the existence of an institution-wide grading approach could be established. Tests were undertaken at the level of the HESA subject area, and at the more fine-grained level of the individual subject where numbers permitted. Results from the analyses are mixed. The analyses have probably been influenced by weaknesses in the way that HESA has collected award data, but nevertheless suggest lines for further inquiry into a matter that is of importance for equity within institutions (especially where modular schemes are being operated) and more broadly across the higher education sector.
-
Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities
Yaneva, Victoria; Orăsan, Constantin; Evans, Richard; Rohanian, Omid (Association for Computational Linguistics, 2017-09-08)
Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier which is evaluated using two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, is developed for the purposes of this study and made freely available. We explore the effects of the size and type of the training data used on the performance of the classifiers, and the effects of the type of the unseen test datasets on the classification performance.