Abstract
This paper explores automatic methods to identify relevant biography candidates in large databases and to extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two resources. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, to extract relevant information about them. Even though many consider them to be identical resources, our results show that the data from Wikipedia and Wikidata differ in some cases and that the two can be used in a complementary way, providing more data for the compilation of biographies.
Citation
Plum, A., Zampieri, M., Orasan, C., Wandl-Vogt, E. and Mitkov, R. (2019) Large-scale data harvesting for biographical data, Proceedings of the Third Conference on Biographical Data in a Digital World 2019, Varna, Bulgaria, 5-6 September 2019.
Publisher
CEUR
Additional Links
https://sites.google.com/view/bd2019/home
Type
Conference contribution
Language
en
ISSN
1613-0073
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/