Large-scale data harvesting for biographical data
Plum, Alistair ; Zampieri, M ; Orasan, C ; Wandl-Vogt, Eveline ; Mitkov, R
Issue Date
2019-09-05
Abstract
This paper explores automatic methods for identifying relevant biography candidates in large databases and for extracting biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline for identifying suitable biography candidates and, in a second stage, extracting relevant information about them. Although many consider Wikipedia and Wikidata to be identical resources, our results show that their data differ in some cases and that the two can be used in a complementary way, providing more data for the compilation of biographies.
Citation
Plum, A., Zampieri, M., Orasan, C., Wandl-Vogt, E. and Mitkov, R. (2019) Large-scale data harvesting for biographical data. Proceedings of the Third Conference on Biographical Data in a Digital World 2019, Varna, Bulgaria, 5-6 September 2019.
Type
Conference contribution
Language
en
ISSN
1613-0073