2019:Technology outreach & innovation/Wikidata & ETL
This is an accepted submission for the Technology space at Wikimania 2019.
Description
Currently, Wikidata and other Wikibase instances are populated from external data sources mostly manually, using ad-hoc data transformation scripts. These scripts are usually run only once. Given the heterogeneity of the source data and of the languages used to transform them, the scripts are hard or impossible to maintain and cannot be run periodically in an automated fashion to keep Wikidata up to date.
In this session, we would like to demonstrate our work in progress on a project that uses LinkedPipes ETL, a tool for data transformation pipelines, to load data into Wikibases and Wikidata.
Slides
Relationship to the theme
This session will address the conference theme (Wikimedia, Free Knowledge and the Sustainable Development Goals) in the following manner:
- Industry, innovation, and infrastructure: Enabling volunteers to better automate bulk loading of data into Wikidata using sharable data transformation pipelines helps to make these processes more unified and therefore sustainable.
Session outcomes
At the end of the session, the following will have been achieved:
- Our approach to bulk loading data to Wikidata using LinkedPipes ETL will be presented
- Feedback on the method will be gathered
- Interested attendees will have tried loading some data to a demo Wikibase instance
Session leader(s)
Contacts
- jakub@jakubklimek.com
Session type
Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is one of these options:
- Option 1: Presentation - 20 minutes
- Option 2: Roundtable workshop - 10 minutes of presenting, 30 to 60 minutes of hands-on for interested attendees
Requirements
The session will work best with these conditions:
- Room:
  - Classroom with a projector + screen
- Audience:
  - Technically savvy (RDF, SPARQL) contributors to Wikidata or other Wikibases
- Recording:
  - The presentation part can be recorded; the hands-on part probably cannot