2021 talk:Submissions/Integrating Wikidata into the Wikimedia projects
WIKIMANIA 2021 SESSION NOTES
========================
Welcome, Wikimania 2021 note-takers! Thank you so much for being an important part of Wikimania. Preferred number of note-takers: 2 (min) or 3 (max).
SESSION OVERVIEW
- Title
- Integrating Wikidata into the Wikimedia projects
- Day & time
- August 17, 2021, 14:30-15:15 UTC
- Session wikilink
- https://w.wiki/3uKb
- Speaker(s)
- Individual presentations (5 minutes each):
- User:Mike Peel (overview)
- User:Nirali Sahoo (synchronising Wikipedia and Wikidata)
- EricaAzzellini (Mbabel)
- Benoît Prieur (French Wikipedia and Wikivoyage)
- putnik (Russian Wikipedia and small projects)
- Notetakers
- ADD NOTETAKER(S) HERE
- Attendees (optional, self-reported)
- (41 participants in Remo as of 14:31, 82+hosts+1 tech support as of 17:45)
Andy Mabbett (User:Pigsonthewing)
User:Slashme
User:Dick Bos
User:coslouisiana
DarwIn
User:Ashlesh007
User:BugWarp
User:Robby
User:Anupamdutta73
User: Abhishek Sasmal
Amir Aharoni
User: J. N. Squire
SESSION SUMMARY
Mike Peel
Wikidata is multi-purpose. It can be used to create various objects on various projects.
Versatile: it can be accessed in various ways:
Statements
Lua calls
Pre-existing Lua modules, e.g. WikidataIB
Can reformat using convert etc.
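To illustrate what reading a statement involves, here is a minimal Python sketch (not the Lua code used on-wiki). It assumes the usual Wikibase entity JSON layout (claims, mainsnak, rank); the sample item "Q0" and its values are invented for illustration. It picks preferred-rank statements when they exist and otherwise falls back to normal-rank ones, skipping deprecated values.

```python
def best_statements(entity, property_id):
    """Return preferred-rank statements if any exist, otherwise normal-rank ones."""
    statements = entity.get("claims", {}).get(property_id, [])
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.get("rank", "normal") == "normal"]


def statement_values(entity, property_id):
    """Return the raw values of the best statements that actually carry a value."""
    values = []
    for statement in best_statements(entity, property_id):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values


def _quantity_statement(rank, amount):
    # Helper for building the invented sample data below.
    return {
        "rank": rank,
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"value": {"amount": amount, "unit": "1"}},
        },
    }


# Invented sample item: three population (P1082) statements with different ranks.
SAMPLE_ENTITY = {
    "id": "Q0",
    "claims": {
        "P1082": [
            _quantity_statement("normal", "+1200"),
            _quantity_statement("preferred", "+1500"),
            _quantity_statement("deprecated", "+9999"),
        ]
    },
}

if __name__ == "__main__":
    print(statement_values(SAMPLE_ENTITY, "P1082"))
```

On-wiki, the same selection would normally be left to an existing Lua module such as WikidataIB rather than reimplemented.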
Often seen as an "external wiki": people don't like to leave their "current" wiki to visit it. People do this for Commons, but it is somehow different with text and templates.
References on Wikidata are too sparse: lots of "imported from Wikipedia" refs, but not enough others.
Multilingualism isn't seen as a priority by many, but coverage of topics that aren't mainly English, for example, is much better on Wikidata because they are covered in other language wikis.
Vandalism is a risk, especially because the data flows downstream to other projects.
It's currently used for:
Infoboxes
Lists outside mainspace
Interwiki links
Authority control
Checking ID values [?]
Coordinates
Syncing data
The importance of Wikidata:
It can act as a source of information for all Wikimedia projects.
Collection of relevant information in an organised way. This eases access to information and helps us with importing the information into infoboxes and other information displays.
Nirali Sahoo
The process of syncing Wikidata and WP is done with the help of bots.
Find articles that we want to retrieve info from.
Develop scripts that will do the imports.
These imports need to be monitored to ensure that wrong info doesn't make its way in.
Do this by discussing: review functionality and source code.
Do test edits by running the script on a small number of articles.
Then get the bot approved.
Only then do live edits on the whole list of articles. This can be done in batches of 100 or 200, or all at once.
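The test-then-batch workflow above can be sketched as follows. This is only an illustration in Python: the function names, the test size of 10, and the choice not to re-edit the test articles in the live run are assumptions, not details given in the talk.

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_run(titles, test_size=10, batch_size=100):
    """Split a list of article titles into a small test run and live batches.

    Assumption: articles covered by the test run are not edited again live.
    """
    test_run = titles[:test_size]
    live_batches = list(batches(titles[test_size:], batch_size))
    return test_run, live_batches


if __name__ == "__main__":
    titles = [f"Article {i}" for i in range(250)]  # placeholder titles
    test_run, live_batches = plan_run(titles)
    print(len(test_run), [len(b) for b in live_batches])
```

Working in batches keeps each run small enough to review before the next one starts.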
The hardships
Knowing what to export. Wikipedia has millions of articles; choosing the relevant ones can be overwhelming given the huge amount.
Knowing how to export. This is related to the development of scripts. Every one of those millions of articles is structured differently, even in the same topic: e.g. birth-date vs date of birth. So writing scripts that cover all of the relevant structures can be difficult.
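One way to cope with differently named parameters such as "birth-date" and "date of birth" is to normalise the names and map known aliases onto one Wikidata property. The Python sketch below is a hedged illustration: the alias list is hypothetical and would need to be built per wiki and per template; P569 is Wikidata's "date of birth" property.

```python
import re

# Hypothetical alias table: normalised infobox parameter names per Wikidata property.
PARAMETER_ALIASES = {
    "P569": {"birth date", "date of birth", "birthdate", "born"},
}


def normalise(name):
    """Lower-case a parameter name and treat hyphens, underscores and spaces alike."""
    return re.sub(r"[-_\s]+", " ", name.strip().lower())


def property_for(parameter_name):
    """Return the property ID whose alias set contains the parameter, or None."""
    key = normalise(parameter_name)
    for property_id, aliases in PARAMETER_ALIASES.items():
        if key in aliases:
            return property_id
    return None


def extract(template_parameters):
    """Map a dict of template parameters onto {property ID: raw value}."""
    found = {}
    for name, value in template_parameters.items():
        property_id = property_for(name)
        if property_id is not None and value.strip():
            found[property_id] = value.strip()
    return found


if __name__ == "__main__":
    print(extract({"name": "Example Person", "birth-date": "1 May 1900"}))
    print(extract({"Date_of_birth": "1 May 1900"}))
```

Values extracted this way are still raw strings; parsing them into dates and checking them against existing statements would be separate steps.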
Érica Azzellini
Mbabel (https://meta.wikimedia.org/wiki/Mbabel)
Érica's poster from WikidataCon2019 about Mbabel: https://commons.wikimedia.org/wiki/File:Mbabel_tool_-_Automatic_articles_on_Wikipedia_with_Wikidata_WikidataCon_2019_poster.pdf
Tool developed to automatically create Wikipedia stub articles from Wikidata
Started in 2018, developed by EricaAzzellini and Ederporto
Why?
Working on integrating WP and Wikidata
Want to automate complex or boring tasks
Give a resource to users, so that they can focus on creative tasks. For example, every film article should include date of release, star, genre etc., which can be given in a structure, and then the authors can expand.
Structured narratives.
Development of human-understandable text from structured database.
Mbabel tool provides a narrative template which can be filled in with data from Wikidata.
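The idea of a narrative template filled in from structured data can be illustrated with a small Python sketch. This is not how Mbabel itself is implemented (it works with on-wiki templates and the WikidataIB module); the sentences, field names and sample data here are invented. Sentences whose data is missing are simply left out.

```python
from string import Template

# Hypothetical narrative for film stubs: one Template per sentence.
FILM_NARRATIVE = [
    Template("$title is a $year $genre film directed by $director."),
    Template("It stars $star."),
]


def render(narrative, data):
    """Fill each sentence from `data`; skip sentences with a missing field."""
    sentences = []
    for sentence in narrative:
        try:
            sentences.append(sentence.substitute(data))
        except KeyError:
            # A required field is absent, so this sentence is omitted.
            continue
    return " ".join(sentences)


if __name__ == "__main__":
    data = {
        "title": "Example Film",
        "year": "1999",
        "genre": "drama",
        "director": "A. Director",
    }
    print(render(FILM_NARRATIVE, data))
```

Dropping incomplete sentences rather than printing placeholders keeps the generated stub readable when an item is only partly filled in on Wikidata.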
How?
WikidataIB Module
Thematic structured narrative: for example articles on films, earthquakes, etc. have similar structures.
Tested the narratives a lot to generate natural-sounding text.
User presses a button to call the template and QID
Used it on lists of redlinks
Created articles with photography albums and narrative text for list of [what was this? Something from a GLAM project...]
Example of article on Brazilian election made with Mbabel. [1]
Good readable text with key info on the election and a nice infobox.
Table of results from each round generated by the narrative template.
Final part of the article also auto-generated including sensible categories.
Challenges
Lots of work on code improvement, but need documentation improvement
Darwin: Mbabel also requires a huge load of work in Wikidata before using it to create new articles
Joalpe: ...once it is done, articles may be easily replicated in different Wikipedias, and meaningful Wikidata contributions are relevant per se
Benoît Prieur
Showing differences between Wikipedia (frwiki) and Wikivoyage.
Wikivoyage use case was quite different, due to OSM etc.
Biography infobox on frwiki, generated from Wikidata info
Using correct gender based on Wikidata info, but can cause controversy.
Article footer can be created with authority control, database ID...
Example article:
Anna Kiesenhofer: mathematician and cyclist. So two infoboxes, one for maths and one for sport.
At bottom of article, some research IDs and sports IDs from Wikidata.
Wikivoyage in fr:
A similar infobox: all data coming from Wikidata; all metadata around the location.
Another aspect is data that does not come from Wikidata but is collected via Wikidata.
Example: see shape of administrative map of town. Drawn automatically from the OpenStreetMap relation of the object: this automatically gives the correct shape. Before that we needed to put each vertex of the polygon into Wikivoyage. So Wikidata is here the hub of various data sources.
putnik
About ruwiki and smaller languages from the region
Similar to the French Wikipedia, but with a different stack of modules and templates
Infoboxes support Wikidata. In almost all of the articles we have some data from Wikidata.
We use complex logic to show [?]
Can show coordinates and maps like in Wikivoyage, awards, tables of population, charts, etc. etc.
Lots of small pieces of code that show each property separately.
Tried to show the same in lead section with main facts and dates [missing something here]
Tried to generate stubs with a bit of text and an infobox and a short lead section.
Source templates that are a bit different than frwiki
Authority control and external links template.
Have a stack of modules and templates that can be used in other wikis: ruwiki serves as a hub for about 20 smaller wikis.
Difference from ruwiki: the smaller wikis in other languages of Russia really want to use data from Wikidata, because they don't want to do all of this themselves.
They use the universal infobox module, similar to the "databox" module. This can be used to generate infoboxes like on Commons.
Also use source templates to import data effectively from ruwiki
Example: infobox in ruwiki. Can show awards as images.
Example from Bashkir Wikipedia: shows every field that it can show from Wikidata. Not as good as a bespoke infobox, but great for small wikis.
Generating sources: separate template which is very similar to CiteQ, adapted for ruwiki standards.
Authority control: split by topics: social networks, photo and video, thematic sites, etc.
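Splitting authority-control identifiers by topic amounts to grouping an item's external-ID properties under headings. The Python sketch below is an assumption-laden illustration, not the ruwiki module: the group assignments are hypothetical, while the property IDs are real Wikidata properties (P2002 Twitter username, P2003 Instagram username, P345 IMDb ID, P214 VIAF ID).

```python
from collections import defaultdict

# Hypothetical mapping from external-ID property to display group.
TOPIC_OF_PROPERTY = {
    "P2002": "social networks",  # Twitter username
    "P2003": "social networks",  # Instagram username
    "P345": "thematic sites",    # IMDb ID
}

DEFAULT_TOPIC = "other"


def group_identifiers(identifiers):
    """Group {property ID: identifier value} pairs by topic.

    Properties without a configured topic fall into DEFAULT_TOPIC.
    """
    grouped = defaultdict(list)
    for property_id, value in identifiers.items():
        topic = TOPIC_OF_PROPERTY.get(property_id, DEFAULT_TOPIC)
        grouped[topic].append((property_id, value))
    return dict(grouped)


if __name__ == "__main__":
    # Invented identifier values for illustration.
    sample = {"P2002": "example", "P345": "nm0000000", "P214": "0000"}
    for topic, entries in group_identifiers(sample).items():
        print(topic, entries)
```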
Community objections: even with all these benefits, not everyone in the community wants to see Wikidata info
Hard to control Wikidata vandalism
External project that is hard to edit.
Discussion about disabling Wikidata was held this month again, even after all these years.
Not as simple as it looks.
PRESENTATION / SLIDES / LINKS TO MATERIAL SHARED
Draft slides at https://docs.google.com/presentation/d/1C_2R1hQefpjdbo5v3I6mRtJN4iuecTPSEy7E3rjmH2s
Final slides will be uploaded to Commons after the presentation.
URLs
Wikidata Infobox on Commons - https://commons.wikimedia.org/wiki/Template:Wikidata_Infobox
interactive sessions only: ACTION ITEMS / RECOMMENDATIONS / NEXT STEPS
Ironically, I learned that references in Wikipedia infoboxes break DBpedia import scripts, e.g. genre on: https://dbpedia.org/page/The_ArchAndroid (not that I'm big on DBpedia for other reasons, but it was interesting to figure it out)
ptwiki does use Wikidata for lists in mainspace
Related session from previous day:
https://wikimania.wikimedia.org/wiki/2021:Submissions/Automatically_maintained_citations_with_Wikidata_and_Cite_Q
Mbabel
Mbabel details - https://meta.wikimedia.org/wiki/Mbabel
Mbabel poster from Érica at WikidataCon 2019 - https://commons.wikimedia.org/wiki/File:Mbabel_tool_-_Automatic_articles_on_Wikipedia_with_Wikidata_WikidataCon_2019_poster.pdf
One of the original uses for Mbabel (developed by User:Pharos) was for Met Museum art objects. Examples can be seen here: https://en.wikipedia.org/wiki/Wikipedia:GLAM/Metropolitan_Museum_of_Art/Artworks/Women_artists#painting
(Q&A only) QUESTIONS FOR THE SPEAKER
Will there be any way of using Wikidata structured data to create articles in Wikipedia (kind of like Wiki Abstract will do, but static), in a way that we don't need to actually save the code before using it, which makes it non-functional?
Mbabel will do this - see Erica's talk.
No, it does not. Mbabel is not functional, it forces us to save a draft (always with a different name!) every time and then arrange it afterwards. IMO, it's not really a solution to anything.
Listeria is a tool that makes list articles from Wikidata - https://en.wikipedia.org/wiki/User:ListeriaBot
Makes lists, not articles. The article part of the list is written by hand.
"Makes list articles"
No, it does not provide the header. Without that those lists would never be accepted as articles in the first place.
Depends on the wiki.
Not on wiki.pt, for sure
Incnis: Did you develop Wikipedia-to-Wikidata software which looks into the edit history to warn the operator about possible vandalism, edit wars, and similar problems?
Nirali, is the work you presented also being applied to categories? Differences across Wikipedias on category structuring appear to be significant.
Yes, we can. (For example: my work involved the import of Soccerway IDs for players whose Wikipedia articles used the template but the IDs were missing in Wikidata - There was a category for that); other than that, we can also apply it to lists which are a part of categories as well. The structures are indeed different for different Wikis which is a challenge but the scripts are developed accordingly :)
For Érica: How long did it take you to get all that data into Wikidata before you could start autocreating articles on ptwiki?
-> answered on call
Liliana: I tried the Template:Wikidata Infobox template in this article, now do we use this template or do we use the person tab? i.e. https://es.wikipedia.org/wiki/Gonzalo_Mena_Tortajada what is your recommendation?
That was designed to pull all the information in. If you try to use it on enwiki, it will be removed, though. [missed a lot here.]
I use eswiki, can I use this tool?
Ruth: this generated article is great. Are you flagging to suggest humans might engage with it in a way that fleshes it out or naturalizes language?
Tried to provide templates that sound as natural as possible; several kinds of information could be integrated, and then humans can easily build on it. Could provide the entry structure in a way that allows people to work on the more complex and human sections. Worked with students from the journalism section of a university who had never edited Wikipedia before; they were impressed with the initial information provided and found it easy to extend. Can be a great resource for beginners or experts.
Amir Aharoni: Question to Benoit or Sergey: If the functionality between French and Russian Wikipedias is similar, can you imagine a world in which you use the same stack of templates and modules, too? (Of course, the strings shown to users would be translated to French and Russian.)
There is a cultural dimension, and some of the metadata is different between the two. Not just a technical question.
Might be possible to create some basic framework that will work, but will need to be extended for various wikis. Maybe create an integration committee?
Thanks for the response! I love the "Integration committee" name :) --Amir
More on "Global Templates": https://wikimania.wikimedia.org/wiki/2021:Submissions/Global_templates,_component_templates
When will Cite Q be integrated into Citoid?
Was brought up in a previous session - https://wikimania.wikimedia.org/wiki/2021:Submissions/Automatically_maintained_citations_with_Wikidata_and_Cite_Q
The Russian citation template would be nice to implement in Citoid: would like to hook up with devs
Andy Mabbett: This and Cite Q should be more closely integrated
Cite Q developers are happy to work with Citoid developers on this.
GreenReaper: Have these tools been tried with generic Wikibase, e.g. via WBStack or Docker? Interested in using them in a standalone wiki farm (including multiple languages that are not frequently edited).
Tools like WikidataIB should work fine with generic wikibases, since they don't assume anything more than the connection. I suspect the infoboxes etc. will struggle, though, as they tend to be coded to use specific properties, which won't exist on non-Wikidata installations.
Perhaps we can use federated properties for that. Especially the forthcoming (in-development) version that allows it to combine foreign and local properties. Thanks!
(Lots of thanks and claps in the Remo chat!)
Continue discussion in Remo: https://live.remo.co/e/wikimania-2021-building-6
Floor 9, table T9 - B
Notetaker's reminders:
Your job is to capture what happens in each session, to include:
agenda items
areas of agreement and disagreement
questions and thoughts
next steps or action items, and their owners (if applicable - for example, in roundtables)
Ensure that your notes are clear and readable and contain details of the conversation and summaries
Put all direct quotes from speakers in "double-quotes", with (name) attribution, and don't use double-quotes for anything else.
Use square brackets to indicate where you are condensing a lot of material and skipping detail. This could be because the detail is not as valuable, or because you have fallen behind and want to catch up to potentially more valuable material, like a new topic.
If you miss a detail, use square brackets and a question mark to indicate uncertainty. [?]
E.g. Team agreed we should implement [?] in Haskell. We further discussed ways to "extend the outreach" (Alice), and to [synergize the streams?].
After Wikimania
After the session, do not forget to copy relevant notes to the relevant Wiki page