Wikimedia

Set the number of observations for Wikipedia's "idwiki" database to 100000 (in our Edit Quality Prediction model)

This task relates to Wikimedia's recent experiments with using Artificial Intelligence to score the quality of human edits on wiki pages.

Requirements: You should have seen and used a little bit of Python, SQL and maybe a "Makefile" before (or be willing to learn a little bit about them yourself).

Your task is to scale up the number of observations for idwiki (the internal name of the database behind the Indonesian-language Wikipedia) to 100000 by submitting a patch on GitHub.

For a previous example of how to add new observations, see https://github.com/wiki-ai/editquality/pull/48/files and use it to understand how things work.

To change the number of observations, you need to edit the query in Quarry (a tool for running SQL queries against Wikipedia and other databases from your browser) at https://quarry.wmflabs.org/query/12494, then re-run the feature extraction and the steps that follow it.
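To get a feel for what "changing the number of observations" in a sampling query means, here is a minimal sketch in Python. It is not the actual Quarry query (that one lives at the link above and is not reproduced here). The sketch assumes a simplified stand-in table named revision with a rev_id column, filled with made-up rows in an in-memory SQLite database; the helper names build_demo_db and sample_revisions are hypothetical. Quarry runs against MariaDB replicas, where the random function is RAND() rather than SQLite's RANDOM().

```python
import sqlite3


def build_demo_db(num_revisions):
    """Create an in-memory stand-in for a wiki's revision table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO revision (rev_id, rev_timestamp) VALUES (?, ?)",
        [(i, "20160101000000") for i in range(1, num_revisions + 1)],
    )
    conn.commit()
    return conn


def sample_revisions(conn, sample_size):
    """Return up to sample_size randomly chosen revision IDs.

    The LIMIT value is the "number of observations"; in this task it
    would be raised to 100000 in the real query.
    """
    rows = conn.execute(
        "SELECT rev_id FROM revision ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    ).fetchall()
    return [row[0] for row in rows]


# Demo with small numbers so it runs instantly.
conn = build_demo_db(1000)
sample = sample_revisions(conn, 100)
print(len(sample), "revision IDs sampled")
```

In the real workflow the sampled revision IDs are what the feature extraction step consumes, so a larger sample means a longer extraction run.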

Important: You must use the linked Phabricator task for communication with your mentors as some of the task mentors are not registered here on the GCI website, so they will not see your comments here.

Task tags

  • python
  • sql
  • makefile
  • database

Students who completed this task

Phantom42

Task type

  • Code

2016