Apertium

Add autocoverage support to begiak

begiak, Apertium's IRC bot, currently has an .awikstats command that updates dictionary counts and similar statistics on the wiki. The goal of this task is to integrate autocoverage.py so that it can be used as a mode of .awikstats. This will allow users to have begiak update Wikipedia coverage numbers on the Apertium wiki.

General things to consider:

  • How to handle languages that are specified but that have no Wikipedia.
  • How to handle the corpus name (we have been using names like wp2015 to mean a Wikipedia dump from 2015, but it might make more sense to just call it wp, and .
  • Automatically generate / update /Apertium-xyz/stats/average page?

The server begiak runs on has limited resources, so you need to consider the following:

  • How to handle huge Wikipedia dumps (e.g., for English).
  • How to limit the number of languages that can be processed at one time (maybe only one?).
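
One possible way to approach both resource constraints is sketched below. It is only an illustration, not begiak's or autocoverage.py's actual code: the names iter_dump_lines, run_exclusive and MAX_LINES are hypothetical, and the line cap is an arbitrary placeholder. The sketch streams a compressed dump line by line so that memory use stays bounded, and uses a non-blocking lock so that only one coverage job runs at a time.

```python
import bz2
import threading

# Hypothetical cap on how much of a dump is read; the right value
# would depend on what the server can actually handle.
MAX_LINES = 100_000

# A single module-level lock: at most one coverage job at a time.
_job_lock = threading.Lock()


def iter_dump_lines(path, max_lines=MAX_LINES):
    """Yield decoded lines from a .bz2 file without loading it into memory.

    Reading stops after max_lines lines, so a huge dump (e.g. English)
    is only sampled rather than processed in full.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle):
            if number >= max_lines:
                break
            yield line


def run_exclusive(job):
    """Run job() only if no other job is running.

    Returns (True, result) if the job ran, or (False, None) if another
    job already holds the lock, so the caller can tell the user to retry.
    """
    if not _job_lock.acquire(blocking=False):
        return False, None
    try:
        return True, job()
    finally:
        _job_lock.release()
```

With this design a second request made while a job is running is refused immediately instead of being queued, which keeps the load predictable; a queue would be an alternative if refusing requests turns out to be too strict.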

Task tags

  • python
  • begiak
  • coverage

Students who completed this task

Grzegorz Stark

Task type

  • Code

2017