Apertium
Use open-source OCR to convert open-source non-text news corpora to text. Evaluate an analyser's coverage on them.
Many languages have online newspapers that publish the news as images (for example GIFs) rather than as actual text. Find a newspaper for a language that lacks news text online (e.g. Marathi), check its license, find an OCR tool, and scrape a reasonably large corpus from the images, provided that doing so would not violate CC/GPL. Then evaluate the morphological analyser's coverage on the resulting text.
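The coverage step can be sketched as follows. This is a minimal illustration, not a prescribed method. It assumes the OCR output has already been run through the analyser (for instance with lt-proc) and that the result is in Apertium stream format, where each token appears as ^surface/analysis$ and an unknown word is echoed back with a leading asterisk. The function name coverage and the sample string are hypothetical, and escaped characters inside tokens are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: backslash-escaped characters inside a unit are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (known, total, ratio) for analyser output in stream format."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = match.group(2)
        # An unknown word is echoed back with a leading asterisk: ^foo/*foo$
        if not analyses.startswith("*"):
            known += 1
    ratio = known / total if total else 0.0
    return known, total, ratio


# Hypothetical sample: two analysed tokens and one unknown token.
sample = "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ ^blorf/*blorf$"
known, total, ratio = coverage(sample)
print(f"{known}/{total} tokens analysed ({ratio:.1%})")
```

This counts every token equally (naive coverage), so OCR errors show up as unknown words and lower the figure. It is worth spot-checking a sample of the unknown tokens to separate OCR noise from genuine gaps in the analyser.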
Task tags
Students who completed this task
Grzegorz Stark