Apertium

Build a clean Kazakh--Russian sentence-aligned bilingual corpus for testing purposes using official information from Kazakh websites (minimum 50 bilingual sentences).

Download the Kazakh and Russian versions of the same page, split them into sentences, and align them. Then build two plain-text files (kaz.FILENAME.txt and rus.FILENAME.txt) with one sentence per line, so that line N of one file is the translation of line N of the other.
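The file-building step can be sketched as follows. This is a minimal illustration, not an official Apertium tool: the function names, the punctuation-based splitter, and the language prefixes passed as parameters are all assumptions made for the example. A regex splitter will mis-split abbreviations and numbers, so the output still needs to be checked and aligned by hand.

```python
import re
from pathlib import Path

# Naive boundary (an assumption): split on whitespace that follows ., ! or ?
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Collapse whitespace, then split the text into rough sentences."""
    text = " ".join(text.split())
    return [s for s in _SENT_END.split(text) if s]


def write_parallel(src_text, tgt_text, name, src_prefix, tgt_prefix, out_dir="."):
    """Write two files, PREFIX.NAME.txt, with one sentence per line.

    Refuses to write anything if the sentence counts differ, because the
    lines could then no longer correspond to each other.
    """
    src = split_sentences(src_text)
    tgt = split_sentences(tgt_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"sentence counts differ ({len(src)} vs {len(tgt)}); align by hand first"
        )
    out = Path(out_dir)
    src_path = out / f"{src_prefix}.{name}.txt"
    tgt_path = out / f"{tgt_prefix}.{name}.txt"
    src_path.write_text("\n".join(src) + "\n", encoding="utf-8")
    tgt_path.write_text("\n".join(tgt) + "\n", encoding="utf-8")
    return src_path, tgt_path
```

Equal line counts are only a necessary condition for a correct alignment, so each sentence pair should still be read through before the files are submitted.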

Make sure they are not already part of OPUS (http://opus.lingfil.uu.se).

For further information and guidance on this task, you are encouraged to come to our IRC (http://wiki.apertium.org/wiki/IRC) channel.

Task tags

  • language_data
  • aligning
  • corpora
  • Kazakh
  • Russian

Students who completed this task

Darkgaia

Task type

  • Outreach / Research

2015