
Build a clean, sentence-aligned Kazakh-English bilingual corpus for testing purposes, using official information from Kazakh websites (at least 50 sentence pairs).

Download the Kazakh and English versions of the same page, split each into sentences, and align them. Build two plain text files, eng.FILENAME.txt and kaz.FILENAME.txt, with one sentence per line, so that each line in one file is the translation of the same-numbered line in the other.
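
As a rough illustration of the splitting and file-writing step, here is a minimal Python sketch. It assumes the page text has already been downloaded and stripped of markup. The helper names split_sentences and write_aligned are hypothetical, and the splitting rule (break after ".", "!" or "?" followed by whitespace) is a naive assumption that will mis-split abbreviations and initials, so the output still needs to be checked and aligned by hand.

```python
import re
from pathlib import Path

# Naive rule (an assumption, not a real segmenter): a sentence ends at
# ".", "!" or "?" when that character is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the sentences in text as a list of strings."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    return [s for s in _SENTENCE_END.split(flat) if s]


def write_aligned(eng_text, kaz_text, name, out_dir="."):
    """Write eng.NAME.txt and kaz.NAME.txt, one sentence per line.

    Raises ValueError when the two sides split into different numbers
    of sentences, since the files would then not correspond line by line.
    Returns the number of sentence pairs written.
    """
    eng = split_sentences(eng_text)
    kaz = split_sentences(kaz_text)
    if len(eng) != len(kaz):
        raise ValueError(
            f"sentence counts differ: {len(eng)} English vs {len(kaz)} Kazakh"
        )
    out = Path(out_dir)
    (out / f"eng.{name}.txt").write_text("\n".join(eng) + "\n", encoding="utf-8")
    (out / f"kaz.{name}.txt").write_text("\n".join(kaz) + "\n", encoding="utf-8")
    return len(eng)
```

An equal sentence count does not prove the lines are translations of each other, so read through both files side by side before submitting. The returned pair count can be compared against the 50-sentence minimum.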

Make sure the texts you choose are not already part of OPUS (http://opus.lingfil.uu.se).

For further information and guidance on this task, you are encouraged to join our IRC channel.

Task tags

  • language_data
  • aligning
  • corpora
  • Kazakh
  • English

Students who completed this task

Darkgaia, Ariel Rakovitsky

Task type

  • Outreach / Research