Compare Apertium with another MT system and improve it

Apertium

This task aims at improving an Apertium language pair for which another web-accessible machine translation system exists. It is particularly useful if that system is (approximately) rule-based, such as Lucy (http://www.lucysoftware.com/english/machine-translation/lucy-lt-kwik-translator-/), Reverso (http://www.reverso.net/text_translation.aspx?lang=EN), Systran (http://www.systransoft.com/free-online-translation) or SDL Free Translation (http://www.freetranslation.com/).
(1) Install the Apertium language pair. Ideally, the source language should be one you know (L₂) and the target language one you use every day (L₁).
(2) Collect a corpus of text (e.g. newspaper articles or Wikipedia). Segment it into sentences and put each sentence on its own line. You can use libsegment-java or a similar processor, together with an SRX (https://en.wikipedia.org/wiki/Segmentation_Rules_eXchange) segmentation rule file borrowed from, e.g., OmegaT.
(3) Run the corpus through Apertium and through the other system.
(4) Select the sentences where both outputs are very similar (e.g., 90% coincident). For each one, decide which output is better.
(5) Where the other system's output is better than Apertium's, work out what modification to Apertium would make it produce the same output. Make 3 such modifications.
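You can partly automate the selection of near-identical output pairs. The sketch below makes several assumptions. First, the source corpus and the two systems' outputs have already been saved as three line-aligned UTF-8 files, with one sentence per line. Second, "90% coincident" is read as a word-level similarity ratio computed with Python's `difflib.SequenceMatcher`. That metric, the 0.9 threshold and the decision to skip identical outputs are illustrative choices, not part of the task definition. The function names are also hypothetical.

```python
import difflib
import sys
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] between two translations.

    Assumption: "90% coincident" is interpreted as SequenceMatcher.ratio()
    over whitespace-separated tokens; other metrics would also be reasonable.
    """
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def select_close_pairs(
    sources: List[str],
    apertium_out: List[str],
    other_out: List[str],
    threshold: float = 0.9,
) -> List[Tuple[str, str, str, float]]:
    """Return (source, apertium, other, score) for outputs that are close but not identical."""
    selected = []
    for src, ap, ot in zip(sources, apertium_out, other_out):
        score = similarity(ap, ot)
        # Identical outputs give no basis for deciding which system is better,
        # so only keep pairs at or above the threshold that still differ.
        if threshold <= score < 1.0:
            selected.append((src, ap, ot, score))
    return selected


def read_lines(path: str) -> List[str]:
    """Read a one-sentence-per-line file, dropping trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: script.py SOURCE APERTIUM_OUTPUT OTHER_OUTPUT
    src_lines = read_lines(sys.argv[1])
    ap_lines = read_lines(sys.argv[2])
    ot_lines = read_lines(sys.argv[3])
    for s, a, o, score in select_close_pairs(src_lines, ap_lines, ot_lines):
        print(f"{score:.2f}\t{s}\n\tApertium: {a}\n\tOther:    {o}")
```

For example, running `python select_pairs.py corpus.txt apertium.txt other.txt` (hypothetical file names) prints each selected sentence next to both translations for manual comparison. The script compares word tokens rather than characters, so a single differing word is easy to spot. This matches the kind of lexical or agreement error that a small Apertium modification can usually fix.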
For further information and guidance on this task, you are encouraged to come to our IRC channel.

Students who completed this task

nuboro

Task type

Quality Assurance