[AI][EASY] Chinese-English Parallel Corpora Collection
A corpus (plural: corpora) is a collection of texts. A parallel corpus is a collection of texts together with their translations into one or more other languages. For example, a Chinese-English parallel corpus consists of two files: one containing text written in Chinese and another containing the same text translated into English.
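To make the two-file layout concrete, here is a minimal Python sketch of reading such a pair of files. It assumes the common line-aligned convention, where line i of the Chinese file is the translation of line i of the English file; the description above does not prescribe this. The function name `read_parallel` and the file names `sample.zh` and `sample.en` are hypothetical.

```python
import tempfile
from pathlib import Path


def read_parallel(zh_path, en_path):
    """Return (chinese, english) pairs from two line-aligned UTF-8 files."""
    with open(zh_path, encoding="utf-8") as zh_file, \
            open(en_path, encoding="utf-8") as en_file:
        zh_lines = [line.rstrip("\n") for line in zh_file]
        en_lines = [line.rstrip("\n") for line in en_file]
    # Assumption: line i of one file translates line i of the other,
    # so both files must have the same number of lines.
    if len(zh_lines) != len(en_lines):
        raise ValueError(
            f"line count mismatch: {len(zh_lines)} vs {len(en_lines)}"
        )
    return list(zip(zh_lines, en_lines))


if __name__ == "__main__":
    # Tiny illustrative files, created only for this demo.
    with tempfile.TemporaryDirectory() as tmp:
        zh_path = Path(tmp) / "sample.zh"
        en_path = Path(tmp) / "sample.en"
        zh_path.write_text("你好\n谢谢\n", encoding="utf-8")
        en_path.write_text("Hello\nThank you\n", encoding="utf-8")
        for zh, en in read_parallel(zh_path, en_path):
            print(zh, "|||", en)
```

A line-count check like this catches only gross misalignment; it cannot tell whether a given English line is actually a correct translation of its Chinese counterpart.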
[Problem Statement] Parallel corpora are needed for training Machine Translation systems. We need a 100% accurate parallel corpus for the Chinese-English language pair. Students need to search for any publicly available (open source) Chinese-English parallel data and combine all the individual parallel texts into two files, one for Chinese and the other for English. We expect the corpus to be huge: try to gather around 5 million parallel sentences.
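The combining step could look roughly like the following Python sketch. It is one possible approach, not a prescribed one, and rests on several assumptions: each downloaded source is already a pair of line-aligned UTF-8 files, empty lines can be dropped, and exact duplicate pairs should be kept only once. The function name `merge_parallel` and all file names are hypothetical.

```python
import tempfile
from itertools import zip_longest
from pathlib import Path

_MISSING = object()  # sentinel marking the shorter file in a misaligned pair


def merge_parallel(sources, out_zh, out_en):
    """Merge (zh_path, en_path) sources into two aligned output files.

    Returns the number of sentence pairs written.
    """
    seen = set()  # exact (zh, en) pairs already written
    written = 0
    with open(out_zh, "w", encoding="utf-8") as zh_out, \
            open(out_en, "w", encoding="utf-8") as en_out:
        for zh_path, en_path in sources:
            with open(zh_path, encoding="utf-8") as zh_in, \
                    open(en_path, encoding="utf-8") as en_in:
                for zh, en in zip_longest(zh_in, en_in, fillvalue=_MISSING):
                    if zh is _MISSING or en is _MISSING:
                        # One file ended early: this source is misaligned.
                        raise ValueError(f"line count mismatch in {zh_path}")
                    zh, en = zh.strip(), en.strip()
                    if not zh or not en:
                        continue  # skip pairs with an empty side
                    if (zh, en) in seen:
                        continue  # skip exact duplicates across sources
                    seen.add((zh, en))
                    zh_out.write(zh + "\n")
                    en_out.write(en + "\n")
                    written += 1
    return written


if __name__ == "__main__":
    # Two tiny made-up sources that share one sentence pair.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "a.zh").write_text("你好\n谢谢\n", encoding="utf-8")
        (tmp / "a.en").write_text("Hello\nThank you\n", encoding="utf-8")
        (tmp / "b.zh").write_text("谢谢\n再见\n", encoding="utf-8")
        (tmp / "b.en").write_text("Thank you\nGoodbye\n", encoding="utf-8")
        count = merge_parallel(
            [(tmp / "a.zh", tmp / "a.en"), (tmp / "b.zh", tmp / "b.en")],
            tmp / "all.zh",
            tmp / "all.en",
        )
        print(count)  # 3: the duplicated pair is written once
```

Every write to the Chinese file is matched by a write to the English file, so the two outputs stay aligned line by line. At around 5 million pairs the in-memory `seen` set may become large; storing a hash of each pair instead of the full text is one way to reduce that, at the cost of a small collision risk.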