Write a scraper for aligned daily content from wol.jw.org in two languages

Apertium

Write a scraper (preferably in Python) that accepts two languages and a date range, and creates an aligned corpus in TMX format from the content at wol.jw.org in those two languages for each date in that range.
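As a rough illustration of the output side of the task, here is a minimal sketch of how already-aligned segment pairs could be serialised as TMX 1.4 with Python's standard library. The function name `build_tmx`, the tool name `wol-scraper`, and the choice of paragraph-level segments are assumptions made for this example. The task does not prescribe them, and how the pairs get aligned is left open.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 document (as a string) for already-aligned pairs.

    `pairs` is an iterable of (source_text, target_text) tuples.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "wol-scraper",  # hypothetical tool name
        "creationtoolversion": "0.1",   # hypothetical version
        "segtype": "paragraph",         # assumption: one segment per paragraph
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_text, tgt_text in pairs:
        # One translation unit per aligned pair, with one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Calling `build_tmx([("Hello", "Сәлем")], "en", "kk")` would return a document with a single `tu` element holding one `tuv` per language. ElementTree escapes characters such as `&` and `<` in the segment text, so scraped text can be passed in as-is.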

Here are some example pages in different languages for a given date (December 1, 2015):

http://wol.jw.org/en/wol/dt/r1/lp-e/2015/12/1
http://wol.jw.org/kk-cyrl/wol/dt/r43/lp-az/2015/12/1
http://wol.jw.org/ky/wol/dt/r51/lp-kz/2015/12/1
http://wol.jw.org/khk/wol/dt/r159/lp-kha/2015/12/1
http://wol.jw.org/tt/wol/dt/r100/lp-tat/2015/12/1
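The example URLs appear to share one pattern: a language code, an "r" code, an "lp" code, and then the year, month and day without zero padding. Here is a minimal sketch of building such URLs for every day in a date range, on the assumption that this pattern holds for other dates. The names `WOL_CODES`, `daily_url` and `date_range` are hypothetical. The table contains only the five languages from the examples, so codes for any other language would have to be looked up on the site.

```python
from datetime import date, timedelta

# language code -> ("r" code, "lp" code), copied from the example URLs.
WOL_CODES = {
    "en": ("r1", "lp-e"),
    "kk-cyrl": ("r43", "lp-az"),
    "ky": ("r51", "lp-kz"),
    "khk": ("r159", "lp-kha"),
    "tt": ("r100", "lp-tat"),
}


def daily_url(lang, day):
    """Build the daily-page URL for `lang` on `day` (pattern inferred from the examples)."""
    r_code, lp_code = WOL_CODES[lang]
    return (
        f"http://wol.jw.org/{lang}/wol/dt/{r_code}/{lp_code}/"
        f"{day.year}/{day.month}/{day.day}"
    )


def date_range(start, end):
    """Yield every date from `start` to `end`, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


if __name__ == "__main__":
    # Print the URL pair for each day of a short range.
    for day in date_range(date(2015, 12, 1), date(2015, 12, 3)):
        print(daily_url("en", day), daily_url("ky", day))
```

Keeping the URL construction separate from fetching and parsing means the date and language handling can be checked without any network access.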

For further information and guidance on this task, you are encouraged to come to our IRC channel.

Task tags

python
aligning
scraper

Students who completed this task

vigneshv

Task type

Code