CCExtractor Development

[Potentially hard] Update Tesseract to 4.00

The current Windows build is not fully functional because the release DLL for Tesseract 3.04 is missing (we only have the debug one). Building the DLL has been troublesome for us, and we have had issues ever since we moved from Tesseract 3.03 to 3.04.

Meanwhile, Tesseract 4.00 was released a few weeks ago, and its changelog claims a large improvement in accuracy. Updating therefore makes sense.

We haven't looked into this yet, so we don't know how hard it will be. It might only be a matter of replacing dependencies, or it might require extensive code changes. The training data files will likely need to be replaced, too.

This may be hard, and of course we will take that into consideration when the time comes.

Task tags

  • ocr
  • upgrade
  • tesseract

Students who completed this task

Evgeny Shulgin

Task type

  • Quality Assurance

2016