CCExtractor Development
[Winnerstrack]: Produce a "smart" multiple choice exam analyzer (II)
This task follows on from:
https://codein.withgoogle.com/dashboard/tasks/6498594207563776/
In that task you had to figure out how to correctly split a PDF into its questions (and we received a couple of reasonably good implementations). This new task requires you to:
- Save each question into an individual file (you had to do this in the previous task, but let's formalize it)
- For each question, OCR the text as best you can. You can use an external library such as Tesseract, or Amazon's services, which can handle things like borders.
- If a question has, for example, 4 possible answers (that's typically the case), try to extract each answer separately. This is useful, for example, if we want to shuffle the answers to generate an exam that has the same answers in a different order.
- Write generated PDFs to file. Write text to .json files and a sqlite3 database (so we can do queries later).
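As a rough illustration of the answer-extraction and storage steps above, here is a minimal Python sketch. It is not a reference solution. It assumes the OCR step (not shown) already produced plain text for one question, and that answers start on their own line with a label such as "a)", "B." or "(c)". The function names, the regular expression and the two-table schema are all hypothetical choices made for this example; real exams will need more robust label detection.

```python
import json
import random
import re
import sqlite3

# Assumption: an answer starts a line with a label like "a)", "B." or "(c)".
ANSWER_LABEL = re.compile(r"^\s*\(?([A-Ha-h])[).]\s+(.*)$")


def split_question(ocr_text):
    """Split the OCR'd text of one question into a stem and a list of answers."""
    stem_lines, answers = [], []
    for line in ocr_text.splitlines():
        match = ANSWER_LABEL.match(line)
        if match:
            answers.append(match.group(2).strip())
        elif not line.strip():
            continue  # ignore blank lines
        elif answers:
            # A non-labelled line after an answer is treated as a wrapped line.
            answers[-1] += " " + line.strip()
        else:
            stem_lines.append(line.strip())
    return {"stem": " ".join(stem_lines), "answers": answers}


def shuffle_answers(question, seed=None):
    """Return a copy of the question with its answers in a different order."""
    answers = list(question["answers"])
    random.Random(seed).shuffle(answers)
    return {"stem": question["stem"], "answers": answers}


def store(con, questions):
    """Write questions and answers to an open sqlite3 connection."""
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS questions ("
            "id INTEGER PRIMARY KEY, stem TEXT NOT NULL)"
        )
        con.execute(
            "CREATE TABLE IF NOT EXISTS answers ("
            "question_id INTEGER NOT NULL REFERENCES questions(id), "
            "position INTEGER NOT NULL, text TEXT NOT NULL)"
        )
        for question in questions:
            cur = con.execute(
                "INSERT INTO questions (stem) VALUES (?)", (question["stem"],)
            )
            con.executemany(
                "INSERT INTO answers (question_id, position, text) VALUES (?, ?, ?)",
                [(cur.lastrowid, i, a) for i, a in enumerate(question["answers"])],
            )


# Made-up sample text standing in for real OCR output.
sample = "1. What is 2 + 2?\na) 3\nb) 4\nc) 5\nd) 22"
parsed = split_question(sample)
as_json = json.dumps([parsed], ensure_ascii=False, indent=2)
demo_db = sqlite3.connect(":memory:")
store(demo_db, [parsed])
```

In a real run you would pass a file path to sqlite3.connect instead of ":memory:" and write the JSON string to a .json file, one per question or one per exam. Storing each answer in its own row with a position column keeps the original order available while still allowing shuffled exams to be generated later.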
We consider this task to be hard (but challenging and fun).
You can find lots of different sample exams, for different subjects, here:
https://drive.google.com/file/d/1WULFj053Lm1_y6BTQVOyPdXeynNmOh14/view?usp=sharing
Students who completed this task
knightron0, Musab Kılıç, RobOHt