CCExtractor Development

Hard: Produce a "smart" multiple choice exam analyzer

We have a set of scanned multiple-choice exams in PDF format, some of which contain drawings. We want to process each PDF so that it is automatically split into separate questions. Where possible, we'd like each question to be correctly separated from the rest of its exam, in both graphics form and text form.

You can use any library you want, or even the free tier of any cloud service. For example, we tried Amazon Textract: it does a good job at OCR (graphics-to-text conversion), but it doesn't help with actually separating the questions, and the graphics are lost.

Don't worry about the meaning of the questions or the language. The sample is intentionally hard: it is in Spanish and comes from an automata course, so it has drawings.

https://drive.google.com/file/d/1nXEzreb3kQnamgadQAFe0ri8qRW94aCU/view?usp=sharing
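
One possible starting point, not a required approach: if an OCR step returns each text line with its vertical position on the page, questions can be separated by looking for lines that start with a question number. The sketch below assumes a hypothetical input format of (text, top, bottom) tuples sorted from top to bottom, and assumes that questions are numbered like "1." or "2)"; neither assumption comes from the sample exam. Each question keeps the vertical range from its first line to the start of the next question, so that the same range can be cropped from the page image and any drawing in between is preserved.

```python
import re
from dataclasses import dataclass, field

# Assumption: a question starts with a number followed by "." or ")".
QUESTION_START = re.compile(r"^\s*(\d{1,3})[.)]\s+\S")


@dataclass
class Question:
    number: int
    top: float      # y coordinate where the question begins
    bottom: float   # y coordinate where the question ends
    lines: list = field(default_factory=list)


def split_questions(ocr_lines, page_bottom):
    """Group the OCR lines of one page into questions.

    ocr_lines: (text, top, bottom) tuples sorted top to bottom
               (hypothetical format, adapt it to your OCR output).
    page_bottom: y coordinate of the lower edge of the page.
    """
    questions = []
    for text, top, bottom in ocr_lines:
        match = QUESTION_START.match(text)
        if match:
            # The previous question extends down to where this one starts,
            # so drawings placed between the two stay with the previous one.
            if questions:
                questions[-1].bottom = top
            questions.append(Question(int(match.group(1)), top, bottom, [text]))
        elif questions:
            # Continuation line (question text or an answer option).
            questions[-1].lines.append(text)
            questions[-1].bottom = max(questions[-1].bottom, bottom)
        # Lines before the first question (page headers) are ignored.
    if questions:
        # The last question on the page runs to the bottom of the page.
        questions[-1].bottom = page_bottom
    return questions
```

The (top, bottom) range of each returned question can then be used to crop the scanned page for the graphics form, while the collected lines give the text form. Questions that continue on the next page are not handled by this sketch.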

Task tags

  • hard
  • winnerstrack

Students who completed this task

knightron0, Musab Kılıç, cppio, RobOHt, AlephZero

Task type

  • Code

2019