[AI][Medium] Automatic Text Scoring

CCExtractor Development

Suppose we have a few lines of captions from a news broadcast. A human reading these captions can easily judge whether each sentence is correct in that language. The goal is to create an API or a simple computer program that automatically predicts the grammatical correctness of a sentence and expresses it as a score.

[Hint] Think in terms of language modelling. You can build your own language model by training it on a corpus (a collection of text) and then use it to evaluate any given test sentence. Read a tutorial on language modelling and implement one.
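
As one possible starting point for the hint, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. It is only an illustration, not a required design: the names tokenize, BigramModel and score, the whitespace tokenization, and the tiny example corpus are all assumptions made for this sketch. The score is the average log-probability per token, so values closer to zero suggest a sentence that looks more like the training text.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization with boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Hypothetical bigram model with add-one smoothing."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus:
            tokens = tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # One extra slot stands in for any word never seen in training.
        self.vocab_size = len(self.unigrams) + 1

    def score(self, sentence):
        # Average log-probability per predicted token (always negative).
        tokens = tokenize(sentence)
        log_prob = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return log_prob / (len(tokens) - 1)


# Toy corpus, made up for illustration; a real model needs far more text.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat saw the dog",
]
model = BigramModel(corpus)
print(model.score("the cat sat on the mat"))
print(model.score("mat the on sat cat the"))
```

With this toy corpus the first sentence should receive a higher (less negative) score than the scrambled one, because every one of its word pairs was seen during training. A bigram model measures similarity to the training text rather than grammaticality as such, so the quality of the scores depends heavily on the size and style of the corpus.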

You can also come up with your own idea. This would be really helpful for CCExtractor to rate the translations produced by CCtranslate or any other translation service used in the future.

Task tags

python
natural language processing
machine learning

Students who completed this task

AbhayS, lyect, Iterator, Ivan Makarov

Task type

Code
Design
Outreach / Research