Apertium

Function for calculating a word's probability of appearing

As part of automatically generating Constraint Grammars, we need to know the probability of each word appearing. It is calculated by counting how many times the word appears and dividing that count by the total number of words in the corpus (a text collection). Write a Python function that takes a morphologically analyzed text corpus (in Apertium stream format) and returns the calculated word probabilities in a dictionary, for example {"car": 0.11, "penguin": 0.33, ...}, where 0.11 etc. are the probabilities (0.11 meaning 11%). Use streamparser to parse the Apertium stream format. Do not use the surface forms of words when calculating the probabilities; instead, use the non-inflected form (the lemma) taken from the readings in the cohort.
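The counting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference solution: the function names are hypothetical, an ambiguous cohort is counted once by its first reading, tokens without a usable reading are skipped (so the denominator is the number of counted tokens), and streamparser's `parse` is assumed to yield lexical units whose `.readings` is a list of readings, each a list of subreadings with a `.baseform` attribute.

```python
from collections import Counter


def lemma_probabilities(units):
    """Return {lemma: relative frequency} for an iterable of lexical units.

    Each unit is expected to expose `.readings`: a list of readings, where
    every reading is a list of subreadings with a `.baseform` attribute
    (the shape streamparser's lexical units are assumed to have).
    """
    counts = Counter()
    for unit in units:
        # Assumption: an ambiguous cohort is counted once, by its first reading.
        readings = [reading for reading in unit.readings if reading]
        if not readings:
            continue  # no analysis available for this token
        lemma = readings[0][0].baseform
        if lemma.startswith("*"):
            continue  # unknown words are marked with '*' in the stream format
        counts[lemma] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    # Probability = occurrences of the lemma / number of counted tokens.
    return {lemma: n / total for lemma, n in counts.items()}


def lemma_probabilities_from_stream(text):
    """Hypothetical wrapper: parse Apertium stream text with streamparser."""
    # Imported lazily so the counting logic above works without the package.
    from streamparser import parse

    return lemma_probabilities(parse(text))
```

With streamparser installed, a call such as `lemma_probabilities_from_stream("^cars/car<n><pl>$ ^penguin/penguin<n><sg>$")` would be the intended entry point. Splitting the count evenly across all readings of an ambiguous cohort is an equally plausible design and may suit Constraint Grammar generation better.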

Task tags

  • python
  • cg
  • constraint grammar

Students who completed this task

nuboro

Task type

  • Code

2016