Apertium

Function for calculating the probability of each feature appearing

As part of automatically generating Constraint Grammars, we need to know the probability of each feature (e.g. verb (vblex), noun (n)) showing up. This is calculated simply by counting how many times the feature appears and dividing that by the total number of feature occurrences in the corpus (a text collection). Write a Python function that takes a morphologically analyzed text corpus (in Apertium stream format) and returns the calculated feature probabilities in a dictionary, for example {"n": 0.11, "vblex": 0.33, ...}, where 0.11 etc. are the probabilities (0.11 meaning 11%). Use streamparser to parse the Apertium stream format.
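
One possible shape for such a function is sketched below. It is not a reference solution. All function names are hypothetical, and the sketch makes two assumptions the task text does not settle: every tag of every reading counts as one feature occurrence (ambiguous readings are not weighted), and streamparser's parse() yields lexical units whose readings are lists of subreadings, each carrying a tags list. The counting step is kept separate from the parsing step so it can be checked without streamparser installed.

```python
from collections import Counter


def probabilities_from_tags(tags):
    """Return {tag: occurrences / total occurrences} for an iterable of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        # An empty corpus has no features, so there is nothing to report.
        return {}
    return {tag: count / total for tag, count in counts.items()}


def iter_tags(corpus):
    """Yield every tag of every reading found in an Apertium stream string."""
    # Imported lazily so that probabilities_from_tags works on its own.
    # Assumption: streamparser exposes parse(), and each lexical unit has
    # .readings, a list of readings made of subreadings with a .tags list.
    from streamparser import parse

    for lexical_unit in parse(corpus):
        for reading in lexical_unit.readings:
            for subreading in reading:
                yield from subreading.tags


def feature_probabilities(corpus):
    """Map each feature (tag) in the analyzed corpus to its probability."""
    return probabilities_from_tags(iter_tags(corpus))
```

With streamparser installed, the function could be called on a string such as "^dog/dog<n><sg>$ ^runs/run<vblex><pri><p3><sg>$". Because the denominator is the total number of tag occurrences, the returned probabilities sum to 1.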

Task tags

  • constraint grammar
  • python
  • cg

Students who completed this task

nuboro

Task type

  • Code

2016