The original model considers all the words from the vocabulary V in making predictions. We think this is unnecessary, and only predict among entities which appear in the passage. Of these changes, only the first seems important; the other two just aim at keeping the model simple.