
Adding a new corpus can change the dictionary (vocabulary) of the model.

The possible effects are as follows:


1. New words may appear in the corpus
2. Words of interest may drop out of the dictionary, so the model no longer considers them
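
Both effects can be checked by comparing the dictionary built from the original corpus with the one built from the merged corpus. The following is a minimal sketch, assuming the dictionary maps each of the 'n_words' most frequent words to its rank; the helper `build_dictionary` and the toy corpora are hypothetical and are not taken from the original implementation.

```python
from collections import Counter


def build_dictionary(tokens, n_words):
    """Map each of the n_words most frequent tokens to its rank (0 = most frequent).

    Hypothetical helper: the real model may build its dictionary differently.
    """
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ranked[:n_words])}


# Toy corpora, for illustration only.
old_tokens = "the cat sat on the mat the cat ran".split()
new_tokens = "the dog ran and the dog sat and the dog ate".split()
n_words = 4

old_dict = build_dictionary(old_tokens, n_words)
new_dict = build_dictionary(old_tokens + new_tokens, n_words)

# Effect 1: words that entered the dictionary because of the new corpus.
added = sorted(set(new_dict) - set(old_dict))
# Effect 2: words the model used to consider but would now ignore.
dropped = sorted(set(old_dict) - set(new_dict))

print("added:", added)
print("dropped:", dropped)
```

Any word of interest that shows up in `dropped` is a candidate for the rank adjustment discussed in this note.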

To mitigate these effects, the following steps are proposed.
Step 1 : Add the new corpus to the existing one and build a new dictionary.
Step 2 : Compare each word and its position (rank) in the original dictionary
         with the new one.
Step 3 : Ensure that words of importance in the original dictionary have a rank
         in the new dictionary below the threshold 'n_words'. Where such words
         would not otherwise rank highly enough, adjust their position in the
         new dictionary so that their rank is also below 'n_words'.
Step 4 : To ensure that training is incremental, replace the vectors in the new
         word embeddings with the vectors trained so far, and make sure they
         are aligned with the positions of the words in the new dictionary.
Step 5 : Review the implementation of the above steps and restart training.
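
Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the embeddings are assumed to be stored as one row per dictionary rank, new words are assumed to start from small random vectors, and the helper names `promote_important` and `transplant_vectors` as well as the toy data are hypothetical.

```python
import random


def promote_important(ranked_words, important, n_words):
    """Return the top-n_words list with every important word forced into it.

    ranked_words: all words of the merged corpus, most frequent first.
    """
    top = ranked_words[:n_words]
    missing = [w for w in important if w in ranked_words and w not in top]
    # Candidates for eviction: lowest-ranked words first, never important ones.
    evictable = [w for w in reversed(top) if w not in important]
    for word, victim in zip(missing, evictable):
        top[top.index(victim)] = word
    return top


def transplant_vectors(old_dict, old_vectors, new_words, dim, seed=0):
    """Build the new embedding rows, aligned with the positions in new_words.

    Words already trained keep their vector; unseen words get a random one.
    """
    rng = random.Random(seed)
    new_vectors = []
    for word in new_words:
        if word in old_dict:
            new_vectors.append(list(old_vectors[old_dict[word]]))
        else:
            new_vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return new_vectors


# Toy data, for illustration only.
old_dict = {"the": 0, "cat": 1, "mat": 2, "on": 3}
old_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
ranked_merged = ["the", "dog", "and", "cat", "ran", "sat", "ate", "mat", "on"]
important = ["cat", "mat"]
n_words = 4

new_words = promote_important(ranked_merged, important, n_words)
new_dict = {word: rank for rank, word in enumerate(new_words)}
new_vectors = transplant_vectors(old_dict, old_vectors, new_words, dim=2)

print(new_dict)
```

In this toy run "mat" would have fallen outside the top 'n_words', so it takes the place of the lowest-ranked unimportant word, and the rows for "the", "mat" and "cat" carry over their previously trained values at their new positions.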
