In particular, Good-Turing smoothing reallocates the probability mass of n-grams that were seen once to n-grams that were never seen.
For each count \(c\), an adjusted count \(c^*\) (the effective count) is computed as:
\(c^* = \dfrac{(c+1)\, N_{c+1}}{N_c}\)
where \(N_c\) is the number of n-grams seen exactly \(c\) times.
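The adjusted count can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy set of bigram counts, not a full Good-Turing implementation (which would also smooth the \(N_c\) values themselves):

```python
from collections import Counter

def good_turing_adjusted_count(c, count_of_counts):
    """Effective count c* = (c + 1) * N_{c+1} / N_c."""
    n_c = count_of_counts.get(c, 0)
    n_c_plus_1 = count_of_counts.get(c + 1, 0)
    if n_c == 0:
        raise ValueError(f"No n-grams were seen exactly {c} times")
    return (c + 1) * n_c_plus_1 / n_c

# Hypothetical toy bigram counts
bigram_counts = Counter({("a", "b"): 1, ("b", "c"): 1, ("d", "e"): 1, ("c", "d"): 2})

# N_c: how many distinct bigrams were seen exactly c times -> {1: 3, 2: 1}
count_of_counts = Counter(bigram_counts.values())

# Bigrams seen once get effective count (1+1) * N_2 / N_1 = 2 * 1 / 3
print(good_turing_adjusted_count(1, count_of_counts))  # 0.666...
```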
Given a word, we look at how many distinct words it follows, rather than how often it occurs. This gives a better estimate of the lower-order unigram probabilities!
\(P_{continuation}(w)\): "How likely is word \(w\) to appear as a novel continuation?"
Here we have,
\(P_{continuation}(w) \propto |\{ w_{i-1} : c(w_{i-1}, w) > 0 \}|\)
After normalization we get:
\(P_{continuation}(w_i) = \dfrac{|\{ w_{i-1} : c(w_{i-1}, w_i) > 0 \}|}{|\{ (w_{j-1}, w_j) : c(w_{j-1}, w_j) > 0 \}|}\)
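The continuation probability counts bigram *types*, not tokens. A small sketch, using hypothetical toy counts chosen to show the classic contrast: a frequent word that appears after only one context ("Francisco") gets a low continuation probability, while a rarer word seen after many contexts ("glasses") gets a high one:

```python
from collections import Counter

def continuation_probability(w, bigram_counts):
    """P_cont(w) = |{w_prev : c(w_prev, w) > 0}| / |{(w_prev, w') : c(w_prev, w') > 0}|."""
    # Numerator: number of distinct words that precede w
    num_contexts = sum(1 for (prev, cur), n in bigram_counts.items() if cur == w and n > 0)
    # Denominator: total number of distinct bigram types
    num_bigram_types = sum(1 for n in bigram_counts.values() if n > 0)
    return num_contexts / num_bigram_types

# Hypothetical counts: "Francisco" is frequent but follows only "San"
bigram_counts = Counter({
    ("San", "Francisco"): 10,
    ("reading", "glasses"): 1,
    ("sun", "glasses"): 1,
    ("new", "glasses"): 1,
})

print(continuation_probability("Francisco", bigram_counts))  # 1/4 = 0.25
print(continuation_probability("glasses", bigram_counts))    # 3/4 = 0.75
```

Despite "Francisco" having the highest raw count, "glasses" is the more likely novel continuation because it appears after three distinct contexts.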
Therefore, we arrive at Kneser-Ney smoothing:
\(P_{KN}(w_i \mid w_{i-1}) = \dfrac{\max(c(w_{i-1}, w_i) - d,\, 0)}{c(w_{i-1})} + \Lambda(w_{i-1})\, P_{continuation}(w_i)\)
where \(0 < d < 1\) is the discount and \(\Lambda(w_{i-1})\) is a normalizing constant that redistributes the discounted probability mass: \(\Lambda(w_{i-1}) = \dfrac{d}{c(w_{i-1})}\, |\{ w : c(w_{i-1}, w) > 0 \}|\).
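The full bigram formula can be sketched as follows. This is a minimal illustration on toy counts, assuming \(c(w_{i-1})\) is the total count of bigrams starting with \(w_{i-1}\); a production implementation would handle unseen histories and higher-order n-grams:

```python
from collections import Counter

def kneser_ney_bigram(w_prev, w, bigram_counts, d=0.75):
    """Interpolated Kneser-Ney probability P_KN(w | w_prev) for a bigram model."""
    # c(w_prev): total count of bigrams starting with w_prev
    c_prev = sum(n for (p, _), n in bigram_counts.items() if p == w_prev)

    # Discounted bigram term: max(c(w_prev, w) - d, 0) / c(w_prev)
    discounted = max(bigram_counts.get((w_prev, w), 0) - d, 0) / c_prev

    # Continuation probability: distinct left contexts of w / distinct bigram types
    num_bigram_types = len(bigram_counts)
    p_cont = sum(1 for (_, cur) in bigram_counts if cur == w) / num_bigram_types

    # Lambda(w_prev) spreads the discounted mass over continuations
    followers = sum(1 for (p, _) in bigram_counts if p == w_prev)
    lam = d * followers / c_prev

    return discounted + lam * p_cont

# Hypothetical toy counts
bigram_counts = Counter({("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 3, ("c", "a"): 1})
print(kneser_ney_bigram("a", "b", bigram_counts))
```

A useful sanity check on this construction: for a fixed history, the probabilities over all continuation words sum to 1, which is exactly what \(\Lambda(w_{i-1})\) guarantees.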