Some applications of this model are:
For example, consider the two phrases "about five minutes from" and "about five minuets from".
Here, both minutes and minuets are valid words in the vocabulary, but in this context the probability of "minutes from" is much higher than that of "minuets from". Exploiting this is called context-sensitive spelling correction. Here the model should assign
P(about five minutes from) > P(about five minuets from)
Similarly, in speech recognition, when you say a sentence like "I saw a van", the model should prefer the correct transcription:
P(I saw a van) > P(eyes awe of an)
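To make these two examples concrete, here is a minimal Python sketch that compares candidate word sequences using a toy bigram table. All the probability values are invented for illustration; a real system would estimate them from counts over a large corpus.

```python
# Toy bigram probabilities, invented purely for illustration.
toy_bigram_prob = {
    ("about", "five"): 0.10,
    ("five", "minutes"): 0.12,
    ("five", "minuets"): 0.0001,
    ("minutes", "from"): 0.09,
    ("minuets", "from"): 0.00005,
}

def score(sentence, default=1e-8):
    """Multiply bigram probabilities over a sentence; unseen pairs get a tiny default."""
    words = sentence.split()
    p = 1.0
    for pair in zip(words, words[1:]):
        p *= toy_bigram_prob.get(pair, default)
    return p

# The correctly spelled phrase gets the higher score, as desired:
print(score("about five minutes from") > score("about five minuets from"))  # True
```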
There are many more applications, such as machine translation and predicting the completion of a sentence (predictive text input systems).
Our goal is to compute the probability of a sequence of words:
P(W) = P(w1, w2, w3, …, wn)
Now, from the definition of conditional probability we already know that
P(B|A) = P(A,B)/P(A)
which implies
P(A,B) = P(A) * P(B|A)
Similarly,
P(A,B,C) = P(A) * P(B|A) * P(C|A,B)
This is called the chain rule. We can generalise this formula as
P(x1, x2, ..., xn) = P(x1) * P(x2|x1) * P(x3|x1,x2) * ... * P(xn|x1,...,xn-1)
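As a quick illustration, the sketch below estimates each conditional in the chain rule by counting sentence prefixes in a tiny made-up corpus; the sentences and the resulting numbers are purely illustrative.

```python
# Chain-rule sketch: estimate each conditional P(x_i | x_1 ... x_{i-1}) as
# count(x_1 ... x_i) / count(x_1 ... x_{i-1}), counting sentence prefixes
# in a tiny invented corpus.
corpus = [
    "about five minutes from office",
    "about five minutes from home",
    "about five days from now",
]
sentences = [s.split() for s in corpus]

def prefix_count(prefix):
    """Number of sentences that start with the given word list."""
    return sum(1 for s in sentences if s[:len(prefix)] == prefix)

def chain_rule_prob(words):
    p = 1.0
    for i in range(1, len(words) + 1):
        # For i = 1 the denominator counts the empty prefix, i.e. all sentences.
        p *= prefix_count(words[:i]) / prefix_count(words[:i - 1])
    return p

print(chain_rule_prob("about five minutes".split()))  # 2/3 on this toy corpus
```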
We may never see enough data to estimate these probabilities reliably, because long word prefixes rarely repeat even in a very large corpus. Therefore, under the Markov assumption, we condition each word on only the previous k words. Applying a k-th order Markov model to the chain rule, we arrive at
P(w1, w2, w3, ..., wn) ≈ ∏i P(wi | wi-k, ..., wi-1)
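For the simplest case, k = 1 (a bigram model), a minimal sketch with maximum-likelihood estimates might look like this. The corpus is invented, and a practical model would also need smoothing to handle unseen bigrams.

```python
from collections import Counter

# Invented toy corpus; a real model would be trained on a large text collection.
corpus = "about five minutes from office about five minutes from home".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(sentence):
    """P(w1) * product of P(wi | wi-1), estimated by maximum likelihood."""
    words = sentence.split()
    p = unigram_counts[words[0]] / len(corpus)             # P(w1)
    for w1, w2 in zip(words, words[1:]):
        p *= bigram_counts[(w1, w2)] / unigram_counts[w1]  # P(w2 | w1)
    return p

print(bigram_prob("about five minutes from home"))  # 0.1 on this toy corpus
```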
An n-gram model uses only n-1 words of prior context.
An n-gram of size 1 is referred to as a "unigram", e.g. P(office);
size 2 is a "bigram" (or, less commonly, a "digram"), e.g. P(office | from);
size 3 is a "trigram", e.g. P(office | minutes from).
Larger sizes are sometimes referred to simply by the value of n, e.g. "four-gram", "five-gram", and so on.
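Finally, here is a small sketch of extracting n-grams of any size from a token sequence; the sentence is an invented example, and real pipelines usually add proper tokenization and start/end padding tokens.

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "about five minutes from office".split()
print(ngrams(tokens, 1))  # unigrams: ('about',), ('five',), ...
print(ngrams(tokens, 2))  # bigrams:  ('about', 'five'), ('five', 'minutes'), ...
print(ngrams(tokens, 3))  # trigrams: ('about', 'five', 'minutes'), ...
```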