The Language Model

The language model is embodied in P(words), and contains information about preferences for:

words in context: "ate the bun" > "ate the gun"
word ordering: "ate the bun" > "bun the ate"
The model assigns a probability to each possible string. Because there is an infinite number of possible strings, these probabilities cannot simply be listed one by one. This is handled by using a Probabilistic Context-Free Grammar (PCFG), which assigns a probability to each rewrite rule but ignores context.
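As a rough illustration of how a PCFG scores a string, the sketch below multiplies the probabilities of the rewrite rules used in one derivation. The grammar, the rule probabilities, and the names RULES and derivation_probability are all hypothetical, made up for this example only.

```python
from math import prod

# Hypothetical toy grammar (an assumption for illustration): each rewrite
# rule has a fixed probability, and the probabilities of all rules with the
# same left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.4,
    ("NP", ("the", "N")): 0.6,
    ("VP", ("ate", "NP")): 1.0,
    ("N", ("bun",)): 0.9,
    ("N", ("gun",)): 0.1,
}

def derivation_probability(rules_used):
    """Multiply the probabilities of the rules used in one derivation."""
    return prod(RULES[rule] for rule in rules_used)

# Derivation of "she ate the bun".
bun_derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("she",)),
    ("VP", ("ate", "NP")),
    ("NP", ("the", "N")),
    ("N", ("bun",)),
]
# Same derivation, but the last rule rewrites N as "gun".
gun_derivation = bun_derivation[:-1] + [("N", ("gun",))]

p_bun = derivation_probability(bun_derivation)
p_gun = derivation_probability(gun_derivation)
```

Note that the rule N -> bun has the same probability wherever N appears, so the preference for "bun" over "gun" here does not depend on the verb "ate". This is the sense in which the rule probabilities ignore context.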
Given a string of words w₁...w_n,

P(w₁...w_n) = P(w₁) * P(w₂|w₁) * P(w₃|w₁,w₂) * ...
= ∏(i=1..n) P(w_i | w₁...w_{i-1})
This is simplified using a bigram model, in which the probability of w_i depends only on the previous word w_{i-1}.

P(w₁...w_n) = P(w₁) * P(w₂|w₁) * P(w₃|w₂) * ...
= ∏(i=1..n) P(w_i | w_{i-1}), taking P(w₁ | w₀) to be P(w₁)
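The bigram product can be sketched in a few lines. The probability tables below are hypothetical numbers chosen only for illustration, and the names P_FIRST, P_BIGRAM and bigram_sentence_probability are assumptions of this sketch.

```python
# Hypothetical probabilities, chosen only for illustration.
P_FIRST = {"she": 0.5, "bun": 0.1}          # P(w1)
P_BIGRAM = {                                # P(w_i | w_{i-1})
    ("she", "ate"): 0.2,
    ("ate", "the"): 0.5,
    ("the", "bun"): 0.1,
    ("the", "gun"): 0.01,
}

def bigram_sentence_probability(words):
    """Return P(w1) times the product of P(w_i | w_{i-1}) for i >= 2.

    Words or word pairs missing from the tables get probability 0.
    Assumes a non-empty list of words.
    """
    p = P_FIRST.get(words[0], 0.0)
    for prev, cur in zip(words, words[1:]):
        p *= P_BIGRAM.get((prev, cur), 0.0)
    return p

p_bun = bigram_sentence_probability(["she", "ate", "the", "bun"])
p_gun = bigram_sentence_probability(["she", "ate", "the", "gun"])
p_scrambled = bigram_sentence_probability(["bun", "the", "ate"])
```

With these made-up tables, "she ate the bun" scores higher than "she ate the gun", and the scrambled order "bun the ate" gets probability 0 because the pair ("bun", "the") was never given a probability.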
We estimate P(w_i | w_{i-1}) from the training corpus (the body of training examples):

P(w₂|w₁) = (#times w₂ follows w₁) / (#times w₁ occurs)
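The counting estimate above can be sketched as follows. The three-sentence corpus is hypothetical and is not the example from the original notes. Splitting on whitespace and the function name estimate_bigram_probabilities are likewise assumptions of this sketch.

```python
from collections import Counter

def estimate_bigram_probabilities(sentences):
    """Estimate P(w2 | w1) = (#times w2 follows w1) / (#times w1 occurs)."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        unigram_counts.update(words)
        # Count adjacent word pairs within each sentence.
        bigram_counts.update(zip(words, words[1:]))
    return {
        (w1, w2): count / unigram_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }

# Hypothetical corpus, for illustration only.
corpus = ["she ate the bun", "he ate the bun", "she saw the gun"]
probs = estimate_bigram_probabilities(corpus)
```

In this toy corpus "the" occurs three times, followed twice by "bun" and once by "gun", so the estimate gives P(bun | the) = 2/3 and P(gun | the) = 1/3. Any pair that never occurs in the corpus is simply absent from the table, which amounts to an estimate of 0.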
For instance a training corpus is:

A trigram model, in which the probability of w_i depends on the two previous words, P(w_i | w_{i-1}, w_{i-2}), is more difficult to estimate from the training corpus, because many three-word sequences occur rarely or not at all. The bigram and trigram models capture some local contextual information, such as subject-verb agreement. The unigram model just measures word frequencies.
We can use a weighted sum of the trigram, bigram and unigram models:

P(w₁...w_n) = ∏(i=1..n) [ c1*P(w_i) + c2*P(w_i | w_{i-1}) + c3*P(w_i | w_{i-1}, w_{i-2}) ]
where c1 + c2 + c3 = 1
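A minimal sketch of the weighted sum for a single word follows. The default weights and the input probabilities are hypothetical values chosen for illustration, not values from the notes. How the weights are chosen in practice is not covered here.

```python
def interpolated_probability(p_unigram, p_bigram, p_trigram,
                             c1=0.1, c2=0.3, c3=0.6):
    """Return c1*P(w_i) + c2*P(w_i | w_{i-1}) + c3*P(w_i | w_{i-1}, w_{i-2}).

    The default weights are hypothetical; they only need to sum to 1.
    """
    if abs(c1 + c2 + c3 - 1.0) > 1e-9:
        raise ValueError("weights c1, c2, c3 must sum to 1")
    return c1 * p_unigram + c2 * p_bigram + c3 * p_trigram

# Hypothetical case: the trigram was never seen in training (estimate 0),
# but the bigram and unigram estimates are non-zero.
p = interpolated_probability(p_unigram=0.01, p_bigram=0.1, p_trigram=0.0)
```

Although the trigram estimate is 0 in this example, the combined probability is still positive, since the bigram and unigram terms contribute. The probability of the whole string is then the product of these per-word values.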