Putting the Models Together

We want P(word|signal) ~ P(word) * P(signal|word), given:

P(word_i|word_i-1): the language bigram model
P(phones|word): the word pronunciation HMM
P(signal|phone): the phone HMM
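
One way to see how the three models chain together is to write the decomposition out. This is a sketch under two assumptions that are not stated above: the signal depends on the word only through its phones, and the word sequence is scored by the bigram model alone. The symbols w_i (the i-th word), q (a phone sequence) and s (the signal) are introduced here for illustration.

```latex
\begin{align*}
P(w_1 \dots w_n \mid s) &\propto P(w_1 \dots w_n)\, P(s \mid w_1 \dots w_n) \\
P(w_1 \dots w_n) &\approx \prod_{i} P(w_i \mid w_{i-1}) \\
P(s \mid w) &= \sum_{q} P(q \mid w)\, P(s \mid q)
\end{align*}
```

The first line is Bayes' rule with the constant P(s) dropped, the second is the bigram approximation, and the third sums over the pronunciations q that the word pronunciation HMM allows, with P(s | q) supplied by the phone HMMs.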

This gives us one big hidden Markov model. We start from the bigram model, in which each state is a word and every word has a transition arc to every other word. We replace each word state with its word pronunciation model, so that the states are now phones. Then we replace each phone state with its phone HMM, so that the states are now vector quantization value distributions.
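
The two substitution steps can be sketched in code. This is a minimal illustration, not a real recognizer: the vocabulary, the probabilities, the three-substate phone topology, and the names BIGRAM, PRONUNCIATION, PHONE_SUBSTATES, SELF_LOOP, expand_word and build_composite_hmm are all assumptions made up for the example. Each pronunciation model is simplified to a single phone sequence, and the emission distributions over vector quantization values are left out so that only the state structure is shown.

```python
# Toy bigram language model: P(next_word | word). All numbers are made up.
BIGRAM = {
    "the": {"the": 0.1, "cat": 0.9},
    "cat": {"the": 0.7, "cat": 0.3},
}

# Toy pronunciation models, simplified to one phone sequence per word
# (a real pronunciation HMM may allow alternative pronunciations).
PRONUNCIATION = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
}

# Assumed phone HMM topology: three left-to-right substates per phone.
PHONE_SUBSTATES = ("onset", "mid", "end")
SELF_LOOP = 0.5  # assumed probability of staying in the same substate


def expand_word(word):
    """Replace a word state by its chain of (word, index, phone, substate) states."""
    return [
        (word, i, phone, sub)
        for i, phone in enumerate(PRONUNCIATION[word])
        for sub in PHONE_SUBSTATES
    ]


def build_composite_hmm():
    """Compose the three models into one transition table over composite states."""
    transitions = {}
    for word in BIGRAM:
        chain = expand_word(word)
        # Inside a word: stay in the current substate or advance along the chain.
        for state, nxt in zip(chain, chain[1:]):
            transitions[state] = {state: SELF_LOOP, nxt: 1.0 - SELF_LOOP}
        # Last substate of the word: stay, or leave to the first substate of
        # the next word, weighted by the bigram probability.
        last = chain[-1]
        arcs = {last: SELF_LOOP}
        for next_word, p in BIGRAM[word].items():
            first = expand_word(next_word)[0]
            arcs[first] = (1.0 - SELF_LOOP) * p
        transitions[last] = arcs
    return transitions


if __name__ == "__main__":
    hmm = build_composite_hmm()
    print(len(hmm), "composite states")
    for target, prob in hmm[("the", 1, "ax", "end")].items():
        print(target, prob)
```

With these toy inputs the composite model has 15 states: 6 for "the" (2 phones times 3 substates) and 9 for "cat". The only place the bigram probabilities appear is on the arcs that leave the last substate of one word and enter the first substate of the next.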