Speech Recognition as Reasoning with Uncertainty

By Bayes' rule:

P(words|signal) = P(words) * P(signal|words) / P(signal)

The probabilities in this formula take into account the relative frequencies of the words, as well as other factors such as speaker dependencies.

P(signal) is a normalizing constant: it is the same for every candidate word sequence, so it does not affect which sequence is most probable and can be ignored.
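
As a minimal sketch of why the normalizer can be dropped, the snippet below ranks two candidate transcriptions with and without dividing by P(signal). All numbers are made up for illustration, and the candidate set is assumed to contain only these two word sequences.

```python
# Made-up values for two candidate transcriptions of the same signal.
prior = {"ate the bun": 0.008, "ate the gun": 0.002}      # P(words), hypothetical
likelihood = {"ate the bun": 0.30, "ate the gun": 0.40}   # P(signal|words), hypothetical

# Unnormalized score: P(words) * P(signal|words)
score = {w: prior[w] * likelihood[w] for w in prior}

# With only these candidates, P(signal) is the sum of the unnormalized scores.
p_signal = sum(score.values())
posterior = {w: s / p_signal for w, s in score.items()}

# Dividing every score by the same constant cannot change the ranking.
best_unnormalized = max(score, key=score.get)
best_normalized = max(posterior, key=posterior.get)
print(best_unnormalized, best_normalized)
```

In this toy case the acoustic evidence slightly favors "ate the gun", but the prior outweighs it, and both rankings pick the same sequence.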

P(words) constitutes the language model, and includes preferences for particular combinations of words. For instance, "ate the gun" is less likely than "ate the bun".
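
One common way to estimate such preferences is an n-gram model built from counts. The sketch below is only an illustration: the four-sentence corpus is invented, and the trigram form with add-one smoothing is an assumption, not something these notes prescribe.

```python
from collections import Counter

# Tiny invented corpus, purely for illustration.
corpus = [
    "she ate the bun",
    "he ate the bun",
    "they ate the cake",
    "he fired the gun",
]

trigrams = Counter()   # counts of (w1, w2, w3)
contexts = Counter()   # counts of (w1, w2) seen as a trigram prefix
vocab = set()

for sentence in corpus:
    tokens = sentence.split()
    vocab.update(tokens)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(a, b, c)] += 1
        contexts[(a, b)] += 1


def trigram_prob(a, b, c):
    """Estimate P(c | a, b) with add-one smoothing over the vocabulary."""
    return (trigrams[(a, b, c)] + 1) / (contexts[(a, b)] + len(vocab))


p_bun = trigram_prob("ate", "the", "bun")
p_gun = trigram_prob("ate", "the", "gun")
print(p_bun > p_gun)
```

Smoothing keeps the unseen sequence "ate the gun" from getting probability zero; it is merely less likely than "ate the bun".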

P(signal|words) constitutes the acoustic model, and includes preferences for word/phone combinations. For instance, [t][ow][m][ey][t][ow] is the preferred phone sequence for "tomato", rather than the variant with [aa] in place of [ey].
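
The sketch below treats the acoustic model as a lookup table of pronunciation probabilities and combines it with a word prior. It simplifies heavily by assuming the signal has already been reduced to a phone sequence. The probabilities, the "potato" entry, and the names pronunciations, word_prior, acoustic_prob and decode are all hypothetical.

```python
# Hypothetical P(phones | word); the numbers are made up.
pronunciations = {
    "tomato": {
        ("t", "ow", "m", "ey", "t", "ow"): 0.9,   # preferred variant
        ("t", "ow", "m", "aa", "t", "ow"): 0.1,   # dispreferred variant
    },
    "potato": {
        ("p", "ah", "t", "ey", "t", "ow"): 1.0,   # illustrative transcription
    },
}

# Hypothetical P(word) for a two-word vocabulary.
word_prior = {"tomato": 0.4, "potato": 0.6}


def acoustic_prob(phones, word):
    """Return P(phones | word), or 0.0 for an unlisted pronunciation."""
    return pronunciations[word].get(tuple(phones), 0.0)


def decode(phones):
    """Pick the word maximizing P(word) * P(phones | word)."""
    scores = {w: word_prior[w] * acoustic_prob(phones, w) for w in word_prior}
    return max(scores, key=scores.get)


print(decode(["t", "ow", "m", "ey", "t", "ow"]))
```

Even though "potato" has the higher prior here, its acoustic probability for the observed phones is zero, so the product favors "tomato".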