Information Gain Method for Selecting the Best Attribute
Use information theory to estimate the size of the subtrees rooted
at each child, for each possible attribute. That is, try each attribute,
evaluate the split it produces, and pick the best one.
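
As a rough sketch of that loop, here is a small Python example. The dataset
format, the helper names (expected_questions, best_attribute, split_score),
and the use of the expected-question count derived below as the per-child
score are all assumptions made for illustration, not part of these notes.

    import math
    from collections import defaultdict

    def expected_questions(p, n):
        # Expected number of yes/no questions for a set with p elements of P
        # and n elements of N, using the formula derived below:
        # (p/(p+n)) * log2 p + (n/(p+n)) * log2 n   (log2 of 0 treated as 0).
        total = p + n
        if total == 0:
            return 0.0
        cost_p = math.log2(p) if p > 0 else 0.0
        cost_n = math.log2(n) if n > 0 else 0.0
        return (p / total) * cost_p + (n / total) * cost_n

    def best_attribute(examples, attributes):
        # examples: list of (feature_dict, label) pairs; label True means x ∈ P,
        # False means x ∈ N. Score each attribute by the size-weighted expected
        # work in the children its split produces (an estimate of the subtree
        # sizes rooted at each child), and pick the attribute with the lowest score.
        def split_score(attr):
            children = defaultdict(lambda: [0, 0])     # attribute value -> [p, n]
            for features, label in examples:
                children[features[attr]][0 if label else 1] += 1
            total = len(examples)
            return sum((p + n) / total * expected_questions(p, n)
                       for p, n in children.values())
        return min(attributes, key=split_score)

Choosing the attribute with the lowest weighted expected work corresponds to
choosing the one whose children are estimated to need the smallest subtrees.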
- How much (expected) work is it to identify an element x in a set S of size |S|?
log2|S| yes/no questions.
That is, at each step we can ask a yes/no question whose answer is guaranteed
to eliminate at most half of the remaining elements, so about log2|S| questions
are needed; splitting the remaining candidates in half with each question
achieves this.
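
A small Python illustration of this halving argument (the guessing-game setup
and the helper name questions_to_identify are made up for illustration):

    import math

    def questions_to_identify(candidates, target):
        # Repeatedly ask the yes/no question "is the target in the first half
        # of the remaining candidates?" and keep whichever half contains it,
        # counting the questions asked along the way.
        questions = 0
        while len(candidates) > 1:
            half = candidates[:len(candidates) // 2]
            questions += 1
            candidates = half if target in half else candidates[len(candidates) // 2:]
        return questions

    S = list(range(16))                      # |S| = 16
    print(questions_to_identify(S, 11))      # 4 questions
    print(math.log2(len(S)))                 # log2|S| = 4.0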
- Given S = P ∪ N, where P and N are two disjoint sets,
how hard is it to identify an element x of S?
if x ∈ P, then log2|P| = log2 p questions are needed to locate x within P, where p = |P|
if x ∈ N, then log2|N| = log2 n questions are needed to locate x within N, where n = |N|
So, the expected number of questions that have to be asked is:
(Pr(x ∈ P) * log2 p) + (Pr(x ∈ N) * log2 n)
or, equivalently (since Pr(x ∈ P) = p/(p+n) and Pr(x ∈ N) = n/(p+n)),
(p/(p+n)) log2 p + (n/(p+n)) log2 n
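
As a quick numeric check of this formula in Python (the sizes p = 12 and
n = 4 are arbitrary, chosen only for illustration):

    import math

    p, n = 12, 4                       # |P| = 12, |N| = 4 (arbitrary example)
    total = p + n
    expected = (p / total) * math.log2(p) + (n / total) * math.log2(n)
    print(round(expected, 3))          # (12/16)*log2(12) + (4/16)*log2(4) ≈ 3.189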