or, equivalently,
where %P is the % of positive examples in S = |P|/|S| = p/(p+n) and
%N is the % of negative examples in S = |N|/|S| = n/(p+n).
I measures the information content, in bits (i.e., the number of yes/no questions that must be asked), of a set S of examples, which consists of the subset P of positive examples and the subset N of negative examples.
Note: 0 ≤ I(P,N) ≤ 1, where 0 means there is no information (all examples belong to one class) and 1 means there is maximum information (the examples are evenly split between the classes).
Half the examples in S are positive and half are negative. Hence, %P = %N = 1/2. So,
I(1/2, 1/2) = -1/2 log2(1/2) - 1/2 log2(1/2) = -1/2 log2(2^-1) - 1/2 log2(2^-1) = -1/2 (-1) - 1/2 (-1) = 1/2 + 1/2 = 1 => information content is large.

Since the information content is 1, all the information is still in the examples, so we have not reduced it at all by classifying the examples. The information gain is 1 - 1 = 0.
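The computation above can be sketched in Python; this is a minimal illustration of the formula, and the function name `information` is my own choice, not from the notes:

```python
import math

def information(p, n):
    """Entropy I(P, N) in bits for a set with p positive and n negative examples."""
    if p == 0 or n == 0:
        return 0.0  # a pure set (all one class) carries no remaining information
    fp = p / (p + n)  # %P, the fraction of positive examples
    fn = n / (p + n)  # %N, the fraction of negative examples
    return -fp * math.log2(fp) - fn * math.log2(fn)

# Half positive, half negative: maximum information content of 1 bit.
print(information(3, 3))  # 1.0
```

With an uneven split such as 1 positive and 3 negatives, `information(1, 3)` falls below 1, matching the claim that 1 bit is the maximum for two classes.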