Which attribute is best for the root node of the decision tree?

Remainder(color) = 3/6 I(2/3,1/3) + 1/6 I(1/1,0/1) + 2/6 I(0/2,2/2)
                    |     |   |      |                |
                    |     |   |      1 of 6 is blue   2 of 6 are green
                    |     |   |
                    |     |   1 of the red is negative
                    |     |
                    |     2 of the red are positive
                    |
                    |
                    3 out of 6 are red

                 = 1/2 * (-2/3 log2 2/3 - 1/3 log2 1/3)
                     + 1/6 * (-1 log2 1 - 0 log2 0)
                     + 2/6 * (-0 log2 0 - 1 log2 1)

                   (taking 0 log2 0 = 0, so the blue and green terms are 0)

                 = 1/2 * (-2/3 (log2 2 - log2 3) - 1/3 (log2 1 - log2 3))
                     + 1/6 * 0
                     + 2/6 * 0

                 = 1/2 * (-2/3 (1 - 1.58) - 1/3 (0 - 1.58))     (log2 3 is approx. 1.58)

                 = 1/2 * 0.914

                 = 0.457

Gain(color) = I(3/6, 3/6) - Remainder(color)

            = 1.0 - 0.457

            = 0.543
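
The hand computation above can be checked numerically. The following Python sketch is an illustration only, not part of the original notes: the function names `information` and `remainder` and the (positive, negative) count pairs are assumptions chosen to mirror I(...) and Remainder(...). It treats 0 log2 0 as 0. With unrounded logarithms the gain comes out near 0.541 rather than 0.543; the small gap comes from rounding log2 3 to 1.58 in the hand computation.

```python
import math

def information(pos, neg):
    """I(p, n): entropy in bits of a node with pos positive and neg negative examples."""
    total = pos + neg
    bits = 0.0
    for count in (pos, neg):
        if count > 0:  # convention: a 0 * log2(0) term contributes 0
            p = count / total
            bits -= p * math.log2(p)
    return bits

def remainder(branches):
    """Weighted average information of the child nodes produced by a split."""
    total = sum(pos + neg for pos, neg in branches.values())
    return sum((pos + neg) / total * information(pos, neg)
               for pos, neg in branches.values())

# (positive, negative) counts per color, read off the Remainder(color) line
color = {"red": (2, 1), "blue": (1, 0), "green": (0, 2)}

rem_color = remainder(color)
gain_color = information(3, 3) - rem_color
print(f"Remainder(color) = {rem_color:.3f}")
print(f"Gain(color)      = {gain_color:.3f}")
```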

Remainder(shape) = 4/6 I(2/4, 2/4) + 2/6 I(1/2, 1/2)
                    4 are square       2 are round

                 = 4/6 * (-.5 * log2 .5 - .5 * log2 .5) + 2/6 * (-.5 * log2 .5 - .5 * log2 .5)

                 = .667 * (.5 + .5) + .333 * (.5 + .5)

                 = .667 + .333

                 = 1.0

Gain(shape) = I(3/6, 3/6) - Remainder(shape)

	    = 1.0 - 1.0

	    = 0.0

Remainder(size)  = 4/6 I(3/4, 1/4) + 2/6 I(0/2, 2/2)
                     4 are big         2 are small

                 = .667 * (-.75 * log2 .75 - .25 * log2 .25) + .333 * (-0 * log2 0 - 1 * log2 1)

                 = .667 * .811 + .333 * 0

                 = 0.541

Gain(size)  = I(3/6, 3/6) - Remainder(size)

	    = 1.0 - 0.541

	    = 0.459
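
All three gains can be reproduced from the branch counts alone. The sketch below is illustrative Python; the names `gain` and `splits` and the dictionary layout are assumptions, not anything from the original notes, and the (positive, negative) counts are taken from the Remainder lines above. Because it uses unrounded logarithms, the value for color comes out near 0.541 instead of the hand-computed 0.543; the ordering of the attributes is unaffected.

```python
import math

def information(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped, which implements the 0 * log2(0) = 0 convention.
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(branches):
    """Information gain of a split; branches maps value -> (positive, negative)."""
    pos = sum(p for p, _ in branches.values())
    neg = sum(n for _, n in branches.values())
    total = pos + neg
    rem = sum((p + n) / total * information((p, n)) for p, n in branches.values())
    return information((pos, neg)) - rem

# Counts per attribute value, as used in the three Remainder computations
splits = {
    "color": {"red": (2, 1), "blue": (1, 0), "green": (0, 2)},
    "shape": {"square": (2, 2), "round": (1, 1)},
    "size": {"big": (3, 1), "small": (0, 2)},
}

gains = {attr: gain(branches) for attr, branches in splits.items()}
for attr, g in gains.items():
    print(f"Gain({attr}) = {g:.3f}")

# The attribute with the largest gain is the candidate for the root node
best = max(gains, key=gains.get)
print("best attribute:", best)
```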

Max(Gain(color), Gain(shape), Gain(size)) = Max(.543, 0.0, .459) = .543, so color is best. Make color the root node's attribute and partition the examples among the resulting child nodes as follows:

Green has examples 4 and 6, which are both negative (-), so we make it a negative classification node with no child nodes (there is no further information to be extracted from it). Similarly, blue has only example 2, which is positive (+), so we make it a positive classification node with no children.

Red has examples 1, 3, and 5. Two of these are positive and one is negative, so the Red case needs further analysis.
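
The partitioning step described above can be sketched in Python as follows. This is an illustration under assumptions: the function name `child_node` and the count-based representation are hypothetical, and the (positive, negative) counts per color are the ones used in Remainder(color). A branch whose examples all share one class becomes a classification node; a mixed branch is marked for further analysis.

```python
def child_node(pos, neg):
    """Decide what a branch becomes from its class counts.

    Assumes the branch holds at least one example.
    """
    if pos + neg == 0:
        raise ValueError("empty branch: no examples to classify")
    if neg == 0:
        return "leaf: +"
    if pos == 0:
        return "leaf: -"
    return "split further"

# (positive, negative) counts per branch of the root attribute, color
branches = {"red": (2, 1), "blue": (1, 0), "green": (0, 2)}

tree = {value: child_node(pos, neg) for value, (pos, neg) in branches.items()}
for value, node in tree.items():
    print(f"{value:5s} -> {node}")
```

For the red branch, one natural next step would be to repeat the gain computation on examples 1, 3, and 5 alone, using the remaining attributes.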