Classification
The file vote.arff contains the voting records of members of the U.S. House
of Representatives. Each example consists of attributes indicating the
yes/no/absent vote on each bill considered by Congress, together with the
political party of the representative. You will use this data to learn a
decision tree that predicts a representative's political party from their
voting record.
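
As a rough illustration of the data layout described above, the following Python sketch reads a small ARFF text and counts the class labels. The sample file and its attribute names (bill-1, bill-2, party) are hypothetical placeholders, not the real columns of vote.arff, and the sketch assumes that all attributes are nominal, that attribute names contain no spaces, and that the class is the last attribute. In ARFF, a "?" marks a missing value.

```python
from collections import Counter

# Hypothetical miniature ARFF file; the attribute names are placeholders,
# not the real columns of vote.arff.
SAMPLE_ARFF = """\
@relation toy-votes
@attribute bill-1 {n,y}
@attribute bill-2 {n,y}
@attribute party {democrat,republican}
@data
y,n,democrat
n,y,republican
?,y,republican
"""


def read_arff(text):
    """Return (attribute_names, rows) from ARFF text with nominal attributes."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append(line.split(","))
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


names, rows = read_arff(SAMPLE_ARFF)
# Assumption: the class (political party) is the last value in each row.
class_counts = Counter(row[-1] for row in rows)
print(names)
print(class_counts)
```

Counting the class labels this way is a quick check that a file was read as intended before any tree is trained.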
- Use the voting data to train a decision tree that predicts the political
party (Democrat or Republican) from the voting record. Set the -U option
of J48 so that no pruning is performed (by default, trees are pruned).
Randomly select 25% of the members of Congress for training and use the
rest for testing. To do this, you have to split the ARFF file vote.arff
into separate training and testing ARFF files. Rerun the experiment
several times with different random splits to see how the choice of
training and testing sets affects the results. Report the sizes and
accuracies of the trees from 5 runs.
- Measure the impact of training set size on the accuracy and the size of
the learned tree. Do not prune (use option -U), and use 60% of the data
for testing. Consider training set sizes in the range 0-40% of the data,
including at least the training fractions 0.02, 0.1, 0.2, 0.3 and 0.4.
Because of the high variance due to random splits, repeat the experiment
with ten different random seeds for each training set size, then report
the mean, maximum and minimum accuracies at each training set size. Turn
in two plots showing how accuracy varies with training set size.
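
The splitting and reporting steps in the two tasks above could be sketched in Python as follows. This is only one possible approach, not a required one: the helper names split_arff and summarize are hypothetical, the toy input uses a made-up numeric id attribute so that rows can be told apart, and the accuracies passed to summarize are placeholders that show the shape of the output, not real results.

```python
import random
from statistics import mean


def split_arff(text, train_fraction, test_fraction, seed):
    """Shuffle the data rows and return (train_text, test_text).

    Both outputs repeat the original header so each is a valid ARFF file.
    """
    lines = text.splitlines()
    # Everything up to and including the @data line is treated as header.
    data_at = next(i for i, line in enumerate(lines)
                   if line.strip().lower().startswith("@data"))
    header = lines[:data_at + 1]
    rows = [line for line in lines[data_at + 1:]
            if line.strip() and not line.lstrip().startswith("%")]

    random.Random(seed).shuffle(rows)  # same seed gives the same split
    n_test = round(test_fraction * len(rows))
    n_train = max(1, round(train_fraction * len(rows)))
    test_rows = rows[:n_test]
    train_rows = rows[n_test:n_test + n_train]  # disjoint from the test rows
    return ("\n".join(header + train_rows) + "\n",
            "\n".join(header + test_rows) + "\n")


def summarize(accuracies_by_size):
    """Map each training fraction to (mean, max, min) over its runs."""
    return {size: (mean(accs), max(accs), min(accs))
            for size, accs in sorted(accuracies_by_size.items())}


# Toy input: 10 rows with a hypothetical numeric id attribute.
toy = ("@relation toy\n@attribute id numeric\n"
       "@attribute party {democrat,republican}\n@data\n"
       + "".join(f"{i},{'democrat' if i % 2 else 'republican'}\n"
                 for i in range(10)))

train_text, test_text = split_arff(toy, train_fraction=0.2,
                                   test_fraction=0.6, seed=1)

# Placeholder accuracies only, to show the output shape; not real results.
summary = summarize({0.1: [0.5, 0.75, 1.0]})
print(summary)
```

For the first task, the same helper would be called with train_fraction=0.25 and test_fraction=0.75. After writing the two texts to files (the names train.arff and test.arff are assumptions), an unpruned tree could be trained and evaluated with something like `java -cp weka.jar weka.classifiers.trees.J48 -U -t train.arff -T test.arff`, assuming weka.jar is available at that path. Because the rows are shuffled once per seed and the test rows are taken first, a given seed yields the same test set for every training fraction.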