Classification

The file vote.arff contains the voting records of members of the U.S. House of Representatives. Each example consists of attributes indicating the yes/no/absent vote on each bill considered by Congress, together with the political party of the representative. You will use this data to learn a decision tree that predicts a representative's political party from his or her votes.

  1. Use the voting data to train a decision tree that predicts the political party (Democrat or Republican) from the voting record. Set the -U option of J48 so that no pruning is performed (trees are pruned by default). Randomly select 25% of the members of Congress for training and use the rest for testing. To do this, you have to split the ARFF file vote.arff into separate training and testing ARFF files. Rerun this experiment several times to see the impact of different random splits of the data into training and testing sets. Report the sizes and accuracies of the trees from 5 runs.
  2. Measure the impact of training set size on the accuracy and the size of the learned tree. As before, disable pruning with option -U, and use 60% of the data for testing. Consider training set sizes in the range 0-40% of the data (include at least the values .02, .1, .2, .3 and .4 for the training fractions). Because of the high variance due to random splits, repeat the experiment with ten different random seeds for each training set size, then report the mean, maximum and minimum accuracies at each training set size. Turn in two plots showing how accuracy varies with training set size.
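
Both tasks need random, disjoint training and testing ARFF files, and task 2 needs per-size summary statistics. The following is a minimal sketch of one way to do this in Python, not a required method. It assumes vote.arff is a dense ARFF file with one instance per line and the @data marker on a line of its own. The helper names split_arff and summarize are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def split_arff(text, train_fraction, test_fraction=None, seed=0):
    """Split ARFF text into (train_text, test_text) with disjoint random rows.

    If test_fraction is None, every row not used for training is used
    for testing. Assumes a dense ARFF file with one instance per line.
    """
    lines = text.splitlines()
    # The header is everything up to and including the @data line.
    data_start = next(
        i for i, line in enumerate(lines) if line.strip().lower() == "@data"
    ) + 1
    header = lines[:data_start]
    # Keep only real instances: skip blank lines and % comments.
    rows = [
        line for line in lines[data_start:]
        if line.strip() and not line.lstrip().startswith("%")
    ]

    # A seeded generator makes each split reproducible.
    rng = random.Random(seed)
    rng.shuffle(rows)

    n_train = round(train_fraction * len(rows))
    if test_fraction is None:
        n_test = len(rows) - n_train
    else:
        n_test = round(test_fraction * len(rows))
    if n_train + n_test > len(rows):
        raise ValueError("training and testing fractions overlap")

    # Training rows come from the front of the shuffled list and testing
    # rows from the back, so the two sets never share an instance.
    train_rows = rows[:n_train]
    test_rows = rows[len(rows) - n_test:]

    train_text = "\n".join(header + train_rows) + "\n"
    test_text = "\n".join(header + test_rows) + "\n"
    return train_text, test_text


def summarize(accuracies):
    """Return (mean, maximum, minimum) of a list of accuracies."""
    return statistics.mean(accuracies), max(accuracies), min(accuracies)
```

To use the sketch, read vote.arff into a string, call split_arff with a training fraction and a seed (for task 2, also pass test_fraction=0.6), and write the two returned strings to separate files, for example vote_train.arff and vote_test.arff (hypothetical names). From the command line, J48 takes the training file with -t and the testing file with -T, along with -U to disable pruning. Changing the seed gives a different random split, and summarize can be applied to the ten accuracies collected at each training set size.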