Transforming the Data

1. This exercise aims to familiarize you with the feature selection algorithm(s) implemented in Weka. Load the labor.arff file into Weka.

(a) Using the "Preprocess" tab, select 6 attributes arbitrarily (including the class label), keeping only those. Train a classifier of your choice on this reduced dataset. Report the attributes you selected and the classification error, using 10-fold cross-validation as the evaluation method.

(b) Reload the dataset to restore all 17 attributes. We will now find the most informative features using the attribute selection algorithm in Weka, as follows:

i. Click on the "Select attributes" tab and choose "InfoGainAttributeEval" under "Attribute Evaluator". Note that the "Ranker" search method is selected whenever you choose this attribute evaluator. Now click the Start button. The output shows the attributes ranked by information gain. We will use this metric (information gain) to choose the 5 best attributes for building our model.

ii. Now repeat part (a), but this time use the 5 best features from the previous step (together with the class label), and make sure you use exactly the same classifier as in part (a). Report the attributes selected and the error rate. Did the classification error improve?
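
To make the ranking in step (b) i. more concrete, here is a minimal Python sketch of how information gain can be computed for nominal attributes and used to rank them. It is not Weka's implementation: the function names (entropy, information_gain, rank_attributes), the attribute names (attr_a, attr_b, attr_c) and the toy rows are all made up for illustration and are not taken from labor.arff. The sketch also assumes nominal attributes with no missing values, whereas Weka's own evaluator copes with numeric attributes and missing values as well.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for a single nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average of the class entropy within each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, attribute_names, class_name):
    """Return (attribute, information gain) pairs, best attribute first."""
    labels = [row[class_name] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in attribute_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (NOT from labor.arff): attr_a determines the class,
# attr_b is unrelated to it, and attr_c is only weakly related.
columns = ("attr_a", "attr_b", "attr_c", "class")
raw = [
    ("x", "p", "m", "good"),
    ("x", "q", "m", "good"),
    ("x", "p", "m", "good"),
    ("x", "q", "n", "good"),
    ("y", "p", "n", "bad"),
    ("y", "q", "n", "bad"),
    ("y", "p", "n", "bad"),
    ("y", "q", "m", "bad"),
]
rows = [dict(zip(columns, r)) for r in raw]

ranking = rank_attributes(rows, ["attr_a", "attr_b", "attr_c"], "class")
for name, gain in ranking:
    print(f"{name}: {gain:.4f}")

# Keep the k highest-ranked attributes, analogous to picking the 5 best above.
k = 2
top_k = [name for name, _ in ranking[:k]]
print("selected:", top_k)
```

On this toy data the attribute that fully determines the class is ranked first and the unrelated one last. Picking the top k entries of the ranking mirrors what you do by hand in step ii. when you keep the 5 best attributes plus the class label.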