Association Rules

Consider the dataset below, adapted from G. F. Luger and W. A. Stubblefield, "Artificial Intelligence: Structures and Strategies for Complex Problem Solving," Third Edition, Addison-Wesley, 1998.

@attribute credit_history {bad,unknown,good}
@attribute debt {low,high}
@attribute collateral {none,adequate}
@attribute income {0-15,15-35,>35}
credit_history (CH)	debt (D)	collateral (CO)	income (I)
	bad		low		none		0-15
	unknown		high		none		15-35
	unknown		low		none		15-35
	bad		low		none		0-15
	unknown		low		adequate	>35
	unknown		low		none		>35
	unknown		high		none		0-15
	bad		low		adequate	>35
	good		low		none		>35
	good		high		adequate	>35
	good		high		none		0-15
	good		high		none		15-35
	good		high		none		>35
	bad		high		none		15-35

Faithfully follow the Apriori algorithm with minimum support = 14% (that is, minimum support count = 2 data instances, since 14% of the 14 instances is 1.96) and minimum confidence = 90%. [Note that the dataset above contains repeated instances. Treat them as distinct transactions that contain the same items. Hence, each repeated instance contributes to the support of every itemset it contains.]
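
The support bookkeeping described above can be sketched in Python as follows. This is only an illustration, not part of the assignment: the "attribute=value" item encoding and the names ROWS, ATTRS, transactions, support_count and min_count are assumptions introduced here, and the rows are transcribed from the table above.

```python
import math

# Rows transcribed from the table above: (CH, D, CO, I).
ROWS = [
    ("bad", "low", "none", "0-15"),
    ("unknown", "high", "none", "15-35"),
    ("unknown", "low", "none", "15-35"),
    ("bad", "low", "none", "0-15"),
    ("unknown", "low", "adequate", ">35"),
    ("unknown", "low", "none", ">35"),
    ("unknown", "high", "none", "0-15"),
    ("bad", "low", "adequate", ">35"),
    ("good", "low", "none", ">35"),
    ("good", "high", "adequate", ">35"),
    ("good", "high", "none", "0-15"),
    ("good", "high", "none", "15-35"),
    ("good", "high", "none", ">35"),
    ("bad", "high", "none", "15-35"),
]
ATTRS = ("CH", "D", "CO", "I")

# Assumed encoding: every item is an "attribute=value" string.
# Repeated rows stay in the list as separate transactions.
transactions = [
    frozenset(f"{attr}={value}" for attr, value in zip(ATTRS, row))
    for row in ROWS
]


def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


# Minimum support of 14% over 14 instances: 1.96, rounded up to 2.
min_count = math.ceil(0.14 * len(transactions))

print(min_count)                                  # 2
print(support_count({"CH=bad"}, transactions))    # 4
```

Because the repeated rows are kept as separate list entries, each of them adds to the count of any itemset it contains.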

1. [70 points] Generate all the frequent itemsets by hand, level by level. Do it exactly as the Apriori algorithm would. When constructing level k+1 from level k, use the join condition to generate only those candidate itemsets that are potentially frequent, and use the prune condition to remove those candidate itemsets that won't be frequent because at least one of their subsets is not frequent. Mark with an "X" those itemsets removed by the prune condition, and don't count their support in the dataset. SHOW ALL THE DETAILS OF YOUR WORK.
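
As a rough illustration of the join and prune conditions mentioned in part 1, the sketch below builds level k+1 candidates from a list of frequent k-itemsets. It assumes itemsets are stored as sorted tuples and uses the standard formulation (join two k-itemsets that agree on their first k-1 items; prune a candidate if any of its k-item subsets is not frequent). The function name apriori_gen and the items A, B, C, D are hypothetical and are not taken from the dataset.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Return (kept, pruned) candidate (k+1)-itemsets.

    `frequent_k` is an iterable of frequent k-itemsets. `kept` holds the
    candidates whose support must still be counted; `pruned` holds the
    ones that would be marked with an "X".
    """
    level = sorted(tuple(sorted(s)) for s in frequent_k)
    known = set(level)
    kept, pruned = [], []
    for i, a in enumerate(level):
        for b in level[i + 1:]:
            # Join condition: same first k-1 items, different last item.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune condition: every k-item subset must be frequent.
            if all(sub in known for sub in combinations(candidate, len(a))):
                kept.append(candidate)
            else:
                pruned.append(candidate)
    return kept, pruned


# Hypothetical frequent 2-itemsets, for illustration only.
frequent_2 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
kept, pruned = apriori_gen(frequent_2)
print(kept)    # [('A', 'B', 'C')]
print(pruned)  # [('A', 'B', 'D'), ('A', 'C', 'D')]
```

In this toy input, ABD is pruned because BD is not frequent and ACD is pruned because CD is not frequent, so neither needs its support counted.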

2. [30 points] In this part, you will generate association rules with minimum confidence 90%. To save time, you don't have to generate all association rules from all the frequent itemsets. Instead, select the largest itemset (i.e., the itemset with the most items) that you generated in the previous part of this problem, and use it to generate all association rules that can be produced from it (i.e., association rules with 2, or with 3, or with 4, ... items). For each such rule, calculate its confidence (show the details), and mark those rules that have confidence greater than or equal to 90%. SHOW ALL THE DETAILS OF YOUR WORK.
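
The confidence calculation in part 2 can be sketched as below, using confidence(X => Y) = support(X union Y) / support(X). This is a minimal sketch under stated assumptions: it enumerates only the rules whose antecedent and consequent together make up the whole given itemset (rules with fewer items can be obtained by calling it on a subset), it assumes the itemset occurs in at least one transaction, and the toy transactions and the names support_count and rules_from_itemset are hypothetical rather than taken from the dataset.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def rules_from_itemset(itemset, transactions):
    """List (antecedent, consequent, confidence) for each split of `itemset`.

    Assumes `itemset` has nonzero support, so no antecedent has support 0.
    """
    items = tuple(sorted(itemset))
    whole = support_count(items, transactions)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            confidence = whole / support_count(antecedent, transactions)
            rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical toy transactions, for illustration only.
toy = [
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BC"),
    frozenset("ABC"),
]
rules = rules_from_itemset({"A", "B", "C"}, toy)
for antecedent, consequent, confidence in rules:
    mark = "*" if confidence >= 0.90 else " "
    print(mark, antecedent, "=>", consequent, f"{confidence:.2f}")
```

On the toy data none of the six rules reaches 90% confidence, so no line is marked with "*".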