COMP3420 Tutorial 4, task 1 - Solution
PC, 5 May 2011
======================================

You should do the following manually: go through the process step by
step with the students, referring back to the corresponding lecture
slides.

TID | Item set                     Candidate set C1    Set L1
--------------------------         ----------------    ----------
 1  | ['a', 'c', 'd']              'a' | 5             same as C1,
 2  | ['a', 'b', 'c']              'b' | 5             except 'e'
 3  | ['a', 'b', 'c', 'd']         'c' | 7             is removed
 4  | ['a', 'b', 'c']         ==>  'd' | 2
 5  | ['b', 'c', 'e']              'e' | 1  <- Remove
 6  | ['b', 'c']
 7  | ['a', 'c']
                                          ||
                                          \/

Length 2 frequent item-sets:       Candidate set C2
(set L2)
----------------                   ----------------
['a', 'b'] | 3                     ['a', 'b'] | 3
['a', 'c'] | 5                     ['a', 'c'] | 5
['a', 'd'] | 2                     ['a', 'd'] | 2
['b', 'c'] | 5                     ['b', 'c'] | 5
                                   ['b', 'd'] | 1  <- Remove
['c', 'd'] | 2         <==         ['c', 'd'] | 2
      ||
      \/

Candidate set C3
----------------
['a', 'b', 'c']   from ['a', 'b'] and ['a', 'c']
['a', 'b', 'd']   from ['a', 'b'] and ['a', 'd']
['a', 'c', 'd']   from ['a', 'c'] and ['a', 'd']

Each candidate joins two item-sets from L2 that share their first item.
We can prune ['a', 'b', 'd'] because ['b', 'd'] is not a frequent
item-set of length 2 (it is not in L2). The other two candidates are
kept, since all of their 2-item subsets are in L2.

Length 3 frequent item-sets (set L3):
-------------------------------------
['a', 'b', 'c'] | 3  <- alphabetically first item-set
['a', 'c', 'd'] | 2

Rules of length 3 from the first frequent 3-item set, ['a', 'b', 'c'],
with their support and confidence values as percentages:

Rule              | Support     | Confidence
----------------------------------------------
('a', 'b') -> 'c' | 3/7 = 42.9% | 3/3 = 100.0%
('a', 'c') -> 'b' | 3/7 = 42.9% | 3/5 =  60.0%
('b', 'c') -> 'a' | 3/7 = 42.9% | 3/5 =  60.0%

Formulas:

                    number of transactions containing x
  support s(x)    = -----------------------------------
                       total number of transactions

                    number of transactions containing both x and y
  support s(x->y) = ----------------------------------------------
                             total number of transactions

  confidence(x->y) = s(x->y) / s(x)

  For example, confidence(('a', 'c') -> 'b') = (3/7) / (5/7) = 3/5 = 60.0%
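The first two counting passes of the walk-through above (C1 to L1, C2 to L2) can be checked with a minimal Python sketch. It assumes the transactions are stored as Python sets and assumes a minimum support count of 2; the tutorial does not state the threshold, but 2 is consistent with removing 'e' and ['b', 'd'], which each occur only once. The helpers `count_support` and `frequent` are hypothetical names, not part of any library.

```python
from itertools import combinations

# Transactions from the tutorial table (TID 1..7), stored as sets.
TRANSACTIONS = [
    {'a', 'c', 'd'},
    {'a', 'b', 'c'},
    {'a', 'b', 'c', 'd'},
    {'a', 'b', 'c'},
    {'b', 'c', 'e'},
    {'b', 'c'},
    {'a', 'c'},
]

# Assumed minimum support count (not stated in the tutorial).
MIN_COUNT = 2


def count_support(candidates, transactions):
    """Map each candidate tuple to the number of transactions containing it."""
    return {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}


def frequent(counts, min_count):
    """Keep only the candidates whose count is at least min_count."""
    return {c: n for c, n in counts.items() if n >= min_count}


# C1: every item occurring in some transaction, as 1-tuples.
items = sorted(set().union(*TRANSACTIONS))
c1 = count_support([(i,) for i in items], TRANSACTIONS)
l1 = frequent(c1, MIN_COUNT)  # drops ('e',), count 1

# C2: all pairs of items in L1, each pair as a sorted tuple.
l1_items = sorted(item for (item,) in l1)
c2 = count_support(list(combinations(l1_items, 2)), TRANSACTIONS)
l2 = frequent(c2, MIN_COUNT)  # drops ('b', 'd'), count 1

print(c1)  # {('a',): 5, ('b',): 5, ('c',): 7, ('d',): 2, ('e',): 1}
print(l2)  # {('a', 'b'): 3, ('a', 'c'): 5, ('a', 'd'): 2, ('b', 'c'): 5, ('c', 'd'): 2}
```

Building C2 as all pairs of L1 items is enough at this level, because every 1-item subset of such a pair is frequent by construction.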
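The join and prune steps used to build C3 from L2 can be sketched as follows. This is one common formulation of Apriori candidate generation, assumed here rather than taken from the lecture slides: join two sorted (k-1)-item-sets that agree on all but their last item, then drop any candidate that has an infrequent (k-1)-subset. The function name `apriori_gen` is hypothetical, and the L2 list is copied from the table above.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Join step + prune step of Apriori candidate generation.

    prev_frequent: collection of sorted tuples, all of length k-1.
    Returns (joined, pruned): the length-k candidates before and after pruning.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    joined = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join two item-sets that agree on all but their last item;
            # because prev is sorted, the result stays sorted.
            if p[:-1] == q[:-1]:
                joined.append(p + (q[-1],))
    # Prune: every (k-1)-subset of a candidate must itself be frequent.
    pruned = [c for c in joined
              if all(s in prev_set for s in combinations(c, len(c) - 1))]
    return joined, pruned


L2 = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('c', 'd')]
joined, pruned = apriori_gen(L2)
print(joined)  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd')]
print(pruned)  # [('a', 'b', 'c'), ('a', 'c', 'd')]
```

The prune step needs no pass over the transactions: ['a', 'b', 'd'] is discarded purely because ('b', 'd') is missing from L2, exactly as argued in the tutorial.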
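The support and confidence formulas can be checked against the rule table using exact fractions. This is a minimal sketch, assuming the same set-based transaction list; `support` and `single_consequent_rules` are hypothetical helpers, and only rules with a one-item right-hand side are generated, to match the table.

```python
from fractions import Fraction

# Transactions from the tutorial table (TID 1..7), stored as sets.
TRANSACTIONS = [
    {'a', 'c', 'd'},
    {'a', 'b', 'c'},
    {'a', 'b', 'c', 'd'},
    {'a', 'b', 'c'},
    {'b', 'c', 'e'},
    {'b', 'c'},
    {'a', 'c'},
]


def support(itemset, transactions):
    """s(x): fraction of transactions containing every item of itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def single_consequent_rules(itemset, transactions):
    """Yield (x, y, s(x->y), confidence(x->y)) for every rule x -> y
    where y is one item of itemset and x is the remaining items."""
    itemset = tuple(sorted(itemset))
    s_xy = support(itemset, transactions)  # s(x->y): x and y occur together
    # Consequents in reverse order; for ('a', 'b', 'c') this is 'c', 'b', 'a'.
    for y in reversed(itemset):
        x = tuple(i for i in itemset if i != y)
        yield x, y, s_xy, s_xy / support(x, transactions)


for x, y, s, c in single_consequent_rules(('a', 'b', 'c'), TRANSACTIONS):
    print(f"{x} -> {y!r} | {float(s):.1%} | {float(c):.1%}")
# ('a', 'b') -> 'c' | 42.9% | 100.0%
# ('a', 'c') -> 'b' | 42.9% | 60.0%
# ('b', 'c') -> 'a' | 42.9% | 60.0%
```

Using `Fraction` keeps 3/7 and 3/5 exact; rounding happens only when printing, which is why 3/7 is shown as 42.9%.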