Chapter 2 Association Analysis

Section 1 What is Association Analysis?

Page 3 Association Rule Mining

**The objectives of this section are**:

to introduce you to the concept of association analysis

to explain the basic problem that association rules present

to excite you to delve deeper into the world of data mining

**By the time you have completed this section you will be able to:**

describe what association analysis is

define the key terms described in the problem definition

Copyright © Rakesh Verma 2009

The table below expands on the beer and diapers scenario and introduces some crucial parts of this topic. Each time a customer checks out items at a supermarket, a list of everything they bought is compiled and stored on a central system. The table is a collection of such data, commonly called market basket data.

In order to get a firm grip on association analysis, one must **talk the talk**. Important terms used in association analysis that you should familiarize yourself with include:

**Itemset:** a collection of one or more items

**k-itemset:** an itemset that contains k items

For instance, {diapers, milk} is one such itemset in the above example; it is a 2-itemset.

**Support count (σ):** the frequency of occurrence of an itemset, that is, the number of transactions that contain it

σ({diapers, milk}) = 3

**Support:** the fraction of all transactions that contain an itemset.

**Support: s(itemset) = support count of itemset / total number of transactions**

s({diapers, milk}) = 3/5 = 0.6
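The two definitions above can be checked with a few lines of Python. This is only a minimal sketch: the `transactions` list is a hypothetical stand-in for the table (it is not copied from it), chosen so that the counts quoted in this section come out the same, and the function names `support_count` and `support` are made up for illustration.

```python
# Hypothetical market basket data: a stand-in for the table in the text,
# chosen so that the counts quoted in this section are reproduced.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


print(support_count({"diapers", "milk"}, transactions))  # 3
print(support({"diapers", "milk"}, transactions))        # 0.6
```

Each transaction is stored as a Python `set`, so "the transaction contains the itemset" becomes the subset test `wanted <= t`.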

**Confidence:** the probability that a transaction contains itemset B, given that it contains itemset A.

For example, let itemset A be {diapers, milk} and itemset B be {coke}. We want to find the probability that coke appears in a transaction given that {diapers, milk} does. To find the confidence, we divide the support count of the union of A and B by the support count of A.

**Confidence = support count of A ∪ B / support count of A**

What does this mean? The union of A and B, A ∪ B, is the itemset {diapers, milk, coke}, and its support count is 2. The support count of A = {diapers, milk} is 3.

So the confidence of {diapers, milk} → coke = 2/3 ≈ 0.667
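The confidence calculation can be sketched in Python as well. As before, the `transactions` list is hypothetical data chosen to match the counts quoted in the text, not the actual table, and `confidence` is an illustrative function name.

```python
# Hypothetical market basket data (a stand-in for the table in the text).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def confidence(a, b, transactions):
    """Confidence of the rule a -> b: count(a union b) / count(a)."""
    a = set(a)
    both = a | set(b)  # the union of A and B
    count_a = sum(1 for t in transactions if a <= t)
    count_both = sum(1 for t in transactions if both <= t)
    if count_a == 0:
        raise ValueError("itemset A never occurs, so confidence is undefined")
    return count_both / count_a


print(round(confidence({"diapers", "milk"}, {"coke"}, transactions), 3))  # 0.667
```

Note that confidence is not symmetric: swapping A and B changes the denominator, so the rule coke → {diapers, milk} generally has a different confidence.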

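**Association rule:** an implication of the form A → B, where A and B are disjoint itemsets.

{diapers, milk} → coke is one such example.

**Frequent itemset:** an itemset whose support is greater than or equal to a minimum support threshold.

For instance, if the support count threshold is 3, then {diapers, milk} is a frequent itemset for this data, but {diapers, coke} is not, because it appears in only two transactions.

**Strong association rule:** a rule, generated from a frequent itemset, whose confidence is greater than or equal to a minimum confidence threshold.

For instance, if the confidence threshold is 0.5, then {diapers, milk} → coke meets the confidence requirement for a strong association rule, because its confidence is 0.67.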

Association Rule Mining uses these thresholds to discard infrequent itemsets and weak rules early, which reduces the amount of computation, and to find the strong association rules in the data set.

Association Rule Mining can be viewed as a two-step process:

**Frequent itemset generation:** find all itemsets whose support is greater than or equal to the minimum support threshold.

**Rule generation:** from the frequent itemsets, generate the strong association rules, that is, the rules whose confidence is greater than or equal to the minimum confidence threshold.
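The two steps can be sketched as a brute-force Python program. This is an illustration under stated assumptions, not an efficient algorithm: it tries every possible itemset, which is only feasible for a handful of items. The `transactions` list is hypothetical data chosen to match the counts quoted earlier, the function names are made up for this sketch, and the thresholds (support count 3, confidence 0.5) are the ones used in the examples above.

```python
from itertools import combinations

# Hypothetical market basket data (a stand-in for the table in the text).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def frequent_itemsets(transactions, min_count):
    """Step 1 (brute force): keep itemsets with support count >= min_count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                frequent[itemset] = count
    return frequent


def strong_rules(frequent, min_conf):
    """Step 2: split each frequent itemset into a rule A -> B and keep
    the rules whose confidence is >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(sorted(itemset), r):
                a = frozenset(left)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already stored in `frequent`.
                conf = count / frequent[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


frequent = frequent_itemsets(transactions, min_count=3)
for a, b, conf in strong_rules(frequent, min_conf=0.5):
    print(sorted(a), "->", sorted(b), round(conf, 2))
```

With n distinct items there are 2^n - 1 candidate itemsets, so the exhaustive loop in step 1 quickly becomes impractical as the number of items grows.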

*The next section focuses on efficient techniques for generating frequent itemsets*