Chapter 2 Association Analysis
Section 1 What is Association Analysis?
Page 3 Association Rule Mining


The objectives of this section are:
to introduce you to the concept of association analysis
to explain the basic problem that association rules present
to excite you to delve deeper into the world of data mining


By the time you have completed this section you will be able to:
describe what association analysis is
define the key terms described in the problem definition

What is Association Rule Mining

Talk the Talk

The table below expounds on the beer and diapers scenario and introduces us to crucial parts of this topic. Each time a customer checks out items at a supermarket, a list of everything they bought is compiled and stored on a central system. The table is a collection of such lists, commonly called market basket data.

In order to get a firm grip on association analysis one must talk the talk. Important terms used in association analysis that you should familiarize yourself with include:
Itemset: a collection of one or more items
k-itemset: an itemset that contains k items
         for instance, one such itemset in the table above is {diapers, milk}
         it is a 2-itemset
Support count (σ): the number of transactions that contain an itemset
         σ({diapers, milk}) = 3
Support (s): the fraction of all transactions that contain an itemset
         s(itemset) = σ(itemset) / total number of transactions
         s({diapers, milk}) = 3/5 = 0.6
Confidence: the probability that itemset B appears in a transaction given that itemset A appears in it
         Confidence(A→B) = support count of A ∪ B / support count of A
         Suppose A is {diapers, milk} and B is {coke}: we want the probability that coke appears in a transaction given that {diapers, milk} does.
         The union A ∪ B is the itemset {diapers, milk, coke}, and its support count is 2 (verify this by checking the table above).
         So the confidence of {diapers, milk}→coke = 2/3 = 0.667
Association Rule: relationship discovered between two itemsets.
         {diapers, milk}→coke is one such example
Frequent Itemset: an itemset whose support is greater than or equal to a support threshold value
         for instance, if the threshold is a support count of 3 (a support of 0.6), then {diapers, milk} is a frequent itemset for this data,
         but {diapers, coke} is not, because it appears in only two transactions.

Strong Association Rules: rules whose confidence is greater than or equal to a confidence threshold value
         for instance, if the confidence threshold is 0.5, then {diapers, milk}→coke is a strong association rule because its confidence is 0.67
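
The definitions above can be checked with a few lines of Python. This is only a minimal sketch: the five transactions are an assumed stand-in for the market basket table, chosen so that they reproduce the counts quoted in this section (the original table may differ), and the helper names support_count, support and confidence are illustrative rather than taken from any library.

```python
# Assumed stand-in for the market basket table: five transactions chosen to
# match the counts quoted in the text. The original table may differ.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support count of (A union B) divided by the support count of A."""
    union = set(antecedent) | set(consequent)
    return support_count(union, transactions) / support_count(antecedent, transactions)


a = {"diapers", "milk"}
b = {"coke"}
print(support_count(a, TRANSACTIONS))            # 3
print(support(a, TRANSACTIONS))                  # 0.6
print(round(confidence(a, b, TRANSACTIONS), 3))  # 0.667

# Threshold checks, using the example thresholds from the text.
MIN_SUPPORT_COUNT = 3
MIN_CONFIDENCE = 0.5
print(support_count(a, TRANSACTIONS) >= MIN_SUPPORT_COUNT)                    # True
print(support_count({"diapers", "coke"}, TRANSACTIONS) >= MIN_SUPPORT_COUNT)  # False
print(confidence(a, b, TRANSACTIONS) >= MIN_CONFIDENCE)                       # True
```

Representing each transaction as a set keeps the "itemset is contained in the transaction" test to a single subset comparison.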


Finding and Making the Rules

Association Rule Mining uses these thresholds to discard itemsets and rules that cannot qualify, which reduces the amount of computation needed to find strong association rules in the data set.
Association Rule Mining can be viewed as a two-step process:

  1. Frequent itemset generation: find all itemsets whose support is greater than or equal to the minimum support threshold.
  2. Rule generation: from the frequent itemsets, generate strong association rules, that is, rules whose confidence is greater than or equal to the minimum confidence threshold.
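
The two steps can be sketched as a deliberately naive, brute-force program. It enumerates every possible itemset, so it is only practical for tiny data sets and is not one of the efficient techniques. The transactions are again an assumed stand-in for the market basket table, and the function names frequent_itemsets and strong_rules are hypothetical.

```python
from itertools import combinations

# Assumed stand-in for the market basket table; the original may differ.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Step 1: keep every itemset whose support is >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count / n >= min_support:
                frequent[candidate] = count
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is already in `frequent`.
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


freq = frequent_itemsets(TRANSACTIONS, min_support=0.6)
for antecedent, consequent, conf in strong_rules(freq, min_confidence=0.8):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={conf:.2f}")
# prints: ['beer'] -> ['diapers'] confidence=1.00
```

With these assumed transactions and a minimum support of 0.6, the rule {diapers, milk}→coke is never generated, because {diapers, milk, coke} appears in only 2 of the 5 transactions and is therefore not a frequent itemset. Rules are generated only from itemsets that survive step 1.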

The next section focuses on efficient techniques for generating frequent itemsets.