Chapter 2 Association Analysis
Section 2 Frequent Itemset Generation
Page 2 F. I. Generation

Objectives

The objectives of this section are:
to explain the problem of frequent itemset generation
to introduce the main strategies for solving this problem
to explain the Apriori Principle
to explain the the Apriori Algorithm

Outcomes

By the time you have completed this section you will be able to:
list the main strategies for solving the frequent itemset generation problem
describe the Apriori Principle
explain the major steps of the Apriori Algorithm

Introduction

How do you recognize a frequent itemset?
What sets it apart from any of the other itemsets that can be constructed from the data set?
Hopefully you should be able to answer both of these questions in less than 3 seconds. As explained in the previous section a frequent itemset is an itemset that has a support that is greater than a threshold support that is predefined. This threshold is otherwise called minsup. Now I know you are thinking why in the world a whole section would be dedicated to just picking and sorting numbers, but it is a bit more complex than that. In order to have an efficient association rule model we can’t use the brute force approach of calculating the support for all the itemsets and then choosing those that are greater than minsup. This approach is computationally expensive and so the rest of this section introduces strategies used to address this problem of generating frequent itemset.

Strategies

There are three main strategies used for Generating Frequent Itemset efficiently; they all involve the reduction of critical parts of the data set.

    1. Reduce the number of candidates
    2. Reduce the number of transactions
    3. Reduce the number of comparisons

The Apriori Principle is an example of finding itemsets by reducing the number of candidates. This principle states that if an itemset is frequent, then all of its subsets must also be frequent. This principle holds true because of the anti-monotone property of support.

AprioriEquationUpward

This means that when f is anti-monotone (or downward closed) and X is a subset of Y, then the support of X, s(X) must not exceed the support of Y, s(Y).
The converse is also true for when f is monotone (or upward closed), this means that when X is a subset of Y, then the support of Y, s(Y) must not exceed the support of X, s(X).

AprioriEquationDownward