Chapter 2 Association Analysis

Section 2 Frequent Itemset Generation


**The objectives of this section are**:

- to explain the problem of frequent itemset generation
- to introduce the main strategies for solving this problem
- to explain the Apriori Principle
- to explain the Apriori Algorithm and its various parts

**By the time you have completed this section you will be able to:**

- list the main strategies for solving the frequent itemset generation problem
- describe the Apriori Principle
- explain the major steps of the Apriori Algorithm
- list the requirements for efficient Candidate Generation
- distinguish between the methods for efficient Candidate Generation

Copyright © Rakesh Verma 2009

To generate the list of frequent itemsets, one must avoid the brute force approach described in the previous section, because scanning the whole data set to find the support count of every candidate itemset can be very expensive. Strategies for addressing this problem include:
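To see why brute force is expensive, note that d distinct items give rise to 2^d − 1 non-empty candidate itemsets, each of which would need its support counted against every transaction. A quick sketch (the item counts chosen here are illustrative):

```python
# Brute-force candidate count: every non-empty subset of d items,
# i.e. 2**d - 1 itemsets, each requiring a pass over the transactions.
for d in (5, 10, 20, 50):
    print(f"{d} items -> {2**d - 1:,} candidate itemsets")
```

Even a modest 50-item catalogue yields over 10^15 candidates, which is why the pruning strategies below matter.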

- **Reduce the number of candidates:** use pruning techniques such as the Apriori principle to eliminate some of the candidate itemsets without counting their support values.
- **Reduce the number of transactions:** combine transactions together to reduce the total number of transactions that must be scanned.
- **Reduce the number of comparisons:** use efficient data structures to store the candidates, thereby eliminating the need to match every candidate against every transaction.

**The Apriori Principle** states that if an itemset is frequent, then all of its subsets must also be frequent. Equivalently, by the contrapositive, if an itemset is infrequent then all of its supersets must be infrequent, which is what makes candidate pruning possible. This principle holds true because of the anti-monotone property of support.
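The contrapositive form of the principle is how pruning is applied in practice: a candidate k-itemset can be discarded without counting its support if any of its (k−1)-subsets is infrequent. A minimal sketch (the function name and the example itemsets are my own, not from the text):

```python
from itertools import combinations

def apriori_prune(candidates, frequent_prev):
    """Keep only the candidate k-itemsets whose (k-1)-subsets are ALL frequent.

    By the contrapositive of the Apriori principle, a candidate with any
    infrequent subset cannot itself be frequent, so it is pruned before
    its support is ever counted.
    """
    kept = []
    for c in candidates:
        if all(frozenset(s) in frequent_prev
               for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept

# Frequent 2-itemsets found in an earlier pass (illustrative data).
frequent_2 = {frozenset({"a", "b"}), frozenset({"a", "c"}),
              frozenset({"b", "c"}), frozenset({"a", "d"})}
candidates_3 = [frozenset({"a", "b", "c"}), frozenset({"a", "b", "d"})]

# {a, b, c} survives; {a, b, d} is pruned because {b, d} is not frequent.
print(apriori_prune(candidates_3, frequent_2))
```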

This means that when f is anti-monotone (or downward closed) and X is a subset of Y, then the support of Y, s(Y), must not exceed the support of X, s(X); that is, X ⊆ Y implies s(Y) ≤ s(X). Intuitively, every transaction that contains Y necessarily contains X, so adding items to an itemset can never increase its support.

The reverse holds when f is monotone (or upward closed): if X is a subset of Y, then f(X) must not exceed f(Y). Support itself is anti-monotone, and it is this property that justifies the Apriori pruning strategy.
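The anti-monotone property can be checked directly on a small transaction database. The sketch below (the transactions are made up for illustration) computes support as the fraction of transactions containing an itemset and shows that s(Y) ≤ s(X) whenever X ⊆ Y:

```python
# A tiny, made-up transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

# Anti-monotonicity: X ⊆ Y implies s(Y) <= s(X), because every
# transaction containing Y necessarily contains X as well.
x, y = {"milk"}, {"milk", "diapers"}
print(support(x), support(y))  # 0.8 0.6 -> s(Y) <= s(X) holds
```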