Chapter 2 Association Analysis
Section 6 FP-Growth Algorithm
Page 2 FP-Tree

Objectives

The objectives of this section are:
to define the FP-Tree data structure
to introduce an alternative algorithm called the FP-Growth Algorithm
to explain the construction of the FP-Tree
to demonstrate how frequent itemsets are generated using this tree

Outcomes

By the time you have completed this section you will be able to:
define a FP-Tree
construct a FP-Tree for a given data set
use a FP-Tree to determine the frequent itemset
explain the major aspects of the FP-Growth Algorithm

Introduction

The FP-Growth Algorithm is an alternative algorithm used to find frequent itemsets. It is vastly different from the Apriori Algorithm explained in previous sections in that it uses a FP-tree to encode the data set and then extract the frequent itemsets from this tree. This section is divided into two main parts, the first deals with the representation of the FP-tree and the second details how frequent itemset generation occurs using this tree and its algorithm.

FP-Tree Representation

Definition

A FP-tree is a compact data structure that represents the data set in tree form.  Each transaction is read and then mapped onto a path in the FP-tree. This is done until all transactions have been read. Different transactions that have common subsets allow the tree to remain compact because their paths overlap.
Best CaseThe diagram to the right is an example of a best-case scenario that occurs when all transactions have exactly the same itemset; the size of the FP-tree will be only a single branch of nodes.
The worst case scenario occurs when every transaction has a unique itemset and so the space needed to store the tree is greater than the space used to store the original data set because the FP-tree requires additional space to store pointers between nodes and also the counters for each item. The diagram below is an example of how a worst case scenario FP-tree might appear. As you can see, the complexity of the tree grows with the uniqueness of each transaction.

WorstCase

Construction

The construction of a FP-tree is subdivided into three major steps.

  1. Scan the data set to determine the support count of each item, discard the infrequent items and sort the frequent items in decreasing order.
  2. Scan the data set one transaction at a time to create the FP-tree. For each transaction:
    1. If it is a unique transaction form a new path and set the counter for each node to 1.
    2. If it shares a common prefix itemset then increment the common itemset node counters and create new nodes if needed.
  3. Continue this until each transaction has been mapped unto the tree.

The slideshow presentation below details a short example of a FP-tree construction

Content on this page requires a newer version of Adobe Flash Player.

Get Adobe Flash player