Evaluation

Chapter 2 Association Analysis

Section 7 Evaluation of Association Patterns

Objectives

The objectives of this section are:
to introduce you to various measures used to evaluate association rules
to define subjective and objective measures of interestingness

Outcomes

By the time you have completed this section you will be able to:
distinguish between subjective and objective measures of interestingness
define some of the measures and explain their limitations

Introduction

Butter → Bread, Chocolate → Teddy Bear, Beer → Diapers: which of these three rules seems interesting to you?
Which of them might affect the way you do business? We can already assume that most people who buy butter also buy bread, and if I told you that my analysis shows most customers who buy chocolate also buy a teddy bear, you would not be surprised. But what if I told you about a link between beer and diapers? Would that not spark your interest?
After association rules have been generated, we must decide which of them are actually interesting and useful to us. Even a small market basket data set with about 10 transactions and 5 items can produce on the order of a hundred association rules (5 items allow up to 3^5 − 2^6 + 1 = 180 possible rules), so we need to be able to sift through all these patterns and identify the most interesting ones. Interestingness is the term used for the quality that makes a pattern worth our attention; it can be assessed with subjective and objective measures.

Evaluation of Association Patterns

Subjective vs. Objective Measures of Interestingness

Subjective measures are those that depend on the class of users who examine the pattern. The rules Chocolate → Teddy Bear and Beer → Diapers from the introduction illustrate this: Chocolate → Teddy Bear can be considered subjectively uninteresting because it reveals nothing that was not already expected, whereas Beer → Diapers is interesting because it is surprising. Incorporating subjective knowledge into pattern evaluation is a complex task and is beyond the scope of this introductory course.
An objective measure, on the other hand, uses statistical information derived from the data to determine whether a particular pattern is interesting. Support and confidence are both examples of objective measures of interestingness. These measures can be applied independently of any particular application. However, support and confidence alone have limitations when used to judge the usefulness of a rule, and because of these limitations other measures have been used to evaluate the quality of an association pattern. The rest of this section covers the details of objective measures of interestingness.

Objective Measures of Interestingness

Lift

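Lift is one of the most widely used objective measures of interestingness. It is the ratio between the rule’s confidence and the support of the itemset in the rule’s consequent:

$$\mathrm{Lift}(A \rightarrow B) = \frac{c(A \rightarrow B)}{s(B)}$$

where c denotes confidence and s denotes support.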
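The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names `support`, `confidence` and `lift` and the tea/coffee transaction counts are assumptions made up for this example and do not come from the course data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Rule confidence divided by the support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical data: 1000 transactions, 200 with tea, 800 with coffee, 150 with both.
transactions = (
    [{"tea", "coffee"}] * 150
    + [{"tea"}] * 50
    + [{"coffee"}] * 650
    + [{"milk"}] * 150
)

print(f"confidence(tea -> coffee) = {confidence(transactions, {'tea'}, {'coffee'}):.4f}")
print(f"support(coffee)           = {support(transactions, {'coffee'}):.4f}")
print(f"lift(tea -> coffee)       = {lift(transactions, {'tea'}, {'coffee'}):.4f}")
```

With these made-up counts the rule Tea → Coffee has a confidence of 0.75, which looks strong, but coffee alone appears in 80% of transactions, so the lift is below 1 (roughly 0.94). Knowing that a customer bought tea slightly lowers the chance that they bought coffee, which is the kind of situation confidence alone does not reveal.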

Interest Factor

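The interest factor is the equivalent of lift for binary variables. It compares the observed frequency of a pattern against the baseline frequency that would be expected if the items were statistically independent:

$$I(A, B) = \frac{s(A, B)}{s(A)\, s(B)}$$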

The interest factor tells you whether the itemsets are independent of each other (value equal to 1), positively correlated (value greater than 1) or negatively correlated (value less than 1).

This measure has its own limitation. For association rules in which the itemset has high support, the interest factor ends up close to 1, which suggests that the itemsets are independent even though the items frequently occur together. This conclusion is false, so in situations such as these the confidence measure is a better choice.
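The limitation can be seen in a small Python sketch. The function names and the two sets of contingency counts below are hypothetical, chosen only to illustrate the effect. Here `f11` is the number of transactions containing both items, `f10` the number containing only the first, `f01` the number containing only the second, and `f00` the number containing neither.

```python
def interest_factor(f11, f10, f01, f00):
    """s(A, B) / (s(A) * s(B)), written in terms of contingency counts."""
    n = f11 + f10 + f01 + f00
    return n * f11 / ((f11 + f10) * (f11 + f01))


def rule_confidence(f11, f10):
    """Confidence of A -> B: transactions with both, out of transactions with A."""
    return f11 / (f11 + f10)


# Hypothetical counts for two item pairs over 1000 transactions each.
high_support = dict(f11=880, f10=50, f01=50, f00=20)   # pair occurs together very often
low_support = dict(f11=20, f10=50, f01=50, f00=880)    # pair rarely occurs at all

for name, t in [("high support", high_support), ("low support", low_support)]:
    i = interest_factor(**t)
    c = rule_confidence(t["f11"], t["f10"])
    print(f"{name}: interest factor = {i:.2f}, confidence = {c:.2f}")
```

Under these assumed counts the high-support pair gets an interest factor of about 1.02, close to the independence value of 1, although the confidence of its rule is about 0.95. The low-support pair gets an interest factor of about 4.08 with a confidence of only about 0.29.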

Correlation Analysis

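Correlation analysis is another objective measure, used to analyze the relationship between a pair of variables. For binary variables, correlation can be measured using the φ-coefficient:

$$\phi = \frac{f_{11} f_{00} - f_{01} f_{10}}{\sqrt{f_{1+}\, f_{+1}\, f_{0+}\, f_{+0}}}$$

where, for items A and B, $f_{11}$ is the number of transactions containing both, $f_{10}$ the number containing A but not B, $f_{01}$ the number containing B but not A, $f_{00}$ the number containing neither, and $f_{1+} = f_{11} + f_{10}$, $f_{+1} = f_{11} + f_{01}$, $f_{0+} = f_{01} + f_{00}$, $f_{+0} = f_{10} + f_{00}$ are the row and column totals.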

Limitations: The correlation measure does not remain invariant under proportional changes to the sample size. It also gives equal importance to the co-presence and the co-absence of items in a transaction, so it is more suitable for the analysis of symmetric binary variables.
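The equal treatment of co-presence and co-absence can be illustrated with a short Python sketch of the φ-coefficient for a 2×2 contingency table. The function name and the counts are assumptions made for this example. `f11`, `f10`, `f01` and `f00` are the numbers of transactions containing both items, only the first, only the second, and neither.

```python
from math import sqrt


def phi_coefficient(f11, f10, f01, f00):
    """Phi-coefficient of two binary variables from their 2x2 contingency counts."""
    row1, row0 = f11 + f10, f01 + f00   # transactions with / without the first item
    col1, col0 = f11 + f01, f10 + f00   # transactions with / without the second item
    return (f11 * f00 - f01 * f10) / sqrt(row1 * row0 * col1 * col0)


# Hypothetical counts: the second table swaps co-presence (f11) and co-absence (f00).
mostly_together = phi_coefficient(f11=880, f10=50, f01=50, f00=20)
mostly_absent = phi_coefficient(f11=20, f10=50, f01=50, f00=880)

print(f"items usually bought together: phi = {mostly_together:.3f}")
print(f"items usually both missing:    phi = {mostly_absent:.3f}")
```

Both tables give the same value (about 0.232), even though in the first the items appear together in 88% of transactions and in the second in only 2%. For market basket data, where the presence of an item matters more than its absence, this symmetry is usually unwanted.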

IS Measure

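The IS measure is an objective measure of interestingness that was proposed to address this limitation of the correlation measure, making it better suited to asymmetric binary variables. It is defined as follows:

$$IS(A, B) = \frac{s(A, B)}{\sqrt{s(A)\, s(B)}}$$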

Its limitation is that the value of the measure can be large even for uncorrelated and negatively correlated patterns.
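A minimal Python sketch of this limitation follows, assuming the usual definition of IS as the support of the pair divided by the square root of the product of the individual supports. The function names and counts are hypothetical. `f11`, `f10`, `f01` and `f00` are the numbers of transactions containing both items, only the first, only the second, and neither.

```python
from math import sqrt


def is_measure(f11, f10, f01):
    """IS = s(A, B) / sqrt(s(A) * s(B)); the co-absence count f00 is not needed."""
    return f11 / sqrt((f11 + f10) * (f11 + f01))


def interest_factor(f11, f10, f01, f00):
    """s(A, B) / (s(A) * s(B)); equals 1 when the items are independent."""
    n = f11 + f10 + f01 + f00
    return n * f11 / ((f11 + f10) * (f11 + f01))


# Hypothetical counts over 100 transactions: each item has support 0.9 and
# the pair has support 0.81 = 0.9 * 0.9, so the items are exactly independent.
f11, f10, f01, f00 = 81, 9, 9, 1

print(f"interest factor = {interest_factor(f11, f10, f01, f00):.2f}")
print(f"IS measure      = {is_measure(f11, f10, f01):.2f}")
```

With these assumed counts the interest factor is 1, confirming independence, yet IS is 0.9, close to its maximum of 1. A high IS value on its own therefore does not show that two frequent items are positively correlated.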