Chapter 1 Decision Trees
Section 3 Efficient Decision Tree Construction
Page 3 Gini Index

Objectives:

The objectives of this section are:
to introduce you to the various attribute types and the ways they can be split
to present you with an algorithm for creating a decision tree classifier
to show you how to determine the best split for a given node
to inform you of the problems that arise with the algorithm and how they are addressed.

Outcomes:

By the time you have completed this section you will be able to:
compare decision trees and decide whether or not they are efficient
explain, at a high level, Hunt's algorithm and the difficulties encountered in applying it
calculate the impurity of a node by using the measures outlined in this section
compare and contrast the measures of impurity

Measures of Impurity

Attribute Selection Measures

How do you know which attribute to split on? Do you select one at random, or based on its position in the data set? An attribute selection measure is a heuristic that helps us select the best splitting criterion for a specific node. The best splitting criterion is the attribute whose split comes closest to producing pure partitions, that is, partitions in which all records belong to the same class. Attribute selection measures provide a ranking of all the attributes so that we can select the one that yields the smallest node impurity. In essence, what these calculations quantify is the node impurity: the smaller the degree of impurity, the more skewed the class distribution at the node and the more useful the split is for separating the records into distinct classes. The rest of this section covers the three attribute selection measures that we focus on in this course – Gini Index, Entropy and Classification Error.
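
To make the idea concrete, the sketch below (in Python, using made-up attribute names and records rather than any data set from this course) evaluates two candidate splits by the impurity of the partitions they produce, weighted by partition size. It uses the Gini index, the first of the three measures, as the impurity function; the attribute whose split yields the smallest weighted impurity would be chosen for the node.

    from collections import Counter

    def gini(labels):
        """Gini impurity of a collection of class labels (0 means pure)."""
        n = len(labels)
        if n == 0:
            return 0.0
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    def weighted_impurity(partitions):
        """Impurity of a split: child impurities weighted by child size."""
        total = sum(len(p) for p in partitions)
        return sum(len(p) / total * gini(p) for p in partitions)

    # Hypothetical records: (attribute values, class label).
    records = [
        ({"home_owner": "yes", "marital": "single"},  "no"),
        ({"home_owner": "no",  "marital": "married"}, "no"),
        ({"home_owner": "no",  "marital": "single"},  "yes"),
        ({"home_owner": "yes", "marital": "married"}, "no"),
        ({"home_owner": "no",  "marital": "single"},  "yes"),
        ({"home_owner": "no",  "marital": "married"}, "no"),
    ]

    # Rank the candidate attributes: the attribute whose split produces
    # the smallest weighted impurity is the best splitting criterion.
    for attribute in ["home_owner", "marital"]:
        partitions = {}
        for values, label in records:
            partitions.setdefault(values[attribute], []).append(label)
        print(attribute, round(weighted_impurity(list(partitions.values())), 3))

Running the sketch prints a smaller weighted impurity for the marital attribute (0.222) than for home_owner (0.333), so marital would be selected as the splitting attribute for this hypothetical node.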

Gini Index

The first measure is the Gini Index. For a node t, let p(i|t) denote the fraction of records at node t that belong to class i. The Gini index of the node is Gini(t) = 1 − Σ_i [p(i|t)]². Its value is 0 when all records at the node belong to a single class (a pure node), and it reaches its maximum of 1 − 1/c, where c is the number of classes, when the records are distributed evenly among the classes.

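As a minimal worked example, assuming the standard definition given above and using made-up class counts, the Gini index of a node can be computed from its class distribution as follows.

    def gini_from_counts(counts):
        """Gini index of a node from its class counts: 1 - sum of squared class proportions."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    # Illustrative two-class distributions:
    print(gini_from_counts([6, 0]))  # 0.0   - pure node, minimum impurity
    print(gini_from_counts([5, 1]))  # 0.278 - 1 - (5/6)^2 - (1/6)^2
    print(gini_from_counts([3, 3]))  # 0.5   - even split, maximum for two classes

Note how the value grows as the class distribution at the node becomes more even, which is why the attribute producing the smallest Gini value is preferred when splitting.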