# Exercise 5: Evaluating using a testing set

## The dataset

| District | House Type    | Income | Previous Customer | Outcome   |
|----------|---------------|--------|-------------------|-----------|
| Suburban | Detached      | High   | No                | Nothing   |
| Suburban | Detached      | High   | Responded         | Nothing   |
| Rural    | Detached      | High   | No                | Responded |
| Urban    | Semi-detached | High   | No                | Responded |
| Urban    | Semi-detached | Low    | No                | Responded |
| Urban    | Semi-detached | Low    | Responded         | Nothing   |
| Rural    | Semi-detached | Low    | Responded         | Responded |
| Suburban | Terrace       | High   | No                | Nothing   |
| Suburban | Semi-detached | Low    | No                | Responded |
| Urban    | Terrace       | Low    | No                | Responded |
| Suburban | Terrace       | Low    | Responded         | Responded |
| Rural    | Terrace       | High   | Responded         | Responded |
| Rural    | Detached      | Low    | No                | Responded |
| Urban    | Terrace       | High   | Responded         | Nothing   |
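As a concrete check of the entropy values the tool reports, here is a minimal sketch (plain Python, no libraries) that computes the entropy of the 'Outcome' column of the 14 rows above (5 "Nothing" vs. 9 "Responded"):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The 'Outcome' column of the 14 training rows above, in order:
outcomes = ["Nothing", "Nothing", "Responded", "Responded", "Responded",
            "Nothing", "Responded", "Nothing", "Responded", "Responded",
            "Responded", "Responded", "Responded", "Nothing"]

print(round(entropy(outcomes), 3))  # → 0.94
```

This is the entropy shown at the root node before any attribute has been chosen.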

## The Holdout (validation) data

Select rows by clicking to move data instances (rows) between the tables.

| District | House Type | Income | Previous Customer | Outcome |
|----------|------------|--------|-------------------|---------|

## The Decision Tree: Interactively build it

• Click on the root node below and start building the tree.
• Non-leaf nodes can be "pruned" once they have been chosen (click the node and select "prune node completely").
• The ratios on the branches indicate how well the chosen attribute at a node splits the remaining data based on the target attribute ('outcome').
• Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
• At each node, the entropy of the data at that point in the tree will be given.
• Information gain (entropy reduction) is specified for each attribute.
• Repeatedly expanding nodes until the entropy at every leaf is zero is one way of building the decision tree here.
When no more nodes can be expanded, the tree has classified all the training data.
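The information-gain numbers shown at each node can be reproduced by hand. A minimal sketch (the row tuples encode the training table above; the column order and function names are my own, not the tool's):

```python
from collections import Counter
from math import log2

# (District, House Type, Income, Previous Customer, Outcome) per training row
ROWS = [
    ("Suburban", "Detached",      "High", "No",        "Nothing"),
    ("Suburban", "Detached",      "High", "Responded", "Nothing"),
    ("Rural",    "Detached",      "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "Responded", "Nothing"),
    ("Rural",    "Semi-detached", "Low",  "Responded", "Responded"),
    ("Suburban", "Terrace",       "High", "No",        "Nothing"),
    ("Suburban", "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "Low",  "No",        "Responded"),
    ("Suburban", "Terrace",       "Low",  "Responded", "Responded"),
    ("Rural",    "Terrace",       "High", "Responded", "Responded"),
    ("Rural",    "Detached",      "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "High", "Responded", "Nothing"),
]
ATTRS = ["District", "House Type", "Income", "Previous Customer"]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, i):
    """Entropy reduction from splitting rows on attribute column i."""
    gain = entropy([r[-1] for r in rows])
    for v in set(r[i] for r in rows):
        subset = [r[-1] for r in rows if r[i] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for i, name in enumerate(ATTRS):
    print(f"{name}: {info_gain(ROWS, i):.3f}")
# → District: 0.247
# → House Type: 0.050
# → Income: 0.152
# → Previous Customer: 0.048
```

'District' gives the largest entropy reduction, so it is the natural choice at the root; note that the 'Rural' branch is already pure (all 'Responded') and needs no further splitting.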

• Move data from the training set into the testing set (by clicking) and then construct a tree. Ideally, a testing set should hold 33% or less of the data (about 3 or 4 instances here).
• Compare the classification errors on the testing data for complex trees versus simple trees.
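The comparison above can be simulated in code: train on part of the data, hold the rest out, and score a depth-limited tree against a fully grown one. This is my own minimal ID3-style sketch, not the applet's exact algorithm, and the 10/4 split is an arbitrary illustration:

```python
from collections import Counter
from math import log2

# (District, House Type, Income, Previous Customer, Outcome) per training row
ROWS = [
    ("Suburban", "Detached",      "High", "No",        "Nothing"),
    ("Suburban", "Detached",      "High", "Responded", "Nothing"),
    ("Rural",    "Detached",      "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "High", "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Semi-detached", "Low",  "Responded", "Nothing"),
    ("Rural",    "Semi-detached", "Low",  "Responded", "Responded"),
    ("Suburban", "Terrace",       "High", "No",        "Nothing"),
    ("Suburban", "Semi-detached", "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "Low",  "No",        "Responded"),
    ("Suburban", "Terrace",       "Low",  "Responded", "Responded"),
    ("Rural",    "Terrace",       "High", "Responded", "Responded"),
    ("Rural",    "Detached",      "Low",  "No",        "Responded"),
    ("Urban",    "Terrace",       "High", "Responded", "Nothing"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, i):
    gain = entropy([r[-1] for r in rows])
    for v in set(r[i] for r in rows):
        subset = [r[-1] for r in rows if r[i] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def build_tree(rows, attrs, depth):
    """Greedy ID3: leaf = label string, internal node = (attr, branches, majority)."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or depth == 0:
        return majority
    best = max(attrs, key=lambda i: info_gain(rows, i))
    branches = {v: build_tree([r for r in rows if r[best] == v],
                              [a for a in attrs if a != best], depth - 1)
                for v in set(r[best] for r in rows)}
    return (best, branches, majority)

def classify(node, row):
    while isinstance(node, tuple):
        i, branches, majority = node
        node = branches.get(row[i], majority)  # unseen value: fall back to majority
    return node

def accuracy(tree, rows):
    return sum(classify(tree, r) == r[-1] for r in rows) / len(rows)

train, hold = ROWS[:10], ROWS[10:]          # arbitrary 10/4 holdout split
full  = build_tree(train, list(range(4)), depth=4)  # grown until leaves are pure
stump = build_tree(train, list(range(4)), depth=1)  # a single split
print("full  tree  train/holdout:", accuracy(full, train), accuracy(full, hold))
print("depth-1 tree train/holdout:", accuracy(stump, train), accuracy(stump, hold))
```

The fully grown tree always scores 100% on its own training rows (all attribute combinations here are distinct), but that says nothing about the holdout rows, which is exactly the point of this exercise.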

## Classification Errors (Totals)

|                | Correct | Incorrect |
|----------------|---------|-----------|
| Training set   |         |           |
| Validation set |         |           |
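Each row of this table reduces to a single error rate. A trivial helper (the counts in the example are hypothetical, purely for illustration):

```python
def error_rate(correct, incorrect):
    """Fraction of instances misclassified out of all instances scored."""
    return incorrect / (correct + incorrect)

# Hypothetical counts: 9/1 on the training set, 2/2 on the validation set.
print(error_rate(9, 1))  # → 0.1
print(error_rate(2, 2))  # → 0.5
```

A training error much lower than the validation error is the usual sign that the tree has overfit the training data.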