Generate Decision Tree
Clicking Create Decision Tree after you have configured the settings on the Decision Tree screen navigates you to the generated model screen.
All the available network Views are shown in a series of tabs on the left panel. The following actions are available at the top right:
- Conditional Formatting (applies only to Tree and Paths views): The configurable options are displayed under the Threshold heading. Click New Threshold to add a condition and use the percentage slider to set its range. Choose a colour for each threshold by clicking the relevant box under the Colour heading and either typing the hex code or using the pop-up colour picker. To delete a threshold, click the x icon next to the relevant colour. Click Apply to save changes and close the window.
- New Decision Tree: Restarts the generation process
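The conditional-formatting threshold logic described above can be sketched as follows: each threshold covers a percentage range and maps to a colour, and a node's percentage resolves to the first threshold whose upper bound covers it. The threshold ranges and hex codes below are hypothetical examples, not values from the tool.

```python
# Hypothetical sketch of threshold-based conditional formatting: a node's
# percentage resolves to the colour of the first threshold covering it.

def colour_for(percentage, thresholds):
    """Return the colour of the first threshold whose upper bound covers the value."""
    for upper_bound, colour in sorted(thresholds):
        if percentage <= upper_bound:
            return colour
    return None  # no threshold matches

# Three example thresholds: up to 25% red, up to 75% amber, up to 100% green.
thresholds = [(25, "#d9534f"), (75, "#f0ad4e"), (100, "#5cb85c")]

print(colour_for(10, thresholds))   # "#d9534f"
print(colour_for(50, thresholds))   # "#f0ad4e"
print(colour_for(90, thresholds))   # "#5cb85c"
```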
Display Options
The nodes used to build the decision tree are displayed under the Nodes heading on the left panel on the Tree and Paths views. The following options are available:
- Sort: Changes the sorting of the nodes as follows:
- Name A-Z (default)
- Name Z-A
- Largest Percentage
- Smallest Percentage
- Percentage: Changes the percentage type for each node.
- Show Percentage of Parent Node: Displays the node's percentage of the total divided by the percentage of the total for the node's immediate parent.
- Show Percentage of Total: Displays the percentage of the records in the data that match the outcomes of the node and all the nodes above it in the tree.
- Colour Nodes: Allows you to colour the background of any nodes selected in the drop-down list. Click Apply to close the pop-up window and view your colour selections.
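The two percentage types above differ only in the denominator: one divides a node's matching records by all records, the other divides the node's percentage of the total by its parent's percentage of the total. A sketch with hypothetical record counts:

```python
# Sketch of the two percentage calculations using hypothetical record counts.
total_records = 1000
parent_matches = 400   # records matching the parent node's path
node_matches = 100     # records matching this node's path (a subset of the parent's)

pct_of_total = node_matches / total_records           # 0.10 -> 10% of all records
parent_pct_of_total = parent_matches / total_records  # 0.40 -> 40% of all records
pct_of_parent = pct_of_total / parent_pct_of_total    # 0.25 -> 25% of the parent's records

print(f"{pct_of_total:.0%} of total, {pct_of_parent:.0%} of parent")
```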
Views
You can access the views through the following tabs:
Tree
This view displays the model as a series of interactive nodes, allowing the exploration of connected variables and how outcomes are directly and indirectly linked.
Initially, the tree displays the classifier and the first row of nodes. Click a node to expand the next row, and click the same node again to collapse it. Following a branch to its end reaches a leaf node once all other connected variables have been displayed.
Click anywhere in the background and drag to move the tree into the desired position. Use the mouse wheel to zoom the tree of nodes in and out. Additionally, the following display options are available on the left panel:
- Centre View: Applies an optimal zoom that displays the entire tree in the middle of the screen
- Collapse/Expand Row: Quickly expands or collapses an entire row at a time
- Expand/Collapse All: Quickly expands or collapses the entire tree
Evaluation
Using the default algorithm settings, 80% of the data is used to generate the network and the remaining 20% is used for testing the performance of the network. In this scenario, the Evaluation view will display the results of the analysis detailed below.
If the network has been generated from 100% of the data, there are two options for evaluation:
- Cross Validation: First divides the dataset into five parts. Four parts are used for training and the fifth is used for testing. This is repeated so that each of the five parts serves as the test set once, and the results of the evaluations are averaged. To run this evaluation, click Run Cross Validation. The results of the analysis will then be displayed.
Note: Running cross validation takes approximately five times longer than the original network generation.
- Rerunning the network with 80% of the data is also an option, using the remaining 20% to test the model for accuracy. Click Rerun Network with 80/20 Split to generate the model with this analysis.
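The two evaluation schemes can be sketched in plain Python. The data, the "model" (a stand-in that predicts the majority class), and the fold arithmetic are all hypothetical; the point is the splitting logic, not the learner.

```python
import random

def evaluate(train, test):
    """Stand-in for training a network on `train` and scoring accuracy on `test`.
    Here the 'model' simply predicts the majority class seen in training."""
    labels = [lbl for _, lbl in train]
    majority = max(set(labels), key=labels.count)
    correct = sum(1 for _, lbl in test if lbl == majority)
    return correct / len(test)

random.seed(0)
data = [(i, random.choice(["yes", "no"])) for i in range(100)]

# 80/20 split: train on the first 80% of rows, test on the held-out 20%.
split = int(len(data) * 0.8)
print("80/20 accuracy:", evaluate(data[:split], data[split:]))

# Five-fold cross validation: each fifth is the test set once; average the scores.
k = 5
fold_size = len(data) // k
scores = []
for fold in range(k):
    test = data[fold * fold_size:(fold + 1) * fold_size]
    train = data[:fold * fold_size] + data[(fold + 1) * fold_size:]
    scores.append(evaluate(train, test))
print("Cross-validation accuracy:", sum(scores) / k)
```

Because cross validation trains the model five times (once per fold), it takes roughly five times longer than a single generation, which matches the note above.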
Once one of these options has been selected, statistics evaluating the classifier node are available.
Confusion Matrix
For each row of data in the test set, the value of the classifier variable predicted by the network is compared with the actual value. Correctly classified rows lie on the diagonal from the top left to the bottom right of the table.
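A small sketch of reading such a table, with hypothetical class labels and counts: rows are actual values, columns are predicted values, so the diagonal holds the correct classifications.

```python
# Hypothetical 3-class confusion matrix: rows are actual values, columns are
# predicted values, so correct classifications lie on the diagonal.
labels = ["low", "medium", "high"]
matrix = [
    [50,  3,  1],   # actual "low"
    [ 4, 40,  6],   # actual "medium"
    [ 2,  5, 39],   # actual "high"
]

correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal: 129
total = sum(sum(row) for row in matrix)                  # all cells: 150
print(f"Correctly classified: {correct}/{total} ({correct / total:.0%})")
```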

The following analysis is performed on the network:
| Analysis | Description |
|---|---|
| Correctly Classified | The number of rows in the test set for which the classifier variable was correctly classified by the network. |
| Incorrectly Classified | The number of rows in the test set for which the classifier variable was classified incorrectly by the network. |
| Kappa Statistic | Classification accuracy normalised by the imbalance of the classes in the data. An alternative to the simple percentage agreement calculation that takes into account the possibility of the agreement occurring by chance. The closer the result is to 1, the more accurately the network has classified the variables. |
| Mean Absolute Error | Measures how close forecasts or predictions are to the eventual outcomes, regardless of direction. The closer the result is to 1, the less accurately the network has scored. A score of 0 indicates no errors. |
| Root Mean Squared Error | Represents the sample standard deviation of the differences between predicted values and observed values. The greater the difference between the Root Mean Squared Error and the Mean Absolute Error, the greater the variance in the individual errors in the sample. If the two measures are equal, all the errors are of the same magnitude. The closer the result is to 1, the less accurately the network has scored. A score of 0 indicates no errors. |
| Relative Absolute Error | Takes the total absolute error and normalises it by dividing by the total absolute error of a simple predictor. |
| ROC Curve | Select a value from the Outcome drop-down list. The resulting ROC curve plots the true positive rate against the false positive rate for varying threshold values on the probability estimates. For example, a threshold value of 0.5 means that the predicted probability of ‘positive’ must be higher than 0.5 for the instance to be predicted as ‘positive’. This displays the model’s ability to predict the outcome compared to a random classifier. |
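The error statistics above can be sketched for a toy test set of probability estimates. The values are hypothetical, and the formulas follow the standard textbook definitions (for Relative Absolute Error, the simple baseline predictor is assumed here to predict the mean of the actual values); whether the tool computes them identically is an assumption.

```python
import math

# Hypothetical predicted probabilities of the 'positive' outcome vs actual outcomes (1 or 0).
predicted = [0.9, 0.8, 0.3, 0.6, 0.2]
actual    = [1,   1,   0,   1,   0]

n = len(actual)
errors = [abs(p - a) for p, a in zip(predicted, actual)]

mae = sum(errors) / n                              # Mean Absolute Error
rmse = math.sqrt(sum(e * e for e in errors) / n)   # Root Mean Squared Error

# Relative Absolute Error: normalise by a naive predictor that always
# predicts the mean of the actual values (an assumed baseline).
mean_actual = sum(actual) / n
naive_errors = [abs(mean_actual - a) for a in actual]
rae = sum(errors) / sum(naive_errors)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} RAE={rae:.1%}")
```

RMSE is always at least as large as MAE, and the gap between them grows with the variance of the individual errors, which is the comparison the table above describes.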