Algorithm Settings
The algorithm used to generate a model can be configured before the model is generated, or afterwards if the model was first generated with the default settings. The time taken to build the network, and the accuracy of the network, depend on the complexity of the data and the algorithm settings.
The following algorithms are available for selection through the Learning Algorithm drop-down list:
- K2: A hill climbing algorithm with a fixed ordering of variables. A node’s parents are only considered from the nodes before it in the ordering. The Classifier node is always first in the ordering.
- Hill Climber: Adds and deletes arrows with no fixed ordering of nodes.
- Look Ahead Hill Climber: Applies hill climbing while looking ahead on a number of possible future steps.
- Repeated Hill Climber: Starts with a randomly generated network and then repeatedly applies hill climber to reach a local optimum before selecting the best result.
- Tabu Search: Performs hill climbing until it reaches a local optimum, then steps to the least-bad candidate in the neighbourhood. It does not reconsider points in the neighbourhood that it has just visited, which are stored in a Tabu-list.
- Tree Augmented Naïve Bayes: The tree is formed by calculating the maximum weight spanning tree using the Chow and Liu algorithm.
- Simulated Annealing: By adding and deleting arrows, the algorithm randomly generates a candidate network close to the current network. It accepts the candidate if it is better than the current network. Otherwise, it accepts the candidate with a probability that decreases with time.
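The acceptance rule described for Simulated Annealing can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names, the Metropolis-style acceptance formula, the geometric cooling schedule, and the convention that a higher score is better are all assumptions made for the example.

```python
import math
import random


def accept_candidate(current_score, candidate_score, temperature, rng=random):
    """Decide whether the candidate network replaces the current one.

    Assumes a higher score means a better network.
    """
    if candidate_score > current_score:
        return True  # a better network is always accepted
    # A network that is not better is accepted with a probability that
    # shrinks as the temperature falls (assumed Metropolis-style rule).
    probability = math.exp((candidate_score - current_score) / temperature)
    return rng.random() < probability


def temperature_at(step, start=10.0, decay=0.999):
    """Assumed geometric cooling: the temperature decreases with each step."""
    return start * decay ** step
```

Early in a run the temperature is high, so worse candidates are often accepted and the search can escape local optima. As the temperature falls, the search behaves more and more like plain hill climbing.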
Depending on the selected algorithm, the following settings can be configured:
Setting | Type | Algorithm(s) it applies to | Description |
---|---|---|---|
Use Local Score Metric | checkbox | <all> | Determines how the network is scored at each stage of the algorithm. |
Markov Blanket Classifier | checkbox | <all> | Ensures that every other node is a parent of the classifier node, a child of it, or another parent of one of its children. This makes the classifier node conditionally dependent on all other nodes in the network. The classifier node will be predicted more accurately, but the structure of the network will be less accurate. |
Score Type | drop-down list | <all> | When Local Score Metric is used, this option determines how the network is scored. |
Initialise as Naïve Bayes | checkbox | K2, Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search | Forces the initial structure of the network as a Naïve Bayes before the learning algorithm is applied, where the classifier node has an arrow pointing to every other node. |
Max Number of Parents | textbox | K2, Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search | The maximum number of parents for each node in the network. Increasing this may give a more accurate network but will increase the time the network takes to generate. |
Use Arc Reversal | checkbox | Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search | Considers reversing the direction of any arrows at each step of the generation process. May give a more accurate network but will increase generating time. |
Runs | textbox | Look Ahead Hill Climber, Tabu Search, Simulated Annealing, ICS | The number of times the algorithm will run. Each run starts from a random network, and the best-scoring network overall is chosen. |
Number of Look Ahead Steps | textbox | Look Ahead Hill Climber | How many steps ahead the algorithm looks. |
Number of Good Operations | textbox | Look Ahead Hill Climber | How many operations are stored at each step. |
Random Layout | | K2 | Randomises the initial ordering of the nodes. Otherwise, the nodes are ordered alphabetically. |
Tabu-List Length | textbox | Tabu Search | The number of recent generation steps held in the Tabu-list, which the algorithm will not repeat. |
Sample Type | drop-down list | <all> | Instead of all the data being used to train the network, a sample can be used. |
Sample Percentage | textbox | <all> | The percentage of the data that will be used to create the model. A smaller sample size will decrease the time the model takes to generate but may reduce the accuracy of the model. |
Balance Data | checkbox | <all> | Reweights the data so that each class has the same total weight. |
Run in Background | checkbox | <all> | Generates the network in the background and saves the report once completed. |
Click Apply to update the Bayesian Network with the defined algorithm settings.
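As an illustration of what the Balance Data setting does, the sketch below assigns a weight to each record so that every class ends up with the same total weight. The function name `balance_weights` is hypothetical, and the choice to keep the overall weight equal to the number of records is an assumption. The product may normalise differently.

```python
from collections import Counter


def balance_weights(labels):
    """Return one weight per record so that each class has equal total weight.

    Assumption: the total weight across all records stays equal to the
    number of records.
    """
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)  # target weight for each class
    # Every record in a class receives an equal share of that class's total.
    return [per_class_total / counts[label] for label in labels]


# Three records of class "a" and one of class "b": each class totals 2.0.
weights = balance_weights(["a", "a", "a", "b"])
```

With this weighting, records from a rare class count for more during training, so the more common class does not dominate the learned network.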