Algorithm Settings

The algorithm used to generate a model can be configured either before the model is generated or afterwards, once it has been generated with the default settings. The time taken to build the network, and the accuracy of the result, depend on the complexity of the data and on the algorithm settings.

The following algorithms are available for selection through the Learning Algorithm drop-down list:

  • K2: A hill climbing algorithm with a fixed ordering of variables. Candidate parents for a node are drawn only from the nodes that come before it in the ordering. The Classifier node is always first in the ordering.
  • Hill Climber: Adds and deletes arrows with no fixed ordering of nodes.
  • Look Ahead Hill Climber: Applies hill climbing while looking a number of steps ahead, evaluating possible future changes before committing to one.
  • Repeated Hill Climber: Repeatedly starts from a randomly generated network and applies hill climbing until it reaches a local optimum, then selects the best network found across all repetitions.
  • Tabu Search: Performs hill climbing until it reaches a local optimum, then steps to the least-bad candidate in the neighbourhood, even if that candidate scores worse than the current network. It does not consider points in the neighbourhood that it has just visited; these are stored in a tabu list.
  • Tree Augmented Naïve Bayes: Extends a Naïve Bayes structure with a tree linking the other nodes. The tree is formed by calculating the maximum weight spanning tree using the Chow-Liu algorithm.
  • Simulated Annealing: Randomly generates a candidate network close to the current network by adding or deleting an arrow. The candidate is accepted if it is better than the current network; otherwise, it is accepted with a probability that decreases over time.
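The acceptance rule described for Simulated Annealing can be illustrated with a short Python sketch. It assumes a Metropolis-style rule, in which a worse candidate is accepted with probability exp(delta / T), where delta is the (negative) score difference and T is a temperature that falls over time, together with a geometric cooling schedule. Both are common choices but are assumptions here, as are the hypothetical names `accept_candidate` and `temperatures`; the exact rule used by this software may differ.

```python
import math
import random
from typing import Iterator


def accept_candidate(current_score: float, candidate_score: float,
                     temperature: float, rng: random.Random) -> bool:
    """Metropolis-style acceptance test; higher scores are better."""
    delta = candidate_score - current_score
    if delta >= 0:
        # A candidate at least as good as the current network is always kept.
        return True
    # A worse candidate is kept with probability exp(delta / T), which
    # shrinks towards zero as the temperature T falls.
    return rng.random() < math.exp(delta / temperature)


def temperatures(start: float, cooling: float, steps: int) -> Iterator[float]:
    """Hypothetical geometric cooling schedule: T_k = start * cooling ** k."""
    t = start
    for _ in range(steps):
        yield t
        t *= cooling


# Estimate how often a candidate that scores 1.0 worse is accepted as T cools.
rng = random.Random(0)
for t in temperatures(start=10.0, cooling=0.5, steps=5):
    trials = 10_000
    rate = sum(accept_candidate(0.0, -1.0, t, rng) for _ in range(trials)) / trials
    print(f"T = {t:6.3f}: worse candidate accepted {rate:.1%} of the time")
```

The printed acceptance rate falls as the temperature falls, which is the "probability decreasing with time" behaviour: early on the search can wander out of local optima, later it behaves almost like plain hill climbing.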
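The Tabu Search behaviour can be sketched in a similar way. The hypothetical `tabu_search` function below works over any hashable states rather than over networks: the neighbourhood and score functions are passed in, and the tabu list is assumed to be a fixed-length queue of recently visited states. The toy landscape is invented purely for illustration.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, TypeVar

State = TypeVar("State", bound=Hashable)


def tabu_search(start: State,
                neighbours: Callable[[State], Iterable[State]],
                score: Callable[[State], float],
                tabu_length: int,
                iterations: int) -> State:
    """Generic tabu search that maximises `score`; returns the best state seen.

    Each iteration moves to the best-scoring neighbour not on the tabu list.
    When that neighbour improves the score this is ordinary hill climbing;
    at a local optimum it becomes a step to the least-bad neighbour.
    """
    current = best = start
    tabu = deque([start], maxlen=tabu_length)  # oldest entries drop off
    for _ in range(iterations):
        candidates = [s for s in neighbours(current) if s not in tabu]
        if not candidates:
            break  # every neighbour is tabu
        current = max(candidates, key=score)
        tabu.append(current)
        if score(current) > score(best):
            best = current
    return best


# Toy landscape: a local optimum at position 3 and the global optimum at 15.
values = [0, 1, 2, 5, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 8, 6, 4, 2, 0]


def steps(x: int) -> List[int]:
    """Neighbours of position x: one step left or right, within bounds."""
    return [y for y in (x - 1, x + 1) if 0 <= y < len(values)]


print(tabu_search(0, steps, lambda x: values[x], tabu_length=3, iterations=30))  # prints 15
print(tabu_search(0, steps, lambda x: values[x], tabu_length=1, iterations=30))  # prints 3
```

With a tabu list of length 3 the search walks down out of the local optimum at position 3 and reaches 15. With a length of 1 it only remembers its last position and cycles between positions 2 and 3, which shows why the Tabu-List Length setting matters.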

Depending on the selected algorithm, the following settings can be configured:

Setting Type Algorithm(s) it applies to Description
Use Local Score Metric checkbox <all>

Determines how the network is scored at each stage of the algorithm.

  • If selected, the score of the network is the sum of the scores of the individual nodes, each of which depends only on that node and its parents.
  • If unselected, a Global Score Metric is used. This repeatedly splits the data into training and validation sets and judges the network by how well it predicts the validation data. This greatly increases the time it takes to generate the network.
Markov Blanket Classifier checkbox <all> Ensures that every node is a parent of the classifier node, a child of it, or another parent of one of its children; together these nodes form the classifier node's Markov blanket. This ensures that the classifier node is conditionally dependent on all other nodes in the network. Accuracy for the classifier node will be better, but the structure of the network as a whole will be less accurate.
Score Type drop-down list <all>

When Local Score Metric is used, this option determines how the network is scored:

  • BDe (Bayesian Dirichlet likelihood-equivalence): Seeks to maximise the joint probability of the network and the data, that is, the prior probability of the network combined with the probability of the data given the network.
  • MDL (Minimum Description Length): Penalises the number of parameters in the network, so that when several networks fit the data comparably well, the simpler one is preferred.
Initialise as Naïve Bayes checkbox K2, Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search Initialises the network as a Naïve Bayes structure, in which the classifier node has an arrow pointing to every other node, before the learning algorithm is applied.
Max Number of Parents textbox K2, Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search The maximum number of parents for each node in the network. Increasing this may give a more accurate network but will increase the time the network takes to generate.
Use Arc Reversal checkbox Hill Climber, Look Ahead Hill Climber, Repeated Hill Climber, Tabu Search In addition to adding and deleting arrows, considers reversing the direction of existing arrows at each step of the generation process. This may give a more accurate network but will increase generation time.
Runs textbox Look Ahead Hill Climber, Tabu Search, Simulated Annealing, ICS The number of times the algorithm runs. Each run starts with a random network, and the best-scoring network overall is chosen.
Number of Look Ahead Steps textbox Look Ahead Hill Climber How many steps ahead the algorithm looks.
Number of Good Operations textbox Look Ahead Hill Climber How many of the best-scoring operations are kept for consideration at each step.
Random Layout checkbox K2 Randomises the initial ordering of the nodes. Otherwise, the nodes are ordered alphabetically.
Tabu-List Length textbox Tabu Search The number of recent generation steps held in the tabu list; the algorithm will not repeat these steps.
Sample Type drop-down list <all>

Instead of all the data being used to train the network, a sample can be used:

  • First: uses the first portion of the data.
  • Last: uses the last portion of the data.
  • Random: uses a random subset of the data.
  • Interval: chooses records evenly spaced across the data.
Sample Percentage textbox <all> The percentage of the data that will be used to create the model. A smaller sample size will decrease the time the model takes to generate but may reduce the accuracy of the model.
Balance Data checkbox <all> Reweights the data so that each class has the same total weight.
Run in Background checkbox <all> Generates the network in the background and saves the report once completed.
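When Use Local Score Metric is selected, the network score is a sum of per-node scores. The Python sketch below illustrates this with a standard MDL score for discrete data: for each node, the log-likelihood of its values given its parents, minus a penalty of (log N) / 2 per free parameter, where N is the number of records, arranged so that higher is better. The data representation, the function names and the exact penalty are assumptions for illustration, not necessarily the formula this software uses.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Row = Dict[str, str]
Network = Dict[str, Tuple[str, ...]]  # hypothetical: node -> tuple of its parents


def cardinality(data: Sequence[Row], var: str) -> int:
    """Number of distinct values `var` takes in the data."""
    return len({row[var] for row in data})


def mdl_local_score(data: Sequence[Row], node: str, parents: Tuple[str, ...]) -> float:
    """MDL (BIC-style) score of one node given its parents; higher is better."""
    def config(row: Row) -> Tuple[str, ...]:
        return tuple(row[p] for p in parents)

    joint = Counter((config(row), row[node]) for row in data)  # N_jk
    by_parent = Counter(config(row) for row in data)           # N_j
    log_likelihood = sum(n_jk * math.log(n_jk / by_parent[j])
                         for (j, _), n_jk in joint.items())
    # Free parameters: (r - 1) * q, where q is the number of parent configurations.
    q = math.prod(cardinality(data, p) for p in parents)  # 1 when there are no parents
    k = (cardinality(data, node) - 1) * q
    return log_likelihood - 0.5 * math.log(len(data)) * k


def network_score(data: Sequence[Row], network: Network) -> float:
    """Local score metric: the network's score is the sum of its nodes' scores."""
    return sum(mdl_local_score(data, node, parents)
               for node, parents in network.items())


copied = [{"A": a, "B": a} for a in "01" * 4]                     # B always equals A
independent = [{"A": a, "B": b} for a in "01" for b in "01"] * 2  # no relationship
empty = {"A": (), "B": ()}
a_to_b = {"A": (), "B": ("A",)}

for name, data in (("copied", copied), ("independent", independent)):
    print(name, round(network_score(data, empty), 2), round(network_score(data, a_to_b), 2))
```

On the data where B copies A, the arrow A → B raises the score; on independent data the extra parameters cost more than they explain, so the empty network scores higher. Because the score is a sum, a search step that changes one node's parents only needs that node to be rescored.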
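The nodes that the Markov Blanket Classifier setting refers to can be computed directly from the network structure. In this hypothetical sketch a network is a dictionary mapping each node to a tuple of its parents; the sketch only identifies which nodes fall outside the classifier node's Markov blanket and does not show how the software adds arrows to bring them in.

```python
from typing import Dict, Set, Tuple

Network = Dict[str, Tuple[str, ...]]  # hypothetical: node -> tuple of its parents


def markov_blanket(network: Network, target: str) -> Set[str]:
    """Parents, children and the children's other parents of `target`."""
    parents = set(network[target])
    children = {node for node, ps in network.items() if target in ps}
    co_parents = {p for child in children for p in network[child]} - {target}
    return parents | children | co_parents


def outside_blanket(network: Network, target: str) -> Set[str]:
    """Nodes the Markov Blanket Classifier setting would still need to connect."""
    return set(network) - markov_blanket(network, target) - {target}


net = {
    "P": (),
    "Class": ("P",),       # P is a parent of Class
    "C": ("Class", "S"),   # C is a child of Class; S is C's other parent
    "S": (),
    "Far": ("C",),         # Far is a grandchild, so it lies outside the blanket
}
print(sorted(markov_blanket(net, "Class")))   # ['C', 'P', 'S']
print(sorted(outside_blanket(net, "Class")))  # ['Far']
```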
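Initialise as Naïve Bayes, Max Number of Parents and Use Arc Reversal together determine which changes a hill-climbing step may consider. The sketch below is a hypothetical illustration, not this software's implementation: it builds a Naïve Bayes starting structure, then lists every single-arrow addition, deletion and (optionally) reversal that respects the parent limit and keeps the network free of directed cycles. A real search would then score each operation and apply the best one.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Network = Dict[str, Set[str]]     # hypothetical: node -> set of its parents
Operation = Tuple[str, str, str]  # (kind, parent, child)


def naive_bayes(nodes: List[str], classifier: str) -> Network:
    """Initial structure with an arrow from the classifier to every other node."""
    return {n: (set() if n == classifier else {classifier}) for n in nodes}


def reaches(net: Network, start: str, goal: str) -> bool:
    """True if a directed path start -> ... -> goal exists."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(c for c, ps in net.items() if node in ps)  # children of node
    return False


def candidate_operations(net: Network, max_parents: int,
                         arc_reversal: bool) -> List[Operation]:
    """Legal single-arrow changes for one hill-climbing step."""
    ops: List[Operation] = []
    for parent, child in permutations(net, 2):
        if parent in net[child]:
            ops.append(("delete", parent, child))
            # Reversal gives `parent` a new parent, so check its limit.
            if arc_reversal and len(net[parent]) < max_parents:
                trial = {n: set(ps) for n, ps in net.items()}
                trial[child].discard(parent)
                # child -> parent would close a cycle if parent still reaches child.
                if not reaches(trial, parent, child):
                    ops.append(("reverse", parent, child))
        elif len(net[child]) < max_parents and not reaches(net, child, parent):
            # Adding parent -> child must not close a directed cycle.
            ops.append(("add", parent, child))
    return ops


start = naive_bayes(["Class", "A", "B"], classifier="Class")
print(candidate_operations(start, max_parents=2, arc_reversal=False))
print(candidate_operations(start, max_parents=1, arc_reversal=True))
```

With a limit of two parents, arrows may be added between A and B. With a limit of one, no additions remain because A and B already have the classifier as a parent, leaving only deletions and, when enabled, reversals.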
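The four Sample Type options, combined with Sample Percentage, can be sketched as follows. The rounding of the sample size, the evenly spaced positions used for Interval, the fixed random seed and the function name `sample_records` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")
SAMPLE_TYPES = ("First", "Last", "Random", "Interval")


def sample_records(records: Sequence[T], sample_type: str,
                   percentage: float, seed: int = 0) -> List[T]:
    """Select about `percentage` % of `records` using one of the sample types."""
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"unknown sample type: {sample_type!r}")
    n = len(records)
    k = min(n, round(n * percentage / 100))  # hypothetical rounding rule
    if k == 0:
        return []
    if sample_type == "First":
        return list(records[:k])
    if sample_type == "Last":
        return list(records[n - k:])
    if sample_type == "Random":
        # Random subset of positions, returned in the original record order.
        picked = sorted(random.Random(seed).sample(range(n), k))
        return [records[i] for i in picked]
    # Interval: k positions spread evenly across the whole data set.
    step = n / k
    return [records[int(i * step)] for i in range(k)]


records = list(range(10))
for sample_type in SAMPLE_TYPES:
    print(sample_type, sample_records(records, sample_type, percentage=30))
```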
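Balance Data can be illustrated with a small weighting function. This sketch assumes the overall weight is preserved, so each of the c classes receives a total weight of N / c (N being the number of records), shared equally among that class's records; the function name `balance_weights` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def balance_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-record weights that give every class the same total weight.

    Assumption: the total weight stays equal to the number of records.
    """
    if not labels:
        return []
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[y] for y in labels]


labels = ["yes"] * 6 + ["no"] * 2
weights = balance_weights(labels)
print(weights)  # each "yes" record gets 4/6, each "no" record gets 2.0
```

Here both classes end up with a total weight of 4, so the minority class "no" carries as much influence on the learned network as the majority class.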

Click Apply to update the Bayesian Network with the defined algorithm settings.