Previous Table of Contents Next


Mutation Operator Simulation Summary

As suspected, no obvious advantage of one mutation operator over the other was found. Each corresponding random mutation and window mutation simulation converged (or did not converge) on the solution within the same number of generations. Thus, due to the equal performance of these two operators, the random mutation operator was used in the remaining simulations due to its simplicity.


Note:  Mutation is further addressed, with respect to mutation probability, in the next section, “Genetic Algorithm Parameters Simulation Results.”

Genetic Algorithm Parameters Simulation Results

Several combinations of genetic algorithm parameters were simulated, as specified in the simulation test matrix. Using the same database configuration and random number generator seed as in simulation 11A, two genetic algorithm parameters (population size and mutation probability) were modified in an attempt to see if those simulation results could be improved.

Population Size Simulations

Recall that in simulation 11A, the genetic algorithm converged on a false plateau and never recovered. Postulating that this could have been caused by an initial population that was not diverse enough, I ran a simulation in which the population size was increased from 70 to 200. As can be seen from Figure 9.15, the item combination of interest entered the population around generation 4 and the simulation converged on the solution around generation 13. Thus, the fact that this simulation converged on the solution suggests that a larger population may help insure that a correct solution is determined. Of course, it must be considered that a larger population has the disadvantage of contributing to more computation and thus a reduced overall execution time.


Figure 9.15  Simulation 14 results

Mutation Probability Simulations (Random Mutation)

Again using simulation 11A as the test configuration, and using the random mutation operator, I attempted to recover the lost allele by increasing the mutation probability from 0.001 to 0.010. As can be seen from Figure 9.16, the item combination of interest entered the population around generation 190. Then the simulation, after having been stuck on a false plateau for most of the preceding simulation, quickly converged on the solution around generation 200. Thus, the fact that the simulation eventually converged on the solution suggests that an increased mutation probability may help insure that lost alleles are recovered and that a correct solution is determined.


Figure 9.16  Simulation 15 results

Mutation Probability Simulations (Window Mutation)

This simulation was identical to the previous simulation (simulation 15) except that the window mutation operator was used instead of the random mutation operator. As can be seen from Figure 9.17, the lost allele was never recovered and the simulation remained on a false plateau through 400 generations.


Figure 9.17  Simulation 16 results

Genetic Algorithm Parameters Simulation Summary

In the environment in which the population size and mutation probability parameters were tested, it appears that increasing the population size can greatly increase the chances of converging on the correct solution, but at the cost of more computation. In addition, an increased mutation probability may help recover lost alleles and enable the simulation to locate the item combination of interest, as it did in the random mutation operator case.

Genetic Algorithm Performance and Performance Simulation Results

Given the fact that the data sets used in this project were synthetic, and also considering the limited scope of this project, the performance of the genetic algorithm-based approach could not be analyzed in a “real” database environment. One performance-related area that was investigated, however, involved improving the genetic algorithm execution speed. As is the case in most genetic algorithms, the bulk of the execution time is spent performing fitness function calculations, and the situation was no different for the genetic-based search examined in this project.

Depending on the size of the database that is being investigated by the genetic algorithm, the time to scan the database in order to generate the fitness for a particular item combination can vary significantly. If a relatively small database is being used, all transactions can be examined and an exact fitness value can be determined within a reasonable amount of time1. On the other hand, if the database is very large, complete examination of every transaction in order to calculate an exact fitness value, may not be feasible. In this case, we can perform an approximate function evaluation to help speed up the execution time of the genetic algorithm. In the case of the data mining genetic algorithm, approximate function evaluation is fairly straightforward in that a subset of the transactions in the database can be sampled to determine a fitness value for an item combination.


1Of course, database size and fitness calculation times are relative, and the computer system on which a simulation is run will greatly impact overall execution time.

An approximate function evaluation method was implemented and compared to the full function evaluation method, and the simulation results that were obtained are discussed next.

Full Function Evaluation Simulation

The full function evaluation simulation was run with the fitness function, crossover operator, mutation operator, genetic algorithm parameters, and database configuration as noted in the simulation test matrix.

Due to the large number of transactions in the database (5000), the total running time for this simulation was much longer than all previous simulations, at 43.5 minutes.

As can be seen from Figure 9.18, the item combination of interest entered the population around generation 5 and the simulation converged on the solution around generation 14.


Figure 9.18  Simulation 17 results.


Previous Table of Contents Next

Copyright © CRC Press LLC