

RESULTS

This section describes the aspects of the genetic algorithm that were investigated in order to select the best-performing fitness function and genetic operators. It also summarizes the specific simulations run for each area of investigation.

Areas of Investigation and Simulations

To better understand and optimize the data mining genetic algorithm, five aspects of its implementation were investigated: the fitness function, the crossover operator, the mutation operator, the genetic algorithm parameters, and overall genetic algorithm performance.

Simulations were run using various combinations of genetic algorithm operators, genetic algorithm parameters, and database configurations. These combinations are summarized in the database configuration table (Table 9.1) and the simulation test matrix (Table 9.2). After the two tables, each area of investigation is discussed in turn, together with the simulations run for it and the results obtained.

Database Configurations

Table 9.1 summarizes the database configurations used in the simulations listed in the simulation test matrix (Table 9.2).

Table 9.1 Database configuration table.
Database Configuration   Number of Transactions   Item Relationship(s)
1                        100                      Items 11, 22, 33, 44: 90% probability each
2                        100                      Items 11, 22, 33, 44: 60% probability each
3                        500                      Items 11, 22, 33, 44: 90% probability each
                                                  Items 55, 66, 77, 88: 80% probability each
4                        5000                     Items 11, 22, 33, 44: 60% probability each

Each database configuration contains a different type of item relationship, and each relationship “challenges” the genetic algorithm in a different way as it searches for the four items that appear together most often. For example, database configuration 1 was generated so that each of the items 11, 22, 33, and 44 appears in any given transaction with 90% probability. Assuming the items are included independently of one another, the probability that all four appear together in a single transaction is 0.9 × 0.9 × 0.9 × 0.9 ≈ 0.656 (65.6%). This configuration is intended to be the simplest database from which the four items can be determined. The item relationships for the remaining configurations can be derived in the same way from Table 9.1; for configuration 2, for example, the probability falls to 0.6 × 0.6 × 0.6 × 0.6 ≈ 0.130 (13.0%).
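The independence argument above can be checked numerically. The following Python sketch is a hypothetical reconstruction, not the generator actually used for these experiments: it assumes each planted item enters a transaction independently with the probability listed in Table 9.1 and, for brevity, omits the background items a real transaction database would also contain. The names `CONFIGS`, `joint_probability`, `generate_database`, and `observed_fraction` are illustrative only.

```python
import random

# Table 9.1 as data: configuration number -> (number of transactions,
# list of (planted item group, per-item inclusion probability)).
CONFIGS = {
    1: (100, [((11, 22, 33, 44), 0.9)]),
    2: (100, [((11, 22, 33, 44), 0.6)]),
    3: (500, [((11, 22, 33, 44), 0.9), ((55, 66, 77, 88), 0.8)]),
    4: (5000, [((11, 22, 33, 44), 0.6)]),
}


def joint_probability(items, p):
    """Probability that every item in the group appears in one transaction,
    assuming each item is included independently with probability p."""
    return p ** len(items)


def generate_database(n_transactions, groups, rng):
    """Hypothetical generator: each planted item is added to a transaction
    independently with its group's probability (no background items)."""
    db = []
    for _ in range(n_transactions):
        transaction = set()
        for items, p in groups:
            for item in items:
                if rng.random() < p:
                    transaction.add(item)
        db.append(transaction)
    return db


def observed_fraction(db, items):
    """Fraction of transactions that contain every item in the group."""
    target = set(items)
    return sum(target <= t for t in db) / len(db)


if __name__ == "__main__":
    rng = random.Random(0)
    for number, (n, groups) in CONFIGS.items():
        db = generate_database(n, groups, rng)
        for items, p in groups:
            print(f"config {number}, items {items}: "
                  f"expected {joint_probability(items, p):.3f}, "
                  f"observed {observed_fraction(db, items):.3f}")
```

Because configurations 1 to 3 contain only 100 to 500 transactions, the observed fraction for a particular random seed can differ from the expected value by a few percentage points.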

Simulation Test Matrix

The simulation test matrix (Table 9.2) lists the simulations run for each area of investigation (fitness function, crossover operator, mutation operator, genetic algorithm parameters, and genetic algorithm performance). The sections that follow refer to the simulations by test case number; test cases that serve more than one area of investigation (for example, test cases 7, 8, and 9) are listed under each area.

Table 9.2 Simulation test matrix.
Test Case  Database Configuration  Pop. Size  pc    pm     Fitness Function  Crossover Operator  Mutation Operator
Fitness Function Simulations
1          1                       70         0.9   0.001  F1                UXSCO               Random
2          2                       70         0.91  0.001  F1                UXSCO               Random
3          3                       70         0.9   0.001  F1                UXSCO               Random
4          1                       70         0.9   0.001  F2                UXSCO               Random
5          2                       70         0.9   0.001  F2                UXSCO               Random
6          3                       70         0.9   0.001  F2                UXSCO               Random
7          1                       70         0.9   0.001  F3                UXSCO               Random
8          2                       70         0.9   0.001  F3                UXSCO               Random
9          3                       70         0.9   0.001  F3                UXSCO               Random
Crossover Operator Simulations
7, 8, 9    Same as Fitness Function Simulations 7, 8, and 9
10         1                       70         0.9   0.001  F3                ASPX                Random
11         2                       70         0.9   0.001  F3                ASPX                Random
12         3                       70         0.9   0.001  F3                ASPX                Random
Mutation Operator Simulations
11         2                       70         0.9   0.001  F3                ASPX                Random
13         2                       70         0.91  0.001  F3                ASPX                Window
Genetic Algorithm Parameters Simulations
14         2                       200        0.9   0.001  F3                ASPX                Random
15         2                       70         0.9   0.010  F3                ASPX                Random
16         2                       70         0.9   0.010  F3                ASPX                Window
Performance Simulations
17         4                       70         0.9   0.001  F3                ASPX                Random
18         4*                      70         0.9   0.001  F3                ASPX                Random

pc = crossover probability; pm = mutation probability.
* Only a fraction (0.1) of database configuration 4 was sampled.
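If the runs are scripted, the test matrix can be held as plain data. The following Python sketch is an illustration, not part of the original study: `SimulationCase`, `TEST_CASES`, `AREAS`, `differing_fields`, and `sample_transactions` are hypothetical names, the values are transcribed as printed in Table 9.2, and the sampling for test case 18 assumes a simple random sample without replacement because the sampling method is not stated. `differing_fields` reports which variables change between two test cases being compared.

```python
import random
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SimulationCase:
    number: int
    db_config: int
    pop_size: int
    pc: float                     # crossover probability
    pm: float                     # mutation probability
    fitness: str                  # F1, F2, or F3
    crossover: str                # UXSCO or ASPX
    mutation: str                 # Random or Window
    sample_fraction: float = 1.0  # fraction of transactions used (test case 18)


# Values transcribed from Table 9.2; each test case is stored once.
TEST_CASES = {case.number: case for case in [
    SimulationCase(1, 1, 70, 0.9, 0.001, "F1", "UXSCO", "Random"),
    SimulationCase(2, 2, 70, 0.91, 0.001, "F1", "UXSCO", "Random"),
    SimulationCase(3, 3, 70, 0.9, 0.001, "F1", "UXSCO", "Random"),
    SimulationCase(4, 1, 70, 0.9, 0.001, "F2", "UXSCO", "Random"),
    SimulationCase(5, 2, 70, 0.9, 0.001, "F2", "UXSCO", "Random"),
    SimulationCase(6, 3, 70, 0.9, 0.001, "F2", "UXSCO", "Random"),
    SimulationCase(7, 1, 70, 0.9, 0.001, "F3", "UXSCO", "Random"),
    SimulationCase(8, 2, 70, 0.9, 0.001, "F3", "UXSCO", "Random"),
    SimulationCase(9, 3, 70, 0.9, 0.001, "F3", "UXSCO", "Random"),
    SimulationCase(10, 1, 70, 0.9, 0.001, "F3", "ASPX", "Random"),
    SimulationCase(11, 2, 70, 0.9, 0.001, "F3", "ASPX", "Random"),
    SimulationCase(12, 3, 70, 0.9, 0.001, "F3", "ASPX", "Random"),
    SimulationCase(13, 2, 70, 0.91, 0.001, "F3", "ASPX", "Window"),
    SimulationCase(14, 2, 200, 0.9, 0.001, "F3", "ASPX", "Random"),
    SimulationCase(15, 2, 70, 0.9, 0.010, "F3", "ASPX", "Random"),
    SimulationCase(16, 2, 70, 0.9, 0.010, "F3", "ASPX", "Window"),
    SimulationCase(17, 4, 70, 0.9, 0.001, "F3", "ASPX", "Random"),
    SimulationCase(18, 4, 70, 0.9, 0.001, "F3", "ASPX", "Random", 0.1),
]}

# Test cases run for each area of investigation (some appear in more than one).
AREAS = {
    "fitness function": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "crossover operator": [7, 8, 9, 10, 11, 12],
    "mutation operator": [11, 13],
    "genetic algorithm parameters": [14, 15, 16],
    "performance": [17, 18],
}


def differing_fields(a, b):
    """Names of the variables, other than the test number, that differ between two cases."""
    return [f.name for f in fields(SimulationCase)
            if f.name != "number" and getattr(a, f.name) != getattr(b, f.name)]


def sample_transactions(transactions, fraction, rng):
    """Assumed sampling scheme: a simple random sample without replacement."""
    k = round(fraction * len(transactions))
    return rng.sample(transactions, k)
```

For example, `differing_fields(TEST_CASES[7], TEST_CASES[10])` returns `['crossover']`, confirming that test cases 7 and 10 differ only in the crossover operator.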



Copyright © CRC Press LLC