| Previous | Table of Contents | Next |
Fitness Function
Two ways of coding the fitness function are considered here. First, the X and the O players could be pitted against one another in a series of games. Or, alternatively, the game-playing program could be matched against a fixed algorithm. If the two players were forced to play one another, then two population pools would be generated initially. One pool would contain the X player strategy, the other pool would represent the O player strategy. In this approach, if 30 strings were generated in each pool, and each pool played five games, there would be 30*30*5 = 4500 games played total in each generation. The fitness function value of each string is derived from the win-loss-tie record of the strings.
On the other hand, if the strings are pitted against fixed algorithms, each strategy would play against its opponent a fixed number of games. Then, the fitness value would be determined in the same way as above; according to the win-loss-tie record. In this approach, if there are 100 strings in the pool, and each string is forced to play 50 games, there would be total of 5,000 games played during each generation.
The following scoring methods are used:
| Player | Win | Tie | Loss |
| Player X | +4 | +1 | 0 |
| Player O | +5 | +2 | 0 |
The difference between the X player and the O player is that for the X player, its objective is to win, because it goes first. It receives a smaller score for a tie because it has the advantage of moving first. But, as for the O player, because it goes second, its objective is to prevent the X player from winning. Thus, the O player is considered successful (to some extent) if it ties the game, and an even bigger accomplishment if it wins the game.
Using the above table, for example, if the X player has a record of 11-5-4 after 30 games, then it will have a fitness value of 49. If the O player has the same record, then it will have a fitness value of 65.
Crossover & Mutation
After the random generation and evaluation of the first generation, there exist 30 strings with associated score or fitness values. Next, roulette wheel selection is implemented to generate a mating pool of 30 strings. For example, if the initial generation was represented by the strings in the table below, then string number one would have the probability of 34/430 or 8% chance of making it to the next generation. Over the 30 selections, it would have a probability of 91% to have one or more copies of itself in the next generation. Likewise, string two would have a probability of 0.9% over one selection and a probability of 24% over all 30 selection.
| String | 1 | 2 | 3 | 4 | 5 | Total | |
| Score | 34 | 4 | 23 | 12 | 2 | 430 |
The idea is, of course, to give the strings with a high fitness values a greater chance of making it to the next generation. And by using roulette wheel selection, it will add some variations into the next generation.
For this application, a crossover probability of 1.0 was used. This means that crossover will happen to all strings in the mating pool. A mutation probability of 0.001 was used.
Selecting GA Parameters
Initially, the GA was used to train the X player. A ternary alphabet was used consisting of 0, 1, and 2, so each move had two bits which combine to produce a number between 0 and 8. The GA strategy was developed by pitting it against a computer opponent that played with a very simple, perhaps minimal strategy (to be described below).
First, the board is initialized and place in a file so that the GA did not have to compute the boards every time it evaluated the fitness of a string. The board initialization program goes through the entire board configuration and saves the valid configurations while discarding the invalid ones. It also discards the boards which the X player can win during the next move and the ones X player must block in order to stay in the game. This is done because this domain knowledge was pre-programmed. For each valid configuration, the code applied each of the six variations and compared these variations of the current decision to the stored configuration. If the configuration is a variant of a previous one, it stores the current configuration, the original configuration, and the variation number. All the information is then saved to a data file.
After these preliminaries are completed, the GA program can now be executed. Each individual in the 100 string population is forced to play a computer opponent. This opponent uses following strategy:
This, obviously, is not a very sophisticated strategy. However, this player does know when to block and when to win the game. This is enough to train the GA player with. Here, the goal is to train the GA player so that it can beat this opponent virtually every time. If this is achieved, then the strategy employed by the GA is deemed to be effective for playing the game of Tic-Tac-Toe.
Each string or individual plays 50 games against its simplistic computer opponent. Using the scoring method described above, a maximum of 200 points can be scored.
| Previous | Table of Contents | Next |