| Previous | Table of Contents | Next |
GENETIC PROGRAMMING PARTICULARS
In his 1992 book [9], John Koza recommends six steps that should be followed to successfully complete a genetic programming implementation. In this section, the particulars of these six steps as they apply to the hydrocyclone system identification problem will be discussed.
In genetic programming applications, there are two key elements: terminals and functions. Terminals are the things that can appear at the terminal points in the graphical (or tree) representations of possible computer programs (see Chapter 14 for a detailed discussion). Examples of terminals include variables and constants.
Recognize that, theoretically, the solution space in a genetic programming problem is all possible computer programs. Contemplating this point makes the task appear quite daunting, if not impossible. Thus, the size of the search space is limited some by reducing the number of possibilities. The first step in doing so is to limit the terminals that are considered in the candidate computer programs. For instance, in the current problem, it is unlikely that the computer programs that successfully simulate the performance of a hydrocyclone will require the use of alpha characters. Thus, these can be eliminated from the pool of possible terminals, thereby reducing the size of the search space.
The terminals selected in the current problem are all floating point values (which may serve as constants in any equations) and all of the input and output variables in the hydrocyclone problem:
This can represent a tricky issue. Since there have been a number of empirical models developed for simulating hydrocyclone performance, the temptation here was to use the functions employed in some of the more accurate models. However, for the genetic programming approach to work in general on system identification problems, this cannot be considered a viable approach. The approach adopted here was to view the modeling equation as being a function that could perhaps be represented as a Fourier Series. Therefore, the functions selected for the current problem were:
Actually, this step is directly related to step 2 above of selecting the functions to be considered. One of the problems associated with limiting the pool of functions from which genetic programming can assemble a computer program is that this pool may be too restrictive; the necessary functions might not be included in the candidate pool. Thus, sufficiency involves ensuring that the set of terminals and the set of primitive functions are capable of expressing a solution to the problem at hand. Unfortunately, ensuring the extent to which the functions and terminals are sufficient to solve the problem is often difficult to judge until some genetic programming runs have been made.
The idea of closure is related to the problem of generating computer programs that are feasible, or that will compile. Most programmers (even highly skilled ones) generate computer programs that contain fundamental syntax errors. Certainly, this phenomenon can occur when a genetic algorithm is randomly generating codes by mixing and matching pieces of existing codes. Thus, in genetic programming, closure must be ensured. Closure dictated that each of the functions in the function set must be able to accept, as its arguments, any value and data type that may possibly be returned by any function in the function set. In this way, genetic programming will not generate infeasible codes; ineffective perhaps, but not infeasible.
The search conducted in genetic programming is performed by a genetic algorithm. Recall that genetic algorithms use as their only driving force, a fitness function which defines just how effective (or ineffective) a candidate solution is in solving the problem at hand. In the hydrocylcone identification problem, the effectiveness of a candidate solution is directly related to how well it predicts the split size of a particular set of hydrocyclones. Given this, the fitness function (or measure of quality) is:

where N is the number of data points available, d50data is the split size appearing in the data set, and d50calculated is the split size computed for a particular set of input values. Genetic programming is used to compute candidate solutions (computer programs depicting equations for calculating the split size). Each candidate solution is then used to compute a d50 size for N=200 input vectors (values of Dc, Di, Do, Du, h, Q, φ and ρ). Naturally, the more accurate the model, the closer the value of f is to 0. Thus, the genetic algorithm in the genetic programming code has a value that it can strive to minimize. If it finds a candidate solution that has f = 0, then that solution predicts d50 values that exactly match all 200 data values.
As with a standard genetic algorithm application, applying genetic programming to a problem requires the determination of some variable values. Along with those parameters typically associated with a genetic algorithm such as population size and probability of crossover, there are some parameters that are specific to genetic programming.
The pertinent variables and the values used in the current problem are:
In static problems such as the one considered here, selecting a stop criterion is a rather straightforward task. Here, the genetic programming is continued for a maximum number of generations (25,000), or until a solution is found that exactly models the data presented it (f =0). But, in other problems, appropriate stopping criteria are not so readily apparent. For instance, in real-time control systems, there are situations wherein the time to search is limited, the search must be terminated when something changes in the plant, or a variety of other equally acceptable criteria.
| Previous | Table of Contents | Next |