Previous Table of Contents Next


PROBLEM STATEMENT

The work presented here used a modified version of Goldberg’s [4] Simple Genetic Algorithm (SGA) to generate test data for one of the test programs used in Watkins’ work [3]. The program, trityp-easy, shown in Figure 4.1, is composed of multiple nested if-then-else constructs.


Figure 4.1  The trityp-easy program.

The difficulty lies in finding test data which executes each line of source code within the MUT. Here test coverage was measured by instrumenting each block of code between control statements. Viewing trityp-easy as a series of code blocks (consisting of one or more statements) punctuated by control instructions helps visualize the flow of execution and is called the control graph. The code blocks in trityp-easy are delimited by opening and closing braces such as found on lines 14 and 17 in Figure 4.1, and represent nodes in the control graph.

Figure 4.2 depicts a portion of the control graph for trityp-easy up through line number 53. The blocks of sequentially executed code become the nodes of the graph. Control instructions form the edges. Measuring test coverage is simply keeping track of which edges and nodes were traversed.


Figure 4.2  Partial control graph of trityp-easy.

McCabe [5] applied concepts from graph theory to control graphs to define his well known cyclometric complexity metric. This metric provides a measure of the number of linearly independent paths through a software module. The trityp-easy program has a cyclometric complexity of 21. The work described in this chapter varied population size as a multiple of the cyclometric complexity.

Source code was instrumented by adding a function call to each block of code forming a node in the control graph. In Figure 4.1, the macro“INSTRUMENT_CODE”expands to a function call passing ___LINE___ as a parameter. The line number macro is replaced with the source file line number by the C preprocessor. The line number is cached for later use when the MUT executes one of these statements. Edge and node traversal counts are updated with the cached path information after the MUT has been completely traversed by a set of test data. The trityp-easy program has eighteen distinct blocks of code. Each is marked by the‘INSTRUMENT_CODE’ preprocessor macro.

SETTING UP THE GENETIC ALGORITHM

As previously indicated, the current effort utilized Goldberg’s SGA [4], though some modifications were made to streamline the process of gathering data. Specifically, those portions of the program soliciting operator input were replaced by hardcoded parameter values. All graphical data presented in this chapter were obtained from ensemble averages of 200 runs. Each run lasted 50 generations. Table 4.1 indicates the values of other pertinent GA parameters.

Table 4.1 SGA parameters
Parameter Setting
Number of Runs 200
Chromosome Length 24 bits
Number of Generations 50
Probability of Crossover 1.00
Type of Crossover Two Point
Probability of Mutation 0.005
Scaling Not Used
Population Size Varied

Various scaling methods were investigated but were found to actually hinder the formation of stable sub-populations of test data. This is because scaling operators are designed to amplify small differences between individuals, which is a form of positive feedback. The resulting positive feedback is used in standard GA based optimizations to provide good performers with exponentially more offspring. In this application, fitness is initially used to grant more offspring to good performers. However, as a sub-population grows, fitness will decrease, limiting its size and providing negative feedback. This limits uncontrolled reproduction of the descendants of a highly fit ancestor.

CODING

The trityp-easy program used in this work takes three 32-bit integer parameters and determines what type of triangle (if any) they form. The coding used in this work is rather simple. Each 24-bit chromosome is split into three fields or alleles of eight bits each. For trityp-easy, the process of decoding a chromosome into parameters involved a simple conversion of an 8-bit allele into a 32-bit number. This coding was chosen because it was simple and allowed the author to focus on evaluating the fitness function. This coding also decreased the size of the search space, making this problem fairly easy for the GA.

Since the goal of this effort is to eventually produce a usable (automatic) tool, it is not desirable to manually produce a coding for each module evaluated. It would be very efficient to simply code complete function parameters into the chromosome. The drawback of this approach is the large size of the resulting chromosomes. Larger chromosomes take longer to process and evolve. The success of the proposed utility will be dependent on resolving this issue.


Previous Table of Contents Next

Copyright © CRC Press LLC