Previous Table of Contents Next


GENETIC OPERATIONS

The operations of reproduction, crossover, and mutation are used in applying genetic: concepts to the original study. It should be emphasized that it is small changes in the forecast that are important. These represent fluctuations that are present in the time series.

Reproduction and the Fitness Function

The fitness function replaces the simplex projection previously described. In this projection, the current point (a vector with the first coordinate as the current time value in a rolling series) associated with n nearby points. (In the study, this was necessarily a simplex requiring the extensive search.) The number n is the embedding dimension + 1, emulating the number of vertices in the simplex. These points are chosen as the nearest n neighbors (determined by Euclidean distance). The current position relative to the n points is calculated as the weighted distance to the points and saved for later use. The use of this type of function eliminates the extensive search to find the “smallest” simplex. The only difference between the weights using the simplex projection method and using genetic operations is that in the former case, all of the weights are positive while in the latter case, some of the weights can be (and probably will be) negative. As in the original study, we keep track of where the points end up after x forward time steps. From these, the forecast of the current point is made using the weights previously saved. Thus for every current time value in a rolling time series, one copy is created based upon the weighted distances from the nearest neighbors of the current point end up.

To demonstrate the operation of the fitness function, it may be helpful to review the example from step 5 in the section entitled: Solution Formulation. In that example, the following vectors, in terms of components, were obtained directly from the time series:

Test vector (T) (17,13,19)
Simplex Vertex 1 (V1) = (13,19,17)

These vectors were determined:

Vertex 2 (V2) = (19,17,11) as the four nearest neighbors
Vertex 3 (V3) = (17,11,4) to the test vector.
Vertex 4 (V4) = (11,4,8).
Projected Simplex Vertex 1 (P1) = (17,13,19)

These were determined:

Vertex 2 (P2) = (19,17,11) from the simplex above and
Vertex 3 (P3) = (17,11,4) tracking s time steps from each
Vertex 4 (P4) = (11,4,8) component in the series.

Goal: Use a fitness function to project the test vector s time steps into the future by:

1.  Determining the relationship of the test vector to the original simplex. This is accomplished by writing the test vector (T) as a linear combination of the simplex vertices (V1, V2, V3, V4) to form:

where the w's are weights.

The above system consists of three equations in four unknowns. An added restriction requires w1 + w2 + w3 + w4 = 1 to establish a relationship in a convex structure (in this case a tetrahedron). When the w's are either all positive or all negative, the test vector is encapsulated within the structure (simplex).

2.  Using the fitness function to project the test vector into the future. By allowing the fitness function to operate without the strict requirement that all w's be either positive or negative, the exhaustive search to encapsulate the test vector is eliminated. The fitness function is then:

Assuming, for example, that the solution to finding values for the w's are: w1=.6, w2=.3, w3=.5, and w4=-.4, then the fitness function which projects the test vector's time steps into the future is given by:

Crossover

A value of about 0.6 will initially be used for the value of the probability of crossover. Multipoint crossover will be determined by choosing n number of random numbers of low order bits for each component value in the projected current value. Using only low order bit positions of the components is consistent with time series that are highly correlated, such as the stock market prices and other economic series.

Changes in prices are normally very small and reflect fluctuations at or near the decimal point.

Mutation

The probability of mutation will be initially set to about .001. For a sample size of 1000, only one operation of mutation is expected.

Coding Scheme

The coding scheme comes directly from the vector point, where the components are time series values. As previously described, each point will have the form [xt,x t-1,xt-2], assuming a dimension of three and time difference of one. Sixteen bits will be used as the size for each component in order to provide for a large variety of series. Thus, in the case of dimension 3 and time value 1, three components, each of 16 bits would be required to effectively encode the point. If we were interested in analyzing a time series such as the time series associated with the stock market (whose value is commonly about 6000.00 using the DJIA, including decimals), all 16 bits of each component would be used. An important reason to use this scheme is that for highly correlated time series (which are very common), the most recent values are the most important values. The next most important value is the previous value. Thus, the structure provides for the most effective case for genetic operations - keeping the most important allele adjacent to each other.

Using the example previously described, the coding scheme for the test vector of (17,13,19) would be the 48 bits as follows:

[0000 0000 0001 0001 0000 0000 0000 1101 0000 0000 0001 0011].

In a correlated time series, fluctuations in the values are represented as small changes in the low order positions of each vector. In this case, the low order bits would be bits 15,16,31,32,47, and 48. These low order bits would then be subject to crossover and mutation operations. Restricting the effects of cross over and mutation to the low order bits results in short, low order schemata.


Previous Table of Contents Next

Copyright © CRC Press LLC