Clustering and Center Selection
RBFNNs fix the center positions (μj) and use simple multilinear regression to obtain the optimal output weight array. To give this linear RBFNN more flexibility, a subset of RBF centers should be chosen that best explains the variance in the dependent (or output) variables; in other words, centers are chosen so that the input space is adequately covered. Given this, it seems natural simply to place a center over every input training point, μj = xj for j = 1 to p (where p denotes the number of training cases). This choice, however, yields a network that memorizes all the training pairs and, as a consequence, often exhibits extremely poor generalization. The large number of neurons provides too many free parameters for the linear regression phase, making the network oversensitive to the details of the training set. Furthermore, if the training set is large, additional computational speed and memory problems arise, making the RBF network infeasible.
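The memorization effect described above can be seen directly. The sketch below (not the book's code) builds a Gaussian-kernel RBF network with one center over every training point and fits the output weights by linear least squares; the fixed width value is a hypothetical choice for illustration. Because the design matrix is square and full rank, the network interpolates the training targets almost exactly, noise included.

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    # Phi[i, j] = exp(-||x_i - mu_j||^2 / (2 sigma^2)), the Gaussian RBF.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Training pairs: p = 20 noisy samples of a 1-D function.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(20)

centers = X.copy()                       # a center over every input point
Phi = rbf_design_matrix(X, centers, sigma=0.05)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear regression phase

y_hat = Phi @ w
print(np.max(np.abs(y_hat - y)))   # near zero: the network memorizes the data
```

The near-zero training residual is exactly the problem: the noise has been fit along with the signal, so performance on new inputs suffers.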
To counter these effects, the smallest subset of kernel-function centers is selected that can perform the necessary mapping with sufficient accuracy. Many heuristics have been developed to accomplish this subset selection. Some add one neuron at a time until a pre-specified error goal is met; others start with centers at all points in the initial input data and then predict which subset will prove most effective.
RBF Width Estimation
The width parameter (σ in Figure 10.1) doesn't just control the shape of the RBF; it also specifies where, and by how much, it overlaps with other RBFs spaced throughout the hypersurface. Physically, it determines the shape of the response surface (the neuron's receptive field). Thus, if an extremely nonlinear problem is encountered, greater flexibility built into the surface construction will lead to a better-performing network.
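The effect of σ on the receptive field can be illustrated numerically. Assuming the Gaussian form exp(-r²/(2σ²)) for the RBF of Figure 10.1, the snippet below evaluates the response at a fixed distance r from the center for several widths: a larger σ leaves a stronger response far from the center, so neighboring RBFs overlap more.

```python
import numpy as np

# Response of a Gaussian RBF at distance r = 0.5 from its center,
# for three hypothetical width settings.
r = 0.5
for sigma in (0.1, 0.5, 2.0):
    print(sigma, np.exp(-r ** 2 / (2 * sigma ** 2)))
```

The response grows monotonically with σ, from essentially zero (a sharply localized receptive field) toward one (a nearly flat response that overlaps every neighbor).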
Recall the RBFs detailed in Equations 10.1-10.3. The parameter z introduces this flexibility by determining the type of radial spread. Assuming that the network input vectors are n-dimensional and there are m neurons in the network, z can take on the following forms (taken from [6]).





The choice of r for each neuron is usually made at random. Some methods allow local tuning of the parameter, such as those found in [4], but none account for the simultaneous evolution of the hypersurface. The next section reviews past approaches to these parameter estimations; the section after it outlines the genetic-algorithm approach specific to the problem stated at the outset of the chapter.
PREVIOUS RBF PARAMETER SELECTION APPROACHES
Non-genetic Approaches
Non-genetic approaches to the problem of center (μj) selection typically separate a subset of centers from the training input patterns (x). A very crude method is to evolve one hidden unit at a time during the training process, with each center randomly selected from the input training vectors. Hidden units are added until some training error goal is met. A more elegant approach uses a k-means clustering algorithm that groups together input patterns belonging to the same class region [4,7]. The vector mean positions of these regions are then used as the RBF center subset.
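A minimal sketch of the k-means idea follows. This is not the procedure of [4,7] verbatim: Lloyd's algorithm is written out directly (with a simple farthest-point initialization, an assumption made here for determinism) so that no external library is needed. The cluster means become the RBF center subset.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=50):
    # Farthest-point initialization: start from X[0], then repeatedly take
    # the input pattern farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(2).min(1)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    # Lloyd's iterations: assign patterns to nearest center, recompute means.
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(2).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

# Three clumps of 2-D input patterns around x = 0, 1, and 2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2)) for m in (0.0, 1.0, 2.0)])
mu = kmeans_centers(X, k=3)
print(np.sort(mu[:, 0]))   # one vector mean per class region
```

The three returned centers sit at the clump means, which is the center subset the RBF network would then use.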
References [5,6] make use of statistical error prediction methods. For example, the Bayesian Information Criterion and Generalized Cross Validation can be used to select the input patterns from the training set that are most likely to reduce the mapping error during the supervised training phase. This is done by looping through the training set, assigning a score to each input pattern, and selecting the best score. The selection process continues until the error prediction heuristic reaches a minimum; in other words, until the addition of further RBF centers will not assist in reducing the mapping error.
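The scoring loop can be sketched as a greedy forward selection. This is not the exact procedure of [5,6]; it uses the Bayesian Information Criterion (BIC) as the score, a Gaussian kernel, and a hypothetical fixed width, but it shows the shape of the method: score every remaining candidate center, add the best one, and stop when the criterion stops improving.

```python
import numpy as np

def gaussian_phi(X, centers, sigma):
    # Design matrix of Gaussian RBF activations.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(2)
    return np.exp(-d2 / (2 * sigma ** 2))

def forward_select(X, y, sigma=0.2):
    p = len(X)
    chosen, best_bic = [], np.inf
    remaining = list(range(p))
    while remaining:
        scores = []
        for c in remaining:            # score every candidate center
            Phi = gaussian_phi(X, X[chosen + [c]], sigma)
            w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            sse = ((Phi @ w - y) ** 2).sum()
            k = len(chosen) + 1
            bic = p * np.log(sse / p + 1e-12) + k * np.log(p)
            scores.append((bic, c))
        bic, c = min(scores)
        if bic >= best_bic:            # criterion at a minimum: stop
            break
        best_bic = bic
        chosen.append(c)
        remaining.remove(c)
    return chosen

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(40)
centers = forward_select(X, y)
print(len(centers))   # far fewer centers than the 40 training points
```

Because the BIC penalty grows with each added center while the error term levels off at the noise floor, the loop halts with a small subset rather than one center per training point.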
Cascade-correlation can also be used with an RBF architecture for the evolution of one hidden unit at a time [8]. The process is actually a gradient descent procedure that places the RBF center in a position that reduces the current training error the most.
Non-genetic RBF width estimation procedures are limited. Most rely on a random-search, essentially brute-force, method for establishing the widths. Reference [4] does provide local tuning procedures, but it does not consider the evolution of a whole network.