BACKGROUND
As discussed in the introduction, the objective of most neural networks is to estimate a function
from a training set of representative input/output pairings. Qualitatively, the RBF network does this by forming localized bumps, or response regions, within the input space. The superposition of these local response regions forms a response surface that is of an order higher than the dimension of the input vector and spans the space covered by the input training patterns.
By definition, a radial basis function is one which decreases (or increases) monotonically away from a central point, thereby giving it an inherent bump form. Classic kernel functions (or, in the case of the RBFNN, neurons) that exhibit this behavior are the Gaussian, the Cauchy, and the Inverse Multiquadric. These forms can be written generally as [5,6]:

Gaussian: g(z) = exp(-z^2)  (10.1)

Cauchy: g(z) = 1/(1 + z^2)  (10.2)

Inverse Multiquadric: g(z) = 1/sqrt(1 + z^2)  (10.3)

where z = ||x - μ||/σ is the scaled distance of the input vector x from the RBF center μ.
The form of z determines the type of radial scaling, or equivalently, the extent of the region influenced by the RBF.
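The three kernel forms named above can be sketched directly in terms of the scaled radial distance z. The function and variable names below are illustrative, not from the text; all three kernels peak at 1 when z = 0 and decay monotonically as z grows.

```python
import numpy as np

def radial_distance(x, mu, sigma):
    """Scaled radial distance z = ||x - mu|| / sigma."""
    return np.linalg.norm(x - mu) / sigma

def gaussian(z):
    # Gaussian kernel: exp(-z^2)
    return np.exp(-z ** 2)

def cauchy(z):
    # Cauchy kernel: 1 / (1 + z^2)
    return 1.0 / (1.0 + z ** 2)

def inverse_multiquadric(z):
    # Inverse multiquadric kernel: 1 / sqrt(1 + z^2)
    return 1.0 / np.sqrt(1.0 + z ** 2)

# Each kernel gives its maximum response (1.0) at the center (z = 0)
# and a smaller response as the input moves away.
z = radial_distance(np.array([1.0, 2.0]), np.array([1.0, 2.5]), sigma=0.5)
print(gaussian(z), cauchy(z), inverse_multiquadric(z))
```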
Figure 10.1 delineates the Gaussian response function. To reiterate, the RBFNN positions a collection of these RBFs (in this case, Gaussians) throughout the space covered by the input training patterns. The parameter μj specifies the location of the RBF within the input space (μj has the same dimension as the input vector x) and the parameter σj determines the width of the local function. Thus, a given RBF will be centered at μj within the input space and have a receptive field which is proportional to σj. Moreover, it will give a maximum response for input vectors, x, which are nearest the RBF center, μj.
Figure 10.1 Gaussian function.
By arranging an assortment of these receptive fields, response areas are created which sufficiently cover the input space; sufficient in the sense that the RBFNN can approximate the underlying function to within some pre-defined error criterion. More specifically, a complex decision hypersurface is constructed through the overlapping of the localized kernel regions. With a developed approximation surface, the RBFNN estimates an output, for an incoming input case, by first evaluating each of the kernel functions (in other words, determining where the input vector lies on the hypersurface) and then forming a weighted linear summation of their responses. The difficulty arises not from the logical evaluation of an input, but rather from the establishment of the network parameters for the hypersurface construction, namely: the center positions (μj), the kernel widths (σj), and the weighting coefficients (wi) for the summation of the individual kernel responses.
The development of an RBFNN is done in a two-part learning scheme known as hybrid learning (Figure 10.2). The initial forward connections of the network contain the RBF centers μj, obtained through unsupervised learning, followed by an output layer of weighting parameters, formed through supervised learning. Training in the unsupervised mode is done without a pre-defined learning goal; input categorization and learning must be done using correlations within the input training data, in contrast to feedback from a teacher or critic. For the RBFNN, the learning scheme essentially clusters the training input vectors and specifies where to position the RBF centers so that the desired response coverage is obtained. Thus, via unsupervised learning, the RBF center positions (the forward connections of Figure 10.2, μj) are chosen a priori and remain fixed throughout the establishment of the weighting coefficients (wi).
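The unsupervised stage described above can be sketched with a simple clustering pass over the training inputs, taking the resulting centroids as the RBF centers μj. The text does not commit to a particular algorithm at this point (center selection is detailed in a later section); k-means is shown here only as one common choice, and all names are illustrative.

```python
import numpy as np

def kmeans_centers(X, k, iters=20, seed=0):
    """Cluster training inputs X (n_samples, n_dims) into k groups;
    return the k centroids for use as RBF centers."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct training vectors (fancy
    # indexing returns a copy, so updating centers leaves X intact).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each training vector to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# Two artificial clusters of training inputs, around (0, 0) and (1, 1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(1.0, 0.1, (20, 2))])
mu = kmeans_centers(X, k=2)
```

Once these centers are fixed, only the output weights remain to be trained, which is what makes the supervised stage a linear problem.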
The rearward connections, comprising the output layer of the network in Figure 10.2, specify the weighting (or regression) coefficients which are trained in a supervised fashion. Supervised means that the learning is based on comparison of the network output with the known correct answers.
For an RBFNN with a single layer of kernel functions, given that the basis function centers (μj) are fixed, the optimal weight array for the output connections which gives the best functional mapping can be found using the least squares normal equation developed in multiple linear regression theory [5].
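The supervised stage can then be sketched as an ordinary least-squares fit: with centers and widths fixed, build the kernel response matrix G, where G[n, i] = gi(xn), and solve the normal equation (GᵀG)w = Gᵀy for the weights. Gaussian kernels and all names here are illustrative assumptions.

```python
import numpy as np

def design_matrix(X, centers, sigma):
    """Kernel response matrix G with G[n, i] = g_i(x_n), Gaussian kernels."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_weights(X, y, centers, sigma):
    """Least-squares output weights, centers and widths held fixed."""
    G = design_matrix(X, centers, sigma)
    # lstsq solves the normal-equation problem in a numerically
    # stable way (preferable to forming G^T G explicitly).
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return w
```

With one kernel centered on each training input, G is square and the fit interpolates the training targets exactly; with fewer kernels than samples, the same call returns the least-squares approximation.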
With all the parameters set, the fundamental mapping can then be written as:

f(x) = Σi wi gi(x)  (10.4)

where gi(x) is the response of the ith kernel function to the input vector x and the sum runs over all kernels in the network.
Thus, for an input vector x, the solution f(x) is a weighted linear summation of each RBF's response to x. Kernel functions that have centers within the region of x will give the largest responses, whereas those farthest away will give negligible contributions to the series formed by Equation (10.4). Moreover, the kernel function responses (gi(x)) will be bounded between 0 and 1, with the assigned weights (wi) specifying the neurons' heights.
The following sections detail the procedures of RBF center selection and width estimation.
Figure 10.2 RBFNN architecture.