Symbolic regression of sine function

Please change directory to 'PERLGP_BASE/demos/sin' and look at the README file before running this demo.

Problem definition

Find an approximation to $\sin 3x$ using the arithmetic operations plus, minus, multiply, and protected division, together with the integer constants 1, 2, 3, 5, 7 and the input $x$.
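Protected division is commonly defined in GP so that evolved expressions can never raise a division-by-zero error; a conventional sketch (in Python, for illustration; PerlGP's own definition may differ in the value returned for a zero divisor) is:

```python
def pdiv(a, b):
    # protected division: by convention returns 1.0 when the divisor
    # is zero, so any evolved expression tree evaluates safely
    return a / b if b != 0 else 1.0
```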

The fitness measure is the root mean squared (RMS) error of the predicted function with respect to the target function, measured on a set of randomly sampled points. The points initially lie between -1.0 and +1.0, but both the range and the number of points increase during the run (see below).
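The fitness calculation can be sketched as follows (a Python illustration, not PerlGP's actual Perl code; the two-term Taylor candidate is just an example individual):

```python
import math
import random

def rmse(predict, target, xs):
    """Root mean squared error of predict vs. target over sample points xs."""
    se = sum((predict(x) - target(x)) ** 2 for x in xs)
    return math.sqrt(se / len(xs))

# sample points in the initial range [-1.0, +1.0]
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
target = lambda x: math.sin(3 * x)
# an example candidate: the first two Taylor terms of sin(3x)
taylor2 = lambda x: 3 * x - (3 * x) ** 3 / 6
error = rmse(taylor2, target, xs)
```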

PerlGP approach

Here, loadSet() reads in data (randomly sampled $x$ values from the files TrainingSet or TestingSet), which is generated by a simple data-generation script (run by the refresh() method, but more on this later). The general flow of data is discussed in detail in Section 2.4.

Two subtle ``hacks'' seemed to be needed to get well-behaved GP runs on this problem; both use domain-specific knowledge. Firstly, the error function incorporates a cap on the per-sample error: if the squared difference between the predicted and known $y$ values is greater than 1.0, it is counted as 1.0. This may not be essential, but from the Taylor expansions of trigonometric functions we know that towards $\pm$ infinity a polynomial approximation deviates without bound from the $x$ axis, so capping the error avoids over-punishing functions containing high powers of $x$. The second hack is to start training on a small range of $x$ values centred around zero and then increase the range (the Range attribute) as the evolved solutions reach a certain target fitness. Again this uses domain-specific knowledge: the early terms of the Taylor expansion fit the sine function best for values of $x$ near zero. You can experiment with more complex target functions (for example, try $\sin(3(x-1.23))$) where there is no trivial solution early on.
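The first hack, the per-sample error cap, amounts to one extra min() in the fitness calculation. A minimal Python sketch (illustrative only; the cap value 1.0 is from the text, everything else is an assumption):

```python
import math
import random

CAP = 1.0  # per-sample squared-error cap

def capped_rmse(predict, xs):
    # RMS error against sin(3x), with each squared error capped at 1.0,
    # so candidates containing high powers of x are not over-punished
    # for their wild excursions far from zero
    se = sum(min((predict(x) - math.sin(3 * x)) ** 2, CAP) for x in xs)
    return math.sqrt(se / len(xs))

xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
```

Note that with this cap the worst possible fitness is exactly 1.0, however badly a candidate behaves.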

Take a look at the refresh() method; it is called on the first tournament and subsequently every RefreshInterval tournaments. When the best training fitness drops below a threshold, Range is increased, the data is regenerated and reloaded, and the whole population's fitnesses are reset. An ``anti-stagnation'' measure is also implemented in refresh(): if more than 1000 tournaments pass without an improvement in fitness, the data is resampled and reloaded (but Range does not increase).
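The control flow just described can be sketched like this (a hypothetical Python outline; only the names Range and the 1000-tournament stagnation limit come from the text, the threshold, growth factor, and counting of refresh calls rather than individual tournaments are illustrative simplifications):

```python
import random

class Refresher:
    def __init__(self, range_=1.0, threshold=0.1, stagnation_limit=1000):
        self.range = range_                  # current sampling range (Range)
        self.threshold = threshold           # fitness target for widening
        self.stagnation_limit = stagnation_limit
        self.best = float("inf")
        self.since_improvement = 0

    def resample(self, n=100):
        # regenerate the training data within the current range
        return [random.uniform(-self.range, self.range) for _ in range(n)]

    def refresh(self, best_fitness):
        # returns a new data set when one should be loaded, else None
        if best_fitness < self.best:
            self.best = best_fitness
            self.since_improvement = 0
        else:
            self.since_improvement += 1
        if best_fitness < self.threshold:
            self.range *= 2              # widen the sampled x range
            self.best = float("inf")     # fitnesses reset: data changed
            return self.resample()
        if self.since_improvement >= self.stagnation_limit:
            self.since_improvement = 0   # anti-stagnation: new data,
            return self.resample()       # but the range stays the same
        return None
```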

This demo also uses self-adapting NodeXoverProb and NodeMutationProb attributes.
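Self-adaptation generally means that each individual carries its own operator probabilities, which are perturbed along with the genome when it reproduces. The following is a loose, hypothetical sketch of that idea (the attribute names are PerlGP's; the jitter mechanism shown here is an assumption and may differ from PerlGP's actual scheme):

```python
import random

def mutate_rates(rates, jitter=0.1):
    # perturb each self-adapting probability multiplicatively,
    # clamped to a sensible range so it can never reach 0 or exceed 1
    return {k: min(1.0, max(0.001, v * random.uniform(1 - jitter, 1 + jitter)))
            for k, v in rates.items()}

parent = {"NodeXoverProb": 0.02, "NodeMutationProb": 0.01}
child = mutate_rates(parent)
```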

The various devices employed here may or may not be necessary to get a good result, but they illustrate some of the ways you can customise PerlGP.

Note: the fitness function and data structures are not really optimised for speed (wasteful use of hashes and so on). Speed comparisons against lilgp on the same problem are given in my EuroGP 2003 paper (available on request, and later on the web). It's always a good idea to have something else to do while jobs are running, however...

Bob MacCallum 2003-02-03