Symbolic regression of sine function

Please change directory to 'PERLGP_BASE/demos/sin' and look at the README file before running this demo.

Problem definition

Find an approximation to $\sin 3x$ using the arithmetic operations plus, minus, multiply, and protected division, together with the integer constants 1, 2, 3, 5, 7 and the input $x$.
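Protected division is commonly defined in GP so that evolved expressions can never raise a division-by-zero error; a conventional sketch (in Python, for illustration; PerlGP's own definition may differ in the value returned for a zero divisor) is:

```python
def pdiv(a, b):
    # protected division: by convention returns 1.0 when the divisor
    # is zero, so any evolved expression tree evaluates safely
    return a / b if b != 0 else 1.0
```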

The fitness measure is the root mean squared (RMS) error of the predicted function with respect to the target function, measured on a set of randomly sampled points. The points initially lie between -1.0 and +1.0, but both the range and the number of points increase during the run (see below).
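The fitness calculation can be sketched as follows (a Python illustration, not PerlGP's actual Perl code; the two-term Taylor candidate is just an example individual):

```python
import math
import random

def rmse(predict, target, xs):
    """Root mean squared error of predict vs. target over sample points xs."""
    se = sum((predict(x) - target(x)) ** 2 for x in xs)
    return math.sqrt(se / len(xs))

# sample points in the initial range [-1.0, +1.0]
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
target = lambda x: math.sin(3 * x)
# an example candidate: the first two Taylor terms of sin(3x)
taylor2 = lambda x: 3 * x - (3 * x) ** 3 / 6
error = rmse(taylor2, target, xs)
```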

PerlGP approach

Here, loadSet() reads in data (randomly sampled $x$ values from the files TrainingSet or TestingSet), which is generated by a simple data-generation script (run by the refresh() method, but more on this later). The general flow of data is discussed in detail in Section 2.4.

Two subtle ``hacks'' seemed to be needed to get well-behaved GP runs on this problem; both use domain-specific knowledge. Firstly, the error function incorporates a cap on the per-sample error: if the squared difference between the predicted and known $y$ values is greater than 1.0, it is counted as 1.0. This may not be essential, but from the Taylor expansions of trigonometric functions we know that towards $\pm$ infinity a polynomial approximation deviates without bound from the $x$ axis, so capping the error avoids over-punishing functions containing high powers of $x$. The second hack is to start training on a small range of $x$ values centred around zero and then increase the range (the Range attribute) as the evolved solutions reach a certain target fitness. Again this uses domain-specific knowledge: the early terms of the Taylor expansion fit the sine function best for values of $x$ near zero. You can experiment with more complex target functions (for example, try $\sin(3(x-1.23))$) where there is no trivial solution early on.
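The first hack, the per-sample error cap, amounts to one extra min() in the fitness calculation. A minimal Python sketch (illustrative only; the cap value 1.0 is from the text, everything else is an assumption):

```python
import math
import random

CAP = 1.0  # per-sample squared-error cap

def capped_rmse(predict, xs):
    # RMS error against sin(3x), with each squared error capped at 1.0,
    # so candidates containing high powers of x are not over-punished
    # for their wild excursions far from zero
    se = sum(min((predict(x) - math.sin(3 * x)) ** 2, CAP) for x in xs)
    return math.sqrt(se / len(xs))

xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
```

Note that with this cap the worst possible fitness is exactly 1.0, however badly a candidate behaves.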

Take a look at the refresh() method; it is called on the first tournament and subsequently every RefreshInterval tournaments. When the best training fitness drops below a threshold, Range is increased, the data is regenerated and reloaded, and the whole population's fitnesses are reset. An ``anti-stagnation'' measure is also implemented in refresh(): if more than 1000 tournaments pass without an improvement in fitness, the data is resampled and reloaded (but Range does not increase).
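The control flow just described can be sketched like this (a hypothetical Python outline; only the names Range and the 1000-tournament stagnation limit come from the text, the threshold, growth factor, and counting of refresh calls rather than individual tournaments are illustrative simplifications):

```python
import random

class Refresher:
    def __init__(self, range_=1.0, threshold=0.1, stagnation_limit=1000):
        self.range = range_                  # current sampling range (Range)
        self.threshold = threshold           # fitness target for widening
        self.stagnation_limit = stagnation_limit
        self.best = float("inf")
        self.since_improvement = 0

    def resample(self, n=100):
        # regenerate the training data within the current range
        return [random.uniform(-self.range, self.range) for _ in range(n)]

    def refresh(self, best_fitness):
        # returns a new data set when one should be loaded, else None
        if best_fitness < self.best:
            self.best = best_fitness
            self.since_improvement = 0
        else:
            self.since_improvement += 1
        if best_fitness < self.threshold:
            self.range *= 2              # widen the sampled x range
            self.best = float("inf")     # fitnesses reset: data changed
            return self.resample()
        if self.since_improvement >= self.stagnation_limit:
            self.since_improvement = 0   # anti-stagnation: new data,
            return self.resample()       # but the range stays the same
        return None
```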

This demo also uses self-adapting NodeXoverProb and NodeMutationProb attributes.
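Self-adaptation generally means that each individual carries its own operator probabilities, which are perturbed along with the genome when it reproduces. The following is a loose, hypothetical sketch of that idea (the attribute names are PerlGP's; the jitter mechanism shown here is an assumption and may differ from PerlGP's actual scheme):

```python
import random

def mutate_rates(rates, jitter=0.1):
    # perturb each self-adapting probability multiplicatively,
    # clamped to a sensible range so it can never reach 0 or exceed 1
    return {k: min(1.0, max(0.001, v * random.uniform(1 - jitter, 1 + jitter)))
            for k, v in rates.items()}

parent = {"NodeXoverProb": 0.02, "NodeMutationProb": 0.01}
child = mutate_rates(parent)
```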

The various devices employed here may or may not be necessary to get a good result, but they illustrate some of the ways you can customise PerlGP.

Note: the fitness function and data structures are not really optimised for speed (wasteful use of hashes and so on). Speed comparisons against lilgp on the same problem are given in my EuroGP 2003 paper (available on request, and later on the web). It's always a good idea to have something else to do while jobs are running, however...

Bob MacCallum 2003-02-03