How It All Works

Here I will take you through the perlgp-run.pl script, using the sine curve fitting example that is also discussed in Section 9.2. Some code has been omitted for clarity. The script is always run from the "experiment directory", which is the directory containing your locally modified Algorithm, Individual and Population classes (see Section 2.5).

Step-by-step: perlgp-run.pl

#!/usr/bin/perl -w

use lib '.', $ENV{PERLGP_LIB} ||  die "
  PERLGP_LIB undefined
  please see README file concerning shell environment variables\n\n";
use Population;
use Individual;
use Algorithm;
use Cwd;

my ($exptid) = cwd() =~ m:([^/]+)$:;

First we make sure that the current directory and PERLGP_LIB are in the Perl include path. If the PERLGP_LIB environment variable is undefined, the script dies with a reminder to read the README file. Then the three main classes are loaded, and the experiment id is extracted from the last component of the current directory path. In this case $exptid would be 'sin'.
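To see what the experiment id regular expression does, here is a small standalone sketch. The helper name experiment_id_from() is hypothetical. It applies the same pattern as perlgp-run.pl, written here with braces as delimiters, to a path string instead of to the result of cwd().

```perl
use strict;
use warnings;

# Hypothetical helper: return the last component of a directory path,
# using the same pattern that perlgp-run.pl applies to cwd().
sub experiment_id_from {
  my ($dir) = @_;
  my ($id) = $dir =~ m{([^/]+)$};   # capture everything after the final '/'
  return $id;
}

print experiment_id_from('/home/me/perlgp/demos/sin'), "\n";   # prints "sin"
```

Because the pattern is anchored with $ and [^/]+ cannot cross a slash, only the final directory name is captured.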

my $population = new Population( ExperimentId => $exptid );
$population->repopulate();

Then we create a new Population object with this experiment id and ask it to repopulate itself from disk (this only has an effect when a previous run is being restarted).

while ($population->countIndividuals() < $population->PopulationSize()) {
  $population->addIndividual(new Individual( Population => $population,
                                             ExperimentId => $exptid,
                                             DBFileStem => $population->findNewDBFileStem()));
}

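This loop fills the Population with brand-new Individuals until it is full, that is, until countIndividuals() reaches PopulationSize(). Each new Individual is given the experiment id and a new database file stem from findNewDBFileStem(). Note that the Individuals are also told the identity of their parent Population, but this isn't used by the Individuals in the standard PerlGP system.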
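The fill-until-full loop can be tried out in isolation with a stand-in class. StubPopulation below is hypothetical and only mimics the four Population methods the loop calls; the file stem format it hands out is invented for illustration, and a plain hash stands in for each Individual.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Population class, with just enough
# behaviour to run the fill-until-full loop from perlgp-run.pl.
package StubPopulation;

sub new {
  my ($class, %args) = @_;
  return bless { size => $args{PopulationSize}, members => [], next_stem => 0 }, $class;
}
sub PopulationSize   { $_[0]{size} }
sub countIndividuals { scalar @{ $_[0]{members} } }
sub addIndividual    { push @{ $_[0]{members} }, $_[1] }
# Hypothetical: hand out a fresh, unique file stem for each new Individual.
sub findNewDBFileStem { 'Individual-' . $_[0]{next_stem}++ }

package main;

my $population = StubPopulation->new(PopulationSize => 5);
while ($population->countIndividuals() < $population->PopulationSize()) {
  # A plain hash stands in for a new Individual object.
  $population->addIndividual({ DBFileStem => $population->findNewDBFileStem() });
}
print $population->countIndividuals(), "\n";   # prints 5
```

On a restarted run, repopulate() would already have added some members, and the same loop would then only top the Population up.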

my $algorithm = new Algorithm( Population => $population );

Now we create an Algorithm object. This one does need to know about the Population, because that is the Population the (genetic) Algorithm will manipulate.

$algorithm->run();

Finally we ask the Algorithm object to run itself, and this is the last thing that perlgp-run.pl does. So you can see that perlgp-run.pl is just a wrapper that constructs some objects and tells them to get on with it.

Step-by-step: Data input

Now we move into TournamentGP.pm. The run() method is basically a for loop around calls to the tournament() method, but first the data is loaded, if needed, with

  $self->loadData() unless ($self->TrainingData());

In other words, if the TrainingData attribute is undefined, then loadData() is called (which, as you will see, sets both TrainingData and TestingData).

loadData() is just a wrapper that calls the user-defined loadSet() method on the training and testing data. In the sine curve example, the data is read from the files TrainingSet and TestingSet. There is no real need for this, since the data could be generated on the fly, but reading it from files is more transparent.
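The wrapper relationship between loadData() and loadSet() can be sketched with a hypothetical stand-in class. SketchAlgorithm, its simplified accessors and the hard-coded file names are assumptions for illustration; its loadSet() only records which file it was asked for instead of reading it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Algorithm class; the real class has its
# own TrainingData/TestingData accessors, simplified here.
package SketchAlgorithm;

sub new { return bless {}, shift }

# Combined getter/setter accessors (hypothetical simplification).
sub TrainingData { my $self = shift; $self->{TrainingData} = shift if @_; return $self->{TrainingData} }
sub TestingData  { my $self = shift; $self->{TestingData}  = shift if @_; return $self->{TestingData} }

# Stand-in for the user-defined loadSet(): records the file name only.
sub loadSet {
  my ($self, $file) = @_;
  return [ { file => $file } ];
}

# The wrapper: call loadSet() once per data file and store each array ref.
sub loadData {
  my $self = shift;
  $self->TrainingData($self->loadSet('TrainingSet'));
  $self->TestingData($self->loadSet('TestingSet'));
}

package main;

my $algorithm = SketchAlgorithm->new();
$algorithm->loadData() unless ($algorithm->TrainingData());
print $algorithm->TestingData()->[0]{file}, "\n";   # prints "TestingSet"
```

Because loadData() only delegates, replacing loadSet() in your own Algorithm.pm is enough to change how data is obtained.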

If you look at loadSet() in Algorithm.pm in the sin demo directory, you'll see that an array is filled as the input file is read. Each element represents one training point, and its 'x' value and "known" 'y' value are stored in an anonymous hash (wasteful but transparent). loadSet() returns a reference to this array, and this scalar reference is stored in the TrainingData or TestingData attribute. The Algorithm object now holds the data structures for both the training and the testing data.

sub loadSet {
  my ($self, $file) = @_;
  my @set;
  open(my $fh, '<', $file) || die "can't read data set $file\n";
  while (<$fh>) {
    my ($x, $y) = split;    # whitespace-separated x and known y
    push @set, { 'x'=>$x, 'y'=>$y };
  }
  close($fh);
  return \@set;             # reference to an array of { x, y } hashes
}
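As noted above, the sine data could instead be generated on the fly. The sketch below builds the same structure that loadSet() returns (a reference to an array of hashes with 'x' and 'y' keys). The function name make_sine_set(), the range and the number of points are hypothetical choices; it needs at least two points, and Perl's built-in sin() works in radians.

```perl
use strict;
use warnings;

# Hypothetical alternative to reading TrainingSet from disk: build an
# array ref of { x => ..., y => sin(x) } hashes directly ($n must be >= 2).
sub make_sine_set {
  my ($n, $lo, $hi) = @_;
  my @set;
  for my $i (0 .. $n - 1) {
    my $x = $lo + ($hi - $lo) * $i / ($n - 1);   # evenly spaced x values
    push @set, { 'x' => $x, 'y' => sin($x) };    # "known" y value
  }
  return \@set;
}

my $set = make_sine_set(5, 0, 3.14159);
printf "%.3f %.3f\n", $_->{x}, $_->{y} for @$set;
```

A loadSet() built this way would ignore its $file argument, or use it to choose between training and testing ranges.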

Step-by-step: Tournaments I, Fitness Evaluation

Recall that run() calls tournament() many times. The most important steps in tournament() are outlined below.

First a group of Individuals is selected from the Population at random:

  my @cohort = $self->Population()->selectCohort($self->TournamentSize());
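One plausible way to pick a random cohort is to shuffle the population and take the first TournamentSize members. The sketch below assumes that approach, using List::Util's shuffle(); select_cohort() is a hypothetical name, and PerlGP's own selectCohort() may work differently.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sketch of cohort selection: pick $size distinct members
# at random by shuffling a copy of the population.
sub select_cohort {
  my ($members, $size) = @_;
  die "cohort larger than population\n" if $size > @$members;
  my @shuffled = shuffle(@$members);
  return @shuffled[0 .. $size - 1];
}

my @population = map { "ind$_" } 1 .. 20;
my @cohort = select_cohort(\@population, 4);
print "@cohort\n";
```

Shuffling guarantees that no Individual appears twice in the same tournament.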

Then for each Individual in the cohort, the fitness on the training examples is calculated (simplified):

    my $fitness = $ind->Fitness();
    if (not defined $fitness) { # fitness is not cached
      $ind->reInitialise();     # evaluate evolved subroutines
      $fitness = $self->calcFitnessFor($ind, $self->TrainingData());
      $ind->Fitness($fitness);  # set new fitness
    }

The method reInitialise() calls evalEvolvedSubs(), which expands the Individual's code tree into Perl code and evaluates it. This redefines the Individual's evolved object methods, most importantly evaluateOutput().
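The underlying Perl mechanism is a string eval, which compiles source text at run time and can (re)define a method in a package. The sketch below illustrates only that idea: SketchIndividual and install_code() are hypothetical, and a fixed string stands in for the code that evalEvolvedSubs() would expand from the tree.

```perl
use strict;
use warnings;

# Hypothetical illustration of redefining a method from generated source.
package SketchIndividual;

sub new { return bless {}, shift }

sub install_code {
  my ($self, $code) = @_;
  no warnings 'redefine';   # overwriting the previous version is intended
  eval "package SketchIndividual; $code; 1" or die "evolved code failed: $@";
}

package main;

my $ind = SketchIndividual->new();
$ind->install_code('sub evaluateOutput { my ($self, $x) = @_; return $x + 1 }');
print $ind->evaluateOutput(2), "\n";    # prints 3
$ind->install_code('sub evaluateOutput { my ($self, $x) = @_; return $x * 10 }');
print $ind->evaluateOutput(2), "\n";    # prints 20
```

The trailing "1" makes a successful eval return true, so compilation errors in the generated code are reported through $@.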

The method calcFitnessFor() is a wrapper which calls evaluateOutput() on the Individual object and then calls the Algorithm method fitnessFunction() on the output.
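The wrapper can be sketched with two hypothetical stand-in classes. The argument order of fitnessFunction() and its mismatch-counting score are assumptions for illustration only, and the stand-in evaluateOutput() plays the role of an evolved program computing y = 2x.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Algorithm side of calcFitnessFor().
package SketchAlgorithm;
sub new { return bless {}, shift }
sub fitnessFunction {
  my ($self, $data, $output) = @_;   # argument order is an assumption
  my $wrong = 0;                      # placeholder score: count mismatches
  for my $i (0 .. $#$data) {
    $wrong++ if $output->[$i]{y} != $data->[$i]{y};
  }
  return $wrong;
}
sub calcFitnessFor {
  my ($self, $ind, $data) = @_;
  my $output = $ind->evaluateOutput($data);    # run the evolved code
  return $self->fitnessFunction($data, $output);
}

# Hypothetical stand-in Individual whose "evolved" program is y = 2x.
package SketchIndividual;
sub new { return bless {}, shift }
sub evaluateOutput {
  my ($self, $data) = @_;
  return [ map { +{ 'y' => 2 * $_->{x} } } @$data ];
}

package main;
my $data = [ { x => 1, y => 2 }, { x => 2, y => 5 } ];
my $fitness = SketchAlgorithm->new()->calcFitnessFor(SketchIndividual->new(), $data);
print "$fitness\n";   # prints 1
```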

Let's take a closer look at evaluateOutput(). You can get a copy either by looking at Grammar.pm or by running perlgp-rand-prog.pl, which will generate a random version from the grammar.

sub evaluateOutput {
  my ($self, $data) = @_;
  my ($x, $y, $z, @output);
  foreach my $input (@$data) {
    $x = $input->{x};
    # begin evolved bit

    $y = (2 + ($x - pdiv(5,2)));

    # end evolved bit
    push @output, { 'y'=> $y };
  }
  return \@output;
}

Note that this is similar to loadSet(), mainly in the way it fills an array with anonymous hashes and returns a reference to it. The argument passed to this method is the data structure that loadSet() returned. The data structure returned by evaluateOutput() is then passed to the fitness function.
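To see the data flow end to end, the example program above can be run as a plain function on the structure loadSet() returns. The pdiv() below is an assumed protected division that returns 1 for a zero denominator; check Grammar.pm for PerlGP's real definition. The function name evaluate_output() is hypothetical.

```perl
use strict;
use warnings;

# Assumed protected division; PerlGP's real pdiv() may differ for $d == 0.
sub pdiv {
  my ($n, $d) = @_;
  return $d == 0 ? 1 : $n / $d;
}

# The example evolved program, as a plain function taking loadSet()-style data.
sub evaluate_output {
  my ($data) = @_;
  my @output;
  foreach my $input (@$data) {
    my $x = $input->{x};              # only x is read by the evolved code
    my $y = (2 + ($x - pdiv(5, 2)));  # evolved bit: equivalent to x - 0.5
    push @output, { 'y' => $y };
  }
  return \@output;
}

my $data = [ { 'x' => 0, 'y' => sin(0) }, { 'x' => 1.5, 'y' => sin(1.5) } ];
my $output = evaluate_output($data);
print join(' ', map { $_->{y} } @$output), "\n";   # prints "-0.5 1"
```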

In most cases, fitnessFunction() also needs the "correct answer". In this example, loadSet() stores the "known" y value in the input data structure alongside x. It is critical that this value is not accessed by the evolving code; otherwise the trivial solution of copying the known value into $y would evolve, and we would be wasting our time. So long as $input->{y} is not mentioned in the grammar, this cannot happen. The main point here is that the input data structure is not purely input in nature: it will often also contain the known or observed output. The use of the word "input" here is not perfect, and I apologise.
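Here is how a fitness function can use the known values that the evolved code never sees. Root-mean-square error is an assumed choice for this sketch (lower is better), and rms_error() is a hypothetical name; the sin demo's actual fitnessFunction() may compute something else.

```perl
use strict;
use warnings;

# Hypothetical fitness: RMS error between evolved output and known y.
# The evolved code reads only $input->{x}; the known y is used only here.
sub rms_error {
  my ($data, $output) = @_;
  my $sum_sq = 0;
  for my $i (0 .. $#$data) {
    my $diff = $output->[$i]{y} - $data->[$i]{y};
    $sum_sq += $diff * $diff;
  }
  return sqrt($sum_sq / @$data);
}

my $data   = [ { x => 0, y => 0 }, { x => 1, y => 1 } ];
my $output = [ { y => 1 },         { y => 8 } ];
print rms_error($data, $output), "\n";   # prints 5
```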

The only requirement that PerlGP makes on the data structures passed between these methods is that each must be a scalar. You will usually use a reference to an array or a hash, but in the pi demo (Section 9.1) you will see that the scalar is simply a number.

Step-by-step: Tournaments II, Selection and Reproduction

Now that we have a fitness value for every Individual in the tournament, the rest is standard. The tournament members are sorted by fitness, and the best ones get a chance to reproduce (by calling the Individual method crossover()); their offspring replace the worst ones. The offspring are mutated.
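The selection and replacement step can be sketched with toy data. Several assumptions are made that the text above does not state: lower fitness is better, exactly the two best members breed, and their two offspring overwrite the two worst. The crossover() and mutate() functions here are hypothetical string stand-ins, not the Individual methods.

```perl
use strict;
use warnings;

# Toy stand-ins: one-point crossover on strings, and a visible "mutation".
sub crossover {
  my ($mum, $dad) = @_;
  my $cut_m = int(length($mum) / 2);
  my $cut_d = int(length($dad) / 2);
  return (substr($mum, 0, $cut_m) . substr($dad, $cut_d),
          substr($dad, 0, $cut_d) . substr($mum, $cut_m));
}
sub mutate { my ($genome) = @_; return $genome . '*' }

sub tournament_step {
  my (@members) = @_;   # hash refs with genome and fitness
  my @sorted = sort { $a->{fitness} <=> $b->{fitness} } @members;
  my ($best, $second) = @sorted[0, 1];
  my ($worst, $second_worst) = @sorted[-1, -2];
  my ($kid1, $kid2) = crossover($best->{genome}, $second->{genome});
  # Mutated offspring overwrite the worst members; their fitness is unknown.
  %$worst        = (genome => mutate($kid1), fitness => undef);
  %$second_worst = (genome => mutate($kid2), fitness => undef);
}

my @cohort = (
  { genome => 'AAAA', fitness => 0.1 },
  { genome => 'BBBB', fitness => 0.9 },
  { genome => 'CCCC', fitness => 0.2 },
  { genome => 'DDDD', fitness => 0.7 },
);
tournament_step(@cohort);
print join(' ', map { $_->{genome} } @cohort), "\n";   # prints "AAAA AACC* CCCC CCAA*"
```

Clearing the offspring's fitness mirrors the caching test in tournament(): an undefined fitness forces re-evaluation the next time the Individual is selected.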

Bob MacCallum 2003-02-03