next up previous contents
Next: Attributes & Variables Up: Class: Algorithm Previous: Class: Algorithm   Contents

Subsections

Methods

Methods marked with * must be implemented in Algorithm.pm for each new application/experiment. These are the routines for reading/writing data, fitness function etc.


new()

Arguments: Population => OBJECT, ATTRIBUTE-HASH (optional)
Return value: OBJECT
Defined in: PerlGPObject.pm
Mainly called by: perlgp-run.pl
Usually calls: _init()
Relevant attributes: all

This is the constructor, but you probably don't need to call it directly, except in specialist applications. A Population object must be given in the argument hash. Other attributes may also be specified in the argument hash, but note that the ``usual'' way to set these attributes is to edit Algorithm.pm in the experiment directory.


_init()

Arguments: ATTRIBUTE-HASH
Return value: void
Defined in: Algorithm.pm, TournamentGP.pm, SupervisedLearning.pm
Mainly called by: constructor
Usually calls: _init() cascade (see below)
Relevant attributes: all, and in particular: BestFitness, WorstPossibleFitness, FitnessDirection, TournamentLogFile, Tournament

This method sets the attributes. You should customise the object attributes by editing the _init() method in Algorithm.pm. Default settings can be found in TournamentGP.pm and SupervisedLearning.pm. The _init() methods are called in a cascade in the following order Algorithm::_init(), TournamentGP::_init(), SupervisedLearning::_init(). Attributes set at the beginning of the cascade override those defined at the end.

TournamentGP::_init() performs some other initialisation. If WorstPossibleFitness has not already been defined in Algorithm.pm, it is set to zero or an arbitrarily large number (1e99) according to the FitnessDirection. The attribute/variable BestFitness is used to store the fitness of the fittest Individual during the run, and is initialised to WorstPossibleFitness. If the file `results/tournament.log' file exists (for example after a restart), Tournament (the current tournament number) and BestFitness are updated from the values in last line in this file. If the `results' directory does not exist it is created, likewise the directory KeepBestDir if required.


loadSet()*

Arguments: STRING filename
Return value: SCALAR (usually a reference to training/testing data structure)
Defined in: Algorithm.pm
Mainly called by: loadData()
Usually calls:
Relevant attributes:

You must redefine this method to load up training or testing data into a suitable data structure, and return a scalar variable (which is usually going to be a reference to a more complex data structure). If your data is in files then you can use filename; see Section 3.1.4 for more details, and see the sine curve fitting demo in Section 9.2 for an example. You don't have to read from a file of course; see the pi demo in Section 9.1 for an example.


loadData()

Arguments: none
Return value: void
Defined in: SupervisedLearning.pm
Mainly called by: run(), refresh()
Usually calls: loadSet()
Relevant attributes: TrainingSet, TestingSet, TrainingData, TestingData

A simple wrapper to call loadSet() twice, once each on the training and testing data. The filenames passed to loadSet() are the attributes TrainingSet and TestingSet. The two return values (usually references to data structures) are stored in the attributes TrainingData and TestingData.


fitnessFunction()*

Arguments: Input => SCALAR, Output => SCALAR, TimeTaken => NUMBER, CodeSize => NUMBER
Return value: NUMBER
Defined in: Algorithm.pm
Mainly called by: calcFitnessFor()
Usually calls:
Relevant attributes: FitnessDirection

Calculates the fitness given the Input data (which should also contain the desired/true output value), and the Output data (which is generated by Individual::evaluateOutput()). The Input and Output scalars will usually be references to data structures containing multiple data points, which you will have to loop through (see Section 9.2 for an example). You may ignore CodeSize and TimeTaken. The fitness value may go ``up'' or ``down'' as you wish, but you must set the attribute FitnessDirection accordingly.


saveOutput()*

Arguments: Filename => FILENAME, Input => SCALAR, Output => SCALAR, Individual => OBJECT (optional)
Return value: void
Defined in: Algorithm.pm
Mainly called by: tournament()
Usually calls:
Relevant attributes:

This method saves the output (usually from the best-of-tournament Individual) in FILENAME for further offline analysis, for example plotting a regression curve. Both Input and Output data are required as in Section 3.1.5, so that input variables, output variables and the desired/true outputs can be saved. If the Individual object is passed to this function, you can print out some information about that too. See the sine curve demo in Section 9.2 for an example of how to implement this method.

This method should perhaps have been put in the Individual class, but it was put in the Algorithm class alongside loadSet() which also deals with data formatting.


refresh()

Arguments: none
Return value: void
Defined in: TournamentGP.pm
Mainly called by: run()
Usually calls:
Relevant attributes: RefreshInterval, BestFitness

Redefine this method in Algorithm.pm if you want to re-read or modify the training/testing data at certain intervals (see attribute RefreshInterval). The sine demo in Section 9.2 uses this technique. The default refresh() does nothing. It is usually a good idea to reset BestFitness if the training data changes.


run()

Arguments: none
Return value: void
Defined in: TournamentGP.pm
Mainly called by: perlgp-run.pl
Usually calls: tournament(), loadData(), refresh(), stopCondition(), Population::{backup(),immigrate(),emigrate()}
Relevant attributes: Tournaments, Tournament, RefreshInterval, ComplexityInterval, ComplexityLogFile, PopulationBackupInterval, EmigrateInterval, ImmigrateInterval

This method will try to call the tournament() method Tournaments times, regardless of the starting tournament number (in the case of a restart). If RefreshInterval is non-zero, refresh() is called before any tournaments are performed. Then loadData is called if TrainingData has not already been loaded.

Before each call to tournament(), refresh() is called if the tournament number (Tournament) modulus RefreshInterval equals zero. Afterward each tournament, if stopCondition() returns true, the main loop is terminated, and no more tournaments will be performed.

The ComplexityLogFile is updated every ComplexityInterval tournaments with size in characters of the Perl code of a random 10% of the population after compression with gzip. This gives an estimate of the complexity of the population of evolved programs. The 10% sample is made using perlgp-sample-pop.pl.

At intervals of PopulationBackupInterval, the Population object belonging to this Algorithm object is told to perform a backup. Immigration and emigration are handled similarly. See Section 5 for more details.


tournament()

Arguments: none
Return value: void
Defined in: TournamentGP.pm
Mainly called by: run()
Usually calls: Population::selectCohort(), Individual::reInitialise(), calcFitnessFor(), makeFamilies(), crossoverFamily(), extraLogInfo(), saveOutput()
Relevant attributes: TournamentSize, TournamentKillAge, ForkForEval, BestFitness, TournamentsSinceBest, WorstPossibleFitness, LogInterval

This is where it all happens. First a cohort1 of size TournamentSize is generated by calling the Population's selectCohort() method. The Individuals in the cohort whose Age is greater than or equal to TournamentKillAge are removed and labelled ``old'', and those that remain are called ``young''.

The fitnesses of each young Individual are either retrieved from memory (see Section 4.1.44 and AlwaysEvalFitness) or calculated afresh with a call to calcFitnessFor(). Before calcFitnessFor() is invoked, the reInitialise() method is called on the Individual, to redefine any evolved methods, such as evaluateOutput(). This is important, because otherwise all Individuals would generate the same outputs.

There is an option to restrict run-time with an alarm() call (see calcFitnessFor and AlarmTime). Sometimes this causes instability in Perl when the alarm call arrives (particularly when it interrupts the evaluation of regular expressions). A work-around is to do the fitness calculation in a forked process by setting the ForkForEval attribute. Before any fitness evaluation is performed, the fitness of the Individual is set to WorstPossibleFitness, in case something goes wrong during forking or as a result of alarm calls.

The Age of each young Individual is incremented by one every tournament.

The young Individuals are sorted by fitness, so that the best Individuals are at the top of the list. The old Individuals are added back to the bottom of the list. The method makeFamilies() is called to convert the sorted cohort into an array of ``families'' each containing four individuals: two parents and two unfit Individuals which will be replaced when the parents reproduce (see crossoverFamily()).

A number of log-files are created or appended every LogInterval tournaments. The most important is `results/tournament.log', where a summary of the run's progress through time is stored. Table 1 explains the format of this file. Additional columns can be specified by redefining extraLogInfo() for both the Algorithm and Individual objects. The output of the best-of-tournament Individual for both training and testing data are saved using saveOutput() (to `results/recent.training.output' and `results/recent.testing.output') and the Perl code is also written to a file (`results/recent.pl').


Table 1: Descriptions of columns in `results/tournament.log'
column description
1 unix time when log was updated
2 tournament number
3 BestFitness - best (training) fitness seen so far
4 (training) Fitness of best-of-tournament Individual
5 test set fitness for the same Individual as in column 4
6 result of best-of-tournament Individual::Age()
7 result of best-of-tournament Individual::getSize()

If the best-of-tournament Individual has better fitness than BestFitness (this usually means this is the best Individual seen so far), the files discussed above are always saved (as `results/best.*'), even if logging is not being done this tournament. BestFitness is updated and the counter TournamentsSinceBest is reset to zero.


calcFitnessFor()

Arguments: OBJECT individual, SCALAR inputdata
Return value (array context): NUMBER fitness, SCALAR outputdata
Return value (scalar context): NUMBER fitness
Defined in: TournamentGP.pm
Mainly called by: tournament()
Usually calls: Individual::evaluateOutput(), fitnessFunction()
Relevant attributes: WorstPossibleFitness, AlarmTime

This method just runs individual->evaluateOutput() on inputdata and then runs fitnessFunction() on the returned output. There are a few extra details; firstly, if AlarmTime is non-zero, a system alarm call is set for AlarmTime seconds. The call to evaluateOutput() is protected within an eval{ } block, so that run-time errors or alarm calls do not terminate the entire GP run, but just the evaluation within the block. The call to evaluateOutput() is also timed using the times() Perl function/system call, which gives precision to around 0.01s. The elapsed time is passed to fitnessFunction(). The result of individual->getSize() is also passed to fitnessFunction().


makeFamilies()

Arguments: ARRAYREF cohort, ARRAYREF families
Return value: void (but families is filled)
Defined in: TournamentGP.pm
Mainly called by: tournament()
Usually calls:
Relevant attributes: TournamentParents, MateChoiceRandom

The argument cohort is a reference to an array of Individuals (which are presumably sorted by fitness and age). This method generates ``families'' containing two potential parents and two Individuals which will be replaced by their offspring. First we remove the first TournamentParents elements from cohort and put them in an array called parents. Likewise we take the last TournamentParents elements from cohort and put them in an array called rip (from Rest In Peace). Then while these two arrays each contain at least two elements, we generate families as follows:

The array pointed to by families is filled with references to four-member arrays containing Parents 1&2 and Children 1&2. This routine does not check that families is empty.


crossoverFamily()

Arguments: ARRAYREF family
Return value: void
Defined in: TournamentGP.pm
Mainly called by: tournament()
Usually calls: Individual::crossover(), Individual::mutate()
Relevant attributes:

This method takes a single family (generated by makeFamilies()) and initiates the crossover mechanism (which is a method in the Individual class). The two offspring from the crossover overwrite the two Individuals which were selected to ``die'' and mutate() is called on each offspring.

However, crossover is not performed if the fitnesses of the two parents are numerically identical. If they are identical then the second parent is subjected to mutation, and the two Individuals that would have been replaced by offspring survive.


decideBetterFitness()

Arguments: NUMBER fitness1, NUMBER fitness2
Return value: BOOLEAN
Defined in: TournamentGP.pm
Mainly called by: tournament()
Usually calls:
Relevant attributes: FitnessDirection

A simple wrapper which returns true if fitness1 is greater than fitness2 if the FitnessDirection is `up'. If the direction is `down' then less-than is used.


stopCondition()

Arguments: none
Return value: BOOLEAN
Defined in: TournamentGP.pm
Mainly called by: run()
Usually calls:
Relevant attributes:

As supplied, this method always returns false, so a run will continue until Tournaments have been completed. If you override it to perfom some check, perhaps on BestFitness, then you can cleanly stop the run when some criteria have been met. Make sure you are not running perlgp-run.pl with the -loop option, or it will just start again!


extraLogInfo()

Arguments: none
Return value: STRING text
Defined in: TournamentGP.pm
Mainly called by: tournament()
Usually calls:
Relevant attributes:

If you need need more info about the progress of your Algorithm in `results/tournament.log', then override this method to return a string which will be printed out in the log. See parseExtraLogInfo() before you implement this function.


parseExtraLogInfo()

Arguments: STRING lastline
Return value: void
Defined in: Tournament.pm
Mainly called by: TournamentGP::_init()
Usually calls:
Relevant attributes:

If you have restartable runs and you have special variables/attributes that need to persist between restarts, then you can re-initialise them from the values last written to `results/tournament.log' by overriding this method. First make sure that the attributes are written to `results/tournament.log' by overriding extraLogInfo(), but make sure each value is preceded by a unique string. Then implement parseExtraLogInfo() to recover those values from the last line of the log file as in the example below:

sub extraLogInfo {
  my $self = shift;
  return sprintf "SpecialAttrib %3d", $self->SpecialAttrib();
}

sub parseExtraLogInfo {
  my ($self, $lastline) = @_;
  if ($lastline =~ /\s+SpecialAttrib\s+(\d+)/) {
    $self->SpecialAttrib($1);
  }
}

See Section 6.1.2 on the dual use of attribute names as method names if the use of SpecialAttrib() is not clear to you.


next up previous contents
Next: Attributes & Variables Up: Class: Algorithm Previous: Class: Algorithm   Contents
Bob MacCallum 2003-02-03