A field guide to genetic programming by Torben Buch

A field guide to genetic programming

3.5 Step 5: Termination and solution designation

approximation, GP runtime can be estimated by the product of: the number of runs R, the number of generations G, the size of the population P , the average size of the programs s and the number of fitness cases F . Typically, the number of generations is limited to between ten and fifty; the most productive search is usually performed in those early generations, and if a solution hasnâ&#x20AC;&#x2122;t been found then, itâ&#x20AC;&#x2122;s unlikely to be found in a reasonable amount of time. The folk wisdom on population size is to make it as large as possible, but there are those who suggest using many runs with much smaller populations instead. Some implementations do not require arbitrary limits of tree size. Even so, because of bloat (the uncontrolled growth of program sizes during GP runs; see Section 11.3), it is common to impose either a size or a depth limit or both (see Section 11.3.2). Sometimes the number of fitness cases is limited by the amount of training data5 available. In this case, the fitness function should use all of it. (One does not necessarily need to use verification or holdout data, since over-fitting can be avoided by other means, as discussed in Section 13.12, page 140.) In other cases, e.g. 22-bit even parity, there can almost be too much training data. Then the fitness function may be reduced to use just a subset of the training data. This does not necessarily have to be done manually as there are a number of algorithms that dynamically change the test set as the GP runs. (These and other speedup techniques will be discussed in Chapter 13, particularly Section 10.1, page 83.) It is common to record these details in a tableau, such as Table 4.1 on page 31.

3.5

Step 5: Termination and solution designation

The fifth preparatory step consists of specifying the termination criterion and the method of designating the result of the run. The termination criterion may include a maximum number of generations to be run as well as a problem-specific success predicate. Typically, the single best-so-far individual is then harvested and designated as the result of the run, although one might wish to return additional individuals and data as necessary or appropriate for the problem domain.

typically rely more on mutation than crossover for their primary search mechanism. 5 Training data refers to the test cases used to evaluate the fitness of the evolved individuals.