Neat4J
Matt Simmerson

Introduction

The goal of this controller was to find a solution with no domain knowledge, i.e. it was given no information about how to drive a car, and certainly no information about racing fast around a track. The underlying algorithm is based on NEAT (NeuroEvolution of Augmenting Topologies), specifically an updated version of my implementation of it, Neat4J (http://neat4j.sourceforge.net). NEAT essentially allows the topology of a neural network to be evolved through the use of genetic algorithms. The idea is to start with a simple topology and let the genetic algorithm complexify it, finding new and more complex solutions whilst attempting to solve the underlying domain problem. The genetic algorithm underlying NEAT mandates speciation as a means of fitness sharing, which prevents any one species from completely dominating the evolutionary cycle.

Neat4J is my Java implementation of the NEAT algorithm, based on the original papers by Kenneth Stanley and Risto Miikkulainen. It differs from their original implementation in a couple of ways. The first is that Neat4J can start in one of two modes: feature search or fully connected. With feature search, the initial topologies have each output connected to one of the inputs at random. For example, here there were 29 input nodes and 3 output nodes, meaning that initially 3 distinct inputs were each connected to one of the 3 output nodes. The effective topology is therefore 3 inputs and 3 outputs with 3 links. In a fully connected initial topology, every input node is simply connected to every output node.

The second difference is that Neat4J implements a concept of feature nodes. These nodes are part of the genome but play no part in the network topology. They are useful for evolving values that are pertinent to the domain, in this instance the minimum and maximum engine revs, per gear, that indicate a gear change up or down; they are still part of the same GA and evolve at exactly the same time as the rest of the genome.
Both of the above differences are available in the currently released version of Neat4J. The new version (yet to be released) differs from the original in several ways. First, it has been rewritten to take advantage of Java 5 features such as generics. Second, there has been extensive work on creating a framework that integrates as easily as possible with other software. For example, for this competition I wrote 2 small classes to interface with the given Race Client software and to provide a fitness function. There is also support for custom-written environments within Neat4J, e.g. evolution of ant-like behaviour. All the configuration files and saved phenotypes are now stored in XML rather than in proprietary property files and Java serialised files.
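
To make the two start-up modes described above concrete, here is a minimal Python sketch of how the initial links might be generated. It is not Neat4J code (Neat4J is written in Java); the function name, the mode strings and the representation of a link as an (input, output) index pair are all assumptions made for illustration.

```python
import random


def initial_links(n_inputs, n_outputs, mode, rng):
    """Return the starting (input_index, output_index) links for one genome.

    Hypothetical helper for illustration only; not part of Neat4J.
    """
    if mode == "fully_connected":
        # Every input node is connected to every output node.
        return [(i, o) for i in range(n_inputs) for o in range(n_outputs)]
    if mode == "feature_search":
        # Each output gets one randomly chosen input; sampling without
        # replacement keeps the chosen inputs distinct, as in the
        # 29-input, 3-output example in the text.
        chosen = rng.sample(range(n_inputs), n_outputs)
        return list(zip(chosen, range(n_outputs)))
    raise ValueError("unknown mode: " + mode)


rng = random.Random(0)
sparse = initial_links(29, 3, "feature_search", rng)
dense = initial_links(29, 3, "fully_connected", rng)
print(len(sparse), len(dense))
```

With 29 inputs and 3 outputs, feature search starts from 3 links whereas the fully connected mode starts from 87, which shows how much smaller the initial search space is in the first case.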

Inputs

There were 29 input and 3 output nodes available for evolution. As stated previously, initially there were 3 randomly selected inputs connected to the 3 outputs. The 29 inputs available were, in order:
• Current Speed
• Angle to Track Axis
• The 19 track sensors
• Track Position (with respect to Left and Right edges)
• Current Gear Selection
• The 4 wheel spin sensors
• Current RPM
• Lateral speed across the track

The inputs were scaled to lie in the range [-1,1] or [0,1], depending on whether the input is signed. This prevents large values such as speed (up to 360 km/h) from completely swamping the other input signals. To prevent over-reaction by the car, the track sensors were smoothed over a history of the last 10 values for that sensor. However, changing the history length up or down did not appear to make any great difference to training.

Outputs

The outputs, in order, were:
• Steering
• Change Gear Indicator
• Power (equivalent of the accelerator/brake pedals)

The steering value from the Sigmoid node was modified to give an output in [-1,1]. The change gear indicator, in [0,1], attempted to change up if the output was <= 0.5 and down if it was > 0.5. The selected gears were limited to 1 to 6 for forward gears and -1 for reverse. All output nodes used the Sigmoid activation function, and any evolved hidden neurons used Tanh. The input nodes were linear.

Training

The car was trained on just one track, G3, which was not one of those selected for the initial evaluation. This track was chosen because it has a variety of turns (left, right, curves) and straights of varying length leading into the various corners. Ideally, for the sake of generalisation, I would have liked to train the cars over several tracks; however, the current version of the Torcs environment prevented this. The mutator and crossover functions were exactly as described by the NEAT algorithm. The parent selector function was a tournament round in which the allowed parents were pitted against each other, with the winner (i.e. the fittest) taking the spoils. Recurrency was allowed in this experiment.
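
The input and output handling described in the Inputs and Outputs sections can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Neat4J client code: the smoothing is assumed to be a plain mean over the history window, the steering rescaling is assumed to be 2y - 1, the speed scaling is assumed to be division by 360 km/h with clamping, and all names are hypothetical. How reverse gear (-1) is selected is not stated in the text, so the sketch only handles forward gears 1 to 6.

```python
from collections import deque


class SmoothedSensor:
    """Smooths one track sensor over its most recent readings (assumed: mean)."""

    def __init__(self, history=10):
        self.values = deque(maxlen=history)  # old readings drop off automatically

    def update(self, reading):
        self.values.append(reading)
        return sum(self.values) / len(self.values)


def scale_speed(speed_kmh, max_speed_kmh=360.0):
    """Scale speed into [-1, 1] so it cannot swamp the other inputs (assumed scaling)."""
    return max(-1.0, min(1.0, speed_kmh / max_speed_kmh))


def steering_from_sigmoid(y):
    """Map a Sigmoid output in [0, 1] to a steering value in [-1, 1] (assumed mapping)."""
    return 2.0 * y - 1.0


def next_gear(current_gear, indicator):
    """Change up if the indicator is <= 0.5, down if it is > 0.5, within gears 1 to 6."""
    step = 1 if indicator <= 0.5 else -1
    return max(1, min(6, current_gear + step))


sensor = SmoothedSensor(history=10)
for reading in (0.2, 0.4, 0.9):
    smoothed = sensor.update(reading)
print(smoothed, steering_from_sigmoid(0.75), next_gear(3, 0.4))
```

Note that, with the thresholds given in the text, an indicator that stays at or below 0.5 walks the gearbox straight up to 6th and holds it there.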

The car was tested on the track for a maximum of 4000 steps, equivalent to around 80 seconds. If the car sustained more than 100 points of damage (out of a maximum of 10000), the evaluation for that car was aborted. The overall fitness of the car was calculated thus:

fitness = (2 * distance) - damage - outside + max speed + Constant

• The distance was the value reported by the Torcs engine and could be positive or negative.
• Damage was the value reported by the Torcs engine and has a maximum value of 10000.
• Outside is a measure of how far the car strayed off the track. This was necessary to stop the car using the barriers as a guide when doing so incurred no damage penalty. Until this term was added, no really successful controllers evolved. The edges of the track were represented by -1 (right) and +1 (left). The outside value was calculated as 0 if the car was within these limits, and |track position| - 1 for positions outside them.
• Max speed was tracked throughout the car's trial, based on the speeds reported by the Torcs engine. This was to reward fast cars early on, even if they crashed, as the name of the game is speed.
• The last value, Constant, was used to ensure the fitness value was always positive and was set to 10000. Negative values arose when cars went the wrong way round the track, and from early high-speed crashes resulting in large damage.
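
The fitness calculation above can be written out as a short Python sketch. The formula and the per-position outside rule come from the text; how the outside value is combined over a trial is not stated, so summing it over the recorded steps is an assumption, as are the function names and the idea of passing the trial in as lists of track positions and speeds.

```python
def outside_penalty(track_position):
    """0 while the car is within the track edges (-1 to +1), else |position| - 1."""
    return max(0.0, abs(track_position) - 1.0)


def fitness(distance, damage, outside, max_speed, constant=10000.0):
    """fitness = (2 * distance) - damage - outside + max speed + Constant."""
    return 2.0 * distance - damage - outside + max_speed + constant


def trial_fitness(track_positions, speeds, distance, damage):
    """Fitness for one trial, given per-step records and the final totals.

    Assumption: the outside term is the sum of the per-step penalties.
    """
    outside = sum(outside_penalty(p) for p in track_positions)
    max_speed = max(speeds, default=0.0)
    return fitness(distance, damage, outside, max_speed)


# Three recorded steps: on track, then off the left edge, then off the right edge.
print(trial_fitness([0.0, 1.5, -2.0], [50.0, 120.0, 90.0], distance=100.0, damage=50.0))
```

For the three steps shown, the outside term is 0 + 0.5 + 1.0 = 1.5 and the maximum speed is 120, so the result is 200 - 50 - 1.5 + 120 + 10000 = 10268.5.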

Whilst the 4000 time steps used for evaluation represented 80 seconds of real time, the Torcs engine offered a non-GUI mode in which an evaluation ran in around 3-4 seconds. The population was set to 100, with 3 species, meaning each evolutionary epoch lasted around 400 seconds (100 evaluations at up to 4 seconds each). The Neat4J entry was the winning phenotype from the 170th generation, i.e. after nearly 19 hours (170 epochs of around 400 seconds) on my dual core laptop.

Results

After training, the car was tested on the initial 3 evaluation tracks given by the organisers. The general behaviour of the car was excellent, showing advanced driving behaviours such as skid correction at high speed. The gear selection strategy that evolved was somewhat simplistic, i.e. change up to 6th (top gear) as quickly as possible and then keep it there (reminiscent of some human drivers I know!). The only real disappointment as far as the racing car was concerned was that it did not really take any racing lines. However, remember that the controller had no knowledge of what its domain represented. I did try other approaches, including using the feature genes in the manner described above. Whilst that controller performed better than the competition entry, by some 2 km over the 3 evaluation courses, its driving was much more ragged as it used some extreme gear changes (6th to 1st at 250 km/h), and I don't believe it would have performed so well in the final evaluation.

Network

The evolved network is shown below:

[Figure: the evolved network. Node labels shown: outputs 1, 2, 3; hidden nodes 497, 481, 366, 140, 232, 265; inputs 2, 8, 11, 17, 18, 21, 22, 24, 25.]

Note that not all the inputs are used; in fact fewer than half are. The input and output nodes correspond to inputs 1-29 and outputs 1-3. It is also interesting to note that only 5 of the available track sensors are used, and only 2 of the wheel spin sensors. For such a complex domain, the evolved network is very simple.

Finally, I'd like to thank the organisers of this competition; it was well run and lots of fun.