In part 1 of this project we created a data structure to represent formulaic alphas, along with a set of functions to manipulate them and extract information from them.
Besides that, we also came up with a way to measure how well our alphas perform and generated a batch of random alphas to start the evolution from.
In this part we will take those random alphas and run a genetic algorithm on them, hopefully evolving them into better alphas that are good at predicting returns.
In the next part we will look at how well the genetic algorithm works, whether we need to add any improvements, and what conclusions we can draw from the generated alphas.
Table of Contents
Better Fitness
Crossover
Mutations
Genetic Algorithm
Final Remarks
Better Fitness
First we are going to improve the simple fitness function we came up with in part 1.
Instead of having a penalty term for the depth of the alpha, we now allow a min_depth and a max_depth. If the depth of the alpha falls outside that range, we set the fitness to 0.
We also set the fitness to 0 if the alpha contains any nans (ignoring the nans at the beginning of the data that come from calculating on a rolling window, and ignoring the final nan that appears because there is no return for the last data point).
The fitness function now becomes:
import numpy as np

# Target: the next period's return (nan for the last row).
next_return = data["close"].pct_change(1).shift(-1)

def fitness(tree, min_depth, max_depth):
    # Reject alphas that are too simple or too complex.
    depth = tree_depth(tree, 0)
    if depth < min_depth or depth > max_depth:
        return 0
    # Evaluate the alpha over the whole dataset.
    x = evaluate_index(tree, 0)
    # Absolute correlation with the next return, skipping the rolling-window
    # warm-up (the first 40320 rows) and the final row, whose return is nan.
    fitness = abs(np.corrcoef(x[40320:-1], next_return[40320:-1])[0][1])
    # Any remaining nan (e.g. from a constant alpha) means fitness 0.
    if np.isnan(fitness):
        return 0
    return fitness
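To see why the nan check at the end is needed, here is a small sketch of the correlation step on synthetic arrays. The arrays `x` and `next_return` below are hypothetical stand-ins for the evaluated alpha and the shifted return series, not data from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for an evaluated alpha and the next-period returns.
next_return = rng.normal(size=1000)
x = 0.3 * next_return + rng.normal(size=1000)  # weakly predictive alpha

# Absolute Pearson correlation, as in the fitness function above.
fit = abs(np.corrcoef(x, next_return)[0][1])
assert 0 < fit < 1

# A constant alpha has zero variance, so the correlation is nan,
# which is why the fitness function forces those cases to 0.
const = np.ones(1000)
print(np.isnan(np.corrcoef(const, next_return)[0][1]))  # True
```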
This max_depth term will prevent bloating of the alphas. Before this addition I saw a lot of alphas of the form [sign, sign, sign, sign, sign, sign, …], for example. It will also help prevent overfitting to a degree, by not considering alphas that are too complex.
By setting a min_depth we also eliminate alphas that are too simple. I will usually set min_depth to 2 because I don't want alphas that consist of just a single piece of data like [return]; I want actual formulaic alphas. So the simplest alphas I can get look like [sign, return].
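To make the depth values concrete, here is a minimal sketch of depth computation, assuming (as the examples like [sign, return] suggest) that an alpha is a flat list in prefix order and each operator has a fixed arity. The ARITY table and helper names are illustrative, not the exact part 1 definitions:

```python
# Illustrative arities: "sign" is a unary operator, "return" is a leaf.
ARITY = {"sign": 1, "return": 0}

def subtree_size(tree, i):
    """Number of list slots occupied by the subtree starting at index i."""
    size, j = 1, i + 1
    for _ in range(ARITY[tree[i]]):
        s = subtree_size(tree, j)
        size += s
        j += s
    return size

def tree_depth(tree, i):
    """Depth of the subtree starting at index i (a leaf has depth 1)."""
    if ARITY[tree[i]] == 0:
        return 1
    depth, j = 0, i + 1
    for _ in range(ARITY[tree[i]]):  # take the deepest child subtree
        depth = max(depth, tree_depth(tree, j))
        j += subtree_size(tree, j)
    return depth + 1

print(tree_depth(["return"], 0))          # 1: rejected with min_depth=2
print(tree_depth(["sign", "return"], 0))  # 2: the simplest accepted alpha
```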