In the previous article, we looked at cash-and-carry arbs on Binance.

In this article, we’re going to look at an example strategy that uses alternative data: in this case, football matches played by soccer clubs that have (liquid) publicly listed stocks.

If you end up enjoying this article, consider getting the paid subscription. That way you can support me, and I can write even more articles!

Here is a discount code for those interested: https://www.vertoxquant.com/62375e34

## Table of Contents

Asset Universe

Alternative Data

A Simple Strategy

Analyzing the Data

Final Remarks

## Asset Universe

First of all, we need to decide which soccer clubs have a publicly listed stock and which of them are liquid enough.

From most to least liquid, they are:

Manchester United (MANU)

Juventus Turin (JUVE.MI)

Borussia Dortmund (BVB.DE)

Parken Sport (FC Copenhagen) (PARKEN.CO)

We can get daily data for those stocks from Yahoo Finance.

The data is available from August 10th, 2012 onward.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Daily prices from Yahoo Finance; keep only the open and close
MANU = pd.read_csv("MANU.csv").set_index('Date')[["Open", "Close"]]
JUVE = pd.read_csv("JUVE.MI.csv").set_index('Date')[["Open", "Close"]]
BVB = pd.read_csv("BVB.DE.csv").set_index('Date')[["Open", "Close"]]
PARKEN = pd.read_csv("PARKEN.CO.csv").set_index('Date')[["Open", "Close"]]
```

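```
df = pd.concat([MANU, JUVE, BVB, PARKEN], axis=1)
df.columns = ["MANU Open", "MANU Close", "JUVE Open", "JUVE Close", "BVB Open", "BVB Close", "PARKEN Open", "PARKEN Close"]
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
# Keep only days on which all four stocks traded (NYSE and the European exchanges have different holidays)
df.dropna(inplace=True)
```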

Here are the normalized closing prices:
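Normalizing here just means dividing each close column by its first value, so every series starts at 1. A minimal sketch, assuming the column naming of `df` above; the helper name `normalize_closes` and the toy prices are made up for illustration:

```python
import pandas as pd

def normalize_closes(df: pd.DataFrame) -> pd.DataFrame:
    """Divide every '... Close' column by its first value so all series start at 1."""
    closes = df[[c for c in df.columns if c.endswith("Close")]]
    return closes / closes.iloc[0]

# Toy data (made up), using the same column naming as df
toy = pd.DataFrame(
    {"MANU Open": [10.0, 11.0, 12.0], "MANU Close": [10.0, 12.0, 11.0],
     "BVB Open": [4.0, 4.1, 4.2], "BVB Close": [4.0, 5.0, 3.0]},
    index=pd.bdate_range("2023-01-02", periods=3),
)
normalized = normalize_closes(toy)
print(normalized)
```

With the real data, `normalize_closes(df).plot()` would draw the chart.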

## Alternative Data

Now it’s time for the key part of this strategy: the data.

We’re going to get our data from here:

https://www.uefa.com/uefachampionsleague/history/seasons/2012/matches/

We’ll have one file per team, containing the following information for each match:

Date

Other Team

Points of both teams
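For reference, here is a sketch of what one of these per-team files might look like and how it parses. The `MyPoints` and `OpponentPoints` column names are the ones used in the code below; the `Opponent` column name and all rows are hypothetical:

```python
import io
import pandas as pd

# Hypothetical contents of a file like "MANU Matches.csv" (made-up rows)
csv_text = """Date,Opponent,MyPoints,OpponentPoints
2023-09-20,Team A,2,1
2023-10-03,Team B,0,0
2023-10-24,Team C,1,3
"""

# parse_dates turns the Date column into timestamps before it becomes the index
matches = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"]).set_index("Date")
print(matches)
```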

I’ve collected this data and uploaded it to the Discord for paid subscribers.

Let’s start with just Manchester United for now:

```
MANU_Matches = pd.read_csv("MANU Matches.csv").set_index('Date')
MANU_Matches.index = pd.to_datetime(MANU_Matches.index)
```

Let’s plot a vertical line on our price chart whenever there was a game; maybe we’ll notice something interesting.

This is a little messy, so let’s plot wins and losses separately.

```
MANU_Wins = MANU_Matches[MANU_Matches['MyPoints'] > MANU_Matches['OpponentPoints']]
MANU_Losses = MANU_Matches[MANU_Matches['MyPoints'] < MANU_Matches['OpponentPoints']]
```
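A minimal matplotlib sketch of this kind of chart: the price as a line, a green vertical line for each win and a red one for each loss. The function name `plot_games` and the colors are my own choices, and the toy data is made up:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_games(close: pd.Series, wins: pd.DataFrame, losses: pd.DataFrame, title: str = ""):
    """Plot a closing-price series with a vertical line on every win (green) and loss (red)."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(close.index, close.values, color="black", linewidth=1)
    for date in wins.index:
        ax.axvline(date, color="green", alpha=0.5)
    for date in losses.index:
        ax.axvline(date, color="red", alpha=0.5)
    ax.set_title(title)
    return fig, ax

# Toy example (made-up prices and results)
close = pd.Series([10.0, 10.5, 11.0, 10.2, 10.8], index=pd.bdate_range("2023-01-02", periods=5))
wins = pd.DataFrame(index=pd.to_datetime(["2023-01-03"]))
losses = pd.DataFrame(index=pd.to_datetime(["2023-01-05"]))
fig, ax = plot_games(close, wins, losses, "Toy price with games")
```

With the article’s data, the call would be along the lines of `plot_games(df['MANU Close'], MANU_Wins, MANU_Losses, "MANU")`.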

We do notice something really interesting: it could of course be a coincidence, but there seem to be some big moves around those dates.

## A Simple Strategy

We buy the stock 2 days before a game and sell it at the end of game day.

The idea behind this is that fans of the respective soccer club will buy up the stock close to the game as they get hyped.
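Per game, the trade return is just the game-day close divided by the close two trading days earlier, minus one. A vectorized sketch, assuming a close-price series on a trading-day index like the columns of `df`; the helper name `game_trade_returns` and the toy data are hypothetical:

```python
import pandas as pd

def game_trade_returns(close: pd.Series, game_dates: pd.DatetimeIndex) -> pd.Series:
    """Return of buying at the close 2 trading days before each game and selling at the game-day close."""
    two_day_ret = close / close.shift(2) - 1           # close[t] / close[t-2] - 1
    common = game_dates.intersection(close.index)      # keep only games on trading days
    return two_day_ret.loc[common].dropna()            # drop games without 2 prior closes

# Toy example (made-up prices); Jan 7, 2023 is a Saturday, so that game is skipped
close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0, 110.0],
                  index=pd.bdate_range("2023-01-02", periods=6))
games = pd.to_datetime(["2023-01-03", "2023-01-05", "2023-01-07"])
print(game_trade_returns(close, games))
```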

Here is the equity on Manchester United:

Looks good, but we don’t have many data points, so here are the other teams:

### JUVE:

We would have 4x our money before fees with JUVE!

### BVB:

### PARKEN:

They didn’t play many games, but oh well, here is the equity:

Let’s combine them now:

```
MANU_Matches = pd.read_csv("MANU Matches.csv").set_index('Date')
JUVE_Matches = pd.read_csv("JUVE Matches.csv").set_index('Date')
BVB_Matches = pd.read_csv("BVB Matches.csv").set_index('Date')
PARKEN_Matches = pd.read_csv("PARKEN Matches.csv").set_index('Date')
MANU_Matches.index = pd.to_datetime(MANU_Matches.index)
JUVE_Matches.index = pd.to_datetime(JUVE_Matches.index)
BVB_Matches.index = pd.to_datetime(BVB_Matches.index)
PARKEN_Matches.index = pd.to_datetime(PARKEN_Matches.index)
```

And here is the combined backtest with 10bps of fees:
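A minimal sketch of how such a combined backtest can be put together, under my own assumptions: on each game day the capital is split equally among the teams that play, each position pays 10 bps on entry and again on exit (2 × fees per game day), and equity compounds from one game day to the next. The function name `combined_equity` and the toy data are hypothetical:

```python
import pandas as pd

def combined_equity(closes: dict, games: dict, fees: float = 0.001) -> pd.Series:
    """Equal-weight 'buy 2 days before, sell at the game-day close' backtest across teams.

    closes: team -> close-price Series, all on the same trading-day index
    games:  team -> DatetimeIndex of game dates
    """
    index = next(iter(closes.values())).index
    per_team = pd.DataFrame({
        team: (closes[team] / closes[team].shift(2) - 1).where(index.isin(games[team]))
        for team in closes
    })
    day_ret = per_team.mean(axis=1)   # mean over teams playing that day; NaN means no game
    day_ret = day_ret - 2 * fees      # entry + exit cost, only on game days (NaN stays NaN)
    return (1 + day_ret.fillna(0.0)).cumprod()

# Toy example (made-up prices and game dates)
idx = pd.bdate_range("2023-01-02", periods=5)
closes = {"A": pd.Series([100.0, 100.0, 110.0, 110.0, 121.0], index=idx),
          "B": pd.Series([50.0, 50.0, 50.0, 55.0, 55.0], index=idx)}
games = {"A": pd.to_datetime(["2023-01-04"]),
         "B": pd.to_datetime(["2023-01-04", "2023-01-05"])}
eq = combined_equity(closes, games)
print(eq)
```

With the article’s data, `closes` would map each team to its `df[... Close]` column and `games` to the index of its matches file.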

## Analyzing the Data

We know more than just the dates of the games: we also know the opponent and the points.

While we won’t use the opponent data, we can check whether winning, losing, or drawing has any impact on returns.

Once again, we’re going to run the analysis separately for each team:

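```
# Post-game return: close on game day t to close on day t+1, stored on day t
MANU_rets = df['MANU Close'].pct_change().shift(-1)
JUVE_rets = df['JUVE Close'].pct_change().shift(-1)
BVB_rets = df['BVB Close'].pct_change().shift(-1)
PARKEN_rets = df['PARKEN Close'].pct_change().shift(-1)
```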
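The two shift conventions are easy to mix up, so here is a quick check on a toy series of what each computes: `pct_change().shift(-1)` is the next-day (forward) return, while `pct_change(-1)` divides by the *next* close and comes out with roughly the opposite sign:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])   # toy closes for days t = 0, 1, 2

forward = close.pct_change().shift(-1)    # close[t+1] / close[t] - 1, aligned to day t
reversed_ = close.pct_change(-1)          # close[t] / close[t+1] - 1

print(pd.DataFrame({"forward": forward, "pct_change(-1)": reversed_}))
```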

```
MANU_Wins = MANU_Matches[MANU_Matches['MyPoints'] > MANU_Matches['OpponentPoints']]
MANU_Losses = MANU_Matches[MANU_Matches['MyPoints'] < MANU_Matches['OpponentPoints']]
MANU_Draws = MANU_Matches[MANU_Matches['MyPoints'] == MANU_Matches['OpponentPoints']]
JUVE_Wins = JUVE_Matches[JUVE_Matches['MyPoints'] > JUVE_Matches['OpponentPoints']]
JUVE_Losses = JUVE_Matches[JUVE_Matches['MyPoints'] < JUVE_Matches['OpponentPoints']]
JUVE_Draws = JUVE_Matches[JUVE_Matches['MyPoints'] == JUVE_Matches['OpponentPoints']]
BVB_Wins = BVB_Matches[BVB_Matches['MyPoints'] > BVB_Matches['OpponentPoints']]
BVB_Losses = BVB_Matches[BVB_Matches['MyPoints'] < BVB_Matches['OpponentPoints']]
BVB_Draws = BVB_Matches[BVB_Matches['MyPoints'] == BVB_Matches['OpponentPoints']]
PARKEN_Wins = PARKEN_Matches[PARKEN_Matches['MyPoints'] > PARKEN_Matches['OpponentPoints']]
PARKEN_Losses = PARKEN_Matches[PARKEN_Matches['MyPoints'] < PARKEN_Matches['OpponentPoints']]
PARKEN_Draws = PARKEN_Matches[PARKEN_Matches['MyPoints'] == PARKEN_Matches['OpponentPoints']]
```

```
print(f"MANU avg win return: {np.mean(MANU_rets.loc[MANU_Wins.index.intersection(MANU_rets.index)])}")
print(f"MANU avg loss return: {np.mean(MANU_rets.loc[MANU_Losses.index.intersection(MANU_rets.index)])}")
print(f"MANU avg draw return: {np.mean(MANU_rets.loc[MANU_Draws.index.intersection(MANU_rets.index)])}")
print()
print(f"JUVE avg win return: {np.mean(JUVE_rets.loc[JUVE_Wins.index.intersection(JUVE_rets.index)])}")
print(f"JUVE avg loss return: {np.mean(JUVE_rets.loc[JUVE_Losses.index.intersection(JUVE_rets.index)])}")
print(f"JUVE avg draw return: {np.mean(JUVE_rets.loc[JUVE_Draws.index.intersection(JUVE_rets.index)])}")
print()
print(f"BVB avg win return: {np.mean(BVB_rets.loc[BVB_Wins.index.intersection(BVB_rets.index)])}")
print(f"BVB avg loss return: {np.mean(BVB_rets.loc[BVB_Losses.index.intersection(BVB_rets.index)])}")
print(f"BVB avg draw return: {np.mean(BVB_rets.loc[BVB_Draws.index.intersection(BVB_rets.index)])}")
print()
print(f"PARKEN avg win return: {np.mean(PARKEN_rets.loc[PARKEN_Wins.index.intersection(PARKEN_rets.index)])}")
print(f"PARKEN avg loss return: {np.mean(PARKEN_rets.loc[PARKEN_Losses.index.intersection(PARKEN_rets.index)])}")
print(f"PARKEN avg draw return: {np.mean(PARKEN_rets.loc[PARKEN_Draws.index.intersection(PARKEN_rets.index)])}")
```

No clear effect there.
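With only a handful of games per outcome, a single big move can dominate an average, so it helps to look at sample counts next to the means. A compact sketch doing the same win/draw/loss split with `groupby`, assuming the match and return layouts above; the helper name `outcome_stats` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd

def outcome_stats(matches: pd.DataFrame, rets: pd.Series) -> pd.DataFrame:
    """Mean and count of the post-game return, grouped by win/draw/loss."""
    common = matches.index.intersection(rets.index)   # games that fall on trading days
    m = matches.loc[common]
    outcome = np.sign(m["MyPoints"] - m["OpponentPoints"]).map({1: "win", 0: "draw", -1: "loss"})
    return rets.loc[common].groupby(outcome).agg(["mean", "count"])

# Toy example (made-up returns and scores); Jan 7, 2023 is a Saturday
rets = pd.Series([0.0, 0.01, -0.02, 0.03, 0.05], index=pd.bdate_range("2023-01-02", periods=5))
matches = pd.DataFrame(
    {"MyPoints": [2, 1, 0, 3, 1], "OpponentPoints": [0, 1, 1, 1, 0]},
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06", "2023-01-07"]),
)
stats = outcome_stats(matches, rets)
print(stats)
```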

Let’s now see if there is some momentum in our returns.

If the return going into the game (previous close to game-day close) is positive, is the return after the game (game-day close to next close) also positive?

```
MANU_rets_before = df['MANU Close'].pct_change(1)
JUVE_rets_before = df['JUVE Close'].pct_change(1)
BVB_rets_before = df['BVB Close'].pct_change(1)
PARKEN_rets_before = df['PARKEN Close'].pct_change(1)
```

```
common_MANU = MANU_Matches.index.intersection(MANU_rets.index)
plt.scatter(MANU_rets_before.loc[common_MANU], MANU_rets.loc[common_MANU], alpha=0.4)
common_JUVE = JUVE_Matches.index.intersection(JUVE_rets.index)
plt.scatter(JUVE_rets_before.loc[common_JUVE], JUVE_rets.loc[common_JUVE], alpha=0.4)
common_BVB = BVB_Matches.index.intersection(BVB_rets.index)
plt.scatter(BVB_rets_before.loc[common_BVB], BVB_rets.loc[common_BVB], alpha=0.4)
common_PARKEN = PARKEN_Matches.index.intersection(PARKEN_rets.index)
plt.scatter(PARKEN_rets_before.loc[common_PARKEN], PARKEN_rets.loc[common_PARKEN], alpha=0.4)
total_before = np.concatenate([MANU_rets_before.loc[common_MANU], JUVE_rets_before.loc[common_JUVE], BVB_rets_before.loc[common_BVB], PARKEN_rets_before.loc[common_PARKEN]])
total_after = np.concatenate([MANU_rets.loc[common_MANU], JUVE_rets.loc[common_JUVE], BVB_rets.loc[common_BVB], PARKEN_rets.loc[common_PARKEN]])
slope, intercept = np.polyfit(total_before, total_after, 1)
plt.plot(total_before, slope * total_before + intercept)
plt.title("Return before game vs return after game")
```

Just from the scatterplot, it does look like there is an effect!
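A scatterplot with this few points can be misleading, so it’s worth putting numbers on it: the correlation between the before and after returns, and the fraction of games where both have the same sign. A minimal NumPy sketch; the function name `momentum_stats` and the toy inputs are mine:

```python
import numpy as np

def momentum_stats(before, after):
    """Correlation and same-sign hit rate between pre-game and post-game returns."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    ok = ~(np.isnan(before) | np.isnan(after))    # drop pairs with a missing return
    b, a = before[ok], after[ob] if False else after[ok]
    corr = np.corrcoef(b, a)[0, 1]
    same_sign = np.mean(np.sign(b) == np.sign(a))
    return corr, same_sign

# Toy example (made up): after = 2 * before, plus one pair with a missing value
corr, hit = momentum_stats([0.01, -0.02, 0.03, np.nan], [0.02, -0.04, 0.06, 0.1])
print(corr, hit)
```

With the article’s data, the inputs would be `total_before` and `total_after`.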

We can try to improve our strategy by going short after the game whenever the game-day return was negative.

I wouldn’t stay long after a positive return, though: most fans are likely to dump right after the game, so you would probably lose money. That said, even if the idea is valid, our returns will likely suffer.

Switching from a full long to a full short and then closing the position will double our fees.
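Counting units of notional traded makes the doubling explicit: a plain round trip trades the position twice, while entering long, flipping to short, and covering trades it four times. A tiny sketch (the function name `position_cost` is mine):

```python
def position_cost(fee_rate: float, flip: bool) -> float:
    """Fees paid per game, as a fraction of the position size."""
    if flip:
        # buy 1 (enter long), sell 2 (close long + open short), buy 1 (cover short)
        units_traded = 4
    else:
        # buy 1 (enter long), sell 1 (exit)
        units_traded = 2
    return units_traded * fee_rate

print(position_cost(0.001, flip=False), position_cost(0.001, flip=True))
```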

```
fees = 0.001  # 10 bps per unit of notional traded
teams = {"MANU": MANU_Matches, "JUVE": JUVE_Matches, "BVB": BVB_Matches, "PARKEN": PARKEN_Matches}
equity = [1]
for i in range(len(df)-1):
    team_rets = []
    for name, matches in teams.items():
        # Only game days; skip the first two rows, where iloc[i-2] would wrap around
        if i < 2 or df.index[i] not in matches.index:
            continue
        close = df[f'{name} Close']
        # Long from the close 2 days before the game until the game-day close
        ret = (close.iloc[i] - close.iloc[i-2]) / close.iloc[i-2]
        # If the game-day return was negative, flip short at the close and cover at the next close
        if (close.iloc[i] - close.iloc[i-1]) / close.iloc[i-1] < 0:
            ret -= (close.iloc[i+1] - close.iloc[i]) / close.iloc[i]
        team_rets.append(ret)
    if team_rets:
        # Equal weight across today's teams. The flip cost (4x fees) is charged on every
        # game day, even when no position is flipped, so costs are overstated on those days.
        equity.append(equity[-1] * (1 + np.mean(team_rets) - 4*fees))
    else:
        equity.append(equity[-1])
```

Just as we thought, fees kill any potential extra profits and then some.

Next, let’s see whether the goal difference, MyPoints - OpponentPoints, has any predictive power.

The intuition is that if we win by a large margin, fans are happy and buy.

If we lose by a large margin, fans are sad and sell.

```
MANU_Matches['diff'] = MANU_Matches['MyPoints']-MANU_Matches['OpponentPoints']
JUVE_Matches['diff'] = JUVE_Matches['MyPoints']-JUVE_Matches['OpponentPoints']
BVB_Matches['diff'] = BVB_Matches['MyPoints']-BVB_Matches['OpponentPoints']
PARKEN_Matches['diff'] = PARKEN_Matches['MyPoints']-PARKEN_Matches['OpponentPoints']
```

```
def mean_ret_by_diff(matches, rets, name):
    # Average post-game return for each goal difference; groupby keeps every mean
    # attached to its own goal difference and sorts the differences
    common = matches.index.intersection(rets.index)
    return rets.loc[common].groupby(matches.loc[common, 'diff']).mean().rename(name)

results_MANU = mean_ret_by_diff(MANU_Matches, MANU_rets, 'MANU')
results_JUVE = mean_ret_by_diff(JUVE_Matches, JUVE_rets, 'JUVE')
results_BVB = mean_ret_by_diff(BVB_Matches, BVB_rets, 'BVB')
results_PARKEN = mean_ret_by_diff(PARKEN_Matches, PARKEN_rets, 'PARKEN')
results = pd.concat([results_MANU, results_JUVE, results_BVB, results_PARKEN], axis=1).sort_index()
# Average across teams, ignoring teams that never had a given goal difference
results = results.mean(axis=1, skipna=True)
```

There does seem to be an uptrend!
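Averaging by goal difference gives every difference equal weight, even one that happened only once. A sketch of a complementary check, fitting a line through the individual games instead of the bucket means; the function name `diff_slope` and the toy numbers are hypothetical, and with the article’s data the inputs would be each game’s `diff` and its post-game return:

```python
import numpy as np

def diff_slope(diffs, rets):
    """Least-squares slope and intercept of post-game return vs goal difference, one point per game."""
    diffs = np.asarray(diffs, dtype=float)
    rets = np.asarray(rets, dtype=float)
    ok = ~(np.isnan(diffs) | np.isnan(rets))      # drop games without a return
    slope, intercept = np.polyfit(diffs[ok], rets[ok], 1)
    return slope, intercept

# Toy data (made up) lying exactly on ret = 0.01 * diff + 0.001
slope, intercept = diff_slope([-2, -1, 0, 1, 2], [-0.019, -0.009, 0.001, 0.011, 0.021])
print(slope, intercept)
```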

We can use this to improve our strategy without incurring higher fees: we simply hold for one more day if we win.
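Per game, the rule boils down to picking the exit price based on the result. A vectorized sketch mirroring the per-team logic of the loop below, assuming the same close-price and match layouts; the helper name `win_extended_returns` and the toy data are hypothetical:

```python
import pandas as pd

def win_extended_returns(close: pd.Series, matches: pd.DataFrame) -> pd.Series:
    """Buy at the close 2 trading days before each game; sell at the next day's close
    after a win, otherwise at the game-day close (the exit depends on the match result)."""
    entry = close.shift(2)          # close 2 trading days before day t
    next_close = close.shift(-1)    # close on the trading day after day t
    common = matches.index.intersection(close.index)
    won = matches.loc[common, "MyPoints"] > matches.loc[common, "OpponentPoints"]
    exit_px = next_close.loc[common].where(won, close.loc[common])
    return (exit_px / entry.loc[common] - 1).dropna()

# Toy example (made-up prices and scores)
close = pd.Series([100.0, 80.0, 110.0, 120.0, 90.0], index=pd.bdate_range("2023-01-02", periods=5))
matches = pd.DataFrame({"MyPoints": [2, 0], "OpponentPoints": [1, 1]},
                       index=pd.to_datetime(["2023-01-04", "2023-01-05"]))
print(win_extended_returns(close, matches))
```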

```
fees = 0.001  # 10 bps per unit of notional traded
teams = {"MANU": MANU_Matches, "JUVE": JUVE_Matches, "BVB": BVB_Matches, "PARKEN": PARKEN_Matches}
equity = [1]
for i in range(len(df)-1):
    team_rets = []
    for name, matches in teams.items():
        # Only game days; skip the first two rows, where iloc[i-2] would wrap around
        if i < 2 or df.index[i] not in matches.index:
            continue
        close = df[f'{name} Close']
        game = matches.loc[df.index[i]]
        # After a win, hold until the next day's close; otherwise exit at the game-day close.
        # NOTE: Champions League games usually kick off in the evening, after the European
        # exchanges close, so the result is typically not known yet at the game-day close.
        exit_i = i + 1 if game['MyPoints'] > game['OpponentPoints'] else i
        team_rets.append((close.iloc[exit_i] - close.iloc[i-2]) / close.iloc[i-2])
    if team_rets:
        # Equal weight across today's teams, one round trip (entry + exit) per position
        equity.append(equity[-1] * (1 + np.mean(team_rets) - 2*fees))
    else:
        equity.append(equity[-1])
```

## Final Remarks

The fun thing about alternative data is that you can get really, really creative.

If we know that our team is strong and our opponent is weak, does scoring many goals still yield as much profit as dominating a strong opponent?

Which opponents are strong and which aren’t?

Can we use sentiment data with this strategy (Twitter etc.)?

Can we do some sort of statistical arbitrage by also betting on the outcome with a bookmaker?

etc. etc.

Discord: Available to paid subscribers (Discount: https://www.vertoxquant.com/62375e34)