An introduction to Kelly's Alpha

How to generate alpha from probabilistic thinking.

Apr 16, 2021

No matter the type of investor you are, you cannot deny that investing is a craft that involves dealing with luck, randomness, unknowns— whatever you want to call it. We all know this instinctually, but it enters our work in different ways depending on our specialty. For fundamental L/S guys, this could be a portfolio company launching a new product, with no guarantee it will be successful. For vol-arb quants, it could be the random walk underlying their option models.

But in whatever way you interact with the markets, there is no denying that dealing with probabilities well can make us better investors. So today I’m going to (in simple terms) lay out why we should be focused on this part of portfolio management and how to do it well.

Ergodic averages

There is nothing a gambler loves more than finding a game rigged in their favor. I remember doing calculations on the expected value (EV) of a lottery ticket a couple of years ago when the jackpot hit $1.5B. I thought the numbers worked out pretty well and ended up talking to my boss about it only to find out he had done the exact same math. But as the lottery shows, even with a game rigged in our favor, it isn’t always easy to make money.

Let me give you another example. Let’s say you’re playing poker at a cash table with a bunch of well-off amateurs. You know that as long as you keep playing you can make thousands of dollars an hour, but you’ve lost your wallet and all you have is what’s on the table. If you’re dealt pocket aces (in non-poker speak, a really, really good hand) and the player across from you, who calls everything, decides to raise you, are you going to go all in?

Now sure, there are some material details I’m ignoring to avoid the math, but generally it’s a bad idea. Even if you have a very high chance of winning, if you lose, you also lose the ability to keep playing, and that is often worth more than anything you can win on a single bet. Going bust for the night carries an opportunity cost that is both (a) very hard to quantify and (b) very, very expensive.

This has real-life implications. If you run an investment management company and lose all of your clients’ money, the biggest problem you’re going to face is that you’ll never get another job in finance (unless you’re John Meriwether… or Nicholas Maounis). If you’re running a personal account, investing in risky OTM calls and puts is generally a bad strategy, even if the trades have positive EV. If you lose all your money, you don’t just lose the money; you also lose the growth that money could have earned in some other investment while you work to save up a bankroll again.

[Image: WSJ headline, “Meriwether Is Shutting Hedge Fund, Sans Drama.” Caption: John Meriwether may be a brilliant guy, but the fact that people were willing to give him money **after** LTCM is mind-blowing to me. Source: WSJ]

Now I sure hope at this point you’re thinking to yourself, “well duh, obviously blowing up is bad.” But the possibility of blowing up changes how we should talk about probability. Some tail risks are measured in a different numeraire, or have impacts that are harder to quantify. Russian roulette is an example of this. Rational people won’t play for any amount of money, because the downside is so much worse than the upside, even though that downside isn’t counted in dollars.

And even when the downside is counted in dollars, it can complicate the meaning of standard descriptive statistics. Let’s say 100 people walk into a casino at the same time with a new card-counting strategy to beat blackjack. The strategy makes a 50% return on 95% of days, but on the other 5% it loses all your money. Classical statistics will tell us that this has an expected return (EV) of +42.5% per day (95% × 50% - 5% × 100%).

But instead of 100 people all playing the game simultaneously, let’s assume one person plays this strategy by herself every day. If she does this every day for a year, she will almost certainly go broke. She may become a billionaire along the way, but eventually she will lose everything. This is the problem ergodic theory deals with: whether the average across many players at once (the ensemble average) matches the average a single player experiences over time (the time average). Here they don’t match. While the EV of this strategy is +42.5%, the ergodic average (or time average) is -100%.
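To make the gap between the two averages concrete, here is a minimal Python sketch of the blackjack example. The parameters (a 95% chance of +50%, otherwise -100%) come from the text; the function names and the 1,000-player, 365-day simulation are illustrative choices. The ensemble EV works out to 0.95 × 50% - 0.05 × 100% = +42.5%, while the chance that one player survives a full year of daily bets is 0.95^365, roughly 7 in a billion.

```python
import random

P_WIN = 0.95  # chance of a +50% day (from the example)
GAIN = 0.50   # return on a winning day; a losing day returns -100%


def ensemble_ev(p_win: float = P_WIN, gain: float = GAIN) -> float:
    """Average one-day return across many players betting at the same time."""
    return p_win * gain + (1 - p_win) * -1.0


def survival_probability(days: int, p_win: float = P_WIN) -> float:
    """Chance that one player avoids every losing day over `days` plays."""
    return p_win ** days


def final_wealth(days: int, seed: int, p_win: float = P_WIN, gain: float = GAIN) -> float:
    """Simulate one player who bets her whole bankroll every day."""
    rng = random.Random(seed)
    wealth = 1.0
    for _ in range(days):
        if rng.random() < p_win:
            wealth *= 1 + gain
        else:
            return 0.0  # busted: the bankroll is gone and she can't keep playing
    return wealth


if __name__ == "__main__":
    print(f"Ensemble EV per day: {ensemble_ev():+.1%}")
    print(f"P(one player survives 365 days): {survival_probability(365):.1e}")
    broke = sum(final_wealth(365, seed) == 0.0 for seed in range(1000))
    print(f"Players broke after a year: {broke} of 1000")
```

The ensemble number looks great; the single-player path almost never does.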

Kelly

Dealing with repeating games is pretty straightforward, and the problem was solved a long time ago. All you have to do is not bet your entire net worth on each wager. But this solution adds a new dimension to gambling: proper bet sizing.

So let’s say we come up with another game: the odds are even (+100% if we win, -100% if we lose), but we have a 60% chance of winning. What is the right amount of our wealth to bet on this game?

Well, we know right off the bat that it isn’t 100%. While betting 100% of your net worth theoretically has a 20% EV (60% × 100% + 40% × -100%), you will only be able to play the game a couple of times before you go broke. So how does 50% do? Not much better. While you won’t go broke for a long, long time, your wealth compounds at an average rate of about -3.3% per play (1.5^0.6 × 0.5^0.4 ≈ 0.967). You’re actually LOSING money, even though the EV is theoretically +10%.

So what is the optimal amount of risk to take over time? Let’s start by plotting a variety of bet weightings on a chart and compare both their statistical EV, as well as their ergodic average (that is, their average rate of compounding over time).

Note that the ergodic average here is equivalent to the theoretical rate of wealth compounding over time. In other words, your realized CAGR over thousands of iterations should converge to about -3.3% per period for a 50% fractional bet.

So while it is suboptimal to bet more than 25% of one’s wealth on this strategy, it is also suboptimal to bet too little. If you wager only 3%, you aren’t buying any extra protection; you’re only locking in a lower rate of return on your wagers.
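The numbers behind that chart are easy to regenerate. Below is a minimal Python sketch, assuming the even-odds game above with a 60% win probability; the list of bet sizes is an illustrative choice. EV is the simple arithmetic average per play, and growth is the true compound rate per play, exp(p·ln(1+f) + (1-p)·ln(1-f)) - 1, which is what wealth converges to over many repetitions. For a 50% bet it comes to roughly -3.3% per play.

```python
import math

P_WIN = 0.60  # win probability in the even-odds game above (+100% / -100%)


def expected_return(f: float, p: float = P_WIN) -> float:
    """Arithmetic EV per play when betting a fraction f of wealth."""
    return p * f - (1 - p) * f


def growth_rate(f: float, p: float = P_WIN) -> float:
    """Time-average compound growth per play when betting a fraction f of wealth."""
    if f >= 1.0:
        return -1.0  # the first loss wipes out the bankroll, so long-run growth is -100%
    log_growth = p * math.log(1 + f) + (1 - p) * math.log(1 - f)
    return math.exp(log_growth) - 1


if __name__ == "__main__":
    print(f"{'bet':>6} {'EV':>7} {'growth':>8}")
    for f in (0.03, 0.0625, 0.125, 0.20, 0.25, 0.50, 1.00):
        print(f"{f:>6.1%} {expected_return(f):>7.1%} {growth_rate(f):>8.2%}")
```

EV rises in a straight line with bet size, while growth peaks at the 20% bet and turns negative well before the 50% bet.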

The actual number lies somewhere between 12.5% and 25%, and we can calculate it precisely using a formula called the Kelly Criterion, which is calculated as follows¹:

Kelly fraction = Prob / Loss - (1 - Prob) / Win

Where Prob is the probability of a win, Loss is the loss on a losing bet in positive percent terms, and Win is the gain on a winning bet in positive percent terms. For the game above, the allocation with the highest long-run rate of compounding is calculated as

( 60% / 100% ) - (40% / 100%) = 20%
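The formula is short enough to check directly. A minimal Python sketch; `kelly_fraction`, `log_growth`, and `grid_search` are hypothetical helper names, and the brute-force scan is only a sanity check that the formula lands on the fraction that maximizes expected log growth.

```python
import math


def kelly_fraction(prob: float, win: float, loss: float) -> float:
    """Prob/Loss - (1 - Prob)/Win, with win and loss as positive fractions (1.0 = 100%)."""
    return prob / loss - (1 - prob) / win


def log_growth(f: float, prob: float, win: float, loss: float) -> float:
    """Expected log growth per bet when staking a fraction f of wealth."""
    return prob * math.log(1 + f * win) + (1 - prob) * math.log(1 - f * loss)


def grid_search(prob: float, win: float, loss: float, steps: int = 1000) -> float:
    """Brute-force the fraction in [0, 1) that maximizes expected log growth."""
    # Only keep fractions that leave some wealth after a loss.
    candidates = [i / steps for i in range(steps) if 1 - (i / steps) * loss > 0]
    return max(candidates, key=lambda f: log_growth(f, prob, win, loss))


if __name__ == "__main__":
    print(f"Formula: {kelly_fraction(0.60, 1.0, 1.0):.1%}")
    print(f"Grid search: {grid_search(0.60, 1.0, 1.0):.1%}")
    # An asymmetric bet: 50% chance to gain 100%, 50% chance to lose 50%.
    print(f"Asymmetric bet: {kelly_fraction(0.50, 1.0, 0.5):.1%}")
```

Both routes give 20% for the 60/40 game; the asymmetric case shows the same formula handling unequal win and loss sizes.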

The bottom line: most fundamental managers understand that sizing is important, but many don’t take the time to think through just how important it can be. Kelly gives us a way to work out what proportion of our wealth we should be betting on any one thing at any given time, and the exercise above shows how much getting the size right matters. The difference between being right and being right but too large is significant in our long-term CAGR, and there is real alpha to be made in sizing our bets appropriately.

The real world

I’ve always loved poker. I remember playing for crayons with my sister when I was as young as 5. But the thing about poker that stops it from becoming an allegory for everything else is that there are only 52 cards in a deck.

The combinations of all cards are easy enough to calculate that any mediocre player can figure them out multiple times per hand. E.g., if I hold an ace and a non-ace, and the three cards showing on the board are non-aces, then I know for certain that 3 of the 47 cards I can’t see are aces, so the chance the next card is an ace is 3/47, or approximately 6%.
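The count behind that number, as a small Python sketch using exact fractions. It assumes the hold’em reading of the example (we hold an ace and a non-ace, and three non-aces are on the board, so 5 cards are known) and treats every unseen card, including opponents’ hole cards, as equally likely to come next. `prob_next_card` is a hypothetical helper name.

```python
from fractions import Fraction


def prob_next_card(outs: int, unseen_cards: int) -> Fraction:
    """Exact chance the next card is one of `outs` among `unseen_cards` unseen cards."""
    return Fraction(outs, unseen_cards)


known_cards = 2 + 3             # our two hole cards plus three board cards
unseen = 52 - known_cards       # 47 cards we can't see
aces_left = 4 - 1               # one of the four aces is in our hand
p_ace = prob_next_card(aces_left, unseen)
print(p_ace, f"= {float(p_ace):.1%}")
```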

In the real world, anything can happen. There are no real constraints on what is possible, and we have no way of knowing anything certain about the future. We can use frequentism to count what has happened historically, and we can use Bayes to continually improve our guesses. But we will never know, even ex post, if our probabilities were true— we only get to see if the event we were forecasting happened or not.

But this doesn’t mean making our best judgments can’t help our investment performance. If we can (a) project what scenarios are possible in the future and (b) assign probabilities to those scenarios, we can use Kelly to help us size our bets.

With a deep fundamental understanding of a situation, we do have some ability to forecast. For a stock whose multiple is weighed down by overhanging litigation, we can estimate what the stock price could be if the company wins or loses, and we can look at precedent cases to understand how likely it is to win. If we’re working on a fundamental thesis for a chain of hospice centers, we can work out what demographic growth should look like over the next 5 years, and then make some assumptions about this firm’s ability to compete and hospice’s ability to win share from alternatives. This should give us a range of outcomes from which we can build a probability distribution of returns. If we’re looking at a growth company, we can estimate its TAM and then break down the optionality it has in reaching material profits. The process may change from case to case, but the output is always an estimated probability distribution of returns.
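Here is roughly what the litigation case could look like once the scenario work is done, as a minimal Python sketch. Every number is hypothetical: a $50 stock that we think trades to $65 on a win and $30 on a loss, with a 60% win probability taken from precedent cases. The helper (a hypothetical name) converts price targets into the positive win and loss percentages the Kelly formula expects.

```python
def binary_returns(price: float, price_if_win: float, price_if_lose: float) -> tuple[float, float]:
    """Convert scenario price targets into (win, loss) as positive fractions."""
    if not price_if_lose < price < price_if_win:
        raise ValueError("current price must sit between the two scenario prices")
    win = price_if_win / price - 1
    loss = 1 - price_if_lose / price
    return win, loss


def kelly_fraction(prob: float, win: float, loss: float) -> float:
    """Binary Kelly formula: Prob/Loss - (1 - Prob)/Win."""
    return prob / loss - (1 - prob) / win


# Hypothetical inputs for the litigation example.
price, price_if_win, price_if_lose = 50.0, 65.0, 30.0
p_win = 0.60

win, loss = binary_returns(price, price_if_win, price_if_lose)
ev = p_win * win - (1 - p_win) * loss
size = kelly_fraction(p_win, win, loss)
print(f"win {win:.0%}, loss {loss:.0%}, EV {ev:+.1%}, Kelly size {size:.1%}")
```

A small edge (+2% EV) against a wide downside still justifies a position, but only about a sixth of the portfolio at full Kelly.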

Often, these outcomes won’t be binary, but that’s okay. The Kelly formula above only handles binary outcomes, but the spirit of Kelly (that time averages matter more than expected values) transfers anywhere. All it takes is mapping out the situation in Excel (or Python?) and then running a model that makes the same bet over and over again, hundreds of thousands of times. The weighting that produces the greatest CAGR is the proper sizing.
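A minimal Python version of that exercise, assuming a hypothetical three-scenario return distribution (the probabilities and returns below are illustrative, not from any real position). It draws the outcome of the same bet 200,000 times, tallies how often each scenario came up, and scores every weighting from 0% to 100% by the compound growth it would have produced. Because compounded wealth depends only on how many times each outcome occurred, tallying is equivalent to replaying the path bet by bet. The exact long-run answer, from expected log growth, is computed alongside for comparison.

```python
import math
import random
from collections import Counter

# Hypothetical distribution of one-period returns: (probability, return).
SCENARIOS = [(0.30, 0.80), (0.40, 0.10), (0.30, -0.50)]


def expected_log_growth(f, scenarios=SCENARIOS):
    """Exact long-run log growth per bet with a fraction f of wealth at risk."""
    return sum(p * math.log(1 + f * r) for p, r in scenarios)


def simulate_counts(n_bets, seed, scenarios=SCENARIOS):
    """Draw n_bets outcomes and count how often each scenario occurred."""
    rng = random.Random(seed)
    weights = [p for p, _ in scenarios]
    draws = rng.choices(range(len(scenarios)), weights=weights, k=n_bets)
    return Counter(draws)


def simulated_cagr(f, counts, n_bets, scenarios=SCENARIOS):
    """Per-bet compound growth of a bankroll that bets fraction f every time."""
    log_wealth = sum(counts[i] * math.log(1 + f * r) for i, (_, r) in enumerate(scenarios))
    return math.exp(log_wealth / n_bets) - 1


def best_fraction(score, steps=100):
    """Scan weightings 0%..100% and keep the one with the highest score."""
    return max((i / steps for i in range(steps + 1)), key=score)


if __name__ == "__main__":
    n = 200_000
    counts = simulate_counts(n, seed=7)
    sim_best = best_fraction(lambda f: simulated_cagr(f, counts, n))
    exact_best = best_fraction(expected_log_growth)
    ev = sum(p * r for p, r in SCENARIOS)
    print(f"EV per bet: {ev:+.1%}")
    print(f"Best weighting (simulated): {sim_best:.0%}, "
          f"CAGR {simulated_cagr(sim_best, counts, n):+.2%}")
    print(f"Best weighting (exact):     {exact_best:.0%}")
```

With these made-up scenarios the best weighting lands in the mid-50s percent, and the simulation and the exact calculation agree to within a couple of points.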

That being said, we need to account for the fact that probabilities are not known; they are only estimated. As we saw earlier, returns fall off sharply once you bet above the optimal size, which can be dangerous. Because of this, it is often wise to build in a margin of safety by deliberately under-sizing our bets so we don’t overleverage ourselves. From speaking to colleagues, the most common sizing is roughly 1/2 to 3/4 Kelly. This margin of safety can also protect against tail risks, which is another very important and closely related topic.
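A small illustration of why the buffer helps, as a Python sketch on the 60/40 even-odds game from earlier. The scenario is hypothetical: we estimate a 60% win probability, but the true odds are only 55%. The sketch compares full, 3/4, and 1/2 Kelly sizing (all based on our estimate) under both the estimated and the true probability.

```python
import math


def kelly_fraction(prob: float, win: float = 1.0, loss: float = 1.0) -> float:
    """Binary Kelly formula: Prob/Loss - (1 - Prob)/Win."""
    return prob / loss - (1 - prob) / win


def growth_rate(f: float, prob: float, win: float = 1.0, loss: float = 1.0) -> float:
    """Per-bet compound growth when betting fraction f and the win chance is prob."""
    g = prob * math.log(1 + f * win) + (1 - prob) * math.log(1 - f * loss)
    return math.exp(g) - 1


ESTIMATED_P = 0.60  # what our analysis says
TRUE_P = 0.55       # hypothetical: what the odds really are

full_kelly = kelly_fraction(ESTIMATED_P)  # sized off our (overconfident) estimate
for label, multiple in [("full", 1.0), ("3/4", 0.75), ("1/2", 0.5)]:
    f = full_kelly * multiple
    print(f"{label:>4} Kelly, bet {f:.0%}: "
          f"growth if right {growth_rate(f, ESTIMATED_P):+.2%}, "
          f"growth if wrong {growth_rate(f, TRUE_P):+.2%}")
```

If the estimate is right, half Kelly gives up roughly a quarter of the growth rate (about 1.5% versus 2.0% per bet). If the estimate is too optimistic, full Kelly’s growth drops to roughly zero while half Kelly still compounds at about 0.5% per bet.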

Dynamic systems

The other aspect of poker that doesn’t carry over to the real world is that events are discrete. A new card is turned over, or someone raises you, and you get a set amount of time to respond to the new information.

Absorbing information from Wall Street is like drinking from a broken fire hose. The flow never stops, and the important news arrives sporadically.

Having a probability distribution allows us to understand the proper sizing of a position relative to the rest of the portfolio as we initiate it. But it also allows us to update that sizing with new information as situations evolve. Did our confidence in the thesis go up? We can size up the bet. Did the stock crash while we’re still confident in our thesis? We can buy the dip. Did the stock rise on no news? We can take some profits.
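A minimal Python sketch of that update loop, using the binary setup from earlier. All inputs are hypothetical: a thesis with a 60% chance the stock goes to $150 and a 40% chance it goes to $50. Re-running the sizing as the price or our confidence changes reproduces the three moves above: size up when conviction rises, add when the price falls with the thesis intact, and trim when the price rises on no news. The long-only floor at zero is an assumption.

```python
def kelly_from_prices(prob: float, price: float,
                      price_if_right: float, price_if_wrong: float) -> float:
    """Binary Kelly size (fraction of portfolio) given scenario price targets."""
    if not price_if_wrong < price < price_if_right:
        raise ValueError("price must sit between the two scenario prices")
    win = price_if_right / price - 1    # upside as a positive fraction
    loss = 1 - price_if_wrong / price   # downside as a positive fraction
    size = prob / loss - (1 - prob) / win
    return max(size, 0.0)  # assumption: long-only, so no edge means no position


TARGET_UP, TARGET_DOWN = 150.0, 50.0  # hypothetical scenario prices

cases = [
    ("initial view", 0.60, 100.0),
    ("confidence rises", 0.70, 100.0),
    ("price falls, thesis same", 0.60, 90.0),
    ("price rises on no news", 0.60, 110.0),
]
for label, prob, price in cases:
    size = kelly_from_prices(prob, price, TARGET_UP, TARGET_DOWN)
    print(f"{label:<26} p={prob:.0%} price=${price:.0f} -> size {size:.0%}")
```

At $110 the expected price (0.6 × $150 + 0.4 × $50 = $110) equals the market price, so the edge, and with it the position, goes to zero.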

While the specifics of any situation are complex, the sentiment remains the same. If we understand how to size positions, we can figure out how to rebalance, take profits, hedge, and leverage ourselves. All topics I hope to dive into deeply sometime soon.

You can read more on Kelly here, here, and here.

Kelly's alpha