Hello, today I’m going to be talking to you about Slay the Spire.
If you’re not interested in Slay the Spire, that’s OK. I’m actually making a pretty general point, and will explain what you need to know about the game, but I will be talking to you about Slay the Spire.
Slay the Spire is a game about making good choices.
Two types of choice specifically stand out. You are building a deck of cards that represent actions you can take, and you use these cards in fights. The strategy of fights consists of making good choices about what to play from the cards you have, and the rest of the game is primarily about choosing cards that will allow you to fight better.
The specific choice I want to talk to you about today is whether to take this card:
This is Melter. It’s an attack card for The Defect, one of the characters in Slay the Spire, and it’s one that I am significantly more likely to pick than most professional players[1] of the game are. If I’m offered a Melter in Act 1 (Slay the Spire is composed of four acts, with a boss fight at the end of each one), I will probably take it.
Here is the basis of my valuation of Melter:
It is, by default, a pretty OK card. 10 is quite a large amount of damage for one energy, so even if the enemy has no block it’s decent.
There are a moderate number of fights where Melter’s removal of block is pretty useful. e.g. Melter is quite useful against the Shelled Parasite, it’s handy against Jaw Worms, it’s handy if there’s a Shield Gremlin around, etc. It’s also a little helpful against Guardian at the end of Act 1. Lots of enemies deploy a small amount of block, and against those enemies Melter is handy.
It will absolutely wreck the Spheric Guardian (this is what is pictured in the card art), and fuck that guy.
For me this adds up to a card that is usually worth taking in Act 1: there’s a decent chance that it will moderately improve my early Act 2, because I am really bad at the Spheric Guardian fight and will very regularly lose 30-40 health in it.
In contrast, as far as I can tell, streamers rarely take Melter. They are better at the game than me, and usually when someone who is better at the game than me reliably makes a different decision than me, that should be a sign that I’m wrong and they’re right. In this case, though, I think we’re both right: They’re right that they shouldn’t be taking Melter, and I’m right that I should. This would be true even if we had played the same seed (i.e. had the same random game setup and layout) and made the same decisions up to that point.
Playing Slay the Spire is an extremely crunchy problem. There is an objective answer as to whether you’ve succeeded or not - either you won the game or you didn’t, and you get the score that you got.[2] This means that there should in some sense be right answers. Certainly there is a theoretical optimal strategy for playing the game that maximises your probability of victory, albeit one that probably nobody will ever be able to calculate.
But, suppose you had some oracle that every now and then (but only once, or maybe a handful of times, per game) would say “BTW the optimal move here is…”. Should you take its advice?
Not necessarily, because such an optimal strategy is recursive: Every move it chooses is optimal given that you will play all subsequent moves optimally. For example maybe the optimal strategy has some much better solve for Spheric Guardian, so it correctly determines that the optimal move is almost never to take Melter, so it tells me not to take Melter, I follow its advice, and then walk into the Spheric Guardian fight and die preventably. The optimal choice for a perfect player doesn’t take into account the fact that I am a decidedly imperfect player and will make mistakes.
This means that although there are objectively correct answers to decisions in Slay the Spire, they depend crucially on the person who is making those decisions. The crunchiness of the problem does not remove the subjective element from the individual steps along the way.
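As a rough sketch of that argument in Python: every number here is invented for illustration, apart from the 30-40 health I lose to Spheric Guardian, and `Player`, `melter_value`, `p_guardian` and `dilution_cost` are hypothetical names rather than anything from the game. The only point is that the same formula can give opposite answers for two different players.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    # Hypothetical average health lost in the Spheric Guardian fight.
    guardian_loss_without_melter: float
    guardian_loss_with_melter: float


def melter_value(player: Player, p_guardian: float = 0.4,
                 dilution_cost: float = 4.0) -> float:
    """Expected health gained by taking Melter (made-up model).

    p_guardian: assumed chance of meeting Spheric Guardian in early Act 2.
    dilution_cost: assumed cost, in health, of having a narrower card
    in the deck for every other fight.
    """
    saved = (player.guardian_loss_without_melter
             - player.guardian_loss_with_melter)
    return p_guardian * saved - dilution_cost


def should_take_melter(player: Player) -> bool:
    return melter_value(player) > 0


# A streamer who already handles the fight well gains little from Melter.
streamer = Player("streamer", guardian_loss_without_melter=8,
                  guardian_loss_with_melter=3)
# I lose 30-40 health without it; the "with Melter" figure is a guess.
me = Player("me", guardian_loss_without_melter=35,
            guardian_loss_with_melter=5)

for player in (streamer, me):
    verdict = "take" if should_take_melter(player) else "skip"
    print(f"{player.name}: {verdict} Melter")
```

Under these assumed numbers the streamer should skip the card and I should take it, even though we are both looking at the same offer in the same game state.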
Slay the Spire is full of decisions like this. For example, when making a pathing decision as to whether to pick an elite fight or a shop, it may be clearly correct to pick the elite in terms of how much benefit I could get out of each… unless I know I’m reliably not going to pilot the deck well enough to win that elite fight.[3]
This is true in general: The decision you make is always predicated on the fact that it’s you who is going to act on the consequences[4], and you should plan accordingly based on the ways you know you’re likely to get things wrong. e.g. if you know you’re liable to be forgetful, you can use alerts and timers to remind you, or write things down. If you’re a beginner and you don’t know which steps are safe to skip, do all of them. These are all cases where your decision will be correct despite differing from the experts’ decision in your place, because that decision is made with the context that you are not an expert and will make errors that they don’t.
A lot of productivity and writing advice is criticised because it’s very much not the thing that the actual experts do. e.g. you get a lot of writers who suggest you write every day when starting out. Do they do that? No, absolutely not. Does that make it bad advice? Probably not.[5]
Another example I regularly run into is when driving. Driving in the UK and navigating with Google Maps, I often end up on insane horse tracks. The official speed limit on these roads is 60mph. I generally drive them at 30-40mph. The locals who end up behind me don’t like this very much. They probably don’t want to go the full 60mph (I hope), but I think they’d be fine driving 40-50mph. This is because they know all the upcoming turns, where they actually need to slow down etc, and I don’t.
This isn’t because they’re better drivers than me (though they may also be better drivers than me - I’m at best a decent driver), it’s because they know the choices and challenges that are coming up. Probably I’d be fine driving faster, but because I’m less aware of the context and the upcoming problems, making a decision to err on the side of caution is correct for me in a way that it isn’t necessarily correct for them.
This dynamic is important to be aware of when learning or teaching: A beginner does not necessarily want to emulate an expert in general. They should be more cautious, and make choices that build in safety, because they cannot rely on being able to handle what follows as reliably as an expert can.
You shouldn’t overcorrect on this too much: If an expert is reliably doing something different from you in general, you should consider whether you should be doing that thing too, and you should definitely (if you want to become an expert, which isn’t required!) aim to eventually be making expert level choices, but until you’ve reached that point you may want to play it safe.
[1] Yes there are professional Slay the Spire players. They stream the game to audiences, so their “job” is to produce entertaining content more than to be the best in the world at the game. This does however require them to be very good at the game. Certainly they’re much better than me.
[2] Most people don’t actually seem to care much about the score though. Success at Slay the Spire is generally measured by the difficulty level you played at and whether you killed the heart.
[3] There is also an argument that the correct thing to do here is to take that elite fight and use it as a learning opportunity to git gud, because even if it will lose me this run it will improve my future runs.
[4] Sometimes you make decisions where other people are going to act on the consequences. This is essentially the same phenomenon, just with a different set of strengths and weaknesses.
[5] I don’t think writing every day is good indefinitely, and some people will benefit from it more than others, but I do think that it’s very helpful to spend some time with a daily writing practice and maybe some people will want to keep that up indefinitely.
I absolutely love this example! Also I want to play Slay the Spire now. And I also love that the implication here is "optimization" is basically about knowing yourself and having an accurate picture of your abilities and of what you do and (crucially) do not know about reality. I think this is v important in life, and love looking for ways to optimize for optimization lol.
Thanks for this, it's got me thinking about where it might apply in my life.
I'm not an ML expert, but I think this distinction you're talking about maps pretty well to the distinction between on-policy and off-policy action-value functions in reinforcement learning.
Here's an explanation from GPT (didn't find a good one googling):
Suppose you're navigating your character in a video game. [...]You want to maximize points, and you have a map plus a strategy or policy, denoted by π. Each action you take has a potential reward, represented by numerical values.
When using the on-policy action-value function, denoted by Qπ(s, a), you're estimating the value of each action (a) you could take in each possible state (s), considering your current game-playing style or strategy (π).
Mathematically, the on-policy action-value function Qπ(s, a) is the expected return (or future accumulated reward) when starting in state s, taking action a, and then following policy π:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_t \mid S_t = s, A_t = a \right]$$
Here $\mathbb{E}_{\pi}$ denotes the expectation when following π, $R_t$ is the return (the total reward accumulated from time t onward), and $S_t = s, A_t = a$ means starting in state s and taking action a at time t.
Now, suppose you decide to jump over a pit in the game. You calculate, "If I stick with my strategy (π), what's my expected total score (Rt) after this jump?". The on-policy value function directly uses the current strategy (π) to estimate this value.
On the other hand, with an off-policy action-value function, denoted by Q*(s, a), you're still playing with your current style but envisioning an optimal strategy. This optimal strategy dictates the 'best' possible actions to take to achieve maximum rewards, without regard to what your current policy recommends.
The off-policy action-value function Q*(s, a) is defined as the maximum expected return when starting in state s, selecting action a, and thereafter following an optimal policy:
$$Q^{*}(s, a) = \max_{\pi} \mathbb{E}_{\pi}\left[ R_t \mid S_t = s, A_t = a \right]$$
In this case, when faced with the same pit in the game, you ponder, "If I were playing ideally, what's my maximum possible total score after this jump?". The off-policy value function does not consider what the current policy would do next, rather it tries to estimate the value based on an optimal policy.
To summarize, the on-policy function values actions based on your current playstyle, while the off-policy function assumes the best possible actions under optimal play from then on, regardless of your current strategy. Both functions help to forecast the value of potential actions, but they do so under different hypothetical futures, dictated by either your current or the optimal strategy.
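Here is a minimal sketch of the two definitions in Python, applied to a toy version of the Melter decision. The states, actions and rewards are all made up for illustration, and the problem is deterministic and undiscounted, so the expectation collapses to a single sum. `MDP`, `q_pi`, `q_star` and `my_policy` are hypothetical names. (What the explanation above calls the off-policy function is more usually called the optimal action-value function.)

```python
# MDP[state][action] = (reward, next_state); next_state None ends the episode.
# Rewards are hypothetical health changes, not real game data.
MDP = {
    "card_reward": {
        "take_melter": (0, "guardian_with_melter"),
        "skip_melter": (0, "guardian_without_melter"),
    },
    "guardian_with_melter": {
        "play_well": (-6, None),
        "play_badly": (-9, None),
    },
    "guardian_without_melter": {
        "play_well": (-3, None),
        "play_badly": (-35, None),
    },
}


def q_pi(state, action, policy):
    """Return from taking `action` in `state`, then following `policy`."""
    reward, next_state = MDP[state][action]
    if next_state is None:
        return reward
    return reward + q_pi(next_state, policy[next_state], policy)


def q_star(state, action):
    """Return from taking `action` in `state`, then acting optimally."""
    reward, next_state = MDP[state][action]
    if next_state is None:
        return reward
    return reward + max(q_star(next_state, a) for a in MDP[next_state])


def best_action(state, q):
    """Action in `state` with the highest value under the function `q`."""
    return max(MDP[state], key=lambda a: q(state, a))


# Assumed policy for an imperfect player: misplays the Guardian fight
# whether or not Melter is in the deck.
my_policy = {
    "guardian_with_melter": "play_badly",
    "guardian_without_melter": "play_badly",
}

best_for_me = best_action("card_reward", lambda s, a: q_pi(s, a, my_policy))
best_for_oracle = best_action("card_reward", q_star)
print("Best for me:", best_for_me)
print("Best under optimal play:", best_for_oracle)
```

With these invented rewards the two functions disagree at the card reward: the value under my own policy favours taking Melter, while the value under optimal play favours skipping it. That is the same split as between me and the streamers.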