How to pick a number for any purpose

Jul 25, 2022

This is another post in the estimation series, following on from How to think about task estimation, but this one doesn’t actually have much to do with task estimation in particular. Instead it’s a pretty general technique for picking a number in response to an underspecified question.

The technique is this: When you have some numeric question you don’t have enough information to answer, first construct a plausible range for values the answer could take, and then use that to construct a decent first guess.

This won’t give you the final answer, hence the name, but it will give you a starting point to improve on.

Estimating unknown quantities

Quick, what’s the population of Paraguay?

You don’t know? Me neither. I’ve deliberately written this bit before looking it up. The world is big, this isn’t the sort of information I’m good at retaining, and I specifically picked this example as one I was failing to remember so I could honestly go through the steps in detail. As a result, I might be about to be embarrassingly wrong.

If you’re more familiar with Paraguay than me, please try a similar exercise with Gabon or Croatia. If you’re also sufficiently familiar with those, congratulations, you’re much better at geography than I am. Please bear with me while I demonstrate my ignorance.

Currently I don’t know where Paraguay is, or even what it is - is it a country or a city? All I remember is that I’ve got the name Paraguay in my head somewhere. I think it’s in South America somewhere? Possibly an island? I’m about 80% sure it’s a country, but I honestly wouldn’t rule out that it’s actually a city or a region in some country.

Obviously I could just type that question into Google and get the answer, but that would defeat my purpose here. So let’s try and figure this out without looking anything up.

We’re going to identify upper and lower bounds for the population of Paraguay, and then narrow them.

The fact that I’ve heard of Paraguay means that it’s probably not tiny, so let’s guess 100k (100 thousand) people as a lower bound. That is, I’m pretty sure that Paraguay has at least 100k people. Most of the uncertainty here is that I don’t know what sort of thing Paraguay is. If Paraguay is a country, this is probably much too low. If it’s a city, this is probably on the low side. If it’s something else, I don’t know.

The fact that I don’t remember much about Paraguay suggests that it’s on the small side. Granted this is probably my ignorance of geography as much as anything, but if it’s got a huge population I hope I’d at least have a bit of a better idea of it. So let’s say it’s probably noticeably smaller than the UK, and estimate 50 million as an upper bound. I don’t think Paraguay has a population of more than about 50 million.

Now what we’re going to do is pick points somewhere in the middle of our range and see how we feel about them.

Let’s see, does Paraguay have 25 million people? It’s certainly possible, but it feels higher than I’d expect. So let’s set the upper bound for our estimate to 25 million.

15 million? Ehh… I feel like we’re entering dangerous waters. I’m happy to say it’s probably less than that, but I wouldn’t want to confidently say that it’s much less than that. So let’s set our upper bound to 15 million and fix it there.

So now we want to try some points closer to our lower end. How do we feel about 1 million? Are we happy to say that Paraguay has more than one million people?

Honestly, no. I’m pretty sure it does, but there’s too much uncertainty riding on what Paraguay actually is.

500k? It seems unlikely that Paraguay has a population of less than 500k given that I’ve heard of it at all. Though it’s certainly possible. I’ve heard of plenty of European cities with fewer people than that, and maybe it’s a popular destination for unrelated reasons.

I find I’m not really willing to move the lower bound up from 100k. I think it’s probably much more than that, but it’s plausible enough that it’s not. So this gives us a pretty large range - 100k to 15 million people. If I had to guess now, I’d just pick something roughly in the middle - say, 8 million, but if I needed this for something more precise I’d want to narrow this further.

What happens when you seek more information

I’ve guessed 8 million for the population of Paraguay, but this is based on knowing almost nothing about Paraguay. So let’s get more information to improve our results. Google, “What is Paraguay?”

Paraguay is a landlocked country between Argentina, Brazil and Bolivia, home to large swaths of swampland, subtropical forest and chaco, wildernesses comprising savanna and scrubland.

OK, so I was right about it being a country in South America and wrong about it being an island. Now that I know it really is a country, I can comfortably say that that lower bound is way too low.

Let’s pick another number for our lower bound. How about 5 million? Less than 5 million would be a pretty small country, but not impossibly small.

3 million? Yeah, I really doubt Paraguay has fewer than 3 million people. New lower bound, 3 million.

That 15 million upper bound is starting to sound suspicious though, honestly. I think I was anchoring it more on Paraguay being a small island than I was admitting to. Although the landscape description does make it sound underpopulated. So let’s bump the upper bound to say… 25 million?

This didn’t actually narrow our estimate that much, but it did rule out any of the really tiny answers. Let’s again pick somewhere in the middle - I don’t really have any basis for deciding here, so let’s guess halfway between these two. Paraguay, population 14 million plus or minus 11 million.

(It’s been subsequently pointed out to me that the geometric mean is a better estimate for things like populations, so I shouldn’t be picking a number halfway between the two, I should be multiplying the two together and taking the square root. This gives 8.66 million here. But if you’re doing this in your head, feel free to keep taking the middle)
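To make the difference concrete, here is a small Python sketch comparing the two kinds of middle for the 3 million to 25 million range. The function names are illustrative only.

```python
import math

def arithmetic_midpoint(low, high):
    """Halfway between the two bounds."""
    return (low + high) / 2

def geometric_midpoint(low, high):
    """Square root of the product of the two bounds (both must be positive)."""
    return math.sqrt(low * high)

low, high = 3_000_000, 25_000_000
print(f"arithmetic: {arithmetic_midpoint(low, high) / 1e6:.2f} million")
print(f"geometric:  {geometric_midpoint(low, high) / 1e6:.2f} million")
```

The geometric midpoint always sits at or below the arithmetic one, which is why it pulls the guess towards the lower end of a wide range.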

If we wanted to narrow this down further we could do some sort of Fermi estimate. How many cities does Paraguay have? What’s their average population? How much of the population lives in cities? etc. But for our purposes, we’ll stop here, we’ve got an OK estimate of my intuitions about what Paraguay is likely to be, population wise.

Now, moment of truth where we find out if I’ve embarrassed myself. Google, What’s the population of Paraguay? 7.133 million.

OK, so my estimate wasn’t bad. I was out by a factor of two upwards, which feels reasonable enough. My estimate before I found out more information about Paraguay was actually significantly closer, and if I’d used the geometric mean like I subsequently realised I should, this would actually have been very good.[1]

If I’d had the courage of my convictions and stuck with that initial 15 million upper bound, I’d have guessed halfway between 3 and 15 which would have been 9 million, which would have been very close, but I was worried about finding out that in fact Paraguay was huge and I was merely being terribly ignorant of South American geography (because I am in fact terribly ignorant of South American geography) and embarrassing myself.
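For the record, here is the same comparison as arithmetic: a short Python sketch that measures how far each of the guesses above was from the 7.133 million figure. The error_factor helper and the labels are illustrative only, not part of the method.

```python
actual = 7.133  # millions, the figure quoted above

guesses = {
    "first guess (100k to 15M, rough middle)": 8.0,
    "arithmetic middle of 3M to 25M": 14.0,
    "geometric mean of 3M and 25M": (3 * 25) ** 0.5,
    "arithmetic middle of 3M to 15M": 9.0,
}

def error_factor(guess, truth):
    """How many times too big or too small the guess is (always >= 1)."""
    return max(guess / truth, truth / guess)

factors = {name: error_factor(g, actual) for name, g in guesses.items()}
for name, factor in factors.items():
    print(f"{name}: off by a factor of {factor:.2f}")
```

Measuring the error as a ratio rather than a difference fits quantities like populations, where being out by a factor of two means much the same thing whether the true answer is 7 million or 70 million.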

Anyway, that’s how you apply this terribly specialised technique for working out the population of Paraguay based on almost no information. Clearly this is a special case that doesn’t generalise at all.

Figuring out plausible ranges

Obviously, nothing about that was specific to Paraguay - I barely used any information about Paraguay at all, and my estimate actually got worse when I looked up information about Paraguay.

Instead, it’s a fully general technique that works as follows:

First, pick a number that is obviously too large (your plausible upper bound).

Now, pick a number that is obviously too small (your plausible lower bound).

These together define what I call your plausible range - any number outside of that range is implausible, any number in between them might be plausible.

Sometimes you’ll be wildly off in what’s plausible. That’s OK. Errors happen, and when they do you can learn from them. For the purposes of this process, assume your intuitions are roughly correct and if they turn out not to be, you can learn better intuitions.

Your goal is now to narrow the plausible range. You do this by picking numbers in the range and asking whether it’s plausible that the number be larger or smaller than that.

e.g. when I picked that initial 25 million for Paraguay, it felt obviously plausible that it was smaller than that, and implausible that it was larger than that, so we set our new plausible upper bound to that.

Initially, you should pick numbers roughly in the middle (but not necessarily exactly in the middle. e.g. I feel like I have a better intuition for whether something is more or less than 10 or 15 million than I do for whether it’s more or less than 12.5 million. Also for things like populations you indeed might be better off picking the geometric mean, which will generally skew you towards the lower end of the range).

Often you’ll see a number and your intuitive response will be “I don’t know, maybe?”. That’s fine, it just means that that number is definitely in the plausible range, and you need to stop picking numbers in the middle. Instead, try things that are quite close to the ends.

Eventually, you’ll get to the point where every number more or less in the middle will result in this “I don’t know, maybe?” response. Now try things that are near the end points.

Eventually you’ll end up with a plausible range that is about as narrow as you can comfortably go. Your desired number is somewhere in here.
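If it helps to see the narrowing loop written down, here is a rough Python sketch. Everything in it is an assumption made for illustration: the judge callback stands in for your own intuition, the probes are taken on a log scale, and the quarter-of-the-way-in step for the ends is an arbitrary choice. In practice you'd pick nicer, rounder numbers by feel.

```python
import math

def narrow_plausible_range(low, high, judge, max_probes=20):
    """Shrink the plausible range [low, high] (both positive).

    judge(value) is a stand-in for intuition and returns:
      "smaller" - the answer is plausibly smaller than value, not larger
      "larger"  - the answer is plausibly larger than value, not smaller
      "maybe"   - "I don't know, maybe?"
    """
    probes = 0
    # Phase 1: probe roughly in the middle until the answer is "maybe".
    while probes < max_probes:
        mid = math.sqrt(low * high)
        probes += 1
        verdict = judge(mid)
        if verdict == "smaller":
            high = mid
        elif verdict == "larger":
            low = mid
        else:
            break
    # Phase 2: try points near each end until that end also gets a "maybe".
    for end in ("low", "high"):
        while probes < max_probes:
            ratio = (high / low) ** 0.25  # a quarter of the way in, on a log scale
            probe = low * ratio if end == "low" else high / ratio
            probes += 1
            verdict = judge(probe)
            if end == "low" and verdict == "larger":
                low = probe
            elif end == "high" and verdict == "smaller":
                high = probe
            else:
                break  # "maybe" (or an inconsistent verdict): leave this end alone
    return low, high

def hypothetical_intuition(value):
    """A made-up judge: sure the answer is above 3 million and below 15 million."""
    if value < 3_000_000:
        return "larger"
    if value > 15_000_000:
        return "smaller"
    return "maybe"

narrowed = narrow_plausible_range(100_000, 50_000_000, hypothetical_intuition)
print(f"{narrowed[0] / 1e6:.1f} million to {narrowed[1] / 1e6:.1f} million")
```

Note that the loop never moves a bound past a point the judge was unsure about, so the narrowed range always still contains everything the judge considered plausible.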

Making a decent first guess

For estimating the population of Paraguay, we just picked a number in the middle of our plausible range. This is because our estimate didn’t actually matter very much because we weren’t basing any real decisions on it.

With real estimates, you want to take into account the types of error you’re more comfortable making. An estimate can be too large, or too small, and how much it’s too large or too small by can matter as much or more than the direction.

Anyway, here are the rules for going from a plausible range to an estimate:

If it’s much worse to be too small than too large, pick the upper bound of your plausible range.
If it’s much worse to be too large than too small, pick the lower bound of your plausible range.
Otherwise, pick a nice number in the middle (again, maybe the geometric mean).
If the resulting number feels wildly implausible to you, adjust it a bit in the right direction until it doesn’t, but otherwise don’t worry too much about getting it right.
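The first three rules are mechanical enough to write down as a small Python sketch. The function name, the worse_error labels and the use_geometric flag are all made up for illustration, and the last rule (nudging a number that feels wildly implausible) is a judgement call that is deliberately left out.

```python
import math

def decent_first_guess(low, high, worse_error="neither", use_geometric=False):
    """Turn a plausible range into a single number.

    worse_error says which mistake hurts more:
      "too_small" - underestimating is much worse, so take the upper bound
      "too_large" - overestimating is much worse, so take the lower bound
      "neither"   - roughly symmetric, so take something in the middle
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if worse_error == "too_small":
        return high
    if worse_error == "too_large":
        return low
    if worse_error != "neither":
        raise ValueError(f"unknown worse_error: {worse_error!r}")
    if use_geometric:
        return math.sqrt(low * high)  # only meaningful for positive bounds
    return (low + high) / 2

# Hypothetical deadline question: the task plausibly takes 2 to 4 days,
# and guessing too small is the expensive mistake.
print(decent_first_guess(2, 4, worse_error="too_small"))
```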

This is the point where the “What are you trying to do?” question comes into the estimate if you’re using this method for task estimation. For example, if you want to know whether you’re going to make a deadline, it’s definitely much worse to be too small than too large.

This procedure won’t necessarily get you a great answer, but this is not a method for getting a great answer; that’s why it’s called a decent first guess.

In particular, if your plausible range is very wide, this will tend to be very off the mark if you pick in the middle. If you want more precise estimates, you’ll need more sophisticated techniques. For example, if you add a guess of the typical value (which should lie in your plausible range) you can use three point estimation again, with the plausible range giving you your worst and best case scenarios.
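As a minimal sketch of that last suggestion, here is one common form of three-point estimation, the PERT-style weighted average. Whether this is exactly the weighting used earlier in the series is an assumption, and the typical value of 8 million in the example is made up for illustration.

```python
def three_point_estimate(low, typical, high):
    """PERT-style three-point estimate (one common convention, assumed here).

    low and high are the ends of the plausible range; typical is your guess
    at the most likely value and should lie between them.
    Returns (expected value, rough spread).
    """
    if not low <= typical <= high:
        raise ValueError("typical should lie inside the plausible range")
    expected = (low + 4 * typical + high) / 6
    spread = (high - low) / 6
    return expected, spread

# Plausible range of 3 to 25 (millions), with a hypothetical typical value of 8.
expected, spread = three_point_estimate(3, 8, 25)
print(f"about {expected:.1f} million, give or take {spread:.1f} million")
```

The typical value gets four times the weight of either end, so a very wide plausible range drags the estimate around much less than picking the middle would.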

Better and worse errors

Sometimes some errors are just obviously worse than others. For example:

When feeding a group, it’s obviously worse to have too little food than too much because too little means people go hungry.
When adding salt to a dish, it’s obviously worse to add too much than too little, because you can always add more salt in later but you can’t take salt out.

In both of these cases, it’s not that the other way around doesn’t matter. If it wasn’t bad at all to under-salt food, you just wouldn’t add salt at all. If it wasn’t bad at all to have too much food, you’d overbuy to a ridiculous degree. But by having the plausible range, you ensure that you use an amount that at least reasonably compensates for the other side of the error.

An example where the errors are more or less equally bad is pricing. When deciding how much to charge for your services, charging too much will get you rejected, charging too little will mean you make less money than you could. Both are bad. It’s not really true that they’re equally bad, but for the purposes of a decent first guess they’re both a lot closer to equally bad than they are to one dominating the other.

“Worse” isn’t necessarily strictly about the consequences being particularly good or bad. This is especially true when you’re making repeatable decisions, where often what’s more important is what you’re going to learn from the experience.

Let me give two concrete examples where I’ve helped someone using this method:

Someone is trying to decide how much to offer for a house they’re interested in but not in love with.
Someone is trying to decide how much of their week to spend on planning, personal development, and general maintenance vs actually doing the most immediately important work.

In the first case, erring on the low end is good: Either your offer will be accepted in which case, great! You’ve got a bargain. Or, your offer will be rejected, in which case you’ve learned a bit about what sort of offers get accepted. If they laugh in your face, you know you probably offered much too low. If they think about it for a while before declining, your offer was probably reasonable but not quite enough in context. Either way this informs your next offer. If, on the other hand, you offer too much, you’ve potentially cost yourself tens of thousands of pounds you didn’t need to spend.

In the second case, erring on the high side is good, because this helps you find out how much you could spend on those things before it stops being useful. It’s a more dakka sort of situation. Chances are, you’ll start out spending too much, and gradually whittle it down, but also chances are you’re currently spending not nearly enough, so it will be useful to have an opportunity to see what too much feels like.

Another case where it’s worth erring on the side of too much is where there is social pressure not to change estimates. If an estimate is going to be treated as a commitment (“You said you could do this in nine days, but now it’s ten days in and it’s not done yet! How dare you?”), erring on the side of caution and picking the maximum of the plausible range is the safe bet.

The role of the decent first guess

The role of the decent first guess is that it is decent, it is first, and it is a guess.

That is to say: This method will not give you particularly good estimates. It’s not meant to. It’s meant to give you estimates that are basically alright, but to get them pretty quickly and without stress. If you want a better estimate, you don’t stop with your first guess, you instead work to improve it.

Depending on what you’re doing, often the way to do this is to use your decent first guess, and see what happens, and use this to make a better decent first guess next time.

But even if you never want to settle for your decent first guess for a question, these techniques are still very useful for a couple of reasons.

The first is that often you’re done as soon as you have the plausible range: if you’d make the same decision for any plausible value, you don’t even need to make the decent first guess. No immediate time pressure and an obviously beneficial task will take you two to four days? Great, go do it, it’ll be done when it’s done. Willing to spend somewhere between £100 and £1000 for something you can get done for £50? Perfect. Willing to spend a week on something that you estimate will take between a month and a year? No thanks.
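Here is a rough Python sketch of that check. It assumes the decision only ever flips once as the number grows, so that testing the two ends of the plausible range is enough; the function name and the decision labels are made up for illustration.

```python
def settled_decision(low, high, decide):
    """Return the decision if it is the same at both ends of the range, else None.

    Assumes decide() changes its answer at most once between low and high,
    so agreeing at both ends means agreeing everywhere in between.
    """
    at_low, at_high = decide(low), decide(high)
    return at_low if at_low == at_high else None

# Willing to pay somewhere between 100 and 1000 pounds; the price is 50.
buy = settled_decision(100, 1000, lambda willing: "buy" if willing >= 50 else "pass")

# Have a week to spare; the task will take between 30 and 365 days.
start = settled_decision(30, 365, lambda days: "do it" if days <= 7 else "no thanks")

# Hypothetical unsettled case: three days to spare, task takes two to four days.
unsure = settled_decision(2, 4, lambda days: "do it" if days <= 3 else "no thanks")

print(buy, start, unsure, sep=" | ")
```

A result of None is the signal that the range is too wide for this particular decision, and that narrowing it (or making a decent first guess) is worth the effort.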

The second thing is that this is often a good building block. If you end up with a plausible range that is larger than you need it to be, you need to break the problem down. This will often cause you to want to estimate other things, and you can use plausible ranges for those estimates. For example, you break a task up into smaller tasks, work out the plausible range for each of those, and use that to refine your estimate for the current task.[2]

The most important thing though is, I think, that decent first guesses and plausible ranges become the thing to beat. The work of estimation is often discounted, and starting from a plausible range helps make it more explicit. Is it worth spending 10 minutes narrowing an estimate from a plausible range of a day to a week? Yeah, probably. Is it worth spending half a day narrowing a plausible range from one to two days? Probably not.

Work and estimation are broadly contiguous with each other, and they blend together at the edges. By changing the work of estimation from a black box to one of narrowing a range of possibilities, it becomes immediately apparent whether it’s worth spending more effort on estimation, or whether that would be better spent on just doing the actual work.

Cover image

The cover image is a hut in Paraguay, made available by Wikimedia user Herr stallhoefer.

[1] This is apparently a general purpose thing. e.g. Germans are better at predicting which of two cities in the USA is larger than people from the USA are, and vice versa. Although this is a psychology result and I don’t know if it replicates, and can’t even be bothered to chase down a cite right now.

[2] You need to be a bit careful about how you do this. If you break an estimate up into a lot of sub-estimates and add them together, this can make it look like there is more uncertainty than there actually is, because the uncertainties tend to cancel out when you add them up, and the plausible ranges don’t and can’t capture this.
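To illustrate the size of the effect, here is a small Python sketch comparing two ways of combining sub-estimates. It assumes the errors in the sub-estimates are independent of each other, which is a strong assumption and often not true of real tasks, and it uses the standard root-sum-of-squares rule for combining independent spreads. The sub-task ranges are made up.

```python
import math

def naive_total(ranges):
    """Add all the lower bounds together and all the upper bounds together."""
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)

def independent_total(ranges):
    """Combine half-widths by root-sum-of-squares, assuming independent errors."""
    centre = sum((lo + hi) / 2 for lo, hi in ranges)
    half_width = math.sqrt(sum(((hi - lo) / 2) ** 2 for lo, hi in ranges))
    return centre - half_width, centre + half_width

subtasks = [(1, 3), (2, 4), (1, 5)]  # hypothetical plausible ranges, in days

naive = naive_total(subtasks)
combined = independent_total(subtasks)
print(f"naive sum:          {naive[0]:.1f} to {naive[1]:.1f} days")
print(f"independent errors: {combined[0]:.1f} to {combined[1]:.1f} days")
```

The naive sum is only right if every sub-task comes in at the same extreme at once, which gets less and less likely the more sub-tasks there are.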

Mo Nastri

Jul 27, 2022

I was wondering why you didn't take the geometric mean, as Guy mentioned upthread -- intuitively you'd expect the *log* of country population to be evenly distributed, since assuming the (non-log-transformed) country pops to be evenly distributed doesn't make much sense when the endpoints are a rounding error away from zero on one end and China on the other. You would've basically nailed this Fermi exercise. Of course that's not a straightforwardly generalizable takeaway though.

But there is a generalizable takeaway from this ostensible nitpick (I think), which is to familiarize yourself with lots of real-world examples of distributions. The "strategy" here is mainly broad reading plus doing frequent little Fermi estimates from time to time.

Just to not sound like a party-pooper I'd like to balance out my comment by saying I enjoyed your post, and I'd like to share a related article: https://forum.effectivealtruism.org/posts/3hH9NRqzGam65mgPG/five-steps-for-quantifying-speculative-interventions

It's by a Good Judgment superforecaster, Nuno Sempere. I thought it was a cool look into how a world-class estimation guy does his thing, even if his methods are way too effortful for me.
