If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics

I do not believe that the utility weights I worked on last week – the ones that say living in North Korea is 37% as good as living in the First World – are objectively correct or correspond to any sort of natural category. So why do I find them so interesting?

A few weeks ago I got to go to a free CFAR tutorial (you can hear about these kinds of things by signing up for their newsletter). During this particular tutorial, Julia tried to explain Bayes’ Theorem to some, er, rationality virgins. I record a heavily-edited-to-avoid-recognizable-details memory of the conversation below:

Julia: So let’s try an example. Suppose there’s a five percent chance per month your computer breaks down. In that case…
Student: Whoa. Hold on here. That’s not the chance my computer will break down.
Julia: No? Well, what do you think the chance is?
Student: Who knows? It might happen, or it might not.
Julia: Right, but can you turn that into a number?
Student: No. I have no idea whether my computer will break. I’d be making the number up.
Julia: Well, in a sense, yes. But you’d be communicating some information. A 1% chance your computer will break down is very different from a 99% chance.
Student: I don’t know the future. Why do you want me to pretend I do?
Julia: (who is heroically nice and patient) Okay, let’s back up. Suppose you buy a sandwich. Is the sandwich probably poisoned, or probably not poisoned?
Student: Exactly which sandwich are we talking about here?

In the context of a lesson on probability, this is a problem I think most people would be able to avoid. But the student’s attitude, the one that rejects hokey quantification of things we don’t actually know how to quantify, is a pretty common one. And it informs a lot of the objections to utilitarianism – the problem of quantifying exactly how bad North Korea is shares some of the pitfalls of quantifying exactly how likely your computer is to break (for example, “we are kind of making this number up” is a pitfall).

The explanation that Julia and I tried to give the other student was that imperfect information still beats zero information. Even if the number “five percent” was made up (suppose that this is a new kind of computer being used in a new way that cannot be easily compared to longevity data for previous computers) it encodes our knowledge that computers are unlikely to break in any given month. Even if we are wrong by a very large amount (let’s say we’re off by a factor of four and the real number is 20%), if the insight we encoded into the number is sane we’re still doing better than giving no information at all (maybe model this as a random number generator which chooses anything from 0 – 100?).
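To make that concrete, here’s a rough sketch in Python. The 5% and 20% figures are the made-up ones from the paragraph above, and the random number generator stands in for “zero information”; none of this is real data.

```python
import random

# Toy comparison: a sane-but-wrong made-up estimate (5%) versus "zero
# information", modeled as a uniform random guess between 0% and 100%.
# The "true" 20% figure is the hypothetical from the paragraph above.
TRUE_MONTHLY_BREAKDOWN = 0.20
MADE_UP_ESTIMATE = 0.05

random.seed(0)
trials = 100_000
avg_random_error = sum(
    abs(random.uniform(0, 1) - TRUE_MONTHLY_BREAKDOWN) for _ in range(trials)
) / trials

print(f"error of the made-up 5% estimate: {abs(MADE_UP_ESTIMATE - TRUE_MONTHLY_BREAKDOWN):.2f}")  # 0.15
print(f"average error of a random guess:  {avg_random_error:.2f}")  # ~0.34
```

Even off by a factor of four, the made-up number loses less information than the arse-pull-free alternative.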

This is part of why I respect utilitarianism. Sure, the actual badness of North Korea may not be exactly 37%. But it’s probably not twice as good as living in the First World. Or even 90% as good. But it’s probably not two hundred times worse than death either. There is definitely nonzero information transfer going on here.

But the typical opponents of utilitarianism have a much stronger point than the guy at the CFAR class. They’re not arguing that utilitarianism fails to outperform zero information, they’re arguing that it fails to outperform our natural intuitive ways of looking at things, the ones where you just think “North Korea? Sounds awful. The people there deserve our sympathy.”

Remember the Bayes mammogram problem? The correct answer is 7.8%; most doctors (and others) intuitively feel like the answer should be about 80%. So doctors – who are specifically trained in having good intuitive judgment about diseases – are wrong by an order of magnitude. And it “only” being one order of magnitude is not to the doctors’ credit: by changing the numbers in the problem we can make doctors’ answers as wrong as we want.

So the doctors probably would be better off explicitly doing the Bayesian calculation. But suppose some doctor’s internet is down (you have NO IDEA how much doctors secretly rely on the Internet) and she can’t remember the prevalence of breast cancer. If the doctor thinks her guess will be off by less than an order of magnitude, then making up a number and plugging it into Bayes will be more accurate than just using a gut feeling about how likely the test is to work. Even making up numbers based on basic knowledge like “Most women do not have breast cancer at any given time” might be enough to make Bayes Theorem outperform intuitive decision-making in many cases.
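For the curious, here is a minimal sketch of that calculation. The 1% prevalence, 80% sensitivity, and 9.6% false positive rate below are the usual statement of the mammogram problem, assumed here rather than quoted from anything above.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test), by Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Standard statement of the mammogram problem (assumed figures):
# 1% prevalence, 80% sensitivity, 9.6% false positive rate.
print(posterior(0.01, 0.80, 0.096))   # ~0.078 -- the 7.8% answer

# A doctor whose internet is down and who only remembers "most women
# don't have breast cancer right now" might guess a 2% prevalence:
print(posterior(0.02, 0.80, 0.096))   # ~0.15 -- off, but far closer than the intuitive ~80%
```

Doubling the made-up prior doesn’t come close to reproducing the order-of-magnitude error the intuitive answer makes.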

And a lot of intuitive decisions are off by way more than the make-up-numbers ability is likely to be off by. Remember that scope insensitivity experiment where people were willing to spend about the same amount of money to save 2,000 birds as 200,000 birds? And the experiment where people are willing to work harder to save one impoverished child than fifty impoverished children? And the one where judges give criminals several times more severe punishments on average just before they eat lunch than just after they eat lunch?

And it’s not just neutral biases. We’ve all seen people who approve wars under Republican presidents but are horrified by the injustice and atrocity of wars under Democratic presidents, even if it’s just the same war carried over to a different administration. If we had forced them to put a number on the amount of suffering caused by war before they knew which president’s war was in question, that kind of reversal would be a bit harder to pull off.

Thus is it written: “It’s easy to lie with statistics, but it’s easier to lie without them.”

Some things work okay on System 1 reasoning. Other things work badly. Really really badly. Factor of a hundred badly, if you count the bird experiment.

It’s hard to make a mistake in calculating the utility of living in North Korea that’s off by a factor of a hundred. It’s hard to come up with values that make a war suddenly become okay/abominable when the President changes parties.

Even if your data is completely made up, the way the 5% chance of breaking your computer was made up, the fact that you can apply normal non-made-up arithmetic to these made-up numbers will mean that you will very often still be less wrong than if you had used your considered and thoughtful and phronetic opinion.

On the other hand, it’s pretty easy to accidentally Pascal’s Mug yourself into giving everything you own to a crazy cult, which System 1 is good at avoiding. So it’s nice to have data from both systems.

In cases where we really don’t know what we’re doing, like utilitarianism, you can still make System 1 decisions, but making them with the System 2 data in front of you can change your mind. Like “Yes, do whatever you want here, just be aware that X causes two thousand people to die and Y causes twenty people an amount of pain which, in experiments, was rated about as bad as a stubbed toe”.

And cases where we don’t really know what we’re doing have a wonderful habit of developing into cases where we do know what we’re doing. Like in medicine, people started out with “doctors’ clinical judgment obviously trumps everything, but just in case some doctors forgot to order clinical judgment, let’s make some toy algorithms”. And then people got better and better at crunching numbers and now there are cases where doctors should never use their clinical judgment under any circumstances. I can’t find the article right now, but there are even cases where doctors armed with clinical algorithms consistently do worse than clinical algorithms without doctors. So it looks like at some point the diagnostic algorithm people figured out what they were doing.

I generally support applying made-up models to pretty much any problem possible, just to notice where our intuitions are going wrong and to get a second opinion from a process that has no common sense but also lacks systematic bias (or else has unpredictable, different systematic bias).

This is why I’m disappointed that no one has ever tried expanding the QALY concept to things outside health care before. It’s not that I think it will work. It’s that I think it will fail to work in a different way than our naive opinions fail to work, and we might learn something from it.

EDIT: Edited to include some examples from the comments. I also really like ciphergoth’s quote: “Sometimes pulling numbers out of your arse and using them to make a decision is better than pulling a decision out of your arse.”


65 Responses to If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics

  1. Pingback: Common Responses to Earning to Give | The Centre for Effective Altruism

  2. Pingback: The disappointing rightness of Scott Alexander | The Last Conformer

  3. Douglas Knight says:

The first half of this post seems to me to say that making up numbers and plugging them into a model is better than using your gut, while the second half says merely that you should make up models and numbers and compare them to your gut. This seems to have led to a lot of people ignoring the second half and objecting to the first half. First impressions!

  4. Jack says:

    I wonder if people grok probabilities better if expressed as what you’d bet. I know people are bad at that too, but I often find a lot of the self-justification going away if I ask myself what I’d _really_ bet on something.

    Something like, “if you bet £10 that your computer would be working at the end of the week, how much would you need to win to make the bet worthwhile?”

  5. Bruno Coelho says:

So, when is it better not to make numbers up out of thin air?

    • When the cost of failure is low, and figuring out numbers might cost a relatively large amount of time. Though I’m having trouble putting together a precise example for this, where you could imagine numbers but it wouldn’t be worth your time. I had something in mind about choosing which of two adjacent food carts to buy your burger from for lunch today, but I’m not sure what numbers you could make up if you decided to make up numbers.

• I’d argue that the burden of proof goes the other way. Making up numbers is inherently dubious. The question should be stated “in what sorts of cases can we be reasonably confident that making up numbers will be better than all available alternatives?” Advocates of making up numbers ought to have the burden of showing that such cases can be identified prospectively.

    • Paul Crowley says:

When our intuitions are likely to be good, which is either when it’s a situation we’ve evolved to handle or when it’s one we’ve seen many times and have had an opportunity to train.

  6. Ishmael says:

    > The explanation that Julia and I tried to give the other student was that imperfect information still beats zero information

    But your rationality virgin was not operating with zero information. Here we are with two kinds of imperfect information: the folk “not likely” and the made-up “point oh five.” If you are saying the latter always beats the former, why should I believe you? How many QALYs can I expect to gain by systematically doing it your way and not the folk way?

    I suspect (“p = .7!”) that a good study would not find that patients of Bayesian doctors have better health outcomes than the patients of innumerate ones.

    I wonder (“p = .5!”) whether public health policy was measurably worse before the development of QALYs.

    • Scott Alexander says:

      Yes, I made this point below (“But the typical opponents of utilitarianism have a much stronger point than the guy at the CFAR class. They’re not arguing that utilitarianism fails to outperform zero information, they’re arguing that it fails to outperform our natural intuitive ways of looking at things”) and everything beneath it was a response.

      I think people in Near Mode (including doctors) are probably pretty good at this and people working in Far Mode are terrible.

  7. Deiseach says:

    I have some sympathy with the bratty student, although I agree that the constant “Hey, stop, no” would be irritating. But the trouble is, it’s fine to use made-up numbers when everyone knows they’re just pulled out of the air; once you start making concrete and binding decisions on the basis of invented numbers, though, then you do need to either back it up with “this is how I decided on the number” or admit it’s all hot air.

    Whether or not a computer is likely to break down would depend on is it new, is it old, do you maintain it properly, did you drop it down the stairs last week, etc.

    And the likelihood of being poisoned by a sandwich does depend on whether you bought it from a reputable outlet or not – though, as the horsemeat in the beef products mini-scandal we had over here demonstrated, even brand-name quality products are not the quality you would expect.

    I can see the necessity to do a rough approximation for some things you either can’t quantify or have no evidence one way or the other for yet, but it does strike most people who are not accustomed to thinking in that manner as being weirdly precise over numbers pulled out of thin air.

  8. Paul Crowley says:

    I think this is another opportunity to quote my maxim: it is often better to pull numbers out of your arse and use them to make a decision, than to pull a decision out of your arse.

  9. yli says:

Interestingly, if you just asked them to tell you how sure they are their computer will not break down this month, as a number from 0 to 10, they’d give a number and wouldn’t think they were weirdly being told to make up numbers. But when it’s 0.0 to 1.0, they start thinking you have Asperger’s.

    • James Goulding says:

That may be because 1–10 is a social convention for comparing rough, ordinal utility. If I rate one movie on IMDB 7 and another 6, it’s given that I prefer the former to the latter—and that is all. I wouldn’t expect anyone to start doing precise calculations, multiplying my Shakespeare-time by 0.7 or 0.6 or whatever.

      The 0–1 scale, however, is reserved by most people for technically precise measurements, derived from statistics or experiment. They would think that such a person has Asperger syndrome because he is acting as though human reasoning and human brains are susceptible to the same type of analysis as computer calculations and scientific instruments.

      This is indeed a popular belief amongst nu-rationalists, but I think that the inference from “humans necessarily approximate Bayesian reasoners” to “it is often useful to treat humans as though they were explicit Bayesian reasoners” is a non-sequitur.

    • Army1987 says:

      The normal way to ask is from zero to a hundred per cent.

  10. Randy M says:

    “So this is where I go back to the example of that scope insensitivity experiment where people were willing to spend about the same amount of money to save 2,000 birds as 200,000 birds. And the experiment where people are willing to work harder to save one impoverished child than fifty impoverished children. And the one where judges give criminals two to three times longer sentences on average just before they eat lunch than just after they eat lunch.”

In all these experiments, were people asked to compare two things, or were two groups each asked to give an opinion about a different one? Because if it is the latter, which makes more sense, I don’t see how it would be different with a semi-arbitrary utilitarianism. The probabilities or utilons I naively assign could just as well be influenced by my mood, etc. I think the advantage you see in utilitarianism comes from either giving people a constrained scale (so they can’t be off by orders of magnitude unless they want to go into decimals, which I expect people won’t) or implicitly asking them to order things relative to one another.
    Or maybe that’s all utilitarianism is?

    • I have a notion that the reason questions about charity get that sort of answer is that, when asked the first time (seems to usually be about the smaller number of recipients), respondents say how much they’re willing to contribute to that general sort of problem, which means they’re not likely to contribute more if the problem is then presented as larger.

I’m not sure that “sort of problem” thinking is sound, but in a world where there are many requests for help about ongoing problems, it may not be an especially bad way to think.

      I wonder what happens to hypothetical contribution size if 200,000 birds are asked about first, and then 200 birds are asked about.

  11. If the method of making up numbers can be justified at all (let’s grant so for the sake of the argument), it would be good to have heuristics at least for when it is likely to work and when it is likely to be misleading.

    Anyone know of such heuristics?

    • Randy M says:

      “Even if the number “five percent” was made up (suppose that this is a new kind of computer being used in a new way that cannot be easily compared to longevity data for previous computers) it encodes our knowledge that computers are unlikely to break in any given month. ”

It does sound like Scott thinks making up a number is always superior to not doing so.

      • Well, if so, that’s clearly false. Say we’re reasoning about whether it’s worth buying an extended warranty. You could make up a .05/month chance of breakdown, estimate cost of repair, and decide. Maybe you’d be wrong, but it would be better than if you estimated a .9/month chance of breakdown.

        However, I’d never do that analysis. For my amusement, I’ve sometimes done the inverse model analysis I suggested earlier: what likelihood of breakdown would justify the warranty cost? If it’s .0001, it’s probably worth getting; if it’s .9, it’s not worth getting; if it’s .05 or .02 you’d probably say “I have no idea if the likelihood is greater or less than that”, in which case you might be indifferent. Saying “is your probability guess bigger or smaller than .9” is a lot easier/more meaningful than pulling a number out of thin air.
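        A rough sketch of that inverse analysis, with invented dollar figures (neither number comes from anything above):

        ```python
        # The inverse analysis: instead of guessing the breakdown probability,
        # solve for the probability at which the warranty breaks even.
        # Both dollar figures below are invented for illustration.
        warranty_cost = 150.0   # hypothetical price of the extended warranty
        repair_cost = 600.0     # hypothetical cost of fixing/replacing the machine yourself

        breakeven_p = warranty_cost / repair_cost
        print(f"Worth buying only if P(breakdown over the coverage period) > {breakeven_p:.2f}")
        # Now the only question is the easier one:
        # "is my guess bigger or smaller than 0.25?"
        ```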

        But even that analysis is stupid, because you know you are in a situation of information asymmetry. You know that Dell knows much better than you what the value of the contract is, and they won’t offer it to you unless they make a profit on it.

        So, without making up any numbers at all, the right answer is arrived at by non-Bayesian reasoning.

        • ThrustVectoring says:

          You and Dell have different utility functions. To first approximation, every extra dollar works the same for Dell. You might need to borrow dollars at credit-card rates to deal with repairing or replacing a computer. If the 300th+ dollar is more expensive to you than the first 15, the insurance can be worthwhile.

          • Yes; more generally the details of the contract might make it a positive-sum game in other ways. I implicitly assumed that it was zero-sum, and assumed that readers would also understand it that way because otherwise my example doesn’t make sense.

        • Sorry, meant positive utility for you, as well as positive sum. Haven’t had coffee yet…

    • Scott Alexander says:

As I said above, I’m less interested in when it “works” or “doesn’t work” than in when it will work better than native decision-making processes.

I would suggest making up numbers any time one is suspicious that native decision-making processes will fail so miserably – by so many orders of magnitude – that their errors will dwarf any errors in the numbers you choose. And also when there’s no weird failure of math to model the domain space, like Pascal’s Mugging.

  12. Michael Vassar says:

It seems to me that unlike the case with scope insensitivity, we are more likely to be misled WRT the signs of questions like the utility of life in North Korea or of life in general than WRT the approximate order of magnitude. We might be grossly misled about life in North Korea or about what things are good and what things are bad, but we’re less likely to be misled about the very approximate magnitudes within our numerical model.

    • Scott Alexander says:

      The correct assumption I think the North Korea example encodes is that life in North Korea is significantly worse than life in America.

      I would not myself place the assumption that it is better than death in quite that category of certainty, but it seems like a legitimate result of the survey and seems to correctly codify the natural human belief it derives from, whether or not that natural human belief is right or wrong.

      • Michael Vassar says:

        I’m highly confident that life in North Korea is significantly worse than life in America, but people in North Korea are, I think, generally confident of the opposite. Updating for their belief, I remain highly confident, but I note that I haven’t either looked or engaged them on the topic or really tried hard to look for alternative hypotheses, so my confidence, while high, is going to be bounded by what I see as maximum reasonable confidence for someone with those glaring warning signs, maybe 99.97%, e.g. much less surprising than some other things I have come to believe in over the last decade through personal experience and not so far out there that I have to be grossly mis-calibrated or that I couldn’t be convinced by easily conceivable evidence. More likely even than strong forms of the efficient market hypothesis and some other standard claims by smart academics. I’ll be much more surprised than this many more times.

        • naath says:

          Many people in North Korea are confident enough that North Korea is a very bad place to live that they are willing to risk very serious consequences (death, torture, imprisonment, extreme poverty, massive culture-shock… amongst others) to get out. That doesn’t mean they think the USA is better of course.

Presumably other people in North Korea have bought the propaganda; and still others are having a fine time being part of the ruling class.

  13. Ben Landau-Taylor says:

    This is why I’m disappointed that no one has ever tried expanding the QALY concept to things outside health care before.

    Amartya Sen’s work in general, and the Human Development Index in particular, seems relevant. It’s not quite what you’re looking for since it’s done at the national rather than the individual level, but it uses a lot of similar tools.

  14. James Goulding says:

    Also, as I said here:

    Our black-box reasoner processes a mountain of evidence, not all of which is easy to apprehend consciously or make explicit. The pitfalls, as in mathematical economics, are facile caricatures of complex phenomena and concealed assumptions.

    When I come up with a number like 0.05, I don’t know that this is a very accurate estimate of my brain’s “real” probability. Calibration is often awful, and cannot easily be improved.

    The part of the mind that dredges up this number does not have transparent and facile access to the entirety of my brain’s map and reasoning power; human brains, although they should, don’t work that way. So if in a complex problem I pull out ten or twenty of these numbers and multiply them in Bayes’s theorem, I might compound introspective errors and end up with a final number that is inferior to a direct, “intuitive” estimate.

    This won’t always be the case, but my impression having seen a few explicit Bayesian calculations (outside technical fields) is that these are only worthwhile when the evidence comprises detailed, reliable, highly pertinent and extensive statistics.

    • Michael Vassar says:

      And now, for your third and final reiteration of the refuted position without addressing the refutation at all, you will dazzle and delight us all by supporting your assertions with made-up numbers!

    • Benquo says:

      Calibration is often awful, and cannot easily be improved.

Do you have evidence for this? I thought calibration was one of the few things we actually did understand how to improve, by giving people calibration exercises.

  15. James Goulding says:

    Even if we are wrong by a very large amount (let’s say we’re off by a factor of four and the real number is 20%), if the insight we encoded into the number is sane we’re still doing better than giving no information at all (maybe model this as a random number generator which chooses anything from 0 – 100?)

    Spurious precision can cause inaccurate inferences, where vague, verbal statements would not. If someone says, “My computer has a fairly small chance of breaking down each month”, I understand that he has not investigated the matter thoroughly. If he says, “My computer has a 0.05 probability of breaking down each month”, or “I estimate my computer has a 0.05 probability of breaking down each month” I’d be inclined to think that he got this number from a technical manual, or is otherwise some kind of expert—which in this case would be false.

    This is part of why I respect utilitarianism. Sure, the actual badness of North Korea may not be exactly 37%. But it’s probably not twice as good as living in the First World. Or even 90% as good. But it’s probably not two hundred times worse than death either. There is definitely nonzero information transfer going on here.

    I don’t think we know enough to make sense of statements like “twice as good”. This is, at this stage of human development, an inevitably nebulous class of statement.

    Perhaps one day we’ll be able to quantify hedonic utility, and describe a decision agent’s preference ordering over physical (or mathematical) outcomes according to something resembling a utility function. For now, I think it wise to stick with the Austrian economist’s “ordinal” utility: we can deduce that someone prefers one thing to another, e.g. living in China to living in Korea, because humans in practice do actually choose, or can easily imagine choosing one or the other. On the other hand, “cardinal” utility with numbers is never put into practice and is extremely difficult to introspect about, thus invocations of it are likely to mislead us.

    • Michael Vassar says:

      You fail at reading comprehension.

      • James Goulding says:

        Thank you for your helpful and good faith feedback.

      • James Goulding says:

        Perhaps it were better put this way: not only probability distributions, but A_p distributions are important. Here is E.T. Jaynes on the A_p distribution:

        So, the stability of the robot’s state of mind when it has evidence E is determined, essentially, by the width of the density (A_p|E). There does not appear to be any single number which fully describes this stability. On the other hand, whenever it has accumulated enough evidence so that (A_p|E) is fairly well peaked at some value of p, then the variance of that distribution becomes a pretty good measure of how stable the robot’s state of mind is. The greater amount of previous information it has collected, the narrower its A_p distribution will be, and therefore the harder it will be for any new evidence to change that state of mind.

When someone has a peaky A_p distribution, the social convention is for him to give a probability like 0.5. When he has a broad A_p distribution, it is conventional for him not to pull numbers out of his arse but to be accordingly vague and verbal-ish.

        There are at least two reasons not to throw away this convention:

1. It is memetically evolved, and therefore more trustworthy than someone’s rational design (see: Chesterton’s fence, Hayek).

        2. There is still a need for an (implicitly) understood distinction between the two different A_p distributions, even if the probabilities are the same. If we—and the 99.9% of people who haven’t read PT:TLOS—don’t make the distinction clear by using “0.05” in one case and “rather unlikely” in the other, then what is the realistic alternative?

        I don’t believe Scott addressed this issue, or if so only in an unsatisfyingly oblique way.

      • James Goulding says:

        …although complicated by the fact that where the A_p distribution is *obviously* broad, numbers seem to be OK. 🙂

      • Scott Alexander says:

Please be nice. If you don’t feel like being nice for James’s sake (though I assure you he’s well past my filter of ‘people smart enough I assume they haven’t made an obvious mistake’), be nice for my sake so I don’t have to come up with a formal comment policy on niceness.

    • Army1987 says:

      If he says, “My computer has a 0.05 probability of breaking down each month”, or “I estimate my computer has a 0.05 probability of breaking down each month” I’d be inclined to think that he got this number from a technical manual, or is otherwise some kind of expert—which in this case would be false.

I dunno, 0.05 does sound like an obviously rounded number. (But I’d say “1 in 20”.) What if they said “my computer will break about once in a year and a half on average”?

    • Scott Alexander says:

      I agree that one needs to clearly tag made-up numbers as made-up.

      My point here is that once the inherent errors in non-numerical reasoning become more significant than the inherent errors in making numbers up, then making numbers up becomes more accurate than trying to reason non-numerically.

      For example, in the Bayes mammogram problem, by fiddling with the numbers you can make doctors off by whatever amount you want (for example, if you make the test 99.99% accurate but put the prevalence of the disease involved at .00001%). You can make doctors wrong by a factor of ten thousand if you want. Even if doctors don’t have the exact statistics in front of them, they probably won’t make an estimate of the prevalence of disease that’s wrong by a factor of ten thousand. So in this case, whatever errors they make estimating the prevalence of disease are more than compensated for by their ability to do real math with it.
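      Plugging that extreme hypothetical into Bayes’ theorem (treating “99.99% accurate” as both the sensitivity and one minus the false positive rate, which is an assumption, not something stated above):

      ```python
      # The extreme hypothetical above, run through Bayes' theorem.
      prior = 0.0000001      # 0.00001% prevalence
      sensitivity = 0.9999
      false_positive = 0.0001

      p_disease_given_positive = (sensitivity * prior) / (
          sensitivity * prior + false_positive * (1 - prior)
      )
      print(p_disease_given_positive)   # ~0.001, versus an intuitive answer near 99.99%
      ```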

If the doctor just goes up to the patient and says “Hey, the prevalence of this disease is .00001%,” then yes, they’re claiming accuracy and knowledge they don’t have.

      I agree that one has to be careful not to use made-up numbers in cases where the benefits of making them up do not outweigh the risk in terms of false precision.

      • James Goulding says:

        I agree that one has to be careful not to use made-up numbers in cases where the benefits of making them up do not outweigh the risk in terms of false precision.

        In highly routine cases, where people work with familiar statistics, numerical probabilities could be a really good idea. They are also liable to funge against clear communication of how robust someone’s probability is in the face of new evidence. I guess it’s a minor problem in a lot of cases, but maybe not e.g. in a telephone conversation with a stranger.

Perhaps more seriously, numerical probability and mathematics can be used, intentionally or otherwise, to cloak inaccurate reasoning. Example: professional statistician Helen Joyce’s spectacularly bad inference, revealed from p. 320 onwards of this paper by Neven Sesardic, which I think would stand out far more clearly (to non-savants) were it laid out in English.

        So I’m pessimistic about where this intersection lies, although I do think numerical probabilities are likely to be more accurate and helpful than numerical utilities.

        Komponisto, the relevance of statistics is not that frequentism is correct, but that peaky A_p distributions are usually due to the availability of statistical evidence. Also, CFAR attendees have to interact with the majority of other people who associate probability with frequencies, and that isn’t something one can wish away.

      • I agree that one has to be careful not to use made-up numbers in cases where the benefits of making them up do not outweigh the risk in terms of false precision.

        So, this opens a potentially fruitful area of investigation. Inverting your OP title: “When is it worth doing with made-up statistics?”

Much of the argument in this thread seems to be between those whose intuition is “usually” and those whose intuition is “hardly ever.” Probably all would grant that the right answer is neither “never” nor “always.” Going meta, I would say this is a case where making up a number in between would not be helpful!

        Instead, it could be useful to find heuristics, such as categories of problems, types of available evidence, naivete of reasoners, and so forth.

      • Deiseach says:

        So when do you know that there really is a good chance that you need to have a mammogram? I’m coming up to the age where the advice is contradictory: one lot say women need national screening programmes and every woman from the age of X should have one every Y years; another lot says mammograms are unnecessary and do more harm than good by making women think every lump or bump is potential cancer and false positives are more prevalent than not.

        I’m really not heartened to know that decisions about medical intervention (either pro or con) are made on the basis of “Yeah, we just pulled this number out of the air”.

    • komponisto says:

      If someone says, “My computer has a fairly small chance of breaking down each month”, I understand that he has not investigated the matter thoroughly. If he says, “My computer has a 0.05 probability of breaking down each month”, or “I estimate my computer has a 0.05 probability of breaking down each month” I’d be inclined to think that he got this number from a technical manual, or is otherwise some kind of expert

      This is exactly the kind of inference that people need to stop making. It’s like they’ve never heard of the Bayesian interpretation of probability theory. “Probability means Frequentism” is such a tired trope!

      If you got it from a technical manual, you can just say, “according to my technical manual,…”. Otherwise, it’s you speaking, no one else.

  16. I wonder if people who don’t like making up vaguely plausible probabilities are worried that they’re stabilizing inaccurate ideas.

    • Mary says:

Oh, yes. It can be very dangerous to import more precision than you actually have.

      Not to mention feigning knowledge is not exactly something that people do rarely. And it is an observation as old as Plato that you can delude yourself as well as other people when you start feigning knowledge. Knowledge of one’s ignorance is a rare and precious gift.

    • Creutzer says:

I highly doubt that there is such a high-level thought behind it. The phenomenon isn’t even specific to making up numbers. There is a much worse version of the sandwich thing.

      “All X are Y, do you agree?”
      “Yes.”
      “So, now I tell you that A is an X. So he must be Y, right?”
      “No, I haven’t seen him. He could be Y.”

      People without schooling just suck at abstract, hypothetical reasoning. Hence even the notion of logical consequence is alien to them, and they resist playing along with the game.

  17. Seems like if you are going to make up numbers, you have the responsibility to do a sensitivity analysis to see how your conclusions change depending on assumptions.

    This is particularly true in a decision-theoretic context, where infinitesimal differences in inputs can give discontinuously large differences in outputs. You need to invert the model to find the subspaces of inputs that give rise to different classes of outputs.

    Presumably there’s a word for this inversion in decision theory? I wouldn’t know because I’m not a fan of making up numbers.

    Seems like if you are teaching people to make up numbers, you have a responsibility to do so only with the caveat that it’s OK to do that only if accompanied by the sensitivity analysis.

    • Creutzer says:

      Is there a particular piece of reading that you could point me to so I can figure out what this business of infinitesimally different inputs is about? Right now I can’t incorporate this in my network of ideas.

      • Ben L says:

        Would it be deciding between two different things that are roughly equal, and you give *all* your money to the better one?

      • Let’s say your decision criterion is f(x) > g(y). (Maybe f and g are utility functions.)

        Say f(x) and g(y) are both 5 based on your made-up numbers. Then the criterion says “don’t do it”. However, any increase at all in f(x), however small, will change to that to “do it”.
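        A toy sketch of that discontinuity, using the made-up fives from above:

        ```python
        # Toy version of the discontinuity: the decision flips on an
        # arbitrarily small change to a made-up input.
        def decide(f_x, g_y):
            return "do it" if f_x > g_y else "don't do it"

        print(decide(5.0, 5.0))        # don't do it
        print(decide(5.000001, 5.0))   # do it -- a tiny nudge reverses the decision
        ```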

    • jimrandomh says:

      Made up numbers plus a sensitivity analysis is better than made up numbers alone, but telling people that it’s required would tend to deter them from using numbers at all, which is worse. Since middling numbers like 10% probability tend not to fail too badly, I’d go with a lesser caveat like “be wary of counterintuitive conclusions you reached using any probability below 10^-4, because there’s a higher chance it’s radically wrong”.

      • Mmm. I’d worry that the take-away would be:

        “I took an advanced rationality class, and they said it was totally OK to invent numbers when you are arguing with people who are wrong. There’s a bunch of math that proves it! I didn’t totally understand the details, but those CFAR people sure sound confident.”

        Presumably it depends on what sort of people you are teaching. Or, “things are complicated.”

        You seem to have stated a heuristic: “middling numbers like 10% probability tend not to fail too badly”.

        Any support for that?

      • Mary says:

        Given the amount of innumeracy in this country, I’m not sure that’s a bad thing. I still remember the episode of Cosmos where Sagan conjured one in a thousand as the probability for every element in Drake’s Equation as if it were reasonable. You note it’s much higher than your odds of winning the lottery.

    • ThrustVectoring says:

      Clarifying question: by “invert the model”, you mean “figure out what prior distributions support each answer to the question”, right? So if you assume a 25% chance of a medication working, and that results in not prescribing the medication, “invert the model” would be figuring out what percentage chance of the medication working you’d need in order to prescribe it.

    • Scott Alexander says:

      I agree that doing sensitivity analyses is better than not doing sensitivity analyses, maybe with the same caveat as Jim. I think the goal would be to test whether the expected arbitrariness of your number-making-up process is greater than your expected error from doing things the intuitive way.

    • DanielLC says:

      You can do a simple sensitivity analysis by making up numbers more than once. If the answer changes too much, then don’t rely on the numbers.
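      A rough sketch of what that could look like, reusing the invented warranty figures from earlier in the thread (the “plausible ranges” below are themselves made up):

      ```python
      import random

      # Re-make-up the inputs many times from ranges that seem plausible,
      # and see how often the decision flips.
      random.seed(1)

      def warranty_worth_it(p_breakdown, repair_cost, warranty_cost=150.0):
          return p_breakdown * repair_cost > warranty_cost

      draws = [
          warranty_worth_it(random.uniform(0.05, 0.30), random.uniform(400, 800))
          for _ in range(1000)
      ]
      print(f"'buy the warranty' in {sum(draws) / len(draws):.0%} of made-up-number draws")
      # Near 0% or 100%: the conclusion is robust to how the numbers were made up.
      # Near 50%: don't rely on the numbers.
      ```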

    • DanielLC says:

      That is not what utilitarianism is. Utilitarianism does not say that felicific calculus is how to solve your problems. It says to solve your problems. Deontology says to use certain methods. Virtue Ethics says to do things for certain reasons.

      If you are deciding whether or not to use felicific calculus based on which method will have superior results, you are already a utilitarian. If you decide against felicific calculus based on the idea that it’s heartless, then you are not a utilitarian.