SELF-RECOMMENDING!

# 2015 Predictions: Calibration Results

Many people are making predictions for the new year around now. I see fewer people going back and grading the accuracy of their predictions about last year, even though that obviously has a lot of relevance for how seriously we should take this year’s predictions.

In keeping with the SSC tradition, here are the results for my (late) predictions for 2015. Successful predictions are normal-looking, failed predictions are crossed out:

World Events
1. US will not get involved in any new major war with death toll of > 100 US soldiers: 70%
2. North Korea’s government will survive the year without large civil war/revolt: 95%
3. Greece will not announce it’s leaving the Euro: 60%
3. Neither Russia nor Qatar will lose their World Cups: 80%
4. Ebola will kill fewer people in second half of 2015 than the in first half: 95%
5. No terrorist attack in the USA will kill > 100 people: 90%
6. Assad will remain President of Syria: 70%
7. Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
8. Syria’s civil war will not end this year: 80%
9. ISIS will control less territory than it does right now: 70%
10. ISIS will continue to exist: 80%
11. Iran will reach a deal with the West on nuclear weapons: 80%
12. No major civil war in Middle Eastern country not currently experiencing a major civil war: 90%
13. Iraq’s situation not to get any worse (eg gov’t collapse, new rebellion): 60%
14. Obamacare will survive the year mostly intact: 60%
15. Hillary Clinton will be the top-polling Democratic Presidential candidate: 95%
16. Jeb Bush will be the top-polling Republican candidate: 50%
17. Trans-Pacific Partnership to pass at least mostly intact: 60%
18. US official unemployment rate will be less than 7% in Dec 2015: 95%
19. Bitcoin will end the year higher than $200: 95% 20. Oil will end the year greater than$60 a barrel: 50%

Personal Life
21. SSC will remain active: 95%
22. SSC will get fewer hits in the second half of 2015 than the first half: 60%
23. At least one SSC post in the second half of 2015 will get > 100,000 hits: 70%
24. Shireroth will remain active: 90%
25. I will remain at my same job through the end of 2015: 95%
26. There will be no further ramifications or lawsuits from either side over the flooding of my house: 80%
27. I will reach my savings target: 90%
28. I will get a score at >95th percentile for my year on PRITE: 50% (unknown, haven’t gotten score back)
29. I will be involved in at least one published/accepted-to-publish research paper by the end of 2015: 60%
30. I will not break up with any of my current girlfriends: 80%
31. I will not get any new girlfriends: 50%
32. I will not finish [project]: 60%
33. I will attend NYC Solstice ritual: 80%
34. I will flake out of my plan to lead some kind of Solstice Ritual myself: 60%
35. I will be living in the house I’m currently trying to arrange to rent: 70%

Scoring
Of items I marked as 50% confident, 0 were right and 3 were wrong
Of items I marked as 60% confident, 4 were right and 4 were wrong
Of items I marked as 70% confident, 4 were right and 1 was wrong
Of items I marked as 80% confident, 5 were right and 1 was wrong
Of items I marked as 90% confident, 4 were right and 0 were wrong
Of items I marked as 95% confident, 7 were right and 0 were wrong

As usual, the dotted line represents perfect calibration; the closer my blue line comes to that, the better I’m doing.

The graph looks like there’s a massive failure at 50%, but this is just an artifact of very few questions at that level. If I’d gotten just one more right, I would be at 33%, ie as close to 50% as it’s possible to get with a set of three. Given that the difference between total success and total failure was just one question, I don’t feel too bad about total failure. Everything else looks pretty good. I’m prepared to call this another successful year.

A side note: Scott Adams has also graded his predictions from the past year, and reports incredible success: 9/9 correct despite going way out on a limb and saying things everyone else found really unlikely (like that Trump would stay Republican front-runner). The obvious way to accomplish like that is to make lots of things that vaguely sound like predictions, then only highlight and count the ones that end up correct; after a quick scan of Adams’ blog, there’s no sign that he’s doing this; his win seems pretty genuine. Another method might be to make vague predictions and grade them in your favor, and there is some sign of this – for example, someone going off Clinton’s poll numbers versus Sanders could say they are in fact still going up. Nevertheless, it’s obviously been a good year for Adams, and I’d be fascinated to see him make a list of official concrete predictions for 2016, all in one place, maybe even associated with confidence levels. [EDIT: Someone else’s more pessimistic analysis; this makes it more pressing that he do everything beforehand this time]

This entry was posted in Uncategorized and tagged . Bookmark the permalink.

### 173 Responses to 2015 Predictions: Calibration Results

1. Ted Sanders says:

Couldn’t your 50% answers be equivalently considered all correct? I’m not sure what to think about the implied difference between the declared and undeclared sides of a 50:50 binary proposition.

• Douglas Knight says:

He does have some flexibility in choosing which side he reports positively and you could imagine that there are patterns. Well, I used to, but now I’ve forgotten. In the opposite direction, where people do not choose the phrasing, you could imagine that the phrasing biases them and makes them systematically over- or under-confident at 50%. Indeed, I believe that people on predictionbook.com have very large deviations from 50%.

Scott could solve this problem by eliminating the 50% bucket.

• Scott Alexander says:

Doesn’t matter. Whether I grade myself as 100% or 0%, I’m still equally far away from the 50% where I want to be.

• Factitious says:

If “I will not get any new girlfriends” had been “I will get at least one new girlfriend” instead, would you still have given it 50/50 odds? If so, that rephrasing would have brought you up to 33%. It’s odd for the difference between total failure and total success to be the difference between thinking there’s a 50% chance of P, and thinking there’s a 50% chance of ~P. Although I can understand this pretty clearly when imagining someone giving thousands of 50% judgments, all for predictions that end up coming true.

Is the failure interpretation that you may be looking at a claim, judging it to be less likely than not, but rounding that up to 50% without realizing that’s what you’re doing? I wonder if using the negations of the 50% statements would have avoided this; it’s not obvious to me that there would have been an exactly symmetric process of judging it to be likely on an unconscious level and then under-reporting that at 50%.

Of course, putting both P and ~P on your list doesn’t make much sense, because you’d see the easy way to boost your record on 50% judgments. And if there was an effect where your assessments on them individually didn’t add to 100%, you’d notice that as a mistake and correct it when assigning the probabilities. Are there safe and reliable ways to induce amnesia?

• HeelBearCub says:

If “I will not get any new girlfriends” had been “I will get at least one new girlfriend” instead, would you still have given it 50/50 odds?

This seems like the absolutely key point. These are predictions that come out correct or incorrect based on phrasing. It seems like a very large flaw for any question that is truly binary.

The cost of oil seems to fall into this trap as well.

The Bush question isn’t really binary. There are many possible candidates who could lead the polling. The negation prediction (choosing the whole field over Bush) seems more problematic, though.

• Scott Alexander says:

Hmm…that makes sense. So how would I make predictions at 50% without having that problem?

• Zach Pruckowski says:

Don’t make 50% binary predictions, make 51% (or 50.1% or 50.01%) binary predictions. The fact that you’re defaulting to phrasing it the way you are is likely evidence that you believe that side of the “bet” to be marginally more likely anyhow, making a 50.01% prediction instead of a 50% prediction just formalizes that.

• Make the prediction in the form:

There is a .5 probability that event X will happen.

Similarly for Y and Z

Then report, at the end, what fraction of your .5 probability events occurred.

No need to put in terms of a prediction failing or succeeding.

Does that create problems with your general scheme for evaluating predictions?

• Johannes says:

This does not make sense, I don’t think there is anything special going on at 50%.

Yes, the phrasing is arbitrary, but after Scott has made his choice, he’s made his choice (think of the positive or negative phrasing as being chosen by 50% chance as well). Evaluating the predictions is straightforward, same as in every other case, and it’ll give the right value for Scott’s calibration. (Of course one is not allowed to change the phrasings ex-post, that would be cheating)

• HeelBearCub says:

Think of making a true binary prediction of the form: “The flipped coin will land on heads: 50%”

What is this? It’s a prediction that the coin is fair.

Predicting truly binary events at 50% isn’t useful for calibration because each coin you are flipping is unique. You can’t know that the coin is fair.

• Chrysophylax says:

As I said in a different comment, the problem isn’t the 50% predictions, it’s how you’re rating your accuracy. Use prob(P)/prob(~P) for all confidence levels (comparing to observed P/~P) and your problem goes away.

• HeelBearCub says:

@Chrysophylax:

I don’t believe it does.

Imagine a large pile of coins, each with two sides Each coin is marked with a unique set of letters, identifying the individual coin, the same on bother sides.. It is also marked with a unique picture on each side.

So, one coin might be AA, the sides being hammer and screwdriver. Another might be BF, the sides being apple and orange.

You carefully go through the pile and make a prediction for each coin, AA through ZZ, with 50% confidence. You then flip all 26^2 coins once.

Even if every coin was 100% biased, but undetectable, you would still expect to get a 50% match rate.

• Johannes says:

@HeelBearCub

Your labeled coin example is convincing, but I still feel confused… The construction doesn’t work for 50.01% predictions, right, so these are fine? But say that Scott made just the same predictions, but assigned 50.01% instead of 50% — the graph and the conclusions would be (near) identical, but this time valid, or what?

• Troy says:

Imagine a large pile of coins, each with two sides Each coin is marked with a unique set of letters, identifying the individual coin, the same on bother sides.. It is also marked with a unique picture on each side.

So, one coin might be AA, the sides being hammer and screwdriver. Another might be BF, the sides being apple and orange.

You carefully go through the pile and make a prediction for each coin, AA through ZZ, with 50% confidence. You then flip all 26^2 coins once.

Yes — it tells you that you are properly following the Principle of Indifference. If you had assigned a different probability, you would probably be ill-calibrated.

• HeelBearCub says:

@Johannes:
I think the issue with a 50.01% prediction is that the number of those predictions (of binary events) you would need to make before the result became significant would be very, very large.

People are concentrating on 50%, but that is not the correct way to look at this.

Imagine finding every primary contest that had three candidates registered. Say there 1000 such contests. The day before the election, you randomly pick one of the three candidates in each election and say you are 1/3 certain that said candidate will win. You would expect your result to indicate you were “well calibrated” but it does not indicate anything about your predictive abilities.

The problem is predicting 1/n certainty when there are n discrete outcomes.

• HeelBearCub says:

@Troy:
Yes, I am giving examples of the Principle of Indifference. The key point is that the Principal of Indifference applies when you have no reason to pick between the outcomes. Applying the POI means you are saying “I have no information about the probabilities of this set of possible outcomes other than than the enumerated set itself”.

• Johannes says:

@HeelBearCub
Right, I agree that predicting 1/n in the case of n discrete outcomes is the same thing.

But it’s still very surprising to me that predicting 1/n + epsilon would somehow be qualitatively different from predicting 1/n — lets just assume that the sample size is huge, so significance is not an issue. This seems like a very unexpected kind of discontinuity to me, where everything is well defined near 1/n, but becomes degenerate at exactly 1/n.

• Johannes says:

In any case, this “principle of indifference” reasoning seems not apply in any of Scott’s three 50% predictions: Jeb Bush is only one of many candidates, and the number of new girlfriends could have been 0,1,2,…

Finally, for the price of oil, the futures price (at the time of the prediction) will tell us if Scotts > $60 prediction was “non-trivial” or not (EDIT: I saw now that the price was almost exactly at$60 when Scott made his prediction, so I guess this one does qualify for principe of indifference-reasoning)

• Bassicallyboss says:

I don’t know how you generate your predictions, but I’m assuming you generate the prediction first, and then assign the confidence. In that case, it might be as simple as reflecting on the prediction’s inverse when you are about to assign 50% probability. “It looks like I think the chances of [x] are 1/2. Do I really think the chances of [~x] are 1/2?”

Especially if it’s about matters that you might have a strong preference for, like acquiring new girlfriends.

• Anonymaus says:

The problem with 50% predictions is that for every prediction p you could flip a coin to decide whether to predict p or ~p. This would get you close enough to 50% in the long run, with no actual predicting going on. As a side-effect, nobody could prove that you were less than maximally ignorant about the predictions that you were making. (You can cheat at confidence levels >50%, too, but for that your a-priori predictions need to be accurate.)

You can “score” yourself by trying to maximize the sum of log(assigned probability of event that ended up happening). If you give an event and its complement probability 0.5, you end up getting the same score no matter what happens. You end up wasting the opportunity to earn more (expected) points that you could have gotten by assigning non-0.5 probabilities to something you are actually less than completely ignorant about.

There is nothing wrong with 50% confidence predictions, except that you might want to ask yourself “Do I maybe have more information about the outcome in question?” — but this is the question you should ask yourself about every one of your predictions.

• HeelBearCub says:

@Johannes:
I don’t think there is a discontinuity.

The limit for the number of trials required to achieve statistical significance on your calibration approaches infinity as the difference between the prediction and indifference approaches zero.

I agree on your assessment of Bush and the price of oil. I think his prediction for girlfriends takes the same form as the price of oil. He predicted no rise from the current number.

• HeelBearCub says:

I do think there is something going on with the girlfriend prediction that involves status quo.

If you looked at all of those coins as the sat on a tabletop (knowing what side was showing, and knowing what side was down) making a prediction there at 50% confidence that the coin would spontaneously change sides would be weird. But of course if you predicted all of them randomly (flip spontaneously, don’t flip) at 50% confidence, you would still be at 50% right. That seems very wrong.

Hmmmm.

• Jeremy says:

I don’t see a problem here. 50% confidence = I have no idea and should be used in that circumstance.

Flipping the phrasing could be problematic if you did it selectively to deliberately manufacture a 50% success rate, but that would defeat the purpose of self-calibration. If you are doing it to claim great calibration then, great, you can convince people that you are indeed as ignorant as you claim.

• HeelBearCub says:

@Jeremy:
The problem is that you don’t get to repeat trials of the same prediction, and therefore you aren’t calibrating anything. Again, you should get a “perfect” calibration score by predicting these types of questions randomly.

• Jeremy says:

Sorry I think I edited my post after you replied and removed the 100 trials bit. But I still don’t see an issue with what you’re saying. If you’re predicting randomly and you get 50% that’s fine. That’s exactly what 50% means. Pure ignorance, pure coin flip, etc. You aren’t going to get any more meaningful information than that you had no idea about the outcome. 50% is not a prediction, it’s an admission of ignorance.

• Emp says:

Don’t think you can at all.

As some comments below demonstrate, you can generate perfect accuracy at 50% confidence threshold (over a large sample) simply by gaming how you phrase the predictions, irrespective of what it is you are predicting.

This issue doesn’t exist once you move to say, 80 or 90 where you actually need to know something about whatever you’re predicting.

I do think saying ‘there is a 50% chance X will happen’ is the exact formal equivalent of saying “I have no idea at all whether this will happen”.

When phrased this way, I think it’s clearer that trying to grade the quality of your ignorance is meaningless.

• Johannes says:

@HeelBearCub
Yeah ok you are right, there is no discontinuity.

I don’t see how there is anything special with the “status quo” of the number of girlfriends. Let’s say that new girlfriends arrive according to a Poisson process with intensity lambda / year. Then the random variable X = # new girlfriends between 13 Jun and 31 Dec is Poisson distributed with parameter 0.55*lambda, so P(X == 0) = exp(-0.55*lambda). So Scott’s prediction was tantamount to him estimating lamda at around 1.2 (given that one accepts the Poisson model, of course).

@Emp
“saying ‘there is a 50% chance X will happen’ is the exact formal equivalent of saying ‘I have no idea at all whether this will happen'”

This is certainly not true in general — e.g. if Scott had predicted that price of oil > \$100 with 50% confidence, this would have been a very bold prediction, and if that prediction truly was not overconfident, it would imply great predictive power that would be very easy to monetize.

• Emp says:

Johannes: Not so.

It WOULD be a bold statement.

HOWEVER.

Saying “I have no idea if the U.S.A. will exist in 2018 is also a bold statement” (because it seems obvious that it will)

Saying “X will be the case” (50% confidence) is LITERALLY the same thing as saying “I have no idea whether or not X will be the case”.

That sort of statement may or may not be bold or unlikely, but a 50% confidence interval is the exact same thing as saying “I have no idea”.

Edit: And my claim is that trying to assess or calibrate the claim “I don’t know” is pretty absurd.

• HeelBearCub says:

@johannes:
I think what I am getting with the “status quo” statement is that for some binary type predictions that are predicting confidence about a change from the status quo (i.e. things will/won’t continue as they are right now), the confidence level for “things continue as they are” should be fairly high.

IOW, if you look at a bunch of coins lying on a table, your prediction for a minute from now should be fairly confident that they will not flip from one side to the other.

The price of a commodity is volatile, so it doesn’t fall into this. Perhaps what I am pointing at is that, if we have knowledge about whether a current status quo is not typically volatile, then predicting a 50% confidence of change is actually saying something.

Status quo on number of girlfriends is far less volatile than the price of oil. So it seems like their might be some “there” there.

But as I point out, if you completely ignore this and predict random changes from the current state with 50% confidence, then you are still likely to achieve a very good calibration score, which seems to show some limits to what you can accomplish with this approach.

• Chrysophylax says:

@HeelBearCub:

You’re confusing accuracy and calibration. Alice and Carol make predictions about a certain set of outcomes. Accurate Alice predicts outcomes with certainty (P(A)=1 or P(A)=0) and is right 95% of the time. Calibrated Carol predicts outcomes with probability 60% (P(A)=0.6 or P(A)=0.4) and is right 60% of the time. Accuracy is about being right often; calibration is about knowing how confident you should be that your predictions will come true.

Your coins thought experiment is flawed because *calibration is about knowing that coin flips are fifty-fifty*. Imagine a person who has been raised to have an utter, honest faith – a true belief, not just a professed belief – that flipped coins always land on their edges. Imagine their shock when 676 coin flips prove that coins actually come down on their sides about 99.75% of the time. That’s what learning that you are seriously miscalibrated feels like – the shock of discovering that a certainty is not certain, or that an unpredictable thing can in fact be predicted.

Your thought experiment is asking how many coin flips I predict correctly – how accurate I am. What you ought to be asking is how confident I am that my predictions will come true, *compared* to the rate at which they actually come true – how calibrated I am.

Try testing my method! You’ll see that it works. The reason that it’s the only suggestion so far that actually fixes the problem is that flipping A and ~A changes the buckets the predictions go into. If more than 60% of my P=0.6 predictions came true, I can hide it by changing some of the true predictions into P=0.4 predictions of the opposite outcome, which will lower my hit rate in the P=0.6 bucket but raise it in the P=0.4 bucket. The special feature of the P=0.5 bucket is that it is it’s own complement! My method tracks complementary buckets together, ruling out trickery with flipped predictions.

• Chrysophylax says:

@Emp:

I’m afraid you’re totally wrong about the equivalence of “P(A)=50%” and “I have no idea whether A will happen or not”. The equivalent of the latter is called a maximum entropy probability distribution.

The former is a very precise statement of exactly how uncertain I am. The equivalent statement would be something like “If you looked across all possible worlds where this truth claim would be meaningful, I expect it to be true in exactly half of them” or “Of the set of *independent* predictions I have made about which I feel exactly this level of confidence, I expect exactly half to prove true”.

• RCF says:

@Chrysophylax

Can explain how you think Emp is “wrong”?

• James Picone says:

@RCF:
Not Chrysophylax, but I think what he’s getting at is that if you say “Probability I roll a 6 on this d6: 50%” you are saying that you know something about what you’re going to roll.

• Chrysophylax says:

@RCF, @ James Picone:

James has understood correctly. P(A)=X is a strictly stonger claim than 0<P(A)<1. You need extra information to make the stronger claim. A person with no information can't justifiably make any claim stronger than "P(A) is a probability". The equivalent for a distribution, where something could take any of a set of values and you have now information allowing you to justifiably declare some more likely than others, is a maximum entropy distribution.

If you think that P(A)=0.5, you think A and ~A are equally likely. This is not the same as having no idea what P(A) is.

This is all conceptually much easier if you think in terms of the evidence you need to get from maximum entropy to your posterior. If you have absolutely minimal information, your prior and posterior are both the appropriate maximum entropy distribution. Every step away from that requires evidence, either implicit in a less-than-maximum-entropy prior or explicit in your updates. "This event seems like it could go either way" requires evidence that lets us concentrate probability mass around P=0.5.

• Emp says:

@Chrysophlax:

You’re confusing the semantics and thus still non-responsive. This is a great example of a lot of technical knowledge coupled with the wrong questions.

I’m not saying you have no idea what P (A) is if you say P (A)=50%

What I am saying is if you say P (A)=50% that means you have no idea whether or not A will happen and can do no better than guessing randomly.

I really wish you would respond to this obvious core fact instead of twisting the semantics subtly and responding to a different, obviously wrong claim.

• Johannes says:

@Emp

Ok, but then you go on to say that:
“And my claim is that trying to assess or calibrate the claim ‘I don’t know’ is pretty absurd.”

But it’s not absurd: e.g. if you said “I predict rolling a 6, confidence 50%” before every roll with a (fair) die, and after a number of experiments graded yourself, it would soon become evident that your predictions are wildly overconfident!

• Emp says:

@Johannes:

Wow. This really does seem to be less obvious for many than I thought.

You’re all missing the same distinction.

50% probability isn’t an indication of confidence level. It’s a statement to the effect “I don’t know”.

Someone saying 50% on a dice roll is incredibly ignorant, not ‘over-confident’. When the claim itself is that nothing you know makes it more or less likely than chance that the event will happen, that’s an admission you don’t know anything.

This is what you sound like.

You: What is the probability Abhijit Gupta beats Magnus Carlsen in a chess game?

Me: (No idea who these people are) 50%

You: That’s wildly overconfident. If you analyze their ELO ratings there is only a 5% chance of that happening.

Do you see what’s problematic with this interaction? Admitting a lack of knowledge on the VERY specific question (you and Chrysophlax are making the mistake of confusing a 50% prediction with knowledge about a broader question or topic) being asked inherently means you don’t know.

Other than on isolated things like coin tosses, making an assertion with 50% probability is NOT the same thing as asserting with 100% certainty that the true probability of that thing is 50%.

• timujin says:

It doesn’t make sense. Grading yourself on accuracy of 50/50 predictions doesn’t make sense in any case, because it depends not on what actually happens, but on how you formulate the predictions.

Your grade will always be purely random; there is nothing you could have done in advance to improve the expectation of your score (short of learning more about the topic and moving your prediction closer to extreme values).

The score is not how many of the 50% predictions come true, but how close the ratio of true predictions is to 50%. So if 50% are true, that’s perfect success. If either 0% or 100% are true, that’s total failure, since either of these values are at a maximum distance from 50%.

• Alex Z says:

Right, but “There is a 50% chance that I will not find an extra girlfriend” is literally equivalent to “There is a 50% chance that I will find at least 1 extra girlfriend.” So the score depends on the irrelevant phrasing of the predictions, not the actual prediction. If he got 50% of those predictions correct, he could simply rephrase the predictions to be 100% or 0% correct.

• HeelBearCub says:

More to the point, he could have rephrased some of them and changed his accuracy to 68% or 33%.

• Nornagest says:

As long as the wording stays consistent between a year ago and now, I can’t see how it matters. Yes, a 50% prediction of A generates an implicit 50% prediction of !A, but you shouldn’t be scoring yourself on that.

This complaint doesn’t make any sense. Of course you can formulate predictions like these in two equivalent ways, but first of all there isn’t any problem with this, as Nornagest has pointed out, and secondly this supposed problem is not specific to 50% predictions. For any event A and probability p, the statement “The probability of A happening is p” is equivalent to “The probability of not-A happening is 1-p.” For each case, Scott can arbitrarily chose which of these formulations to adopt. In the cases where p!=0.5, he has decided to always use the first formulation in case of p>0.5 and the second in case of p<0.5, and that decision is no less arbitrary than his decision on which formulation to use in cases where p=0.5.

• Jiro says:

I suspect there is some way to compare the degree to which event X confirms Scott’s prediction S, so that if Scott predicts 95%, S is confirmed strongly, if he predicts 55%, S is confirmed weakly, down to 50% where S is not confirmed at all. However, I’m not a mathematician and would have no idea exactly how to calculate this. It would require some concept of having a statement be partially correct and conditional probabilities based on partially correct statements and I don’t know how to do that.

• Chrysophylax says:

The difference comes when you flip the wording of some but not all of the predictions. If you predict P, Q and R all at 50% and are wrong in all three cases, ~P, ~Q, ~R will show just as poor calibration, but ~P, Q, R (et cetera) will give you a better score.

In general, if you make N predictions at 50% each, you can get your score to be exactly 50% if N is even or 50% +- (1/2N)% if N is odd, entirely by flipping P and ~P.

The problem isn’t that you’re making 50% predictions – if it were, we would have a giant mathematical problem! The problem is how you’re scoring yourself. If you used ratios of prob(P)/prob(~P), it wouldn’t be possible to game the scoring mechanism by permuting your explicit and implicit predictions.

• meh says:

I don’t understand how you an be graded at the 50% level anyway. If you flip a coin to decide which side to take on a bunch of predictions, you should be on average correct 50% of the time. So even if you are a perfect 50% on your 50% guesses, this does not mean you are properly calibrated at the 50% level.

This doesn’t seem to be a problem at other levels though… or am I missing something?

Your grade for a prediction of probability p is how close the ratio of “correct” predictions is to p. The situation for p=0.5 is exactly the same as for any other probability, since there are always two equivalent formulations, namely “I predict A with probability p,” and “I predict not-A with probability 1-p.”

• Emp says:

Nope. 50% confidence predictions are the same thing as saying “I don’t know”.

Measuring the accuracy of a guy who says “I don’t know” is patently absurd.

• hnau says:

There’s a trivially easy way to get near-perfect calibration at 50%: choose 100 arbitrary predictions and prepend “it will not be the case that” to each of them with 50% probability. So I’m having a hard time seeing how a 50% confident prediction could mean very much.

I suppose from a Bayesian pespective, assigning a 50% probability to something that the consensus is much closer to 0% or 100% on would be a meaningful prediction. But I’m not sure how one would evaluate that.

• meh says:

Ok, so I think something you can do with the 50% predictions is to compare them to some consensus or prediction market, and phrase them such that they are correct only if they are against the consensus opinion.

• jbay says:

Hi Scott,

I wrote a response to this post in the form of a Less Wrong discussion post, focusing on the issue of predictions at 50% confidence:

I hope you don’t mind.

2. stargirl says:

If you enter a 50% should only affect your accuracy score not your calibration score.

3. SolveIt says:

It looks like you’re wildly underconfident in your predictions for world events and wildly overconfident for personal life.

• onyomi says:

Seems so, and probably applies to many, if not most people. People probably overestimate the amount of control we have over the directions our lives will take in general, I would guess. But it’s interesting here to have some kind of objective confirmation of such, even with a sample size of 1.

4. name says:

Clinton’s poll numbers: shouldn’t one be going off poll averages, which are showing her at around 55% since November? (That poll has her at 50%). Maybe I misunderstand what top in her poll numbers means? Her lead also hasn’t increased from what I see.

• Zach Pruckowski says:

Clinton is leading in the polls nationwide and in 3/4 of the primary states, and is very likely to be the nominee. While technically vague (in that were he more politically inclined he’d have specified the exact metric), the prediction is almost certainly correct unless you extend it to “leading in every early primary state” or something.

5. uuu says:

>17. Trans-Pacific Partnership to pass at least mostly intact:
Hold it … agreement was reached. Isn’t it given that it is going to pass? (Or is it how did you define mostly intact)

• HeelBearCub says:

It has not failed or passed, seems like an incomplete to me.

I took pass to mean “gets ratified before the end of the year”, which hasn’t happened.

I’d say it’s not a given that it’ll pass congress, it’s bloody ugly and if someone wants to make political hay out of that before the election they can. It’s likely, but not given.

6. Luke G. says:

Scott Adams is starting to freak me out with this Trump stuff.

• Calo Cola says:

It’s odd, but out of the current front runners on the republican side, im pro-trump.

Carson…is a reincarnation of huckabee with an M.D, though I wonder how many educated neurologists are creationists.

Cruz…espouses beliefs which are ideological opposites of what seem to be Mr. Adams political alignment, and there’s reason to belief that he would act in favor of those during the presidency.

Trump…out of the three front runners…does not do those things.

Trump is oddly enough, the most “moderate” out of the three, and for one with a liberal orientation, could be supported.

• Virbie says:

> out of the three front runners

? Rubio is in the top 3, and has been every day since October (barring a 2-day period in December where Carson and Cruz crossed over each other).

http://www.realclearpolitics.com/epolls/2016/president/us/2016_republican_presidential_nomination-3823.html

• Julie K says:

> Trump is oddly enough, the most “moderate” out of the three

It’s a common mistake to assume that one can gauge how conservative a Republican politician is based on how much the other side hates him. (E.g., people often assume that Nixon and George W. Bush were very conservative.)

• Le Maistre Chat says:

Nixon is an interesting case. I’d say he was a moderate right-liberal on economics (to the right of a left-liberal anti-Communist like Kennedy, but still a Keynesian rather than a libertarian) and mildly socially conservative in ways that only make sense in the context of being raised evangelical and receiving higher education in the era before the Left achieved academic hegemony (his racism was fairly clearly Prof. Carleton Coon’s).

• “but still a Keynesian rather than a libertarian”

That’s an odd way of putting it. Keynesianism is a theory about macro-economics, not about what governments should do. There is no reason someone could not be a Keynesian and a libertarian, or a monetarist and a socialist.

Imagine someone who believes in the limited government of classical liberalism–which still collects taxes and spends money for police, courts and defense. He also believes in Keynes’ analysis of the causes of involuntary unemployment. So he wants his classical liberal government to run deficits when unemployment his high, surpluses when it is low. No inconsistency, and more libertarian than most people, even if short of anarcho-capitalist.

I have no idea what Nixon’s theory of macro was. I think it’s pretty clear that he knew price controls were a bad idea, but imposed them for political reasons.

• Le Maistre Chat says:

@DavidFriedman:

Imagine someone who believes in the limited government of classical liberalism–which still collects taxes and spends money for police, courts and defense. He also believes in Keynes’ analysis of the causes of involuntary unemployment. So he wants his classical liberal government to run deficits when unemployment his high, surpluses when it is low. No inconsistency, and more libertarian than most people, even if short of anarcho-capitalist.

You are certainly correct. I was just thinking of how his macro theory seems to be correlated with left-liberalism, whereas your father and his monetarism are perceived as on the right, and the Austrian theory as outside the mainstream.

• John Schilling says:

A Keynesian can put his theory to practical use only to the extent that his government A: collects economically significant amounts of money and B: spend a lot of it on things that could plausibly be curtailed when the theory calls for it. If you’re only collecting say 10% of the GDP in taxes, and spending it on things like courts and national defense, that gives limited scope for a Keynsian to do anything. If you’re taking 50% of the GDP in taxes and huge sectors of industry have been nationalized, one can do a lot more Keynesian-type stuff, for better or for worse.

So while David is right to say that a Libertarian can be a Keynesian, it’s not surprising that there’s a correlation between Keynesianism and socialist liberalism. Particularly at the level of openly advocating or implementing Keynesian policies, as opposed to privately holding it as a true theory of limited relevance to a minarchist government.

• Le Maistre Chat says:

Carson…is a reincarnation of huckabee with an M.D, though I wonder how many educated neurologists are creationists.

I have 95% confidence that more educated neurologists are creationists than uneducated neurologists are. Indeed, there are probably even more educated creationist neurologists than uneducated evolutionist neurologists.

• E. Harding says:

• But correct and funny.

On a more serious note, it’s worth remembering that one of the greatest scientists who ever lived was not merely a creationist—back when everyone was—but a religious nut by the standards of his own society.

A religious scientist who doesn’t understand biology is no more surprising than an atheist scientist who doesn’t understand economics.

• Le Maistre Chat says:

@DavidFriedman: While I am generally in favor of “great books” education, the specialized curriculum for MDs is already so long and challenging that I don’t think they should be forced to get that sort of general education too. I’ve read Wealth of Nations and Origin of Species and found the logical arguments persuasive, but when I need to see a doctor I only care about his ability to apply logic and evidence to making me well. If he’s a creationist who thinks international trade just steals surplus value from the poorer country’s laborers, oh well.

7. anon says:

Scott Adam’s predictions are not independent of each other, so I would be cautious about crowning him an infallible forecaster.

http://www.paulgraham.com/charisma.html

• Anonymous says:

The difference Graham’s theory and Adams’ theory is that Graham says that charisma is what makes the difference after policies have been carefully chosen to split the electorate, whereas Adams says that there is something called being a Master Persuader that gives people the power to convince a large part of the electorate that an idea or policy is correct.

Trump will be a test between the two as he is clearly not following the Republican establishment line on policy, so will probably only win if Adams is right rather than Graham.

(Although now that I think about it, Adams did say that part of the Master Persuader thing is basing your statements on what people are secretly thinking, and that you can’t just totally make stuff up, so perhaps the two models actually aren’t entirely different).

• eponymous says:

The other difference is that Scott Adams used his theory to predict something that almost nobody else predicted (Trump’s rise in the polls and continued success).

I guess you could say that Graham’s theory performed well in 2008 and 2012 (Obama was clearly more charismatic than McCain or Romney). I’m not sure it does so well in primaries though.

I’m also not sure that the theories are actually different. “Master persuader” and “Charismatic” seem like different words for the same thing.

• Calo Cola says:

>Obama won

He was far more charismatic.

> Romney won (he would not have, but)

Its all about that gumption on da campaign trail. Everyone just sensed it.

You need much much more specific words then a vague “charisma”

• HeelBearCub says:

@anon:
The problem with only looking at the end results of the process, and then deciding who had more charisma, seems fairly obvious to me.

First off, it fails to distinguish Charisma from “things are going well for my party”. Was George H. W. Bush inherently more charismatic than Dukakis? Would Nixon have won the presidency in 1968 if Wallace didn’t run as an independent?

But it also doesn’t grapple with primary results:
Why didn’t Reagan take the primary in 1972? Why did Mondale beat the primary field in 1984? Why did Gore win the primary in 2000? Why did Howard Dean falter vs Kerry in 2004?

This seems very “hindsight is 20/20”, “fit the model to the data and the data to the model”.

• onyomi says:

“Why did Howard Dean falter vs Kerry in 2004?”

Because he went “yeeeAAH!”

• HeelBearCub says:

True.

But Kerry didn’t become more charismatic than Dean via that one event. Rather, it shows the limits of a charisma only formulation

• Calo Cola says:

Define charisma, and split into sub-traits for any real predictive power.

As far as I can tell in general, this charisma element strongly correlates with a blend of four traits. Youthful looks, bodily aesthetics, height, sportiness(cage match president hypothesis) and general good looks(different from youth)

….all of these bode so so so horribly for sanders. And the problem with Clinton is that she faces an additional critique of dress up too much, don’t dress up enough.

You could probably make a combo blend of looks, height, cage-fight, add an additional weight to a powerful look in “against terror” times, and predict just from there.

• I don’t think any of the things in your list are what I see as charisma. It’s hard to define, but you feel it as the perception “this person is very important.”

8. anon85 says:

I’m a bit skeptical of this success in calibration – it seems pretty easy to achieve (especially when you have so few questions that the difference between 0% and 50% or between 90% and 100% is undetectable). What happens if you ask a reasonably well-educated but non-trained person to make such predictions? Do they do significantly worse?

Anyway, happy new year!

• Anonymous says:

>What happens if you ask a reasonably well-educated but non-trained person to make such predictions? Do they do significantly worse?

Based on what I’ve seen elsewhere in articles talking about calibration, most untrained people are very overconfident and so will do much worse than Scott did here.

• Joe Teicher says:

In those studies do people get to predict whatever they want or are they told what to predict? Getting to pick what you predict makes it a lot easier to be well calibrated.

9. Mair says:

“Of items I marked as 90% confident, 4 were right and 0 were wrong
Of items I marked as 90% confident, 7 were right and 0 were wrong”

Um, whit?

• Rachael says:

The second 90 is a typo for 95.

10. I predict my rationality will be bounded

11. Adams success is not compelling. His claim that the “low energy” shot killed Bush doesn’t even make sense. Bush was having serious trouble with his poll numbers before those comments, and I’m not sure what being killed even means without a number behind it. Note that the obvious definition of killed would be dropping out of the race, which hasn’t happened. Similar remarks apply to some of the other predictions. Overall, I’m not terribly persuaded.

12. Josiah says:

Scott Adams first “correct” prediction is that Trump would win the Republican nomination. Did that happen and I missed it?

• BD Sixsmith says:

Welcome to the year 2017. A lot of things have changed while you have been asleep…

13. JohannesD says:

5. No terrorist attack in the USA will kill > 100 people: 90%

Given the recent events, I can’t help but wonder what your confidence would have been if this prediction had been about the whole West, not just the USA.

• Wrong Species says:

Europe has a far higher rate of muslims than the US so it makes sense that he wouldn’t give a high confidence to his prediction coming true in Europe. 70% still seems pretty high to me though, even if it’s only the US.

This seems like it’s probably true in 9 out of 10 years in the U.S. over historically similar times. Isn’t that what 90% means? He’s eventually going to be wrong one year, but only one year out of ten.

• Exa says:

Only if you’re a frequentist. I’m pretty sure Scott’s at least Bayesian-aligned, so 90% means “given all the information available to me, I think that the probability that I will observe the inherently one-shot event of ‘there not being a terrorist attack in the us with >100 casualties in 2015’ is 90%.” Long-term frequencies have barely anything to do with the probability of a terrorist attack /now/.

14. Randy M says:

Like last year lets turn your impression of how good the result is into a number. Last year you got -0.6 (higher better). This year you got exactly the same! Well done!

“Better at predictions than Gwern for the second year running”.

• Oscar_Cunningham says:

This score measures how calibrated you are but also how much you know about the question you chose to make predictions on. Is there a way of decomposing this score as calibration+knowledge, so that we can measure calibration separately?

Not sure what knowledge means in this context. Maybe if you are very well calibrated then your knowledge is just how high the probabilities you assign to your claims are. If so then this score gives your calibration, and your knowledge could be the average probability assigned?

16. Deiseach says:

I’m sorry to hear “There will be no further ramifications or lawsuits from either side over the flooding of my house: 80%” is a failed prediction. Crappy landlord still at it?

I wish we could send some of this constant rain to California. Going into our fourth continuous month of flooding is not funny anymore. We’ve gone from Storm Abigail to Storm Frank (a naming system only officially adopted in October 2015), and for a while there they were considering if we’d get up to Storm Gertrude, and it’s only just turned 2016!

• Dain says:

Please don’t! I have a leaky roof. There’s a fungus growing right now, in fact.

• Scott Alexander says:

My landlord issues continued after I wrote that predictions post, but have now been resolved in a pretty final fashion (knock on wood)

17. Bram Cohen says:

Part of how Scott Adams got 9/9 is that he made highly correlated predictions. They’re mostly Trump. They’re also mostly things I would have predicted as well just based on a minimal watching of the Republican field and seeing how well or badly the candidates were presenting themselves. The conventional wisdom is (was?) based on a belief that the republican party establishment is capable of picking their own candidate and cramming it down the general public’s throat. That ability seems to have been lost.

• Scott Alexander says:

I agree they’re highly correlated (though not infinitely correlated).

I don’t let myself say “I could have predicted this” unless I in fact did.

18. Muga Sofer says:

Given that the 50% questions could also be considered all “correct”, it looks like you were somewhat overconfident about everything.

• Scott Alexander says:

Do you mean “underconfident”?

• Exa says:

I don’t think he did, because if you count your 50% success rate as being 100% (an act just as valid as declaring it to be 0%, or 33%, or 66%), then you were above the line (overconfident) at every confidence level except 60%, and you were only barely below the line there.

19. Max Bloom says:

Regarding #12 – what about the civil war in Yemen? According to Wikipedia, started on March 19th, which was a few months before your predictions were released, so I suppose you might have discounted it because of that.

• E. Harding says:

The Yemen war began in 2014, when the Houthis captured Sana’a.

20. Banananon says:

Can you confirm/deny/issue no-comment that [project] was/is Unsong?

• onyomi says:

He did already confirm it, I believe.

• Nonnamous says:

So does that mean “Unsong” is already complete, and just being released in installments?

• Scott Alexander says:

Yes, it was.

21. ShemTealeaf says:

I think Scott Adams is significantly misrepresenting his prediction record:

1) Trump has gained some popularity, but he hasn’t won anything yet. I give Adams a lot of credit for his prediction of Trump’s longevity, but he didn’t just predict that Trump would gain popularity, or even that he would win the nomination. He predicted that he would win the general election in a historic landslide. His predicted share of the popular vote (65+%) would be the largest in history. Even if Trump wins the nomination, and easily wins the presidency, Adams could still be wrong. Obviously, if he turns out to be correct, that’s an extremely impressive prediction. (http://blog.dilbert.com/post/131749156346/the-case-for-a-trump-landslide-part-1)

2) Bush’s poll numbers were slowly dropping before Adams said anything about ‘low energy’ (http://blog.dilbert.com/post/127715904536/trump-persuasion-alert-the-bush-slayer-comment). They have continued to drop, but the RealClearPolitics average doesn’t show any particularly dramatic change.

3) Adams claims that Fiorina self-immolated in her debate performance (http://blog.dilbert.com/post/129289732811/carly-fiorina-and-the-wizard-filter). My reading of the poll numbers show that Fiorina was consistently polling in the mid-single digits, had a big boost for about a week as a result of the debate, and then returned to the mid-single digits, and slowly dropped a bit lower.

4) Adams gets some credit for Carson, but he did not actually make a prediction here. The blog post in question (http://blog.dilbert.com/post/133133670296/trump-on-carson) praises Trump’s speech, but it does not make any prediction about the fate of Carson.

5) To his credit, Adams made a specific prediction here (http://blog.dilbert.com/post/135152311211/calling-the-clinton-top-trump-persuasion-series), saying that this was the high point of Clinton’s poll numbers in a matchup against Trump. According to Wikipedia (https://en.wikipedia.org/wiki/Nationwide_opinion_polling_for_the_United_States_presidential_election,_2016), there have been seven polls since this prediction on 12/13. I list the Trump/Clinton margins here, compared to the previous margin from each pollster:

Rasmussen – went from Trump +2 in October to Clinton +1 in late December
CNN/ORC – went from Clinton +3 in late November to Clinton +2 in late December
Ipsos/Reuters – Clinton +12 in late December (no previous data)
Emerson College – went from Trump +2 in October to Clinton +2 in late December
Quinnipiac – went from Clinton +6 in late November to Clinton +7 in late December
Fox News – went from Trump +5 in November to Clinton +11 in late December
Public Policy Polling – went from Clinton +1 in November to Clinton +3 in late December

Everyone can make of this what they want, but I’m not quite seeing how Clinton’s margins over Trump are slipping.

6) I can’t find any prediction about Trump making a ‘nice guy’ move, but perhaps someone with more patience can find the post that Adams is referencing.

7) Adams mentions the ‘President Trump’ thing here (http://blog.dilbert.com/post/135152311211/calling-the-clinton-top-trump-persuasion-series), after it has already happened. He was correct in that it did happen again, but that’s an observation much more than it is a prediction.

8) Adams correctly predicts that he will be ignored by the mainstream media. Given that he is making outlandish predictions (65+% of the popular vote goes to Trump), claiming a 60% chance that he can stop Trump from being elected (http://blog.dilbert.com/post/134861704021/my-offer-to-stop-donald-trump), posting sexual hypnosis stories (http://blog.dilbert.com/post/136330131311/hypnotizing-you-to-have-the-best-new-years-day), and generally not behaving like a serious analyst, that’s not exactly a shock.

9) He also correctly predicts that there will be competing explanations for Trump’s continued success. This seems very obvious to me, but maybe I’m being uncharitable here.

• Thank you for taking the time to analyze these claimed predictions.

• Jeremy says:

My hunch is that Adams is engaging in a bit of optioneering with his Trump predictions. There is very little downside to his predicting a Trump nomination, win or landslide win. But there is big upside. And the more unlikely the general punditry think the claim is the bigger the upside. If he’s wrong he can just say “oh well I was right up to a point and then xyz” but if he’s right then he gets massive, massive credit for being the only one to predict it.

This is even more transparent with his “stop Trump” offer. In the unlikely event that millions of people are willing to contribute money to the prize pool, and in the un/likely event that Trump does not win, he gets exposure to a massive pay-off with little or no downside.

• anon85 says:

The downside to making stupid predictions is that he looks like an idiot. My opinion of him changed from “guy with the name Scott A. that people seem to respect” to “crackpot who believes in hypnosis and can be safely ignored”.

• My guess was that the “stop Trump” post was a joke, and possibly the hypnosis one as well, but I haven’t read enough of his blog to be at all confident of that interpretation.

• ShemTealeaf says:

I agree to a certain extent, but the fact that it’s not clear is part of the problem. He sometimes puts up a disclaimer that his blog is intended as entertainment, and he sometimes complains that people don’t take him seriously. Saying outlandish things in a half-joking tone so that you can easily walk them back if necessary seems like cheating to me.

• Jeremy says:

You and I might write him off after this but I have a feeling his normal readership will give him a lot more leeway. If he’s wrong can easily say he was just putting an interesting theory to the test and that being wrong doesn’t matter cause he was only X% sure anyway. I maintain he loses little by making bit bets like this, but gains a lot if they true. He’s already getting coverage for his predictions.

As for the hypnosis thing I definitely read that as true. And hypnosis per se isn’t fake or pure crackpottery as far as I know, although it a weird thing to have as a hobby.

• anon85 says:

Hypnosis is basically fake and crackpottery, at least to the extent where any persuasion ability or suggestible trance state is claimed. I view it as the single greatest “conventional absurdity” out of all beliefs commonly held by intelligent people.

An example of the absurdity: today’s world revolves around persuasion in various forms (e.g. ads on the internet). Yet somehow, no one seems to care about hypnosis or teach it in marketing classes.

• ShemTealeaf says:

Is hypnosis widely acknowledged to be fake? I thought it was at least somewhat legitimate.

• anon85 says:

Hypnosis can mean various things, but what most people think of (a suggestible hypnotic trance state) is pretty much fake. Here’s a wikipedia article: https://en.wikipedia.org/wiki/Stage_hypnosis

In particular, the idea that there’s a magical persuasion ability is so inconsistent with our everyday observations that it boggles my mind that anyone can believe it. Like, why don’t salesmen use hypnosis? Why don’t con artists? Why don’t presidents get trained in hypnosis resistance? Why don’t advertisements try to hypnotize us on the internet?

• ShemTealeaf says:

I always heard hypnosis described as something that requires willingness on the part of the hypnotized party. If I have to sit perfectly still and try to enter into a hypnotic state for five minutes, it’s going to be very difficult for anyone to hypnotize me against my will. I’ve never heard anyone claim that they can hypnotize people against their will.

On the other hand, I’ve never seen any compelling evidence that hypnosis is real.

• anon85 says:

Okay, if hypnosis is real but requires consent, then at least we agree that Scott Adams cannot use it to stop Donald Trump (and that Trump himself cannot use it).

Still, there are some issues with the real-hypnosis hypothesis: why don’t people use it to treat (say) addiction or other medical ills? Why do we not hear of hypnosis-centered rehab centers (indeed, why aren’t they the *only* rehab centers)? Why don’t business owners encourage their employees to take hypnosis sessions to improve efficiency?

In other words, I observe that hypnosis is almost nowhere to be found in capitalism, and this is strange, since most true things can find *some* application. Relevant xkcd: https://xkcd.com/808/

• Douglas Knight says:

Hypnotherapists do exist. They mainly treat addiction. More smoking than heroin, for whatever reason.

Hypnosis has a pretty good track record for analgesia, but it’s not used widely.

• anon85 says:

Hypnotherapists exist, but they’re not universal, and I don’t know if they’re more effective than things like acupuncture. I’d like a source on the “good track record” – what was the control?

• Ian D Osmond says:

My late grandfather, who was among the early people to use hypnosis as part of dental analgesia, used to say that all hypnosis is self-hypnosis. You can help people relax into a mental state in which pain is less distressing and more ignorable. It’s well-documented that mental state and stress level has a huge effect on self-reported pain levels — from this we can safely infer that, whether or not actual pain messages from nerves are doing anything differently (and I assume SOMEONE’S studied that, but I don’t know what they found), the amount of DISTRESS people feel from stuff differs.

My grandfather was consistently able to reduce the amount of physical analgesia he needed to use to eliminate distress. That may be a very narrow definition of “hypnosis”, but it IS one definition that is used, and is clearly true.

He also taught many of his grandkids how to do this, and I’ve had success with it. I have certain close friends who I can do lots of stuff with — paralyze limbs, forget parts of the alphabet, even occasionally induce mild hallucinations. This is because they WANT to participate in the game. They report a sense of “I can’t do the thing, but if I wanted to I could choose to stop being unable to do the thing, but I don’t want to, and I don’t want to want to.”

They say it’s a lot like that bit where you’re mostly asleep but not completely asleep and you can’t really open your eyes or think clearly, but you’re pretty sure that, if there was a loud noise, you would be able to wake up quickly.

My experience suggests that hypnosis, for some reasonable definition of “hypnosis”, exists. But COMIC BOOK hypnosis does not.

• Ilya Shpitser says:

Dear hypnosis skeptics, please understand that we live in a world in which our nervous system is super complex and hooked into everything, placebo is real, etc. It’s not even in the same league as homeopathy, there is a plausible mechanism.

• jimmy says:

It’s also worth noting that a large number of *practicing hypnotists* say things like “there’s no such thing as hypnosis” and still find hypnosis very worthwhile. They also have very good answers to questions of the type “if hypnosis is real/useful/whatever, then how come ____?”, if you bother to ask.

• Patrick says:

Scott Adams has been saying crackpottish things for years. For one small example, see this post about evolution (http://dilbertblog.typepad.com/the_dilbert_blog/2007/03/fossils_are_bul.html) which shows that at best, Scott Adams has a very inaccurate view of evolution and the scientific evidence for it and puts a lot of confidence in his own accuracy. The fact that he could make such big mistakes on issues where the information is so widely available kind of reduces my willingness to listen to his more extreme claims.

• Bill Friedman says:

The really interesting about his predictions was that when Rubio started to accelerate, he said “it looks like Rubio has hired a Master Persuader to write speeches for him. He may be able to challenge Trump now.”

In other words, I expect that, if at some point Trump loses (or even wins by not as much as he predicted), he’s going to claim “well, that was only because whoever beat him tracked down a first-rate Master Persuader, so my hypotheses that Master Persuaders always win is still true.”

• Jeremy says:

Exactly. There are always ways to spin these things enough to convince enough people (usually the people who want to be convinced).

• ShemTealeaf says:

I agree, and if Trump actually does win 65% of the popular vote in the general election, I will start taking Scott Adams very seriously.

However, he’s indicated that he thinks there’s a 98% chance that Trump wins the presidency (http://blog.dilbert.com/post/127791494211/nate-silver-gives-trump-2-chance-of-getting), and at least a 50% chance that he wins 65%+ of the popular vote. If Trump somehow fails to become president (or god forbid, fails to win the nomination), it’s our job to ensure that there is at least some reputation cost for making very confident and outlandish predictions.

22. E. Harding says:

ISIS expanded in territory in Libya, Afghanistan, Yemen, and Syria. It contracted in Iraq only. On net, I don’t know how much that is, but I’m not at all confident it’s a net loss of territory. And I have typed up 110 predictions for the current year:

https://againstjebelallawz.wordpress.com/2016/01/01/110-predictions-for-december-31-2016/

• Søren E says:

Heretic!

“One who follows Bayesianity will never assign a probability of 1.0 to anything” – Eliezer Yudkowsky, in “A Technical Explanation of Technical Explanation”
/s

• E. Harding says:

I only used intervals of ten percentage points to allow me to make the graph seen in this post. When the probability I estimated was above 95%, I rounded up.

• Søren E says:

🙂

How about reformulating e.g. p(statement7) = 0.2 into p(~statement7)=0.8, like Scott? This will make your calibration graph easier to judge.

(Brave of you to go public with your calibration, by the way.)

• Anon says:

For probabilities, it makes more sense to use a log scale: .9 is as different from .95 as .5 is from .75.

Usually I go with something like .5, .75, .875, .95, .97, .99. Not a perfect log scale, but better than .1-intervals.

• E. Harding says:

Maybe, but a lot of my estimated probabilities were under .75.

• Douglas Knight says:

Discriminating between .6 and .7 is not very important, but it is a lot easier to test than discriminating between .95 and .99 because you need a lot fewer questions to test. Also, if you are grading accuracy, you probably want to weight extreme events more, but in calibration you don’t do any weighting.

• John Schilling says:

ISIS expanded in territory in Libya, Afghanistan, Yemen, and Syria. It contracted in Iraq only.

I assume the second “IS” was a typo or thinko there, but regardless the Islamic State lost more ground in Iraq than it gained in Syria. And while it has greatly expanded the scope of its operations in Libya, Afghanistan, Yemen, and elsewhere, I’m pretty sure its actual territorial control in those places is still negligible. There’s almost no place outside of Iraq and Syria that the IS flag won’t be torn down in shot order, where the Islamic state can set up a permanent recruiting station or courthouse that won’t be shut down by the real authority in those parts (which may just be a better-established local insurgency, e.g. the Taliban). No place where the strongest non-IS military force has to hide from Islamic State patrols, rather than vice versa.

• E. Harding says:

“I assume the second “IS” was a typo or thinko there, but regardless the Islamic State lost more ground in Iraq than it gained in Syria.”

-Scott’s saying “ISIS”, so I was using it here. And I still doubt IS lost territory overall.

Also, the IS territory in Libya is certainly not negligible. Its territory in Afghanistan is certainly more so, and its territory in Yemen is currently insignificant.

https://en.wikipedia.org/wiki/Template:Libyan_Civil_War_detailed_map

• Vorkon says:

This also fails to take into account the various examples of better-established local insurgencies, such as Boko Haram and at least parts of the Taliban, which have, at least nominally pledged fealty to IS. Admittedly, this is only nominal, and IS has no actual control over any of the territories those groups occupy, but when your entire brand is “we are building a Caliphate,” and other groups say, “yeah, we’re part of your Caliphate,” then even though they’d drop you in a second if you ever tried to issue them any actual orders, I’m not sure if you shouldn’t count them under “areas you control.”

Also, I’d argue that while you may be right about courthouses and patrols, there’s a lot of places where IS could maintain a recruiting station despite not properly controlling the territory. That’s kind of beside the point, but still worth pointing out, I think.

Either way, even if you don’t consider those groups as part of IS’ territory (and I’ll admit, that point is a bit of a definitional stretch) I still agree with E. Harding, mostly due to the gains in Libya and the fact that in such a chaotic environment it’s hard to agree on who actually controls what, that I don’t think it’s at all clear enough at the moment to count that prediction as correct. At best it’s inconclusive.

23. Dan T. says:

You interestingly seem to have a greater accuracy rate in predicting the world in general than in predicting things in your own life, despite presumably having more direct knowledge of the latter.

You could have probably made #23 come true by simply posting some really provocative thing about some social-justice issue that brings on the Internet shame mobs, but you’ve stayed away from that stuff lately, it seems.

It kind of feels to me like that whole thing has deescalated. The SJ people seem to have gotten a fair amount of their goals met, so they’re settling down for the moment.

• E. Harding says:

Perhaps, they’ve just gotten bored and their internal schisms have led to a weakening of the energy of the movement. Their purges are mostly over, both external and internal, but world revolution is too far away.

• Technically Not Anonymous says:

What goals are you thinking of? I can think of a few.

Better media representation for women and minorities seems like one of SJs least surprising successes. It gets producers brownie points (read: money) from pro-SJ people, more publicity from left-leaning media outlets, and probably increases the products’ appeal to those underrepresented demographics. It rustles some nerd purist and reactionary jimmies, but nerd purists and reactionaries are far outnumbered by people who like the increased representation or don’t care one way or the other about it. Basically, the SJWs succeeded here because the goal is 100% capitalism friendly.

Gay marriage is an obvious one. And I guess there’s the transgender acceptance thing, but there’s still a lot of progress to be made there. Both of those are quite a bit more surprising than the above IMO.

The things you mention, plus what happened with Gamergate.

• Pete says:

What happened with Gamergate? I stopped following it some time ago.

Anyway, I’m against SJWs, yet I think better representation of minorities and women is clearly a good thing, as is the legalization of gay marriage.

As for trans acceptance, I’m all for that too, but I’m not sure there’s been enough of a shift in public opinion for this to be labelled a victory.

• TheNybbler says:

Well, Gawker is in serious decline, Leigh “Gamers are Dead” Alexander is relegated to obscurity, Anita Sarkeesian was pretty well snubbed at E3 (she started complaining about violence in games, which is a third rail topic), alternative Gamergate-aligned gaming blogs have been formed and some appear to be thriving, and r/KotakuInAction remains active on Reddit. Also Ian Miles Cheong, who once compared Gamergate to the Nazis, switched sides.

• Jiro says:

On the other hand we did have DOA Xtreme 3 not brought over because SJWs would complain about it.

On the third hand, Brianna Wu seems to have been discredited by the SJWs, although this doesn’t seem to have rubbed off into boosting the reputation of Gamergate.

I still see this as “SJWs haven’t gone further, but they’ve kept most of their existing wins”. I’ll believe it’s all over when the Gamergate Wikipedia article is accurate.

• TrivialGravitas says:

There hasn’t actually been any increase in women and minorities in the media in 2015 though (I mean, in comparison to the last few years, compare it to 1990 and I’m sure that would be different). They just wrote a bunch of articles declaring that there was an increase but when you run numbers nothing has changed.

In fact I think that’s probably true for everything except gay marriage, which had nothing to do with them whatsoever and more to do what 9 people wearing robes voted on.

• Vorkon says:

My gut impression is that it has more to do with the fact that Social Justice criticism has gone more “mainstream” lately, with all of the various campus safe space debacles that have been springing up recently, than it does with having acheived their goals, or because of anything Gamergate has done.

I don’t have any concrete evidence, of course, but the impression I get is that these incidents have given the Social Justice community a bit of a black eye, and they’re actively trying not to overplay their hand until the situation has blown over a bit. And even then, those who aren’t trying to do so probably have bigger fish to fry at the moment than a blog which at least attempts to treat their ideas charitably and take time to address their points, even if it doesn’t always do so as well as they’d like.

24. 1. The Brier Score is a simple and solid metric for scoring predictions of this type, if Scott or anyone else wants to rank this year or compare years.
2. For anyone who enjoys this type of forecasting, you’d likely enjoy this recent Econtalk episode featuring Philip Tetlock, an enconomist who ended up designing forecasting frameworks for the CIA, as well as his recent book Superforecasting.

25. Greg says:

Your world events predictions at and above 70% all came true. That’s not so likely, given 15 of them and the probabilities you assigned. Though I’m also pointing this out post-hoc, so who knows. But maybe you should take more risks in that category this year? More qualitatively, the world events predictions mostly have a “status quo” feel to them, I think. And I have to imagine there’s a strong prior on the status quo not changing in a given year, at least for things like laws mostly staying the same or wars not beginning or ending. I’d find predictions about changes more interesting? Another thing that might be fun would be brief, high-level explanations for the predictions, “predicted because nothing changes that much,” or “predicted due to regression to the mean.”

• Scott Alexander says:

I think the way I’m doing this kind of enforces status-quo predictions.

Consider something that’s *not* status quo, like, I don’t know, a revolution in China. That would be a bold prediction. But I certainly don’t think there’s a > 50% chance of a revolution in China just this year in particular. So even if I thought there was a 30% chance of a Chinese revolution in 2015 – a very bold prediction – it would look like “No revolution in China – 70%”

• Steve Sailer says:

A lot of things stay pretty much the same from year to year.

• Steve Sailer says:

The thing about Trump is that he didn’t have the issue that would set him apart from the field and electrify the public — immigration — until the spring of 2015 when he saw Ann Coulter on TV talking about her upcoming book on immigration and Trump asked her for an advanced copy.

• Greg says:

Yes… Very good point. If you do foresee big changes with > 50% probability, those would be very interesting predictions. But it makes sense those would be rarer.

So, we could just as well read the results of the world events section as you predicting the opposite of each of these assertions – many of them big changes – with low probability and getting most of them “wrong”… Why does that feel like a different outcome? 🙂

Also, have you looked into putting money behind your predictions by making markets around these? I.e., offer buy/sell contracts for each prediction. That could somewhat gauge how much “the market” finds your predictions surprising. You’d also convey a bit more information about your beliefs in the bid-offer spread.

26. Wrong Species says:

I didn’t make as many predictions as Scott but I still felt like it’s useful for me to look back at them. Regarding the republican primaries, my odds were 70%(later 60%) to Rubio winning while everyone else was convinced it would be Bush. So on the one hand, that makes me look good because Rubio is clearly the establishment candidate at this point and has pretty good odds. However, I didn’t predict the staying power of Trump and I also didn’t think Cruz would do as well as he has so that has reduced my confidence a little at this point. Right now, I’m at 50% Rubio.

My economic predictions for this year were:

“The standard unemployment rate will be below 5% by January 2016. 80%”

“U6* unemployment will be under 10% by this time next year. 75%”(I made this on February 10)

Unfortunately, I wasn’t clear enough on the first point because that could mean either this report coming up next week(which comes out in January but refers to December) or next months report(which refers to January unemployment). Of course, next week could go below 5% which would make me right either way. I was a little overconfident in the labor market though and thought 4.9% was a very conservative estimate. It’s cutting it a little too close to be 80% confident in something that is this close to not coming true. But on the other hand, the fed has consistently underpredicted the labor market and that’s with all of their sophisticated models.

27. Simon says:

Does recent crap in Turkey make World Events #12 a bit iffy? Not exactly a civil war OR exactly Middle Eastern, but…

28. Stanley Winford says:

When are the predictions for 2016 coming out? I’m on the edge of my seat.

29. James says:

Scott, if you didn’t lose any of the (multiple) girlfriends you had at the start of the year, and you gained at least one new one, how many do you now have?

• Le Maistre Chat says:

This sounds like a job for Count von Count.

30. jjbees says:

If Scott has multiple girlfriends I’m going to stop reading this blog out of jealousy within the next week. Prediction confidence = 60%

If they are attractive, confidence drops to 20%, because maybe I can pick up some tips on getting multiple attractive girlfriends.

My friend you need to visit the San Francisco Bay Area. The rationalists are pretty, poly, and plentiful.

Here’s some predictions. I screwed up my political calls last year really really badly. So lets

world/US events
* The Democratic nominee for president will be Clinton: 90
* The Republican nominee will not be Trump: 80
* The Republican nominee will win the presidency: 50
* All third party candidates other than Trump get less than 10% together: 95
* Trump will not run as a third party candidate: 70
* The house majority, senate majority, and presidency will not all be held by the same party: 70
* No supreme court justice will be appointed in 2016: 80
* No security council permanent member will suffer more than 1000 military casualties: 95
* Total US GDP growth in the first 3 quarters of 2016 will be higher than for 2015: 70
* No quarters of negative US GDP growth: 90
* No terrorist attack in the west kills more than 1000: 95

UK politics
* Labour will lose vote share in local elections: 95
* Corbyn will remain in office until the end of the year: 80
* The EU referendum will take place: 95
* If it does I will vote LEAVE: 95
* If it does the winner will be REMAIN: 60
* Labour will win London Mayoralty: 70
* No general election: 95
* Cameron will remain PM: 90

personal
* I will still be with my girlfriend at the end of the year: 70
* I will get involved in at least one lib dem event in November or December: 80
* ditto event organised by my friend Angie: 90
* I will go to at least one CUS event: 60
* ditto SitP event: 60
* At no point in the year will I have more than £2k on my credit card: 70
* At some point in the year I will have more than £500 on it for a month: 70
* I’ll be employed at some point in November or December: 80

• Rob K says:

The combination of your presidential prediction and your no unified control prediction combine to imply at least a 40% probability that, given a Republican win in the presidential contest, the Republicans lose control of at least one of the house and senate (higher than 40% if you’re assigning any non-negligible probability to unified Democratic control). That’s a surprisingly high number!

• Alex says:

> “The Republican nominee will win the presidency: 50”

Err… isn’t that the equivalent of “The Democratic nominee will win the presidency: 50”? That seems like a pretty milquetoast prediction….

> “No quarters of negative US GDP growth: 90”

Really? I’m practically as confident in the opposite result: that there will be at least one quarter of negative US GDP growth (90% confidence) with probably 75% confident in two or more quarters of GDP decrease.

32. njnnja says:

I really like the fact that you make predictions that are not necessarily likely to happen, and then assign a relatively low probability to them (50-60%). However, in lieu of the plot that you made, the best way to evaluate predictive powers that assign probabilities to events is to use a Receiver Operating Characteristic Curve. And on that scale your predictions look even better, with an area under the curve of almost 90%, which is pretty good even if you were doing a formal experiment.

33. Matt C. Wilson says:

On the topic of grading one’s predictions, Bob X. Cringely has been doing this annually for trends in the tech sector for some time. He historically has around 70% accuracy. His 2015 review scorecard and 1st 2016 prediction just went up yesterday: http://www.cringely.com/2016/01/05/2016-prediction-1-beginning-of-the-end-for-engineering-workstations/

34. eponymous says:

Scott, suggestion (sorry, this is really late):

When you post your predictions for this year, first list all the items you want to forecast, and then give your estimates under the fold. That way, your readers can offer their own guesses (without being influenced by yours, unless they choose to look), and can post them in the comments. We’ll see how we do!