Getting Eulered

Posted on August 10, 2014 by Scott Alexander

There is an apocryphal story about the visit of the great atheist philosopher Diderot to the Russian court.

Diderot was quite the clever debater, and soon this scandalous new atheism thing was the talk of St. Petersburg. This offended reigning monarch Catherine the Great, who was a good Christian woman –

(except for the affair with the horse)

(if you’re not familiar with this story, “affair with the horse” should be taken in the most literal possible way)

(the affair with the horse is totally legendary, but so is the rest of this story, so adding it in doesn’t make things any worse)

– anyway, it offended Catherine the Great, so she asked legendary mathematician Leonhard Euler to publicly debunk and humiliate Diderot. Euler said, in a tone of absolute conviction: “Monsieur, (a+b^n)/n = x, therefore, God exists! What is your response to that?” and Diderot, “for whom algebra was like Chinese”, had no response. Thus was he publicly humiliated, all the Russian Christians got an excuse to believe what they had wanted to believe anyway, and Diderot left in a huff.

This story is very likely false, but it’s something I think about a lot.

I feel like I am a lot better at the sorts of things Diderot was good at – philosophy, history, social science, et cetera – than at math. Sometimes, I will have a belief that seems pretty well-founded based on the sort of arguments Diderot would have been able to come up with – and someone will spout a bunch of very complicated math at me and tell me it disproves my belief. Sometimes if I concentrate hard enough I can understand the math well enough to see if they’re right, but this is very difficult. Otherwise, short of going back to school for ten years and getting a math Ph.D., I’m pretty much stumped.

So I think about Diderot a lot because I want to know what to do in this sort of situation.

The easy out is always to dismiss the math as sophistry – say “Mathematicians like Euler can sound very technical and impressive. But I’m pretty sure of this argument, and math has little to say about this non-mathematical field. So I’m not going to let myself get Eulered.”

But math is systematized rigor, and sometimes you genuinely need a lot of rigor to find flaws in arguments. I’m not totally hopeless at math, and I think of all the things that even my limited amount of mathematical ability allows me to understand that mathless people wouldn’t get. One of my go-to examples here is health insurance and the idea that it is wrong to ever deny someone coverage for anything, no matter how low the probability it will help and how expensive the intervention. I try to make an argument against that here, but it requires a little math and it would have been very hard to express clearly without it. I worry there are people who think I’m just trying to Euler them, and who are going to continue believing what they want about “death panels” because no fancy numbers can change their minds. And I also worry that if I dismiss mathematical arguments above my level of comprehension, I’m doing the same thing.

This is an obvious trade-off. Permanent easily-fixable lack of rigor, versus letting anyone with a BA in Mathematics push you around.

There’s a bit of variation depending on the mathematical field. The latest people to make a serious academic attempt to prove the existence of God with math, Tim and Lisa McGrew, used Bayesian probability, a field of math which thanks to the excellent explanations on yudkowsky.net I at least know a little bit about. As a result I was able to avoid getting Eulered and write what I think was a pretty devastating rebuttal.

But there are other mathematical arguments I find much less tractable. And by far the most dangerous is statistics.

With apologies to Rutherford, all science is statistics or stamp-collecting. It is very well known that entire fields of science are permanently messed up because their statistics aren’t good enough. I have tried very hard to outperform most of my famously statistically illiterate profession, but there will always be people far better than I. There will always be people making extremely detailed and Byzantine methodological critiques. And those people will always be able to present me with arguments that are either ultra-important debunkings of things I believed religiously, or else shameless attempts to Euler me. And I will always have trouble figuring out which ones are which.

I am especially reminded here of Fisher’s work on smoking and lung cancer. Fisher was (according to Wikipedia) “a genius who almost single-handedly created the foundations for modern statistical science”, and launched a bunch of very sophisticated critiques against the idea that smoking caused cancer. His basic argument was that proving causation was very very hard (which it is) and that none of the appropriate statistical work had been displayed in cancer research. For example:

He uses a device to magnify the difference and its importance in the case-control studies by transforming the percentages into observed versus expected figures (using a chi-square analysis). He then suggested that, if the cases had inhaled, 45 lives could have been saved.

Intriguingly, his work on the subject became the foundation of the modern truism that “correlation does not imply causation”. Also intriguingly, he was taking money from the tobacco industry to serve as their “consultant” while he was doing it.

It is easy to imagine being a biologist back then, thinking you had lots of good studies showing a tobacco/lung cancer link, then getting pummeled by this statistical genius and backing off from your original claim.

And it’s easy to imagine another statistical genius arguing against him, and they’re both throwing out a lot of formulae and equations and the whole thing is super confusing.

And if, like me, you can only remember what a “chi square analysis” is on a good day, and you have enough trouble remembering the difference between case control studies and cohort studies, you’re probably not going to be able to follow the entire debate and pick apart exactly where one of them goes wrong.

But you know, on a good day I remember my chi-square analyses, I get the difference between case control and cohort studies right, and then I can read an R. A. Fisher paper and think “No, smoking is still bad.”
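
For the days when the chi-square analysis has slipped away again, here is a minimal sketch of the "observed versus expected" calculation for a 2x2 table. The counts are invented purely for illustration (they are not Fisher's data or anyone else's), and the function name `chi_square_2x2` is my own, not from any library.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a table of counts (no continuity correction).

    Returns the statistic and the table of expected counts under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count if row and column membership were independent.
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            statistic += (observed - e) ** 2 / e
        expected.append(expected_row)
    return statistic, expected


# Hypothetical counts, made up for this example only.
# Rows: cases, controls. Columns: smokers, non-smokers.
counts = [[90, 10], [70, 30]]
chi2, expected = chi_square_2x2(counts)
print("expected counts:", expected)
print("chi-square statistic:", chi2)
```

With one degree of freedom, the conventional 5% cutoff is about 3.84, so a statistic well above that says the table is hard to explain by chance alone. It says nothing about why the two groups differ, which is exactly the part Fisher was arguing about.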

What about Glymour on IQ?

There’s a consensus among researchers in the field that IQ is useful and means what people think it means. And there’s a lot of research backing that up, same way as there’s a lot of research backing up the link between smoking and cancer.

On the other hand, Glymour seems to be well respected and intelligent, and he says things like:

Factor models assume that observed variables that do not influence one another are independent conditional on all of their common causes, an assumption that is a special case of what Terry Speed has called the Markov condition for directed graphical models. The rank constraints – of which vanishing tetrads are a special case – used in factor analysis are implied by conditional independencies in factor models, conditional independencies guaranteed by the topological structure of the graph of the model, no matter what values the linear coefficients or factor loadings may have. To exclude more latent variables when fewer will do, Spearman needed only to assume that the vanishing tetrads do not depend on the constraints on the numerical values of the linear coefficients of factor loadings, but are implied by the underlying causal structure. It is known that the set of values of linear parameters (coefficients and variances) that generate probability distributions unfaithful to a directed graph is measure zero in the natural measure on parameter space.

Even on my absolute best day, if I swallowed like an entire jar of modafinil, and then another jar of piracetam, and I looked up every one of those words in a dictionary, and took two or three hours to puzzle it out, I’m pretty sure I couldn’t bring myself to generate an understanding of that paragraph and sustain it for more than thirty seconds.

But there are thirty pages of that kind of thing, and then at the end it says “therefore, you should disbelieve in IQ and probably also all other research in the social sciences.”

Also, it mentions how research on IQ must be rejected because it might encourage the Republicans, whose plans will lead to a nation where “Ku Klux Klan schools, Aryan Nation schools, the Nation of Northern Idaho schools, Farrakhan schools, Pure Creation schools, Scientiology schools, and a thousand more schools of ignorance, separation, and hatred bloom like some evil garden, subsidized by taxes.” So clearly there’s some political motivation at work as well.

(Also: Anissimov! Did you realize you could get your Northern Idaho secessionist schools to be tax-subsidized? You should totally look into that!)

So I have to ask – am I being informed of deep methodological truths that are being neglected? Or am I being Eulered?

I don’t have a good way of answering this. The way I try to deal with it in practice is seeing if I can route around the objection.

Like it’s clear that Diderot’s best option wasn’t to try to argue that (a+b^n)/n didn’t equal x. Far better for him would have been to ask why, if (a+b^n)/n = x, this necessarily proved God. Even if Diderot wasn’t smart enough to understand the precise algebra involved, he might have been able to at least get the impression that what it was doing was defining x in terms of other quantities. So he might have been able to ask “Why does a certain definition of the meaningless quantity x prove God?” even if, for example, he didn’t know what exponentiation was and couldn’t parse “b^n”.

My reaction to the Glymour paper was to try to figure out what it was trying to prove with all its statistics. My conclusion was that it was trying to prove that doing correlations adjusted for confounders didn’t always remove all the confounders.

I don’t have the mathematical ability to know whether Glymour’s argument is correct, but luckily I already don’t believe adjusting for confounders does a good job of removing confounders:

I will come out and say it: I do not trust the practice of “adjusting for confounders”, at least not the way this study does it. You are adjusting for an imperfect measurement of the confounders you can think of. If you find that there is lingering correlation, then either your hypothesis is true, or you didn’t adjust for confounders well enough.
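
The worry about "an imperfect measurement of the confounders" can be sketched with a small simulation. Everything here is an assumption made for illustration: a single confounder, unit-variance Gaussian noise everywhere, no true effect of exposure on outcome, and a measured version of the confounder that is half noise. The helper names are hypothetical, not from any library.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals from a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed toy model: the confounder drives both variables;
# exposure has NO causal effect on outcome.
confounder = [rng.gauss(0, 1) for _ in range(n)]
exposure = [c + rng.gauss(0, 1) for c in confounder]
outcome = [c + rng.gauss(0, 1) for c in confounder]
# What the study actually records: a noisy proxy for the confounder.
measured = [c + rng.gauss(0, 1) for c in confounder]

raw = correlation(exposure, outcome)
adjusted_true = correlation(residuals(exposure, confounder),
                            residuals(outcome, confounder))
adjusted_noisy = correlation(residuals(exposure, measured),
                             residuals(outcome, measured))

print("raw correlation:             ", round(raw, 3))
print("adjusted for true confounder:", round(adjusted_true, 3))
print("adjusted for noisy proxy:    ", round(adjusted_noisy, 3))
```

Under these assumptions the theoretical values are 0.5 for the raw correlation, 0 after adjusting for the true confounder, and about 0.33 after adjusting for the noisy proxy. A third of the spurious association survives the "adjustment" even though the right confounder was identified, so lingering correlation alone cannot tell you whether the hypothesis is true.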

So I tried to route my argument around Glymour’s objection. I said that even assuming Glymour had discovered something terrible and shameful about the way correlations and regressions were done in the social sciences, this doesn’t come close to debunking all research on IQ. My particular argument was:

The example I gave of good IQ research, which you said you’re not convinced is actually being done, is the connection between lead poisoning and poor life outcomes, mostly proven through IQ. Let me discuss what this research looks like and why it’s not just one guy running a correlation through SPSS without any awareness of possible confounders.

First of all, there’s a LOT of evidence that growing up in neighborhoods with high lead concentration is correlated with lower IQ as an adult. This is all regressed for the usual things like socioeconomic status. Fine. That seems vulnerable to exactly the problems you describe. For example, maybe rotting houses expose people to more lead, and poor people are more likely to live in rotting houses, and poor people’s kids go to poorly funded schools that don’t teach them test-taking skills, so their IQ looks low.

Then they found a dose-dependent effect – i.e. the more lead you were exposed to, the worse the IQ drop was. Still pretty confoundable – if for some reason poor people used more lead (for example), poorer people might use even more lead (and have factors causing lower IQ test scores).

Then they found that when different states removed lead from gasoline, childhood outcomes rose in a very predictable pattern. There was a dramatic improvement a certain number of years after the lead was banned – for example, maybe California banned lead in 1960, and in 1965 outcomes started to rise dramatically; Oregon banned lead in 1965, and in 1970 outcomes started to rise dramatically; Washington banned lead in 1970, and in 1975 outcomes started to rise dramatically. Once again, this could be confounded. Maybe liberal states were more likely to ban lead first, and also more likely to increase school funding first.

Then they found that levels of lead in the air at time t were correlated suspiciously closely with crime at time t+1 – like if you line up the two graphs, every tiny little uptick and downtick match perfectly.

Then they found that lead exposure during pregnancy decreases the head circumference of infants, which seems a little less malleable by things like poor school funding than IQ is.

Then they found like thirty other things.

I admit every one of those pieces of evidence is a correlation. But even though the correlation between lead levels in a neighborhood and crime in that neighborhood could be confounded by unobserved factors, lead levels in an era and crime in that era could be confounded by unobserved factors, lead regulatory regimes and crime in the area covered by that regulatory regime could be confounded by unobserved factors, and lead exposure during pregnancy and head circumference could be confounded by unobserved factors – at some point you have to say that we’re starting to rack up a lot of coincidences, and maybe we should just admit the theory has a point.

And once you come up with some solid result, like the one with lead – then that becomes your basis for other results. Childhood lead poisoning causes brain damage thus lowering IQ? That lends credence to the idea that IQ is a useful measuring tool for some kind of brain health. Lead both decreases IQ and increases crime in a dose-dependent way? That lends credence to causal interpretations of the observation that IQ and crime are closely correlated.

Once you’ve gone through this process enough times and you find that all of your results kind of fit together, you have what’s starting to look like a pretty impressive scientific edifice. So I think the criticism that IQ research (and social science in general) is just based on drive-by correlations and regressions, then accepting whatever they say, is a big oversimplification.
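
The "racking up coincidences" step can be put as a rough Bayesian sketch. The numbers are invented: I am assuming a 50% prior and four lines of evidence that are each only twice as likely if the theory is true as if it is false. The big assumption is independence, i.e. that the four are not all explained by the same hidden confounder.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with likelihood ratios assumed independent."""
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        # Independent evidence multiplies the odds.
        odds *= ratio
    return odds / (1 + odds)


# Hypothetical values for illustration only: four individually weak,
# individually confoundable findings, each worth a likelihood ratio of 2.
prior = 0.5
ratios = [2.0, 2.0, 2.0, 2.0]
posterior = posterior_probability(prior, ratios)
print("posterior probability:", round(posterior, 3))
```

Four weak findings take even odds to 16:1. If one confounder could produce all four at once, they count as closer to one finding than four, which is why it matters that the neighborhood, era, regulatory-regime and head-circumference results would each need a different confounder.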

Obviously this tactic would not have worked if the point I had wanted to defend was that the particular statistical practice of correlation and regression used in the social sciences was valid.

But the whole point of this Eulering issue is that I am not a statistician. I should not be in the business of trying to defend regression unless I know enough about it to do so coherently and intelligently.

The problem here only occurs when sophisticated math is used to attack nonmathematical ideas, like the existence of God, or lead causing increases in crime. And presumably these ideas should be complicated and diverse enough that hopefully no one mathematical argument knocks down the entire edifice. True things should usually reveal their truth through multiple different arguments, and it would be very odd if math could demolish all of them at the same time.

I admit this is not a very satisfying solution to worries about Eulering. I don’t think there will be any general solution, but rather a toolkit of different useful tricks, some of which I will try to go into further in the future.

124 Responses to Getting Eulered

kappa says:

August 10, 2014 at 11:05 pm

…Wait, modafinil makes your brain go more? I thought it was for making your tired go less.

(I would have phrased this comment more eloquently if it weren’t so far past my bedtime. My tired is going lots.)
- Anonymous says:
  
  August 10, 2014 at 11:17 pm
  
  most things that make tired go less also make brain go more, at least a bit. also when use brain hard tired goes more.
  - kappa says:
    
    August 11, 2014 at 9:41 am
    
    Huh.
    
    When I tried it for the purpose of making my tired go less, it didn’t make my brain go more at all, except very indirectly. It sure did make my tired go less, but only the physical tired. (Which made it easier to do things like eat and exercise, which in turn made brain go more.) (Then it stopped working at all after like two weeks. I was sad.)
    
    Perhaps I just had too much tired.
    - Nancy Lebovitz says:
      
      August 11, 2014 at 12:05 pm
      
      Possibly. I knew someone who found that modafinil made him sleep for 18 hours, so there’s definitely some individual variation.
    - ozymandias says:
      
      August 11, 2014 at 12:55 pm
      
      Modafinil makes my brain go more (I write SO MUCH when I’m on modafinil) and, weirdly, it treats dissociation.
    - Hainish says:
      
      August 11, 2014 at 8:12 pm
      
      For me, my tired going more is extremely tightly correlated with my brain going less.
Toby Bartels says:

August 10, 2014 at 11:47 pm

One option is to cultivate some mathy friends who can examine the math for you. In your case, post your conundrums to your blog. (You’ll probably want to make a specific request for mathematical analyses to avoid getting too many comments on the broader issue.)
- Qiaochu Yuan says:
  
  August 11, 2014 at 4:43 am
  
  Agreed; this seems like the first thing I would try in this situation. Certainly it would have immediately solved Diderot’s problem. Among LWish people I would recommend talking to Jacob Steinhardt in particular.
- Franz Panzer says:
  
  August 11, 2014 at 7:19 am
  
  “Mathy friends” is not necessarily enough. You’d need friends who do the right kind of math. I have an MSc in maths and am working on my Ph.D., but apart from 2 classes years ago I never had anything to do with statistics. Neither has any of my friends who are mathematicians. Maths is HUGE. Everyone needs to specialise in some field. And if you encounter a sophisticated argument from a completely different field you’re (nearly) as clueless as everyone.
  (Of course, if I want to understand that argument, I could learn and understand the necessary concepts and connections a lot faster than someone with no understanding of mathematics. But it could still take quite some time)
  
  All of which does not take away anything from your main point, of course.
  - Toby Bartels says:
    
    August 12, 2014 at 4:38 pm
    
    Yes, cultivate a variety of mathy friends. Preferably ones who understand the limits of their own expertise.
anon says:

August 11, 2014 at 12:15 am

Is it really necessary for you to go back to school to learn detailed math or statistics?
Douglas Knight says:

August 11, 2014 at 12:24 am

My conclusion was that it was trying to prove that doing correlations adjusted for confounders didn’t always remove all the confounders.

I have not read Glymour, but I have read Shalizi. They are objecting to IQ long before it gets to the point of computing correlations between IQ and anything else. This is an objection to IQ as a measure of test-taking ability, even as a measure of IQ-test-taking ability. Shalizi objects that IQ does not have a reductionist account. There must be a better model and thus we should not use IQ. Specifically, Spearman proposed a causal model to justify his factor analysis and the tetrad condition is a method of model checking. Spearman’s data passed this test, but later data does not. So the IQ tests are not a perfect measure of a single axis, but mix together several abilities.
- Vilhelm S says:
  
  August 11, 2014 at 9:47 am
  
  Glymour mentions this critique, but (at the end of section 1) concludes that it doesn’t really matter: if we can manipulate things to increase intelligence, and that in turn causes improvements in economic wellbeing etc, then it doesn’t really matter if we are manipulating a single factor or several.
  
  His main argument is in section two, where he says it is not sufficiently established that there is a causal connection between intelligence and wellbeing.
  - Douglas Knight says:
    
    August 11, 2014 at 11:22 am
    
    So Scott correctly determines what Glymour’s argument is, but then quotes math from before he says “ha ha just kidding”?
    - Vilhelm S says:
      
      August 11, 2014 at 1:15 pm
      
      Yeah. (But I got the section numbers wrong, the “just kidding” paragraph is the last one in section 3, the causal connection section is section 4).
- Nancy Lebovitz says:
  
  August 11, 2014 at 12:10 pm
  
  I’m dubious about g because my verbal abilities are much better than my math abilities.
  - ozymandias says:
    
    August 11, 2014 at 12:57 pm
    
    Remember that g just means that they’re correlated. No one’s saying that there aren’t verbal-ability-specific factors in addition to g. It seems anecdotally not uncommon for people to have really high scores on some subsections and lower scores on others. (I have one friend who is in the 99th percentile on one section and the second percentile on another, but she is probably IQ Georg.)
  - Sam Rosen says:
    
    August 11, 2014 at 1:04 pm
    
    I’m the same, but these abilities do tend to correlate.
    
    I would bet extravagant amounts of money that having strong triceps and having strong calf muscles correlate. In principle, one could have extremely strong triceps and weak calves or vice-versa, but usually people don’t.
    
    Suppose we had a concept like “SQ” (strength quotient) that was measured by various fitness activities, e.g., how far can you carry a 100-pound cube. People with a higher SQ would, on average, do better in different athletic events. There is no muscle that corresponds to strength. It’s not a thing you can point to. It’s the aggregate of hundreds of different muscles that, in principle, could be completely independent, yet, for many reasons, these individual muscle strengths do correlate.
    - Gilbert says:
      
      August 11, 2014 at 5:21 pm
      
      That SQ metaphor is great, so I’ll just milk it for some analogies for IQ-thinking going wrong:
      
      – Probably there would be good evidence of leg exercises increasing SQ (because leg muscles enter into the index). There would also be a negative correlation between SQ and back pain (because sometimes weak back muscles are at fault for back pain). Someone who thought SQ was a real thing might conclude leg exercises will help back pain.
      
      – In a society depending on manual labor, strength is a high-status trait. You could easily imagine entire clubs of people with high measured SQs. Because of selection effects, these clubs would mainly consist of people who wouldn’t have great actual physical labor results to boast of. Still, a lot of people would think high-SQ failures must have had a great potential for physical labor that somehow got thwarted. In reality though, they would be mostly people with a single very strong muscle, useless on its own, i.e. people for whom the correlation giving rise to the status wouldn’t hold in the first place. Of course that kind of people would be most vocal about the s-factor being a real thing proven by science(TM).
      
      – There would be studies showing that pushups increase SQ, but only to a point (because, short of steroids, arm muscles will not become that much stronger than all other muscles in the same body). Also there would be studies showing that forcing pre-schoolers to do pushups increases SQ in the short term, but the effect is undetectable by the time they reach college. (I certainly used to be a lot fitter as a conscript than I am now.) Uncareful people would take this as proof that the malleability of strength is basically negligible.
      
      – There would be basically no way to tell the difference between people who are weak because of physical disabilities and people who are weak because they don’t like sports.
      
      Of course people could still sometimes make useful judgments with SQ where no better data was available. That would basically be a way to milk the correlations going into it. In other words, for some applications s-factors would be an approximation better than nothing.
      
      Still, thinking of them as real would be a mistake that would make false things look like proven ones.
    - LRS says:
      
      August 11, 2014 at 7:56 pm
      
      Sam and Gilbert, the SQ metaphor was illuminating for me. Thanks for helping develop my understanding of g and its implications.
    - AR+ says:
      
      August 11, 2014 at 11:24 pm
      
      Someone who thought SQ was a real thing might conclude leg exercises will help back pain.
      
      [Milking intensifies]
      
      And if such a person were to test this empirically by studying the effects of doing squats on back pain, they might indeed find that it helps, as well as improving fitness outcomes not previously known to be associated with the legs. (Because squats are a free-weight compound lift that strengthens the back and other core muscles.) Subsequently, some people with back pain would hear “study finds link between leg exercises and reduced back pain,” and start doing leg extensions.
  - Joseph Gnehm says:
    
    August 11, 2014 at 1:15 pm
    
    It seems like there’s a substantial genetic component to the math-reading ability correlation–at least from what I can see in this recent paper:
    
    http://www.nature.com/ncomms/2014/140708/ncomms5204/full/ncomms5204.html
    
    “Here we show, using twin and genome-wide analysis, that there is a substantial genetic component to children’s ability in reading and mathematics, and estimate that around one half of the observed correlation in these traits is due to shared genetic effects (so-called Generalist Genes).”
Watercressed says:

August 11, 2014 at 12:52 am

One of my go-to examples here is health insurance and the idea that it is wrong to ever deny someone coverage for anything, no matter how low the probability it will help and how expensive the intervention. I try to make an argument against that here, but it requires a little math and it would have been very hard to express clearly without it.

Does “If you never deny coverage, you will run out of money and more people will be denied care” only sound clear because I’ve already read the post?
- Adam Long says:
  
  August 12, 2014 at 12:04 am
  
  Unfortunately, I think the short answer to your question is “yes” BECAUSE, in my experience at least, the argument that “you will run out of money” is something that many people simply cannot accept. A typical response, again just in my random experience talking with people at dinner parties, is that they will acknowledge that there are not infinite resources BUT that “if we have money for all these fat cats to live in mansions, and I saw on the news that some tycoon in China bought a 10 million dollar bed for his dog” then SURELY we have more than enough money for health care “if only people would stop being so selfish.” In other words they believe, or, more accurately in my view, TALK AS IF they believe, that the practical problem is that WE are WASTING resources on nonsense, and thus don’t have enough money to pay for “needed” medical care.
Doug Clow says:

August 11, 2014 at 2:06 am

It seems that your approach to ‘avoiding being Eulered’ is, essentially, to learn just-enough about the maths, which is extremely admirable, and one I’d encourage many people to pursue.

It’s not the solution taken by most smart people, though. The smart strategy is not to hone your own expertise, but to hone your ability to assess the expertise of others, and deploy them to your purposes. This requires social smarts rather than more academic ones. And, tragically, more academic/book-smart/geeky experts tend to massively under-rate the importance of social intelligence and capability, and thus don’t practice it and don’t get good at it. So they’re typically easy for highly socially-smart people like Catherine the Great to use – whether for good, or as pawns for their amusement. Pursuing social intelligence is the more general route to success.

(Oh, and totally agree on science being statistics or stamp-collecting!)
- Ialdabaoth says:
  
  August 11, 2014 at 11:38 am
  
  One of the problems of pursuing social intelligence is that people with high social intelligence will be able to convince you that their method works, whether or not it does. How do you vet people for teaching a skill that includes how to disrupt / co-opt vetting?
  - Nancy Lebovitz says:
    
    August 11, 2014 at 12:24 pm
    
    Read what they say and give yourself time to think about it?
  - Zathille says:
    
    August 11, 2014 at 12:34 pm
    
    It’s probably useful to have an idea of the social context to which the person who gave you the advice belongs. Much of this stuff can be very context-dependent, though there may be general rules that apply with relative generality.
    
    As for my advice: Take what people say with a grain of salt, particularly strangers, especially anonymous ones. Doubly so if it’s over the internet.
  - Anonymous says:
    
    August 11, 2014 at 1:56 pm
    
    …how separate is their method for learning/teaching from their method for doing? This sounds like it could be the kind of problem that solves itself.
- ADifferentAnonymous says:
  
  August 11, 2014 at 2:42 pm
  
  ‘Honing your ability to assess the expertise of others’ sounds fraught with its own perils. In particular, it’s not obvious how to know if you’re getting it right. Do you have something in mind for that?
  
  Otherwise it seems like you’d end up just dismissing conclusions you don’t like.
Ilya Shpitser says:

August 11, 2014 at 2:45 am

Hi Scott,

I feel like I should reply to this. I am with Pearl here: you should not have to be a statistician to judge things on their merits in this case. Causal and statistical issues are separate, and we are talking about causal issues here. I think it’s possible to develop sound causal intuitions about things and not know a single thing about regression models, for example. Let me see if I can help to translate (or at least give my interpretation to) what Clark is trying to say here.

“The Bell Curve” is actually tackling two major problems in causal inference: figuring out causal effects, and “causal discovery.” Clark criticizes it on both of these.

The first part of the paper is about what is known as “causal discovery”: we want to learn the causal structure of something. For example, we have gene expressions, and we want to learn what gene turns on or off what other gene. Or, we have time series data, but we didn’t record temporal order, and want to recover it. These come up all the time, because we often want to know where causes and effects are, but rarely have time and money to set up proper experiments. Instead, we have observational data.

Learning causal structure from observational data is very difficult. This is a very well known, very studied problem. There is a huge literature on it, many papers and books. One of the books is by Clark, Peter Spirtes, and Richard Scheines. There are lots of well known hard limits on this problem. For example, many quite different causal structures can generate the same observational data. So we can’t differentiate between them based on observational data alone. We need more assumptions, or we need to do experiments.

In the case of psychometrics, we have lots of variables we observe (how people do on tasks thought to require smarts of various types), and we want to see what hidden causes affect these things we observe. This is a causal discovery problem, a thing we know a lot about. What Clark says is that folks who like “g” used a particular method to solve this problem. The way in which they used this method got them a solution where a single unobserved cause (call it “g”) is the causal structure responsible for the data. Clark says that they are wrong. Specifically, the method that they use in fact does not rule out other causal structures which do not have a single “g”. So, if you want to postulate “g”, you cannot use this method that they use as evidence, or at least the kind of evidence people think it is. (Maybe there is weaker Bayesian evidence here, but then you have to set up a Bayesian story, and that is not so simple here).

The second part of the paper is about learning causal effects. That is, if we were to give a sailor on a Royal Navy ship some lemon juice every day, how likely is that sailor to develop scurvy vs a sailor who got diluted sulfuric acid, or some other crazy thing?

If we can set it up so we give people treatments directly, life is great because we can compare outcomes directly, as was done with scurvy treatments. Usually, we can’t set it up like that, and instead our data will be severely confounded. People have thought about confounding for a long long time, and have lots of clever ways around it. But, all these ways assume you _know what the causal structure is_ that generated the data. That is, we must be able to say “we observe X, Y and Z, and we know there is probably an unobserved W causing X and Z” and so on. If we know there is confounding somewhere, but can’t specifically write down where it is, we are stuck. We can propose a particular way to adjust for confounding, but we can’t be sure it’s right. You are well aware of this, of course.

Clark says that when the authors of the Bell Curve propose to use a regression model to talk about causal effects, they are implicitly proposing a particular method of adjusting for confounders without really knowing what the underlying causal structure is. So, we should not believe their effect numbers.
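
To make the worry concrete, here is a small simulation sketch under invented assumptions (the hidden cause `w`, the coefficients, the sample size and the seed are all hypothetical, and nothing here is taken from The Bell Curve's data). In it, `x` has no causal effect on `y` at all, but both depend on `w`. Regressing `y` on `x` alone reports a large "effect"; only an analyst who knows to adjust for `w`, and has actually measured it, gets back to roughly zero.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def residuals(values, on):
    """What is left of `values` after regressing it on `on`."""
    b = slope(on, values)
    mv, mo = sum(values) / len(values), sum(on) / len(on)
    return [(v - mv) - b * (o - mo) for v, o in zip(values, on)]


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]        # confounder
    x = [wi + rng.gauss(0, 1) for wi in w]         # "treatment", driven by w
    y = [2 * wi + rng.gauss(0, 1) for wi in w]     # outcome, driven by w only
    naive = slope(x, y)                            # ignores w entirely
    # Adjusting for w: regress the part of y not explained by w
    # on the part of x not explained by w.
    adjusted = slope(residuals(x, w), residuals(y, w))
    return naive, adjusted


naive, adjusted = simulate()
print("naive slope of y on x:", round(naive, 2))
print("slope after adjusting for w:", round(adjusted, 2))
```

The adjustment only works because the simulation hands us `w`. If `w` is unobserved, or we adjust for the wrong variable, the regression coefficient is still computed without complaint, which is the sense in which the effect numbers should not be trusted.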

—

I tried to phrase the objections here in a language that avoids using words like “correlation coefficient” and “chi-squared test.” I think these are unnecessary to see what the problem is.

—

“at some point you have to say that we’re starting to rack up a lot of coincidences”

I think it’s certainly possible to gather convincing evidence for a causal hypothesis from observational data alone.

Let me ask you this, though: what would lead you to believe “g” is not a real thing? What would lead you to be more in favor of “g” being a real thing (i.e. out there in the real causal model of the world)? For example, I don’t think this is really evidence for “g” as a real thing:

“Lead both decreases IQ and increases crime in a dose-dependent way? That lends credence to causal interpretations of the observation that IQ and crime are closely correlated.”

I can invent an abstraction called “hit points” that is a function of your vital life signs you measure in a modern hospital. Then I point out that it varies in a dose dependent way with being hit with a hammer. In what sense is this evidence that hit points are real?
- Watercressed says:
  
  August 11, 2014 at 3:25 am
  
  What would it mean for hitpoints to be real?
  - Ilya Shpitser says:
    
    August 11, 2014 at 3:34 am
    
    “Hit points” is me applying the modeling philosophy behind “g” to healthcare (something Scott knows quite a bit about). It is safe to replace “hp” by “g” in any sentence.
    - rich says:
      
      August 11, 2014 at 4:08 am
      
      Well, the hit point model implies there are some people you can hit with a large axe, and they will still be standing to hit you back.
      
      Furthermore, with D&D style hitpoints, much of the observed outcomes of a few thousand 1 on 1 fights to the death will come down to that measurable axe-immunity.
      
      So experimental disproof is comparatively easy, although getting it past the ethics panel may be trickier.
    - Watercressed says:
      
      August 11, 2014 at 4:26 am
      
      I’m not confused by the hit points analogy. I’m confused in general about what it means for a modeling philosophy to be real.
      
      When I think of the “real causal model of the world”, I think of physics. Everything else is humans’ causal model of the world, and judged by how closely it approximates physics, not whether it is really out there.
      
      It may be that the mathematical assumptions of whatever model are violated, and so all the calculations and theory that create a guarantee of accuracy are invalid. But, even without that, we still have induction–a model can fail to be proven in theory and still be useful in practice.
    - Watercressed says:
      
      August 11, 2014 at 4:38 am
      
      Let me try to put this more succinctly: below, you say that “Confounding gremlins will eat you if you use predictions to do policy.”
      
      We are using strongly-IQ-correlated SAT scores for college admissions. Companies have used actual IQ tests for employment decisions. Where are the gremlins?
- Harald K says:
  
  August 11, 2014 at 5:42 am
  
  I mentioned this in an earlier thread, but Cosma Shalizi, who teaches causal inference at Carnegie Mellon, did a great “little explicit math” explanation of some of the pitfalls of model inference in his blog posts on IQ, back in 2007:
  
  http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/523.html
  - B.B. says:
    
    August 11, 2014 at 5:48 am
    
    Harald K says:
    I mentioned this in an earlier thread, but Cosma Shalizi, who teaches causal inference at Carnegie Mellon, did a great “little explicit math” explanation of some of the pitfalls of model inference in his blog posts on IQ, back in 2007:
    
    Dalliard wrote a critical response to Shalizi’s article at Human Varieties.
    - Gilbert says:
      
      August 11, 2014 at 12:23 pm
      
      I wanted to reply to that, but I don’t have to:
      Everything gjm says in that thread. (Pattern there, too. That guy is in general one of only a couple of folks who make Less Wrong worth reading.)
  - Chris Stucchio says:
    
    August 11, 2014 at 8:23 am
    
    Shalizi is also using the term “statistical myth” to mean something *very different* from what you probably think it means. Essentially, his argument is claiming that if a bunch of independent factors lead to a single aggregate factor, then the aggregate is a “statistical myth”.
    
    For example, consider a bunch of particles each moving statistically independently of each other. You can put a piston adjacent to these particles and the motion of the piston can be predicted. You might postulate that there is a single underlying factor called “pressure” which is driving the piston.
    
    Shalizi argues that “pressure” is a statistical myth because in reality, P = sum(particle velocity if particle velocity > 0).
    
    Seriously – that is *literally* the argument he is making when he discusses 2766 independent variables.
    
    Similarly, the (incorrect) inference people draw from Shalizi’s article is obvious when framed this way. It might be true that statistical mechanics explains thermodynamics (and thus T is a statistical myth), but that doesn’t mean you shouldn’t say “my coffee is hot, better let it cool before drinking”.
    - Gilbert says:
      
      August 11, 2014 at 12:46 pm
      
      Actually he’s arguing that if a bunch of independent factors lead to a single aggregate factor, that is no proof of a causal factor and short of other evidence that causal factor would be a statistical myth.
      
      In your pressure analogy that’s basically equivalent to arguing there is no inherent tendency steering those gas particles against a wall – which is a true and a priori surprising lesson of thermodynamics.
      
      As for the lessons, he doesn’t say IQ tests are never useful. Actually, I’ll let him speak for himself on that:
      
      There should be no dispute that, when we lack specialized and valid instruments, general IQ tests can be better than nothing. Claims that they are anything more than such stop-gaps — that they are triumphs of psychological science, illuminating the workings of the mind; keys to the fates of individuals and peoples; sources of harsh truths which only a courageous few have the strength to bear; etc., etc., — such claims are at present entirely unjustified, though not, perhaps, unmotivated.
    - Scott Alexander says:
      
      August 11, 2014 at 6:54 pm
      
      Can I get someone more pro-Shalizi or anti-IQ to confirm this claim that the article doesn’t mean “IQ isn’t real” any more than the interaction of many different molecules means “pressure isn’t real” (or, I would assume, “temperature isn’t real”)?
    - oneforward says:
      
      August 11, 2014 at 11:35 pm
      
      @Scott
      
      I’m not sure where I fall on the pro/anti IQ scale, but yes, the analogy is appropriate. Shalizi describes his article as “11,000 words on the triviality of finding that positively correlated variables are all correlated with a linear combination of each other, and why this becomes no more profound when the variables are scores on intelligence tests.”
      
      He’s not arguing that ‘g’ contains little useful information, or that there is no ‘g’ in the true causal model, just that you can’t get causal structure like gravity and electromagnetism just by measuring velocities and combining them into a single ‘temperature’ factor.
      
      (The analogy works less well on a technical level – the definition of temperature is not at all a linear combination of velocities.)
    - lambdaphage says:
      
      August 12, 2014 at 4:24 am
      
      Well it is a linear combination of the squares of the velocities.
    - Chris Stucchio says:
      
      August 12, 2014 at 9:11 am
      
      Oneforward, it was a mistake for me to mention temperature. I should have stuck to pressure, which is a linear combination of forces.
      
      Gilbert, indeed the correlations do not prove there is a single causal factor. All they prove is that measuring an aggregate is a useful thing to do.
      
      The quote you provide from Shalizi is odd. The first sentence admits g is valid, the second asserts that it is not. I have no idea what to make of it – the best I can tell is that he’s playing a Krugmanesque game, trying to make no untrue statements, while carefully affiliating his mood in a particular direction.
    - oneforward says:
      
      August 12, 2014 at 12:33 pm
      
      lambdaphage – No. Mean kinetic energy is closely related to temperature in many systems, but I was referring to the “partial derivative of energy with respect to entropy at constant volume and particle number” definition.
    - Toby Bartels says:
      
      August 12, 2014 at 5:10 pm
      
      I expect that ‘fiction’ would be a better word here. It’s more neutral in its connotations than ‘myth’, and it has precedents in phrases like ‘legal fiction’.
      
      Of course, you’re not Shalizi’s editor (or so I presume), but at least it may help us to read Shalizi if we make this substitution.
    - Gilbert says:
      
      August 13, 2014 at 6:09 pm
      
      @Chris Stucchio
      The first sentence admits a motte, the second one denies the (actually pseudoscientific) bailey.
      
      For a much more detailed explanation you could read my long-winded response to Scott’s next post. Or not, because it actually is excessively long and unlikely to change anyone’s mind.
- Vilhelm S says:
  
  August 11, 2014 at 10:05 am
  
  I think Scott mentioned this somewhere else, but the “hitpoints” idea is already how much medical research works. For example, experimental antidepressants are evaluated using things like a “Hamilton Rating Scale”, which contains 21 questions like
  
  DEPRESSED MOOD – Sad, hopeless, helpless, worthless
  0 = Absent
  1 = Gloomy attitude, pessimism, hopelessness
  2 = Occasional weeping
  3 = Frequent weeping
  4 = Patient reports virtually only those feeling states in his / her spontaneous verbal and non-verbal communication
  
  You assess each question, add them all together for a total score, and then do a controlled trial to see if the antidepressant reduces the score or not.
  
  This seems exactly analogous to intelligence research, if you do an intervention to see if e.g. less childhood lead, or more folic acid during pregnancy, will affect IQ score. This can be useful even if you don’t believe that the IQ number measures one particular biological mechanism (glucose levels in the forebrain, or whatever).
kenzo says:

August 11, 2014 at 3:06 am

I’d guess that you, like most people, really do underestimate how important it is to be scrupulous about causality in social science and economics. Here are some slides from a talk called “Bloopers: How (Mostly) Smart People Get Causal Inference Wrong” that has some examples of important papers making incorrect assumptions that sounded reasonable on their face. Many of them needed semi-technical arguments to debunk. If you don’t immediately see the problems with all of the papers discussed, you might want to lower your confidence in your ability to make causal inferences from observational data.

“You can prove anything with statistics” largely comes down to the fact that any interesting thing you can prove contains implicit or explicit claims about causal mechanisms, and that social science is rich enough that you can always make assumptions about causal structures such that some set of statistics supports a given conclusion. If nothing else, finding such sets of “contradictory” statistics is good for telling you to think harder about causation.
Jon says:

August 11, 2014 at 3:12 am

Hi Scott,

I work with statistics on a daily basis, and think the social sciences have little to feel ashamed of: in general the sciences don’t have a good grasp of statistics either.

I am often dismayed to read a big headline-making medical paper, only to find that the statistics was bodged or nobbled in some way.

Personally, I think the “correlation” vs “causation” issue is a red herring. In practice, what people want are reasonable predictions — and the actual factors involved are less important.

For example, if I am depressed, I would like to know my personal likelihood of a certain drug helping me cope better — while it is interesting knowing that factor X has some weak correlation and factor Y has some strong correlation, these offer little help with my problem as they may interact in some strange way: I want a prediction.

What I want is a solved problem: we use machine learning in our phones, games consoles, internet, voice recognition, and various critical systems; but for some reason it hasn’t caught on as a tool for scientific research or health policy as far as I can tell.

I guess the issue is that if you train, say, a neural network, to predict the likelihood that a certain drug will help you with depression, given, say, thirty factors about your life; the predictive power may be excellent, but it doesn’t tell us much about the mechanisms involved, and is purely heuristic, so we can’t say “we are using a p-value of P and the rigorous meaning of that is…”. Instead, we can just say “we’ve seen some patterns in the data — based on that, here are our predictions”.

But my point is that this is exactly what you want.
- Ilya Shpitser says:
  
  August 11, 2014 at 3:31 am
  
  “I want a prediction.”
  
  I want to know what to do. Confounding gremlins will eat you if you use predictions to do policy.
  - Jon says:
    
    August 11, 2014 at 3:39 am
    
    Please explain?
    
    I thought the issue of confounders was in trying to use some statistical finding you have for a single factor in the real world: where a huge number of factors interfere, and you need to do a lot of work to exclude confounders to try and isolate one dimension of the mechanism.
    
    Using the prediction-based methods I allude to sidesteps this issue by taking into account a vast number of factors at once: you don’t care about the underlying mechanism because you’re working at a high-level above that where you are able to make real-world predictions but say nothing about the underlying mechanism. Similarly because we have no need to isolate individual factors (we WANT the effect of all the factors interacting) we don’t need things like control-group data or anything like that.
    - Ilya Shpitser says:
      
      August 11, 2014 at 3:42 am
      
      If we observe everything, absolutely everything, then we are ok. But we never do. The FDA asks for randomized controlled trials for a reason: they don’t use what you call “prediction methods” because they worry about confounding in the face of millions of causally relevant factors we have no hope of observing.
    - Doug S. says:
      
      August 11, 2014 at 7:43 pm
      
      Goodhart’s Law and Campbell’s Law.
      
      Standardized test scores predict college success tolerably well. Colleges use standardized tests as admissions criteria. Test-prep companies appear, and teach people how to game the tests. The predictive ability of the tests drops.
    - Douglas Knight says:
      
      August 11, 2014 at 8:33 pm
      
      Doug, can you provide evidence that the predictive value of SATs, for academic achievement, job performance, or, say, car crashes, has dropped after the advent of test prep?
Jon says:

August 11, 2014 at 4:48 am

I see what you mean. I think there’s a distinction here between doing a safety trial, and trying to make balanced decisions on the basis of the information you have. This mirrors, for example, the distinction in England between criminal and civil court proceedings: in criminal proceedings they optimise to avoid punishing innocent people (hence the “beyond reasonable doubt” phrase); whereas in civil proceedings, they optimise the fraction of the time a ruling is correct (hence the “balance of probabilities” phrase).

Similarly, hypothesis testing is optimised for avoiding dangerous mistakes: so good for a study checking a drug isn’t going to kill lots of people.

However, once we are reasonably sure a drug/treatment isn’t going to kill lots of people, it seems like you then want to optimise for average-best-outcome instead. Both valid, but I think prediction-based approaches should get more of a look-in.
- Ilya Shpitser says:
  
  August 11, 2014 at 5:35 am
  
  I think getting prediction right and getting causality right are sort of orthogonal.
  
  We could apply the vast machinery smart machine learning people have developed for questions that arise in causal inference, or for prediction questions. The reasons the former isn’t done very much are perhaps related to what you mentioned in your original post:
  
  ‘Personally, I think the “correlation” vs “causation” issue is a red herring.’
  
  It just seems that ML people, for cultural/historical reasons, do not think about these issues very much. I think they should.
  - stubydoo says:
    
    August 11, 2014 at 7:14 pm
    
    Doing conditional predictions properly does require a proper angle on the causation.
JK says:

August 11, 2014 at 5:34 am

It’s a shame that The Bell Curve didn’t include analyses where the effect of IQ differences is examined within families. Years later, Charles Murray used the same NLSY data that The Bell Curve relied on to show that IQ correlates just fine with various outcomes within families (between siblings), too. In fact, the within-family parameter estimates are almost identical to the SES-adjusted estimates reported in The Bell Curve. The sib results invalidate about 90 percent of the criticisms against The Bell Curve, including Glymour’s alternative models.

While Herrnstein and Murray’s analyses were not genetically informative — the NLSY sibling data weren’t available when The Bell Curve was written — their causal assumptions were not conjectural but were based on the well-known results of twin and adoption studies. In other words, the book was an explication of Herrnstein’s syllogism.

As far as I understand the ideas of Glymour, Pearl, and other “causal philosophers”, they seem to think that the only way to discover causal relations is to have data with very specific properties and then check if they conform to certain mathematical expectations. However, it appears that this way of thinking (directed acyclic graphs or whatever) has contributed absolutely nothing to scientific progress in any field, perhaps because when the exacting assumptions about data in these models are met, causality is obvious anyway.

So I tend to think that the correct way to discover causal relations in social science is what Scott describes (although I don’t necessarily agree with his specific example). For example, the reason why the g model is supported by many psychometricians is not (only) that g can be reliably extracted from all correlation matrices of cognitive tests. The reason for the g model’s popularity is that it is supported by several independent lines of evidence, e.g., confirmatory factor analysis, multivariate behavioral genetics, predictive validity. The evidence for the model does not come from a single slam-dunk experiment, but from the cumulative force of different kinds of evidence. Arthur Jensen quite explicitly thought about the g model as a Lakatosian research programme, and I think it’s fair to say the g model has been very progressive in the Lakatosian sense. Shalizi’s article ignores all this and concentrates on factor analytic straw men.
- Ilya Shpitser says:
  
  August 11, 2014 at 5:44 am
  
  “has contributed absolutely nothing to scientific progress in any field”
  
  How do you measure contribution?
  
  “when the exacting assumptions about data in these models are met, causality is obvious anyway.”
  
  I have been studying this stuff for over a decade now, and it is often not obvious to me. I can give you a causal problem containing 6 variables, where all assumptions are exactly specified, and I am almost certain you will get it wrong.
  
  The entire point is that even _if_ all the assumptions are met, causal inference is _still_ very difficult.
  
  The fact that people need to publish papers, and the fact that people demand policy answers does not obligate Nature to be convenient.
  - JK says:
    
    August 11, 2014 at 6:28 am
    
    “How do you measure contribution?”
    
    I mean resolutions to long-running scientific disputes, for example. Can you mention any examples?
Ian Creasey says:

August 11, 2014 at 6:23 am

All mathematical arguments depend upon some kind of underlying assumptions. In pure mathematics, those are axioms of some kind. In applied mathematics or statistics, there’s some kind of data set on which calculations are performed.

If someone puts forward a mathematical argument, and the argument itself is too difficult to comprehend (so you need to take it on faith), then it is still reasonable to look at the underlying assumptions and data, and ask if they are relevant and sufficient.

It’s the old computing proverb: Garbage In, Garbage Out. It doesn’t matter how sophisticated the computer program is, if it runs on garbage data. And so if someone is trying to dazzle you with the sophistication of their methodology, you can still question the starting assumptions and data.
Armstrong For President 2020 says:

August 11, 2014 at 7:52 am

Not to nitpick here, because I do agree with the general form of the argument, but shouldn’t the immediate drop in crime after anti-lead laws were passed be strong evidence against lead poisoning as the cause?

By way of explanation, imagine that the hypothesis was that poor children were being regularly hit in the heads with hammers decreasing their IQs due to brain damage in their formative years. After passing the anti-hammer laws the hammering immediately ceases… but logically, the brain damage of the up-and-coming generation of children already hammered wouldn’t simply go away. In that case it would make much more sense if the effect was delayed by 10-20 years as the new un-hammered cohorts came up, or was at least a slow progressive improvement as less-hammered generations succeeded more-hammered ones. But if the graph is essentially a discontinuity just a few years down the line that should imply that being hit with hammers wasn’t the (primary) problem.

And in the real world, since the observed drop in crime was a global decrease rather than being pronounced in younger offenders that means that the lead explanation seems even less likely. It is plausible, even if odd, that young people could rapidly recover soon after trauma stopped but it is very unlikely that older generations would show the same vigour.
- Jesse says:
  
  August 11, 2014 at 10:15 am
  
  It wasn’t an immediate decrease, it was delayed by about 20 years. Kevin Drum has written a few articles about this, e.g. http://www.motherjones.com/environment/2013/01/lead-crime-link-gasoline
pwyll says:

August 11, 2014 at 9:29 am

Scott, I’m willing to believe that you’re bad at math relative to your verbal ability. But, Ashkenazi Jews, as a group, have the highest verbal ability of any well-known ethnic group. So, I suspect that your math ability is just fine on an absolute basis.

As for Glymour, one heuristic that may help you (beyond the obvious “does the author seem to have a very large ax to grind?”) is whether the author seems to be trying to explain something as clearly as possible, or just using jargon to intimidate. Glymour’s prose sounds very similar to Stephen Jay Gould’s disingenuousness in “The Mismeasure of Man”, which is very suspicious to me.

Math was my best subject in school, I was extremely good at it, and I currently depend on my understanding of statistics to make a living. However, I have no idea what a “vanishing tetrad” is. Glymour is a philosopher, and it would not surprise me in the least if he also has no idea what a “vanishing tetrad” is, and is merely generating a cloud of squid ink.

Ditto as well to the comments above criticizing Shalizi.
- Anonymous says:
  
  August 11, 2014 at 10:25 am
  
  Your primary source of evidence for Scott having high verbal IQ is the fact that he’s a Jew, a group with a verbal IQ elevated up at 110-120…and you post this on his blog, which is clearly written by someone who would be getting a near-ceiling score for pretty much any verbal IQ test.
  
  This is kind of like saying, “I’m totally willing to believe Michael Jordan is better at basketball than swimming…because he’s tall, you see?”
  - pwyll says:
    
    August 12, 2014 at 10:14 am
    
    This is kind of like saying, “I’m totally willing to believe Michael Jordan is better at basketball than swimming…because he’s tall, you see?”
    
    Not exactly. Let’s see if we can stretch your analogy further. Suppose Michael Jordan came to you in 1994, after having played for a while with the Chicago White Sox’s minor-league team, complained about not getting into the major leagues, and stated that he “wasn’t very good at baseball.”
    
    Well, sure… compared to how good he is at basketball! But being a AA-league baseball player still makes you better than the vast majority of the population. And furthermore, I’d bet money that NBA players *as a group* are far better at baseball than the average American, even if they’re much worse at baseball than they are at basketball.
- The Do-Operator says:
  
  August 11, 2014 at 2:02 pm
  
  Clark Glymour is not that kind of philosopher. He is the author of The Glymour Manifesto and he is one of the founders of modern causality theory. When he uses the term “vanishing tetrad” it has a precise meaning and refers to well-defined mathematical objects.
  - Douglas Knight says:
    
    August 11, 2014 at 3:37 pm
    
    General principle: do not judge CMU departments by their names.
    - pwyll says:
      
      August 12, 2014 at 10:18 am
      
      Fair enough… I’m completely unfamiliar with Glymour. Thanks for the clarification; I will attempt to quell my suspicion.
Troy says:

August 11, 2014 at 11:20 am

It’s Lydia McGrew, Scott.

Also, your linked critique seems to largely ignore the McGrews’ arguments for a low value of P(people make stuff up about Jesus), filling in for “people,” “the early Christian disciples who died martyrs’ deaths in attestation to Jesus’ resurrection.” I also think (with Lydia) that there are obvious significant differences between the later, obviously apocryphal writers and writers who knew Jesus or knew eyewitnesses to Jesus’s ministry. (The McGrews are clear that they are assuming in their paper that the Gospel writers are such: see p. 597.)

At any rate, the McGrews would be the first to admit that this paper does not answer all the questions one might reasonably ask about the historical truth of Christianity. You’re quite right that that paper comes close to taking the truth of the Gospels as a premise; the McGrews wanted to show that Humean skepticism about the resurrection is unjustified given certain historical assumptions about the New Testament. They’re both able and willing to defend those assumptions elsewhere; Tim, for example, has a comprehensive series of lectures on the Reliability of the Gospels, linked to at http://www.apologetics315.com/2012/11/audio-resources-by-tim-mcgrew.html, defending the claims that the Gospels were written by their traditional authors, that they are substantially confirmed by both external and internal evidences, and that they contain neither egregious errors nor contradictions. The second through fourth lectures in that series in particular substantially answer the kinds of worries you set out that proceed by analogy from the admittedly apocryphal gospels to skepticism about the canonical gospels.
- Mark says:
  
  August 11, 2014 at 2:04 pm
  
  Does Tim McGrew address the observation selection effect problem that Scott also raised? I.e., that we wouldn’t have expected high-quality evidence that disconfirmed the resurrection to have survived.
  - Troy says:
    
    August 11, 2014 at 2:43 pm
    
    I think Scott is right to criticize the argument that the Gospels are true because if not someone would have exposed them. But I also don’t think the McGrews place as much weight on this argument as Scott implies. I’d have to reread their essay again to see exactly what they say, but I think their point against the hypothesis of intentional lying was that this would have been unlikely to succeed if the lying was easily exposed. This both makes it less likely that the disciples would attempt it in the first place (at least, given the bad consequences that would likely follow – i.e., death) and less likely that they would have been successful in spreading the lies if they did attempt it. I don’t think the McGrews are saying that it’s likely that, say, we would still have some document today saying “and then Joseph of Arimathea produced the dead body of Jesus.” (There are other arguments against a hoax theory, too, of course – it’s a cumulative case – but this is the one that seems closest to what Scott had in mind.)
    
    At any rate, the McGrews are in general quite critical of arguments from silence, and (rightly, I think) point out their overuse by skeptics of the historicity of the New Testament (see p. 598 of the essay). They would not disagree with Scott’s skepticism of this mode of argument.
    - Mark says:
      
      August 11, 2014 at 4:42 pm
      
      I think their point against the hypothesis of intentional lying was that this would have been unlikely to succeed if the lying was easily exposed. This both makes it less likely that the disciples would attempt it in the first place (at least, given the bad consequences that would likely follow — i.e., death) and less likely that they would have been successful in spreading the lies if they did attempt it.
      
      The point is that if the disciples were lying and there indeed were cases where their lies were exposed – something that probably only could’ve happened during a brief window after Jesus’ death – we wouldn’t necessarily know about it, so it’s hard to say definitively that their lies were successful at a stage where anyone was in a position to see through them. While the disciples were, of course, ultimately highly successful in promulgating Christianity, the vast majority of converts would’ve been won over long after concrete, indisputable disproof of the resurrection became impossible.
    - Troy says:
      
      August 11, 2014 at 11:22 pm
      
      The point is that if the disciples were lying and there indeed were cases where their lies were exposed – something that probably only could’ve happened during a brief window after Jesus’ death – we wouldn’t necessarily know about it, so it’s hard to say definitively that their lies were successful at a stage where anyone was in a position to see through them.
      
      Our primary evidence that the disciples were successful in spreading Christianity in the 1st century is not the absence of testimony to the contrary but the presence of testimony either explicitly or implicitly documenting this success. Remember that the McGrews are assuming the substantial historical reliability of the book of Acts and the Pauline epistles on non-miraculous matters here. One can, of course, be skeptical about this, but again, this paper was not meant as an argument against this particular skepticism (even if the McGrews do explain in the essay why they think such skepticism is implausible).
    - Mark says:
      
      August 12, 2014 at 11:10 am
      
      Our primary evidence that the disciples were successful in spreading Christianity in the 1st century is not the absence of testimony to the contrary but the presence of testimony either explicitly or implicitly documenting this success. Remember that the McGrews are assuming the substantial historical reliability of the book of Acts and the Pauline epistles on non-miraculous matters here.
      
      I’m not sure what difference that makes. Acts has the apostles converting some people, but suppose they failed to convince vastly more people. Does this make them successful or unsuccessful? Do we expect frauds to make literally zero converts, rather than just a few? Anyone in the world can see that Scientology is fraudulent through the most cursory online investigation, but people still become Scientologists.
      
      Also, the majority of stuff in Acts and especially the Pauline epistles would’ve happened after the point where Jesus’ corpse decomposed beyond recognition, or at least to a point of plausible deniability. Even Acts 4:4, the most compelling counterpoint one could probably make if Acts is to be believed about everything, has no date attached. Again, if the claim is that the apostles would’ve been unlikely to succeed if they were lying since their lies would otherwise be exposed, it doesn’t help to cite success that occurred after clear debunking became impossible.
    - Troy says:
      
      August 12, 2014 at 1:10 pm
      
      Also, the majority of stuff in Acts and especially the Pauline epistles would’ve happened after the point where Jesus’ corpse decomposed beyond recognition, or at least to a point of plausible deniability.
      
      I am not knowledgeable enough about 1st century burial practices to know whether or not this is true. But I don’t think that the disciples would have exactly had “plausible deniability” for an on the face of it extraordinary claim had, say, Joseph of Arimathea, a respected leader in the Jewish community, produced the mostly decayed body of a crucified man that he had in his possession (cf. the McGrews, p. 621). More generally, a public denial of the disciples’ claims by someone who would be presumed to know whether they were lying need not involve production of Jesus’ body at all to be convincing; Scott’s own example was an alleged witness (by the New Testament accounts) of the resurrection coming forward to say “No, we didn’t see this at all, shut up.”
      
      I reiterate that this is a very minor part of the McGrews’ argument. In fact, skimming back over the paper, the page referenced above, in which they address Jeffrey Lowder’s hypothesis that Joseph of Arimathea put Jesus’ body in his own tomb, is one of the only places where it seems to come up at all. Their main argument against a conspiracy is not that, say, named witnesses of the resurrection didn’t come forward to deny it, but that named witnesses of the resurrection publicly attested to it in the face of violent threats, and that many of them in fact gave up their lives in attestation of this fact. They also observe that the prior plausibility of a conspiracy is low, but here again, they rely not only on the fact that a conspiracy would have been unlikely to be successful, but that they had no clear motive for it, especially in a hostile environment in which spreading this message would most likely lead to death. The conspiracy hypothesis also does not explain the other facts the McGrews argue are highly confirmatory of the resurrection (the conversion of St Paul and the women who found Jesus’ tomb empty and claimed to have spoken to him).
    - Mark says:
      
      August 12, 2014 at 8:58 pm
      
      But I don’t think that the disciples would have exactly had “plausible deniability” for an on the face of it extraordinary claim had, say, Joseph of Arimathea, a respected leader in the Jewish community, produced the mostly decayed body of a crucified man that he had in his possession (cf. the McGrews, p. 621).
      
      Why is that? It clearly wouldn’t have been helpful to the early Christians, but I don’t see why they should’ve seen it as fatal to their proselytization efforts. Many, many determined cults have thrived in the face of worse embarrassment.
      
      Keep in mind that “plausible deniability” here really means “plausible enough to not completely dissuade all existing believers, and temporary enough that it’ll turn into mere hearsay that I can go on to dismiss.” It’s not like Joseph of Arimathea in your example would’ve been able to upload a video of his findings to YouTube for anyone to find at any time.
      
      More generally, a public denial of the disciples’ claims by someone who would be presumed to know whether they were lying need not involve production of Jesus’ body at all to be convincing; Scott’s own example was an alleged witness (by the New Testament accounts) of the resurrection coming forward to say “No, we didn’t see this at all, shut up.”
      
      Sorry, I’m confused about what you’re suggesting here. Like, who aside from the disciples was present on the mountain where Jesus appeared to give the Great Commission? Which publicly named individuals are you imagining were present to debunk the sightings?
      
      Their main argument against a conspiracy is not that, say, named witnesses of the resurrection didn’t come forward to deny it, but that named witnesses of the resurrection publicly attested to it in the face of violent threats, and that many of them in fact gave up their lives in attestation of this fact.
      
      Imagine you hear dozens of independent reports from eye witnesses that Bob the psychic is successfully predicting their precise futures. Indeed, all of the witnesses endure terrible hardships to uphold the veracity of their claims. But then you learn that, for whatever reason, you’ve only been allowed to hear positive reports. This isn’t to say you’ve heard that negative reports exist, just that if they do exist, they’ve been deliberately filtered out.
      
      In this scenario, you have the same evidence for the reality of Bob’s powers as you attribute to the McGrews’ case for the resurrection: lots of very probably sincere reports of events that are rather tricky to explain naturalistically. But given the observation selection effect, it should still be obvious that you should remain skeptical, since you no longer have (much of) an idea what the rate of correct predictions is. The psychic could be guessing randomly, yet given enough trials, you’d expect to observe some eerie successes.
      
      Now imagine that one of the disciples denied that they experienced seeing Jesus in a literal way. He agrees that they had some powerful but vague “collective spiritual experience,” and that consequently he’s indeed wholly convinced that Jesus rose from the dead – but he also thinks that the other disciples are exaggerating what happened a bit. What’s the probability that his testimony would’ve survived? What’s the probability that he would’ve even bothered to voice his hesitation, given that he had nothing to lose by letting others exaggerate events to more persuasively reflect an underlying reality he already believes? I’d say “not very high.”
      
      If people like James Randi or Joe Nickell were there to interview everyone and subsequently preserve those interviews for posterity – Scott’s “1st century AD Judean skeptical community” – I’d be less worried about this kind of misattribution. Sadly, though, the documentation we actually have doesn’t come very close to that standard. So I think there’s an observation selection effect problem that you’re still sort of sweeping under the rug: you’re assuming that the evidence we have of the disciples’ testimony isn’t itself victim to selection effects. To be fair, this problem probably applies to a lot of history, but then again, most history doesn’t ask us to revise our entire metaphysics.
      
      They also observe that the prior plausibility of a conspiracy is low, but here again, they rely not only on the fact that a conspiracy would have been unlikely to be successful, but that they had no clear motive for it, especially in a hostile environment in which spreading this message would most likely lead to death.
      
      Minor point, but what’s the problem with a low prior for a conspiracy? They grant (for the sake of argument) that the resurrection has a low prior, which doesn’t lead them to rule it out.
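
The selection-effect worry in the Bob the psychic example above can be made concrete with a small simulation. This is only an illustrative sketch: the 1% chance of a lucky hit, the 100,000 clients, and every function name are assumptions invented for the example, not figures from anyone in this thread. It shows that once only positive reports get through, the reports you hear look the same whether or not the psychic has any real ability.

```python
import random

def simulate_reports(n_clients, hit_probability, seed=0):
    """Return one boolean per client: True if the prediction happened to come true."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.random() < hit_probability for _ in range(n_clients)]

def filter_positive(reports):
    """Model the selection effect: only successful predictions reach the listener."""
    return [r for r in reports if r]

def success_rate(reports):
    """Fraction of reports that are successes (0.0 for an empty list)."""
    return sum(reports) / len(reports) if reports else 0.0

# Hypothetical numbers: a psychic who is right by pure chance 1% of the time.
all_reports = simulate_reports(n_clients=100_000, hit_probability=0.01)
heard = filter_positive(all_reports)

true_rate = success_rate(all_reports)   # what an unfiltered record would show
heard_rate = success_rate(heard)        # what the filtered listener sees

print(f"true hit rate: {true_rate:.4f}")
print(f"hit rate among reports heard: {heard_rate:.4f} ({len(heard)} reports)")
```

The listener ends up with a large pile of sincere success stories, and the hit rate among them is 100% by construction, so the pile by itself says very little about the underlying rate.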
    - Troy says:
      
      August 12, 2014 at 11:53 pm
      
      Hi Mark,
      
      I agree with you that public disconfirmation needn’t have been fatal to Christian proselytization in general. But I do think that events like the conversion of 5000 Jews in Jerusalem in Acts 4 would have been much less likely in the kind of scenario I’ve described. Compare, for example, Muhammad’s success in gaining converts in the first several years of his ministry (in Mecca, before taking Medina by force).
      
      Sorry, I’m confused about what you’re suggesting here. Like, who aside from the disciples was present on the mountain where Jesus appeared to give the Great Commission? Which publicly named individuals are you imagining were present to debunk the sightings?
      
      I was referencing what Scott says in his linked post on LessWrong. There he summarizes part of the McGrews’ argument as follows:
      
      The Gospels say many people saw Jesus die on the Cross and then saw him alive later, and that natural explanations … are all unconvincing; therefore Jesus really was resurrected. According to the Gospels, this was seen by many witnesses, including luminaries like St. Peter, and none of them later came forward to say “No, we didn’t see this at all, shut up”.
      
      As far as I know the only publicly named individuals who saw the risen Christ were the 11 disciples plus Matthias and “Joseph called Barsabbas” (Acts 1:23), and the women at the tomb; Acts 1 implies and 1 Corinthians 15 says that there were many more, but their names are not given. It is likely that various named apostles elsewhere in Acts (e.g., Barnabas, John Mark) were among the 500 mentioned by Paul and the 120 mentioned by Luke in Acts 1, but we don’t know that. I suppose that other named individuals in the Gospels could have been in a position to debunk miracle claims made by early Christian disciples as well (e.g., Lazarus), although I wouldn’t put as much weight on that.
      
      Whether these people are potential “debunkers” depends on whether they’re supposed to be part of the conspiracy to lie about the resurrection, and whether or not that’s the case depends on how you formulate the conspiracy hypothesis. The more you add to the conspiracy, the less intrinsically plausible it is; the fewer you have, the more others are in a position to debunk it.
      
      But at any rate, I’ve already said that I don’t think the fact that, say, those disciples who did not pen Gospels (or Letters) didn’t debunk the resurrection is a very significant fact in the broader historical context; what’s significant is that they (according to the historical record) actively affirmed the resurrection, in the face of certain or probable death. I think that Scott’s summary, quoted above, is misleading in that respect. (He does then mention the disciples’ martyrdom in the rest of his summary, but doesn’t address it in his critique.)
      
      Imagine you hear dozens of independent reports from eye witnesses that Bob the psychic is successfully predicting their precise futures. Indeed, all of the witnesses endure terrible hardships to uphold the veracity of their claims. But then you learn that, for whatever reason, you’ve only been allowed to hear positive reports. This isn’t to say you’ve heard that negative reports exist, just that if they do exist, they’ve been deliberately filtered out.
      …
      Now imagine that one of the disciples denied that they experienced seeing Jesus in a literal way. He agrees that they had some powerful but vague “collective spiritual experience,” and that consequently he’s indeed wholly convinced that Jesus rose from the dead – but he also thinks that the other disciples are exaggerating what happened a bit. What’s the probability that his testimony would’ve survived? What’s the probability that he would’ve even bothered to voice his hesitation, given that he had nothing to lose by letting others exaggerate events to more persuasively reflect an underlying reality he already believes? I’d say “not very high.”
      
      I’m not entirely sure what scenario you’re describing in this second paragraph, but it seems to be some kind of “well-meaning error” hypothesis. There are different arguments against the plausibility of that – you need some kind of implausible collective hallucination (or over a dozen individual hallucinations), you need some way in which that could eventually get built up into the concrete descriptions of the physically risen Christ we find in the Gospels, you need to explain the disciples’ universal willingness to give their lives in attestation of their “exaggeration,” and so on. But supposing things happened this way, I definitely disagree about the second question. Your hypothetical disciple would have had his life to lose by sticking with the exaggerators.
      
      About the first question I agree with reservations. The probability that any ancient document will survive to the present day is not especially high. But your earlier analogy suggests you have something stronger in mind than this: that “if negative reports exist, … they’ve been deliberately filtered out.” Obviously knowledge of deliberate suppression of any contrary evidence that exists would raise suspicions in both the Christian and psychic cases; but this knowledge is tantamount to knowledge of collusion. And in the case of the early Christians I don’t think we have this knowledge. If there were negative reports, I think it’s more likely that the Biblical writers would have mentioned them and responded to them. For example, Paul mentions and denounces various heresies in his letters; and Acts describes conflict within the early church between Peter and Paul. The New Testament does not portray all Christians as having agreed about everything.
      
      Minor point, but what’s the problem with a low prior for a conspiracy? They grant (for the sake of argument) that the resurrection has a low prior, which doesn’t lead them to rule it out.
      
      It’s a cumulative case; a low prior plus low explanatory power leads to low posterior probability. They think the resurrection has a low prior but extremely high explanatory power. (Although they’re both sympathetic to several natural theological arguments which would presumably raise the prior of theism and so of the resurrection.)
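
The "low prior plus explanatory power" reasoning in the comment above can be written out in the odds form of Bayes' theorem. The numbers in the worked lines are purely hypothetical, chosen only to show the arithmetic; they are not the McGrews' figures or anyone else's in this thread. Here $H$ is a hypothesis and $E$ is the evidence.

```latex
% Odds form of Bayes' theorem: posterior odds = prior odds x likelihood ratio
\[
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(H)}{P(\neg H)} \times \frac{P(E \mid H)}{P(E \mid \neg H)}
\]

% Hypothetical case 1: low prior, low explanatory power (likelihood ratio near 1)
\[
10^{-6} \times 2 = 2 \times 10^{-6}
  \quad\Rightarrow\quad \text{posterior odds still tiny}
\]

% Hypothetical case 2: same low prior, very high explanatory power
\[
10^{-6} \times 10^{8} = 10^{2}
  \quad\Rightarrow\quad \text{posterior odds of 100 to 1 in favour}
\]
```

On this framing, a low prior alone settles nothing. The disagreement is over the size of the likelihood ratio, which is where the selection-effect objections bite, since they bear on how probable the surviving testimony is if the hypothesis is false.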
    - Mark says:
      
      August 13, 2014 at 2:49 am
      
      Let me try to more clearly restate the scenario I was gesturing at. Imagine that the disciples didn’t hallucinate anything, just had a number of shared euphoric religious experiences shortly after Jesus’ death. (Not something very uncommon in small cults facing terrible crises.) They interpret their experiences to mean that Jesus conquered death and is now running things in heaven. So they all decide to double down on their beliefs despite the loss of their leader, and go about preaching Christ’s resurrection with renewed fervor.
      
      Soon, one of the disciples – say, Peter – starts to say that he and the other disciples actually saw Jesus. This could be taken rather figuratively, so they let it slide. Peter starts embellishing vivid details, however, and it quickly turns into full-blown, graphic visions. The other disciples grumble a bit, but Peter is doing an awesome job converting people, so they’re like “yeah, okay, whatever, we ‘saw’ Jesus, fine” and keep doing what they were already doing.
      
      Word spreads. Before long, everyone has heard about Jesus’ literal appearances in front of the disciples. Christian chroniclers naturally choose to retell the most fantastic and self-serving versions of events, while gentle attempts at correction by the disciples fail to circulate because they’re boring or disappointing. The myths grow so widespread that it gets to the point where if any of the original disciples denies that he actually saw Jesus in the flesh like all the rumors say, he risks completely fracturing the embryonic Christian community. So they all decide they have to just let it go. Which is O.K., because it doesn’t interfere in any way with their mission, which is to spread the word of Jesus’ divinity. Quite the contrary, in fact.
      
      The Romans or Jews or whoever eventually get mad at the disciples and martyr them. But the disciples die not for their attestation of having seen Jesus, but for their attestation of Jesus being the messiah. The tall tales Peter spread were merely incidental to their core convictions. They allowed a convenient lie to become popular, but they didn’t lay down their lives for a lie.
      
      I don’t know about you, but this story doesn’t seem shockingly implausible to me. It’s compatible with human psychology, it supplies motives, it explains why everyone would endure martyrdom and it involves no deliberate suppression, just purely passive observation selection effects.
      
      Also, I think you’re coming dangerously close to an argument from silence. “If there were disagreements, Acts or Paul would’ve recorded them.” There’s undoubtedly a lot of Paul’s writing we don’t have, and there’s no indication Acts would’ve mentioned every disagreement that popped up, especially if it was sufficiently embarrassing to the Church, like “half the disciples seem kind of iffy on the whole ‘seeing the resurrected Jesus’ thing when pressed on it.”
      - Toby Bartels says:
        
        August 13, 2014 at 4:10 am
        
        @Mark:
        
        Yes; and more so, cognitive dissonance may set in over time, so that many if not most of the original apostles will come to believe that the exaggerations are literal truths.
        
        This all sounds very plausible, although I wonder if we can do even better by examining the course of events in contemporary new religious movements.
    - Troy says:
      
      August 13, 2014 at 11:56 am
      
      Let me try to more clearly restate the scenario I was gesturing at. Imagine that the disciples didn’t hallucinate anything, just had a number of shared euphoric religious experiences shortly after Jesus’ death. (Not something very uncommon in small cults facing terrible crises.) They interpret their experiences to mean that Jesus conquered death and is now running things in heaven. So they all decide to double down on their beliefs despite the loss of their leader, and go about preaching Christ’s resurrection with renewed fervor.
      
      Soon, one of the disciples – say, Peter – starts to say that he and the other disciples actually saw Jesus. This could be taken rather figuratively, so they let it slide. Peter starts embellishing vivid details, however, and it quickly turns into full-blown, graphic visions. The other disciples grumble a bit, but Peter is doing an awesome job converting people, so they’re like “yeah, okay, whatever, we ‘saw’ Jesus, fine” and keep doing what they were already doing.
      
      N.T. Wright argues that this is not a likely response of 1st century Jewish disciples to the death of their rabbi. Although Jews at that time did believe in a general resurrection at the end of time, they did not expect individuals, even the Messiah, to be resurrected even now. (See chapter 4 of Wright’s The Resurrection of the Son of God.)
      
      I don’t think “shared euphoric religious experiences” are themselves terribly implausible, mostly because this is sufficiently vague as to encompass a number of possibilities. But I think that when you add enough detail to the experiences in this case to make it reasonable that the disciples would draw the above conclusions from it, the probability of such experiences becomes very small. The McGrews also list several reasons to be skeptical of hallucination theories in this particular case on pp. 625-26.
      
      Word spreads. Before long, everyone has heard about Jesus’ literal appearances in front of the disciples. Christian chroniclers naturally choose to retell the most fantastic and self-serving versions of events, while gentle attempts at correction by the disciples fail to circulate because they’re boring or disappointing.
      
      This comes close to denying the McGrews’ assumption (p. 597) “that we have a substantially accurate text of the four gospels, Acts, and several of the undisputed Pauline epistles (most significantly Galatians and I Corinthians); that the gospels were written, if not by the authors whose names they now bear, at least by disciples of Jesus or people who knew those disciples – people who knew at first hand the details of his life and teaching or people who spoke with those eyewitnesses – and that the narratives, at least where not explicitly asserting the occurrence of a miracle, deserve as much credence as similarly attested documents would be accorded if they reported strictly secular matters. Where the texts do assert something miraculous – for example, Jesus’ post-resurrection appearances – we take it, given the basic assumption of authenticity, that the narrative represents what someone relatively close to the situation claimed.”
      
      If we take the (unanimous) witness of the early church fathers seriously, Matthew and John were both written by disciples of Jesus, Mark was written by a disciple of Peter, and Luke and Acts were written by the companion of Paul by that name mentioned in Acts, who spoke to eyewitnesses of the events of Jesus’ life in composing his gospel. Under this assumption, Matthew and John are positively reinforcing Peter’s exaggerations, and Paul and Luke are either extremely careless in their recounting of events or have been misled by those they’ve talked to. If we deny that Matthew and John are the authors of the books that bear their names, but still assume that they were written by people close to the disciples in question, the general problem remains.
      
      More generally, this hypothesis has a hard time explaining the Gospel accounts of the resurrection, which are not “hey, we had this experience, and oh yeah, we actually saw Jesus,” but recount specific, concrete interactions with the risen Jesus (Jesus eating fish and cooking food, Thomas touching Jesus’s wounds, etc.), who according to Acts 1, interacted with the disciples for over a month. At the very least this account must deny that, say, Thomas believed that he had touched Jesus’s wounds; and even if you jettison the McGrews’ assumption “that the narrative represents what someone relatively close to the situation claimed” you’re still left with explaining how this gradual process of legendary accretion led to such concrete descriptions of events.
      
      The myths grow so widespread that it gets to the point where if any of the original disciples denies that he actually saw Jesus in the flesh like all the rumors say, he risks completely fracturing the embryonic Christian community. So they all decide they have to just let it go. Which is O.K., because it doesn’t interfere in any way with their mission, which is to spread the word of Jesus’ divinity. Quite the contrary, in fact.
      
      From the New Testament record, it does not appear that early Christian leaders were afraid of airing disagreements within the Christian community if they thought the matter important. (See, e.g. Galatians 2:11-14 and Acts 15.)
      
      Also, I think you’re coming dangerously close to an argument from silence. “If there were disagreements, Acts or Paul would’ve recorded them.”
      
      I am not saying anything that strong. I’m saying that if there were disagreements it’s more likely that Acts or Paul would’ve recorded them than that they would’ve suppressed them. I’m not saying it’s likely that they would have recorded them, or that any recordings would have survived. What I was denying is that there’s a high probability of intentional suppression.
    - Mark says:
      
      August 14, 2014 at 1:20 pm
      
      N.T. Wright argues that this is not a likely response of 1st century Jewish disciples to the death of their rabbi. Although Jews at that time did believe in a general resurrection at the end of time, they did not expect individuals, even the Messiah, to be resurrected even now. (See chapter 4 of Wright’s The Resurrection of the Son of God.)
      
      I have a lot of respect for N.T. Wright, but I find this argument, or at least the conclusions being drawn from it, pretty specious. First, humans are actually, well, creative, and very regularly come up with bizarre, novel belief systems that do not fall perfectly in line with their particular cultural milieu. So it’s important to distinguish the unusual from the implausible. If we discovered a text that attested to a tiny first century BC Jewish sect that believed in something totally foreign to local culture, far more so than short-term resurrection – e.g., that Zeus was a real, lesser god – classicists wouldn’t necessarily be in a rush to dismiss it as a forgery. They’d more likely just regard it as an interesting footnote.
      
      Second, I think this argument ignores the circumstance of the hypothetical I proposed. I’m not suggesting that the disciples dispassionately discussed Jesus’ most likely fate after his death and decided it’s more probable that he rose from the dead a few days later. I’m imagining instead that they had an intensely powerful (but not hallucinatory) religious experience and were left searching for explanations thereof. You implicitly agree that they already had some possible belief in messianic resurrection; all they had to do was give it a different schedule in order to make sense of what happened to them. (After all, how would Christ’s resurrection in the distant future explain their present experiences?) Frankly, this isn’t a huge conceptual leap at all.
      
      Re: the proximity of the Gospel authors to the disciples. Most scholars agree that the post-resurrection appearances in Mark are later additions to the text. I’m not sure what the McGrews’ views are, but I’m comfortable ignoring this one regardless of authorship. Regarding Luke and Paul being careless and not cross-checking all of the resurrection appearance stories with all the disciples, well, again – maybe that’d seem unusual to you, but it’s still not too implausible. To reuse this rhetorical device, if we discovered a new Pauline epistle where Paul was like, “I heard from Peter that Jesus appeared to all the disciples at event X and I just took his word for it,” this would hardly shatter everything we thought we knew about inter-apostolic relations. It’d just be an interesting footnote that filled in a couple of current gaps in our knowledge. I don’t think you have an ironclad case at all that Paul and Luke couldn’t have been misled.
      
      That leaves us with Matthew and John, just two people. If Matthew, say, was the one who originated the myths about who Jesus appeared to, that means you only have one active collaborator, John. Intriguing, but still not earth-shatteringly implausible.
      
      More generally, this hypothesis has a hard time explaining the Gospel accounts of the resurrection, which are not “hey, we had this experience, and oh yeah, we actually saw Jesus,” but recount specific, concrete interactions with the risen Jesus (Jesus eating fish and cooking food, Thomas touching Jesus’s wounds, etc.), who according to Acts 1, interacted with the disciples for over a month.
      
      Not sure I’m seeing the problem here. All apocryphal Gospels contain highly detailed and wholly mythical accounts of Jesus’ life. Once myths start, it’s easy to develop their specifics.
      
      From the New Testament record, it does not appear that early Christian leaders were afraid of airing disagreements within the Christian community if they thought the matter important. (See, e.g. Galatians 2:11-14 and Acts 15.)
      
      But if you reread my hypothetical scenario, you’ll see that I was imagining the other disciples didn’t view the lies of physical appearances as very important in the scheme of things. (Unlike, say, the fundamental issues of religious praxis that you point to here.)
      
      I am not saying anything that strong. I’m saying that if there were disagreements it’s more likely that Acts or Paul would’ve recorded them than that they would’ve suppressed them. I’m not saying it’s likely that they would have recorded them, or that any recordings would have survived. What I was denying is that there’s a high probability of intentional suppression.
      
      All right. But I didn’t mean to suggest that active suppression was happening. My psychic example was simply intended to make the more general point that, contra your original responses to me, observation selection effects can still pose huge problems even if you’re not explicitly making arguments from silence.
    - Troy says:
      
      August 14, 2014 at 3:36 pm
      
      I’m imagining instead that they had an intensely powerful (but not hallucinatory) religious experience and were left searching for explanations thereof.
      
      Right, and I was arguing that this explanation is not the one that would have most likely come to mind.
      
      I agree with you that it’s not impossible; in history we’re dealing with degrees of plausibility, and so pointing out that the hypothesis doesn’t fit well with our background expectations from the culture doesn’t disprove it; it’s just a point against it as we evaluate its overall plausibility.
      
      Most scholars agree that the post-resurrection appearances in Mark are later additions to the text.
      
      Yes, the original ending of Mark was probably lost (the text we have appears to end in the middle of a sentence), and what we have currently was almost certainly added later. The original text does mention the empty tomb and the resurrection (the women are told by a “young man … in a white robe,” presumably an angel, that Jesus is risen), but not the post-resurrection appearances to the women. So Mark does attest to a resurrection, but admittedly not in the kind of detail as the other gospels.
      
      Regarding Luke and Paul being careless and not cross-checking all of the resurrection appearance stories with all the disciples, well, again – maybe that’d seem unusual to you, but it’s still not too implausible. To use this rhetorical device again, if we discovered a new Pauline epistle where Paul was like, “I heard from Peter that Jesus appeared to all the disciples at event X and I just took his word for it,” this would hardly shatter everything we thought we knew about inter-apostolic relations. It’d just be an interesting footnote that filled in a couple of current gaps in our knowledge. I don’t think you have an ironclad case at all that Paul and Luke couldn’t have been misled.
      
      Here we disagree. First, all other indications (internal evidences, archaeological confirmations, agreement with other ancient sources [especially in Acts]) suggest that Luke was a very careful historian. Luke also explicitly says (Luke 1:2) that he consulted multiple eyewitnesses in writing his Gospel. Second, Paul interacted with many of the disciples and others to whom Jesus reportedly appeared. Given the centrality of the resurrection to the early church in general and Paul’s theology in particular, it would be very surprising to find out that Paul “just took Peter’s word for it,” and didn’t talk with anyone else at the events.
      
      Not sure I’m seeing the problem here. All apocryphal Gospels contain highly detailed and wholly mythical accounts of Jesus’ life. Once myths start, it’s easy to develop their specifics.
      
      This gets us into the specifics of reasons to think that the Gospel accounts are (in general) historically reliable. Apocryphal gospels “say lots of stuff,” sure. But where their material is not obviously copied from the canonical Gospels, the various events do not contain the kinds of specificity – e.g., contextualization of events, mention of specific places or times beyond the most general or obvious (e.g., Jerusalem) – present in the canonical Gospels.
      
      More significantly, the canonical Gospel accounts, unlike the apocryphal accounts, confirm each other through the presence of what Tim calls “undesigned coincidences” – accounts that fill in details left out by the other that are highly unlikely to result from either forgery, misreporting, or repetition of someone else’s lie. For example, take Luke (9:10-17) and John’s (6:5-15) reports of the feeding of the five thousand. From Luke we learn that the feeding took place in Bethsaida, something John doesn’t tell us. John 6:5 tells us that when Jesus saw the crowds, he said to Philip – a marginal disciple who is hardly mentioned elsewhere in the Gospels except in the calling of the 12 – “Where are we to buy bread, so that these people may eat?” Elsewhere in John (1:44 and 12:21) we independently learn that Philip was from Bethsaida – explaining why Jesus was asking him and not one of the other disciples. This kind of coincidence is very unlikely to result except in the case of competent historians recording actual events independently. It is not at all easy to “develop the specifics” of lies or legendary accretions in the way that leads to this kind of interlocking description. One example hardly establishes this point, but there are many, many more examples (see, e.g., Tim’s discussion in this video: https://www.youtube.com/watch?v=9wUcrwYocgM).
      
      But if you reread my hypothetical scenario, you’ll see that I was imagining the other disciples didn’t view the lies of physical appearances as very important in the scheme of things. (Unlike, say, the fundamental issues of religious praxis that you point to here.)
      
      Given the importance of the resurrection in early Christian theology (see, e.g., 1 Corinthians 15: 12ff), and that later disputes in the early church almost all centered around Christology, I find it very hard to believe that the disciples would have viewed this kind of lie as not very important.
      
      My psychic example was simply intended to make the more general point that, contra your original responses to me, observation selection effects could still pose huge problems even if you’re not explicitly making arguments from silence.
      
      I think it depends on what you mean by observation selection effects. As I pointed out, we do not have strong reason to believe that the conditional “If there were negative reports, the early Christians deliberately filtered them out” is true. If observation selection effect just means that we don’t know whether the reason we only have positive testimony is because no negative testimony exists or because it’s been suppressed, then the observation selection effect is present almost any time we only have positive testimony. (Yes, you’ve heard from all your friends that Bob is a nice guy, but you’ve never met him personally and maybe the people to whom he’s a jerk just haven’t talked to you.)
    - Mark says:
      
      August 15, 2014 at 5:00 pm
      
      Right, and I was arguing that this explanation is not the one that would have most likely come to mind.
      
      Then I don’t understand why you say that. The “Jesus was messiah after all but will be resurrected in the distant future” explanation does worse at explaining the disciples’ hypothetical religious experiences.
      
      I agree with you that it’s not impossible; in history we’re dealing with degrees of plausibility, and so pointing out that the hypothesis doesn’t fit well with our background expectations from the culture doesn’t disprove it; it’s just a point against it as we evaluate its overall plausibility.
      
      Agreed. But it’s important to recognize that it’s not a very strong point. That might be O.K. on its own, but I think the same can be said for most of your other points, and so they don’t add up to anything that can compensate for the massive prior improbability of the resurrection that the McGrews think they can overcome.
      
      Yes, the original ending of Mark was probably lost (the text we have appears to end in the middle of a sentence), and what we have currently was almost certainly added later. The original text does mention the empty tomb and the resurrection (the women are told by a “young man … in a white robe,” presumably an angel, that Jesus is risen), but not the post-resurrection appearances to the women. So Mark does attest to a resurrection, but admittedly not in the kind of detail as the other gospels.
      
      O.K., but we’re not discussing the disciples’ belief in the resurrection. We’re discussing the disciples’ belief in Jesus’ post-resurrection appearances.
      
      Here we disagree. First, all other indications (internal evidences, archaeological confirmations, agreement with other ancient sources [especially in Acts]) suggest that Luke was a very careful historian. Luke also explicitly says (Luke 1:2) that he consulted multiple eyewitnesses in writing his Gospel.
      
      All right, but that doesn’t mean that every part of his Gospel was attested by multiple witnesses, so you’re still relying on speculation. I agree with you that it’s plausible speculation, but still not strong enough that learning it’s wrong would cause us to overturn much of our knowledge, nor would it require us to believe that Luke was lying about anything or misled by a conspiracy.
      
      Also note that I’m not holding Luke to a very unreasonable or difficult-to-live-up-to standard. It’s really easy to just say, “I learned about this via independent confirmation of multiple disciples who were there. Their names were X, Y and Z.” I can understand why Luke might not have wanted to do this for literary purposes, or perhaps he actually did somewhere and the documentation was lost, but still.
      
      Second, Paul interacted with many of the disciples and others to whom Jesus reportedly appeared. Given the centrality of the resurrection to the early church in general and Paul’s theology in particular, it would be very surprising to find out that Paul “just took Peter’s word for it,” and didn’t talk with anyone else at the events.
      
      If I remember correctly, Paul didn’t even go to Jerusalem until years after his conversion. I don’t think getting a plurality of independent confirmation on the resurrection was at the top of his priority list. And no, it wouldn’t be very surprising if they didn’t discuss it, merely peculiar. The nature of Christ’s resurrection was the event of prime theological significance, not the nature of his supernatural appearances. Christ appearing to disciples after death is compatible with a wide variety of early Christological views (perhaps you could name one during the 30’s-60’s AD that it wasn’t compatible with?), including non-physical resurrection, so it needn’t have been an essential topic of discussion between Christians who already believed some form of resurrection occurred anyway.
      
      If observation selection effect just means that we don’t know whether the reason we only have positive testimony is because no negative testimony exists or because it’s been suppressed, then the observation selection effect is present almost any time we only have positive testimony. (Yes, you’ve heard from all your friends that Bob is a nice guy, but you’ve never met him personally and maybe the people to whom he’s a jerk just haven’t talked to you.)
      
      Not so. It’s only present if we have reason to suspect that negative testimony about Bob’s personality is less likely to be passed on than positive testimony – i.e., that you’re less likely to talk to people to whom he’s a jerk. In some cases, this will be true. In others, it won’t. More relevantly, though, if one of your friends (say, Jill) says that all your other friends agree that Bob is nice, there’s an OSE if you wouldn’t expect your other friends to pipe up if Jill is mistaken, or if you expect their voice messages telling you that Jill is mistaken to be accidentally deleted or something. In that case, what first appears like extremely strong independent confirmation of Bob’s character really isn’t. Because if Jill was exaggerating about how others feel (something that people are sometimes known to do), you wouldn’t expect to know either way.
      
      Under the not-completely-crazy hypothesis that I’ve mentioned – everyone has powerful non-hallucinatory experiences, maybe one disciple starts embellishing stories of visions, maybe one or two decides to collude but most just let it slide because it’s helping the cause more than it’s hurting, Luke and Paul don’t cross-check all their facts perfectly because they don’t feel they need to (and are personally motivated to believe in evidence for Christ’s resurrection) – we simply wouldn’t expect with a significant degree of probability to hear murmurings from the other disciples that their experiences have been grossly exaggerated.
      
      You may still think my little story is initially improbable and leaves a few ends loose. That’s fine, but it beggars belief to imagine that it has a prior or Bayes factor of 10^-46 or something completely insane like that.
    - Troy says:
      
      August 15, 2014 at 7:11 pm
      
      All right, but that doesn’t mean that every part of his Gospel was attested by multiple witnesses, so you’re still relying on speculation.
      
      I’m “speculating” in the sense that my claims do not follow with certainty from my premises. But such Cartesian standards for non-speculation cannot be met anywhere in life, including the physical sciences. We are left with probabilities. The probability that Luke was careless in reporting the resurrection or took the whole story on one person’s say so, I claim, is extremely low conditional on the other facts about Luke’s scholarship I mentioned (that Luke was in general a careful historian, that he states that he spoke to multiple eyewitnesses, etc.). The resurrection was the central religious belief of the early Church. Jesus’ passion, death, and resurrection is one of the only events found in all four Gospels (with the feeding of the five thousand and John’s baptism). It would be extremely surprising for the New Testament authors – assuming that they were competent, sincere historians – to have been careless in their recording of such an important event.
      
      Also note that I’m not holding Luke to a very unreasonable or difficult-to-live-up-to standard. It’s really easy to just say, “I learned about this via independent confirmation of multiple disciples who were there. Their names were X, Y and Z.” I can understand why Luke might not have wanted to do this for literary purposes, or perhaps he actually did somewhere and the documentation was lost, but still.
      
      This is a textbook argument from silence – “if X were the case, Y would have said so” – and it is not a reasonable one. Ancient authors did not cite their sources. The kind of citation practices used today were not part of the cultural milieu, and would at any rate not have been practical for authors handwriting on scrolls with space constraints. Luke’s telling us that he consulted multiple sources in general is itself more than we get from most ancient historians.
      
      If I remember correctly, Paul didn’t even go to Jerusalem until years after his conversion.
      
      This is incorrect. According to Acts 9, after his conversion in Damascus Paul spent “many days” preaching there, and then went straight to Jerusalem to join the disciples there.
      
      I don’t think getting a plurality of independent confirmation on the resurrection was at the top of his priority list.
      
      Paul has just had a dramatic conversion experience to a religion he has been mercilessly persecuting. I would think that learning more about this religion and talking to the people who knew Jesus about his life, death, and resurrection would be one of his utmost priorities.
      
      The nature of Christ’s resurrection was the event of prime theological significance, not the nature of his supernatural appearances. Christ appearing to disciples after death is compatible with a wide variety of early Christological views (perhaps you could name one during the 30’s-60’s AD that it wasn’t compatible with?), including non-physical resurrection, so it needn’t have been an essential topic of discussion between Christians who already believed some form of resurrection occurred anyway.
      
      I think this is splitting hairs. The nature of the appearances cannot be separated from the nature of the resurrection. Take 1 Corinthians 15, which begins by listing those to whom Christ appeared and then continues,
      
      12 Now if Christ is proclaimed as raised from the dead, how can some of you say that there is no resurrection of the dead? 13 But if there is no resurrection of the dead, then not even Christ has been raised. 14 And if Christ has not been raised, then our preaching is in vain and your faith is in vain. 15 We are even found to be misrepresenting God, because we testified about God that he raised Christ, whom he did not raise if it is true that the dead are not raised. 16 For if the dead are not raised, not even Christ has been raised. 17 And if Christ has not been raised, your faith is futile and you are still in your sins. 18 Then those also who have fallen asleep in Christ have perished. 19 If in Christ we have hope in this life only, we are of all people most to be pitied.
      
      Paul clearly sees the (physical) appearances as important for establishing the (physical) resurrection (cf. Acts 1:3 for the same idea), saying of both the resurrection and the appearances that they are “of first importance” (v. 3). It’s clear that Paul is speaking of a physical resurrection of Jesus in v. 12ff because the general resurrection of the dead to which he refers is a physical one. Arguing that the general physical resurrection is false if Christ was not spiritually resurrected would make no sense.
      
      Verse 15 also clearly shows that Paul would not have seen “exaggeration” for the sake of his holy cause permissible, a sentiment I expect most of the disciples would have shared. On that note, here’s a passage in 2 Peter 1 that suggests that the author of that epistle clearly believed in a physical resurrection, and considered it central to Christianity:
      
      16 For we did not follow cleverly devised myths when we made known to you the power and coming of our Lord Jesus Christ, but we were eyewitnesses of his majesty. 17 For when he received honor and glory from God the Father, and the voice was borne to him by the Majestic Glory, “This is my beloved Son, with whom I am well pleased,” 18 we ourselves heard this very voice borne from heaven, for we were with him on the holy mountain.
      
      [An observation selection effect is] only present if we have reason to suspect that negative testimony about Bob’s personality is less likely to be passed on than positive testimony
      
      I think the force of the “filter” in your original example is that it connoted deliberate suppression: “if [negative reports] do exist, they’ve been deliberately filtered out.” Deliberate suppression is much more likely given that the positive reporters are lying; hence it is evidence for lying and so evidence against Bob’s being psychic (or whatever). In other words, the evidential force of the filtering comes from the way in which it’s stipulated to have been done (if it was necessary): intentionally.
      
      If we have an observation selection effect in your weaker sense here that doesn’t work via deliberate filtering, then I think it depends on the case whether it’s important. How much more likely positive testimony is to get passed on than negative testimony also clearly matters. I’ve argued that the answer in the case at hand is “not much,” because the New Testament authors weren’t reticent to mention and respond to those who disagreed with them.
      
      You may still think my little story is initially improbable and leaves a few ends loose.
      
      I think this is very much an understatement. History is about the details; you have to look at all the available evidence to see how your hypothesis stacks up. Your hypothesis doesn’t explain two of the McGrews’ three salient facts: the empty tomb and Paul’s conversion. It also renders mysterious Matthew 28:11-15, which says that the chief priests spread the story that Jesus’s disciples stole his body while the guards were asleep. Matthew would hardly make this accusation up if the chief priests had not made this charge, and why would they make this charge if they had the body?
    - Mark says:
      
      August 16, 2014 at 12:48 am
      
      The probability that Luke was careless in reporting the resurrection or took the whole story on one person’s say so, I claim, is extremely low conditional on the other facts about Luke’s scholarship I mentioned (that Luke was in general a careful historian, that he states that he spoke to multiple eyewitnesses, etc.). The resurrection was the central religious belief of the early Church. Jesus’ passion, death, and resurrection is one of the only events found in all four Gospels (with the feeding of the five thousand and John’s baptism). It would be extremely surprising for the New Testament authors – assuming that they were competent, sincere historians – to have been careless in their recording of such an important event.
      
      I actually see no reason to assume that historians, especially ancient ones, are more rather than less careful about trusting stories that are (already) absolutely fundamental to their personal religious identities; asking for further evidence after you’ve acquired a little is a potential invitation to cognitive dissonance, since you might not hear what you desperately want to hear. Maybe Luke was better than that, but, well, I honestly don’t know. It sure would’ve helped if he’d at least said how many sources he used for this particular part of his narrative. Absent that, I just can’t bring myself to feel as extremely surprised as you think I should. Definitely not, like, P = 10^-10 surprised.
      
      This is a textbook argument from silence – “if X were the case, Y would have said so” – and it is not a reasonable one. Ancient authors did not cite their sources. The kind of citation practices used today were not part of the cultural milieu, and would at any rate not have been practical for authors handwriting on scrolls with space constraints. Luke’s telling us that he consulted multiple sources in general is itself more than we get from most ancient historians.
      
      First, I didn’t make an argument from silence. If you reread my comment, you’ll see I never stated that Luke would have claimed X had Y happened. In fact, I made concessions that perhaps Luke had valid reasons for not wanting to enumerate his sources even if he did have a multiplicity of them, and then went further to acknowledge the possibility that perhaps he did enumerate them somewhere lost to us.
      
      Second, the fact that most ancient history fails to meet this standard doesn’t help you. For Bayesians, it’s only the absolute strength of evidence that matters, not the relative strength, and what counts as a reasonable standard of historical evidence applies to all periods under study modulo selection effects. (We don’t give poorly-sourced miracle stories coming from Afghan jihadists in the 80’s a pass just because all Afghan jihadist miracle stories from the 80’s are poorly sourced.) If you keep in mind the resurrection hypothesis’ truly minuscule prior allowed by the McGrews, you need to do much better than this to rule out alternatives.
      
      This is incorrect. According to Acts 9, after his conversion in Damascus Paul spent “many days” preaching there, and then went straight to Jerusalem to join the disciples there.
      
      I would say this is significantly less likely given what Paul says in Galatians 1.
      
      Paul clearly sees the (physical) appearances as important for establishing the (physical) resurrection (cf. Acts 1:3 for the same idea), saying of both the resurrection and the appearances that they are “of first importance” (v. 3).
      
      You’re absolutely correct, and I badly overstated my point, so thank you. Paul did regard the appearances as important. However, it still doesn’t follow that we can be extraordinarily certain that it was important to Paul to get multiple eyewitness testimony. He was already sure that Christ was in the business of appearing to people, given his own experience outside Damascus; testimony of further appearances by one of Jesus’ original disciples could easily have been enough to seem definitive.
      
      Again, I don’t need this to be probable, just not crazy, which I don’t feel it is. You evidently feel otherwise, though. How might we be able to settle this? Would it be enough to find uncontroversial examples of early apostolic figures from other religions who exhibited imperfect fact-checking of central dogmas?
      
      Your hypothesis doesn’t explain two of the McGrews’ three salient facts: the empty tomb and Paul’s conversion.
      
      I have other things to say about that, but I feel this discussion is already too long to be in a comments section, so I’m afraid I’ll just concede this for now unless you want to continue via email or the like. Re: Matthew 28, I don’t particularly trust that this episode was historical.
    - Troy says:
      
      August 16, 2014 at 11:15 pm
      
      I would say this is significantly less likely given what Paul says in Galatians 1.
      
      I wasn’t aware of that passage; thanks. Since Paul would presumably have known his own travels and would have no reason to lie here, it seems pretty clear from that that either Luke was mistaken about the timeframe or that my earlier reading of the passage as communicating that Paul went straight from Damascus to Jerusalem was mistaken. Without looking into the Greek it’s hard to say with any confidence.
      
      Again, I don’t need this to be probable, just not crazy, which I don’t feel it is. You evidently feel otherwise, though. How might we be able to settle this? Would it be enough to find uncontroversial examples of early apostolic figures from other religions who exhibited imperfect fact-checking of central dogmas?
      
      Evidence about similar cases is certainly relevant in doing history, but much depends on the details. I would want to qualify “early apostolic figures” and “central dogmas” considerably: what would be relevant would be figures writing histories intended to convey central historical facts to their religious communities, and the events in question would need to be publicly witnessed ones about which they could perform the appropriate kind of checking. Religionists who clearly converted for political reasons would be less relevant than those who converted because of sincere belief. Historians who were obviously otherwise sloppy would be less relevant than ones who were otherwise competent, but messed up on this point (presumably because of confirmation bias or some such).
      
      I should also say that much of the force of the McGrews’ argument rests on how many independent events skeptical hypotheses need to account for. Admitting carelessness in one otherwise careful writer might be okay; but when we need carelessness or equally improbable events leading to apparently positive testimony from, say, 5 authors, multiplying the initially somewhat low probabilities of the individual events together gives us an extremely low probability for their jointly occurring. Individual elements of a skeptical explanation might be not crazy, but their conjunction might be.
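
      As a rough illustration of the multiplication point, here is a minimal Python sketch. The per-author probability of 0.1 and the helper name `joint_probability` are made up for the example; they are not figures or methods taken from the McGrews, and the calculation assumes the events are fully independent.

      ```python
      import math

      def joint_probability(probabilities):
          """Probability that every event occurs, assuming the events are independent."""
          return math.prod(probabilities)

      # Hypothetical numbers: 5 authors, each careless with probability 0.1.
      per_author = [0.1] * 5
      joint = joint_probability(per_author)

      # Five "somewhat low" probabilities multiply out to roughly one in 100,000.
      print(f"joint probability: {joint:.0e}")
      ```

      The product rule only applies if the events really are independent; how far that holds for the authors in question is a separate matter that the sketch does not address.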
      
      I actually see no reason to assume that historians, especially ancient ones, are more rather than less careful about trusting stories that are (already) absolutely fundamental to their personal religious identities; asking for further evidence after you’ve acquired a little is a potential invitation to cognitive dissonance, since you might not hear what you desperately want to hear. Maybe Luke was better than that, but, well, I honestly don’t know.
      
      I think we do have good reason to think he was better than that, i.e., the other evidences of his general reliability I mentioned (see, for example, this list: http://truthbomb.blogspot.com/2012/01/84-confirmed-facts-in-last-16-chapters.html). At any rate, if you’re only going to trust historians in recording events they don’t already have some tendency to believe happened, then you’re going to throw out history, since presumably most historians will have reason to believe an event happened before they investigate it. If you restrict your skepticism to events that they want to have happened, then if there’s some event that would make most everyone happy – and it seems plausible that anyone who believed the disciples’ reports of a resurrection would have seen it as Good News – then you could never get evidence for that event. So it seems like you’re ruling out (strong) testimonial evidence for Christianity on a priori grounds here.
      
      Second, the fact that most ancient history fails to meet this standard doesn’t help you. For Bayesians, it’s only the absolute strength of evidence that matters, not the relative strength, and what counts as a reasonable standard of historical evidence applies to all periods under study modulo selection effects.
      
      If you want to argue that Luke’s not citing his sources is strong evidence against Luke’s reliability, you need to claim that P(Luke does not cite his sources | Luke is reliable) << P(Luke does not cite his sources | Luke is not reliable). The practice of other ancient historians who we have good reason to think are reliable gives us data relevant to assessing the first probability. (Similarly, the inherent impracticality of constantly citing one’s sources before the age of the printing press lowers the probability.)
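
      Spelled out, the comparison above is the likelihood ratio in the odds form of Bayes’ theorem. The symbols are labels introduced here for convenience: R stands for “Luke is reliable” and E for “Luke does not cite his sources.”

      ```latex
      \[
        \frac{P(R \mid E)}{P(\neg R \mid E)}
          = \frac{P(E \mid R)}{P(E \mid \neg R)} \cdot \frac{P(R)}{P(\neg R)}
      \]
      ```

      On this way of writing it, E is strong evidence against R only if the first factor on the right is much smaller than 1, i.e. only if P(E | R) << P(E | ¬R). If both probabilities are high, the ratio is close to 1 and the odds barely move.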
      
      It may be that you’re willing to be skeptical about ancient historians in general. It’s very hard to argue conclusively for claims about overall plausibilities in a way that all will find rationally compelling – at least, outside of the hard sciences where it’s easier to place the observed evidence into uncontroversial reference classes and run repeated experiments to get as much confirmation as we want. At this point further debate may well just devolve into our digging into our prior commitments.
      
      I’m starting teaching in a few days and so need to stop engaging in arguments on the Internet at any rate, alas. I do thank you for the civil debate and patiently argued points, both commodities far too rare in these discussions.
Moshe Zadka says:

August 11, 2014 at 11:33 am

[Slightly off-topic]
When I read the story about Euler, I just assumed the actual equation was lost in the telling, and that the equation was originally “e^{i\pi}+1=0” which I can at least see how, in Euler’s mind, is proof for God arranging math just so this beautiful equation falls out.

Now let’s try to imagine the story unfolding with this equation. Euler pompously quotes it, and assumes that anyone with eyes to see can understand that here lies an equation which is about as good as God signing his name on creation. Diderot sees an equation about as meaningful as “(a+b^n)/n = x”. What should Diderot have done?
- Bugmaster says:
  
  August 11, 2014 at 1:29 pm
  
  If it were me, I’d say, “all right, I don’t understand your equation. Can you explain it in terms that a layman like me could grasp?” If Euler answers “no”, then that’s perfectly fair. But in that case, the best God that Euler can demonstrate is the kind of God who is accessible only to a handful of highly educated elites. Regular people cannot comprehend this God in any way, and thus, cannot pray to him, understand his commandments (if any), and generally engage with this deity in any way. This concept of God becomes pretty much irrelevant to anyone who is not a professional mathematician (or a theologian).
  
  By contrast, consider Feynman’s claim that there is no way to understand how magnets work without some grounding in physics and math. While this is true, we can very easily demonstrate the fact that magnets do, in fact, work. All we need are some magnets, and possibly some iron. Thus, while the explanation of how magnets work is inaccessible to laymen, the fact that they work is. This is not the case with Euler’s God.
  - Ialdabaoth says:
    
    August 11, 2014 at 2:27 pm
    
    I now want to edit that Feynman video and draw Juggalo makeup over his face, then repost it to youtube and buy out my local supply of popcorn.
    - pwyll says:
      
      August 12, 2014 at 10:23 am
      
      For the benefit of any readers confused by Ialdabaoth’s reference to Juggalos: http://knowyourmeme.com/memes/fucking-magnets-how-do-they-work
  - Douglas Knight says:
    
    August 11, 2014 at 3:41 pm
    
    It is common that arguments for God define God in ways that do not seem to match common usage, but that does not apply to Moshe’s argument. Certainly the argument does not appear to single out the Resurrection of Jesus. Maybe it is no more than the Deist God. But it’s not like he has defined God as Nature!
    - Moshe Zadka says:
      
      August 15, 2014 at 11:35 am
      
      To clarify — not *my* argument, but my (admittedly speculative) assumption of what Euler’s argument was. I don’t think it makes a good God argument, but I can at least appreciate why Euler did.
nostalgebraist says:

August 11, 2014 at 1:26 pm

I may say some more about this later — I’m in a hurry now — but I do want to mention this:

Saying that Glymour thinks “you should disbelieve in IQ” is a bit misleading. His paper is nominally about The Bell Curve, and its conclusion is that the causal claims made by the authors of that book were not justified in the book. He does mention that a great deal of social science research uses the same methodology, so if you buy his argument, you should be equally doubtful about all that research, too.

IQ research that doesn’t suffer from the problems mentioned by Glymour needn’t be discarded. He’s not making an all-out argument against IQ research in general. The reason that I personally brought up Glymour’s paper to you was that I thought of The Bell Curve in particular as a book that fairly clearly divides “pro-IQ” and “anti-IQ” social factions in online debates and even in academia. If your response is “okay, maybe The Bell Curve is shot, but there is still some good IQ research, such as the stuff about lead,” you’ve then staked out a position substantially different from the “standard” pro-IQ belief set, and I think you’ll find yourself disagreeing with most people who openly talk a lot about IQ online or in academia.

(I’m also skeptical, for other reasons which I mentioned in the tumblr conversation, about even relatively “Glymour-compliant” IQ research, but that’s a distinct issue.)
- Douglas Knight says:
  
  August 11, 2014 at 1:40 pm
  
  Could you identify some Glymour-compatible IQ research? Do you really mean to imply that the lead research is Glymour-compatible?
- Scott Alexander says:
  
  August 11, 2014 at 10:46 pm
  
  Possibly we have different opinions of what the Bell Curve says? (I’ve never read it).
  
  Wikipedia describes it as:
  
  “Its central argument is that human intelligence is substantially influenced by both inherited and environmental factors and is a better predictor of many personal dynamics, including financial income, job performance, chance of unwanted pregnancy, and involvement in crime than are an individual’s parental socioeconomic status, or education level.”
  
  To me all these statements seem on very firm ground even taking Glymour’s regression-correlation argument into account, which made me think he thought his factor arguments dealt the whole field a fatal blow somehow.
  - Ialdabaoth says:
    
    August 11, 2014 at 10:48 pm
    
    I have often suspected that the Left’s problem with books like the Bell Curve isn’t really like the Right’s problem with global warming, but is more like Quirrell’s problem with nuclear physics.
    
    I.e., “you discovered WHAT? And you gave this knowledge to WHO? DO YOU KNOW WHAT THEY ARE GOING TO DO WITH THAT, YOU IDIOTS!?”
    - AR+ says:
      
      August 11, 2014 at 11:52 pm
      
      Well, to quote Fakey Fakeson, “If you believe true things when it improves your life, what credit is that to you? Even hypocrites do that.”
    - Toby Bartels says:
      
      August 12, 2014 at 6:41 pm
      
      That’s a great insight. But I don’t think that the Right’s response to global warming is that different either.
  - nostalgebraist says:
    
    August 11, 2014 at 11:00 pm
    
    The claims of the form “IQ is a better predictor of [something] than [some other predictor]” are exactly what Glymour attacks in Section 4, which is the meat of his critique. (As someone mentioned above, the factor analysis stuff is kind of a fake-out: see the paragraph spanning pp. 10-11 in the paper.)
    
    Glymour’s points in Section 4 are about how the analyses used in The Bell Curve to support these claims depend on assuming that the causal graph looks a certain way. If you allow — not implausibly — for other unobserved variables to come into play, the same methods could give erroneous results.
    - Scott Alexander says:
      
      August 11, 2014 at 11:10 pm
      
      Then I must not have understood his critique very well.
      
      My understanding of Glymour was that we can’t say “IQ causes wealth” or something like that.
      
      But “IQ is a better predictor of wealth than education level” seems very simple and empirical. Just give some people education data and IQ data and see who can predict wealth better.
      
      Like, “given our current level of understanding on the subject, X is a better predictor than Y” seems like the sort of thing that couldn’t possibly be wrong except trivially, where somebody comes back and says “Actually, you forgot to carry the one, education level works better than IQ here.”
    - nostalgebraist says:
      
      August 11, 2014 at 11:27 pm
      
      (I can’t figure out how to reply below a certain comment depth; this is supposed to be a reply to Scott’s reply, not to my own comment)
      
      Oh, I think Wikipedia is being unclear. I haven’t read the book either, but I’m pretty sure Herrnstein and Murray explicitly say they’re trying to get at causation. An Amazon look-inside turns up a passage in which they call regression analysis “the basic technique for discussing causation in nonexperimental situations,” and Glymour quotes a passage in which they say “regression analysis tells us how much each cause [of several proposed causes] affects the result.” More broadly, both fans and critics of the book talk about it as though it is about causality, and use this when judging political implications.
      
      Wikipedia should say something like “is a stronger influence on” rather than “is a better predictor of.”
    - Nancy Lebovitz says:
      
      August 12, 2014 at 10:37 am
      
      No one can reply beyond a certain comment depth.
      
      The current set-up is a mixed system– threaded (to that comment depth), and chronological single thread below that depth.
Liskantope says:

August 11, 2014 at 4:39 pm

As a math student who is starting his sixth year of a PhD program, I feel like I should address this post from the point of view of someone who has been doing research-level theoretical math.

I don’t think having a math degree helps much to avoid getting Eulered. Certainly having a firm grasp on basic college-level math gives one a major edge in thinking critically about statistical arguments. (I imagine that a lot of even that type of critical thinking ability, the aspects of the arguments that don’t consist of pure number crunching, can be honed from studying areas of the humanities, though.) Once you approach graduate-level math, distinct areas of math start to diverge, and at the research level are far enough apart that researchers in one field generally can’t hope to understand what those in other fields are doing. My general area is in number theory, but even at the number theory seminar I go to, fairly often there are professors who can’t really follow the statements of the results being presented by the speaker, let alone the ideas behind the proofs. Not to mention, statistics isn’t itself considered an area of math (although of course there’s a lot of mathematical theory behind it), and at most universities, they are separate departments. I would say that I have a hazy view of even fairly basic statistics.

In the excerpt from Glymour, for instance, I understand some terms such as “directed graph” (while on the other hand, I have no clue what “factor loadings” are), know a little about what it means to study the topology of a graph, and understand the basic concept behind sets of measure zero (although I’ve forgotten a lot of details, since I learned it a long time ago and it isn’t used much in my field). But the paragraph as a whole might as well be a jumble of some familiar and a few unfamiliar-sounding terms thrown together randomly as far as I’m concerned. And I imagine that’s probably the case for most of those who don’t study the particular type of object that Glymour studies.

The problem is that — except perhaps for some people who do seem to be much savvier than I am when it comes to understanding mathematical and statistical tools used in a wide variety of areas — we are all in danger of being Eulered when hearing from an expert in an area different from our own. It’s something that worries me whenever I’m tempted to accept some scholarly argument because the writer is an expert in their field and seems to know what they’re talking about. And meanwhile, my own area of research is too abstract to be relevant to scholarly arguments concerning any issue outside of pure math.
- nostalgebraist says:
  
  August 11, 2014 at 4:51 pm
  
  But the paragraph as a whole might as well be a jumble of some familiar and a few unfamiliar-sounding terms thrown together randomly as far as I’m concerned. And I imagine that’s probably the case for most of those who don’t study the particular type of object that Glymour studies.
  
  My field of specialization is pretty far from Glymour’s, but I know enough math to understand the paper. The paragraph Scott quoted may be misleading as to the rigor of the overall paper. For instance, when he talks about sets of measure zero, he’s not making some sophisticated point involving measure theory — he’s just talking about how if you have the following two options
  
  1) a parameterized class of theories that robustly produces certain patterns no matter what the parameters are, just because of the structure of causation described by the theory
  2) another class of theories which can produce the same pattern, but only if the parameters are set exactly right, e.g. to produce the pattern perfectly some parameter would have to be 3 and not 3 + epsilon
  
  then you can typically feel safe about choosing (1) over (2), especially if the pattern gets more and more exact the more data you get. This is an intuitive methodological point, not a rigorous math point.
  
  If you’re interested in the subject matter, I recommend trying to read the paper from the beginning. The quoted paragraph may be crystal clear to you by the time you reach it in context.
  - Liskantope says:
    
    August 12, 2014 at 11:09 am
    
    Good point. The problem may be that I was trying to interpret it out of context.
Will says:

August 11, 2014 at 4:58 pm

I think it’s really, really important to keep in mind the order of magnitude of the claims being made. The best estimates of the correlation between IQ and criminality are between -0.17 and -0.20. That is a pretty weak correlation, explaining at most 4% of the variance.
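
For anyone who wants to check the arithmetic: the share of variance a linear correlation accounts for is just the correlation squared. A minimal sketch, where the function name `variance_explained` is my own and the inputs are only the figures quoted in this comment:

```python
def variance_explained(r: float) -> float:
    """Share of variance accounted for by a linear correlation r (i.e. r squared)."""
    return r * r


# Correlations quoted in this comment: IQ/criminality, then SAT/college GPA.
for r in (-0.17, -0.20, 0.2, 0.3):
    print(f"r = {r:+.2f} -> {variance_explained(r):.1%} of variance")
```

So the IQ/criminality range works out to roughly 3 to 4% of the variance, and even a correlation of 0.3 only gets you to about 9%.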

Even the IQ/GPA link is pretty weak: SAT and college GPA correlations are around 0.2-0.3, stronger than for criminality but still only moderate at best.

The (hypothesized) effect of lead on criminal behavior is much larger than this; lead must affect criminal behavior through a different channel.
- Douglas Knight says:
  
  August 11, 2014 at 5:11 pm
  
  Actually, the original paper on lead and crime argued that the effect went entirely through IQ.
  - Will says:
    
    August 11, 2014 at 5:27 pm
    
    But research since then has suggested the effect is larger than can be explained through the IQ channel. I see things like impulse control bandied about.
    
    Again, the IQ/criminality connection explains something like 4% of the variance; it’s quite weak.
    - Douglas Knight says:
      
      August 11, 2014 at 7:09 pm
      
      No, I don’t think that later research has claimed larger effects. The original paper claimed that the 50% drop in crime from 1975 to 1995 was of the right order of magnitude to be due entirely to IQ. Could you point to a later paper that estimates a larger effect than that?
buckwheatloaf says:

August 11, 2014 at 5:25 pm

i heard of diderot from rousseau. he writes about him a lot in his memoir. they started off as friends. at one point they’re the closest friends ever. diderot goes to jail for some months and rousseau is really stricken by this and the recurring thought of his friend being in jail takes a toll on him. and on the first day when he gets out of jail rousseau goes to see him and they run up to each other hug and cry and you just see how much he means to rousseau. but the whole time he writes about diderot he always makes it ominous saying how it wasn’t to last, and yeah their friendship doesn’t. it goes really bad and it’s so upsetting to rousseau. they’re not even living near each other at that point but they’re communicating by post. but diderot writes some not very nice things in his letters to him and rousseau is really dismayed by how his friend can act this way towards him and use these stupid persuasive arguments on him and think he will fall for any of it. worst of all is he has to wait WEEKS to get a reply because that’s how slow correspondence went back in those days. it’s not like they could just talk it out over some emails or over the phone. so it just sends him into a really bad place all the time he is waiting for his replies. what happened was that his friends got jealous and confused by his behavior and they want him to be normal like them, but he wants to live in the country by himself with the girl he likes. but they won’t have it and keep scheming and entreating him to get him to come back and join them in paris. but he doesn’t want to. he hates paris by then and everything it stands for! he said it’s not that he got more famous than them that made them turn against him, or that he succeeded in the very things his friends were trying to succeed in (which he said would’ve been okay) but that they couldn’t stand that he was different.
that he got success by means they could never match (by this one opera that got really praised) and that he acted too self assured and like a loner who didn’t need them anymore when they had been sources of so much encouragement to him. except that wasn’t exactly it, he still liked them but he had always wanted to live in a really simple manner if that was possible, and his success and the ideas he developed only furthered this desire of his. that’s what he said made them lose their former warmth for him. but these were some of his most enduring and closest friends. i really felt for him. and that diderot guy seemed to be lacking some compassion and sense. a lot of these guys were just ridiculously smart but they treated each other bad. they were great companions for each other at one time but then later worst enemies 🙁
- Douglas Knight says:
  
  August 11, 2014 at 7:17 pm
  
  Don’t just read one source and trust that its author was a nice guy mistreated by everyone else.
  - Anonymous says:
    
    August 14, 2014 at 3:21 pm
    
    i didn’t. i read four books. all of them his. but that’s four sources isn’t it. they’re all him but its four times himself and one times four is four. how am i doing. i think im doing good. and from them i concluded he is one of the best guys that ever lived that he was among the most kind and noble hearted people to ever exist. then i closed his books and moved on to this other guy and his books, and thought the same exact things about him from reading what he wrote. I CANT HELP IT IF I LOVE SOME PEOPLE A LOT OKAY.
  - buckwheatloaf says:
    
    August 14, 2014 at 3:22 pm
    
    i didn’t. i read four books. all of them his. but that’s four sources isn’t it. they’re all him but its four times himself and one times four is four. how am i doing. i think im doing good. and from them i concluded he is one of the best guys that ever lived that he was among the most kind and noble hearted people to ever exist. then i closed his books and moved on to this other guy and his books, and thought the same exact things about him from reading what he wrote. I CANT HELP IT IF I LOVE SOME PEOPLE A LOT OKAY.
    - Multiheaded says:
      
      August 14, 2014 at 4:29 pm
      
      looks at Douglas… looks at buckwheatloaf
      
      Now there’s a culture clash!
Leon says:

August 11, 2014 at 7:41 pm

The problem here only occurs when sophisticated math is used to attack nonmathematical ideas, like the existence of God, or lead causing increases in crime. And presumably these ideas should be complicated and diverse enough that hopefully no one mathematical argument knocks down the entire edifice. True things should usually reveal their truth through multiple different arguments, and it would be very odd if math could demolish all of them at the same time.

Maybe you have too high a view of maths. The more complicated and diverse a phenomenon is, the less likely a mathematical argument will provide real insight about it. (Even in statistical mechanics and statistics, the insight comes from simplicity-when-complexity-is-judiciously-ignored. Not every system has this property.)

Mathematicists could (and perhaps should more often) have the opposite worry: their views are based on rigorous logical arguments, but also on a paucity of raw material. Someone with experience in philosophy, history, social science, etc. (or even experimental science) comes along and demolishes their views with a few “soft” facts. Possible case in point: your libertarianism FAQ.

I agree that some level of maths is helpful but I don’t think your insurance example demands much of it. Just start with an extreme case — something that obviously should not be covered — and slowly tweak the parameters.

Euler-ers have often Euler-ed themselves. Possible cases in point: Robert Aumann/the McGrews/the “Definability of Truth in Probabilistic Logic” MIRI paper.
- Anonymous says:
  
  August 12, 2014 at 2:59 am
  
  Could you elaborate on the last paragraph, please?
Alexander Stanislaw says:

August 12, 2014 at 8:28 am

Was any of this inspired by Chris Hallquist’s “statistician’s fallacy”?

I also follow the general policy of: don’t try to decipher the details of the argument, but figure out what it is trying to say and why it is relevant. For example, in Shalizi’s case he is arguing against a causal g factor. But even if his argument succeeds, g could still have predictive validity. His argument is orthogonal to whether g is a useful thing to measure.
Douglas Knight says:

August 12, 2014 at 1:47 pm

I don’t think that your response to the McGrews is a good example. Sure, you didn’t get dumbfounded, but basically you rounded off their mathematical model to a qualitative argument and addressed that. In particular, the strength of their claim (10^-39) comes from independence, which you seem to ignore. I think a much better response is “Mormonism: The Control Group For Christianity,” partly because it explicitly addresses independence (“bottleneck”), but mainly because it’s a general purpose response: it doesn’t require understanding the math of any particular argument.
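
To make the independence point concrete, here is a rough sketch of how much work that assumption does. It is not the McGrews’ actual calculation: the per-item probability and the item count are hypothetical, picked only so the independent case lands on the same order of magnitude, and the function names are my own.

```python
import math


def log10_joint_if_independent(p: float, n: int) -> float:
    """log10 of the joint probability of n independent events, each with probability p."""
    return n * math.log10(p)


def log10_joint_if_bottlenecked(p: float, n: int) -> float:
    """log10 of the joint probability when all n events stand or fall together.

    With a single shared bottleneck the count n stops mattering,
    so the joint probability is just p.
    """
    return math.log10(p)


P_EACH = 1e-3  # hypothetical per-item probability, for illustration only
N_ITEMS = 13   # hypothetical count, chosen so the product lands near 10^-39

print("independent: 10^", log10_joint_if_independent(P_EACH, N_ITEMS))
print("bottlenecked: 10^", log10_joint_if_bottlenecked(P_EACH, N_ITEMS))
```

Under these made-up inputs, nearly all of the exponent comes from multiplying the items together, which is why a response that questions independence matters more than one that haggles over any single item.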

Of course, finding a qualitative approximation to a quantitative model is very important, both to see if there is a real argument, and for understanding it. But if the math is to be useful, it should contribute something beyond the qualitative argument.
David Colquhoun says:

August 13, 2014 at 2:44 pm

I’m right with you on the IQ question. It’s an area in which people with fair maths have been fooling innumerate psychologists and politicians ever since the 1930s.

In pedantic mode I must disagree when you refer to “the modern truism that ‘correlation does not imply causation’”. There’s nothing modern about that. It was already well understood in 1756, when Dr Johnson said “It is incident to physicians, I am afraid, beyond all other men, to mistake subsequence for consequence, to use the fallacious inference post hoc, ergo propter hoc”. The problem is that, if it were taken seriously, it would seriously impair the publication rate of epidemiologists.
Steve says:

August 15, 2014 at 1:06 pm

My favorite fictional example of Eulering is in DC Comics: the Anti-Life Equation, a mathematical proof that life is morally worthless. The bad guys study the equation and gain power; the good guys…just kinda avoid thinking about the implications.
Pingback: Self Tracking Week 1 | Hopefully This Helps
