That Chocolate Study

Posted on May 30, 2015 by Scott Alexander

Several of you asked me to write about that chocolate article that went viral recently. From I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here’s How:

“Slim by Chocolate!” the headlines blared. A team of German researchers had found that people on a low-carb diet lost weight 10 percent faster if they ate a chocolate bar every day. It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages. It was discussed on television news shows. It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily,” page 128). Not only does chocolate accelerate weight loss, the study found, but it leads to healthier cholesterol levels and overall increased well-being. The Bild story quotes the study’s lead author, Johannes Bohannon, Ph.D., research director of the Institute of Diet and Health: “The best part is you can buy chocolate everywhere.”

I am Johannes Bohannon, Ph.D. Well, actually my name is John, and I’m a journalist. I do have a Ph.D., but it’s in the molecular biology of bacteria, not humans. The Institute of Diet and Health? That’s nothing more than a website.

Other than those fibs, the study was 100 percent authentic. My colleagues and I recruited actual human subjects in Germany. We ran an actual clinical trial, with subjects randomly assigned to different diet regimes. And the statistically significant benefits of chocolate that we reported are based on the actual data. It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science. The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded.

Bohannon goes on to explain that as part of a documentary about “the junk-science diet industry”, he and some collaborators designed a fake study to see if they could convince journalists. They chose to make it about chocolate:

Gunter Frank, a general practitioner in on the prank, ran the clinical trial. Onneken had pulled him in after reading a popular book Frank wrote railing against dietary pseudoscience. Testing bitter chocolate as a dietary supplement was his idea. When I asked him why, Frank said it was a favorite of the “whole food” fanatics. “Bitter chocolate tastes bad, therefore it must be good for you,” he said. “It’s like a religion.”

They recruited 16 (!) participants and divided them into three groups. One group ate their normal diet. Another ate a low-carb diet. And a third ate a low-carb diet plus some chocolate. Both the low-carb group and the low-carb + chocolate group lost weight compared to the control group, but the low-carb + chocolate group lost weight “ten percent faster”, and the difference was “statistically significant”. They also had “better cholesterol readings” and “higher scores on the well-being survey”.

Bohannon admits exactly how he managed this seemingly impressive result – he measured eighteen different parameters (weight, cholesterol, sodium, protein, etc) which virtually guarantees that one will be statistically significant. That one turned out to be weight loss. If it had been sodium, he would have published the study as “Chocolate Lowers Sodium Levels”.

Then he pitched it to various fake for-profit journals until one of them bit. Then he put out a PR release to various media outlets, and they ate it up. They ended up in a bunch of English and German language media including Bild, the Daily Star, Times of India, Cosmopolitan, Irish Examiner, and the Huffington Post.

The people I’ve seen discussing this seem to have drawn five conclusions, four of which are wrong:

Conclusion 1: Haha, I can’t believe people were so gullible that they actually thought chocolate caused weight loss!

Bohannon himself endorses this one, saying bitter chocolate was a favorite of “whole food fanatics” because “Bitter chocolate tastes bad, therefore it must be good for you” and “it’s like a religion.”

But actually, there’s lots of previous research supporting health benefits from bitter chocolate, none of which Bohannon seems to be aware of.

A meta-analysis of 42 randomized controlled trials totaling 1297 participants in the American Journal of Clinical Nutrition found that chocolate improved blood pressure, flow-mediated dilatation (a measure of vascular health), and insulin resistance (related to weight gain).

A different meta-analysis of 24 randomized controlled trials totalling 1106 people in the Journal of Nutrition also found that chocolate improved blood pressure, flow-mediated dilatation, and insulin resistance.

A Cochrane Review of 20 randomized controlled trials of 856 people found that chocolate improved blood pressure (it didn’t test for flow-mediated dilatation or insulin resistance).

A study on mice found that mice fed more chocolate flavanols were less likely to gain weight.

An epidemiological study of 1018 people in the United States found an association between frequent chocolate consumption and lower BMI, p < 0.01. A second epidemiological study of 1458 people in Europe found the same thing, again p < 0.01. A cohort study of 470 elderly men found chocolate intake was inversely associated with blood pressure and cardiovascular mortality, p < 0.001, not confounded by the usual suspects.

I wouldn’t find any of these studies alone very convincing. But together, they compensate for each other’s flaws and build a pretty robust structure. So the next flawed conclusion is:

Conclusion 2: This proves that nutrition isn’t a real science and we should all just be in a state of radical skepticism about these things

What we would like to do is a perfect study where we get thousands of people, randomize them to eat-lots-of-chocolate or eat-little-chocolate at birth, then follow their weights over their entire lives. That way we could have a large sample size, perfect randomization, life-long followup, and clear applicability to other people. But for practical and ethical reasons, we can’t do that. So we do a bunch of smaller studies that each capture a few of the features of the perfect study.

First we do animal studies, which can have large sample sizes, perfect randomization, and life-long followup, but it’s not clear whether it applies to humans.

Then we do short randomized controlled trials, which can have large sample sizes, perfect randomization, and human applicability, but which only last a couple of months.

Then we do epidemiological studies, which can have large sample sizes, human applicability, and last for many decades, but which aren’t randomized very well and might be subject to confounders.

This is what happened in the chocolate studies above. Mice fed a strict diet plus chocolate for a long time gain less weight than mice fed the strict diet alone. This is suggestive, but we don’t know if it applies to humans. So we find that in randomized controlled trials, chocolate helps with some proxies for weight gain like insulin resistance. This is even more suggestive, but we don’t know if it lasts. So we find that in epidemiological studies, lifetime chocolate consumption is associated with lifetime good health outcomes. This on its own is suggestive but potentially confounded, but when we combine it with all of the others, it becomes more convincing.

(am I cheating by combining blood pressure and BMI data? Sort of, but the two measures are correlated)

When all of these paint the same picture, then we start thinking that maybe it’s because our hypothesis is true. Yes, maybe the mouse studies could be related to a feature of mice that doesn’t generalize to humans, and the randomized controlled trial results wouldn’t hold up after a couple of years, and the epidemiological studies are confounded. But that would be extraordinarily bad luck. More likely they’re all getting the same result because they’re all tapping into the same underlying reality.

This is the way science usually works, it’s the way nutrition science usually works, and it’s the way the science of whether chocolate causes weight gain usually works. These are not horrible corrupt disciplines made up entirely of shrieking weight-loss-pill peddlers trying to hawk their wares. They only turn into that when the media takes a single terrible study totally out of context and misrepresents the field.

Conclusion 3: Studies Always Need To Have High Sample Sizes

Here’s another good chocolate-related study: Short-term administration of dark chocolate is followed by a significant increase in insulin sensitivity and a decrease in blood pressure in healthy persons.

Bohannon says:

Our study was doomed by the tiny number of subjects, which amplifies the effects of uncontrolled factors…Which is why you need to use a large number of people, and balance age and gender across treatment group

But I say “Short-term administration…” is a good study despite having an n = 15, one less than the Bohannon study. Why? Well, their procedure was pretty involved, and you wouldn’t be able to get a thousand people to go through the whole rigamarole. On the other hand, their insulin resistance measure thing was nearly twice as high in the dark chocolate group as the white chocolate group, and p < 0.001. (Another low sample size study that was nevertheless very good: psychiatrists knew that consuming dietary tyramine when taking a MAOI antidepressant can cause a life-threatening hypertensive crisis, but they didn't know how much tyramine it took. In order to find out, they took a dozen people, put them on MAOIs, and then gradually fed them more and more tyramine with doctors standing by to treat the crisis as soon as it started. They found about how much tyramine it took and declared the experiment a success. If the tyramine levels were about the same in all twelve patients, then adding a thousand more patients wouldn’t help much, and it would definitely increase the risk.)

Sample size is important when you’re trying to detect a small effect in the middle of a large amount of natural variation. When you’re looking for a large effect in the middle of no natural variation, sample size doesn’t matter as much. For example, if there was a medicine that would help amputees grow their hands back, I would accept success with a single patient (if it worked) as proof of effectiveness (I suppose I couldn’t be sure it would always work until more patients had been tried, but a single patient would certainly pique my interest). You’re not going after sample size so much as after p-value.

Conclusion 4: P-Values Are Stupid And We Need To Get Rid Of Them

Bohannon says that:

If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result…the letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data…scientists are getting wise to these problems. Some journals are trying to phase out p value significance testing altogether to nudge scientists into better habits.

Okay, take the “Short-term administration” study above. I would like to be able to say that since it has p < 0.001, we know it's significant. But suppose we're not allowed to do p-values. All I do is tell you "Yeah, there was a study with fifteen people that found chocolate helped with insulin resistance" and you laugh in my face. Effect size is supposed to help with that. But suppose I tell you "There was a study with fifteen people that found chocolate helped with insulin resistance. The effect size was 0.6." I don't have any intuition at all for whether or not that's consistent with random noise. Do you? Okay, then they say we’re supposed to report confidence intervals. The effect size was 0.6, with 95% confidence interval of [0.2, 1.0]. Okay. So I check the lower bound of the confidence interval, I see it’s different from zero. But now I’m not transcending the p-value. I’m just using the p-value by doing a sort of kludgy calculation of it myself – “95% confidence interval does not include zero” is the same as “p value is less than 0.05”.

(Imagine that, although I know the 95% confidence interval doesn’t include zero, I start wondering if the 99% confidence interval does. If only there were some statistic that would give me this information!)
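The "kludgy calculation" can be made explicit. Here is a minimal sketch, assuming the interval is a symmetric normal-theory interval and reusing the made-up numbers from above (effect size 0.6, 95% interval [0.2, 1.0]); the function name `p_value_from_ci` is mine, not anything from the study.

```python
from statistics import NormalDist


def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Two-sided p-value against zero, recovered from a symmetric normal CI.

    Assumes the interval was built as estimate +/- z * standard_error.
    """
    std_normal = NormalDist()
    # Critical value used to build the interval (about 1.96 for 95%).
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    # Back out the standard error from the interval's half-width.
    std_err = (upper - lower) / (2 * z_crit)
    z = estimate / std_err
    return 2 * (1 - std_normal.cdf(abs(z)))


# Illustrative numbers from the text, not real study data.
p = p_value_from_ci(0.6, 0.2, 1.0)
print(f"p = {p:.4f}")
```

Under those assumptions the interval and the p-value carry the same information: an interval whose lower bound sits exactly at zero corresponds to p = 0.05, and one that clears zero comfortably corresponds to a smaller p.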

But wouldn’t getting rid of p-values prevent “p-hacking”? Maybe, but it would just give way to “d-hacking”. You don’t think you could test for twenty different metabolic parameters and only report the one with the highest effect size? The only difference would be that p-hacking is completely transparent – if you do twenty tests and report a p of 0.05, I know you’re an idiot – but d-hacking would be inscrutable. If you do twenty tests and report that one of them got a d = 0.6, is that impressive? No better than chance? I have no idea. I bet there’s some calculation I could do to find out, but I also bet that it would be a lot harder than just multiplying the value by the number of tests and seeing what happens. [EDIT: On reflection not sure this is true; the possibility of p-hacking is inherent to p-values, but the possibility of d-hacking isn’t inherent to effect size. I don’t actually know how much this would matter in the real world.]

But wouldn’t switching from p-values to effect sizes prevent people from making a big deal about tiny effects that are nevertheless statistically significant? Yes, but sometimes we want to make a big deal about tiny effects that are nevertheless statistically significant! Suppose that Coca-Cola is testing a new product additive, and finds in large epidemiological studies that it causes one extra death per hundred thousand people per year. That’s an effect size of approximately zero, but it might still be statistically significant. And since about a billion people worldwide drink Coke each year, that’s ten thousand deaths. If Coke said “Nope, effect size too small, not worth thinking about”, they would kill almost two milli-Hitlers worth of people.

Yeah, sure, you can never use p-values again, and run into all of these other problems. Or you can do a Bonferroni correction, which is a very simple adjustment to p-values which corrects for p-hacking. Or instead of taking one study at face value LIKE AN IDIOT you can wait to see if other studies replicate the findings. Remember, the whole point of p-hacking is choosing at random from a bunch of different outcomes, so if two trials both try to p-hack, they’ll end up with different outcomes and the game will be up. Seriously, STOP TRYING TO BASE CONCLUSIONS ON ONE STUDY.
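For anyone who hasn’t seen a Bonferroni correction, it really is just multiplying each p-value by the number of tests (capped at 1). A minimal sketch; the eighteen p-values below are hypothetical stand-ins for the eighteen measured parameters, not Bohannon’s actual numbers.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, is_significant) for each raw p-value."""
    m = len(p_values)
    results = []
    for p in p_values:
        # Multiply by the number of tests, capping at 1.
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted <= alpha))
    return results


# Hypothetical: one "hit" at p = 0.04 among eighteen measured outcomes.
raw = [0.04] + [0.5] * 17
adjusted_hit, still_significant = bonferroni(raw)[0]
print(adjusted_hit, still_significant)
```

With eighteen outcomes, a raw p of 0.04 becomes an adjusted p of about 0.72, which is nowhere near significant.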

Conclusion 5: Trust Science Journalism Less

This is the one that’s correct.

But it’s not totally correct. Bohannon boasts of getting his findings in a couple of daily newspapers and the Huffington Post. That’s not exactly the cream of the crop. The Economist usually has excellent science journalism. Magazines like Scientific American and Discover can be okay, although even they get hyped. Reddit’s r/science is good, assuming you make sure to always check the comments. And there are individual blogs like Mind the Brain run by researchers in the field that can usually be trusted near-absolutely. Cochrane Collaboration will always have among the best analyses on everything.

If you really want to know what’s going on and can’t be bothered to ferret out all of the brilliant specialists, my highest recommendation goes to Wikipedia. It isn’t perfect, but compared to anything you’d find on a major news site, it’s like night and day. Wikipedia’s Health Effects Of Chocolate page is pretty impressive and backs everything it says up with good meta-analyses and studies in the best journals. Its sentence on the cardiovascular effects links to this letter, which is very good.

Do you know why you can trust Wikipedia better than news sites? Because Wikipedia doesn’t obsess over the single most recent study. Are you starting to notice a theme?

For me, the takeaway from this affair is that there is no one-size-fits-all solution to make statistics impossible to hack. Getting rid of p-values is appropriate sometimes, but not other times. Demanding large sample sizes is appropriate sometimes, but not other times. Not trusting silly conclusions like “chocolate causes weight loss” works sometimes but not other times. At the end of the day, you have to actually know what you’re doing. Also, try to read more than one study.


185 Responses to That Chocolate Study

William O. B'Livion says:

June 2, 2015 at 12:19 am

“Bitter chocolate tastes bad, therefore it must be good for you” and “it’s like a religion.”

Nonsense.

I like my Chocolate like I like my women–bitter, cold and with alcohol.
RCF says:

June 1, 2015 at 11:10 pm

So, how does having 17 other studies affect the validity of the weight study? In another universe in which he only tested weight, would this study be stronger evidence?

On a somewhat related note, I once mentioned to someone that I would consider one test with an alpha value of .001 to be more convincing than three tests, each with an alpha value of .1, and he was quite convinced that there was a “mathematical proof” that I was “wrong”.
- William O. B'Livion says:
  
  June 2, 2015 at 12:20 am
  
  It wasn’t 17 studies, it was *one* study that looked at 18 variables.
  
  The idea being that ONE of those variables would change in the same direction for a significant number of participants.
  - RCF says:
    
    June 4, 2015 at 1:32 am
    
    It can be viewed as 18 studies performed concurrently. And you didn’t answer my question. How does having several variables make the weight one invalid?
    - Adam says:
      
      June 4, 2015 at 3:13 pm
      
      The basic idea is that a study purposefully looking for one thing that attains a p < 0.05 has only a 5% chance of incorrectly rejecting the null. A study looking at 18 hypotheses, on the other hand, has a 60% chance of incorrectly rejecting at least one null (assuming iid binomial). Hence the point of using FWER to find a p-value that brings that probability back down to 5%.
      
      You're right, however, that it's also true that out of any 18 studies, there is a 60% chance that at least one of them incorrectly rejected a null, which is basically what this author is taking issue with in the first place.
Floccina says:

June 1, 2015 at 3:20 pm

Conclusion 2: This proves that nutrition isn’t a real science and we should all just be in a state of radical skepticism about these things

You may be right to say that the above is wrong but I am not sure that we know anything more significant today about nutrition than we knew in the 1960’s. We know you need minimum amounts of certain vitamins, minerals and 12 essential amino acids. We know that if you consume more calories than you use you will gain weight. If you eat fewer calories than you expend you will lose weight. We do not know much more than that.
Stuart Armstrong says:

June 1, 2015 at 7:57 am

On a slightly related note, I’d like to mention the Holm-Bonferroni method for estimating the familywise error rate. http://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method

It’s strictly superior to the Bonferroni correction in every single way (except simplicity) and allows you to detect more subtle effects when you’re looking at a large number of variables.
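A minimal sketch of the Holm-Bonferroni step-down procedure described on that page, for comparison with plain Bonferroni. The four p-values are made up for illustration, and the function name is my own.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected:
        # alpha/m, then alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; everything larger is kept
    return reject


# Made-up p-values for four outcomes.
p_vals = [0.015, 0.04, 0.005, 0.03]
print(holm_bonferroni(p_vals))
# Plain Bonferroni, for comparison: every test uses alpha / m.
print([p <= 0.05 / len(p_vals) for p in p_vals])
```

On these numbers Holm rejects two nulls (0.005 and 0.015) while plain Bonferroni rejects only one (0.005), which is the extra sensitivity the comment is pointing at.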
Smooches says:

June 1, 2015 at 7:38 am

Thanks. You clearly articulated a number of my own thoughts on the matter.

Also, bitter chocolate tastes good. What’s wrong with people?
- William O. B'Livion says:
  
  June 2, 2015 at 1:46 am
  
  Clearly they have bad taste.
A bit obsessive over stats says:

June 1, 2015 at 3:43 am

“[H]e measured eighteen different parameters (weight, cholesterol, sodium, protein, etc) which virtually guarantees that one will be statistically significant”

It’s not an essential point of your arguments, but I think you got that wrong. If each parameter has a .05 probability of yielding “significant” (ugh, that word) results under H0, i.e. its true value being zero, then, if you measure eighteen parameters, there’s still a .95^18 = .40 probability that none of them will appear significant. You suggested that probability is close to zero.
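The arithmetic here is easy to check. A minimal sketch, assuming the eighteen tests are independent and each has a 0.05 false-positive rate under the null (real outcomes such as weight and cholesterol are probably correlated, so treat the result as a rough guide):

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 14, 18):
    print(n, round(familywise_error_rate(n), 3))
```

For eighteen tests this gives roughly a 60% chance of at least one "significant" result, which is a lot, but not a guarantee.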

I also agree with most criticism on p-values. The fact that these are so widely misused, misunderstood, misinterpreted, and faked in a variety of ways persuades me that – to use the terms of the 2014 Edge Annual Question – p-values are a scientific idea that must die.
- Exa says:
  
  June 1, 2015 at 3:53 am
  
  At least P-values are better than “significant”/”nonsignificant”
  Honestly, just always reporting the actual p-values would be a step forward.
  - RCF says:
    
    June 1, 2015 at 10:55 pm
    
    Well, technically, picking your alpha value before collecting data is part of rigorous application of hypothesis testing.
Nornagest says:

May 31, 2015 at 10:33 pm

The first thing I thought when I saw the phrase “good, assuming you make sure to always check the comments” is that no one has ever had that thought before. Sure enough, it’s a Googlewhack. Or effectively so — there are two results, but they both point to this post.

(No results for “good if you always check the comments” or “good assuming you always check the comments”.)
- Exa says:
  
  June 1, 2015 at 3:52 am
  
  “always check the comments” gets 82500 results. Not quite the same phrase, but similar intent.
  - jeorgun says:
    
    June 1, 2015 at 6:49 pm
    
    That gets noise from things like “I don’t always check the comments”, though.
  - Nornagest says:
    
    June 1, 2015 at 6:54 pm
    
    Ah, right, forgot this community’s prone to taking things literally. Let me be clearer. I thought the Googlewhack was funny enough in context to point out, not that literally no one has ever thought that (though I do think it’s probably extremely rare).
    - RCF says:
      
      June 1, 2015 at 10:53 pm
      
      And technically, a googlewhack is exactly two words. Also, the paradox of a googlewhack is that talking about it on the internet invalidates it; once you talk about it, there are now two instances of it: the original, and your comment about it.
Steve Sailer says:

May 31, 2015 at 5:16 pm

Say that eating a particular diet is good for 5% of the population, bad for 5% of the population, and indifferent for 90%. Any kind of general population test will show it as having no effect, but if there was a quick way to notice whether it was good or bad for you before any permanent harm done, then it might make sense for people to try it.
- gwern says:
  
  May 31, 2015 at 8:09 pm
  
  You can test whether the chocolate subjects have larger variance than the controls with an F-test, but why would one expect chocolate in particular to have that sort of complicated effect?
  - Smooches says:
    
    June 1, 2015 at 7:42 am
    
    Nobody would for something as common as chocolate. But, it’s possible, I suppose. Hypothetically this could occur in many areas. A real life example is that some people can’t eat a vegan diet because of how they process certain nutrients. That’s a small number of people. But, if someone in that group goes vegan, they become sick fairly quickly.
Anon says:

May 31, 2015 at 3:32 pm

The original article also makes the traditional mistake when describing p-values: it describes them as being P(null hypothesis | data), instead of P(data | null hypothesis). (“The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation.”)

This is especially egregious in an article about bad science journalism.
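The gap between the two quantities can be illustrated with Bayes’ rule. A minimal sketch, where the prior probability that the null is true and the study’s power are both invented for illustration; neither number comes from the article.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.8):
    """P(null is true | result was significant), via Bayes' rule.

    alpha is P(significant | null); power is P(significant | real effect).
    """
    false_positives = alpha * prior_null
    true_positives = power * (1 - prior_null)
    return false_positives / (false_positives + true_positives)


# Hypothetical priors: a coin-flip hypothesis versus a long shot.
print(round(prob_null_given_significant(0.5), 3))
print(round(prob_null_given_significant(0.9), 3))
```

With a 50% prior the chance that a significant result is a fluke comes out near 6%, but with a 90% prior on the null it comes out at 36%. Neither is "just 5 percent", and the answer depends on inputs a p-value alone doesn’t supply.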
- Douglas Knight says:
  
  May 31, 2015 at 4:07 pm
  
  That sentence could be interpreted either way. But the sentences that follow it are unambiguously correct.
  - Raoul says:
    
    June 1, 2015 at 6:07 am
    
    I read it how Anon read it. But even if it is ambiguous rather than just wrong, it’s probably a bad idea to give an ambiguous definition when introducing people to an important but frequently misunderstood concept.
    
    I’ve seen the same mistake in other articles about bad science journalism too.
Kevin S. Van Horn says:

May 31, 2015 at 12:47 pm

A different take on this chocolate study:

http://andrewgelman.com/2015/05/29/i-fooled-millions-into-thinking-chocolate-helps-weight-loss-heres-how/
gwern says:

May 31, 2015 at 12:44 pm

But wouldn’t getting rid of p-values prevent “p-hacking”? Maybe, but it would just give way to “d-hacking”. You don’t think you could test for twenty different metabolic parameters and only report the one with the highest effect size? The only difference would be that p-hacking is completely transparent – if you do twenty tests and report a p of 0.05, I know you’re an idiot – but d-hacking would be inscrutable. If you do twenty tests and report that one of them got a d = 0.6, is that impressive? No better than chance? I have no idea. I bet there’s some calculation I could do to find out, but I also bet that it would be a lot harder than just multiplying the value by the number of tests and seeing what happens. [EDIT: On reflection not sure this is true; the possibility of p-hacking is inherent to p-values, but the possibility of d-hacking isn’t inherent to effect size. I don’t actually know how much this would matter in the real world.]

As I understand it, d or posterior hacking isn’t problematic in the same way p-hacking is, because statistical-significance is a one-sided dichotomous pass/fail filter which can only reject, and every test you run gives at least a 5% chance of getting a reject. So in, for example, an optional stopping scenario you can test after every new datapoint and sooner or later you will reject the null. In the Bayesian setup, if you try such a trick, you’ll simply wind up accumulating ever more evidence for the null (or more concretely, the parameter estimate of 0). You might try hoping that a small sample will throw up an extreme point value but then the BF/LR/posterior takes into account the weakness of the data as evidence for that extreme point value and gives you an appropriately lame increase to talk about; in contrast, p-values are a weird sort of constant function which takes into account sample size and scales down the acceptable effect (it might take a d of d=1.5 on a very small sample to get your coveted p=0.05, but if you take a much larger sample then the NHST machinery will only demand, say, d=0.2. Hence the surprising result that you can convert a p-value into a Bayes factor without having to take into account sample size or anything, which I recently brought up in http://lesswrong.com/lw/m8h/prior_probabilities_and_statistical_significance/ceo1 – a p-value of 0.05 is always a BF of 2.45 or less, a p-value of 0.01 is always a BF of 7.9 or less, etc).
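The quoted ceilings appear to match the calibration from Sellke, Bayarri and Berger (the paper linked in a reply below), which bounds the Bayes factor against the null by -1/(e·p·ln p) for p < 1/e. A minimal sketch, assuming that is the bound being used:

```python
import math


def max_bayes_factor(p):
    """Upper bound on the Bayes factor against the null for a p-value.

    Uses the -1 / (e * p * ln p) calibration, valid only for 0 < p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only applies for 0 < p < 1/e")
    return -1.0 / (math.e * p * math.log(p))


for p in (0.05, 0.01, 0.0005):
    print(p, round(max_bayes_factor(p), 2))
```

This gives about 2.46 for p = 0.05, just under 8 for p = 0.01, and about 97 for p = 0.0005, in line with the figures quoted in this thread.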

That said, you can see p-hacking as a particular kind of publication bias, and Bayesian methods won’t save you from other kinds. If people simply never report datasets which go the way that they don’t want, or don’t yield large enough BFs, then any attempt to combine such reports which ignores this filtering will yield systematically biased estimates.
- Jacob Steinhardt says:
  
  May 31, 2015 at 12:51 pm
  
  “So in, for example, an optional stopping scenario you can test after every new datapoint and sooner or later you will reject the null.”
  
  Is this really true? Given that each of these tests is correlated, I would e.g. expect more than 20 tests to be needed to have high probability rejecting the null (I would believe it happens eventually, just not in the linear way one might intuit).
  - gwern says:
    
    May 31, 2015 at 1:05 pm
    
    expect more than 20 tests
    
    Assuming non-correlation/independence, don’t you need only 14 tests to have better than even odds of at least one statistically-significant result? ((1-0.05)^14 =0.48)
    
    I would believe it happens eventually, just not in the linear way one might intuit
    
    Never said it was linear, but there’s a lot of simulations in papers on optional stopping.
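    A toy version of such a simulation, as a minimal sketch: data are drawn from a standard normal with a true mean of zero, a z-test with known variance is run after every new datapoint, and the run counts as a rejection if any look crosses the threshold. The sample sizes, number of simulated studies and seed are arbitrary choices of mine.

```python
import random
from statistics import NormalDist


def optional_stopping_rejection_rate(max_n=100, min_n=10, n_sims=2000,
                                     alpha=0.05, seed=0):
    """Fraction of null datasets that get rejected at some interim look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)  # the null is true: mean 0, sd 1
            # z-statistic for the running mean with known variance 1.
            if n >= min_n and abs(total) / n ** 0.5 > z_crit:
                rejections += 1
                break  # stop as soon as "significance" is reached
    return rejections / n_sims


peeking = optional_stopping_rejection_rate()
single_look = optional_stopping_rejection_rate(max_n=100, min_n=100)
print(f"peek after every point: {peeking:.3f}")
print(f"single look at n=100:   {single_look:.3f}")
```

    The single-look rate should land near the nominal 5%, while the peeking rate comes out well above it, growing with the number of looks but, because the looks are correlated, more slowly than the independent-tests formula would suggest.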
- anon85 says:
  
  May 31, 2015 at 4:36 pm
  
  I don’t think it makes sense to convert a 0.05 p-value to a Bayes factor of only 2.45 or less. A p-value of 0.05 means “to believe the null hypothesis, you must believe a 1/20 event just happened”. Getting a Bayes factor of only 2.45 means that you are responding with “yeah, but under the alternative hypothesis, an unlikely event, say 1/8, still happened”, so we didn’t get as much evidence as we thought. But that doesn’t have to be true.
  
  I looked at the source you cited:
  https://faculty.washington.edu/jonno/SISG-2011/lectures/sellkeetal01.pdf
  
  and it indeed restricts the alternative hypotheses to a weird class in which they all induce a Beta distribution over the p-values. That choice is completely arbitrary, isn’t it?
  
  Just to emphasize the point, using the calculations you suggest, p=0.0005 gives Bayes factor of only 97, suggesting that under the alternative hypothesis, a p<0.05 event still happened; and this is asserted without knowing what the alternative hypothesis is!
  
  Let me know if I'm misunderstanding something here.
  - gwern says:
    
    May 31, 2015 at 8:07 pm
    
    The restrictions are a bit arbitrary, yeah, but beta is pretty general. I find it interesting that this conversion works at all.
    - anon85 says:
      
      May 31, 2015 at 10:04 pm
      
      I would normally agree that beta is pretty general, but the conclusion in this case seems so counterintuitive that there must be some bad assumptions hidden (or else I’m misunderstanding).
- Kevin S. Van Horn says:
  
  May 31, 2015 at 5:58 pm
  
  A Bayes factor is useless in this situation, as it is vanishingly improbable that the effect is *exactly* zero, making the prior for the null hypothesis very, very, very small.
Kevin S. Van Horn says:

May 31, 2015 at 12:42 pm

I disagree with you on (4). Some of the problems with p values:

1. They answer the wrong question. You want to know what is a reasonable range of plausible values for the effect, and how certain you can be that the effect lies within that range. P values answer an entirely different question that is only loosely related to this one. Confidence intervals don’t answer this question either — instead, they tell you something about the error rate of a decision procedure. Bayesian credible intervals DO answer this question. Furthermore, using a prior that summarizes background knowledge as to what are believable effect sizes (from previous studies or biological knowledge) can help deal with the problem of spurious significance in small samples.

See http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2606016 .

2. The null hypothesis is often a straw man that nobody believes to be true. That is, the effect you are studying is almost certainly NOT zero, although it is plausible that it may be very small, and may change magnitude and even sign depending on variables other than the main one you are considering (i.e. the “constant effect” assumption may be wrong). Instead of type 1 and type 2 errors, which focus on whether or not the effect is 0, you should be worrying about type M (magnitude) and type S (sign) errors.

3. P hacking is a very real problem, even when researchers are trying their best to avoid it. (See “The Garden of Forking Paths”, http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf)

There is a better way: Do a Bayesian analysis, with priors reflecting generally accepted background knowledge, using a regression model (and multilevel regression for subgroup analysis), and report credible intervals for the effects of interest.

This paper summarizes the problems with NHST and assuming constant effects, and how Bayesian methods can be applied to address these problems:

http://www.stat.columbia.edu/~gelman/research/published/bayes_management.pdf
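
As a rough illustration of the approach described above, here is a minimal sketch of a Bayesian update for a single effect size. It assumes a normal prior and a normally distributed estimate (the conjugate normal-normal model), which is a simplification and not the regression model the linked paper describes. The function names and all numbers are hypothetical.

```python
import math

def posterior_normal(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update; returns (posterior mean, posterior sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)

def credible_interval(mean, sd, z=1.96):
    """Central 95% credible interval for a normal posterior (z = 1.96)."""
    return mean - z * sd, mean + z * sd

# Hypothetical small study: estimate 5.0 with standard error 2.0
# (z = 2.5, "significant" at the 0.05 level).
# Hypothetical prior from background knowledge: effects are probably
# small, centred on 0 with standard deviation 1.0.
post_mean, post_sd = posterior_normal(0.0, 1.0, 5.0, 2.0)
low, high = credible_interval(post_mean, post_sd)
print(f"posterior mean {post_mean:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```

With these made-up numbers the noisy estimate is pulled strongly toward the prior, and the resulting interval still includes zero even though the raw result would pass a conventional significance test.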
Jacob Silterra says:

May 31, 2015 at 10:34 am

People like to blame this on reporters and lazy journalism, and while that’s true, it ignores one thing. Reporters copy-paste press-releases written by the institutions which conducted the study. So essentially scientists are being misleading and the press is coming along for the ride. See The association between exaggeration in health related science news and academic press releases: retrospective observational study; I did some follow-up analysis here
emily says:

May 31, 2015 at 10:31 am

p-values are useful, but by themselves don’t tell the whole story. Every study should be required to report a power analysis. You want to know if your study is ridiculously over-powered, so that you are going to pick up a 0.01% difference between groups (not a meaningful difference even if true), or alternately if you would have needed a huge effect size like 50% to pick it up, before reporting headlines like “X does not help Y.” I think many late stage drug trials are over-powered, and the effect sizes are all significant if not always impressive.
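
For a sense of what such a power analysis looks like, here is a minimal sketch using the normal approximation for a two-sided comparison of two group means. The function name, the use of a standardized effect size (Cohen's d), and the example numbers are assumptions for illustration. A real analysis would normally use a t-test based calculation.

```python
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical scenarios:
print(power_two_sample(0.5, 5))        # tiny study, medium effect
print(power_two_sample(0.5, 64))       # conventional design, medium effect
print(power_two_sample(0.02, 100000))  # huge study, negligible effect
```

The first case shows an under-powered study that would usually miss a real medium-sized effect, and the last shows an over-powered one that would almost always flag an effect too small to matter.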
- RCF says:
  
  June 1, 2015 at 10:35 pm
  
  Scott already addressed this with his discussion of microhitlers. Although the issue of power is much more important in studies that don’t reject the null, compared to those that do.
alwhtie says:

May 31, 2015 at 10:08 am

Maybe this is my bias showing but if I read a study that encourages the eating of chocolate, I’m going to believe it. 🙂
- houseboatonstyx says:
  
  May 31, 2015 at 8:37 pm
  
  I used to dismiss all reports that some good effect came from chocolate and such, figuring that the chocolate vendors had plenty of money to fund many studies looking for some good effect they could report. But, hm, the vendors have money because most people like chocolate. Maybe the cavemen who liked chocolate were healthier than the ones who didn’t. So the chocolate-philes out-bred the chocolate-phobes, so here we are.
  
  But correlation is not causation, so I think we need to do a whole lot more chocolate studies, for which I volunteer.
  - Nornagest says:
    
    May 31, 2015 at 10:38 pm
    
    Alas, chocolate is a New World plant and early man could not have evolved to enjoy it. It’s been cultivated in Mesoamerica for a long time, though, so there’s an outside chance of adaptation if you’ve got any native blood in you.
  - Anthony says:
    
    June 1, 2015 at 4:38 pm
    
    I will gladly provide information for observational/epidemiological studies, but the thought of being in the control group of a chocolate study is way too depressing to volunteer for that.
    - Deiseach says:
      
      June 1, 2015 at 4:45 pm
      
      Oh lord – the control group for a chocolate study might be fed carob and that is a torture even the Devil in Hell would not inflict 🙁
Kaj Sotala says:

May 31, 2015 at 9:35 am

Conclusion 2: This proves that nutrition isn’t a real science and we should all just be in a state of radical skepticism about these things

I don’t get this conclusion in the first place. The guy admitted himself that he didn’t get this published in even a low-tier peer reviewed journal, he got it published in a fake for-profit journal that will print anything you send them. Yeah, that the popular press reported on this like it was published in a real journal is pretty damning for the popular press, but how is it any indication about the state of nutrition science?
- Deiseach says:
  
  May 31, 2015 at 4:42 pm
  
  The only worthwhile research is that recognised by the Journal of Irreproducible Results 🙂
HeelBearCub says:

May 31, 2015 at 9:12 am

What if there was a “Journal of Pre-Registration” where you could pre-register the study you are doing, what tests you plan on doing on the data and what your hypothesis is? This seems like it would at least allow scientists doing small studies to make sure their wheat stands out from the chaff. It would need to be very lean, and online publish all registered studies, and be easy to search. But as an outsider to research, it seems a reasonable part of a solution to me.

Given the whole publish or perish model, I’m not sure those who aren’t already established scientists could afford to do this. The whole “let’s decide who the author of the paper is after we are done” also is problematic for adoption.

What would other hurdles be? Does anything like this already exist?
- AnObfuscator says:
  
  May 31, 2015 at 8:19 pm
  
  They could always use something like GitHub…
  
  Upload the plan and hypothesis. Upload the raw data, when gathered. Upload the cleaned data, and the code used to clean it. Upload the results.
  
  All nicely version controlled.
  - HeelBearCub says:
    
    May 31, 2015 at 8:37 pm
    
    Yeah, something like GitHub/version control in general could make it easier to get the hypothetical journal off the ground.
    
    But the real questions are: 1) Would the idea (regardless of implementation) be a significant improvement, and 2) Would anyone use it.
    
    Although, build it and they might come seems like a reasonable approach, if you can do it cheaply enough. So perhaps question 1 is the most salient, as a cheap approach seems possible.
    - AnObfuscator says:
      
      May 31, 2015 at 8:54 pm
      
      “ResearchHub” (as I have now named the idea in my mind 😉 ) could definitely be cheap. Heck, just use GitHub to start with.
      
      (1): I personally find it frustrating how hard it is to actually get the data used for all the pretty graphs in a paper. I think a lot of meta-review could be done very easily, and frequently, if the data were freely accessible to all. Also, what are we missing by *not* having all this data available? I mean, a lot of studies are done, but the results aren’t published because nothing “interesting” was found. Wouldn’t it be nice to have that boring data available somewhere?
      
      (2): Yes, if one or two major publications prioritized papers that used “ResearchHub” or similar, it would gain a lot of popularity quickly. Once it hit critical mass, it would be considered odd for a paper to *not* include a link to the repo.
  - William O. B'Livion says:
    
    June 2, 2015 at 1:41 am
    
    Cloud servers are pretty cheap, hooking up the source forge isn’t (or shouldn’t be, haven’t looked recently) that tough.
    
    Funding for the first year or six might be a problem. Any grant writers around?
- qsz says:
  
  June 1, 2015 at 12:30 pm
  
  The journal Cortex recently launched such a format, called a Registered Report.
  Descriptive link
  Detailed guidelines (PDF)
  
  It’s too early to see how many people will submit such formats (critics argue that it creates even further burden on peer reviewers) but definitely a trend I’m watching closely from my almost-related field.
- RCF says:
  
  June 1, 2015 at 10:32 pm
  
  I suppose one strategy would be to put a voter initiative on the ballot requiring studies involving a public university to follow these procedures. I’m not sure how voters would respond to that.
bayesian says:

May 31, 2015 at 8:57 am

Why not just use likelihood ratios?
mico says:

May 31, 2015 at 8:27 am

Bild is a trashy tabloid. Biggest only by circulation.
- Besserwisser says:
  
  May 31, 2015 at 10:36 am
  
  It’s not even a real newspaper. Too many pictures and too little text, hence the name.
- nydwracu says:
  
  June 1, 2015 at 5:20 am
  
  Is there any German newspaper that isn’t awful? I used to keep my German up by reading their papers, but all the ones I could find were terrible and now it’s atrophied horribly.
  
  Nuclear power plants are Satan incarnate! There are eight neo-Nazis in Saxony! Some bureaucrat did something somewhere! Mumble mumble green!
  - mico says:
    
    June 1, 2015 at 11:51 am
    
    Frankfurter Allgemeine Zeitung? It is by far the least annoying in terms of its treatment of Germany’s peculiar cultural obsessions and has particularly good coverage of the European financial markets.
    
    Generally for anything that doesn’t touch the political topics you mention I find the German papers superior to English language papers. They assume a higher standard of intelligence in their readers.
  - anon says:
    
    June 1, 2015 at 2:37 pm
    
    I second Frankfurter Allgemeine, and add Switzerland’s Neue Zürcher Zeitung as a recommendation.
  - Mary says:
    
    June 1, 2015 at 2:48 pm
    
    “The public doesn’t understand German, and in Journalese I can’t tell them so.” Karl Kraus
Deiseach says:

May 31, 2015 at 8:04 am

Well, if Mr Bohannon (Dr Bohannon? If he’s a genuine PhD?) wanted to make sure the ordinary layperson never believes a damn word about diet and nutrition ever again, I think he’s succeeded.

I also think Dr Frank, G.P., should be careful of throwing stones in that glass house of his profession: the idea that “if it tastes horrible, it’s doing you good” comes from medicine, after all, and the bitter/nasty/unpalatable tastes of drugs. A generation reared on “Drink your cough syrup and never mind making those faces about the taste, it’ll do you good” is naturally going to be susceptible to “Bitter chocolate in its unprocessed state tastes bad, bitter + unprocessed = good for you” (after all, we’re constantly being told by the professionals to Eat More Fibre and you’ll get that from raw vegetables, wholegrains, and other unprocessed foods).

Thirdly, that you should not believe everything you read in the papers, especially when it sounds too good to be true? Never bad to have a reminder of that.

Lastly, I hope the wider message of this is that just because someone has a PhD or A Real Actual Genuine Scientific Qualification, it does not mean tuppence when they’re outside their area of qualification. Dr Bohannon has a Real Actual Genuine qualification in the field of “the molecular biology of bacteria” but this does not qualify him to make statements on anything but the molecular biology of bacteria; Scott points out that he seems either unaware or dismissive of the studies done about dark chocolate, and he doesn’t seem to have considered that maybe the weight loss was not down to “dark chocolate makes you lose weight” but “letting people have little treats like portions of dark chocolate helps them stick to diet plans and combat cravings better”.

I’m constantly seeing inspirational quotes from Bill Nye, The Science Guy (and not being American, I wouldn’t know him if I fell over him or what is so great about him) and Neil deGrasse Tyson all over my social media feeds and people who post and repost/reblog them going “YEAH SCIENCE!!!!” and that’s fine – as long as it’s about science.

When it comes to religion, ethics, the economy, changes in the law, or what colour you should paint your hall door, I wish people would remember that being Dr Science qualifies them or others no better to opine on these topics than Mick down the pub, or you, or me.
- Irrelevant says:
  
  May 31, 2015 at 9:20 am
  
  If you’re going to refer to Nye with his full title, you should also use “Neil deGrasse Tyson, the Science Liaison.” It’s unsymmetrical otherwise!
  - Gbdub says:
    
    May 31, 2015 at 1:35 pm
    
    Well, if we’re doing titles, it’s “Bill Nye the Mechanical Engineer and kids’ show host” and “Neil deGrasse Tyson the semi-retired Astrophysicist and active TV star”. 😉
    - Deiseach says:
      
      May 31, 2015 at 4:38 pm
      
      At this stage, it’s “Bill and Neil The Two People I’d Most Likely Punch In The Face (If I Ever Met Them)” due to the uncritical parroting and praise of their most anodyne utterances as some kind of great wisdom of the ages.
      
      I’m sure they’re perfectly nice blokes in their private lives and better able to work out a mathematical formula than I am, but that does not mean they are the unerring founts of all human wisdom on every conceivable topic. Or rather, that the young people should not treat them as such 😀
      - Steve says:
        
        June 2, 2015 at 5:00 pm
        
        Granted, there is also the matter of Neil’s plagiarism of quotes (don’t have the link offhand, but he made up some ridiculous things “said by religious nuts” and then rode that skepticism/slander train to celebrity among the atheist/skeptic/‘scientist’ community and in the hearts of really stupid “nerds” on tumblr everywhere), and general willingness to suck every available media opportunity out of any cock which is presented to him.
  - RCF says:
    
    June 1, 2015 at 10:28 pm
    
    You do realize those don’t rhyme, right?
Mike Capp says:

May 31, 2015 at 7:40 am

Nerdly niggle – you have unescaped left-angle-brackets in your body text, e.g. in the sentence beginning “I would like to be able to say that since it has p < 0.001”. Browsers are pretty forgiving and it still looked OK on the blog itself, but my RSS reader’s parser gave up on the offending paragraphs at that point.

W3 validator results for confirmation.
Mark says:

May 31, 2015 at 7:36 am

Bad science to dogma: Interesting development in understanding of cholesterol and saturated fat at the HHS reported here in the UK by Matt Ridley. http://www.mattridley.co.uk/blog/cholesterol-is-not-bad-for-you.aspx
Corwin says:

May 31, 2015 at 6:57 am

“Bitter chocolate tastes bad” 🙁
- Corwin says:
  
  May 31, 2015 at 6:59 am
  
  (Huh, form ate my comment)
  
  “Bitter chocolate tastes bad” ? WELL WHEN IT’S MADE PROPERLY IT DOESN’T
  
  There are three ingredients in dark chocolate : cocoa powder, cocoa butter, and sugar, everything else is evil.
  
  In the early 90s, chocolatiers sorta collectively decided they wanted a new sales argument and chose to pitch high cocoa percentage. Which just means “there is this much sugar in this”, really.
  So they added a lot of cocoa powder, because it’s a lot cheaper than cocoa butter; this entirely fucked up the correct recipes, making for chocolates that leave a bitter paste in the mouth, because there is not enough butter to make it slide down to swallow.
  
  Meanwhile, some rare, and small, chocolatiers are still making dark chocolate that actually tastes good… and finding that it’s stupidly hard to sell, because people expect it to taste bad.
  - Deiseach says:
    
    May 31, 2015 at 8:18 am
    
    Agreed that I expect high cocoa volume dark chocolate to taste chalky and bitter; I’ve tried some of the ones on here and have been disappointed by Lindt and Green and Black’s (though that may have been because I tried the products after they’d been taken over first by Cadburys and then Cadburys was taken over by Kraft Foods, so the “small boutique producer” thing was pretty much knocked on the head).
    
    On the other hand, Aldi’s own-brand Moser-Roth gets two thumbs up from me 🙂
    
    White chocolate is the devil’s drippings. Yuck! Pointless, much too sweet, and feels like I’ve shoved a bar of soap down my throat.
    - Corwin says:
      
      May 31, 2015 at 11:49 am
      
      yeah, Lindt is terrible, but they already were before they sold out. Green & Black rings a bell, but i don’t remember much, maybe i just heard the name…
      
      Aldi indeed often has products of surprisingly good quality 🙂
      
      White chocolate … it’s a matter of personal taste, but there exists good white chocolate; it’s just so much even easier than dark chocolate to add unnecessary, evil things in it to cheapen it, that it’s very rare to find any that hasn’t been fucked with 🙁
      - nydwracu says:
        
        June 1, 2015 at 5:17 am
        
        Lindt? That’s what I usually buy. But only the 90% — everything else (including other brands like Ghirardelli) has an odd chemical/fruit taste to it.
    - Anthony says:
      
      June 1, 2015 at 6:19 pm
      
      Lindt appears to have changed their recipe for their 70% a couple of years ago – it’s more sour than it used to be, to the point where I rarely buy it anymore. I’m not sure what you can get in Ireland; availability of high-end chocolates varies widely even in the Bay Area.
      
      I’ve found that anything much more than 70% is a risk – sometimes it’s just way too bitter, or has weird flavors – but there are some good ones above that. Trader Joe’s has an inexpensive 85% from Colombia which I really like, but has some flavor notes which some of my friends really dislike.
    - James Picone says:
      
      June 1, 2015 at 9:49 pm
      
      “White chocolate is neither white nor chocolate. Discuss” to quote Kingdom of Loathing.
  - Mike says:
    
    May 31, 2015 at 5:35 pm
    
    If you are in SF and you like dark chocolate, look out for Dandelion Chocolate (also I think you can find it in NY and Boston). The chocolate is made with 30% sugar and 70% whole cacao mass, no added powder or butter or lecithin or vanilla or any of that crap. They use beans from a single source rather than some kind of blend, and the roast tends to be on the lighter side, so you can actually taste the differences from the original beans.
    
    It’s seriously good. Not cheap, though.
    - Anthony says:
      
      June 1, 2015 at 6:27 pm
      
      Enh. I’m really unimpressed by Dandelion’s chocolate bars. They’re waxy and the flavor isn’t very strong. Their drinking chocolate (in the shop), however, is excellent. Tcho makes better chocolate locally.
Ilya Shpitser says:

May 31, 2015 at 5:23 am

“Then we do epidemiological studies, which can have large sample sizes, human applicability, and last for many decades, but which aren’t randomized very well and might be subject to confounders.”

And we know how to adjust for confounding properly, too. Robins et al. now replicated the result of an RCT via observational data by properly adjusting for confounding.

“Get rid of p-values”

The problem is incentives, not statistical methods.
- gwern says:
  
  May 31, 2015 at 12:33 pm
  
  Robins et al. now replicated the result of an RCT via observational data by properly adjusting for confounding.
  
  Since you’re the one bringing it up approvingly, can I assume that Robins et al predicted the result rather than postdicted it?
  - Douglas Knight says:
    
    May 31, 2015 at 1:24 pm
    
    Lots of people have predicted true results based on observational studies. For example, Agostino Salumbrino predicted that quinine would treat malaria, based on his observation of it being used to treat other diseases.
    - gwern says:
      
      May 31, 2015 at 5:31 pm
      
      Oh, I don’t claim inferring causality from correlation is always wrong. (Just 66% of the time.) But I am less impressed by an analysis which reaches the already-known answer.
      - Ilya Shpitser says:
        
        June 1, 2015 at 4:43 am
        
        Gwern, I think the game here isn’t to do science correctly, but to do statistical methodology of causation correctly.
        
        We already know what to do, more or less, for the former: make a prediction and run an experiment. But what if we can’t run an experiment, and have to kill confounding off? How do we do it in general?
        
        This is actually important: there is almost an infinite number of ways to massage observational data to try to kill confounding off, and thereby get the result of the RCT. Robins et al fixed the method of massage in advance.
        
        I think what they were doing is quite a bit more sophisticated than “pick one out of three cases in the common cause principle.”
    - RCF says:
      
      June 1, 2015 at 10:16 pm
      
      So, someone, somewhere, managed to once correctly guess one bit of information? That’s supposed to be impressive?
- William O. B'Livion says:
  
  June 2, 2015 at 1:35 am
  
  The problem is incentives, not statistical methods.
  
  As I’ve told more than a few managers/directors over the years “Be careful what you measure because you’ll get more of it.”
  
  You measure what percentage of trouble tickets are close in two hours, you’ll get more closures at 2 hours. Not more *fixes*, but more closes.
  - RCF says:
    
    June 4, 2015 at 1:34 am
    
    “You measure what percentage of trouble tickets are close in two hours, you’ll get more closures at 2 hours.”
    
    Well, that’s rather incomprehensible.
zz says:

May 31, 2015 at 5:20 am

Obligatory xkcd is obligatory.

Nutrition isn’t a real science and we should all just be in a state of radical skepticism about these things

If anyone shares this opinion, I invite you to read Food and Western Disease. Nutrition science is hard, but it’s not bullshit. (note: the author advocates a paleo-ish diet; I eat primarily soylent based off the conclusions drawn in the book, even though that’s literally the furthest thing from an ancestral diet. Lindeberg doesn’t not advocate a point-of-view, but mostly sticks to evaluating evidence as neutrally as possible; take 1 2 3 of his slides to illustrate his moderate views)
Houshalter says:

May 31, 2015 at 4:41 am

The whole system seems wrong. You shouldn’t be allowed to collect data, then choose a hypothesis to test on it, then find a journal to publish it. At each step you have a chance to add bias and eke out more statistical significance than is actually there.

Data should be published no matter what is collected. If you measure 16 things, you should report 16 things, not just the 1 thing that had a correlation. And then everyone can look at the data and analyze it, not just the scientists trying to prove their hypothesis. And all data gets published, not just that which finds something interesting.
Luis Pedro Coelho says:

May 31, 2015 at 4:29 am

The real problem is that there are too many scientific publications that pass all the science rituals for their disciplines which are just very unlikely to be true.

These studies rarely correct for the multiple hypothesis testing [https://en.wikipedia.org/wiki/Multiple_comparisons_problem]. If they reported corrected p-values (like is standard in other fields of science, like the one I work in), then many of these studies would have null-findings and the ones that did have a positive finding would be that much more likely to be true.

It’s not throw out p-values, rather correct your p-values for multiple hypothesis testing.
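
To make the correction concrete, here is a minimal sketch of two standard adjustments, Bonferroni and Holm, together with the chance of at least one false positive when several independent true-null tests are run at the same threshold. The p-values and the count of 20 tests are hypothetical, and the independence assumption is a simplification.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm step-down adjusted p-values (never larger than Bonferroni's)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values keep the original ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical: 20 outcomes measured, each tested at alpha = 0.05.
print(familywise_error_rate(20))

# Hypothetical raw p-values from four outcomes.
raw = [0.04, 0.01, 0.30, 0.002]
print(bonferroni(raw))
print(holm(raw))
```

In this made-up example the raw p-value of 0.04 no longer clears 0.05 after either adjustment, while the 0.002 result survives both.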
Jiro says:

May 31, 2015 at 4:19 am

Anyone who thinks Wikipedia is a good information source should read their article on Gamergate controversy.
- Jon Gunnarsson says:
  
  May 31, 2015 at 6:40 am
  
  Gamergate is a controversial and highly ideologically-charged issue. When it comes to topics like that, most sources are going to be quite biased.
  - HeelBearCub says:
    
    May 31, 2015 at 9:00 am
    
    And most readers as well. Which affects their perception of how unbiased the writers are.
  - Cauê says:
    
    May 31, 2015 at 11:40 am
    
    Wikipedia’s “reliable secondary sources” policy usually works well enough, but it does funny things when those reliable sources don’t live up to their reliability credentials. Like when they are one side of a controversy.
    
    But these situations are extraordinary, and Wikipedia as a whole tends to work better than that.
    - Anthony says:
      
      June 1, 2015 at 6:15 pm
      
      That Slate article points to a comment on dKos which says “They’ve purged all their female editors: They’re not allowed to write anything on anything having to do with women or gender issues.”
      
      With a sig line which says “You can’t fix stupid.” Oh, the irony.
- Besserwisser says:
  
  May 31, 2015 at 10:34 am
  
  There was also a feminist campaign to edit wikipedia according to their ideology. I wouldn’t trust Wikipedia on anything concerning them nowadays.
Jacob Steinhardt says:

May 31, 2015 at 3:46 am

Scott, what do you think of Gelman and Loken’s article on “the garden of forking paths”, which argues that one need not actually perform multiple comparisons explicitly in order to obtain inflated p-values?

http://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1

The point is that simply looking at your data before choosing an analysis strategy already will lead to inflated p-values, even if you don’t explicitly make multiple comparisons. Moreover, it is quite difficult to know what correction to apply in this case, since the number of a priori comparisons one could have chosen is nearly infinite (although, there are some clever ideas being developed in the statistics community right now to try to partially alleviate this issue).
Richard says:

May 31, 2015 at 2:43 am

An extremely (over-)simplified model of weight control
[Epistemic status: probably not wrong, but probably only works to a first order of approximation]

There are two levels of metabolism; active and resting. (books say at least three, sometimes five, probably there is actually a continuum, but this is simplified, yes?)

Resting metabolism is mainly affected by your calorie intake. A ‘normal’ level means ~2000 Calories per day is spent on keeping your body running at idle. During periods of restricted calorie intake, the number of Calories burned goes down significantly, and as low as ~700/day is not uncommon.
The resting metabolism does not return all the way to normal when Calorie intake increases, but stays at a slightly reduced level. This is repeatable so that yo-yo dieting decreases your resting metabolic rate every time you diet. The resting metabolic rate can be increased by exercise, but it requires quite a lot of effort to increase it just a little.

This means that no weight loss regime that is based on reduced calorie intake will work long term. Unless you are willing to spend the rest of your life on a starvation level diet of less than 700 Calories per day. This is both hard to do and will probably have some adverse effects on your health.

The active metabolism is affected by exercise and the effect depends on how fit you are. For the purpose of this discussion, we will simply say that you can expend ~1000 Calories per hour, the fitter you are, the higher the number, but there is an asymptote. The other side of this equation is calorie uptake. Studies done on extreme athletes indicate that you can peak out on ~350 Calories per hour, possibly as high as 400 if you base your diet on energy gels, but that hardly seems feasible for extended periods of time.

This means that if you burn more than ~8400 Calories per day, it is impossible not to lose weight. Now, more than 8 hours of exercise per day may seem like a lot, but on the other hand, it is roughly similar to the level of effort that was expected of a twelve year old farm boy about a century ago, when obesity was not a problem. (though undernourishment was…) 8400 Calories corresponds to ~2 pounds, placing an upper limit on weight loss.
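
The arithmetic behind those figures can be checked with a short sketch. The 350 Calories per hour uptake ceiling is the figure given above. The 3500 Calories per pound of body fat conversion is a common rule of thumb and is an assumption here, not something stated in this model.

```python
MAX_UPTAKE_PER_HOUR = 350       # Calories/hour, ceiling quoted above
HOURS_PER_DAY = 24
CALORIES_PER_POUND_FAT = 3500   # assumption: common rule-of-thumb conversion

# Most energy the body could absorb in a day at the quoted ceiling.
max_daily_uptake = MAX_UPTAKE_PER_HOUR * HOURS_PER_DAY

# The same quantity expressed as pounds of fat.
pounds_equivalent = max_daily_uptake / CALORIES_PER_POUND_FAT

print(max_daily_uptake)             # 8400
print(round(pounds_equivalent, 1))  # 2.4
```

Under that assumed conversion, 8400 Calories comes out at roughly 2.4 pounds, which is in line with the "~2 pounds" figure.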

To complicate this simple model are things like insulin sensitivity and thyroid function which may skew things quite a bit, but probably not more than ~20% in either direction while still supporting life.

At a very rough estimate, diet composition will have an effect of no more than ~10% on the basic numbers above, because of things like energy spent on digestion. Whenever I see any research that claims more than that, I will view this as extraordinary claims requiring extraordinary evidence.

Dark chocolate seems to have a positive effect on insulin sensitivity, which is not terribly relevant unless you are pre-diabetic or worse and the claim does not affect the numbers above. I am willing to provisionally accept the claim, but would probably not base any self-treatment of diabetes on it without further evidence.

Personally I really enjoy the taste of 96% cocoa bitter chocolate and whether it is good for me or not is really not something I worry about.

This seemed like as good a place as any to share my heuristics about dieting and exercise, sorry if it was only tangential to the actual topic.
- Jiro says:
  
  May 31, 2015 at 4:29 am
  
  This means that no weight loss regime that is based on reduced calorie intake will work long term
  
  You’re only using the reduced calorie intake to lose weight. You don’t use it long term. Once you’ve lost all the weight, you can stop eating reduced calories and eat at a steady state rate. The steady state rate is slightly less than your pre-weight-loss steady state rate because you burn fewer calories when you are lighter, but only slightly unless you’re losing really huge numbers of pounds.
  - Richard says:
    
    May 31, 2015 at 7:07 am
    
    I think we all know that this is how dieting is supposed to work. The way it actually works is that your body will always consume fewer calories than you eat unless
    a) you bottom out at starvation level
    b) you reach a weight that is at least as large as your starting point or
    c) you force the metabolism up by exercise
    
    The only reason you can lose weight temporarily with a 1500 Calorie diet is because your body cannot change its metabolism quite as quickly as you can change your diet. This lag actually works against you when you start eating again.
    - Godzillarissa says:
      
      June 1, 2015 at 7:33 am
      
      The problem I see there is that people actually try to lose weight instead of fat*. As long as we go by scale it’s a lost cause.
      
      *Assuming the actual goal is “to look better”, whatever that means.
  - Deiseach says:
    
    May 31, 2015 at 8:27 am
    
    I wish I had the metabolism to go “Oops, put on a few pounds, better cut back on the calories a bit” and then when the weight shifted go back to eating as before.
    
    No. If I go on restricted-calorie, I’m on it forever, else when I stop the diet and eat “normally”, I pack it all back on plus a bit extra to boot.
    
    By “normally”, I do not mean “back on the junk food and excess”, I mean “eat a healthy – by whatever standards of healthy we’re being told is healthy today by nutritionists – three meals a day” diet. Everyone says “you need to eat breakfast, breakfast is the most important meal of the day” and “eating breakfast means you’re less hungry during the day and won’t eat a massive lunch”?
    
    Not the way my system works. I eat breakfast, it kicks the appetite into gear so that I am hungry at teabreak time and ravenous at lunchtime and get home from work and can’t wait to cook the evening meal. The only way I get my appetite back to “No, I don’t even want biscuits at tea break and a bowl of soup will fill me right up at lunchtime” is no breakfast. Often no lunch either; these days I’m eating one proper meal a day (but yeah, tending to pick and snack during the evenings) and getting by fine with no hunger pangs on cup of tea in morning, cup of tea at lunchtime.
    - Hyzenthlay says:
      
      May 31, 2015 at 10:00 pm
      
      That’s how it’s always worked for me too. I never eat breakfast, have done two meals a day for years, and despite everyone talking about how important breakfast is, I very rarely see anyone produce actual data to support this.
      
      I’ve heard people make the claim that you’ll burn more calories throughout the day if you eat breakfast, or if you snack between meals. I’ve also heard the claim that short periods of fasting (12-18 hours) followed by one large meal burns calories more efficiently than if you spread them out over several meals.
      
      Claims about how you’re “supposed” to eat change so much and contradict themselves so often that I tend to just ignore them.
      - Richard says:
        
        June 1, 2015 at 1:15 am
        
        Like many ‘old wisdoms’ when it comes to nutrition, I suspect that ‘breakfast is important’ is adapted to a world we no longer have.
        
        Try not eating breakfast and then go out and do hard physical labour until lunch. Do stop before you collapse, because without breakfast you will.
        Now try not eating breakfast and go sit in an office chair until lunch. See the difference?
        
        As late as the 1960s, physical labour was common enough that the breakfast advice made sense and without looking it up just now, I seem to remember it is much older than that.
      - nydwracu says:
        
        June 1, 2015 at 5:14 am
        
        Eating breakfast was once considered low-class, which is an interesting piece of evidence.
        
        I can’t stand not eating breakfast, but that’s because I eat a small dinner if I eat one at all.
      - Tracy W says:
        
        June 1, 2015 at 6:20 am
        
        Try not eating breakfast and then go out and do hard physical labour until lunch.
        
        Yes, I don’t normally eat a large breakfast, and that was pretty tough on my body when I went on a cycling trip that kept involving climbing big hills in the morning.
      - Mary says:
        
        June 1, 2015 at 1:47 pm
        
        Much of the “breakfast so important” stuff came from feeding inner-city kids breakfast. It can not be reproduced with suburban kids — that is, the effect stemmed from prior hunger.
- HeelBearCub says:
  
  May 31, 2015 at 8:59 am
  
  I believe the picture is more complex than that. For instance see the research on gut biome and how it affects weight gain or loss. It is possible to lose weight simply by replacing your gut biome with one from a person who is “naturally” low weight.
  
  I think the issue with understanding weight loss and gain is that it’s a really complex system, with many moving parts. Simple explanations feel right, but any adequate explanation is likely to be complex.
  - Richard says:
    
    June 1, 2015 at 1:36 am
    
    Agreed, for values of ‘adequate’. A first order approximation is almost never adequate for anything but basic sanity checks, for which this was intended.
    - HeelBearCub says:
      
      June 1, 2015 at 10:55 am
      
      Except … there exists a basic assumption in your oversimplified model that all calories consumed make it into your bloodstream, or that this is an immutable property of the individual.
      
      To take a step back 30 years or so, the oversimplified model drawn would have been total calories in/out and would have regarded metabolism as an immutable fact of the person. This simplified model led to some particularly bad outcomes.
      
      I guess I am challenging the idea that an oversimplified model is “sane”.
- Glen Raphael says:
  
  May 31, 2015 at 11:08 pm
  
  Have you tried fitting cold exposure into your framework? Ray Cronise has been doing some pretty interesting stuff.
  - Richard says:
    
    June 1, 2015 at 1:11 am
    
    I’m not sure it’s needed for a first order approximation. The link you provided talks mostly about a variant of the paleo diet, so falls squarely in the ‘diet composition’ bit which is relevant, but does not invalidate any of the basics.
    
    Paleo & friends are interesting, but my main contention with it is:
    * We had no problems with obesity only a few decades ago when people did physical labour.
    * We have not changed our diets but have changed our physical activity and have gained an obesity problem
    * Looking at what has changed recently might be at least as profitable as looking at what changed so long ago that it allows for evolutionary changes.
    
    Thus, paleo and variants are interesting and quite possibly correct, but do not make up for the disappearance of a 12-hour workday in the fields.
    
    Clarification: The cold exposure link talks about how seasonal variation in diet and cold exposure + sleep cycle can directly affect your metabolic rate(s) and how this is probably good for you.
    I am fully prepared to believe this and in fact do eat less greens and lower my bedroom temperature in winter.
    I am not at present prepared to believe that cold exposure has a significant effect compared to that of starvation, because starving has a massive effect on your metabolic rate.
    - nydwracu says:
      
      June 1, 2015 at 5:15 am
      
      There were desk workers a hundred years ago. Was obesity a problem for them?
      - Richard says:
        
        June 1, 2015 at 6:32 am
        
        Not sure they would say it was a problem as such, since being fat was a rather strong status symbol (at least among men and until WWII). Also, the link to diabetes and heart problems was largely unknown.
        
        Were they often fat? yes.
      - Anthony says:
        
        June 1, 2015 at 6:05 pm
        
        Before the invention of telephones and elevators, lower-level desk workers would often have to deliver messages, papers, etc., from one office to another, which often involved climbing stairs.
    - Glen Raphael says:
      
      June 1, 2015 at 11:05 am
      
      Ray thinks what changed recently is “winter went away”. If we define winter as a season during which you are cold /and/ calories are scarce, mammals are adapted to surviving that period by burning fat they accumulate the rest of the year. There is some evidence that persistent exposure to cold triggers preferential burning of stored calories. Squirrels do it and bears do it, so why not us? Since the 1970s we’ve gotten REALLY GOOD at controlling the temperature – most houses are now well-insulated and central heating is cheap and warm clothing is cheap so most people basically never get cold in the winter. This might explain why we AND OUR INDOOR PETS all became obese at about the same time.
      
      Ray found that frequent cold exposure “supercharged” his weight-loss program – he lost more weight faster and easier. Here’s his TED Talk: https://www.youtube.com/watch?v=UrQ_ldCwKUQ
      
      What I got from Ray’s talk was not so much “you should use my diet” but rather “whatever diet you do, you should add cold exposure to it”.
      
      (I heard about this when Penn Jillette mentioned on his podcast that he lost 76 pounds in 90 days on Ray’s program. I tried applying some of the same principles to my own efforts WITHOUT doing Ray’s particular brand of weird diet and it worked pretty well for me too.)
      - Deiseach says:
        
        June 1, 2015 at 4:43 pm
        
        I remain unconvinced there is any one magic solution to weight loss; re: cold exposure, I’ve lived in houses where I was blue-with-the-cold, frost on the INSIDE of the window cold, and I never burned off a significant amount of fat.
      - Nornagest says:
        
        June 1, 2015 at 4:54 pm
        
        It may help to think of this not as a magic solution to weight loss, but as a possible answer to the question of “why is the population so much fatter now than it was a hundred years ago?”
        
        There were fat people a hundred years ago, too, so relatively few answers to that question are going to lead to weight loss for all the ones we have now. But it might well be helpful for some, though there’s still the chance that the cure might be worse than the disease.
        
        (That said, I’m a little skeptical of the winter answer; we’re basically tropical animals, so our metabolisms wouldn’t have had long to evolve this, and empirically we don’t see huge obesity rates in the tropics. [Though many tropical countries are getting fatter, like most places are.])
    - Nancy Lebovitz says:
      
      June 1, 2015 at 8:25 pm
      
      There were a lot of people not doing hard physical labor in the 60s and the 70s. Admittedly, it was necessary to walk to the television to change channels.
      
      Have a weird theory– there’s some evidence that fluoridated water lowers thyroid function.
      - William O. B'Livion says:
        
        June 2, 2015 at 1:03 am
        
        https://www.youtube.com/watch?v=Qr2bSL5VQgM
    - RCF says:
      
      June 1, 2015 at 9:58 pm
      
      We have not changed our diets? Seriously?
    - William O. B'Livion says:
      
      June 2, 2015 at 1:26 am
      
      * We had no problems with obesity only a few decades ago when people did physical labour.
      
      You want to see pictures of my father’s father who worked most of his life as a chef’s helper and died in his early 60s of heart disease?
      
      Like my father.
      
      In other words, while obesity was not as prevalent *then* as it is *now*, it was not unknown and was a problem.
      
      * We have not changed our diets but have changed our physical activity and have gained an obesity problem
      
      We have changed *almost everything* about our diets at least once in the last 40 years.
      
      In the 1920s most milk was unpasteurized. In 1987 federal regulations required milk transported interstate to be pasteurized, and these days you can get jailed for selling unpasteurized milk (I take no position on the drinking or use of unpasteurized milk, but it’s your body, your choice).
      
      However just drinking unpasteurized milk (or milk products) does two things, one it changes your gut biome with good flora, and two it changes your gut biome with bad flora that makes you sick.
      
      Also we supplement vitamin D heavily in milk.
      
      In the 1940s **ALL** bread used long rise yeast. Today most commercially available grain products that are leavened use short rise yeasts. Long rise yeasts break down phytic acid, which prevents uptake of certain minerals and vitamins. Short rise yeasts do not. Also folic acid is added to breads.
      
      Prior to WWII no one used Margarine if they could afford real butter. Today 1/2 the chiller space is taken up with fake butter products.
      
      We raise meat animals much differently today than 40 years ago, egg layers are handled and fed differently, and vegetable production is different.
      
      I have no idea whether this is good or bad (in general. Some specifics clearly are one or the other) but our diets *have* changed significantly.
      
      * Looking at what has changed recently might be at least as profitable as looking at what changed so long ago that it allows for evolutionary changes.
      
      We sit more, and our sitting activities are more sedentary. The brain consumes a LOT of calories. It would be interesting to see the difference between someone watching a football game, listening to a radio mystery theater presentation, listening to a radio mystery theater presentation while crocheting, someone reading that same mystery in a book, and someone reading a physics book.
      
      Today both our heating *and* cooling systems are more efficient, meaning our bodies have less to adjust for.
      
      Heck, my two wimmen folk won’t even put on jackets in WINTER if we’re going to be in the car. In some cases they’ve left the fucking house without them, and we live in the damn MOUNTAINS where weather can change quickly. Well, really close to the mountains anyway (as in we can see 14k peaks from our house).
      - Douglas Knight says:
        
        June 2, 2015 at 2:09 am
        
        Prior to WWII no one used Margarine if they could afford real butter.
        
        Prior to WWII margarine was made from whales.
- sov says:
  
  June 1, 2015 at 5:19 pm
  
  During periods of restricted calorie intake, the number of Calories burned goes down significantly, and as low as ~700/day is not uncommon.
  
  Hmm, [citation needed]. The research I’ve read hallmarks fasting with an increased RMR. As here.
  
  As an aside: you’re also on the hook to indicate that being heavier is more dangerous than having a lower RMR (which, as far as I know, has never been researched directly!). Fruit fly studies that knock out certain genes that increase metabolism indicate the fruit flies live much longer at a lower metabolism. If you believe the wear-and-tear theory of aging, one certainly might welcome a lower metabolism, assuming even that such a thing can be accomplished with diet alone.
- William O. B'Livion says:
  
  June 2, 2015 at 12:58 am
  
  The resting metabolism does not return all the way to normal when Calorie intake increases, but stays at a slightly reduced level.
  
  I’ve read in more than one place that this *isn’t* true, unless you lose significant muscle mass along the way, and that as you replace that muscle mass you will pick back up.
  
  The one caveat to this is that fat cells place a small but measurable caloric load and once they are gone that load goes too. So if it takes (about) 2 calories per day to sustain a pound of fat, and you lose 50 pounds of fat, that’s about 1/3 a snickers bar a day 🙂
  
  Oh, and it takes roughly 6 calories a day to sustain a pound of muscle. This is why people who starve out in concentration/prison camps seem to lower their metabolism so much–they’ve lost muscle mass *and* fat mass, and this significantly lowers their BMR.
EvolutionistX says:

May 31, 2015 at 1:59 am

Just a request for clarification–Are we talking about 100% pure cacao with no sugar, or some adulterated blend? Because if it’s the pure cacao, I wouldn’t be surprised to see some kind of effects–it’s kind of like eating coffee. But the pure stuff is way too bitter for the vast majority of people.
- Douglas Knight says:
  
  May 31, 2015 at 2:57 am
  
  The meta-analyses cover a hundred studies, with all kinds of different definitions of chocolate. Some look at dose-response curves, measuring dose in mg of flavonoids, ie, taking into account adulteration.
  - HeelBearCub says:
    
    May 31, 2015 at 8:50 am
    
    But can’t we also say that:
    more flavonoids -> more effect
    more simple carbohydrates -> more effects from simple carbohydrates
    
    That’s a genuine question, not a statement of knowledge. It seems truthy, though.
- William O. B'Livion says:
  
  June 2, 2015 at 12:46 am
  
  I have long been tempted to mix coffee grounds with peanut butter and smear the result between two slices of Chocolate Cake, and wash it down with cappuccino.
JME says:

May 31, 2015 at 12:29 am

Is there an advantage to using Bonferroni rather than Holm correction? (Assuming you’re doing serious research into which you’re putting hours of work anyway, and not just trying to mentally calculate a rule-of-thumb significant p value in lieu of someone who is using p values totally uncorrected for familywise error, in which case Bonferroni could be better because you can just take five seconds to think “30 effects? P < 0.05? It should be about P < 0.0017." or something.)
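
To make the comparison in this question concrete, here is a minimal sketch of both corrections in plain Python. The function names and the five p-values are made up for illustration and do not come from any study. It shows the usual textbook result: Holm rejects everything Bonferroni rejects and sometimes more, while Bonferroni's single threshold (alpha divided by the number of tests) is the part you can do in your head.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (one fixed threshold)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Hypothetical p-values for five outcomes (illustrative only).
example = [0.001, 0.011, 0.02, 0.04, 0.30]

# The "30 effects, P < 0.05" rule of thumb from the comment above.
bonferroni_threshold_30 = 0.05 / 30

if __name__ == "__main__":
    print("Bonferroni:", bonferroni_reject(example))
    print("Holm:      ", holm_reject(example))
    print("Threshold for 30 tests:", round(bonferroni_threshold_30, 4))
```

With these made-up numbers Bonferroni keeps only the smallest p-value, while Holm also keeps the second one, because its threshold loosens from 0.05/5 to 0.05/4 after the first rejection.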
NZ says:

May 30, 2015 at 11:28 pm

I was gonna say, seems like a great argument against journalism in general.

Or more specifically, against the sacredness we ascribe to journalism, calling it “the fourth estate” and so on. Taking seriously the way journalists sit in front of images of the earth spinning around, talking in that strange cantillation that’s supposed to make them seem bored and disinterested, putting their mastheads in pretentious Gothic fonts and giving them ludicrously self-important names like “Sentinel” and “Chronicle” and “Guardian” and “Sun Times”.

For a while now I’ve been saying that since the only peculiar talent of the journalist is the willingness to go talk to strangers and write down what the strangers say, journalism would be more appropriately treated as an industrial product like raw millet or something, whose prime consumer ought not to be the general public but expert writers/bloggers who can analyze what was given to them and who don’t pretend not to have any biases of their own.
- Besserwisser says:
  
  May 31, 2015 at 10:37 am
  
  “Bild” doesn’t really fit in your naming scheme, meaning “picture” after all.
- Michael vassar says:
  
  May 31, 2015 at 10:38 pm
  
  Honestly, I’m conflicted WRT Journalism. My impression of it is much more favourable than my impression of almost every other institution I have become familiar with, e.g. Politics, law, medicine, education, charity, most of academia, most of science, etc, but I still think poorly enough of it to overwhelmingly avoid it in favor of blogs and books.
  - Tarrou says:
    
    May 31, 2015 at 11:39 pm
    
    See, and I loathe media personnel uniquely among all professions. They are to a man useless, hysterical fame-grubbers. I’d sooner sit to dinner with the Ayatollah Khomeini than a journalist. Even lawyers seem staid and ethical next to journalists. Venture capitalists look like the Pope. The Pope looks like someone who isn’t running the world’s largest child sex ring….so on and so forth.
  - nydwracu says:
    
    June 1, 2015 at 5:11 am
    
    How familiar are you with journalism?
    
    Having once tried to run a vaguely journalistic site, the pressure to latch onto the first thing you hear about and bang out any old shit about it because deadlines is surprisingly large. And I’d occasionally get a request from another site to take a conclusion, write the argument for it, and send it off to them. I suspect this is a common practice.
    - Nancy Lebovitz says:
      
      June 1, 2015 at 8:20 pm
      
      Additionally, the need to fill the same amount of space every day or week or whatever (or is this as much space as possible for online journalism?) means that there’s no way to say “nothing important has happened lately– we’ll let you know when we’ve got something we think is worth your attention”.
      
      On the other hand, societies with low-quality journalism that’s not controlled by a government do seem to be better places to live than places with government-controlled journalism.
- Tracy W says:
  
  June 1, 2015 at 6:17 am
  
  The issue is that the main people who write about journalism are journalists, thus of course they are going to tend to ascribe sacredness to journalism.
  - NZ says:
    
    June 1, 2015 at 9:13 am
    
    If that were true it would be an understandable and containable problem. All professions have an interest in broadcasting their own importance.
    
    However, the average person ascribes plenty of false sacredness to journalism as well. They buy, hook line and sinker, the whole pomp and circumstance that’s constructed around journalistic presentation (after all, that’s what it’s designed for). The title “journalist” carries the imagined weight of someone who’s in the business of getting at the truth and, out of the goodness of his heart perhaps, disseminating it for the masses as a kind of public service. People uncritically cite news articles as evidence of real narratives happening in the world. People use the stilted and absurd output of journalists as the standard by which to measure the relative importance of real events.
    
    If it were any other way, no news organization could form, much less stay in business, because it wouldn’t be possible even for semi-retarded dupes to take any of it seriously.
- William O. B'Livion says:
  
  June 2, 2015 at 12:43 am
  
  against the sacredness we ascribe to journalism
  
  What “we” white man?
  
  Journalists, or at least the newspapers they grew from have always been festering piles of craptastical toxic nonsense.
  
  I hold that the *government* can’t censor a journalist, and that we *need* them to be honest, or failing that to be dishonest in very diverse ways (so as to be able to triangulate their lies), but sacred?
  
  Those drunken egotistical whoresons?
  - NZ says:
    
    June 2, 2015 at 1:02 pm
    
    I dunno, the general “we”? I obviously don’t ascribe sacredness to journalism. Should I have said “y’all”?
    
    Anyway, you “need” them to be honest. Okay. Does that mean you expect them to be honest? That you hope they’re honest? That you think honesty is somehow ontologically related to what journalists do for a living?
    
    A diversity of dishonesty is exactly what I was calling for when I said that journalists should turn over their output not to the public but to writers and bloggers who are expert in just a few areas, who both will be able to scrutinize what the journalists give them and will not try to feign disinterested objectivity.
Alex says:

May 30, 2015 at 11:18 pm

This is more useful than the last post. I have at least learned that chocolate may really be good for me. My crazy aunt wasn’t wrong after all.
- zz says:
  
  May 31, 2015 at 5:12 am
  
  A few Christmases ago, I told my then-7ish-year-old daughter-of-a-cousin that chocolate came from a plant, was therefore a vegetable, was therefore healthy. Daughter-of-a-cousin excitedly tells her parents, both of whom research biology at Cornell, this. They were not amused.
  - Bryan Hann says:
    
    May 31, 2015 at 11:16 am
    
    I do hope you were! 🙂
  - Tracy W says:
    
    June 1, 2015 at 6:07 am
    
    Wine is just concentrated grapes.
    - Mary says:
      
      June 1, 2015 at 1:33 pm
      
      Containing chemicals that combat disease!
    - Doug S. says:
      
      June 4, 2015 at 3:26 am
      
      Wine is spoiled grape juice.
  - RCF says:
    
    June 1, 2015 at 9:40 pm
    
    Even putting aside the fact that “vegetable” is generally taken to mean something more than “plant”, why would you say that everything from a plant is healthy? Ricin is from a plant. (And apparently not a word, according to Chrome’s spellcheck).
    - William O. B'Livion says:
      
      June 2, 2015 at 12:33 am
      
      I don’t think it’s Chrome, I think it’s your underlying OS or spell check library. Whatever the default is on MacOS 10.10.3 doesn’t flag ricin as a misspelling.
- Deiseach says:
  
  May 31, 2015 at 4:20 pm
  
  At this stage, with all the screaming headlines that change from day to day about what is good for you/will kill you instantly, I have come to the conclusion that:
  
  (1) The only permissible diet is rainwater and moss. Everything else will kill you stone dead, even if last week you were told it was the only possible healthy option.
  
  (2) Our grannies had the right idea: “A little of what you fancy does you good” (e.g. chocolate) but also “You can have too much of a good thing” (don’t sit down and eat a whole boxful of chocolates in one go).
  
  Bohannon is right about the media turning pop-science results into simplified headlines that get blared about and copied by one outlet from another, but doing it by setting up a study that makes it sound – at least to this ignorant layperson – as if it could equally conclude “All research is just as crappy” is not the way to do it.
  - Tarrou says:
    
    May 31, 2015 at 11:36 pm
    
    Amazing how a hundred years of intensive research, billions of dollars and tens of thousands of careers spent get us to “Grandma wasn’t a tard after all, just don’t be a pig”!
    - Robbbbbb says:
      
      June 1, 2015 at 2:32 pm
      
      And thus you reach the essential core of conservatism: Received wisdom is there for a reason. Ignore it at your peril.
      - William O. B'Livion says:
        
        June 2, 2015 at 12:35 am
        
        And from that the neo-cons: Using statistics and research to show that your grandma was right all along, and using the government to make you do what she suggested.
- Tracy W says:
  
  June 1, 2015 at 6:06 am
  
  Yes, I agree. It reminded me that I had uneaten chocolate and raspberry in my handbag. It was good. More food posts please Scott.
  - William O. B'Livion says:
    
    June 2, 2015 at 12:39 am
    
    A friend of mine is mortally obese and is currently on a medically supervised 800 calorie a day diet (so far he’s lost 106 pounds and is still in the danger zone, and is somewhere between 1/2 way and 2/3rd to his target). He was complaining last night that he can’t even watch television, because the food ads hurt too much.
    
    Food is his only vice, he has never *tasted* alcohol, doesn’t smoke or use recreational pharmaceuticals etc.
    
    It’s almost enough to make me start smoking again just to be on the safe side. OTOH, I drink a bit, so I guess I’m safe there 🙂
Irrelevant says:

May 30, 2015 at 10:51 pm

Conclusion 4: P-Values Are Stupid And We Need To Get Rid Of Them

Wouldn’t getting rid of p-values prevent “p-hacking”? Maybe, but it would just give way to “d-hacking”.

“P-values are stupid and we need to get rid of them” isn’t usually meant as “…in favor of a slightly different statistical standard”, it’s meant as “…in favor of Replication Or GTFO.”
- David Friedman says:
  
  May 31, 2015 at 3:24 am
  
  Getting rid of p-values might be going too far. But the “experiment” does demonstrate a simple version of one of the problems with a p-value. Unless the experimenter has specified his model in advance, he can use a p-value to make the result look much better than it is by checking a lot of different models and reporting the one that looks best.
  - Irrelevant says:
    
    May 31, 2015 at 3:41 am
    
    Right, I (and you, Scott, the average blog reader here, and Scott’s hypothetical complainant) totally understand that p-value 0.05 is rendered meaningless when your “1-p”-value is (0.95)^18. And Scott even ends that section in the right place with his “STOP TRYING TO BASE CONCLUSIONS ON ONE STUDY” link, albeit after burying the lede for a couple thousand words.
    
    Where I take issue here is I think he’s misrepresenting the sentiments of his hypothetical complainant. People don’t zero in on p-value-based validation because they think p-values are a uniquely worthless method of statistically verifying truth, they zero in on p-values because they think they are a representatively worthless method of statistically verifying truth.
  - Adam says:
    
    May 31, 2015 at 9:22 am
    
    This is why Scott mentioned using the family-wise error rate (Bonferroni correction). It prevents you from surveying 100 meaningless subgroups to find 5 with p < 0.05.
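
The arithmetic behind the (0.95)^18 remark and the "100 subgroups, 5 hits" remark in this thread can be sketched in a few lines. This is an illustration under a simplifying assumption: every test is treated as independent and every null hypothesis as true, which will not hold exactly for measurements taken on the same subjects. The function names are invented for this example.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests
    when every null hypothesis is true (assumed for illustration)."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of tests that cross the threshold by luck alone."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print(f"18 tests: {familywise_error_rate(18):.2f} chance of a spurious hit")
    print(f"100 subgroups: {expected_false_positives(100):.0f} expected by chance")
    print(f"Bonferroni threshold for 100 subgroups: {bonferroni_threshold(100)}")
```

Under these assumptions, 18 uncorrected tests give roughly a 60% chance of at least one "significant" result, which is why a single p < 0.05 out of 18 says very little.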
Steve Johnson says:

May 30, 2015 at 10:45 pm

Do you know why you can trust Wikipedia better than news sites? Because Wikipedia doesn’t obsess over the single most recent study. Are you starting to notice a theme?

For me, the takeaway from this affair is that there is no one-size-fits-all solution to make statistics impossible to hack. …
At the end of the day, you have to actually know what you’re doing. Also, try to read more than one study.

Agree 100% that there’s no substitute for human judgment.

However, the problem isn’t so simply solved by looking at more than one study. If every individual study in an area can be fraudulent through a combination of automated judgment based on pre-announced rules and a strong desire of all the scientists in the field to produce certain results and to not report on results that don’t fit the narrative then aggregating a bunch of studies doesn’t actually help. Speaking probabilistically you don’t decrease error by adding more cherry-picked results to a data set.
- Adam says:
  
  May 30, 2015 at 11:18 pm
  
  I think you can be reasonably assured, at least in most problem domains, that there is a lesser chance that 70 studies are all reporting something (intentionally or unintentionally) fraudulent than there is that 1 study is doing so.
  - Steve Johnson says:
    
    May 31, 2015 at 1:20 am
    
    I am not assured at all.
    
    If fraud is done by an individual for careerist motives he had a specific result that he expected would help his career. This is pretty bad for a field – it means that there’s no checking going on or that there’s no checking of results when the results come out in a certain way. That will inexorably push the field into total error.
    
    If the field is totally indifferent to fraud because the results are the approved results then the whole field is completely invalid and as the saying goes “not even wrong”. Cliques form and push each other’s publications as they take over funding gatekeeper roles. The whole field can agree on a consensus that no outsider can replicate. Anyone who might stray doesn’t get published. Hey, the article failed on peer review, right?
    - Adam says:
      
      May 31, 2015 at 9:19 am
      
      Unless the probability of fraud for each researcher is 1, math assures you.
      - Steve Johnson says:
        
        May 31, 2015 at 11:57 am
        
        The probabilities of papers in a field being fraudulent are correlated.
        
        The information content of a fraudulent paper is zero.
        
        Much like the point of the original post there’s no statistical method that can be used to extract information in this situation – literally the only piece of information that’s relevant is what probability you assign to papers in a field being fraudulent.
      - RCF says:
        
        June 1, 2015 at 9:35 pm
        
        In the broadest sense of “evidence”, it is fallacious to assert that a fraud provides none. How someone lies can be quite informative.
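
The disagreement in this thread about whether "math assures you" can be written down explicitly. The notation is introduced here for illustration and is not used by either commenter: let $F_i$ be the event that paper $i$ is fraudulent, with $P(F_i) = p$ for each of $n$ papers.

```latex
% Independent papers: the chance that all n are fraudulent shrinks geometrically.
P\Big(\bigcap_{i=1}^{n} F_i\Big) = p^{\,n} \qquad \text{(independence assumed)}

% No independence assumption: only a weak bound survives.
P\Big(\bigcap_{i=1}^{n} F_i\Big) \;\le\; \min_i P(F_i) = p,
\qquad \text{with equality when the } F_i \text{ are perfectly correlated.}
```

So the claim that 70 studies are less likely to all be fraudulent than 1 study holds strictly only to the extent the papers fail independently. If fraud is strongly correlated across a field, the probability for 70 papers can sit close to the probability for one.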
Steve Johnson says:

May 30, 2015 at 10:37 pm

He does point out a pretty huge problem – you can fake results quite easily that fit through the filters that are set up to prevent fake results from getting published. Why? Because the filters are pre-known.

It’s actually ironic that he cheats in proving that you can cheat in that he takes a real effect, subjects it to cheating methods and finds that the results are good enough to publish using cheating methods. Logically he actually told us exactly nothing – actual effects measured with cheating methodology will show up as actual effects – the distinction is that cheating methods are supposed to show something with no effect as significant.
- Deiseach says:
  
  May 31, 2015 at 4:15 pm
  
  Well, he seems to have started out with the aim “To prove any old crap will get published” and succeeded, so I don’t see how that counts as cheating 🙂
  
  It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science.
  
  Seriously, though – it does make it sound as if you can set up a study and run it by the approved methods and the results will be a pile of horse manure but you can still get them published because you produced your rose fertiliser by the approved methodology. And that this doesn’t just affect diet research, but could possibly apply to all research?
  
  I don’t know; there’s something dishonest in this, and I don’t mean the “Suckers! You fell for my prank!” That part about his collaborator who very definitely has an axe to grind about the “whole food fanatics” and “dietary pseudoscience” sounds like the whole project may have been skewed from the start, so that even “The purpose of this is to demonstrate you can get publicity for any old nonsense so long as you slap a scientific title on it” may not have been what was actually going on (if Dr Frank, for instance, was running it to show that the whole food people were all crazies and cranks).
  - thepenforests says:
    
    May 31, 2015 at 6:14 pm
    
    Well…he got it “published”. There are good journals and then there are crappy journals, and it’s worth distinguishing between the two. But there are also money-making scam “journals” which will publish practically anything and shouldn’t be considered scientific in any sense of the word. It looks like this was the third type of journal.
  - Tarrou says:
    
    May 31, 2015 at 11:26 pm
    
    Dodgy results have a better chance of getting into better journals in almost perfect correlation to how well they flatter the political prejudices of the dominant group represented by the journal.
AR+ says:

May 30, 2015 at 10:30 pm

Math nitpick: 10 kilodeaths is about 600 microHitlers, assuming you’re using the 17 megadeaths that can broadly be called “Nazi mass murder victims,” counting deliberate Russian starvation victims but not combat deaths.
- Desertopa says:
  
  May 31, 2015 at 3:47 pm
  
  On the other hand, the 10,000 deaths are per year, whereas Hitler’s 17,000,000 were racked up over more than a decade, albeit not at a flat rate per year.
- Error says:
  
  May 31, 2015 at 10:42 pm
  
  I’m confused. Assuming you’re talking about what Wikipedia refers to as the Holodomor, how do those get laid at the feet of the Nazis for purposes of Hitler calculation?
  - Nornagest says:
    
    May 31, 2015 at 11:17 pm
    
    AR+ is probably talking about the Hunger Plan.
Lightman says:

May 30, 2015 at 10:23 pm

This is the obligatory “less blogging, huh?” post.
- Scott Alexander says:
  
  May 30, 2015 at 10:31 pm
  
  I’ve made 11 posts in the month of May. Compare to 16 in January and another 16 in February. So my output has declined about 30%.
  - anon says:
    
    May 31, 2015 at 2:54 am
    
    STOP BLOGGING
    - Bugmaster says:
      
      May 31, 2015 at 7:18 am
      
      DON’T LISTEN TO THAT GUY
      
      🙂
      - Citizensearth says:
        
        May 31, 2015 at 8:13 am
        
        Listen to this guy
  - Anthony says:
    
    June 1, 2015 at 4:40 pm
    
    Someone on a previous post (the links one?) noted that you promised “less blogging”, not “fewer blogging”. As your output has become less Moldbuggian in length, I think you’ve made good on your promise.
R. Donald James Gauvreau says:

May 30, 2015 at 10:22 pm

Dang it, I feel like I may as well just stop reading the news altogether. If it’s interesting or important, it’ll pop up here at some point.

“Did you hear about the US invasions of Iran, Russia, China, and Quebec?”

“Scott Alexander hasn’t mentioned it. Must not be important.”
- zz says:
  
  May 31, 2015 at 5:03 am
  
  I’ve stopped reading the news. Life satisfaction immediately increases.
  
  Mark Manson gave up politics and sports. Stress is reduced, time is saved, and attention is better allocated.
  
  Guess what happened when this guy gave up on the news.
  
  As discussed in the previous links thread, you should probably expect most of your news articles to deceive you to some degree.
  
  I am unaware of a single case of going from non-newsreading to newsreading that improved anyone’s life.
  
  JOIN OUR ARMY THAT IGNORES CURRENT EVENTS, THEREBY BECOMING HAPPIER AND AT LEAST AS WELL-INFORMED AS THOSE PEOPLE WHO HAVE THEIR HEAD STUFFED FULL OF GARBAGE.
  - Neike Taika-Tessaro says:
    
    May 31, 2015 at 8:22 am
    
    I bow to your allcaps wisdom. ~~Where do I sign up?~~
    
    Anyway, I ignore the news, too, and essentially let other people filter it for me. It’s not a perfect remedy for constantly getting frustrated at journalism (especially political journalism) and it comes with its own unique problems, but I can confirm that my life definitely feels better without it.
    
    \o/ So I endorse this recruiting effort.
    
    (My one boyfriend genuinely seems to enjoy the news (and political talk shows), though, but I haven’t figured out how that works, yet. He certainly doesn’t trust those sources or respect the kindergarten-level bickering that tends to happen in talk shows, but he’s also not sitting there laughing. Mysterious, mysterious…)
    - Alexander says:
      
      June 9, 2015 at 6:29 pm
      
      One must note that being able to ignore the news is a privilege enjoyed by those living in the developed world. If I had ignored the news where I’ve been living over the past 3 years, I would definitely be much, much poorer, and also conceivably dead.
  - Error says:
    
    May 31, 2015 at 7:15 pm
    
    I’m in this camp. Keeping up with anything normally reported in the news has a net value comparable with that of, say, voting. If something genuinely important happens, somebody I know will mention it. Even that filter is often insufficient.
    
    I suppose it might be different if news reporting were still a brief nightly show in which irrelevancies wouldn’t fit. But the modern 24/7 firehose spouts mostly black water.
    - James Picone says:
      
      May 31, 2015 at 7:43 pm
      
      Iunno, voting can have some effects. Quota for senate elections in the state of Australia I live in was 148,348 votes. Contributing 1/150,000 towards one of 70 senators seems relevant to me, especially given that senate balance here is generally quite tight, to the point that one or two senators difference does matter.
      
      My first-preference party in the Senate got ~73,000 primary votes in 2013, the two major parties got ~280,000 and ~230,000, an independent got ~250,000. Seems important to me.
      
      (Put another way: first past the post suuuuuuucks)
  - Mai La Dreapta says:
    
    June 1, 2015 at 8:51 am
    
    Some time ago I realized that nearly 100% of news consists of things that (a) don’t affect me or (b) which I can’t do anything about. If it affected me directly, I wouldn’t need the news to tell me about it, and on the odd chance that I can do something useful about it, then I’ll hear about it from someone else.
    
    So, yeah, ignore the news. It’s worse than useless.
  - Edward Scizorhands says:
    
    June 1, 2015 at 10:33 am
    
    There’s a bunch of things that I think are valuable to do while growing up, but not valuable as an adult. Unfortunately doing them as a child means your parent did them.
    
    My dad spent a lot of time watching the stock market. I believe I learned a lot from this, since I have lots of basics of financial markets ingrained.
    
    But as an adult, chasing the stock market is pretty useless. Especially because my earnings are much bigger than my parents’, time spent improving my career will pay off more than increasing my returns from 4% to 5%.
    
    However, my kids have a much worse understanding of markets than I do. Maybe that’s okay, but it still feels like I’m missing out on teaching them something big.
    
    I’m not sure how to do that, because even if I watched FNN[1] at each market close, my kids wouldn’t pay attention, because there are enough other things to do.
    
    My dad also watched the news and politics a lot, and even assuming that a lot of it was misleading, it still led to a lot of discussion about the world. I’m having slightly more success talking about the daily news over the dinner table.
    
    [1] Referring to FNN shows how out-of-date I am.
