
The Wisdom of the Ancients

Were The Victorians Cleverer Than Us?, asks a new study by Woodley et al that has gotten name-dropped in places like The Daily Mail and The Huffington Post.

Meanwhile, Betteridge’s Law of Headlines continues to warn us that “Any headline which ends in a question mark can be answered by the word no.”

On first glance, the paper looks solid. It investigates simple reaction time, a measure which is known to be correlated with g, the mysterious general intelligence which is supposedly measured (to some degree) by IQ tests. People have been experimenting with simple reaction time for over a century now, so the paper asked the relatively simple question of whether it has changed over that century. They found that it had: it had gone up, signifying a decrease in general intelligence. Their explanation was dysgenics.

People have known for a long time that high-IQ people have fewer children than low-IQ people, so it might make sense genetically to believe that each generation becomes a little dumber. This pattern has stubbornly refused to appear: instead, every generation has had significantly higher IQ than the one before, an observation called the Flynn Effect. This has been attributed to various things, including better nutrition, child-rearing, and education.

What the authors of this paper do – and it’s pretty clever – is say that the Flynn Effect is an environmental increase in IQ which has hidden a simultaneous genetic decline in IQ. They try to prove it by saying environmental and genetic factors affect IQ in different ways, and that genetic factors are more likely to affect certain features like reaction time – a pattern which is called a Jensen Effect and which is on relatively solid ground. Because they find reaction time is declining, probably people are becoming genetically stupider and the only reason we can keep having a civilization at all is because our environment is getting better – which is too bad, since our environment may have stopped doing that.

All the theory here sort of checks out, except for the part where they say IQ changed 15 points in a hundred years, which is just a little bit faster than any responsible person expects evolution to progress. People critique the idea that Ashkenazi Jews could have shifted fifteen points in nine hundred years on the grounds that it’s too fast. So let’s take a closer look at their data.

Only two of their sixteen studies come from the Victorian Era: Galton 1889 (n = 3410) and Thompson 1895 (n = 49).

Francis Galton, a brilliant Victorian scientist who was a half-cousin of Darwin, is the source of 98.5% of our Victorian reaction time data – not to mention the concept of reaction time itself, several statistical tools including correlation and standard deviation, the use of the survey in data collecting, the term “eugenics”, the entire science of meteorology, hearing tests, the first study on the power of prayer (he prayed over random fields to see if the crops there grew higher; they didn’t), fingerprinting, the scientific investigation of synaesthesia, and a horrible warning about how not to do facial hair.

Galton’s Data A Century Later, published in 1985, tells us a little about how he gained his ground-breaking reaction time statistics. He set up a laboratory in the Science Galleries of the South Kensington Museum. There he charged visitors to the museum three pence ($25 in modern currency after adjusting for inflation) to be measured by his instruments, a process he advertised as “for the use of those who desire to be accurately measured in many ways, either to obtain timely warning of remediable faults in development, or to learn their powers.” Over the course of nine years, he attracted about nine thousand curious individuals, three thousand of whose data managed to make it into the current meta-analysis.

His colleague in Victorian reaction-time measurement was Helen Thompson Woolley, an American psychologist who published a 1903 dissertation titled The Mental Traits of Sex: An Experimental Investigation of the Normal Mind in Men and Women (it was, apparently, a simpler time). With an optimism bordering on the incredible, Wikipedia notes that “Before Woolley, research on sex differences was heavily influenced by conjecture and bias.”

Woolley writes of her sampling technique:

“In making a series of tests for comparative purposes, the first prerequisite is to obtain material that is really comparable. It has been shown that the simple sensory processes vary with age and with social condition. No one would question that this statement is true for the intellectual processes also. In order to make a trustworthy investigation of the variations due to sex alone, therefore, it is essential to secure as material for experimentation, individuals of both sexes who are near the same age, who have the same social status, and who have been subjected to like training and social surroundings. Probably the nearest approach among adults to the ideal requirement is afforded by the undergraduate students of a coeducational university. For most of them the obtaining of an education has been the one serious business of life.

The individuals who furnished the basis for the present study were students of the University of Chicago. They were all juniors, seniors, or students in the first year of their graduate work. The subjects were obtained by requesting members of the classes in introductory psychology and ethics to serve.”

She found (a finding replicated by all later studies and now considered essentially proven) that women have slower reaction times than men (interestingly, this difference does not correlate with IQ) – but more relevant to the current meta-analysis, she found the same generally fast reaction times as Galton.

The modern studies, keeping with the zeitgeist of the modern age, are much less colorful. I only looked into the two largest: one Scottish, the other Australian. Here’s what the Scottish study says of its methodology:

The study was originally located in the Central Clydeside Conurbation (Figure 3), a socially heterogeneous and predominantly urban region, including Glasgow City, which is known to have generally poor health. Two-stage stratified sampling was used to select subjects. For the regional sample, local government districts were stratified by unemployment and socio-economic group data from the 1981 Census and 52 postcode sectors were systematically selected from these with a probability proportionate to their population size. The same postcode sectors were chosen for all three cohorts. The sampling frame used for individuals was Strathclyde Regional Council’s 1986 Voluntary Population Survey – an enhanced electoral register that provides details of the age and sex of all household members. Individuals were selected from the 52 postcode sectors within each age cohort with a systematic selection with a prescribed sampling interval from a random start.

I was getting bored by the time I made it to the Australian study, but I managed to keep my attention on it long enough to note the following sentence:

Persons selected at random from the Electoral Roll [of Canberra] were sent a letter informing them about the survey and saying that an interviewer would contact them soon to see if they wanted to participate.

Look around you. Just look around you. Have you worked out what we’re looking for yet?

That’s right. The answer is selection bias.

Back in the Victorian Age, science was done by aristocrats and gentlemen who drew their subjects from their own social groups. There were no poor people in either study, because getting poor people to participate in an experiment would require finding some poor people, who probably smelled terrible and lived in areas where there were no good restaurants.

In the Modern Age, everyone is excruciatingly Socially Aware, and studies go out of their way to look at Disadvantaged Disempowered Disprivileged Populations so their results can serve as Cutting Social Commentary.

Galton’s study population was visitors to a science museum in the posh part of London who were willing to pay him $25 to participate. Thompson’s population was University of Chicago philosophy students. The two modern studies are random selections double-checked to make sure they don’t undersample the poorest sections of the population.

So, uh, congratulations, authors of this paper! You have successfully proven that the average member of the population is dumber than wealthy science dilettantes and students at elite colleges! Go pat yourself on the back!

In case we need more rigor: according to The National Center for Education Statistics, about 2.3% of Americans went to college in 1900. In a perfect meritocracy maybe only the smartest people would go to college, but we’re not a perfect meritocracy. Would it sound about fair to say that the people in college at the time were a sample of the 20% or so of the smartest Americans?

Because the IQ of someone at the 80th percentile is 113 – that is, exactly enough to explain the 14 point IQ “drop” that Woodley et al found.
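For anyone who wants to check the percentile arithmetic, here is a minimal sketch. It assumes IQ is normally distributed with a mean of 100 and a standard deviation of 15 (the usual convention, not something taken from the paper), and the function name is my own invention.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15), the conventional scaling.
IQ_DIST = NormalDist(mu=100, sigma=15)

def iq_at_percentile(p):
    """Return the IQ score at percentile p (0 < p < 1) under the assumed normal model."""
    return IQ_DIST.inv_cdf(p)

cutoff = iq_at_percentile(0.80)
print(round(cutoff, 1))  # 112.6, i.e. roughly 113
```

Note that this is the score at the cutoff itself; the average over everyone above the cutoff would be somewhat higher.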

This is a little harder to do with Galton’s science museum visitors. The 1985 commentary on Galton’s data tells us:

As would be expected of a group of paying testees being measured in a museum, a sizable portion of Galton’s sample consisted of professionals, semiprofessionals, and students. However, as may be discerned in Tables 10 and 11, all socioeconomic strata were represented.

Tables 10 and 11 turn out to be a gold mine – I worried the records of exactly who took the tests would be lost, but as you might expect of someone who basically invented statistics single-handedly and then beat Darwin in a debate about evolution as an encore, Galton was very good at keeping careful data.

This site tells me that about 3% of Victorians were “professionals” of one sort or another. But about 16% of Galton’s non-student visitors identified as that group. These students themselves (Galton calls them “students and scholars”, I don’t know what the distinction is) made up 44% of the sample – because the data was limited to those 16+, I believe these were mostly college students – aka once again the top few percent of society. Unskilled laborers, who made up 75% of Victorian society, made up less than four percent of Galton’s sample!
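To put rough numbers on how lopsided that is, here is a small sketch that divides each group's share of Galton's sample by its share of Victorian society. The shares are the ones quoted above; the 4% for unskilled laborers is an upper bound ("less than four percent"), the 16% for professionals is a share of non-student visitors rather than of the whole sample, and the function name is hypothetical.

```python
# (share of Victorian population, share of Galton's sample), as quoted above.
# Assumption: 0.04 is used as an upper bound for unskilled laborers.
groups = {
    "professionals": (0.03, 0.16),
    "unskilled laborers": (0.75, 0.04),
}

def representation_ratio(population_share, sample_share):
    """How many times over (or under) a group appears in the sample relative to the population."""
    return sample_share / population_share

for name, (population_share, sample_share) in groups.items():
    ratio = representation_ratio(population_share, sample_share)
    print(f"{name}: {ratio:.2f}x")
# professionals: 5.33x
# unskilled laborers: 0.05x
```

On these figures professionals show up at about five times their population share, and unskilled laborers at about a twentieth of theirs.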

So this discredits this meta-analysis way beyond any need for further discrediting, but since I can’t help beating a dead horse…

Let’s talk about race. We know that studies find white people usually have faster reaction times than black people – in fact, a lot of the voluminous and labyrinthine research on race and IQ hinges on this fact. We thankfully do not have to enter the minefield of trying to figure out the causes of this discrepancy (biological vs. environmental vs. social) – we can just take it as a brute fact.

What percent of Galton’s 1889 science museum visitors do you think were non-white? What percent of Thompson’s 1895 University of Chicago students? Approximately zero? Sad to say, non-white people were as likely to be exhibits in the science museums of the day as visitors, and according to no less a figure than W.E.B. DuBois in 1900 there were only 2600 living black Americans who had graduated college.

I looked up some stats on the sample areas for the modern studies – 6% of Glasgow is non-white, and about 12% of Canberra. So aside from selection bias affecting intelligence which affects reaction time, we have selection bias affecting race which affects reaction time.

May I just say how annoyed I am that I have to remind reactionary eugenicist IQ researchers, of all people, to pay attention to race? YOU HAD ONE JOB!

Finally, there are significant IQ differences within populations of the same race and country simply due to migration effects. An analysis of IQs across Great Britain finds that the highest scores are in London (102) and the lowest in Scotland (97). Almost all this meta-analysis’ Victorian data came from London (Galton’s museum in Kensington) and the largest source of modern data (making up about half of the whole, and being unusually high in reaction time) came from Scotland (and Glasgow isn’t even the nice part of Scotland). The 5 point London – Scotland difference explains over a third of the “difference between Victorians and moderns” found in this study.
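The "over a third" figure is just a ratio, sketched below. The regional scores are the ones quoted above; the reported drop is tried at both 14 and 15 points, since both figures appear in this post, and the function name is my own.

```python
LONDON_IQ = 102
SCOTLAND_IQ = 97

def share_explained(regional_gap, reported_drop):
    """Fraction of the reported Victorian-to-modern drop matched by the regional gap."""
    return regional_gap / reported_drop

gap = LONDON_IQ - SCOTLAND_IQ
for drop in (14, 15):
    print(f"drop of {drop}: {share_explained(gap, drop):.0%}")
# drop of 14: 36%
# drop of 15: 33%
```

This treats the regional gap as carrying over one-for-one into the comparison, which is a simplification: only about half the modern data came from Scotland.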


There is some really excellent IQ research out there that everyone should be reading, but this is not it. Please please please don’t cite this study as evidence for dysgenics or the decline of civilization.


77 Responses to The Wisdom of the Ancients

  1. Multiheaded says:

    Beautiful. Been thinking of making some kind of general attack on the reactionaries’ wilful ignorance w.r.t. enormous classist and racist selection bias inherent in EVERY account of the past (not just formal studies, of course) – but such blow-by-blow detailed takedowns are what we need, not my ramblings.

    • Konkvistador says:

      Wrong. This study was greeted mostly sceptically by reactionaries. See HBD chick.

      • Multiheaded says:

        I’m talking about all that general noise re: the past being full of brilliant and learned Ubermenschen. Because you people certainly make a lot of it, lol.

      • suntzuanime says:

        Yeah, I laughed at the “you had one job” line, but in fact HBD chick knows her job and does it with grim, fanatical determination.

    • Scharlach says:

      Greg Cochran, JayMan, and hbdchick have all expressed skepticism about this study for the same reason discussed here. Do you honestly think the HBD crowd doesn’t understand this very basic sampling issue? And anyway, as anyone working in academia knows, research goes forward on incomplete or faulty research. Stop pointing and sputtering.

  2. von Kalifornen says:

    I am invoking my power as Dread Lord to forbid anyone from talking about Richwine.

  3. Deiseach says:

    I would also assume that the tests, the equipment used, the measure of time, etc. are more sophisticated than in Galton’s day. I’d be more inclined to follow this line if they gave a random sampling of modern people the exact test Galton used with the same measuring equipment, then see if the results match up.

    If modern people are indeed stupider, then surely the same test on the same equipment will bear that out. If, on the other hand, you gave the same set of people (a) test on Victorian-standard (b) test on modern standard and got two different results, maybe it’s an artefact of the technology?

    Has anyone tried this, rather than “I’ll make a list of all the studies, take an average of the averages they give, then say this demonstrates we’re all thick”?

  4. Deiseach says:

    “Galton calls them “students and scholars”, I don’t know what the distinction is”

    I believe in Victorian times, “scholar” meant simply someone attending school – what we would nowadays call a “school pupil”, that is, still either in primary or even secondary education, but not a college student.

    Again, given that the legal minimum age for leaving school (and going into employment) was eleven in 1893, obviously Galton’s sample was – as you point out – from higher socio-economic classes. If you could afford to keep your twelve year old going to school instead of earning a wage to contribute to the family finances, you’re one of the better-off (even by a small margin).

    • Deiseach says:

      And with regard to your 1900 American college students being amongst the most intelligent 20% of the population, that may or may not be so, but at least in the United Kingdom, there was the venerable tradition of the Gentleman’s Third – that is, the Right Honorable John Smith or Lord Thomas Brown went up to Oxbridge for four years of rowing or cricket, partying, mixing with his peers (quite literally in some cases, going by the courtesy titles) until he was legally of age and went home to the family estate to dabble in politics as the local Member of Parliament or some harmless profession until it was time for him to inherit 🙂

    • Ron Williams says:

      Looking up Wikipedia (I know… 🙂 ) there’s a hint that ‘scholar’ referred (refers?) to what we would know as a ‘professional academic’ – someone who knows how to gather and present research, not necessarily in the sciences, but with rigor, while a student is merely someone in active study.

      Hence the term ‘scholarly’ to refer to a well written monograph, say, with correct and complete references to primary sources and with conclusions supported by logic (unlike the paper being discussed above :/)

  5. I would think that driving has been selecting for at least one of fast reflexes and prudence, though perhaps for not enough time to make much difference.

    • Ronak M Soni says:

      Um, ~.01% of people die from car accidents yearly. I honestly doubt that’s a selection pressure.

      • gwern says:

        0.01% doesn’t sound like a lot to you? Over a lifetime of ~75 years (random life expectancy), that’s a rough risk of 1 – 0.9999^75 = 0.8% ~= 1% risk of dying in a car accident. Not trivial if it correlates with anything significantly.

        Of course, the major problem is that things like video games ought to be swamping any such selection pressure (in the short term).

        • Is anything known about whether video games improve people’s reflexes in general?

          • gwern says:

            ‘In general’? Dunno what that would mean for reflexes (would something like the gag reflex count? or the knee-kicking reflex?).

        • I was thinking about reflex speed for learnable skills.

        • Ronak M Soni says:

          There have been around four or five generations since cars became common no? I was thinking of that when I said .01% wasn’t enough to add up to a selection pressure.

          Another thing that springs to mind now is that because of medical facilities any pressure is not quite as well-correlated with being bad at driving as it should be, even given enough time.

          (Now I want to start looking at the math and see how well my intuition about time bears out. I’ll post again if I’m able to conclude something soon, though that’s unlikely to happen in less than a few months.)

          Re Nancy’s question: I don’t know about the specification about learnable skills, but the interesting question might be about the same sort of reaction times as Galton measured.

          • gwern says:

            I don’t know if anyone’s tested the exact apparatuses used by Galton or Thompson, but I would be very surprised if the existing research on eg FPSes and reaction time didn’t apply to them nicely.

          • Rm says:

            People of less than 40 years old are the relevant group here, since older ones largely don’t contribute to reproduction.

        • Kibber says:

          Are you sure you’re really interested in the possibility of dying from a car accident over the course of a *lifetime* here? We’re talking selection pressure, so I imagine the meaningful figure would be more like 0.9999^30, meaning total ~0.3% chance of affecting one’s contribution to the gene pool.

          • gwern says:

            No, because parents meaningfully affect the prospects of their kids, and it seems like grandparents may as well. With the increasing investment into raising and education over the same time period as cars came along, this should be even more true than before.


          I was expecting the proportion of those too young to be likely to have children to be higher, and likewise for the proportion of those with high blood alcohol.

        • Kibber says:

          Err… even if not having alive parent(s) does affect a child’s chances of procreation in a negative way (which is debatable), your calculation basically suggests that dying before conceiving/giving birth to a child reduces one’s contribution to the gene pool to the same extent as dying after the child is born, which doesn’t make any sense.

          • gwern says:

            I’m not saying that my calculation is perfectly accurate; I’m pointing out that putting fatal events into a per-year risk is a rather misleading way of expressing net selection pressure.

      • Army1987 says:

        Among people who die while still in reproductive age, the fraction is much larger than that.
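The cumulative-risk figures traded in the thread above can be reproduced with a short sketch. It assumes a constant annual risk of 0.01% and independence between years, which is the same simplification the commenters use; the function name is hypothetical.

```python
def cumulative_risk(annual_risk, years):
    """Probability of at least one fatal event over `years`,
    assuming a constant annual risk and independent years."""
    return 1 - (1 - annual_risk) ** years

ANNUAL = 0.0001  # the ~0.01% yearly figure quoted in the thread

print(f"{cumulative_risk(ANNUAL, 75):.2%}")  # lifetime horizon: 0.75%
print(f"{cumulative_risk(ANNUAL, 30):.2%}")  # pre-reproduction horizon: 0.30%
```

So the two positions differ by roughly a factor of 2.5 in the headline number, before any argument about whether losing a parent affects a child's prospects.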

  6. Konkvistador says:

    I certainly didn’t after looking into it. Henry Harpending made basically the same criticism you did in the comment section of West Hunter recently. The sheer 15 point drop made me go ‘nope’ right away. If I remember Lynn right we should be losing about one IQ point per generation of our genetic potential in the West, two thirds of that due to mass immigration which wasn’t a big factor for Britain before the late 70s. The dysgenic drop can’t be more than about 3 points.

  7. Konkvistador says:

    You are wrong on making comparisons to Ashkenazi evolution. Smarts are easy to lose, hard to build.

    • Randy M says:

      Yes, that stood out to me. Harmful mutations arise a lot more easily than beneficial ones. Analogies to programming aren’t entirely off base, and if you open up code and randomly cut or copy and paste some portions, only the fact that you are thoroughly testing it will keep it from being gibberish most of the time.
      Although it does still seem like a good drop for a population in that time.

    • Michael says:

      this is mighty fine, but you end up being in a not so much supportive environment, hence it’s not that easy for unfavorable mutations to prevail.
      this is why the average is the average. you actually don’t take too much risk. this is for example how the economy works right now. anything out of the ordinary has to be kept down. this just began to change in recent times.

      also, you should read up on the gaussian distribution. it’s as likely to obtain a harmful mutation as is with a favorable mutation.
      in the end there is no new mutation at all and the pool is rather stable for some time.
      also, Randy’s analogy sucks: your premise is, that this system’s not oughta work or it has to work totally at random. gene mutation has a lot to do with stability while your analogy does not depend on any in any way. this is selection-variation, while your analogy is not (at least as long as you dont incorporate some kind of selection)

    • Charlie says:

      Humans have a mutation rate of about 5*10^-8 per base pair per generation. So attributing a hypothetical drop in genetic intelligence-stuff to mutation (which as you correctly note has an easier time disrupting smarts than building them) vs. changing allele frequency (which doesn’t care about smarts, once the genes are in the gene pool), seems ambitious.

  8. Mary says:

    All analyses of the Flynn Effect I’ve seen say that it’s concentrated in one part of the test: namely the sort of place where you ask a child what do a rabbit and a dog have in common. If you answer “mammals” you are “correct.” If you answer they have four legs, you get some points. If you say they both feature in rabbit hunts, you get none. Even though all three answers are indeed something they have in common, so what it’s really testing for is lack of diversity and tendency to think like a test maker.

    • Deiseach says:

      Yeah, I tend to think that at least some of the “increase in IQ” is down to mass exposure to standardised testing through the educational process and learning how to take tests, what kinds of answers are the ‘right’ answers, and how to pass them.

      Saying “A rabbit and a dog are mammals” is the kind of answer you learn in school, not something that would natively spring to mind. You have to know what a mammal is in the first place (and I’d fail that kind of test off the bat, because I’d be thinking more ‘four legged’ than ‘oh yeah, warm blooded live young bearing’).

      • Mary says:

        I’d be tempted to say, “They’re quadrupeds, they are domestic species that have become invasive species and serious pests as a consequence, they’re obligate aerobes, they’re carbon-based life forms, they’re mammals.”

    • Randy M says:

      Hmm, really? I’d think that’s rather not a sign of intelligence, just memory (which certainly isn’t unrelated) and education. Maybe a question more like “Rabbits and dogs are both animals that give birth to live young. What other similarities do you expect them to have?” would involve more analytical ability.

      • Mary says:

        I think I would phrase it “A grokin and a geeple are both animals that give birth to live young. What other similarities do you expect them to have?”

        • Anonymous says:

          I’d add another piece of evidence, such as that they both have hair, since there are other animals aside from mammals which are viviparous.

    • Douglas Knight says:

      It is true that the Wechsler subtest on which the effect is largest is Similarities, but this is exceptional. The general trend is that abstraction predicts the strength of the effect. Flynn first noticed the effect on Raven’s Progressive Matrices, the most abstract test, and that remains the whole test on which the effect is largest (though the effect is just as strong on the Similarities subtest).

  9. jimrandomh says:

    It’s even worse. There is another, entirely separate line of argument which demolishes this argument equally well. See, when you set up a reaction-time test, there’s some latency added by your test equipment. As long as you’re comparing reaction times between two subgroups of your study population, rather than comparing against other studies, you don’t care.

    Well, it turns out that a computer-based test is much more likely to add extra latency than a non-computer-based test. Here is John Carmack talking about latency, in the context of getting rid of it to make VR work; he starts from a baseline of 64ms (but real-world setups are sometimes much worse than that baseline). My take is that every reaction-time test conducted after 2000, unless it explicitly talks about its calibration strategy, should be assumed to have an unknown amount of extra latency somewhere between 30-150ms added uniformly to all its measurements, for meaningless technical reasons, while reaction-time tests conducted before then should not be assumed to have that extra latency. And this is, gosh, not far off from the entire effect size of the paper.

    • gwern says:

      The paper was linked, you know, so you can read it and see for yourself:

      pg4; of the 14 studies used, only 2 were post-2000 (2002 and 2004).

      And if that were driving the effect, you would see a sudden discontinuity ~2000 in the plotted RTs (which you clearly don’t in fig1/pg5), and anyone checking Galton-style apparatuses would also be reporting large differences from that alone.

    • Deiseach says:

      Exactly! Whatever about the social sciences, any exposure to a hard science discipline teaches you that unless you calibrate properly, you will get different results and that different equipment gives different readings.

      I would expect a Victorian micrometer and a modern one to give me different readings, so I would expect a Victorian mechanical measure of reaction time and a modern computer-based measure to give different readings.

      Swear to creation, before they’re allowed write let alone publish a paper, these guys should be made set up and perform a series of titrations to teach ’em about the kinds of errors you get when going just a drop too much/little. Once they’ve got that into their noggins, then they can start drawing conclusions from “results we got and results somebody in Australia got and results somebody ninety years ago got” 🙂

  10. Randy M says:

    “They try to prove it by saying environmental and genetic factors affect IQ in different ways, and that genetic factors are more likely to affect certain features like reaction time – a pattern which is called a Jensen Effect and which is on relatively solid ground. Because they find reaction time is declining, probably people are becoming genetically stupider and the only reason we can keep having a civilization at all is because our environment is getting better – which is too bad, since our environment may have stopped doing that.”

    Is prove here your word or theirs? Surely explain would have been better.

  11. Douglas Knight says:

    In case anyone else made the same mistake: Woolley and Woodley are different names.

    Here or here is the scatterplot. It is probably worth checking the three non-victorian fast results to see how they sampled. It doesn’t look to me like it rules out Jim’s hypothesis.

    • Douglas Knight says:

      Actually, Scott already looked at one of the fast studies, namely the Australian one. He says that it’s so much slower than the Victorian studies because it wasn’t selective in its sampling. But this is explaining a false claim! The Australian study shows only 10% higher reaction time than the Victorian study; most of the modern studies report 50% higher. Scott’s hypothesis predicts that studies should not be both broad and fast, contrary to the Australian observation. Also, the fast Finns from 1990 are conscripts, thus probably representative, again contrary to Scott’s hypothesis. The studies from the 1940s are more in line with his hypothesis.

      Three of those four studies appear to me to have larger sample sizes than reported in the meta-analysis.

      I think that Jim’s hypothesis of lack of inter-author calibration is more plausible.

  12. gwern says:

    HBD chick discussion:

    West Hunter:

    (The reactions so far are funny. Does *any* HBDer, except Charlton who apparently has been pushing this RT stuff, think this is valid work?)

    I commented back in 2012 that this seemed to be a highly implausible claim given the enormous effect size, and selection issues certainly seem like the most convincing way to resolve it.

    So, let’s see what the paper says about controlling for selection biases…

    > Silverman (2010) reviews simple RT studies conducted between the 1880s and the present day. In Silverman’s (2010) study, Galton’s estimates collected between 1884 and 1893 (as reported in Johnson et al., 1985) were compared with twelve studies from the modern era (post 1941). Galton’s measures indicated a simple visual RT mean of 183 milliseconds (ms) for a large sample of 2522 young adult males (aged between 18 and 30), along with a mean of 187 ms for a sample of 888 equivalently aged females. These means seem to be representative of the period as a 1911 review of various studies conducted in the late 19th and early 20th centuries (Ladd & Woodworth, 1911), which did not include Galton’s measures, found an RT range of 151–200 ms (mean 192 ms), using different instrumentation to that employed by Galton (1889). Moreover, Silverman was also able to comprehensively rule out lack of socioeconomic diversity, as Galton’s samples were diverse enough to be stratified into seven male and six female occupational groups (Johnson et al., 1985).
    > Twelve modern (post 1941) simple RT studies by contrast revealed considerably slower RTs for both males (mean 250 ms) and females (mean 277 ms) in a combined sample of 3836. In comparing the 19th-century measures with the modern ones, Silverman found that in 11 of the 12 studies and in 19 out of 20 comparisons, the differences were statistically significant. Furthermore age was not a confounding factor as Silverman matched studies across time based on age range.
    > …We take our general inclusion rules from the meta-analysis by Silverman (2010). First, the samples consisted of people recruited from the general population and whose ages ranged from about 18 to 30 years. Second, the study sample had to be in good health, as poor health is a known inhibitor of RT performance. Third, given that Galton’s sample was British the studies had to have been conducted in a Western country.
    > Fourth, the study samples had to be 20 or larger in size for each sex. Fifth, the delivery of the stimulus was not predictable, which ruled out studies in which the interval between stimuli was fixed or increased or decreased according to a regular pattern. Sixth, the response to the stimulus had to be manual in nature, such as pressing or releasing a button or key. Seventh, to generate the response, the arm did not have to be moved (this restriction was based on the consideration that if the arm must be moved, RT is necessarily lengthened, and the g-loadedness of the estimate potentially reduced due to the addition of a non-cognitive ‘movement time’ component to the measures (Jensen, 2006)).
    > Eighth, the RT measure had to be representative of the total set of RTs. This restriction eliminated studies in which RT was measured in terms of the best RTs or the longest or shortest RT. As sex-differences data were not available for each study, we generate weighted averages for studies reporting sex differences, thus we produce a single RT mean for each study. Finally it must be noted that reaction time measures tend to show strongly skewed distributions (see: Jensen, 2006).
    > …There are some limitations to this study. Although Silverman used stringent selection criteria the trend may nonetheless be influenced by methodological artefacts and sample peculiarities. This is a potentially important issue as there appears to be a substantial discrepancy between the test-retest coefficients in Galton’s data reported by Johnson et al. (1985), i.e. .21 for people tested within a year (N = 421) and .17 for people retested over any time interval (N = 1069), and the equivalent suggested coefficient of the ‘Hick’-style device employed in our reference study (.85; Deary et al., 2001). Given the large N used by Silverman (2010) in establishing the Galton simple RT means, it is unlikely that even relatively low reliability at the individual level would seriously compromise the accuracy of the group mean of Galton’s data. This is especially likely to be the case given the apparent representativeness of Galton’s mean relative to other contemporaneous studies of simple RT, some of which employed likely much better quality instrumentation than that used by Galton (1889), such as the electro-mechanical Hipp chronoscope (Ladd & Woodworth, 1911; Thompson, 1903).

    So, Woodley et al do some controlling for age, and nothing else. The ~Victorian comparison of Galton’s data is to Thompson (dissected above) and to a textbook _Physiological psychology_. The citation is badly mangled, but I *think* it’s referring to _Elements of Physiological Psychology: A Treatise of the Activities and Nature of the Mind, from the Physical and Experimental Points of View_, which turns out to be downloadable from Google Books (thank goodness for the public domain). The claim about 151-200ms is apparently taken from the ‘optical’ table on pg476 in chapter VI; skimming through the entire chapter, the authors do not appear to discuss issues of representativeness at all, so this citation is a bust too.

    Woodley’s reply to previous criticism doesn’t consider the selection effects:

    So, I consider this paper well and truly debunked.

  13. JayMan says:

    About the racial composition of the study subjects, a similar point was made here:

    we’re dumber than the victorians | hbd* chick

    There, I made similar notes about the sampling issues.

  14. hbd chick says:

    excellent! thank you.

    may i ask how you arrived at this?:

    “There he charged visitors to the museum three pence ($25 in modern currency after adjusting for inflation)….”?

    • Scott Alexander says:

      Oh, hi, I like your blog!

      I arrived at that number through the extremely awful and untrustworthy method of throwing terms at Google until it led me to this article, which made the entirely unsourced claim that “Interestingly, sixpence (2.5 pence in modern decimal coinage, 5 cents), allowing for inflation translates to something in the region of £25 ($50) now”

  15. suhrob says:

    Excellent, thank you very much!

    > There is some really excellent IQ research out there
    > that everyone should be reading…

    Could you please share which studies you find worth reading?

    Thank you!

  16. Pingback: A week of links - Evolving Economics

  17. Pingback: btw, about those victorians… | hbd* chick

  18. Bottledwater says:

    So you’ve debunked the claim that reaction time has become 1 SD slower, but why has it not become 1 SD faster, since IQ and brain size have increased by at least 1 SD since the Victorian era?

    • gwern says:

      Who says brain size has increased? That’s a new one on me. And the 1sd is from the Flynn effect and a reasonable combination of things like iodization (non-existent at the time; eg. iodization didn’t hit the US until the 1920s or so) and hollow gains on subtests, so represents far less than 1sd gain on g and primarily benefits the worst-off, consistent with what we see today (many fewer extremely stupid people, roughly same sort of smart people).

      • Randy M says:

        > what we see today (many fewer extremely stupid people)

        Although the extremely stupid people have much more visibility today.
        Idiots–the last ‘freaks’ it’s acceptable to point and laugh at.

      • smartandwise says:

        Brain size has very much increased in the 20th century:

        And what does it matter whether the Flynn Effect is related to the g factor? If we discover that the 20th century rise in height is unrelated to the general height factor, that still doesn’t change the fact that we’re taller.

        • gwern says:

          Why would anyone care…? o.0 To reuse your height metaphor, it would be as if no one was at all higher, but they are officially taller because the government decreed that henceforth 1 inch was now 0.9 inches, so now you’re not 5 foot, you’re really 6 foot!

          No one cares about increases in IQ due to the existing tests being inflated.

        • smartandwise says:

          gwern, just because the Flynn Effect is unrelated to g does not make it any less real.

          With respect to the height analogy, let’s say there’s a general factor for human height, meaning that there’s a positive correlation between many different parts of height (leg length, torso length, neck length, face length, cranium height etc).

          Now let’s say that torso length loads very highly on this general height factor and leg length loads very weakly. Now what happens if we discover that the 20th century height gains have been primarily on leg length and not at all on torso length? Then the height gains would be unrelated to the general height factor. Does that change the fact that we’ve become physically taller? Does that make the height gains a statistical artifact? Of course not.

          • gwern says:

            To continue your analogy again: the ‘legs’ here are the subtests like analogies. Why do we care for even a second about ‘analogy legs’ growing?! They are not g, which correlates with practically everything positive that there is, from almost all mental traits to income to health to criminality and dozens more.

            Increasing g is a *big fucking deal* which is why the Flynn effect is so famous since it may reflect increasing g. Increasing a random subtest is meaningless, and if that is all the Flynn effect is, it is a footnote for the psychometricians as they adjust the tests.

        • smartandwise says:

          g is only a big deal because it explains most of the variation in cognitive performance WITHIN generations. g has not been validated as an important measure BETWEEN generations.

          To continue the height analogy, WITHIN generations, the general height factor would be very predictive of important variables like basketball success and being able to reach the top shelf. But if the general height factor did not correlate with height differences BETWEEN generations, it would be irrelevant in that context.

      • smartandwise says:

        It’s a myth that the Flynn Effect primarily affects dumb people. What the actual data shows is that the entire bell curve is scoring higher. It’s like the 20th century rise in height. It didn’t just eliminate short people; the entire distribution got taller.

        As to why reaction time has apparently not improved; my guess is it reflects a part of intelligence that is resistant to nutritional influences on the brain.

        I also wonder if reaction time was measured the same way when comparing generations.

        • gwern says:


          > Some studies have found the gains of the Flynn effect to be particularly concentrated at the lower end of the distribution. Teasdale and Owen (1989), for example, found the effect primarily reduced the number of low-end scores, resulting in an increased number of moderately high scores, with no increase in very high scores.[10] In another study, two large samples of Spanish children were assessed with a 30-year gap. Comparison of the IQ distributions indicated that the mean IQ-scores on the test had increased by 9.7 points (the Flynn effect), the gains were concentrated in the lower half of the distribution and negligible in the top half, and the gains gradually decreased as the IQ of the individuals increased.[11] Some studies have found a reverse Flynn effect with declining scores for those with high IQ.[12]…A 2005 study presented data supporting the nutrition hypothesis, which predicts that gains will occur predominantly at the low end of the IQ distribution, where nutritional deprivation is probably most severe.[11] An alternative interpretation of skewed IQ gains could be that improved education has been particularly important for this group.[10] Richard Lynn makes the case for nutrition, arguing that cultural factors cannot typically explain the Flynn effect because its gains are observed even at infant and preschool levels, with rates of IQ test score increase about equal to those of school students and adults.

        • smartandwise says:

          gwern, James Flynn himself states that the Flynn Effect affects the ENTIRE distribution and here’s a study confirming that fact:

          Of course different studies may reach different conclusions because not all abilities are equally Flynn-affected, and the Flynn Effect may have different causes for cultural tests compared to culture-reduced tests.

          And it’s a myth that nutritional advances primarily benefit the bottom of the distribution. The 20th century rise in height made everyone taller, not short people only.

          • gwern says:

            Congratulations, Araragi-kun, you found an opposing study. Treasure this moment forever.

            But seriously: finding one citation to the several citations in the WP article does not prove you are right. The citation could be wrong. You want a meta-analysis or something to show what the overall result is.

            As it happens, I am familiar with that Wai paper already from when it first came out, and I think it is completely wrong. Please scroll to page 3 and examine the score graphs. Notice that on several subtests, scores in this elite group of chosen children are stagnant, have fallen, or have barely recovered. Note that over these 30 years, net gains are minimal or confined to specific subtests – the exact opposite of what we would expect from genuine increases in _g_, but exactly what we would expect from narrower trends like American education’s increased focus on drilling mathematics. In particular, note the small effect sizes: we are not seeing multiple IQ points on average per generation here – remember how the extremes of a bell curve change when the *mean* changes a few points! Note the absence, as far as I could find, of any multiple-comparisons correction.

            So, what we have in Wai is nothing but some cherrypicking of weak results which belies any strong interpretation of the Flynn effect.
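
            To make the bell-curve point concrete, here is a minimal Python sketch of how the share of people above a high cutoff responds to a small shift in the mean. It assumes a normal distribution with SD 15; the 130 cutoff, the 3-point shift, and the function name `share_above` are illustrative assumptions, not figures or methods taken from Wai or from the Wikipedia passage.

```python
import math


def share_above(cutoff, mean=100.0, sd=15.0):
    """Fraction of a normal(mean, sd) population scoring above `cutoff`."""
    z = (cutoff - mean) / sd
    # Upper-tail probability of the standard normal via the complementary
    # error function: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical scenario: the cutoff stays fixed while the mean rises 3 points.
before = share_above(130, mean=100)
after = share_above(130, mean=103)

print(f"share above 130 before the shift: {before:.4f}")
print(f"share above 130 after the shift:  {after:.4f}")
print(f"ratio after/before:               {after / before:.2f}")
```

            Under these assumptions, a gain of only a few points in the mean raises the share above the cutoff by more than half, so a genuine across-the-board shift should be easy to see in an elite sample.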

        • smartandwise says:

          I have the WAIS-III norms in front of me. A 22 year old who gets a raw score of 5/26 on the Matrix Reasoning subtest has an IQ equivalent on this subtest of 60.

          A 60 year old who gets 5/26 has an IQ equivalent of 70.

          Meanwhile a 22 year old who gets 25/26 has an IQ equivalent of 130. A 60 year old who gets 25/26 has an IQ equivalent of 140.

          So at both the top and bottom of the IQ scale, there’s a 10 point IQ difference between old and young Americans. Thus, I don’t think the Flynn Effect primarily affects the low end, at least not on this kind of culture reduced test one should be using to compare different generations.

          I can’t find a citation, but I seem to recall James Flynn also saying in “The Rising Curve” that Raven gains have affected the entire distribution.

          I suspect the studies that find the Flynn Effect is stronger in the low end are looking at cultural tests requiring general knowledge and vocabulary. Because low IQ people today get disproportionately more and better schooling than low IQ people in the past, low IQ people today probably do disproportionately better on crystallized and academic tests, but I don’t think they do any better on the kinds of culture fair measures of fluid reasoning that reflect the biological brain. Only the latter types of tests should be used when investigating the Flynn Effect. Everything else is just noise.

          If gains on culture reduced tests are caused by nutrition, they should improve the entire distribution just as height gains have; thus I believe they do.

          • gwern says:

            > So at both the top and bottom of the IQ scale, there’s a 10 point IQ difference between old and young Americans

            Gosh, I’m sure there’s no other possible explanation for that.

            > If gains on culture reduced tests are caused by nutrition, they should improve the entire distribution just as height gains have; thus I believe they do.

            No. Seriously. Look up iodine and iron deficiency. Tell me how they are supposed to benefit people out a few standard deviations just as much as they benefit the cretins.

        • smartandwise says:

          > Gosh, I’m sure there’s no other possible explanation for that.

          Age differences in test performance reflect both the Flynn Effect and cognitive aging, so if the Flynn Effect were weaker in high IQ people, then cognitive aging would have to be stronger to explain why age differences are the same in both the bright and the dull. Yet cognitive reserve theory suggests the opposite.

          > No. Seriously. Look up iodine and iron deficiency. Tell me how they are supposed to benefit people out a few standard deviations just as much as they benefit the cretins.

          Iodine and probably iron deficiency impair height, so by your logic primarily the bottom of the height distribution should have increased over the 20th century. Yet we know the ENTIRE height distribution has increased, with some studies showing even bigger gains among the tall.

          Why is it so hard to believe that bright people today are brighter than bright people a hundred years ago? Look at the internet, iPads, satellite technology, GPS, drones, cloning, genetic research etc. What exactly do today’s geniuses have to do to convince you they’re smarter? Build a time machine? LOL!

          • gwern says:

            I don’t care about iodine’s effect on height. We’re not talking about height, we’re talking about *IQ*. Show me a decent cite that iodine has boosted IQ in the highest percentiles or gtfo.

        • smartandwise says:

          > I don’t care about iodine’s effect on height. We’re not talking about height, we’re talking about *IQ*.

          Then explain why the 20th century rise in measured height is not analogous to the 20th century rise in measured intelligence.

          > Show me a decent cite that iodine has boosted IQ in the highest percentiles or gtfo.

          You’re the one with the iodine fixation. I’m simply asserting that in the last 100 years, nutrition has caused about a 1.5 SD increase in both height and intelligence for roughly the entire distribution of both traits. I don’t pretend to know exactly which nutrients are involved or whether the mechanism is reduction in disease or mass market foods fortified with vitamins etc. That’s beyond the scope of this discussion.

          • gwern says:

            > Then explain why the 20th century rise in measured height is not analogous to the 20th century rise in measured intelligence.

            Because they are completely different bodily systems and there is no reason to think that they will affect their respective distributions exactly the same way, and we don’t see any such lift in high-IQ types in the citations I’ve provided and in the Wai study you provided? There is no explanation why it’s not analogous because there’s no reason why they would be analogous in the first place.

            > I’m simply asserting that in the last 100 years, nutrition has caused about a 1.5 SD increase in both height and intelligence for roughly the entire distribution of both traits.

            So, no response to the citations or criticism of Wai, just blind assertions.

            I’m done with you.

        • Douglas Knight says:

          I’m interested in height. What studies do you recommend that show the whole bell curve lifting?

  19. Daniel Speyer says:

    It seems likely that someone did this study on University of Chicago undergraduate psychology majors recently. If we really want a comparison, we could grab that (I realize UoC changed status in the intervening time, but probably not all that much).

  20. Pingback: Are people getting more stupid? - Page 24

  21. Jan te Nijenhuis says:

    A response to two critical commentaries on Woodley, te Nijenhuis & Murphy (2013)

    Michael A. Woodley, Jan te Nijenhuis, & Raegan Murphy

    Our study on the lowering of intelligence has drawn massive attention from the media, with headlines from Brazil to Vietnam. Also thousands of reactions were posted on blogs, including two highly relevant critical comments on the blogs of Scott Alexander and HBD Chick. We give a response in this post. We are also pleased that our paper in Intelligence is starting a scientific discussion on the lowering of intelligence.

    Alexander (2013) advances the argument that Galton’s sample is unrepresentative of the population of Victorian London, and may be heavily skewed towards those with high-IQ and faster reaction times (RTs) owing in part to the fact that Galton charged a small fee to those wishing to participate in his data collection exercise. Hence, these studies should not be used as the basis for comparison with more modern studies, which, it has been argued, are relatively far more representative in many cases of the populations from which they are drawn. We show here that this argument is wrong.

    HBD Chick (2013) has advanced a second argument to the effect that Galton’s sample, and other contemporaneous 19th century studies (i.e. Ladd & Woodworth, 1911; Thompson, 1903) represent ethnically homogeneous samples in comparison with more modern samples, which are obviously less homogeneous. Given the existence of ethnic-group differences in reaction time means (i.e. Jensen, 1998), this is argued to lead to substantially depressed means in current-era studies and thereby strongly undercuts our conclusions of IQ becoming lower for the general population (HBD Chick, 2013). We show here that this second argument is wrong inasmuch as changing population composition cannot account for the preponderance of the observed secular decline.

    So, our critics argue that there is a combination of unrepresentative sampling favouring those with faster reaction times (RTs) amongst Galton’s sample coupled with the more representative sampling of increasingly ethnically heterogeneous modern Western populations, whose mean simple RT might be depressed as a consequence. This constitutes an alternative hypothesis to the dysgenic hypothesis advanced in Woodley, te Nijenhuis and Murphy (2013). We will show in detail that this alternative hypothesis is basically incorrect.

    In addressing the first argument, the seminal paper of Johnson et al. (1985) which constitutes the source of Galton’s simple visual reaction time data employed in both our study and that of Silverman (2010), contains excellent data on the socio-economic diversity of the relevant subset of Galton’s exceptionally large sample (N around 17,000 individuals, 4838 [or 30%] of whom were included in Johnson et al’s study). The paper states that “… a sizable portion of Galton’s sample consists of professionals, semi-professionals, and students. However … all socioeconomic strata were represented” (p. 876). As can be seen in Tables 10 and 11 (pp. 890-891), the male cohort could be split into seven socioeconomic groups (Professional, Semi-professional, Merchant/Tradesman, Clerical/Semiskilled, Unskilled, Gentlemen [aristocracy] and Student or Scholar). For females, there were six socioeconomic groups represented in the data (Professional, Semi-professional, Clerical/Semiskilled, Unskilled, Lady [aristocracy] and Student or Scholar). In both the male and female sample the modal group appears to be the Student or Scholar category; in both cases these groups exhibit the largest Ns – 1657 in the case of 14-25 year old males, and 297 in the case of equivalently aged females. The second- and third-largest groups amongst the males of equivalent age were Clerical/Semiskilled (N=425) and Semi-professional (N=414). This is basically true of the female sample also, with Semi-professional being the next largest group after Student or Scholar (N=104) and Clerical/Semiskilled comprising the third largest group (N=47). Whilst it is obviously true that the sample is skewed towards Students or Scholars in both cases, individuals from these lower-middle/upper-working class occupations combined (see p. 888 in Johnson et al., 1985; for a full description of how these occupational categorizations correspond to employment type), make up a respectable proportion of the 14-25 year old samples also (>30% in the case of the males, and >30% in the case of the females). It is important to note that according to Johnson et al (1985) many of the students would have been pupils at schools accompanied by teachers on day-trips to Galton’s laboratory at the Kensington Museum. However, a fundamental point is that Silverman’s (2010) study uses only data for those aged 18-30 (see Table 1, p. 41 in Silverman [2010] for full details of this subsample), hence is quite unlikely to have been nearly as skewed towards school-aged students relative to the sample as a whole, which included a much larger range of ages.

    A careful reading of Silverman (2010) will reveal that he was cognizant of precisely how much socioeconomic diversity was present in Galton’s dataset. Accordingly he was very careful to include only samples that would broadly match one or more of the categories in Galton’s dataset (see: Silverman, 2010, Table 2, pp. 42-43 for full disclosure of the sample background characteristics). One advantage of Silverman’s care and meticulous attention to detail is that it permits us to make like for like comparisons with specific socioeconomic and occupational groups in Galton’s data, thus we can directly test the claims of Alexander (2013). Concerning the post-Galton studies Silverman included five student samples, two of which date from the 1940s (Seashore et al. 1941), and the remaining three of which date from the 1970s to the 2000s (mean testing year = 1993; Brice & Smith, 2002; Lefcourt & Siegel, 1970; Reed et al., 2004). These can be compared with the combined Galton and Thompson 19th-century student data in a three-way comparison as follows:

    | Comparison involving male students | Difference in N-weighted RT means |
    |---|---|
    | 19th-century students vs. 1940s-era students | +16.8 ms (183.2 vs. 200 ms) |
    | 19th-century students vs. ‘modern’ students | +74.2 ms (183.2 vs. 257.4 ms) |
    | 1940s-era students vs. ‘modern’ students | +57.4 ms (200 vs. 257.4 ms) |

    The difference between the 19th century and the ‘modern’ students is very similar to the meta-regression-weighted increase in RT latency between 1889 and 2004, estimated on the basis of all samples included in the meta-analysis (81.41 ms). Silverman also included data from other socioeconomic groups. For example the study of Anger et al. (1993) included a combined male + female sample of 220 postal, hospital and insurance workers from three different US cities. These occupations clearly fall into the Clerical/Semiskilled and Semi-professional groups identified in Galton’s study. For both males and females in Galton’s data, the N-weighted RT mean for these two groups is 185.7 ms, whereas the N-weighted average amongst the participants in the study of Anger et al. (1993) was 275.9 ms. This equates to a difference of 90.2 ms between the 19th century and 1993. Again, this is not dissimilar to our meta-regression-weighted estimate of the cross-study increase in RT latency (81.41 ms).
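
    The arithmetic behind these matched comparisons can be re-checked with a short Python sketch. The group means are the ones quoted above; the helper `n_weighted_mean` and the variable names are illustrative assumptions, and the per-sample means and Ns that went into each N-weighted figure are not given in this comment, so the helper is shown but not applied to real inputs.

```python
# Re-check the matched-group differences quoted in the comment (values in ms).


def n_weighted_mean(means, ns):
    """Mean of several sample means, each weighted by its sample size N."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)


# N-weighted student means as quoted in the comment.
student_means = {"19th-century": 183.2, "1940s-era": 200.0, "modern": 257.4}

pairs = [
    ("19th-century", "1940s-era"),
    ("19th-century", "modern"),
    ("1940s-era", "modern"),
]
# Later mean minus earlier mean; a positive value means slower reactions.
differences = {
    (earlier, later): round(student_means[later] - student_means[earlier], 1)
    for earlier, later in pairs
}

# Clerical/Semiskilled + Semi-professional (Galton) vs. Anger et al. (1993).
worker_difference = round(275.9 - 185.7, 1)

# Meta-regression estimate quoted in the comment, for side-by-side reading.
meta_regression_estimate = 81.41

for (earlier, later), diff in differences.items():
    print(f"{earlier} vs. {later}: +{diff} ms")
print(f"Galton workers vs. Anger et al. (1993): +{worker_difference} ms")
print(f"meta-regression estimate: +{meta_regression_estimate} ms")
```

    This only confirms that the quoted differences follow from the quoted means; it says nothing about whether the groups being compared are equally representative.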

    The results of these broadly socioeconomically- and occupationally-matched study comparisons therefore imply an additional degree of robustness to the findings of our more statistically involved analysis of the overall secular trend. Furthermore, this evidences Silverman’s contention that as an aggregate, the ‘modern’ studies have broadly equivalent representativeness to the subset of Galton’s data employed in his and our own analyses. Alternatively we could state that neither Galton’s nor Silverman’s data are truly fully representative of any population, however they are both ‘biased’ in their sampling towards broadly similar groups.

    We continue with the second concern, i.e. the lack of strict ethnic matching criteria, hypothesized to lead to substantially depressed RT means in current-era studies. Ethnic-group differences in performance on various elementary cognitive tasks have been documented and are to be expected (i.e. Jensen, 1998). Substantial changes in terms of the ethnic composition of test-takers would however be needed in order for the magnitude of change to be solely or even substantially a consequence of this process.
    RT is related to g via mutation load (Thoma et al., 2006), which is in turn a source of individual differences in underlying fitness within populations (Miller, 2000), but not between them (e.g. Rindermann, Woodley & Stratford, 2012), hence there is no good reason to expect ethnic-group differences in RT means to be meaningfully comparable to within-group differences in terms of proportionality (consistent with this is the observation that on simple RT these differences are actually quite small; Jensen, 1993; Lynn & Vanhanen, 2002, pp. 66-67). So, indeed ethnically heterogeneous samples will exhibit slightly slower or even faster reaction times (depending on the populations and proportions involved), however the current proportions of groups exhibiting slower simple RT means to Whites in Western countries are simply too small, and the group-differences too slight to have had a substantial effect.

    It is also worth noting that the weighted mean of our modern (post-1970) aggregated estimate (264.1 ms) is actually less than Jensen’s (1993) finding of a 347.4 ms mean of simple visual RT amongst a sample of 582 White US pupils described as being of European descent, and also Chan and Lynn’s (1989) finding of a 371 ms simple RT mean for over 1000 White British school children in Hong Kong. It must be noted however that these studies were conducted on young children – simple RT shortens until the late 20’s when full neurological maturation is achieved (e.g. Der & Deary, 2006), hence Jensen and Chan and Lynn’s estimates are likely to be underestimates of the adult simple RT means of these Whites, which may be somewhat closer to our sample mean of ‘modern’ (mostly White) populations in actuality.

    We would like to thank Scott Alexander and HBD Chick for their interest in our study, and for their commentaries, however the counter-arguments, whilst thought-provoking, do not appear to withstand scrutiny. We must therefore conclude that the secular slowing of simple reaction time between the closing decades of the 19th century and the opening one of the 21st has had little to do with sampling issues.


    Alexander, S. S. (2013). The wisdom of the ancients. Slate Star Codex. URL: [retrieved on 24/05/13]

    Anger, W. K., Cassitto, M. G., Liang, Y.-X., Amador, R., Hooisma, J., Chrislip, D. W., et al. (1993). Comparison of performance from three continents on the WHO-recommended Neurobehavioral Core Test Battery (NCTB). Environmental Research, 62, 125–147.

    Brice, C. F., & Smith, A. P. (2002). Effects of caffeine on mood and performance: A study of realistic consumption. Psychopharmacology, 164, 188–192.

    Chan, J., & Lynn, R. (1989). The intelligence of six year-olds in Hong Kong. Journal of Biosocial Science, 21, 461-464.

    Der, G., & Deary, I. J. (2006). Age and sex differences in reaction time in adulthood: Results from the United Kingdom Health Lifestyle Survey. Psychology and Aging, 21, 62–73.

    HBD Chick. (2013). We’re dumber than the Victorians. HBD Chick. URL: [retrieved on 24/05/13]

    Jensen, A. R. (1993). Spearman’s hypothesis tested with chronometric information-processing tasks. Intelligence, 17, 47-77.

    Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT:

    Johnson, R. C., McClearn, G., Yuen, S., Nagoshi, C. T., Ahern, F. M., & Cole, R. E. (1985). Galton’s data a century later. American Psychologist, 40, 875–892.

    Ladd, G. T., & Woodworth, R. S. (1911). Physiological psychology. New York, NY: Scribner.

    Lynn, R., & Vanhanen, T. (2002). IQ and the Wealth of Nations. Westport, CT: Praeger.

    Miller, G. F. (2000). Mental traits as fitness indicators: Expanding evolutionary psychology’s adaptationism. Annals of the New York Academy of Sciences, 907, 62–74.

    Reed, T. E., Vernon, P. A., & Johnson, A. M. (2004). Sex difference in brain nerve conduction velocity in normal humans. Neuropsychologia, 42, 1709–1714.

    Rindermann, H., Woodley, M. A., & Stratford, J. (2012). Haplogroups as evolutionary markers of cognitive ability. Intelligence, 40, 362-375.

    Seashore, R. H., Starmann, R., Kendall, W. E., & Helmick, J. S. (1941). Group factors in simple and discrimination reaction times. Journal of Experimental Psychology, 29, 346–394.

    Silverman, I. W. (2010). Simple reaction time: It is not what it used to be. The American Journal of Psychology, 123, 39–50.

    Thoma, R. J., Yeo, R. A., Gangestad, S., Halgren, E., Davis, J., Paulson, K. M., & Lewine, J. D. (2006). Developmental instability and the neural dynamics of the speed-intelligence relationship. Neuroimage, 32, 1456-1464.

    Thompson, H. B. (1903). The mental traits of sex. An experimental investigation of the normal mind in men and women. Chicago, IL: The University of Chicago Press.

    Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006

    • Scott Alexander says:

      Thank you. I actually couldn’t access Silverman’s paper, so I had to work solely off of yours, but I’m glad you’ve given additional information on his methodology.

      I doubt that 1880s students vs. 2000s students are a fair comparison group. In the 1880s only the very upper classes would have been in school at those ages, and nowadays access to schooling is much broader. The postal worker comparison is much more interesting and I will have to look into it further.

  22. Pingback: a response to a response to two critical commentaries on woodley, te nijenhuis & murphy (2013) | hbd* chick

  23. Pingback: Were the Victorians genetically smarter than modern Westerners? | Brain Size