SSC Survey: Scattered Negative Results

Traffic to this blog is declining. I need to act decisively to draw people back. Write something so interesting it can’t help but go viral. I’m going to write about…negative results from the perception questions on last year’s survey.

The last SSC survey had a lot of optical illusions and visual riddles. I had hoped to expand on some of the work in Why Are Transgender People Immune To Optical Illusions and Can We Link Perception And Cognition? This post is a very brief summary of results and, basically, an admission of failure. While I was able to replicate the same suggestive results as in the last survey, I was unable to expand on them, strengthen them, or really turn them into any kind of interesting framework.

I was able to weakly replicate the headline result from Why Are Transgender People Immune To Optical Illusions: transgender status still correlated with all three mask illusions, and with the average of all three mask illusions, but very weakly: r = -0.04, p = 0.001. This was true even when I excluded everyone who took part in last year’s survey, providing an independent confirmation of the result. But with correlations this low, it’s hard to get too excited.
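To see how a correlation this small can still come with a small p-value, here is a minimal sketch using the Fisher z-transform approximation. The sample sizes in the loop are hypothetical (this post does not state the survey's N), and the function names are made up for illustration.

```python
import math


def pearson_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson r from n observations.

    Uses the Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def variance_explained(r: float) -> float:
    """Share of variance in one variable accounted for by the other."""
    return r * r


if __name__ == "__main__":
    r = -0.04
    # Hypothetical sample sizes, NOT the survey's actual N.
    for n in (500, 2000, 7000):
        print(f"n={n}: p ~ {pearson_p_value(r, n):.4f}")
    print(f"variance explained: {variance_explained(r):.4%}")
```

The same r moves from unremarkable to highly significant purely as n grows, while the variance explained stays fixed at a fraction of a percent.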

I was also able to weakly replicate the headline result from Can We Link Perception And Cognition?. I haphazardly gave people a “weirdness score” based on them having more mental illnesses, more unusual political opinions, and more minority sexual/gender identities (without looking at their illusion results). People with higher weirdness scores consistently had more ambiguity-tolerant results on illusions, with correlations around r = 0.05 for most tests. They also had notably higher average Tolerance of Uncertainty Test scores. But none of these results were very striking and there was minimal internal structure in them. If I were going to take this further, I would come up with a more principled definition of weirdness, but at this point it doesn’t seem worth it.

What do I mean by saying these results are weak and lack internal structure? To give an example: the last survey focused on a single optical illusion, the Hollow Mask. This survey used three different versions of the Hollow Mask in the hopes of removing noise and getting a higher-fidelity mask perception signal. This didn’t work at all. The correlations between the three masks were very low. For example, there was only an r = 0.09 correlation between being able to see the second mask illusion and the third mask illusion. While all correlations were significant, it doesn’t seem fair to conceptualize them as testing the same perceptual function.
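A rough way to see why averaging three masks bought so little is the Spearman-Brown formula, which estimates the reliability of an average of k items from their typical inter-item correlation. This is only a sketch: it assumes the r = 0.09 quoted above is typical of all three mask pairs (only one pair is reported here) and that the items are interchangeable measures of one trait. The function names are hypothetical.

```python
import math


def spearman_brown(mean_inter_item_r: float, k: int) -> float:
    """Estimated reliability of the average of k parallel items."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = mean_inter_item_r
    return k * r / (1 + (k - 1) * r)


def items_needed(mean_inter_item_r: float, target_reliability: float) -> int:
    """Smallest number of parallel items whose average reaches the target."""
    r, rho = mean_inter_item_r, target_reliability
    return math.ceil(rho * (1 - r) / (r * (1 - rho)))


if __name__ == "__main__":
    r = 0.09  # assumed typical correlation between two mask items
    print(f"reliability of a 3-mask average: {spearman_brown(r, 3):.3f}")
    print(f"masks needed for reliability 0.8: {items_needed(r, 0.8)}")
```

Under these assumptions, three items correlated this weakly give an average that is still mostly noise, and reaching a conventionally acceptable reliability would take dozens of masks rather than three.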

Given that I couldn’t even get different versions of the same illusion to line up, you can guess that I didn’t get much correlation between different illusions. For example, the Tables Illusion correlated with average score on the Mask Illusion at about r = 0.06, p = 0.001. I was also able to replicate the correlation in the literature between autism and the Tables Illusion, r = 0.03, p = 0.01, but again this was very small.

Were there any actually large correlations? Surprisingly, the Surgeon Riddle produced some of the most impressive results on the whole survey. For example, it correlated at r = 0.18, p = 0.001 with ability to see “the the” as two separate words. Given that the Surgeon Riddle seems much less perceptually basic than the other things on here, I believe this is probably some kind of confounder, maybe amount of time people spend on each question or something like that.

There was an overwhelmingly strong effect of age on the Parentheses Riddle and nothing else. 62% of people in their 20s got it right, compared to 29% of people in their 70s. This was surprising enough that it might be worth its own post. I was unable to wring anything else out of this no matter how hard I tried.

I tried a factor analysis to draw some factors out of all the different illusions. SPSS came up with six factors, each of which explained 5-10% of the variance; given how many variables I put in, this doesn’t look that much better than chance. None of the factors correlated surprisingly well with anything else, nor was there an obvious pattern in which illusions they grouped together.
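As a back-of-the-envelope check on the "not much better than chance" judgment: if k standardized variables were completely uncorrelated, each component would explain 1/k of the total variance. The sketch below compares a factor's share to that baseline. The variable count is hypothetical (the post does not say how many variables went into the analysis), and in a finite sample the largest components of pure noise come out somewhat above 1/k, so this baseline flatters the factors if anything.

```python
def chance_share(n_variables: int) -> float:
    """Variance share per component if all variables were uncorrelated."""
    if n_variables < 1:
        raise ValueError("need at least one variable")
    return 1.0 / n_variables


def lift_over_chance(observed_share: float, n_variables: int) -> float:
    """How many times larger a factor's share is than the 1/k baseline."""
    return observed_share / chance_share(n_variables)


if __name__ == "__main__":
    k = 15  # hypothetical number of input variables, not from the survey
    for share in (0.05, 0.10):
        print(f"share={share:.0%}: {lift_over_chance(share, k):.2f}x baseline")
```

With a variable count in that range, factors explaining 5-10% of the variance sit close to the 1/k baseline, which is the sense in which they look little better than chance.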

In conclusion, most illusions had very low (less than r = 0.1) but highly significant (less than p = 0.001) correlations with one another and with various mental illnesses. There was no clear pattern to the correlations, although they did generally replicate past observations and findings from the literature. I can’t really say for sure if there’s a real effect here or it’s all just confounders, and if there is a real effect I definitely can’t tease out its structure or say anything about it for sure. I encourage other people to look into these and see if they can do better. You can download the survey results here.

I have a couple other things I pre-registered to investigate but never got around to before, so here they are:

Political conflict theory was slightly correlated with various questions on the Tolerance of Ambiguity and Tolerance of Uncertainty tests, but not enough to be interesting. For example, when asked whether political extremists you don’t like (eg fascists) were making understandable mistakes or just evil, the answer was correlated at about r = 0.05 with various questions like “Small doubts keep me from acting” and “The ambiguities in life stress me out”. The more stressed people were by ambiguity, the more they thought extremists were evil. This was not a very large effect, it varied from question to question in an unpredictable way, and I wouldn’t have mentioned it if I didn’t feel obligated to tie up preregistered loose ends.

Contra my prediction, there was no relationship between autism and the likelihood of giving process-based (“meta-level”) responses to categorization questions as opposed to person-based (“object-level”) responses. For example, when asked to judge fascists beating up minorities vs. minorities beating up fascists, autistic and neurotypical people were about equally likely to base their responses on principles (ie “beating up people is wrong”) vs. on the groups involved (ie “fascists are bad”).

Contra my prediction, people with ADHD did not describe themselves as more ambitious. They did describe themselves as more risk-taking and more likely to prefer a buzzing-city aesthetic than a quiet-country aesthetic, but the results, although significant, were too low in magnitude to be interesting.

As I mentioned in a previous article, the results of the AI Persuasion Experiment two years ago did not persist.


38 Responses to SSC Survey: Scattered Negative Results

  1. melboiko says:

    N=1 but for the record:

    • I’m trans MTF.
    • I was blind to the original survey image of the computer-generated spinning mask. I was blind to no other illusions.
    • The effect was robust with the original spinning-mask GIF: I could not see the illusion no matter how much I tried. But the effect wasn’t robust with other masks: videos of actual, real-world spinning masks did trigger the illusion.
    • I’m quite convinced of the above, since it was very shocking for me, and I wrote about it at the time. But now that I’m 4 months under hormone therapy (estradiol + testosterone blocker), I find it has weakened. I can see the illusion almost for the entire duration of the reversed spin, even in the original mask. It only breaks down after the nose has crossed the centre-line, by which point it looks hollow again (I don’t know if it’s like that for everyone).
    • I suspect a degree of ASD but it’s undiagnosed. (Gender dysphoria was properly diagnosed.)

  2. googolplexbyte says:

    Aphantasia was only recently discovered; maybe you’d have more luck exploring differences in people’s abilities to generate mental imagery.

    You could even ask about people’s ability to imagine visual illusions. e.g. I can fairly easily imagine the face side of a mask, but not the hollow side at all. I can just barely imagine the infinite staircase, but it’s really hard as my mental image keeps trying to correct itself.

  3. Poruchik Rzhevsky says:

    Holy Moley. Your traffic is only sub-ten thousand per month? For some reason I thought it would be in the hundreds of thousands based on the sheer number of comments and the interesting links round ups. Every time I’m in the office I talk about the blog posts I’ve read with my friends. I just thought this was one of those things -everyone- read and I was late to…

    • JRM says:

      I don’t think I had a number in mind, but 10K per day also struck me as surprisingly low. I guess this means I can steal Scott’s Swifties without fear of getting caught.

  4. fr8train_ssc says:

    Traffic to this blog is declining. I need to act decisively to draw people back. Write something so interesting it can’t help but go viral. I’m going to write about…negative results from the perception questions on last year’s survey.

    I know I haven’t been really active here in the last three months since I’ve been managing/career-mentoring summer interns at my firm. How does your blog traffic vary seasonally? I would imagine Springs and Summers would have a heavy decline if most of your readers are undergraduate or graduate STEM people, between finals and then internships. On another note, how correlated is your comment traffic with your view/reader traffic? There have been times I’ve seen the preview for a new entry have 1000+ comments in it and decided “Wow, well I’m already too late to the party here, guess I’ll say something next post”

    Contra my prediction, there was no relationship between autism and the likelihood of giving process-based (“meta-level”) responses to categorization questions as opposed to person-based (“object-level”) responses. For example, when asked to judge fascists beating up minorities vs. minorities beating up fascists, autistic and neurotypical people were about equally likely to base their responses on principles (ie “beating up people is wrong”) vs. on the groups involved (ie “fascists are bad”).

    I remember your survey was a volunteer sample (i.e. strangers on the web), so if your readers, their friends, etc. are more likely to be meta-level thinkers (whether from participating in Less Wrong or the rationalist community) than the population at large, that may mask a correlation between meta-level thought and autism. This would also depend on the estimated percentage of object- vs. meta-level thinkers in the general population, and on what percentage of active SSC participants have autism-spectrum behaviors (compared to the general population statistic of 1%).

  5. yodelyak says:

    I think “took place” should be “took part”

  6. idontknow131647093 says:

    After reading the question I suspected it was a trick question with B or C as the correct answers. I knew B was intended to be correct, but didn’t know whether A was not or was depending on the levels of trickiness intended.

    In reality, my favorite answer would be C, just as reference for your stats I am a 30 YO, ex-engineer, patent attorney.

  7. Tatterdemalion says:

    If the declining traffic is just from people who were here for the political controversy, I’m not sure how much of a loss that is.

    I suspect it would be easy to increase SSC’s hits per day, at least in the short term, by getting adopted by the Red tribe as someone who can be relied upon to stick it to the Blue tribe on a regular basis.

    But I think the effect on the comments quality of that happening would probably be catastrophic.

  8. P. George Stewart says:

    If traffic is declining, it’s probably because the blog has become more timid about controversial topics. From where I’m sitting, you got a name for being an incredibly smart commentator who was somewhat fearless to air somewhat taboo topics.

    Incredibly smart people are ten a penny on the internet, but incredibly smart people who aren’t automatically conformist are harder to find.

    But it’s understandable – you’ve got a job, etc., etc. But then you’ll just have to accept that a more timid blog will garner less interest.

  9. Nornagest says:

    There was an overwhelmingly strong effect of age on the Parentheses Riddle and nothing else. 62% of people in their 20s got it right, compared to 29% of people in their 70s. This was surprising enough that it might be worth its own post. I was unable to wring anything else out of this no matter how hard I tried.

    Sounds like crystallized vs. fluid intelligence to me.

  10. Ken says:

    The results are interesting and all, but I just want to know what my own weirdness score was.

  11. Doug says:

    I think you’re overly discounting high-significance results with low correlation. There’s nothing inherently untrustworthy about low-r. It’s simply consistent with a high amount of noise overlaid on top of a latent structure. The classic example is the stock market. The vast majority of stock price movements are not predictable. The market’s not perfectly efficient, but it’s pretty efficient. Because if there’s $20 bills on sidewalks people tend to pick them up.

    Most market volatility represents true, inherently unavoidable, risk. In other words: noise. However, the market’s not perfectly efficient. There are still a few $20 bills hidden in dark corners and between cracks. Some small percent of day-to-day variance can be predicted ahead of time.

    For example we know that stocks on the day they’re announcing earnings tend to go up ever-so-slightly more than stocks not announcing earnings. We know that cheap stocks are slightly more likely to go up on any given day than expensive stocks. We know that stocks that went up a lot after they announced earnings tend to keep going up over the next day or two. We know that stocks where most recent trades were initiated by the buyer tend to go up a little bit more than stocks where most recent trades were initiated by the seller. We know that stocks making news headlines tend to keep moving in the same direction, and vice versa for stocks without news headlines. We know that very volatile stocks tend to go up less than stable stocks.

    These results are extremely well-documented. They’ve been replicated again and again. The findings have been confirmed across many different time periods, with different criteria, in different countries, and even with analogs in different asset classes. There’s no doubt these effects are real. That being said the correlation here is way less than 1%. I guarantee you’ll never find an anomaly in a major market with r>0.1. But they’re still highly significant, and there’s no doubt that they’re real. Billion dollar fortunes have been made by those who were able to recognize these effects.

    Again, that’s just how problems like this look. If you add a bunch of random noise to a dataset, then that puts a hard upper-bound on correlation. It doesn’t mean the latent structure isn’t there. The reason we use significance tests, instead of just eyeballing correlation, is because those tests are capable of cutting through this problem. Correlation is invariant to sample size. But no matter how noisy the data, we can achieve any arbitrarily high level of statistical significance (of a real effect) by using larger and larger datasets.

    My guess is that when it comes to illusion recognition there’s simply a lot of noise. (At least noise relative to the independent variables found in the survey.) Mental health and personality probably do influence illusion recognition. But things like whether you’ve seen similar illusions before, how much sleep you had that day, what size screen you’re looking at, how you interpret the question, or even just random gut feelings overwhelm the variance.

    • Scott Alexander says:

      I wouldn’t mind the low effect size as long as it was consistent across seemingly-similar problems, but it isn’t.

      It’s also so low that I really can’t tell if it’s confounded by something else. I know it’s not age because I checked, but how many other things are there like that? And even if no one of them alone explains very much, might it be all of them together?

      • Doug says:

        Without speculating too much, maybe it’s possible that certain illusions are more noise loaded than other illusions. E.g. maybe the Surgeon Riddle is just something people are very consistent on, whereas the Mask Illusion really just tends to depend on what kind of mood they’re in that day.

        In which case the latent disorder+personality structure would still exist, and have roughly the same impact on every illusion. However the variance contributed by the noise component would be much higher for the Mask Illusion, so it drowns out the effect size of the latent structure. In which case we’d expect these noisy illusions to generally have lower correlations with other illusions, as well as personality factors.

        From an experimental standpoint I see a few ways to capture this effect. 1) You could ask subjects to assign some magnitude about how confident, clear or obvious their interpretation of the illusion is. People probably will waffle more on their answers to noisy illusions, and hopefully that would reflect in their self-assessed confidence scores.

        2) You could retest the same subjects, spaced far enough apart that their memory is likely to be hazy. Noisy illusions are more likely to elicit flip-flops in same-subject responses. 3) You could compare within-family responses. By definition noise is whatever’s not captured by your independent variables, but I’m guessing in this case whatever’s driving the noise (like how much sleep you had last night) is much less loaded on heritability than the latent personality structure.

        FWIW, I’m gonna take a look at the dataset over the weekend and see if anything pops out. If you’re interested, I’ll post my findings/opinion in this thread.

  12. bbqturtle says:

    Traffic to this blog is declining.

    I work in data, and Scott, I highly encourage you to challenge this hypothesis before trying to let it shape your decisions too much. Very frequently in my job, people first come to me with a factoid like this, and the first thing I need to do is CHECK. The question I am answering is “Why does Scott think traffic is declining” not “Why is traffic declining”?

    So, without following this blog as closely as I could, let me propose several things to check before you worry about “peaking”.

    1. Has the viewership of individual posts increased while the homepage has decreased?
    2. Is the metric you are using (homepage views) highly correlated with external viewership? (Subreddit traffic, adwords/advertiser traffic). IE, do they all show a decline? ( suggests that you had a peak in 2018 of virality, but it’s been steadily growing outside of that)
    3. How has the number of monthly blog comments changed over that time? How has the number of weekly culture war thread comments changed over that time?
    4. What other benchmarks can you compare this to? How has overall wordpress traffic trended over time? Are you beating other blogs in the decline? Are more people using facebook or reddit for news, and fewer people visiting outside websites? Maybe people are tired/burnt out of political news? How does your traffic compare to other big bloggers that post at a similar frequency?
    6. Are you tallying the number of views, or the number of UNIQUE views? I imagine you are checking the number of views. But, is it possible that more people are only coming to the blog when you post a new article instead of checking back again and again to see if you have new content?
    7. If you consider your top posts in the beginning of 2018 an outlier – you went viral (which is hard to do and shouldn’t be considered a standard to hold yourself to) how is traffic doing?

    I firmly hypothesize that the above factors in some way will account for a large part of the decline.

    If traffic is declining:

    1. Is there anything significant that is different this year than last year? (Directing users to use the reddit culture war thread could cause a decline in people returning to comment) (Did you do those every-two-week off-topic posts before? Maybe people hate those or they are getting their comments there instead of in your blog posts?)
    2. Are your blog posts more accessible, meaning people don’t feel the need to re-read them as much? Are people spending less time or more time on the site when they come? Is it possible people are being reached through RSS feeds, reddit, etc, more than otherwise?
    3. Is this number going down a bad thing? I imagine you are using this to determine how many people you are reaching with your blog, which is why you blog (to be heard, to help people). Are you sure that “number of times people open your blog’s homepage” is the same metric as “number of people that have heard you this month, and appreciate what you’ve said this month”?

    So. I think you need more data. I propose a two-question survey, as a “real blog post”. “no matter where you read this from, take this survey please”. Q1: How do you learn about a new blog post? (see post on reddit, see post on other social media, refreshing my blog every day) Q2: Has that method changed in the last year? (Yes, started using reddit/social media, yes, I’m new to the blog, no, I do it the same as always)

    PS. I think you should remove the sentences regarding traffic in the beginning. It’s kind of heartbreaking to hear that your favorite blogger’s traffic is declining when you don’t really know if that’s true. A negative statement about the blog in this way could actually cause fewer people to read it. Others may disagree and enjoy your transparency, but in terms of data alone – it’s likely caused by the factors listed above.

    • BASKETBALLGUY!!!! says:

      I almost always visit via the subreddit which links directly to posts rather than to the main page, so I’m not counting toward views even though I hit up every post at least once, often multiple times.

    • cuke says:

      You’ve probably considered this — do you experience seasonal variation in your traffic and is the dip just this past couple of months? I don’t know if blog reading in general declines in summer (in the northern hemisphere). I know I’m at my computer less when the weather’s nice and that I get more cancellations in my therapy practice in the summer for the same reason.

      I’ve also noticed more news and social media fatigue in recent months — for sure by me, but also I’ve seen more comments and writing about it online. The frenzy with which some of us attended to keeping up with the drama that started (in the U.S.) with the 2016 election seems to have worn some percentage of people down. It finally led to my getting off Facebook and Twitter altogether, but it took until this past spring for me to do it. Some of my clients who’ve been talking about getting off Facebook for a year finally did it this summer. I don’t know if any of those data points are relevant to your readership.

      For me, getting off of Twitter and Facebook freed up time for me to read more long-form stuff, and that includes your blog. I go up and down in my capacity to engage with the comments section because I’m extremely conflict-averse, so sometimes I have to stay away from your whole blog for that reason.

    • Nancy Lebovitz says:

      I assumed Scott was joking about wanting to draw people back.

    • Rusty says:

      I think he is amusing himself on this topic. It sets up the joke nicely. I think Scott could easily drive up traffic and has no interest at all in doing so – hence the latest post!

    • Yosarian2 says:

      Random factoid: I never go to the home page because I have the “archives” page bookmarked as it lists everything.

  13. Deiseach says:

    Traffic to this blog is declining. I need to act decisively to draw people back.

    Does this mean that if it gets very low, you might decide it’s too much effort to keep this going?

    Anyway, about the Surgeon Riddle, I think getting the correct answer on this one is because it’s been around so long; I’ve seen it before, I saw the answer, so I knew what I was supposed to say. Unless you have a bunch of eight year olds who’ve never heard of it before, I don’t think it yields much of use, unless someone is tracking ‘sexism down the decades’ and wants to see if the underlying assumptions have changed much (okay that is useful, but the Riddle has lost its surprise value).

    Though I suppose it would be interesting to see if, in five years’ time, those eight year olds all answer “the surgeon is the second dad, duh” because of gay marriage 🙂

    • cuke says:

      I agree with Deiseach about the surgeon’s riddle. I’m 53 and first heard it when I was about 10. I remember getting it wrong when I first heard it, but then it was also 1974. I’ve encountered it maybe four or five times since then and obviously didn’t get it wrong after that, but then it was no longer indicating anything other than that I remembered the first time I heard it. You could include a warning before the riddles asking people who have heard them before not to answer.

    • Nancy Lebovitz says:

      I don’t remember when I first saw it, but I remember Hofstadter writing about it– I don’t know about your younger readers, but your mid-range to older readers may well have seen it there.

    • JG28 says:

      Created an account just to comment on the Surgeon Riddle. I think a lot of people get the answer ‘wrong’ not because of gender bias, but because of the odd syntax. “I can’t operate on this boy. He’s my son!” is a very atypical way for a mother to talk about her own child. The riddle would be more ‘accurate’, and consequently a lot less interesting, if the phrasing were updated to “I cannot operate on my own son!”

      • Robert L says:

        I think there is something in that, there is a distinctively masculine pomposity in the wording.

  14. nameless1 says:

    >more likely to prefer a buzzing-city aesthetic than a quiet-country aesthetic

    Look, preferences are predictions of utility. Like all predictions, it is fallible and people learn from failures. I like the idea of a quiet-country calming my painfully hyperactive brain down. It does not really work, though, I just get bored and get even more restless. I don’t like the idea of the city-buzz when my whole being craves calming and quieting the racing mind, but it actually does manage to get myself lost in it. I cannot really answer which one I prefer as such, preferences are predictions and I would like the idea of the quiet-country thing working because it sort of like looks itself like the calm state I wish to achieve but I know it does not work for me, so I go and choose the one that works even though I don’t really like the state of things where it works and not that.

    So basically when you are asking people about their preferences it is like asking them a question like this: you have a flu, would you prefer this Fooazine pill or that Snafuazine pill to treat it? And it cannot really be answered when you know you like the Snafuazine pill more because it has a nice blue color and is sweet, while the Fooazine looks and tastes like mouse shit, but you know from experience Fooazine helps with your symptoms of flu and Snafuazine does not. What is your “preference”, then?

  15. fion says:

    Typo: “with correlations these low” should be “with correlations this low”

  16. tailcalled says:

    Part of the inspiration for doing these tests was attempts to understand the causes of transness by linking it to NMDA hypofunction. I think it would be useful to include a question about autogynephilia and autoandrophilia in the SSC surveys for this. It would also be useful for a number of other things, e.g. knowing which influences on “gender thoughts” are driven by AGP/AAP and which are driven by other factors. In addition, I usually find that groups similar to SSC have AGP/AAP rates of about 50%, which is far higher than the population baseline, and I think it would be interesting to know whether this also applies to those who take the SSC survey and whether any factors can be identified which explain the high rates.

  17. STV says:

    How can you tell what is meta-level and what is object-level there? That only works if the only possible meta-level principle is ‘anti-violence’ (or pro-violence, I suppose).

    For the guy consistently siding against fascists, what makes ‘opposing the glorious marxist utopia is wrong’ less of a principle than ‘beating up people is wrong’? Or, for that matter, what about the meta-level anti-minority position (‘ethnostates for everyone’), that supports the beating of minorities everywhere, regardless of creed or color?

  18. shakeddown says:

    Could the age/Parentheses Riddle correlation be caused by some confounder like having taken modern format SATs, or being a programmer?

    • Harry Maurice Johnston says:

      Or perhaps stubbornness, or something along those lines. IIRC, I guessed correctly what the “right” answer was supposed to be, but still chose the “wrong” one because, well, I think basically because the “right” answer was just too darn ugly. Perhaps that’s not stubbornness. Differing sense of aesthetics?

      • Vitor says:

        I still think that there are 2 valid solutions, depending on how you look at it.

        In mathematical typesetting, parentheses are treated specially. Most symbols have a “fixed” size, which is only modified in straightforward ways (e.g. smaller font in fractions, and as sub/superscript). A pair of parentheses, however, must have a size that is relative to the expression it contains: a very tall nested fraction is surrounded by tall parentheses. In contrast, if you e.g. add two tall fractions together, the addition operator between them is still of normal size.

        Curly braces behave in similar ways to parentheses, except that they are also frequently found at the top or bottom of an expression. All of this supports a “geometric” interpretation of parentheses, and thus I find it intuitive that when you read text from right to left, an open parenthesis should actually be curved towards the left, because it is defined as a line curved towards the expression it is enclosing.

        This intuition is further supported by the fact that there are many “directional” operators in math, e.g. assignment “:=”, with its right-to-left version “=:”, as well as implication “=>” and “<=”, which can also be encountered pointing down, up or diagonally on whiteboards.

        As a computer scientist, I am well aware that it is possible to treat parentheses as a purely grammatical token in a stream of tokens. That's not how my mind sees them, though.

        Therefore, I consider both ())( and ()() as valid anagrams.

  19. Toby Bartels says:

    Disappointing results, but I'm not disappointed by your integrity in reporting them anyway.

    • John Schilling says:

      Ditto. We do get some interesting strong results from Scott’s surveys, and it would be even better if we got even more strong and interesting results. But publishing the negatives is important as well, for reasons that I think are generally understood and then generally ignored.

      • Joseph Greenwood says:


        Also, I was surprised that your readership has been declining for the last year. I would have expected it to stabilize after the controversy of the 2016 campaign died down and then experience small, fairly consistent growth since then (possibly moderated by the seasons if, say, people read more blogs in the summer than in the fall).