Book Review: Surfing Uncertainty

[Related to: It’s Bayes All The Way Up, Why Are Transgender People Immune To Optical Illusions?, Can We Link Perception And Cognition?]


Sometimes I have the fantasy of being able to glut myself on Knowledge. I imagine meeting a time traveler from 2500, who takes pity on me and gives me a book from the future where all my questions have been answered, one after another. What’s consciousness? That’s in Chapter 5. How did something arise out of nothing? Chapter 7. It all makes perfect intuitive sense and is fully vouched for by unimpeachable authorities. I assume something like this is how everyone spends their first couple of days in Heaven, whatever it is they do for the rest of Eternity.

And every so often, my fantasy comes true. Not by time travel or divine intervention, but by failing so badly at paying attention to the literature that by the time I realize people are working on a problem it’s already been investigated, experimented upon, organized into a paradigm, tested, and then placed in a nice package and wrapped up with a pretty pink bow so I can enjoy it all at once.

The predictive processing model is one of these well-wrapped packages. Unbeknownst to me, over the past decade or so neuroscientists have come up with a real theory of how the brain works – a real unifying framework theory like Darwin’s or Einstein’s – and it’s beautiful and it makes complete sense.

Surfing Uncertainty isn’t pop science and isn’t easy reading. Sometimes it’s on the border of possible-at-all reading. Author Andy Clark (a professor of logic and metaphysics, of all things!) is clearly brilliant, but prone to going on long digressions about various esoteric philosophy-of-cognitive-science debates. In particular, he’s obsessed with showing how “embodied” everything is all the time. This gets kind of awkward, since the predictive processing model isn’t really a natural match for embodiment theory, and describes a brain which is pretty embodied in some ways but not-so-embodied in others. If you want a hundred pages of apologia along the lines of “this may not look embodied, but if you squint you’ll see how super-duper embodied it really is!”, this is your book.

It’s also your book if you want to learn about predictive processing at all, since as far as I know this is the only existing book-length treatment of the subject. And it’s comprehensive, scholarly, and very good at introducing the theory and explaining why it’s so important. So let’s be grateful for what we’ve got and take a look.


Stanislas Dehaene writes of our senses:

We never see the world as our retina sees it. In fact, it would be a pretty horrible sight: a highly distorted set of light and dark pixels, blown up toward the center of the retina, masked by blood vessels, with a massive hole at the location of the “blind spot” where cables leave for the brain; the image would constantly blur and change as our gaze moved around. What we see, instead, is a three-dimensional scene, corrected for retinal defects, mended at the blind spot, stabilized for our eye and head movements, and massively reinterpreted based on our previous experience of similar visual scenes. All these operations unfold unconsciously—although many of them are so complicated that they resist computer modeling. For instance, our visual system detects the presence of shadows in the image and removes them. At a glance, our brain unconsciously infers the sources of lights and deduces the shape, opacity, reflectance, and luminance of the objects.

Predictive processing begins by asking: how does this happen? By what process do our incomprehensible sense-data get turned into a meaningful picture of the world?

The key insight: the brain is a multi-layer prediction machine. All neural processing consists of two streams: a bottom-up stream of sense data, and a top-down stream of predictions. These streams interface at each level of processing, comparing themselves to each other and adjusting themselves as necessary.

The bottom-up stream starts out as all that incomprehensible light and darkness and noise that we need to process. It gradually moves up all the cognitive layers that we already knew existed – the edge-detectors that resolve it into edges, the object-detectors that shape the edges into solid objects, et cetera.

The top-down stream starts with everything you know about the world, all your best heuristics, all your priors, everything that’s ever happened to you before – everything from “solid objects can’t pass through one another” to “e=mc^2” to “that guy in the blue uniform is probably a policeman”. It uses its knowledge of concepts to make predictions – not in the form of verbal statements, but in the form of expected sense data. It makes some guesses about what you’re going to see, hear, and feel next, and asks “Like this?” These predictions gradually move down all the cognitive layers to generate lower-level predictions. If that uniformed guy was a policeman, how would that affect the various objects in the scene? Given the answer to that question, how would it affect the distribution of edges in the scene? Given the answer to that question, how would it affect the raw-sense data received?

Both streams are probabilistic in nature. The bottom-up sensory stream has to deal with fog, static, darkness, and neural noise; it knows that whatever forms it tries to extract from this signal might or might not be real. For its part, the top-down predictive stream knows that predicting the future is inherently difficult and its models are often flawed. So both streams contain not only data but estimates of the precision of that data. A bottom-up percept of an elephant right in front of you on a clear day might be labelled “very high precision”; one of a vague form in a swirling mist far away might be labelled “very low precision”. A top-down prediction that water will be wet might be labelled “very high precision”; one that the stock market will go up might be labelled “very low precision”.

As these two streams move through the brain side-by-side, they continually interface with each other. Each level receives the predictions from the level above it and the sense data from the level below it. Then each level uses Bayes’ Theorem to integrate these two sources of probabilistic evidence as best it can. This can end up a couple of different ways.

First, the sense data and predictions may more-or-less match. In this case, the layer stays quiet, indicating “all is well”, and the higher layers never even hear about it. The higher levels just keep predicting whatever they were predicting before.

Second, low-precision sense data might contradict high-precision predictions. The Bayesian math will conclude that the predictions are still probably right, but the sense data are wrong. The lower levels will “cook the books” – rewrite the sense data to make it look as predicted – and then continue to be quiet and signal that all is well. The higher levels continue to stick to their predictions.

Third, there might be some unresolvable conflict between high-precision sense-data and predictions. The Bayesian math will indicate that the predictions are probably wrong. The neurons involved will fire, indicating “surprisal” – a gratuitously-technical neuroscience term for surprise. The higher the degree of mismatch, and the higher the supposed precision of the data that led to the mismatch, the more surprisal – and the louder the alarm sent to the higher levels.
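One way to make these three cases concrete is to treat both the prediction and the sense data as Gaussian estimates of the same quantity, with precision meaning inverse variance. That is a simplifying assumption for illustration, not necessarily the form Clark has in mind, and the function names `fuse` and `alarm` are made up for this sketch.

```python
def fuse(prediction, prediction_precision, sense, sense_precision):
    """Combine a top-down prediction with a bottom-up sense datum.

    Assumes both are Gaussian estimates of the same quantity, with
    precision = 1 / variance. The Bayesian posterior mean is then a
    precision-weighted average, and the precisions add.
    """
    total_precision = prediction_precision + sense_precision
    posterior = (prediction_precision * prediction
                 + sense_precision * sense) / total_precision
    return posterior, total_precision


def alarm(prediction, sense, sense_precision):
    """Precision-weighted prediction error, a stand-in for 'surprisal'.

    Grows with the size of the mismatch and with the precision of the
    sense data that produced it.
    """
    return sense_precision * abs(sense - prediction)


# Second case: vague sense data (precision 1) against a confident
# prediction (precision 9). The result stays near the prediction.
print(fuse(0.0, 9.0, 10.0, 1.0))

# Third case: sharp sense data (precision 9) against a weak prediction
# (precision 1). The result moves to the data, and the alarm is louder.
print(fuse(0.0, 1.0, 10.0, 9.0))
print(alarm(0.0, 10.0, 1.0), alarm(0.0, 10.0, 9.0))
```

In this toy version, "cooking the books" is just what happens when the prediction's weight dominates the average: the posterior lands almost where the prediction already was.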

When the higher levels receive the alarms from the lower levels, this is their equivalent of bottom-up sense-data. They ask themselves: “Did the even-higher-levels predict this would happen?” If so, they themselves stay quiet. If not, they might try to change their own models that map higher-level predictions to lower-level sense data. Or they might try to cook the books themselves to smooth over the discrepancy. If none of this works, they send alarms to the even-higher-levels.

All the levels really hate hearing alarms. Their goal is to minimize surprisal – to become so good at predicting the world (conditional on the predictions sent by higher levels) that nothing ever surprises them. Surprise prompts a frenzy of activity adjusting the parameters of models – or deploying new models – until the surprise stops.

All of this happens several times a second. The lower levels constantly shoot sense data at the upper levels, which constantly adjust their hypotheses and shoot them down at the lower levels. When surprise is registered, the relevant levels change their hypotheses or pass the buck upwards. After umpteen zillion cycles, everyone has the right hypotheses, nobody is surprised by anything, and the brain rests and moves on to the next task. As per the book:

To deal rapidly and fluently with an uncertain and noisy world, brains like ours have become masters of prediction – surfing the waves of noisy and ambiguous sensory stimulation by, in effect, trying to stay just ahead of them. A skilled surfer stays ‘in the pocket’: close to, yet just ahead of the place where the wave is breaking. This provides power and, when the wave breaks, it does not catch her. The brain’s task is not dissimilar. By constantly attempting to predict the incoming sensory signal we become able – in ways we shall soon explore in detail – to learn about the world around us and to engage that world in thought and action.

The result is perception, which the PP theory describes as “controlled hallucination”. You’re not seeing the world as it is, exactly. You’re seeing your predictions about the world, cashed out as expected sensations, then shaped/constrained by the actual sense data.


Enough talk. Let’s give some examples. Most of you have probably seen these before, but it never hurts to remind:

This demonstrates the degree to which the brain depends on top-down hypotheses to make sense of the bottom-up data. To most people, these two pictures start off looking like incoherent blotches of light and darkness. Once they figure out what they are (spoiler) the scene becomes obvious and coherent. According to the predictive processing model, this is how we perceive everything all the time – except usually the concepts necessary to make the scene fit together come from our higher-level predictions instead of from clicking on a spoiler link.

This demonstrates how the top-down stream’s efforts to shape the bottom-up stream and make it more coherent can sometimes “cook the books” and alter sensation entirely. The real picture says “PARIS IN THE THE SPRINGTIME” (note the duplicated word “the”!). The top-down stream predicts this should be a meaningful sentence that obeys English grammar, and so replaces the the bottom-up stream with what it thinks that it should have said. This is a very powerful process – how many times have I repeated the the word “the” in this paragraph alone without you noticing?

A more ambiguous example of “perception as controlled hallucination”. Here your experience doesn’t quite deny the jumbled-up nature of the letters, but it superimposes a “better” and more coherent experience which appears naturally alongside.

Next up – this low-quality video of an airplane flying at night. Notice how after an instant, you start to predict the movement and characteristics of the airplane, so that you’re no longer surprised by the blinking light, the movement, the other blinking light, the camera shakiness, or anything like that – in fact, if the light stopped blinking, you would be surprised, even though naively nothing could be less surprising than a dark portion of the night sky staying dark. After a few seconds of this, the airplane continuing on its (pretty complicated) way just reads as “same old, same old”. Then when something else happens – like the camera panning out, or the airplane making a slight change in trajectory – you focus entirely on that, the blinking lights and movement entirely forgotten or at least packed up into “airplane continues on its blinky way”. Meanwhile, other things – like the feeling of your shirt against your skin – have been completely predicted away and blocked from consciousness, freeing you to concentrate entirely on any subtle changes in the airplane’s motion.

In the same vein: this is Rick Astley’s “Never Gonna Give You Up” repeated again and again for ten hours (you can find some weird stuff on YouTube). The first hour, maybe you find yourself humming along occasionally. By the second hour, maybe it’s gotten kind of annoying. By the third hour, you’ve completely forgotten it’s even on at all.

But suppose that one time, somewhere around the sixth hour, it skipped two notes – just the two syllables “never”, so that Rick said “Gonna give you up.” Wouldn’t the silence where those two syllables should be sound as jarring as if somebody set off a bomb right beside you? Your brain, having predicted sounds consistent with “Never Gonna Give You Up” going on forever, suddenly finds its expectations violated and sends all sorts of alarms to the higher levels, where they eventually reach your consciousness and make you go “What the heck?”


Okay. You’ve read a lot of words. You’ve looked at a lot of pictures. You’ve listened to “Never Gonna Give You Up” for ten hours. Time for the payoff. Let’s use this theory to explain everything.

1. Attention. In PP, attention measures “the confidence interval of your predictions”. Sense-data within the confidence intervals counts as a match and doesn’t register surprisal. Sense-data outside the confidence intervals fails and alerts higher levels and eventually consciousness.

This modulates the balance between the top-down and bottom-up streams. High attention means that perception is mostly based on the bottom-up stream, since every little deviation is registering an error and so the overall perceptual picture is highly constrained by sensation. Low attention means that perception is mostly based on the top-down stream, and you’re perceiving only a vague outline of the sensory image with your predictions filling in the rest.
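A toy version of this, assuming a single number stands in for a whole percept and that the confidence interval is a simple symmetric band around the prediction (both are illustrative assumptions, and the function names are hypothetical):

```python
def registers_surprisal(predicted, observed, interval_halfwidth):
    """True if the observation falls outside the prediction's interval."""
    return abs(observed - predicted) > interval_halfwidth


def percept(predicted, observed, interval_halfwidth):
    """What ends up being perceived in this toy model.

    Inside the interval, the observation counts as a match and the
    prediction fills in the picture. Outside it, the error gets through
    and perception follows the sense data.
    """
    if registers_surprisal(predicted, observed, interval_halfwidth):
        return observed
    return predicted


# Low attention: a wide interval, so the deviation is smoothed away.
print(percept(predicted=0.0, observed=0.6, interval_halfwidth=1.0))

# High attention: a narrow interval, so the same deviation is perceived.
print(percept(predicted=0.0, observed=0.6, interval_halfwidth=0.1))
```

The same sense datum gives two different percepts; the only thing that changed is how wide a band the prediction was allowed.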

There’s a famous experiment which you can try below – if you’re trying it, make sure to play the whole video before moving on:

About half of subjects, told to watch the players passing the ball, don’t notice the gorilla. Their view of the ball-passing is closely constrained by the bottom-up stream; they see mostly what is there. But their view of the gorilla is mostly dependent on the top-down stream. Their confidence intervals are wide. Somewhere in your brain is a neuron saying “is that a guy in a gorilla suit?” Then it consults the top-down stream, which says “This is a basketball game, you moron”, and it smooths out the anomalous perception into something that makes sense like another basketball player.

But if you watch the video with the prompt “Look for something strange happening in the midst of all this basketball-playing”, you see the gorilla immediately. Your confidence intervals for unusual things are razor-thin; as soon as that neuron sees the gorilla it sends alarms to higher levels, and the higher levels quickly come up with a suitable hypothesis (“there’s a guy in a gorilla suit here”) which makes sense of the new data.

There’s an interesting analogy to vision here, where the center of your vision is very clear, and the outsides are filled in in a top-down way – I have a vague sense that my water bottle is in the periphery right now, but only because I kind of already know that, and it’s more of a mental note of “water bottle here as long as you ask no further questions” than a clear image of it. The extreme version of this is the blind spot, which gets filled in entirely with predicted imagery despite receiving no sensation at all.

2. Imagination, Simulation, Dreaming, Etc. Imagine a house. Now imagine a meteor crashing into the house. Your internal mental simulation was probably pretty good. Without even thinking about it, you got it to obey accurate physical laws like “the meteor continues on a constant trajectory”, “the impact happens in a realistic way”, “the impact shatters the meteorite”, and “the meteorite doesn’t bounce back up to space like a basketball”. Think how surprising this is.

In fact, think how surprising it is that you can imagine the house at all. This really high level concept – “house” – has been transformed in your visual imaginarium into a pretty good picture of a house, complete with various features, edges, colors, et cetera (if it hasn’t, read here). This is near-miraculous. Why do our brains have this apparently useless talent?

PP says that the highest levels of our brain make predictions in the form of sense data. They’re not just saying “I predict that guy over there is a policeman”, they’re generating the image of a policeman, cashing it out in terms of sense data, and colliding it against the sensory stream to see how it fits. The sensory stream gradually modulates it to fit the bottom-up evidence – a white or black policeman, a mustached or clean-shaven policeman. But the top-down stream is doing a lot of the work here. We are able to imagine the meteor, using the same machinery that would guide our perception of the meteor if we saw it up in the sky.

All of this goes double for dreaming. If “perception is controlled hallucination” caused by the top-down drivers of perception constrained by bottom-up evidence, then dreams are those top-down drivers playing around with themselves unconstrained by anything at all (or else very weakly constrained by bottom-up evidence, like when it’s really cold in your bedroom and you dream you’re exploring the North Pole).

A lot of people claim higher levels of this – lucid dreaming, astral projection, you name it, worlds exactly as convincing as our own but entirely imaginary. Predictive processing is very sympathetic to these accounts. The generative models that create predictions are really good; they can simulate the world well enough that it rarely surprises us. They also connect through various layers to our bottom-level perceptual apparatus, cashing out their predictions in terms of the lowest-level sensory signals. Given that we’ve got a top-notch world-simulator plus perception-generator in our heads, it shouldn’t be surprising when we occasionally perceive ourselves in simulated worlds.

3. Priming. I don’t mean the weird made-up kinds of priming that don’t replicate. I mean the very firmly established ones, like the one where, if you flash the word “DOCTOR” at a subject, they’ll be much faster and more skillful in decoding a series of jumbled and blurred letters into the word “NURSE”.

This is classic predictive processing. The top-down stream’s whole job is to assist the bottom-up stream in making sense of complicated fuzzy sensory data. After it hears the word “DOCTOR”, the top-down stream is already thinking “Okay, so we’re talking about health care professionals”. This creeps through all the lower levels as a prior for health-care related things; when the sense organs receive data that can be associated in a health-care related manner, the high prior helps increase the precision of this possibility until it immediately becomes the overwhelming leading hypothesis.

4. Learning. There’s a philosophical debate – which I’m not too familiar with, so sorry if I get it wrong – about how “unsupervised learning” is possible. Supervised learning is when an agent tries various stuff, and then someone tells the agent if it’s right or wrong. Unsupervised learning is when nobody’s around to tell you, and it’s what humans do all the time.

PP offers a compelling explanation: we create models that generate sense data, and keep those models if the generated sense data match observation. Models that predict sense data well stick around; models that fail to predict the sense data accurately get thrown out. Because of all those lower layers adjusting out contingent features of the sensory stream, any given model is left with exactly the sense data necessary to tell it whether it’s right or wrong.
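As a minimal sketch of "keep the models whose generated sense data match observation": the candidate models, the falling-object data, and the use of mean squared error as the score are all invented here for illustration. The point is only that the error signal comes from the sense data itself, with no teacher supplying right answers.

```python
def prediction_error(model, observations):
    """Mean squared gap between what the model generates and what was sensed."""
    errors = [(model(t) - sensed) ** 2 for t, sensed in observations]
    return sum(errors) / len(errors)


def keep_best(models, observations):
    """Name of the model whose generated sense data best match observation."""
    return min(models, key=lambda name: prediction_error(models[name], observations))


# Hypothetical sense data: height fallen by a dropped object at t = 0..4.
observations = [(t, 5.0 * t * t) for t in range(5)]

# Three made-up generative models of the same scene.
models = {
    "stays put": lambda t: 0.0,
    "constant speed": lambda t: 10.0 * t,
    "accelerates": lambda t: 5.0 * t * t,
}

print(keep_best(models, observations))  # accelerates
```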

PP isn’t exactly blank slatist, but it’s compatible with a slate that’s pretty fricking blank. Clark discusses “hyperpriors” – extremely basic assumptions about the world that we probably need to make sense of anything at all. For example, one hyperprior is sensory synchronicity – the idea that our five different senses are describing the same world, and that the stereo we see might be the source of the music we hear. Another hyperprior is object permanence – the idea that the world is divided into specific objects that stick around whether or not they’re in the sensory field. Clark says that some hyperpriors might be innate – but says they don’t have to be, since PP is strong enough to learn them on its own if it has to. For example, after enough examples of, say, seeing a stereo being smashed with a hammer at the same time that music suddenly stops, the brain can infer that connecting the visual and auditory evidence together is a useful hack that helps it to predict the sensory stream.

I can’t help thinking here of Molyneux’s Problem, a thought experiment about a blind-from-birth person who navigates the world through touch alone. If suddenly given sight, could the blind person naturally connect the visual appearance of a cube to her own concept “cube”, which she derived from the way cubes feel? In 2003, some researchers took advantage of a new cutting-edge blindness treatment to test this out; they found that no, the link isn’t intuitively obvious to them. Score one for learned hyperpriors.

But learning goes all the way from these kinds of really basic hyperpriors all the way up to normal learning like what the capital of France is – which, if nothing else, helps predict what’s going to be on the other side of your geography flashcard, and which high-level systems might keep as a useful concept to help it make sense of the world and predict events.

5. Motor Behavior. About a third of Surfing Uncertainty is on the motor system, it mostly didn’t seem that interesting to me, and I don’t have time to do it justice here (I might make another post on one especially interesting point). But this has been kind of ignored so far. If the brain is mostly just in the business of making predictions, what exactly is the motor system doing?

Based on a bunch of really excellent experiments that I don’t have time to describe here, Clark concludes: it’s predicting action, which causes the action to happen.

This part is almost funny. Remember, the brain really hates prediction error and does its best to minimize it. With failed predictions about eg vision, there’s not much you can do except change your models and try to predict better next time. But with predictions about proprioceptive sense data (ie your sense of where your joints are), there’s an easy way to resolve prediction error: just move your joints so they match the prediction. So (and I’m asserting this, but see Chapters 4 and 5 of the book to hear the scientific case for this position) if you want to lift your arm, your brain just predicts really really strongly that your arm has been lifted, and then lets the lower levels’ drive to minimize prediction error do the rest.

Under this model, the “prediction” of a movement isn’t just the idle thought that a movement might occur, it’s the actual motor program. This gets unpacked at all the various layers – joint sense, proprioception, the exact tension level of various muscles – and finally ends up in a particular fluid movement:

Friston and colleagues…suggest that precise proprioceptive predictions directly elicit motor actions. This means that motor commands have been replaced by (or as I would rather say, implemented by) proprioceptive predictions. According to active inference, the agent moves body and sensors in ways that amount to actively seeking out the sensory consequences that their brains expect. Perception, cognition, and action – if this unifying perspective proves correct – work together to minimize sensory prediction errors by selectively sampling and actively sculpting the stimulus array. This erases any fundamental computational line between perception and the control of action. There remains [only] an obvious difference in direction of fit. Perception here matches neural hypotheses to sensory inputs…while action brings unfolding proprioceptive inputs into line with neural predictions. The difference, as Anscombe famously remarked, is akin to that between consulting a shopping list (thus letting the list determine the contents of the shopping basket) and listing some actually purchased items (thus letting the contents of the shopping basket determine the list). But despite the difference in direction of fit, the underlying form of the neural computations is now revealed as the same.
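The "same computation, opposite direction of fit" idea can be sketched in a few lines. This assumes a single joint angle and a simple proportional update, which is far cruder than anything in the active inference literature; the function names and the rate of 0.5 are made up for the example.

```python
def perceive(belief, sensed, rate=0.5):
    """Perception: shift the hypothesis toward the sensory input."""
    return belief + rate * (sensed - belief)


def act(arm_angle, predicted_angle, rate=0.5):
    """Action: shift the arm (and so the proprioceptive input) toward the prediction."""
    return arm_angle + rate * (predicted_angle - arm_angle)


# Perception: the world says 90 degrees, and the belief catches up.
belief = 0.0
for _ in range(10):
    belief = perceive(belief, sensed=90.0)

# Action: the brain "predicts" 90 degrees, and the arm catches up.
arm = 0.0
for _ in range(10):
    arm = act(arm, predicted_angle=90.0)

print(round(belief, 2), round(arm, 2))
```

Both loops shrink the same prediction error with the same arithmetic. The only difference is which side of the mismatch gets changed: the list or the shopping basket.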

6. Tickling Yourself. One consequence of the PP model is that organisms are continually adjusting out their own actions. For example, if you’re trying to predict the movement of an antelope you’re chasing across the visual field, you need to adjust out the up-down motion of your own running. So one “hyperprior” that the body probably learns pretty early is that if it itself makes a motion, it should expect to feel the consequences of that motion.

There’s a really interesting illusion called the force-matching task. A researcher exerts some force against a subject, then asks the subject to exert exactly that much force against something else. Subjects’ forces are usually biased upwards – they exert more force than they were supposed to – probably because their brain’s prediction engines are “cancelling out” their own force. Clark describes one interesting implication:

The same pair of mechanisms (forward-model-based prediction and the dampening of resulting well-predicted sensation) have been invoked to explain the unsettling phenomenon of ‘force escalation’. In force escalation, physical exchanges (playground fights being the most common exemplar) mutually ramp up via a kind of step-ladder effect in which each person believes the other one hit them harder. Shergill et al describe experiments that suggest that in such cases each person is truthfully reporting their own sensations, but that those sensations are skewed by the attenuating effects of self-prediction. Thus, ‘self-generated forces are perceived as weaker than externally generated forces of equal magnitude.’

This also explains why you can’t tickle yourself – your body predicts and adjusts away your own actions, leaving only an attenuated version.
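Force escalation falls out of this almost mechanically. The sketch below assumes self-generated force is felt at a fixed fraction of its true strength; the linear form and the value 0.5 are invented for illustration and are not figures from Shergill et al.

```python
def force_needed_to_match(felt_force, attenuation):
    """Force a person actually applies when trying to match what they felt.

    Their own force is perceived as attenuation * actual, so they keep
    pushing until attenuation * actual equals the force they felt.
    """
    return felt_force / attenuation


def escalate(first_force, exchanges, attenuation=0.5):
    """Forces delivered in a tit-for-tat where each side 'matches' the last blow."""
    forces = [first_force]
    for _ in range(exchanges):
        forces.append(force_needed_to_match(forces[-1], attenuation))
    return forces


print(escalate(1.0, 3))  # [1.0, 2.0, 4.0, 8.0]
```

Each person honestly believes they are returning exactly what they received, and the forces still double every round. With an attenuation of 1.0 (no self-cancelling) the exchange stays flat.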

7. The Placebo Effect. We hear a lot about “pain gating” in the spine, but the PP model does a good job of explaining what this is: adjusting pain based on top-down priors. If you believe you should be in pain, the brain will use that as a filter to interpret ambiguous low-precision pain signals. If you believe you shouldn’t, the brain will be more likely to assume ambiguous low-precision pain signals are a mistake. So if you take a pill that doctors assure you will cure your pain, then your lower layers are more likely to interpret pain signals as noise, “cook the books” and prevent them from reaching your consciousness.

Psychosomatic pain is the opposite of this; see Section 7.10 of the book for a fuller explanation.

8. Asch Conformity Experiment. More speculative, and not from the book. But remember this one? A psychologist asked subjects which lines were the same length as other lines. The lines were all kind of similar lengths, but most subjects were still able to get the right answer. Then he put the subjects in a group with confederates; all of the confederates gave the same wrong answer. When the subject’s turn came, they would often disbelieve their eyes and give the same wrong answer as the confederates.

The bottom-up stream provided some ambiguous low-precision evidence pointing toward one line. But in the final Bayesian computation, that evidence was swamped by the strong top-down prediction that it would be another. So the middle layers “cooked the books” and replaced the perceived sensation with the predicted one. From Wikipedia:

Participants who conformed to the majority on at least 50% of trials reported reacting with what Asch called a “distortion of perception”. These participants, who made up a distinct minority (only 12 subjects), expressed the belief that the confederates’ answers were correct, and were apparently unaware that the majority were giving incorrect answers.

9. Neurochemistry. PP offers a way to a psychopharmacological holy grail – an explanation of what different neurotransmitters really mean, on a human-comprehensible level. Previous attempts to do this, like “dopamine represents reward, serotonin represents calmness”, have been so wildly inadequate that the whole question seems kind of disreputable these days.

But as per PP, the NMDA glutamatergic system mostly carries the top-down stream, the AMPA glutamatergic system mostly carries the bottom-up stream, and dopamine mostly carries something related to precision, confidence intervals, and surprisal levels. This matches a lot of observational data in a weirdly consistent way – for example, it doesn’t take a lot of imagination to think of the slow, hesitant movements of Parkinson’s disease as having “low motor confidence”.

10. Autism. Various research in the PP tradition has coalesced around the idea of autism as an unusually high reliance on bottom-up rather than top-down information, leading to “weak central coherence” and constant surprisal as the sensory data fails to fall within pathologically narrow confidence intervals.

Autistic people classically can’t stand tags on clothing – they find them too scratchy and annoying. Remember the example from Part III about how you successfully predicted away the feeling of the shirt on your back, and so manage never to think about it when you’re trying to concentrate on more important things? Autistic people can’t do that as well. Even though they have a layer in their brain predicting “will continue to feel shirt”, the prediction is too precise; it predicts that next second, the shirt will produce exactly the same pattern of sensations it does now. But realistically as you move around or catch passing breezes the shirt will change ever so slightly – at which point autistic people’s brains will send alarms all the way up to consciousness, and they’ll perceive it as “my shirt is annoying”.

Or consider the classic autistic demand for routine, and misery as soon as the routine is disrupted. Because their brains can only make very precise predictions, the slightest disruption to routine registers as strong surprisal, strong prediction failure, and “oh no, all of my models have failed, nothing is true, anything is possible!” Compare to a neurotypical person in the same situation, who would just relax their confidence intervals a little bit and say “Okay, this is basically 99% like a normal day, whatever”. It would take something genuinely unpredictable – like being thrown on an unexplored continent or something – to give these people the same feeling of surprise and unpredictability.

This model also predicts autistic people’s strengths. We know that polygenic risk for autism is positively associated with IQ. This would make sense if the central feature of autism was a sort of increased mental precision. It would also help explain why autistic people seem to excel in high-need-for-precision areas like mathematics and computer programming.

11. Schizophrenia. Converging lines of research suggest this also involves weak priors, apparently at a different level to autism and with different results after various compensatory mechanisms have had their chance to kick in. One especially interesting study asked neurotypicals and schizophrenics to follow a moving light, much like the airplane video in Part III above. When the light moved in a predictable pattern, the neurotypicals were much better at tracking it; when it was a deliberately perverse video specifically designed to frustrate expectations, the schizophrenics actually did better. This suggests that neurotypicals were guided by correct top-down priors about where the light would be going; schizophrenics had very weak priors and so weren’t really guided very well, but also didn’t screw up when the light did something unpredictable. Schizophrenics are also famous for not being fooled by the “hollow mask” (below) and other illusions where top-down predictions falsely constrain bottom-up evidence. My guess is they’d be more likely to see both ‘the’s in the “PARIS IN THE THE SPRINGTIME” image above.

The exact route from this sort of thing to schizophrenia is really complicated, and anyone interested should check out Section 2.12 and the whole of Chapter 7 from the book. But the basic story is that it creates waves of anomalous prediction error and surprisal, leading to the so-called “delusions of significance” where schizophrenics believe that eg the fact that someone is wearing a hat is some sort of incredibly important cosmic message. Schizophrenics’ brains try to produce hypotheses that explain all of these prediction errors and reduce surprise – which is impossible, because the prediction errors are random. This results in incredibly weird hypotheses, and eventually in schizophrenic brains being willing to ignore the bottom-up stream entirely – hence hallucinations.

All this is treated with antipsychotics, which antagonize dopamine, which – remember – represents confidence level. So basically the medication is telling the brain “YOU CAN IGNORE ALL THIS PREDICTION ERROR, EVERYTHING YOU’RE PERCEIVING IS TOTALLY GARBAGE SPURIOUS DATA” – which turns out to be exactly the message it needs to hear.

An interesting corollary of all this – because all of schizophrenics’ predictive models are so screwy, they lose the ability to use the “adjust away the consequences of your own actions” hack discussed in Part 5 of this section. That means their own actions don’t get predicted out, and seem like the actions of a foreign agent. This is why they get so-called “delusions of agency”, like “the government beamed that thought into my brain” or “aliens caused my arm to move just now”. And in case you were wondering – yes, schizophrenics can tickle themselves.

12. Everything else. I can’t possibly do justice to the whole of Surfing Uncertainty, which includes sections in which it provides lucid and compelling PP-based explanations of hallucinations, binocular rivalry, conflict escalation, and various optical illusions. More speculatively, I can think of really interesting connections to things like phantom limbs, creativity (and its association with certain mental disorders), depression, meditation, etc, etc, etc.

The general rule in psychiatry is: if you think you’ve found a theory that explains everything, diagnose yourself with mania and check yourself into the hospital. Maybe I’m not at that point yet – for example, I don’t think PP does anything to explain what mania itself is. But I’m pretty close.


This is a really poor book review of Surfing Uncertainty, because I only partly understood it. I’m leaving out a lot of stuff about the motor system, debate over philosophical concepts with names like “enactivism”, descriptions of how neurons form and unform coalitions, and of course a hundred pages of apologia along the lines of “this may not look embodied, but if you squint you’ll see how super-duper embodied it really is!”. As I reread and hopefully come to understand some of this better, it might show up in future posts.

But speaking of philosophical debates, there’s one thing that really struck me about the PP model.

Voodoo psychology suggests that culture and expectation tyrannically shape our perceptions. Taken to an extreme, objective knowledge is impossible, since all our sense-data is filtered through our own bias. Taken to a very far extreme, we get things like What The !@#$ Do We Know?’s claim that the Native Americans literally couldn’t see Columbus’ ships, because they had no concept of “caravel” and so the percept just failed to register. This sort of thing tends to end by arguing that science was invented by straight white men, and so probably just reflects straight white maleness, and so we should ignore it completely and go frolic in the forest or something.

Predictive processing is sympathetic to all this. It takes all of this stuff like priming and the placebo effect, and it predicts it handily. But it doesn’t give up. It (theoretically) puts it all on a sound mathematical footing, explaining exactly how much our expectations should shape our reality, and in which ways our expectation should shape our reality. I feel like someone armed with predictive processing and a bit of luck should have been able to predict that placebo effect and basic priming would work, but stereotype threat and social priming wouldn’t. Maybe this is total retrodictive cheating. But I feel like it should be possible.

If this is true, it gives us more confidence that our perceptions should correspond – at least a little – to the external world. We can accept that we may be misreading “PARIS IN THE THE SPRINGTIME” while remaining confident that we wouldn’t misread “PARIS IN THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE SPRINGTIME” as containing only one “the”. Top-down processing very occasionally meddles in bottom-up sensation, but (as long as you’re not schizophrenic), it sticks to an advisory role rather than being able to steamroll over arbitrary amounts of reality.
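
The "advisory role" can be put in toy Bayesian terms. In the sketch below, the prior favors the ordinary one-"the" phrase, and each extra "the" actually on the page counts as a fixed amount of bottom-up evidence against that reading. The 20:1 prior odds and the 5:1 likelihood ratio are numbers I made up, and the function name is hypothetical; the only point is the shape of the curve.

```python
import math

def prob_reads_single_the(extra_thes, prior_odds=20.0, likelihood_ratio=5.0):
    """Posterior probability of the 'only one the' reading.

    prior_odds: assumed top-down odds in favor of the normal phrase.
    likelihood_ratio: assumed strength of the bottom-up evidence
    contributed by each extra 'the' on the page.
    """
    log_odds = math.log(prior_odds) - extra_thes * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

for extra in (1, 2, 5, 53):
    print(extra, prob_reads_single_the(extra))
```

With one extra "the", the prior still wins and the misreading is the likelier percept. By a handful of extras the evidence has overwhelmed it, and by a few dozen the probability of misreading is indistinguishable from zero.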

The rationalist project is overcoming bias, and that requires both an admission that bias is possible, and a hope that there’s something other than bias which we can latch onto as a guide. Predictive processing gives us more confidence in both, and helps provide a convincing framework we can use to figure out what’s going on at all levels of cognition.

271 Responses to Book Review: Surfing Uncertainty

  1. doubleunplussed says:

    Brain is a Kalman filter. Got it.

    • Peffern says:

      You know, I’ve been thinking along these lines for a while, and I never put that together.

    • alexschernyshev says:

      Huh. I remember thinking “Oh this is neat” when I was implementing it in code for a small project. Instead, apparently, I should have been thinking “Oh this is the way I’m thinking”. And feeling. And doing.

    • Michael Naunton says:

      Well, perhaps a stack of Kalman filters with non-linear interconnections and some form of learned linear models in the filters, possibly based on the “surprisal” idea.

    • humantradebot says:

      …or Kalman filter is basic model to simulate the human brain. I mean the brain came first.

    • Controls Freak says:

      No no no! Brain is a bunch of cost functions. Either way, it’s definitely a computer. Either that, or it’s part telegraph, part complex machine, part hydraulics.

      Truth is, I hope various abstractions are useful for a while in psychiatric practice, but I have very little hope that they’re fundamentally true. One of my favorite lines in any paper is the concluding line of this review paper on locomotion:

      The more we dig into the details of these sensorimotor interactions, the more it seems improbable that they should work so smoothly, but they do.

      My own take is similar to my take on why many biological systems seem able to take a bunch of nonlinear components and end up with a system that behaves relatively linearly. It’s not that biology is good at going from nonlinear to linear; it’s that our linear approximations are really really good at modeling sufficiently-abstracted dynamics.

  2. srconstantin says:

    A couple comments:

    1. This suggests that people are natural coherentists, in the philosophical sense. We don’t build up all our beliefs from some kind of totally reliable source, like “all our knowledge is derived from sense experience” or “all our knowledge is derived from a priori reason”; instead we test our tentative beliefs to see if they’re consistent with the other stuff we know, believe, and perceive.

    2. There’s something suggestive about generative adversarial networks that seems to be going on here, but they’re clearly not quite the same thing.

    3. The model would also predict, more specifically, autistic people’s sensory strengths. There’s a lot of perfect pitch, hyperacute hearing or smell, and so on. Also more ability to detect small details.

    4. This really isn’t a model of much more than sensory and motor processing, as you’ve presented it. Symbolic logic, how we define categories, and emotions aren’t in here.

    • Scott Alexander says:

      Surfing Uncertainty focused on sensory and motor processing, but I think we probably use the same machinery for cognition and reasoning. Obviously the schizophrenia example is suggestive of this, since that includes some very bizarre reasoning errors. And it seems to me (will need to justify this further later) that autistic people (eg especially some extreme libertarians) relying on their explicit reasoning rather than on popular social consensus is somehow aligned with more bottom-up and less top-down.

      I didn’t include it here because it’s my own speculation and not in Surfing Uncertainty, and I’ll need to write a post justifying it further, but it seems to me like emotions (at least happiness and sadness) are probably global filters that increase/decrease confidence levels of predictions.

      • ScarecrowBoat716 says:

        Wouldn’t emotions be a reaction to certain predictions? If you are surprised by something positive, you feel happy. Surprised by something negative, you feel sad. We know that one’s happiness depends on how happy others around them are, so the brain is constantly changing its standard on what makes it happy.

        I’ve always thought depression is several cognitive disorders with the same general symptoms, and after reading this I bet it all ties back to PP. Perhaps some people with depression over-predict that good things will happen and are constantly surprised that they don’t. Or maybe their surprise function doesn’t work at all so positive changes don’t cause happiness, which itself might be a surprise at the meta-conscious-level and exacerbate the depression (“Things in my life are going so well, I can’t understand why I’m depressed”).

        • jamii says:

          An idea I’ve seen elsewhere is that it’s a reaction to too much stress without emotional reward. So your brain is adjusting downwards the predicted value of applying effort until eventually it doesn’t seem worth getting out of bed anymore.

        • Nancy Lebovitz says:

          Have an anecdote about a woman with bad depression (eventually properly treated when bipolar 2* was identified): I was with her in a car. We saw that there was a fireworks display and stopped to watch it.

          It seemed to me that she enjoyed it, but when it was over, her pleasure just evaporated as though it never happened. Maybe problems with remembering pleasure are related to problems with imagining pleasure?

          *Bipolar 2 is a form of bipolar which has really tiny amounts of mania. It is easily mistaken for ordinary depression, but meds for ordinary depression make bipolar 2 depression worse. I’ve never heard a theory for why that happens.

        • lsmel says:

          Stanford neuroendocrinologist Robert Sapolsky’s recent book ‘Behave’ talks about the role of emotion in cognition – long story short is that it does actually serve a useful predictive purpose. The book deserves its own reading, and judging by this review I suspect it will be a useful complement to Surfing Uncertainty. Cliff notes: Sapolsky tries to use the biology of the brain to build a decent set of biological explanations for human behaviour.

          For the time-poor, Sapolsky also did a pretty good podcast with Sam Harris which touched on the basics of the book then some Samharrisy stuff that is tangentially related.

      • Randy M says:

        emotions (at least happiness and sadness) are probably global filters that increase/decrease confidence levels of predictions.

        Very interesting way to put it. Makes a lot of sense. Internal reasoning being something to the effect of “Recent experiences have been net positive. In case of ambiguity, decide in the positive direction.” If I’m happy, your comment was a joke, if I’m sad it was a veiled insult. After a good night’s sleep, moods mostly reset to baseline (not neutral!) because enough time has passed that we shouldn’t judge the surroundings by the tone of other recent experiences. Over time, though, moods could perhaps become personalities, if not major shifts, subtle changes towards being generally more cautious or forgiving.
        And of course, moods as perception filters could be more or less adaptive for an environment, so a genetic component is not unreasonable.
        Maybe it’s over fitting a theory, but I like it.

      • Kaj Sotala says:

        but it seems to me like emotions (at least happiness and sadness) are probably global filters that increase/decrease confidence levels of predictions.

        Related paper (well, chapter): Schwarz 2010, Feelings-as-Information Theory

        The theory further predicts that feelings or environmental cues that signal a “problematic” situation foster an analytic, bottom-up processing style with considerable attention to details, whereas feelings or environmental cues that signal a “benign” situation allow for a less effortful, top-down processing style and the exploration of novel (and potentially risky) solutions (Schwarz, 1990, 2002). This does not imply that people in a happy mood, for example, are unable or unwilling to engage in analytic processing (in contrast to what an earlier version of the theory suggested; Schwarz & Bless, 1991). Instead, it merely implies that happy feelings (and other “benign” signals) do not convey a need to do so; when task demands or current goals require bottom-up processing, happy individuals are able and willing to engage in it. A study that addressed the influence of moods on people’s reliance on scripts (Schank & Abelson, 1977) illustrates this point.

        Employing a dual-task paradigm, Bless, Clore, et al. (1996) had participants listen to a tape-recorded restaurant story that contained script consistent and script inconsistent information. While listening to the story, participants also worked on a concentration test that required detail-oriented processing; in contrast, the restaurant story could be understood by engaging either in script-driven top-down processing or in data-driven bottom-up processing. Happy participants relied on the script, as indicated by the classic pattern of schema guided memory: they were likely to recognize previously heard script-inconsistent information, but also showed high rates of intrusion errors in form of erroneous recognition of script-consistent information. Neither of these effects was obtained for sad participants, indicating that they were less likely to draw on the script to begin with. Given that top-down processing is less taxing than bottom-up processing, we may further expect that happy participants’ reliance on the script allows them to do better on a secondary task. Confirming this prediction, happy participants outperformed sad participants on the concentration test. In combination, these findings indicate that moods influence the spontaneously adopted processing style under conditions where different processing styles are compatible with the individual’s goals and task demands, as was the case for comprehending the restaurant story. Under these conditions, sad individuals are likely to spontaneously adopt a systematic, bottom-up strategy, whereas happy individuals rely on a less effortful top-down strategy. But when task demands (like a concentration test) or explicit instructions (e.g., Bless et al., 1990) require detail-oriented processing, happy individuals are able and willing to engage in the effort. […]

        We can form impressions of others by attending to their specific behaviors (bottom-up processing) or by drawing on stereotypic knowledge about social categories (top-down processing). Reiterating the observations from persuasion research, perceivers in a sad mood are more likely to elaborate individuating information about the target person, whereas perceivers in a happy mood are more likely to draw on the person’s category membership. This results in more stereotypical judgments under happy than under sad moods (e.g., Bodenhausen, et al., 1994; for a review see Bless, et al., 1996). Related research into the influence of brands on product evaluation similarly shows higher reliance on brand information under happy than sad moods (e.g., Adaval, 2001). Paralleling the persuasion findings, happy individuals’ reliance on category membership information can be overridden by manipulations that increase their processing motivation, such as personal accountability for one’s judgment (Bodenhausen, et al., 1994).

      • Hyzenthlay says:

        And it seems to me (will need to justify this further later) that autistic people (eg especially some extreme libertarians) relying on their explicit reasoning rather than on popular social consensus is somehow aligned with more bottom-up and less top-down.

        This feels a bit…pompous? I mean, as a libertarian I’d like to believe that my position is based on explicit reasoning rather than social consensus, and it often does feel like I’m swimming against the current. But I’m sure that’s what most people across the political spectrum think about their own beliefs.

        Though, if it is true that autistic people tend toward libertarianism that would be really interesting data. Did the survey results from a while back show a trend like this?

        • Jonas says:

          There’s consistent evidence with the arrows of causation going in all sorts of wrong directions:

          Libertarians have the most “masculine” style, liberals the most “feminine.” We used Simon Baron-Cohen’s measures of “empathizing” (on which women tend to score higher) and “systemizing”, which refers to “the drive to analyze the variables in a system, and to derive the underlying rules that govern the behavior of the system.” Men tend to score higher on this variable. Libertarians score the lowest of the three groups on empathizing, and highest of the three groups on systemizing.

          Note that autistic people are high in systemizing and low in empathizing. The scales were developed by Baron-Cohen for purposes of autism research. Correlation is not […].


          B-C, Sys/Emp:

  3. AnonYEmous says:

    So how does this new finding correspond with your previous attitude on schizophrenia as too much top-down and not enough bottom-up? It seems like now it’s more like “that, and the top-down is screwy”. Or maybe the bottom-up is so screwy that it messes with everything else?

    • Scott Alexander says:

      Some sort of horrible cybernetic chain of cause-and-effects relating to the body’s responses to responses to responses to the original problem. It’s spelled out more in the chapters I cite above, although I don’t understand it on an intuitive enough level to be comfortable explaining it.

  4. gorbash says:

    In statements like “the AMPA glutamatergic system mostly carries the bottom-up stream”, I think I might be confused about how fast neurotransmitters work. Stuff like sensory data is probably generating quite a lot of bits per second, and the signal needs to change pretty quickly so we can react to it. If that signal is getting transmitted through neurons firing electrical impulses, well, that can happen pretty quickly. But if the brain is actually communicating that data to itself by having some cells emit actual molecules of substance, and other cells absorb those molecules — how fast can that happen, really?

    Maybe I could kind of round that statement off to: “the AMPA glutamatergic system controls how much faith the brain puts in the bottom-up stream” or something of that nature?

    • Scott Alexander says:

      Good point, not sure. But we know acetylcholine and dopamine are involved in motor responses, and those seem to be pretty fast.

    • Migratory says:

      Hi. I took a neuroscience class recently. Almost all the synapses in the central nervous system are chemical synapses like this. Direct electrical connections do exist, but they’re less useful for data processing because they allow backflow. Good for synchronizing the network, bad for building logic circuits. Anyway, chemical synapses are very fast. At intercellular distances, diffusion of particles is far faster than our intuition suggests. Concentrations equalize across small cavities practically instantly, and the release mechanism for neurotransmitters is very fast. The cleanup is the slow part.

      But you might also be making a mistake about the bitrate required here. The processing speed of the brain is achieved through the large number of pathways, large numbers of synapses, branch structures that share the results of decisions with different circuits, and efficient pruning methods that yield effective heuristics with little data. It’s parallel, not serial, so the rate at which neurons can respond isn’t so important. In fact, keeping the response speed of a neuron low can be important. There are some toxins that shorten the “recharge” rate of neurons, causing death by paralysis.

      • blame says:

        There are some toxins that shorten the “recharge” rate of neurons, causing death by paralysis.

        Interesting, can you name some of them? I wasn’t able to google them…

    • Fossegrimen says:

      The firing of electrical impulses is not done the same way that an electron propagates through a copper wire, but by changing the charge on either side of a membrane through shuffling ions back and forth. Ions are smaller than complex molecules and thus probably faster, but it’s a difference in degree, not kind.

    • caethan says:

      It’s also worth noting that transmission *within* a neuron is electrochemical from ion gradients and is extremely fast. The whole myelination bit around axons is meant to speed up that transmission as much as possible, especially for motor neurons. And the axons of motor neurons can be very long: the sciatic nerve connects directly from the bottom of your foot to the base of your spinal cord. So the process of actually getting the sensory data from the foot all the way to the brain is very fast; the actual transmission takes hardly any time, it’s processing that is the slow part. (Which, as others have said, isn’t all that much slower than the transmission.)

    • Randy M says:

      To add slightly to what’s been said, neurotransmitters are pretty tightly controlled and the gaps between neurons at synapses are very small.

      It seems like “vesicle breaks at the membrane surface, chemical is released into intercellular gap, chemical fits into place like a puzzle piece in a specific chemical receptor on the receiving cell’s cell membrane, membrane gate/pump opens to effect a change in charge between the receiving cell and the surrounding space” would be quite a bottleneck, but in fact, given the sizes involved, it happens at the speed of, well, thought. And since you can’t check your sensory data faster than you can think (since perception is thought) perception can’t really be faster anyhow, right?

      We don’t know what our “frame rate” for the outside world is, because these filters would smooth it all out anyway.

  5. I argued in a few recent blog posts that decisions are just self fulfilling prophecies: you believe that you are going to do something, and this belief causes you to do it. It sounds like this is true on many levels, not only on the general level that I discussed.

    • Sniffnoy says:

      As Nate Soares (Rob Bensinger? not sure who to credit for this quote) put it, decisions are for making bad outcomes inconsistent.

    • Sam Reuben says:

      Are you talking about making resolutions to oneself, or willing things to happen? If the former, we all know what goes on around New Years, and that sure doesn’t seem self-fulfilling. If the latter, it doesn’t seem that belief has a whole lot to do with it. I will my arm to raise: it raises. I believe my arm will rise: it might, if some outside force causes it to or if I will it, but not otherwise. Consider the difference in experience between willing yourself into motion and deciding that you will will yourself into motion. I don’t think it’s wrong to say that they’re entirely unlike.

      I believe that you’d get better results if your thesis was that decisions are organizing frameworks for future willings, which are in turn followed or not followed at the critical junctures due to other factors. Otherwise, you’ve failed to explain how people fail to do things which they decide to do.

      • “Otherwise, you’ve failed to explain how people fail to do things which they decide to do.”

        I won’t discuss this here, but you probably did not read the blog posts I wrote on the topic, or you did not understand them.

  6. TK-421 says:

    What do you mean by “embodied”, contextually?

    • drethelin says:

      Embodied cognition is the idea that “processing” of information happens in parts of your body other than your brain, as well as “decisionmaking”. Eg, if you step on something sharp, your foot flinches away from it faster than your brain could actually process and decide to do so, because your nervous system in the foot/leg/spine has patterns for this situation.

      There’s also a lot of physical processing, in the sense that the structure of bones and tendons encodes certain ratios into your physical motions.

  7. drethelin says:

    I strongly recommend Principles of Neural Design, which approaches this topic in a far more bottom-up way, ie information theory, chemistry, and basic electrical signaling, to build up a general picture of how brains do the vitally important thing that they do.

  8. Said Achmiz says:

    This also explains why you can’t tickle yourself – your body predicts and adjusts away your own actions, leaving only an attenuated version.

    Uh, maybe you can’t tickle yourself…

    Edit: am not schizophrenic btw

  9. Nancy Lebovitz says:

    I’ve noticed that depression has something to do with lacking the ability to make positive predictions.

    This might manifest in different ways– good outcomes can’t be imagined even for small actions, so you get lethargy, or good outcomes can’t be imagined on the large scale so life seems like nothing but pain.

    However, this doesn’t explain the pattern of being able to pursue hobbies (low threshold of effort hobbies?) while having trouble taking care of oneself.

    • carvenvisage says:

      >However, this doesn’t explain the pattern of being able to pursue hobbies (low threshold of effort hobbies?) while having trouble taking care of oneself.

      Why does that need explaining? Hobbies take more effort but less enthusiasm. Depression would stereotypically affect enthusiasm (mobilisation of effort) more than fundamental capacity for it.

      >I’ve noticed that depression has something to do with lacking the ability to make positive predictions.

      I don’t understand what ‘make positive predictions’ means, but afaik depression is defined by clusters of symptoms, not underlying mechanism, so I wouldn’t expect such a unifying feature across all types.

  10. eqdw says:

    A few years ago, on the tail end of an acid trip, I independently hypothesized an idea that was very, very similar to the Motor Behaviour section. I was thinking on the nature of self, and how psychedelics can affect your sense of self. What I ended up concluding, I tried to summarize as “your ‘Self’ is all and only that which you can predict with near-100% certainty”. The idea being that, I can predict with near-certainty at all times what my arm will do, therefore it is part of my self. I cannot, on the other hand, predict that of other people, and so they are separate selves.

    I followed up this idea to some cyberpunk futurism. Essentially, if this is the case, then if technology can get reliable enough and give us a fast enough feedback loop on, well, anything, then in a very real sense we might be able to say that it becomes an extension of ourselves.

    I find it really interesting that there appears to be a rigorous foundation behind my idle thoughts, and I’m excited to see where this goes. Thanks for writing this up for us, Scott!

    • Migratory says:

      Isn’t this basically what they mean when skilled people say “X feels like an extension of myself.” where X is some tool they’re really good with? When they understand their machine enough to expect its behavior as well as their own body, the machine becomes as much a part of the agent as their body is.

    • adder says:

      , I tried to summarize as “your ‘Self’ is all and only that which you can predict with near-100% certainty”. The idea being that, I can predict with near-certainty at all times what my arm will do, therefore it is part of my self.

      Who’s the ‘you’ (in the first sentence) or the ‘I’ (in the second sentence) that’s doing the predicting? It seems that you’re assuming some prior self.

      • hlynkacg says:

        Flippant answer: “you” in this case would be the meat-based analog computer processing this information. This MBAC classifies objects as A or not A based on the confidence/accuracy of its predictions.

  11. adamshrugged says:

    One significant thing that this framework leaves out, or at least seems at odds with, is the notion that perception and many other parts of the mind are modular. Not sure how familiar you are with this idea, but Jerry Fodor proposed that significant parts of cognition are informationally encapsulated – they don’t take into account everything the mind knows. Classic examples are visual illusions; even when you become convinced that the two lines in the Müller-Lyer illusion are identical in length (i.e. you measure them), you can’t help but see them as different. Other examples are geometric realignment, maaaaybe some language stuff. And there’s a bunch of reason to think that most of perception is modular (see e.g. this review). This is in contrast to a “central” process, which is not encapsulated, and allows humans to make inferences that seem to incorporate everything they know.

    It seems like this theory, at least as you describe it Scott, proposes that nothing is modular. All parts of the brain can use everything that the brain knows; top-down influence is inherent to the process. But that seems at odds with a ton of evidence (see above). Maybe this theory is an attempt to describe a central process, but not modular processes? What are your thoughts?

    • Scott Alexander says:

      The book talked about this briefly. Yes, it would be nice to, once you realize that the hollow face illusion is a misperception, correct it and see it accurately. But brute-forcing this would require changing your prior that faces are usually convex, and you might only see the illusion correctly at the cost of seeing real people’s faces on the street as concave sometimes. Your brain pretty rationally decides it isn’t worth it.

      I guess this raises the harder problem of why the brain can’t learn to be context sensitive and only see the illusion that way. My guess is to some degree it can (I think I’ve almost trained myself out of the face illusion) and to other degrees it’s just not “worth” it.

      • Aapje says:

        There is a guy who taught himself to ride a reverse steering bike. So if you turn the handlebar to the left, the front wheel turns right. Turns out that it is extremely hard to learn to ride such a bike, because steering and balancing are connected. It took the guy months of training. Once he got it, he could no longer ride a regular bike. It took him 15 minutes of practice on a regular bike before the old algorithm clicked back into place.

        In general, it is known in sports and certain jobs (like shooting a gun in combat situations) that to learn new automatic responses to stimuli often takes quite a long time, especially if it means unlearning previous automatic responses. I think it’s pretty obvious that you can only have one rapid automatic response to a specific stimulus. If you have multiple automatic responses that react to very similar stimuli, it becomes very hard to deal with low-precision sense data, because the uncertainty makes both automatic responses viable. So then there is a grey area of doubt where you get surprisal and thus slow responses.

        So my answer to:

        this raises the harder problem of why the brain can’t learn to be context sensitive and only see the illusion that way.

        Is that low-precision sense data makes it often impractical to depend on too much context, especially for situations where one answer is correct in 99.9% of the cases. The resulting increase in surprisal is way more costly than the benefits.

        • caethan says:

          The other interesting bit about that story is that he made a reverse-bike for his young son as well. The son also knew how to ride a normal bike, but was much faster at learning the reverse bike, and apparently had less trouble context-switching.

        • bbeck310 says:

          An example of this you might be familiar with–if you regularly have to switch between different keyboards or typing conventions. As an associate lawyer, different partners often had different expectations as to whether you should put two spaces or one after a period. I could type either way smoothly, but transitioning between them took about 10 minutes of typing every time. Now, I have a different issue–on my main work computer, my Ctrl key is the second from left on the bottom row, while on my home work computer, my Ctrl key is the furthest left on the bottom row. If I work on a document on one, and then a few hours later type on the other, I always hit the wrong key a bunch of times before I can switch where the Ctrl key is supposed to be.

      • toastengineer says:

        If I really will myself to for a minute or two, I can temporarily stop seeing most illusions like the inverted mask and the horizontal bars. I’m not sure how common that is, but I suppose that implies it’s possible for the conscious mind to poke the perception-processing brain in the ribs and say “NO, that’s NOT what’s there” hard enough for it to sort-of work.

        • Randy M says:

          I tried on the mask, but was frustrated in my attempt to see the face rotate around like I know it should be. If I scroll the page so only the very top is visible it works, but once the eyebrows were in the picture it went back to shaking rather than rotating in my perception.

          • toastengineer says:

            Can you change the dancing woman illusion’s direction of rotation at will?

          • random832 says:

            No-one’s been able to explain to me which way the dancing woman is “actually” rotating, or prove that the animation contains any information at all about the rotation direction, so I’m not convinced being able to switch really means anything compared to which direction someone ‘prefers’ to see it in.

            I don’t recall whether or not the survey even asked which direction people who couldn’t switch it saw it in, but the results he posted didn’t mention it – I’m assuming the binary percentages he posted were of who did and did not report being able to switch it.

    • Eli says:

      You’re looking for the question of cognitive penetrability of perception.

  12. enoriverbend says:

    Speaking of selective inattention, I first read this as “science was invented by straight white men, and so probably just reflects straight male pattern baldness, and so we should ignore it completely”.

    I’m not sure what that means about me. Except the obvious.

    Also, of course, my version is just as convincing as their version.

  13. jeff daniels says:

    This is exactly what I’m studying in my interaction design program, so thanks for the relevant book review! I’m only a couple chapters into my textbook at the moment, but it’s extremely similar to this. So far it’s mostly centered on Signal Detection Theory, which refers to how humans differentiate between 2 discrete world states: one in which the signal exists, and one in which the signal doesn’t.

    Here’s a Wikipedia article about it

    One interesting thing here is that people are good at adjusting their signal detection if the payoffs change (“If you miss a nuke on the Nuke Detection System, the payoff will be horrible and we’ll all die”) but comparatively bad at adjusting when the probabilities change (“The manufacturing equipment is faulty this week so expect a 10% higher chance of faulty products that must be caught by quality control.”) As usual, narratives > probabilities.
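
    A minimal sketch of why payoffs and probabilities should both move the decision threshold, assuming the textbook equal-variance Gaussian version of signal detection theory (noise evidence distributed as N(0, 1), signal evidence as N(d', 1)). The sensitivity `D_PRIME`, the payoff values, and the scenario names are hypothetical numbers chosen for illustration, not taken from the textbook mentioned above.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_beta(p_signal, v_hit, c_miss, v_correct_rejection, c_false_alarm):
    """Likelihood-ratio criterion that maximizes expected payoff.

    Values and costs are given as positive magnitudes.
    """
    prior_odds_noise = (1.0 - p_signal) / p_signal
    payoff_ratio = (v_correct_rejection + c_false_alarm) / (v_hit + c_miss)
    return prior_odds_noise * payoff_ratio

def criterion_location(beta, d_prime):
    """Cutoff on the evidence axis where the likelihood ratio equals beta."""
    return math.log(beta) / d_prime + d_prime / 2.0

def rates(criterion, d_prime):
    """Return (hit_rate, false_alarm_rate) for 'say signal above the cutoff'."""
    hit_rate = 1.0 - normal_cdf(criterion - d_prime)
    false_alarm_rate = 1.0 - normal_cdf(criterion)
    return hit_rate, false_alarm_rate

D_PRIME = 1.5  # hypothetical sensitivity of the observer

# Each scenario changes one thing relative to the baseline.
scenarios = {
    "baseline": optimal_beta(0.5, 1, 1, 1, 1),
    "costly miss": optimal_beta(0.5, 1, 100, 1, 1),        # payoff change
    "signal more likely": optimal_beta(0.6, 1, 1, 1, 1),   # probability change
}

for name, beta in scenarios.items():
    cutoff = criterion_location(beta, D_PRIME)
    hit, false_alarm = rates(cutoff, D_PRIME)
    print(f"{name}: beta={beta:.3f} cutoff={cutoff:.2f} "
          f"hit={hit:.2f} false_alarm={false_alarm:.2f}")
```

    In this model an ideal observer lowers the cutoff in both of the changed scenarios. The observation above is that real people make the first adjustment readily and the second one poorly.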

  14. biblicalsausage says:

    Not to get too pop-psychology with you all here, but the “positive thinking” crowd might love this. If it is indeed the case that limbs move in response to your brain’s belief that they will move, perhaps good things happen to people who think good things happen to them because the brain makes all kinds of adjustments to cause the expected future to occur.

    I’m not saying it’s true, and I mostly really dislike the positive thinking crowd, but it might be worth thinking about a little.

    • biblicalsausage says:

      A second thought. If actions are the result of your brain doing things your brain predicts that you will do, we may have an excellent and simple explanation for why habits are hard to break.

      Or, in evolutionary terms, an added benefit to any creature that works via obeying its own predictions — it’s a really effective mechanism for that creature to learn habits.

    • Migratory says:

      I think the distinction here is between doing things and things happening to you. You are in complete control over your arm, it’s pretty well wired up with control systems. Not so with the part of the world that the positive thinking crowd wants to influence. It makes sense that you can do things by expecting them to happen, because the part of the world doing the thing is under your control. It doesn’t make sense that things will happen to you because you believe they will happen, unless you are already in complete control of whatever part of the world is acting on you.

      • JL says:

        However, it does make sense that you will *notice* more good things happening to you if you expect good things. They’d be happening anyway if you weren’t expecting them, but the brain would hide them as noise.

    • Rachael says:

      I was coming to say something similar. Particularly the kind of positive-thinking visualisations that involve motor skills, like repeatedly visualising yourself doing a perfect dive or a perfect tennis serve. I had thought that was woo-woo, but this prediction model makes it more plausible.

      • andrewflicker says:

        As someone pretty critical of the “positive thinking” crowd too, this isn’t really what we mean. This kind of movement-visualization technique is on pretty firm ground. What I/we object to is the more magical-thinking sort of hypothesis that goes “think about yourself getting a raise, and you will” or “think about nebulous good things happening to you, and people will be nice to you”, etc. It’s not about enforcing your own behavior, but about adjusting the universe around you.

        • biblicalsausage says:

          Right. But I wonder if some of the much less impressive effects, like imagining oneself successful at one’s job, might be explained by the optimism cranking up your performance a bit. Unless the effect is much more tightly restricted to lower-level processing.

  15. jimmy says:

    if you want to lift your arm, your brain just predicts really really strongly that your arm has been lifted, and then lets the lower levels’ drive to minimize prediction error do the rest.

    This is something you can see pretty clearly when playing with hypnosis. Basically, you can play games that cause people to anticipate their arm “raising on its own”, and when you succeed their brains will actually raise their arms to match the predictions.

    This has some cool implications along the lines of “intentions” and “expectations” being sort of interchangeable, so when you are struggling with something as an anticipation, it’s often helpful to play with the “intention” framing, and when you struggle with something as an “intention” it’s often helpful to look at it as an anticipation. For example, people have a hard time “intending” to do things that they don’t anticipate will work, but if you can make room for that possibility by removing the pressure to succeed (basically, allowing them to dismiss the alarm), people can often do stuff that they were convinced they “didn’t know how” to do. The first example, and therefore the one that sticks out in my mind, is telling my wife to remind me to turn the oven off when I get into bed. She couldn’t intend to do it because her expectations were being dominated by “I will be asleep, and I cannot remind you to do things while I am asleep”, but I told her that it’s okay if she fails and to intend to do it anyway. She reminded me even though she was asleep and didn’t even remember the next morning. I use this kind of thing *all the time*.

  16. Thecommexokid says:

    how many times have I repeated the the word “the” in this paragraph alone without you noticing?


    Every time.

    The minute I saw the tip of the triangle as I was scrolling I knew what was coming and tried to start reading one word at a time so I wouldn’t miss them.

    And I still missed all of them.

    Including the one in the damn question asking me if I’d noticed them.

    • entobat says:

      You missed the the giant, gaping-wide-open opportunity to throw in a few duplicate articles in your comment.

    • Null Hypothesis says:

      When Scott started doing that I spent a few months being more careful reading through his posts to watch for that.

      But the higher-ups in my brain decided that was too much attention. So now I just ctrl-f “the the” before reading and it highlights them all for me. Works great.
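
      A rough sketch of the same trick in code, generalized from the literal “the the” search to any immediately repeated word. The pattern name `DUPLICATE_WORD`, the helper `find_repeated_words`, and the sample sentence are all made up for illustration.

```python
import re

# A word, some whitespace, then the same word again (case-insensitive).
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return (word, start_offset) for each immediately repeated word."""
    return [(m.group(1), m.start()) for m in DUPLICATE_WORD.finditer(text)]

sample = "You missed the the giant opportunity to throw in in a few duplicates."
for word, offset in find_repeated_words(sample):
    print(f"repeated {word!r} at offset {offset}")
```

      Unlike a plain text search, this also catches repeats split across a line break, since `\s+` matches newlines as well as spaces.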

      • Aapje says:

        I just ignore them, unless he points them out, which he does if they matter to his point. I see no benefit to try to prevent having my perception biases pointed out to me too pointedly, because such an effort doesn’t make the biases go away*, it merely reduces cognitive dissonance for those who seek to be rational and have trouble accepting their humanity, IMHO.

        * Which is not necessarily desirable anyway, since they exist for a good reason.

  17. Rachel Shu says:

    Epistemic status: so, so speculative. The process doesn’t matter if the prediction is testable. Citations are a form of oppression.

    I think this suggests an etiology for the connection between transgender identity and autism. Not all gender-atypical people are transgender. Since gender expectations are socialized, then gender-atypical people who process normally might tend to go with a mental model of “society says I’m my birth gender, so whatever” (cis by default). Gender-atypical people who process bottom-heavy might be more likely to notice a persistent mismatch between the social expectations of their gender, and their actual gender-atypical presentation, and choose to transition or ID as nonbinary based on that.

    Wait, this doesn’t match up with another transgender experience. The classic phenomenon of transgender identity is gender dysphoria, the feeling of being trapped in the wrong body, persistently having a mental expectation of a different body. Perhaps gender identity is a hyperprior, which is not modified by sensory input. This could explain gender dysphoria as a cognitive dissonance between top-down insisting on one gender and bottom-up insisting on another.

    Transgender is bottom-up, transgender is top-down; how to resolve this? While Blanchard’s autogynephilic/homosexual typology of transsexuality fails to capture the lived experiences of transgender people, he did have (possibly flawed) research data to support the presence of distinct etiologies. Perhaps this is a better typology? I can’t say whether they’re non-overlapping, or that there’s only two, but I hope this indicates some areas for potential research.

    • Eli says:

      This is where we start getting into arguments about embodiment. Gender identity would seem a really good candidate for an Embodied Hyperprior: something where a person’s ontogenetic development has forced the brain to expect certain things with certain precisions. If your development forces your brain to have a high-precision expectation of maleness, and you were born male, you become a “manly man”. Weak expectations and you’re a fairly modal cis-by-default person. Strong, high-precision expectations for a body you weren’t born with, especially if those expectations have affective charge (they feel more like oughts than is’s), and you get gender dysphoria. Apply the “just move the joints!” principle (active inference) to gender dysphoria, and you enact the expectation of a particularly gendered body by transitioning, resolving the prediction errors.

      The problem comes if you try to resolve the prediction errors the other way around: by just learning to expect the gender you were assigned at birth. If the “gender hyperprior” is neurally encoded, you can possibly do that (even if you really wanted to be the other gender, you might learn not to identify as such). If the “gender hyperprior” is embodied, something the body is forcing on the brain, no amount of Bayes updates will get rid of it, and it will behave more like a control signal (“I must be a man, I am a man!”) than like an epistemic belief.

      (I’m sure I somehow just made someone vehemently object to predictive processing theory on grounds that anything giving a scientific grounding for gender dysphoria can’t possibly be real, but oh well.)

    • caethan says:

      Your distinctions aren’t complete. I was and am not gender typical, especially as a boy – I like children, have a lot of female-coded hobbies (sewing/quilting, baking, etc.), love traditionally female stories (e.g., my leather-bound Jane Austen collection), and so forth. And misgendering me is one of the few guaranteed ways to send me into a frothing rage. When I was a kid, it was mostly done by bullies to make fun of me. Ironically, as an adult, it’s far more likely to come from transgender activists who want to tell me that my atypical hobbies and interests mean I’m really a woman.

    • eyeballfrog says:

      >Since gender expectations are socialized, then gender-atypical people who process normally might tend to go with a mental model of “society says I’m my birth gender, so whatever” (cis by default). Gender-atypical people who process bottom-heavy might be more likely to notice a persistent mismatch between the social expectations of their gender, and their actual gender-atypical presentation, and choose to transition or ID as nonbinary based on that.

      A related explanation might be that autist-like processing consistently makes categories too narrow. A normal processor can put the correct error bars on gender traits to fit >99.9% of people into the binary, and then note which side he falls on. The autist-like processor puts too low of error bars on the categories and ends up with large numbers of people (possibly including himself) outside the bars, and thus concludes there must be other gender categories to cover the space.

      • Nornagest says:

        That would predict high numbers of genderqueer autists, but correspondingly low numbers of classically transgender autists. Don’t think that’s what we see in the wild.

        • Aapje says:

          I do think that, in modern society, women are allowed a lot more diversity while still being recognized as being feminine than men are allowed while still being recognized as masculine*. So it makes sense to me that the people who experience social gender dysphoria could be fairly happy identifying as a woman, as the social approval to social disapproval ratio would then be higher than if they identify as a man or as non-binary (especially since other people are used to dealing with only two categories, so anything outside those two gets a lot of social disapproval).

          * For example:
          – women can work or be a stay at home mom and still generally be considered feminine. Men who choose the latter option are considered unmasculine.
          – women can wear pants or a skirt and still generally be considered feminine. Men who choose the latter option are considered unmasculine.
          – women can choose a job where they interact a lot with kids or not and still generally be considered feminine. Men who choose the former option are often considered not to be motivated by ‘fatherly’ love for children, but pedophilia.

          • Deiseach says:

            Women can wear pants or a skirt and still generally be considered feminine. Men who choose the latter option are considered unmasculine.

            Took quite a while for this to happen, though. Women were considered unfeminine and aping men, and up to the 30s women dressing in male-style attire were generally taken to be coded as lesbians – or at least the type of horsy, hearty, independent spinsters who never wanted to marry and were either well-off enough or earning a living in one of the professions to be able to do their own thing (as in this image from the 1920s-set Miss Fisher’s Murder Mysteries, where the lady in trousers is a doctor). And you could argue that since men’s clothing is considered a signal of higher status, then wearing men’s clothing is claiming that higher status, so that once the barrier to ‘lower status’ claimants is overcome, then it becomes acceptable. Men wearing women’s clothing, though, is going from higher to lower status. And women wanted male-styled clothing for various reasons including ‘rational dress’, comfort, convenience, freedom of movement, etc, but men did not want female-styled clothing because it was cumbersome, so there were not the same numbers of men wanting to adopt female dress and therefore it never became a social movement in the same way.

            But dress and ornament are very heavily culture-dependent; there are, for instance, separate styles for men’s earrings and women’s earrings in Hindu mythology, as in the adornments for figures of Ardhanarishvara:

            The right ear [male side] wears a nakra-kundala, sarpa-kundala (“serpent-earring”) or ordinary kundala (“earring”). The left ear [female side] wears a valika-kundala (a type of loop earring).

          • Aapje says:


            I frequently see feminists make the claim that anything linked to men automatically has higher status, but this never makes sense to me.

            For example, many dirty and dangerous jobs were considered the exclusive domain of men, but higher class people also looked down on people performing these dirty and dangerous jobs. They greatly favored not helping as their driver would fix their car/carriage and people of the time also insisted that women would not help fix the car/carriage. So they favored that women would do what was higher status.

            If you look at what people of the time actually gave as their motivation for not allowing women to do these jobs, it had little to do with status and more with the claim that women were not capable of doing and/or enduring these jobs due to their frail physique, tendency to faint and such.

            Their motivation for opposing women dressing as men is commonly linked to a desire that women don’t behave like men, which makes sense from their perspective, as (male) clothing was often tailored to the job. People still frequently oppose things that are correlated to and/or seen as enabling of behavior they disapprove of.

            I pretty strongly oppose the practice of rejecting the actual arguments that people had at the time, when those are pretty consistent with their behavior, in favor of a story that matches a modern narrative better.

        • Ozy Frantz says:

          Autistic trans men are really common.

    • chridd says:

      The gender identity as a hyperprior thing goes with what I understand about gender identity, so I wonder if one possible response to this disconnect is to ignore top-down processing entirely (or, at least, more than would be necessary) and instead rely mostly on bottom-up processing, thus becoming autistic (or autistic-like).

      (Predictions: If this is the case, one would expect to see both a larger overlap between autism and trans-ness and a tendency for autistic trans people to have less obvious dysphoria and be less likely to already think of themselves as their gender identity; unfortunately, that’s also what you’d expect to see if autistic people are more likely to come out and transition for any given level of dysphoria, or if transgender-ness connected to autism is a different but similar thing to other transgender-ness. It also predicts that transition makes trans autistic people less autistic, and potentially for autism to manifest differently among trans autistic people and cis autistic people. This would also suggest a possible connection between trans-ness and schizophrenia, where people instead ignore bottom-up and rely too heavily on top-down, that schizophrenic trans people hallucinate that they’re the correct sex, and that transition makes such people less schizophrenic. Actually I’m not sure I’m understanding the schizophrenia stuff correctly…)

  18. Thibaut says:

    The “motor behavior as anticipation of perception” idea seems to me to explain a frequent experience among those of us who are scared of heights. When looking down from a high point, we do not only feel a fear of falling down, but quite often also an urge to jump. Though this experience is frequent, it feels disturbing and hard to account for. But of course if the brain makes little difference between “anticipating a sensation” and “initiating the motor command to elicit such a sensation”, then imagining a fall should be experienced exactly as wanting to jump.

    (Same thing, maybe, for the widespread urge to squeeze cute things: the natural protective thought “This cute baby would be defenseless, should something happen to it” is experienced as “I want to hurt this cute baby”, as one imagines oneself as the agent of the situation anticipated)

  19. Dedicating Ruckus says:

    Regarding the autism connection…

    I am not autistic, but I am definitely over in the corner of personality space closest to autism. As a separate datum, I’ve always been able to reproduce unfamiliar vocal sounds (usually just from foreign languages) easily and well after hearing them once or twice. (This has been to the point where people drastically overestimate my fluency in a language based on the quality of my accent.)

    During language study, it became obvious that not everyone is like this. In particular, where a foreign language has a sound that doesn’t quite map to English (the Japanese ‘r’ was notable here), many people will mispronounce the sound, hear a corrected pronunciation, and be unable to distinguish between the two or correct their earlier mistake. For some time I had a hypothesis, supported by some other anecdotes from similarly autistic-adjacent acquaintances, that the semi-autism was related to this, and caused me to successfully parse out differences that were discarded pre-consciously by a neurotypical’s brain. This seems consistent with the PP model’s description of autism.

    Is this a thing that’s ever been studied? It certainly seems consistent with hypersensory abilities as mentioned above.

    • Orual says:

      Another data point. I am in a similar place on the spectrum (i.e. not autistic but close) and I have had a similar experience with languages and accents. I can definitely hear sounds in foreign languages which many other native English speakers do not. This helps in the quality of my reproduction of the sounds (if I am singing a song in the language, for example) but I think it actually hurts my level of confidence (which seems to be tied into the amount of progress made in learning to speak) when trying to learn or speak a new language, since I am painfully aware of any error in pronunciation that I make.

    • Ozy Frantz says:

      For the sake of avoiding bias: I have an autism diagnosis and I appear to have a normal inability to recognize unfamiliar vocal sounds. I am also more-or-less incapable of recalling music unless there are words attached, which I am pretty sure is abnormal. But then my brain is sort of hyperspecialized for dealing with words-as-units-of-meaning and I might just be saying “there are many savant skills I don’t have.”

      • sandoratthezoo says:

        And I’m not very autistic, I think, and am pretty good at recognizing unfamiliar vocal sounds (not perfect, certainly, but good enough that I’ve been notably better at it than classmates in foreign language classes).

    • Eli says:

      One of the things Predictive Processing doesn’t cover yet is the “quality” of the “predictive hardware” in the various portions of the brain. If your auditory processing has low neural noise (each layer passes down almost exactly the signal it received), we could plausibly predict you’ll be “auditorily smarter”. You’ll perceive sounds more clearly and identify their components more precisely.

      (I’m very good at sound, but shitty at visual processing. For a long time I sucked at recognizing faces, but I have such a well-tuned ability to sing that people have repeatedly told me to buckle down, learn an instrument, and join a band already!)

      This could even be a core mechanism behind human intelligence: low neural noise would also mean near-lossless mental simulation. You could then just expect that “deeper” (more folded/layered) brains with a greater ratio of brain to body (more brain layers to spend on simulation rather than on actively coordinating the senses) and low neural noise will be smarter in a fairly recognizable sense.

  20. ScarecrowBoat716 says:

    I would bet that women tend to view the world from a more top-down perspective and men a more bottom-up perspective. Women tend to be more religious and superstitious which I assume entails prioritizing predictions over sensory data. Men tend to be more conservative which seems to me like bottom-up processing – this is how things are, don’t try and change them (too much surprise = alarms). Evolutionarily this makes sense as well. Men need to be detail-oriented while hunting, while women taking care of the family need to trust what they know about the world first – “women’s instinct.”

    It’s also possible this is a whole lot of retroactive analysis. But we already know men and women process things differently and at the very least PP fits right in.

    • Charlie__ says:

      I know you already mentioned it, but I still want to pick on the single sentence of evo-psych.

      Why don’t women taking care of the family need to notice new and surprising details, or men hunting need to be able to pick patterns out of noise? Evolutionary just-so stories are just too easy, especially when we’re just playing around with a vague and biased understanding of what life was like for hunter-gatherers.

      Also, it might be interesting to see if there’s a gender imbalance in who’s annoyed by tags on clothing.

      • engleberg says:

        My mom raised me to think only classless people left tailor’s marks on clothing- the better element has a tailor with the good taste not to put them on, and when poverty forces us to buy mass-produced we cut them off. Legible clothing is worse. Legible clothing with an alligator or some mass-production tailor’s name on the front is the lowest. I don’t bother, but there you are.

      • Ozy Frantz says:

        The useless answer to that question is “yes, because men are more likely to be autistic.”

        • INH5 says:

          Assuming that higher autism rates among men are due to actual prevalence rather than underdiagnosis of autism in women and/or overdiagnosis of autism in men.

          • Aapje says:

            If Baron-Cohen is right that (some) autism is extreme masculinity (as in system thinking, difficulty reading expressions, etc), then women with autism can be far more masculine than other women, but not stand out that much compared to all people.

            For example, imagine that
            – almost all men have a masculinity index between 5 and 10
            – almost all women have a masculinity index between 2 and 7
            – autism causes an increase in masculinity by 3 points

            – autistic men will have a masculinity index between 8 and 13, putting many outside of the range of normal men
            – autistic women will have a masculinity index between 5 and 10, putting all of them within the range of normal men
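
            The arithmetic in this hypothetical can be checked with a few lines. The interval endpoints and the 3-point shift come from the list above. The helper names and the assumption that scores are spread uniformly across each interval are illustrative additions, used only to turn “many” and “all” into fractions.

```python
def shift(interval, points):
    """Move both ends of a (low, high) interval up by `points`."""
    low, high = interval
    return (low + points, high + points)

def fraction_inside(interval, reference):
    """Fraction of `interval` that overlaps `reference`.

    Assumes scores are spread uniformly across `interval` (an illustrative
    simplification, not part of the original hypothetical).
    """
    low, high = interval
    ref_low, ref_high = reference
    overlap = max(0.0, min(high, ref_high) - max(low, ref_low))
    return overlap / (high - low)

typical_men = (5, 10)
typical_women = (2, 7)
AUTISM_SHIFT = 3  # points of added "masculinity" in the hypothetical

autistic_men = shift(typical_men, AUTISM_SHIFT)
autistic_women = shift(typical_women, AUTISM_SHIFT)

print("autistic men:", autistic_men,
      "fraction inside typical male range:",
      fraction_inside(autistic_men, typical_men))
print("autistic women:", autistic_women,
      "fraction inside typical male range:",
      fraction_inside(autistic_women, typical_men))
```

            Under these made-up numbers the shifted female interval coincides exactly with the typical male interval, which is the point of the example: the same shift pushes part of one group outside the male range while leaving the other group entirely inside it.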

          • INH5 says:

            I highly doubt that, because Baron-Cohen’s theory seems to have more holes than Swiss cheese. On some measures (like preference for things vs. people), both autistic boys and girls do appear to be “hypermasculine,” but on other measures (like aggression) they appear to be “hyperfeminine,” and on still other measures (like digit ratio, hormone levels, and likeliness of being trans) they show opposite results by gender, with autistic boys being more “feminine” than the average boy while autistic girls are more “masculine” than the average girl.

            When I brought up possible differential diagnosis rates of autism between women and men, I was thinking more along the lines of autism being stereotyped as a “boy thing,” and so girls with similar traits are more likely to be missed or diagnosed with something else. Of course that’s just a hypothesis, but given the very strong evidence that the recent increase in autism rates is primarily due to an increase in diagnosis rather than an actual increase in prevalence, I think we should strongly consider the possibility that other demographic differences in autism rates are also at least partially due to different rates of diagnosis rather than different rates of prevalence.

    • Baeraad says:

      Being more conservative strikes me as top-down processing more than anything – “let’s assume that that jaguar is going to behave like jaguars usually behave, because if it’s learned to think outside the box I’m probably screwed anyway, and if I pause to second-guess myself every two seconds I’m definitely going to get eaten.”

      In fact, I think that just generally, my preconceived notions of gender are the exact opposite of your pre-conceived notions of gender. Which is kind of interesting.

      • toastengineer says:

        It seems to me like unless you’ve actually spent your lifetime studying evolution, psychology, and how the ancestral environment actually worked in practice, evopsych is going to act as a universally general argument.

        You can make a hell of a lot of mutually-contradictory things seem intuitive that way; “women should be attracted to dangerous men because they’re more able to provide for/protect them due to that dangerousness” vs. “women should be attracted to gentle and caring men because they’re more likely to make sacrifices to protect/provide for them and your mate is the person you’re going to spend the most time around so you don’t want him to be violent in general”

        • Baeraad says:


          Though judging from what sort of fictional character women tend to swoon over, the ideal man as far as many women are concerned seems to be one who is gentle and caring to them while being dangerous and aggressive to everyone else. Or one who seems dangerous and aggressive, but is secretly a big softie at heart. Or some other combination along those lines.

          It looks to me like both those considerations play a part and try to reach a compromise with each other, is what I’m saying – which means that the problem is not so much with evo-psych as with overly reductionist thinking.

          • Aapje says:


            Jordan Peterson more or less argued this when he said that the archetypical romance story for women is the domesticated beast. Examples of such stories are Beauty and the Beast and 50 Shades of Grey. The latter is/was absurdly popular among women, of course.

            It definitely seems to match the very popular gender narrative where men are assumed to be brutes whom women are supposed to ‘fix’ and/or steer into acting well.

  21. Jed Harris says:

    Thanks Scott, I’ve always liked Andy Clark but didn’t know about this big step up!

    There are four good blog posts by Andy summarizing his thinking (from 2015 when the book was published). The second one is especially helpful in understanding how much PP focuses on organism relevant action, and how little it aims at “accurate representation” which has historically gotten much more love.
    First, Second, Third, Fourth.

    • Eli says:

      The big problem with Clark’s arguments about embodiment and action-orientation in the PP context is that nothing in the PP model ever actually defines what it means for prediction errors to be organism salient. After all, one can have very precise signals which are nonetheless completely useless to the organism.

  22. tjohnson314 says:

    On the brain producing dreams – I’ve found that after a year of studying Chinese, I still can’t follow anything that native Chinese speakers say, unless they speak very slowly. But I have dreamt multiple times that people are talking to me in Chinese, even if I have no idea what they’re saying.

    I’m very curious whether my subconscious is actually generating something approaching correct Chinese, but unfortunately it seems impossible to test.

    • Hyperfocus says:

      I remember one time I met a girl in a dream, and asked her name. She grinned and said “himitsudayo” (“I’m Himitsu!”). When I woke up I thought it was an unusual name so I went to look up its Kanji and meaning. After learning what himitsu means, what she actually said to me was “It’s a secret!” (himitsu means Secret). Since I didn’t consciously know what it meant, but it made sense grammatically and in context, I think it’s safe to say that your subconscious can know more of a language than you consciously do.

  23. Anatoly says:

    >This also explains why you can’t tickle yourself – your body predicts and adjusts away your own actions, leaving only an attenuated version.

    Why can you slap yourself and pinch yourself? Why doesn’t my body predict and adjust away that?

    Why does it hurt a lot when I deliberately punch a wall, even though the result is completely predictable and predicted? Shouldn’t it hurt *a lot* less than if I punch a wall disguised as something spongy?

    • TeMPOraL says:

      Why can you slap yourself and pinch yourself? Why doesn’t my body predict and adjust away that?

      Getting pinched by someone else hurts me much more than when I try to do it myself. Part of it is most likely the effect of surprise, as even pain caused by someone else gets attenuated if I fully expect it. As for slapping, personally I can’t even slap myself very hard – my body doesn’t let me.

      Why does it hurt a lot when I deliberately punch a wall, even though the result is completely predictable and predicted? Shouldn’t it hurt *a lot* less than if I punch a wall disguised as something spongy?

      At least for me, a deliberate punch will hurt less. Not a lot less, but noticeably less. Which can be explained by the brain attenuating the part of the experience it expects, but not eliminating it completely.

      • Dedicating Ruckus says:

        As a data point, in my martial arts training we do body conditioning by repeatedly slamming our limbs against hard objects and/or each others’ limbs. It is definitely possible to notably reduce the pain of this by fully committing yourself to doing the strike (a normal action-visualization technique), even though this leads to the strikes being harder rather than softer.

    • Kaj Sotala says:

      The function of pain is to prevent you from hurting yourself. Tickling doesn’t do any damage to you, so it can be predicted away; but you don’t want to predict away signals of potential damage, you want very clear signals of “no, don’t do that”.

    • Null Hypothesis says:

      Why can you slap yourself and pinch yourself? Why doesn’t my body predict and adjust away that?

      This is the perfect example/metaphor for how our perceptions are colored by expectations (ie biases), but not helplessly enslaved and blinded by them.

    • LibertyRisk says:

      I’m probably at least 1 SD stronger than the average male when it comes to grip strength (leftover from wrestling in high school I think). When I pinch people it hurts. I just tried it and I can pinch myself on my arm with maximum force, to the point where my fingers and hand start shaking with the effort. It hurts quite a bit but I don’t think I’d be able to tolerate it at all if someone else were doing it to me (and others will freak out if I pinch them with something like 30% of the force I just used on myself). It seems like it’s at least being partially attenuated.

  24. j r says:

    This is fascinating, largely because it corresponds quite well to a set of beliefs that I have long held. I was an undergraduate philosophy major at one of the few schools in the United States with a program that taught continental philosophy. My interest was always in phenomenology and particularly the French phenomenologist Maurice Merleau-Ponty, who in 1942 and 1945 published two books, The Structure of Behavior and Phenomenology of Perception.

    The Stanford Encyclopedia of Philosophy says this about The Structure of Behavior:

    Merleau-Ponty aims to integrate the truth of naturalism and transcendental thought by reinterpreting both through the concept of structure, which accounts for the unity of soul and body as well as their relative distinction. Against the conception of transcendental consciousness as a pure spectator correlated with the world, Merleau-Ponty insists that mind is an accomplishment of structural integration that remains essentially conditioned by the matter and life in which it is embodied.

    Much of Merleau-Ponty’s work grows out of gestalt psychology and in Phenomenology of Perception, he argues that the phenomenon, the interaction of subject and object, is the basic unit of perception and that attempting to separate the whole and posit a philosophy from purely the subjective or the objective experience would always be incomplete. That jibes quite well with this:

    The key insight: the brain is a multi-layer prediction machine. All neural processing consists of two streams: a bottom-up stream of sense data, and a top-down stream of predictions. These streams interface at each level of processing, comparing themselves to each other and adjusting themselves as necessary.

  25. Andrew Hunter says:

    I find this interesting to compare to an old post by someone relevant to this forum 🙂

    Both make Bayesian models out of delusion, but it seems the new model is dramatically different here in which part is breaking down–though I suppose we’re not talking about precisely the same delusions. What say you about that piece, Scott? Do you think its theory fits in this framework?

  26. Markus Ramikin says:

    I dunno, even after seeing your spoiler, the left picture being a dalmatian makes about as much sense to me as this. The cow was instantly obvious, however, not even a moment of blur.

    EDIT: Hah. Right after writing this, I looked at the spoiler for I think the fourth time, and this time I paid attention to the word “drinking”, and only THAT resolved the picture for me, including the non-dalmatian parts of it. As long as I didn’t understand the rest of the picture, I couldn’t see the dog either.

    • toastengineer says:

      Me, I saw the dalmatian instantly and the cow eluded me until I was spoiled.

      I bet it’s dependent on random chance of what part of the image you looked at first.

  27. kboon says:

    > how many times have I repeated the the word “the” in this paragraph alone without you noticing?

    Joke’s on you. I’m always on the the lookout for these sorts of shenanigans when reading about optical illusions.

    The part about the brain achieving motor functions by predicting what it would seem like if the motion was already done reminds me of a party trick. Get a string with a bead or something heavy on one end. A necklace might work. Hold it between thumb and index finger so the heavy part is at the bottom. Look at it, and imagine really hard that the bead is swinging sideways, while trying not to move your hands. You will notice the bead starts moving with the power of your mind! Similarly, you can stop the movement by imagining it swinging back and forth.

  28. Sparky Z says:

    My guess is they’d be more likely to see both ‘the’s in the “PARIS IN THE THE SPRINGTIME” image above.

    You should do a study!

  29. apm says:

    Are there any artificial agents or robots implementing some of these ideas?

    • taktoa says:

      Generative adversarial networks are pretty similar to this idea.

    • CyberByte says:

      These ideas have been around in AI for a very long time. Jeff Hawkins writes about something very similar in his book On Intelligence, and his Hierarchical-Temporal Memory architecture implements this. Vicarious’ Recurrent Cortical Networks and Itamar Arel’s Deep SpatioTemporal Inference Network (DeSTIN) are somewhat similar. All of these have been used in various robots.

      Aside from those, the idea of using “context” and “top-down feedback” has been implemented in countless systems. Bernard Hengst et al’s recent paper “Context in Cognitive Hierarchies” uses this to control a Baxter robot, and Shrivastava et al implemented something like this in a Convolutional NN (not really an agent/robot I guess) in “Beyond Skip Connections: Top-Down Modulation for Object Detection”.

      I’m sure I’m not doing justice to the vast amount of work that’s out there, but I hope this helps sate your curiosity a little bit.

      • apm says:

        Thanks, I’ve read Hawkins, and it seems related, he was talking a lot about prediction. I’m wondering what he’s up to now and how much success he’s had with HTMs. Thanks for the other references, as well!

    • Null Hypothesis says:

      Estimation and control systems are built on very formalized, mathematically exact integrations of noisy sensor data.

      In the most basic form, a Kalman Filter is a Markovian sort of feed-forward filter. It takes its sensor data and performs a weighted least-squares estimation of your state based on the relative noise of the sensors. Then it takes your previous estimated state, one time-step back, forward-predicts, and uses that as another noisy measurement, with the noise being the confidence of the last state estimate, plus the ‘process noise’ involved in predicting the future.

      Bayes says that any information is good information (so long as you properly know how much to trust it). Therefore two uncertain measurements can give you an estimate more accurate than either of them.

      So if you start with just noisy sensor measurements, and no prediction, you get some sort of moderately noisy estimate. And if you just kept listening to your sensors that’d be the limit of your certainty forever. But with a Kalman filter, now you take that noisy estimate, make an even noisier prediction shortly into the future, and combine it with new sensor data. Now you’ve got the noisy sensors again, plus a third noisy estimate. This should give you a better, less noisy estimate than the first time. Now your estimate, and thus your future prediction of timestep 3 is less noisy. So this less-noisy-prediction plus the sensors gives you a still less noisy estimate of step 3. It’s positive feedback. Better estimate = better prediction = better new estimate.

      Unfortunately the noise doesn’t converge at 0 and give you 100% certain estimates. It’s bounded by the process noise – the inherent uncertainty you add when you take your estimate and predict into the future. But this will quickly converge to a significantly lower-noise (ie lower average error) estimate of your system than what your sensors alone could ever give you.

      Incidentally, the Kalman Filter (or more properly a number of optimal-control systems that exploit it) was essentially developed in order to get us to the Moon. Successive approximations give unbelievably tighter estimates than direct sensor measurements. And in space hitting your target speed and position is quite important.
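
      The recursion described above can be sketched for the simplest possible case: a single, roughly constant quantity measured by one noisy sensor. This is only an illustrative toy, not anything from the comment or the book; the function names, the noise values, and the "true" value of 10 are all made up for the example.

```python
import math
import random


def kalman_1d(measurements, meas_var, process_var, init_est, init_var):
    """Scalar Kalman filter for a (nearly) constant hidden state.

    Returns a list of (estimate, variance) pairs, one per measurement.
    """
    est, var = init_est, init_var
    history = []
    for z in measurements:
        # Predict: carry the last estimate forward one step.
        # The prediction is noisier than the estimate by the process noise.
        pred_est = est
        pred_var = var + process_var
        # Update: treat the prediction as one more noisy measurement and
        # blend it with the sensor reading, weighted by relative variance.
        gain = pred_var / (pred_var + meas_var)
        est = pred_est + gain * (z - pred_est)
        var = (1.0 - gain) * pred_var
        history.append((est, var))
    return history


def steady_state_variance(meas_var, process_var):
    """Fixed point of the variance recursion above (scalar case only)."""
    q, r = process_var, meas_var
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / 2.0


# Hypothetical numbers, chosen only for illustration.
random.seed(0)
true_value = 10.0
meas_var = 4.0        # sensor noise variance
process_var = 0.01    # uncertainty added by each forward prediction

readings = [random.gauss(true_value, math.sqrt(meas_var)) for _ in range(200)]
# Start from a deliberately vague prior so the first reading dominates.
history = kalman_1d(readings, meas_var, process_var, init_est=0.0, init_var=1e6)
variances = [v for _, v in history]

print("sensor variance:        ", meas_var)
print("variance after 1 step:  ", variances[0])
print("variance after 200 steps:", variances[-1])
print("steady-state variance:  ", steady_state_variance(meas_var, process_var))
```

      In this toy setup the estimate's variance starts near the sensor's own variance, shrinks with every step, and levels off at a floor set by the process noise rather than at zero.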

      • toastengineer says:

        Is the integral term of a PID controller ultimately just a simplified version of the same idea? “When you expect that you should be at setpoint because P and D have stabilized, but you’re not, alter the control signal by the difference between observed and desired/predicted value.”

        Control system theory is incredibly interesting to me and it’s really upsetting that I have such a hard time with the mathematics that I still haven’t gotten past PID. 🙁

        • Controls Freak says:

          Not really. Kalman filters are state estimators, not controllers. Try to bin them separately, even though they can be tied together sometimes. An integral controller doesn’t set out to say, “My measurement is probably bad and needs to be corrected.” Instead, the motivation is, “Maybe there’s an unmodeled, persistent disturbance that is causing a steady-state error to persist.” It does have some smoothing effects when it’s feeding back a noisy signal, but the purpose isn’t estimation.

        • Null Hypothesis says:

          Kalman filters are state-estimators. They’re what inform control systems. Control systems try to drive you towards a goal – but how do you know if you’re at your goal? You need sensors, and sometimes those sensors are a lot less exact than the speed gauge on your car. So you have to estimate it.

          However, as far as PID goes, you’re quite correct. PID (integral term specifically) is pretty much the most basic form of machine learning (though they never present it that way, so well spotted).

          Consider the control system on something like the cruise control of a car. If you only had the proportional controller, which would apply gas to the engine based on your error from a desired speed, it would never work. Once you got to your target of 60mph let’s say, the car would pull its foot up off the gas and you’d quickly drop down to 55mph or 50mph. Eventually, at some point you’ll reach a steady-state of like 53mph, where the gas the 7mph error demands be applied perfectly balances the drag on the car at 53mph. Go faster and the drag increases and your error, and thus gas to the engine drops, so you slow back down. Go slower and the drag drops and the error-therefore-gas increases and you go back up.

          So it’s nice and stable, but it’s at 53mph instead of 60mph. Adding in an integral component, you save an ever-accumulating variable. So if you were sitting at 7mph-error for a while, the accumulator would get stronger and stronger and add extra gas to the engine. Until eventually the integral term gets strong enough to apply enough gas to keep the car at 60mph without any help from the proportional control. And with the error at zero, the accumulator just sits there holding its value.

          Then if you pick up a tailwind, or start driving up a hill, you’ll start going too fast or too slow with the value in the accumulator, so it will decrease or increase appropriately in response.

          Basically the integral term is asking: “What output do I need to provide constantly to keep the system at the goal?” It’s learning and counteracting the linear-offset of the system, and it adapts as the system changes.
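
          The cruise-control story above can be checked with a small simulation. This is a rough sketch under made-up assumptions: the car is modeled with drag proportional to speed, the throttle is unbounded, and the gains and drag coefficient are arbitrary, so the proportional-only controller here happens to settle at 50mph rather than the 53mph in the example.

```python
def simulate_cruise(kp, ki, target=60.0, drag=0.1, dt=0.1, steps=2000):
    """Simulate a toy car under proportional (kp) + integral (ki) control.

    Assumed model (hypothetical): d(speed)/dt = throttle - drag * speed.
    Returns the final speed and the final value of the accumulator.
    """
    speed = 0.0
    integral = 0.0
    for _ in range(steps):
        error = target - speed
        # The accumulator grows while we are too slow, shrinks while too fast.
        integral += error * dt
        throttle = kp * error + ki * integral
        # Simple Euler step of the assumed car model.
        speed += (throttle - drag * speed) * dt
    return speed, integral


# Proportional control only: settles where throttle from the error balances drag.
p_speed, _ = simulate_cruise(kp=0.5, ki=0.0)

# Proportional + integral: the accumulator ends up supplying all the throttle.
pi_speed, pi_integral = simulate_cruise(kp=0.5, ki=0.1)

print("P-only final speed:", p_speed)
print("PI final speed:    ", pi_speed)
print("throttle held by the integral term:", 0.1 * pi_integral)
```

          With these numbers the proportional-only run stalls short of the target, while the run with the integral term reaches 60mph, at which point the error is zero and the accumulator alone is providing the throttle needed to cancel drag.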

        • Ilya Shpitser says:

          A Kalman filter is just a hidden variable DAG with a dude’s name attached to it, for some reason.

  30. Charlie__ says:

    It’s easy to explain some things and think you’ve found The Explanation. But if you imagine building a brain in your garage, it becomes easier to see where the explanation is only partial, and where some boundaries are, beyond which we are very ignorant.

    Take the structure of the visual cortex. How come everyone has such similar visual cortices? Well, it turns out that a bunch of complicated genes are active in a complicated spatial structure in your head. Your brain changes as it learns, and does a lot of learning of how to see, but it learns how to see in a particular guided way, especially early in life (here is a pretty paper showing gene expression in mouse visual cortices at 0, 14, 28, and 60 days), that ensures that you end up with a fairly standard visual cortex.

    So if you wanted to build a visual cortex in your garage, it’s not enough to just know that it should have hierarchy and try to process the data and make predictions that have low surprise factor, you might have to have a bunch of structure imposed on it from the start that has something to do with what the sensory processing task is like (and machine learning research seems to bear this out). Similar degrees of structure are probably also imposed on other parts of the brain we understand less well – maybe trying to make higher reasoning without understanding this structure would be like trying to use a completely unstructured neural net for vision.

    But back to the visual cortex – how well does predictive processing pin down how much feed-forward versus feedback signalling you should see, and how does this compare to what we actually see? I’m going to guess that it explains a lot, but that those design principles alone aren’t enough to let you build a visual cortex in your garage. Large chunks of our (well, mammals’) early visual processing are mostly feed-forward, taking in sense data and doing some quasi-fixed computation to it, with feedback signals coming down from higher levels of abstraction on a timescale much longer than a single feedforward step. This is in contrast to the predictive processing picture of brain function, where your brain is constantly predicting individual neuron activation levels, even at the lowest levels of abstraction.

    This makes sense – many functions of your visual cortex, just like artificial neural nets for recognizing images, can work pretty well with only feed-forward processing, and predicting everything all the time seems like a huge amount of work, so your brain can and should save resources by not constantly predicting everything. So this implies that to build a visual cortex, we need some understanding that tells us how much prediction to do, and where.

    In addition, there are multiple types of feedback we might be interested in (cool paper about feedback in the visual cortex). The simplest type of feedback follows the same connections and weights as feedforward reasoning. For example, the “cat” concept gets activated in my brain, so I am (theoretically, when applying this kind of feedback) primed to perceive cat-associated high-level percepts like meows and fur-texture, which filters down into predicting certain correlations among sensory neurons, as well as certain average properties like color. But you can also have feedback that takes a “loop” rather than just inverting perception. For example, if it’s bright out, a part of my visual cortex might take on a certain “bright” state, which decreases the activity of the low-level neurons so that their activation stays more similar in bright versus dim conditions. What sets this kind of feedback apart is that the additional connections leading back to the low-level neurons both shortcut the full tree of associations with “brightness,” and encode a specific function that the feedback performs.

    In the unrealistically-pure predictive coding model, prediction and perception share the same neurons. Predictions only flow down while raw perception only flows up, your brain can sort of try to make them meet in the middle, and where the predictions don’t fit the perceptions you update both the prediction and the perception software simultaneously. But if the brain has more types of feedback, and even the first type of feedback isn’t applied constantly and uniformly, then you need rules for how to learn where feedback is necessary, rules for learning more complicated feedback loops and integrating them into prediction, and rules for updating prediction and perception that work even when you’ve got all this variation and complication.

    And maybe a slightly different set of neural learning rules works best for different functions and different stages of learning, controlled by spatiotemporal patterns of gene expression in the brain. But maybe then, once you’ve got that figured out, maybe then you have The Explanation that can get a human brain from a chimp brain just by scaling up the neocortex (but specifically the neocortex).

    • Ilya Shpitser says:

      Yeah, I don’t know what work this is doing.

      What’s the track record of these types of explanations for Doing Actual Brain Science?

      • sandoratthezoo says:

        This. This sounds like a pretty decent explanation, at least in the broad overview. But didn’t we know the broad overview anyway? That is, the brain predicts things.

        Does the PP theory give us actionable data on what these “levels” are that information passes up and down through? Does it give us interesting falsifiable new hypotheses? I feel like there’s a ton of handwaving in how prediction works in the summary above, at least. Is the explanation for those chemical systems in the brain precise enough to actually plan new therapies, or is it too general?

  31. amoeba says:

    The dalmatian spoiler picture looks weird to me: it totally misses the left hind leg.

  32. Adrià says:

    So are schizophrenic people like over-fitting machine learning classifiers?

  33. WATTA says:

    This post felt like the best insight porn I’ve read here in a long while, perhaps ever. The motor system part especially tickled my fancy as it made me think about mental cues used in barbell movements. The most effective verbal form of communicating the right kind of mental cue seems to be descriptions of the consequences of movements instead of literally describing the movements. Like “spread the floor” for squatting without collapsing knees and “bend the bar” for bench pressing with the elbows tucked. The subtle “push the floor away instead of pulling the bar up” for deadlifting doesn’t make sense initially as it seems like a useless tautology but does seem to make a difference in making it easier to keep a straight spine when you’re close to not having enough strength to complete the lift.

    I can now picture how the prediction of “the barbell moves up” approaches an inevitable contradiction to the prediction “maintain posture of limbs like this” when certain changes in posture would improve leverages and make it easier to complete the movement (but come at the cost of injury risk). The sensation of your body is screaming at you that this shit is seriously heavy and you need some serious effort or no movement is going to happen. So there’s a battle between two motor programs where both are trying to have their predictions fulfilled and it’s almost but not quite impossible to accomplish both. Maintaining the correct balance in the strength of each prediction is required; you need to really really strongly intend to lift the weight because a maximal effort lift is really damn heavy but at the same time you need to intend to maintain good form even more strongly or that form is sacrificed.

  34. leoboiko says:

    What about transgenderism? I’ve recently come to accept myself as trans, due to undeniable subjective evidence. But I still find the idea frigging weird. If I put on a woman’s coat, if I act in the way I imagine to be “womanly”, or when I’m occasionally read as a woman by other people, I feel instant and total relief of a misery that haunted me for thirty years without me grasping why (I thought it was just part of my personality, a core feature of my identity; I’m “just a gloomy person”, or “congenitally depressed”; I didn’t know I could just go and feel happiness, or emotions in general, and I certainly didn’t expect this much psychological relief merely from the prospect of not crossplaying “man” anymore). And yet, this makes no frigging sense. What counts as “womanly” clothing, gestures, or social treatment is obviously arbitrary, and part of an oppressive, senseless caste system. I am indeed philosophically opposed to the idea that you can define “woman” by those things, and I stand by the validity and importance of challenging gender norms. In transhumanist luxury queer space communist utopia, everyone will just wear whatever clothing they want, right? And yet, in this present, gendered society, I know, from direct experience, that I get immense psychological relief from embodying the arbitrary social construct of “feminine”, and immense misery from its counterpart; but I don’t know why.

    What if, for whatever reason, my brain just latched, early on and irrevocably, on “myself” as (the large cluster of arbitrary signs that construct the social idea of) “female”, and is constantly annoyed when sense data doesn’t match this prediction? So whatever stupid tokens my society declares to be “feminine” are added to the top-down model and contrasted with sensory data, and end up influencing dysphoria or its relief.

    (For the record: I’m blind to the spinning-mask illusion in the GIF above, and apparently this is correlated with transness. I do see the illusion in other versions of it, but not this particular one. When the picture was posted earlier in this blog, I said that I was questioning, but probably not trans (because how can a hairy gorilla like me not be a man?); I now believe that the benefits of transition far outweigh the costs in my case, and even if I don’t have trans midichlorians or whatever, that’s as good a definition of “trans” as any (thanks Ozy).)

    • Sam Reuben says:

      One thing that the naive predictive processing model fails on is that there can be some things which precede any learning. It seems reasonable to say that one of these is sex, on the most biological scale: humans are not born with an articulated goal of “make more humans” and then experiment with different methods for making that happen, but rather have an inherent instinct for which things they want to get busy with (i.e. sexuality). Most every human is born with a fairly coherent concept for “class of thing I want to know in the biblical sense,” be it male-sexed human or female-sexed human, with no learning required.

      The extension of this is, of course, gender: as thinking, social beings, we don’t just view the bodies which we feel lust for as objects (at least, we shouldn’t), but as a given class or kind of being. With two sexes which feature quite high dimorphism, there’s an excellent reason to construct two different categories on the evolutionary, pre-learning scale, even if the categories haven’t yet been filled in.

      I think that this is, roughly speaking, where some aspect of gender dysphoria comes from. Humans are capable of recognizing which sex someone else is, and come with an empty category for each sex. They also have some internal notion of which category they themselves ought to fall into, and then start filling their own empty identity category with the behaviors of people who they recognize as members of that sex and that gender. (We can be reasonably certain that this is a thing people do, because we tend to preferentially imitate those entities which fall into the human category over, say, the bee category.) This is why, for example, gender role models are so critically important for so many people. The system works quite well for most individuals, but for those individuals where gender and sex do not match, you get some problems. Other people recognize their sex, and categorize them with the corresponding gender, and so they receive societal pressure to adhere to a category which they do not identify with. Understandably, this makes them very, very unhappy. (I think that this kind of unhappiness fits for every case of adhering-to-non-identified-category, incidentally, from work to public image to anything else you can think of, but is likely strongest for the extremely strong category of gender.)

      So that would be my explanation. I don’t think the predictive processing model, in its naive form, perfectly explains everything our brains do. There do seem to be some pre-learning categories which we have, and gender is a prime candidate. There’s likely some amount of top-downness that factors in with this, as well as strong or weak identification with gender categories, and so much more that can only be speculated on, but that would be why those arbitrary societal tokens matter so much to you – not purely for their own sake, but because you’re being identified by others and yourself as a member of that critical and personal identity group of woman.

      (On the side: hope the transition goes well! I know the kind of relief that comes from changing lifestyles to something that suits you properly, although not with regards to gender, and it’s immense. Glad you can get rid of that psychological stress.)

    • ksvanhorn says:

      “What counts as “womanly” clothing, gestures, or social treatment is obviously arbitrary, and part of an oppressive, senseless caste system.”

      Have you considered the possibility that the sense of relief you feel when presenting as a woman is due to social programming that defines masculinity as shameful and portrays men as morally inferior to women? We live in a culture that bombards one with messages that one can be proud to be a woman, but must be ashamed of being a man.

      I’m not saying this is the root cause in most cases of MtoF transgenderism, but your specific words suggest this possibility.

      • Ozy Frantz says:

        I have not noticed any obvious trends in how commonly trans people of a particular birth assignment have sentiments like “what counts as gendered clothing, gestures, and social treatment is obviously arbitrary and part of an oppressive, senseless caste system.” (There is, however, a consistent and adorable tendency among trans people to declare that the gender they were assigned at birth is the worst and any sensible person will obviously identify with the other gender because it is so much better.)

        • andrewflicker says:

          Speaking as a cis-het guy: Lots of us are also big on the “women are basically just better” train, without it turning into an identification problem. I’m totally a guy, have no particular innate desire to be a woman, but women are obviously better- they have direct reproductive capability, better resistance to various health problems, longer lives, are tremendously more physically attractive, on average like me more, etc., etc. (All very much tongue in cheek, of course. Some of these perceptions get generated *because* I’m a cis-het guy!)

          • sandoratthezoo says:

            Do you actually think this or are you purely making a joke?

            As a cissexual man, speaking purely of biological and not social differences, I’d call:

            The Advantages of Maleness

            * Physical strength
            * No 30 year long period of monthly cramping, bleeding, mood swings etc.
            * A lower metabolic load to carry in terms of reproductive organs and no prioritization of keeping those organs stable: ie, I’m not cold all the time.
            * Breasts, while attractive, seem pretty inconvenient.
            * I can have children without the discomfort of pregnancy or the extreme discomfort/disability of delivery
            * Sexual gratification seems way easier for guys

            The Advantages of Femaleness

            * Longevity/generally lower mortality is obviously a huge one
            * Penises and testicles seem pretty inconvenient.
            * Children naturally (?) love mom more than dad, this is rewarding
            * Maybe there is more potential at the upper end for sexual gratification?

            Differences that are not clearly advantages or disadvantages

            * Whatever things/people mental differences
            * Height/size seems like a mixed bag

          • Controls Freak says:

            The important thing is that once we’ve noticed that there are some good things about being male and some good things about being female, we can immediately conclude, “The Patriarchy hurts everyone.”

          • hlynkacg says:

            Do you actually think this or are you purely making a joke?

            I can’t speak for andrewflicker, but I don’t think it’s a joke. Men are disposable, women aren’t.

        • Nancy Lebovitz says:

          I believe that there is no such thing as dressing like a woman or a man, there is only dressing like a woman or a man of a particular culture, and so far as I know, both trans and cis people mostly get imprinted on the culture they grew up in.

          (This is a stored riff. I hope it fits into the conversation.)

      • leoboiko says:

        I was raised in a Latino macho culture. I was told how femaleness and mariquice is petty and inferior and how manliness is great and noble all my life. Anything “manly” was praised and eulogized, while any slight hint of femininity was derided with humiliation, bullying, death threats and, in a few memorable cases, physical violence. Wife-beating is routine in my country—I’ve watched it first-hand—as well as punitive man-on-woman honor killings, with a large section of the public opinion finding them deserved.

        I have nonetheless all my life felt miserable at having to perform as a man, even though I have the privilege of looking like a pretty one. So no, I don’t think 11-year-old me felt compelled to terrifiedly crossdress in secret because I heard bad things about masculinity. There were no bad things being said about manliness in my culture at that time, and there was every possible social incentive imaginable to man up and be macho, including the literal and very real threat of death. (we’re still #1 in murdering trans women; I live abroad now, and I’m never going back if I can help it).

        • ksvanhorn says:

          leoboiko: I’m sorry to hear about the pain you’ve experienced. My comment doesn’t apply to you. Mainstream American culture and Latin macho culture are very different things.

    • Some Troll's Serious Discussion Alt says:

      In transhumanist luxury queer space communist utopia, everyone will just wear whatever clothing they want, right?

      And the clothing they want will… probably be pretty gendered. Particulars of expression might be socially moderated, but I think that we shall have to face the horrifying truth that, well, gender really exists and men and women are different.

      • carvenvisage says:

        Technical point: OP was probably joking (“luxury queer space communist utopia”), so you’re probably the first to prophesy on that point.

      • Vamair says:

        I’ve always thought that in a transhumanist utopia they/we will wear any bodies we want to, maybe with a set of favorite ones (this large female house cat, that guy and the purple bucket-on-wheels at the corner are my favs). Maybe a few at a time. Clothing? Meh!

    • carvenvisage says:

      I feel pretty amazing if I go out barefoot in a dressing gown, especially if in winter or a place where there’s glass in the streets (easy to dodge, try this at home!). Any possibility it’s something like that?

      Like, probably most people don’t get much chance to assert I CAN DO WHAT I WANT (so long as it’s harmless) in their lives, so I would expect things that tell yourself that to feel pretty good pretty often. (Also some at least temporarily burn your not-feeling-amazing bridges somewhat.)


    • eyeballfrog says:

      >And yet, this makes no frigging sense. What counts as “womanly” clothing, gestures, or social treatment is obviously arbitrary, and part of an oppressive, senseless caste system. I am indeed philosophically opposed to the idea that you can define “woman” by those things, and I stand by the validity and importance of challenging gender norms.

      Have you considered that perhaps you are wrong about gender norms, and that they are not arbitrary?

      • carvenvisage says:

        Aha, but have you considered that he might not be wrong about gender norms, and they might not… not, be arbitrary?

        -That perhaps the contrary hypothesis could prove the more accurate?

        *adjusts monocle*

        • eyeballfrog says:

          My point is not that the poster is right or wrong. My point is that the person is strangely certain of a position that admittedly directly contradicts lived experience.

      • Dedicating Ruckus says:

        There is clearly some component of modern gender presentation norms that is arbitrary/wholly culturally determined, and some component that is a natural consequence of ground-level biological reality, and these two poles probably define a spectrum that shades from one to the other.

        For example of wholly arbitrary components, you could see the association of color with gender, nowadays pink/blue as feminine/masculine. Potentially, also, in languages such as Japanese where there is a strong gendered component to speaking styles, the precise details of those differences.

        For something in the middle, you could think about skirts vs. pants as gender marker. There doesn’t seem to be any necessary biological connection to the topology of your lower garment, but it would make sense e.g. for women’s garments generally to have a greater emphasis on aesthetic presentation, and men’s garments to be less prone to restrict dramatic movement.

        Examples of mostly biologically coded aspects of gender presentation are of course pretty easy to come up with.

        • INH5 says:

          For something in the middle, you could think about skirts vs. pants as gender marker. There doesn’t seem to be any necessary biological connection to the topology of your lower garment, but it would make sense e.g. for women’s garments generally to have a greater emphasis on aesthetic presentation, and men’s garments to be less prone to restrict dramatic movement.

          It might seem that way to those of us who live in 21st-century Western cultures (or at least I assume that you do), but only a couple hundred years ago it was quite common for upper-class men to dress in clothing that was even more impractical and aesthetically oriented than modern women’s clothing. Louis XIV was particularly famous for this but, speaking as someone who took 3 Art History courses in college, he was far from an isolated example.

      • Nornagest says:

        Well, there’s a fair amount of cross-cultural variability in gender norms: except on a very few points (it’s pretty hard to find e.g. a true matriarchy or a society where women do most of the fighting), it’s possible to find a set of gender norms for just about any set of criteria you can imagine. Sometimes quite unlike ours: it’s not hard to find one where e.g. women are considered the more lustful sex or men the more flamboyantly expressive. So I’m willing to consider the idea that most of our specific gender norms are arbitrary in that sense.

        But I’m not aware of any societies that don’t have gender norms at all. The idea of differentiating male and female (and sometimes one or two other or intermediate genders) based on personal presentation really does look like a cultural universal, and I’m quite skeptical of damning any apparent cultural universal as senseless or oppressive. Anything that common is usually serving a pretty basic cultural need.

    • INH5 says:

      I find it very plausible that there are biologically defined categories of “man” and “woman,” and an at least partially biologically based sense of identifying with one category or the other (or neither, or something completely different, etc.), but that the traits associated with each category are to some degree culturally defined. So we hear some transgender Muslims say that they feel dysphoria relating to wearing or not wearing hijab, but I highly doubt that a transgender person from a Western Christian background would be likely to feel dysphoria relating to wearing or not wearing a scarf over their head to keep warm during the winter.

  35. lil_sebastian says:

    For a less technical, more popular review of this subject, take a look at: “Making Up the Mind” by Chris Frith.

  36. timujin says:

    This is really interesting. As a practicing tulpamancer, this matches my experience with trying to force the top-down assumptions to do a hard override on the bottom-up data. I’m unusually bad at it, as in “after working my ass off on it for years, I can reach about the same results as all my friends can in a month, and I’m pretty sure it’s not just a reporting issue”. I am also having trouble reading the jumbled-up word mess (it’s possible, but not easy), can’t recognize the plane in the video (all I see is blinking lights moving in random directions with no rhyme or reason), don’t have any problems at all with the Stroop effect task, and am unusually bad at recognizing spoken words despite having (measurably) very acute hearing.

    Could it be that my top-down processes are just not strong or confident enough, and even pretty weak bottom-up data makes them reconsider all their assumptions? Does my brain have a particularly strong affinity for the Virtue of Lightness, to the point where the lightest push of evidence can overcome my priors?

  37. Baeraad says:

    Heh! When I was reading through the article, I was actually thinking, “say, wouldn’t this go a long way towards explaining autism?” And then I got to the section where it turns out to explain autism. How’s that for prediction?

    Seriously, speaking as an autistic person, I’d say that “being in a near-constant state of surprisal” sounds a lot like how I experience my condition. Very interesting.

  38. Nancy Lebovitz says:

    Data point on the dalmatian: Last night, I couldn’t make it out at all, even though I’ve seen it before, possibly in Goedel, Escher, Bach. I was wondering whether it was a low-quality copy. This morning, the dog popped out immediately.

    • Furslid says:

      I recognized the cow picture as one I’d seen before. Then I was looking at the dalmatian picture. When it was starting to come into focus, I spotted the quadruped shape. For a few seconds I thought “Oh another low res cow picture” before I realized the proportions weren’t right for a cow.

  39. Alejandro says:

    Second, low-precision sense data might contradict high-precision predictions. The Bayesian math will conclude that the predictions are still probably right, but the sense data are wrong. The lower levels will “cook the books” – rewrite the sense data to make it look as predicted – and then continue to be quiet and signal that all is well. The higher levels continue to stick to their predictions.

    Third, there might be some unresolvable conflict between high-precision sense-data and predictions. The Bayesian math will indicate that the predictions are probably wrong. The neurons involved will fire, indicating “surprisal” – a gratuitously-technical neuroscience term for surprise.

    In the second paragraph I read “surprisal” as “surprise!” the first time. Then I did two more rereads trying to understand what kind of weird meta-joke was saying that “surprise is a gratuitously-technical neuroscience term for surprise” before finally reading it correctly. So my brain helpfully exemplified exactly what the paragraphs are talking about – taking first the option of the first paragraph to rewrite “al” subconsciously as the expected “e!”, then stopping in its tracks when that led to an unavoidable higher level conflict.
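
    A minimal sketch of the “Bayesian math” in the quoted paragraphs, assuming that both the prediction and the sense datum are Gaussian estimates of a single number. That assumption, the function name `combine`, and all the values are illustrative rather than taken from the book. With Gaussians, the posterior mean is just the precision-weighted average, so whichever side is more precise wins:

```python
def combine(pred_mean, pred_precision, sense_mean, sense_precision):
    """Fuse a Gaussian prediction with a Gaussian observation.

    Precision is inverse variance. The posterior mean is the
    precision-weighted average of the two means, and the precisions add.
    """
    post_precision = pred_precision + sense_precision
    post_mean = (pred_precision * pred_mean
                 + sense_precision * sense_mean) / post_precision
    return post_mean, post_precision


# Second case: high-precision prediction (0.0), low-precision sense data (10.0).
confident_prior, _ = combine(0.0, 100.0, 10.0, 1.0)

# Third case: the sense data is now far more precise than the prediction.
precise_data, _ = combine(0.0, 1.0, 10.0, 100.0)

print(confident_prior, precise_data)
```

    In the first call the estimate barely moves away from the prediction (the books get cooked); in the second it lands almost on top of the sense data, which is the situation that would register as surprisal.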

  40. Enkidum says:

    Suggested further reading (both of which are much easier going):

    Consciousness and the Social Brain by Graziano, which tries to solve the (easy) problem of consciousness by treating it as a predictive model of oneself.

    On Intelligence by Jeff Hawkins, which is a pop treatment of the mind as a prediction engine.

    I’d say you’re a little over-optimistic about this as a pure unifying theory, but you’re absolutely correct that it is what everyone in neuroscience is doing these days.

    • timujin says:

      What’s the difference between the (easy) problem of consciousness and (hard?) problem of consciousness?

      • Enkidum says:

        The distinction comes from David Chalmers, it’s very google-able. In a nutshell, the so-called “easy” problem is identifying the neural/algorithmic/whatever correlates of consciousness. This is obviously not easy at all, but it is, he argues, much easier than the “hard” problem, which is to say the problem of figuring out how matter could have anything like conscious states. People like Colin McGinn (who I personally think is a fucking hack, but that’s neither here nor there) would argue that the hard problem is so hard that it’s literally impossible for us.

        Anyways, Graziano explicitly leaves aside the hard problem, and simply focusses on identifying what he believes is the type of neural processing that is involved in consciousness, namely forward predictions of one’s own social and bodily states.

    • drossbucket says:

      Kant could have done with some snappier terminology, to be fair.

      • Sam Reuben says:

        Kant was an undeniable genius with an undeniable problem with writing clearly.

        • drossbucket says:

          I found the Critique a bit more readable than I expected when I tried as a student (not that I read it all), but in retrospect this was because I assumed Kant was saying something he probably wasn’t.

          I’d just read a bunch of pop psych books, mostly Dennett and Pinker, and I rounded him off to some kind of modern cognitive science idea, a bit like this top-down processing thing (with some ‘hyperprior’ like idea for the completely fixed a priori ‘forms of thought’).

          Also I just had really low expectations – the introduction of the copy I had made him out to be tremendously obscure and difficult, so I was expecting something basically unreadable, and it was just hard. IIRC they were some sort of positivist-era analytic philosopher, and Kant’s view looks completely incompatible with the really radical bottom-up-only style of empiricism, so maybe they overstated it.

    • Eli says:

      Predictive processing tends to slur the difference between a hyperprior (a deeply hierarchical prior, which can be updated from data), and the generative model itself (which is, as you would say, the necessary condition of any possible probability assignment), particularly its topology.

      In symbols, we’d say that P(H) is a prior, but really we should write P(H | m), where m designates that we’re assuming the form of the generative model. The assumed form of the generative model is, as you say, a necessary condition of any possible Predictive Processing.

      Ultimately we’d like to assume some universal class of generative models that allow for hierarchical, high-dimensional prediction, like probabilistic neural networks or probabilistic programs.

      (In computer science terms, any of these is equivalent to stochastic Turing machines which print out a tuple of outputs. However, in probability terms, by forgetting the dimensionality of the observable sample space we’ve actually eliminated information the generative model needs to learn.)

      The ultimate goal is to minimize the KL-divergence from the empirical distribution of observed sensory samples, after all, so really we want to say, “the necessary condition of any possible predictive processing is a homomorphism of physically possible sensory signals into physically realizable generative models.”
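
      A toy illustration of that objective, assuming a discrete sample space and reading “KL-divergence from the empirical distribution” as KL(empirical || model). Both the reading and the names `empirical_distribution`, `kl_divergence`, and the two candidate models are assumptions made for the sketch:

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each distinct outcome in `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return {outcome: n / total for outcome, n in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as dicts.

    Outcomes with p == 0 contribute nothing; an observed outcome that
    the model rules out (q == 0) gives infinite divergence.
    """
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf
        total += p_x * math.log(p_x / q_x)
    return total


observed = ["light", "light", "dark", "light"]  # hypothetical sensory samples
p_hat = empirical_distribution(observed)
good_model = {"light": 0.75, "dark": 0.25}
poor_model = {"light": 0.25, "dark": 0.75}

print(kl_divergence(p_hat, good_model), kl_divergence(p_hat, poor_model))
```

      The model whose predicted frequencies match the observed ones scores zero, and the mismatched one scores higher; a real generative model would of course be learned rather than picked from two hand-written candidates.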

  41. Douglas Knight says:

    What is the theory?

    Of course, you could have a theory as simple as “It is good to think about prediction error.” But you already had that theory. You seem to claim that this book contains a new theory, different from what you have encountered before. But I don’t see that reflected in the post. It just seems like more examples. Some of the other posts were about reading people who claimed to have a theory, but they didn’t seem to convince you that they really did, while this author did convince you. I am suspicious that the difference is just that Clark was more insistent that he had a theory, or even just that he had more pages to repeat that claim.

    • Eli says:

      Clark covered what is predicted, as well as how prediction-error is minimized through action, not just Bayesian updating.

      • Douglas Knight says:

        So what does Clark say is predicted?
        Did Scott mention that in this post?
        If not, why not? Does Scott not think that is a key improvement?
        If so, why does this post not look any different to me than the others?

  42. drossbucket says:

    The top-down stream starts with everything you know about the world, all your best heuristics, all your priors, everything that’s ever happened to you before – everything from “solid objects can’t pass through one another” to “e=mc^2” to “that guy in the blue uniform is probably a policeman”.

    … That’s a lot of things. Does Clark have a story on how this is narrowed down to a reasonable sized hypothesis space?

    Also, I was wondering basically the same thing as Douglas Knight. This sounds like the same top-down bottom-up thing you always talk about in your “the the” posts. Not that I’m complaining, it’s interesting, but you seem more excited this time. Is there any qualitatively new addition you got from the book that I’m missing?

    • Eli says:

      … That’s a lot of things. Does Clark have a story on how this is narrowed down to a reasonable sized hypothesis space?

      No, but other people working in similar research subfields have some pretty strong ideas:

      1) Hierarchical models make the top levels easier to learn. Think of seeing what you think is a dog, but it could also be a cardboard cutout. Then it barks. The evidence came from two different sensory modalities, and was therefore stronger.

      2) Bayesian nonparametrics provide “infinite” hypothesis spaces that are nonetheless designed to concentrate quickly around previously-observed hypotheses.
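
      A rough sketch of point 2, using the Chinese restaurant process as the example. No specific model is named above, so picking this one is an assumption, and `chinese_restaurant_process` and `alpha` are illustrative names. The space of hypotheses is unbounded, but each new observation prefers hypotheses that already explain a lot of earlier observations:

```python
import random


def chinese_restaurant_process(n_observations, alpha, seed=0):
    """Assign observations to hypotheses ("tables") one at a time.

    Observation i (counting from 0) joins an existing hypothesis with
    probability proportional to how many earlier observations it already
    explains, or opens a brand-new hypothesis with probability
    proportional to `alpha`.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = observations assigned to hypothesis k
    for i in range(n_observations):
        # Total weight is i (existing assignments) + alpha (new hypothesis).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                break
        else:
            counts.append(1)  # r fell in the "new hypothesis" slice
    return counts


counts = chinese_restaurant_process(1000, alpha=1.0)
print(len(counts), sorted(counts, reverse=True)[:3])
```

      Nothing caps the number of hypotheses, yet in expectation it grows only about logarithmically with the number of observations, so most of the mass ends up on a handful of already-seen hypotheses.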

  43. precipicebeholder says:

    Anything useful this theory might bring to understanding and/or compensating for ADD/ADHD? I’m always on the lookout for new mental models that can be potentially beneficial.

    • armorsmith42 says:

      If you ctrl-f for my name, I replied further down but, as someone who may or may not have ADHD (I really don’t think I’ll ever be able to tell), this explains why putting on House music or Gangnam Style on loop helps me to concentrate.

      I wonder if the way to apply this theory to ADHD would be to start pondering the relationship between video-games, hyperfocus on video games, and surprisal.

  44. Eli says:

    I’m pretty sure I was the one nagging you to review this book, so: THAAAAAAAAAANKS!


    It’s also your book if you want to learn about predictive processing at all, since as far as I know this is the only existing book-length treatment of the subject.

    Jakob Hohwy’s The Predictive Mind gives another book-length treatment of the subject, though focusing more on applying it to traditional Philosophy of Mind problems than to showing a broad vision of mental function the way Clark has.

  45. KC says:

    Hey, regarding the Asch conformity experiment you wrote “usually they would disbelieve their eyes and give the same wrong answer as the confederates,” but the Wikipedia article says only 36.8% of the responses in the experiment condition were wrong, with 75% of subjects giving at least one wrong response (across 12 trials per subject that had the confederates). I think “sometimes” would be a more accurate description of this rate of conformity.

    (Tried to post this once before and it got lost? Sorry if it posts twice.)

  46. nestorr says:

    I’ve long been aware that movement is closely tied to memory of movement, it’s especially clear when you’re typing and you miss a key – you don’t have to see the typo to know you made a mistake, you “feel it” in your fingers because the feedback doesn’t match the expectation.

    My metaphor for the brain’s control over the body is a huge organ/keyboard with thousands of buttons shrouded in darkness, the keys only become illuminated when you press them and find out what noise they make so you have to remember the sequence of noises to guide you to press the right buttons. And out in the real world, you’re riding a bike, or balancing over a rope.

    As for tickling, unfortunately for me of late it feels like I’m tickling myself even when other people do it. What do you call that? (A side effect of long term anhedonia, I think)

  47. baconbacon says:

    So what do small children experience? What about animals who are born able to walk/run within a short period of time?

    • Paul Brinkley says:

      I was going to ask something similar. What does this model look like in animals in general? Particularly mammals? And particularly the near-sophonts (dolphins, chimps, corvids, octopi)?

      Is there a part of the neurotransmission system that is more missing than others? Or some part of the top-down predictive system, maybe?

  48. Ezra says:

    This model would seem to predict that things in dreams should seem unsurprising, since it’s always exactly what the top-down stream predicts.

    • sandoratthezoo says:

      Hmmm, are you surprised in your dreams? I don’t think that I am. That’s kind of the classic characteristic of dreams, right? No matter how incredibly nonsensical they are, you take things in stride. “Yeah, of course I’m a giant undead tortoise.”

      Even in nightmares, I don’t think I’ve ever experienced a jump-scare. They’re more in the tune of “inevitable predictable but unescapable horror.”

      • Randy M says:

        You’ve heard the term “startled awake” before, no?

        • sandoratthezoo says:

          I’ve never been startled awake by a dream.

          I’ve been startled awake by a sense of falling, but I don’t think that comes from dreams.

          All other times I’ve been startled awake, it’s come from an external stimulus. Is that not typical?

          • Randy M says:

            I’ve been startled awake by a sense of falling, but I don’t think that comes from dreams.

            It’s hard (at least for me) to separate what comes from a dream or from the dreaming mind’s incorporation of stimuli, but this is what I associate with dream surprise too. It’s not always about falling, but there is a physicality to the sensation more than a curiosity-based surprise. Less a “huh?” and more an “Ah!”.

      • Edward Scizorhands says:

        With my dreams, something short-circuits things that ought to be issuing corrections.

        “Oh, the Sunday comics are filled with swear words? This is just like when I dream. Weird that it’s happened in real life, finally.”

        “Oh, I can stop gravity from making me hit the ground after jumping just by willing it so? This, too, is just like when I dream. And every time it happens I think I’m not dreaming when it turns out I am, but now it’s real life. Okay, let’s get stuff done.”

      • nestorr says:

        I dreamt I was trapped in a subway door and the train pulled me towards the mouth of the tunnel, at the moment I was to be crushed my brain decided to swap the bone crushing for a schmaltzy “Out of order” TV screen and I woke up with a start.

  49. Sam Reuben says:

    This is good stuff – I’ve produced a similar thesis, although more focused on conscious reflection than low-level automatic organization. There are a few areas where it ought to be fleshed out, though. Kant’s Transcendental Aesthetic and Transcendental Deduction (from the Critique of Pure Reason) give more insight on what’s needed to provide for a blank slate, for starters, and you can see the same ideas used here being bandied about in different language (rather than talking about upstream/downstream, Kant talks about the givenness and cognition of empirical objects, for example). All in all, solid work, although as usual these modern ideas would have a much easier time getting off the ground if they started their investigation by seeing what some of the smartest people from past generations had to say when confronted with the same exact issues. (For reference, Kant was justifying the new fad of empirical science, just as this brain model attempts to describe our ability to learn and discover despite previous biases.)

    Also worth noting, as is done very tersely in this essay, is how much humans rely on other minds for concept-and-data verification. In most situations, if some very surprising piece of data is received, the first response of any human will be to look to other nearby humans to check and see if they really did just receive that piece of data. Theories or guesses about how things internally work (concepts) are tested on other humans in the same way – for reference, see this blog. The reason for this is pretty simple: it’s just outsourcing these things. If one upstream-downstream comes to a conclusion, it could be deceived; if all of them do, you’ve got a very convincing case. (This is also how echo chambers can be unintentionally built, and why free speech is so valuable as an escape vector for those epistemic traps.) In other words, what Scott has relayed, and perhaps the book itself, is a very good description of the vertical processes used to acquire and verify data, but has left out the horizontal processes.

    (As another side note: where most attempts to convince people seem to go wrong is in leaning too hard on horizontal communication and not enough on vertical. That is, a particular level of data is communicated between people, but not the key elements of vertical action that resulted in their respective versions. This is most painful when talking about things like free will.)

    • romeostevens says:

      This indicates that the weird modern phenomenon of monocultures as people with similar cognitive architecture conglomerate is pretty bad. We assume we’re getting the same level of error checking we’d be getting in a tribe of diverse enough brain architectures to fill all the niches needed for a fully functioning collection of humans. Actually we’re just double counting evidence.

  50. Eli says:

    Predictive processing is sympathetic to all this. It takes all of this stuff like priming and the placebo effect, and it predicts it handily. But it doesn’t give up. It (theoretically) puts it all on a sound mathematical footing, explaining exactly how much our expectations should shape our reality, and in which ways our expectation should shape our reality. I feel like someone armed with predictive processing and a bit of luck should have been able to predict that placebo effect and basic priming would work, but stereotype threat and social priming wouldn’t. Maybe this is total retrodictive cheating. But I feel like it should be possible.

    There’s a funny mathematical move here. Predictive processing says that our perceptions (and motor actions) should obey Bayes’ Law. It also says that, in the limit as precision rises, almost anything can be Bayesian evidence about anything else — as long as there’s some sensorimotor correlation.

    It’s also not a complete process model for a lot of stuff, but that requires getting into the action half.

    The really short version of the active half is: if your joints are out of place, move them, and then extend the same principle to everything else you can affect at all. If your airplane is out of place, because take-off time is passing and you should be rising into the air now, just pilot the airplane! This lets all your built-up predictions and expectations about what makes airplanes move do the work of planning how you pilot an airplane.

    Then we have the question: well, if the airplane’s taking off, why not just adjust our expectations to believe that planes, once upon the tarmac, never take off? This culminates in the Dark Room Problem: why do anything at all, instead of just sitting around expecting your own inevitable death?

    Predictive Processing provides few to no clear answers other than, “Well I guess there really are some innate hyperpriors in there somewhere” and “Well, the precision on the expectation of moving must be higher.” This is because the neuroscience PI behind Predictive Processing, Karl Friston, denies the existence of value. He fully denies that value, valence, or affect has any core representation in the brain: dopamine encodes precision, he says, and nothing to do with this fake thing “reward” that humans invented. This is despite the fact that other researchers claim your visual cortex literally predicts how beautiful things are as part of perceiving them. Again, Predictive Processing in its most ambitious form denies the existence of value as a quantity in the brain, considering it only a conscious-level subjective epiphenomenon we use to model ourselves.

    Italics because that’s really freaking surprising, and as enthusiastic as Andy Clark is, in his most recent work on Predictive Processing (“Happily Entangled”), he doesn’t quite buy it. I think it’s a big fat unexplained phenomenon, ripe for further investigation, because affective value is exactly the quantity we need to dictate the balance between perceptual and active inference. When should we adjust our expectations, and when should we act to change the world? Well, it seems like an affectively charged, high-value predictive representation ought (in a philosophical sense) to override mere sense data, leading us to bring the two together by acting.

    • Sam Reuben says:

      It’s funny, because value exists outside of minds. There’s such a thing as a stable chemical state, or a species selected for through evolution, or path of least resistance for electricity, or… you get the idea. It’s going too far to say that it’s mind-guided value in the universe, but there’s definitely a preference for one thing over another thing existing.

      And yet there’s supposed to be no preference for one state of mind existing over another state of mind?

      • Eli says:

        And yet there’s supposed to be no preference for one state of mind existing over another state of mind?

        The precise claim is that the imperative is to minimize prediction error, period. That just means the preference is for accurate expectations, whether confirmed through action or updated through perception.

        Karl Friston is playing a trick there: if you can read the math, you notice that under the “active inference” algorithm for “self-organizing action”, you can specify a prior corresponding to any possible preference over distal states. The math really just says, “use perception and action to optimally achieve any possible sampling frequency of observations.”

        Since the “prior” specifying preference or goal has to come from somewhere, the theory is just completely failing to address value or affect as such.

  51. Mark Dominus says:

    Daniel Dennett describes this theory in his 1992 book Consciousness Explained. From the introduction:

    It is widely held that human vision, for instance, cannot be explained as an entirely “data-driven” or “bottom-up” process, but needs, at the highest levels, to be supplemented by a few “expectation-driven” rounds of hypothesis testing (or something analogous to hypothesis testing). Another member of the family is the “analysis-by-synthesis” model of perception that also supposes that perceptions are built up in a process that weaves back and forth between centrally generated expectations, on the one hand, and confirmations (and disconfirmations) arising from the periphery on the other hand (e.g. Neisser, 1967).

    The introduction alone connects this with hallucinations (how are they even possible?), dreams, and phantom limb syndrome (compare your points about tickling).

    It continues later:

    The key element in our various explanations of how hallucinations and dreams are possible at all was the theme that the only work that the brain must do is whatever it takes to assuage epistemic hunger—to satisfy “curiosity” in all its forms.

    It might be interesting to compare the 1992 Dennett version with the much more developed version of 25 years later.

    The book explores this in considerable detail, and, as the quote above shows, the idea was being developed as early as 1967.

  52. ajfirecracker says:

    You should read some James J Gibson, I recommend “The Senses considered as Perceptual Systems”

  53. mnarayan01 says:

    You can plbabory raed tihs pttery wlel eevn tguohh it’s all jelbmud up

    • Jonas says:

      For fun, sorting each word alphabetically:

      ouy acn abblopry ader hist eprtty ellw eenv ghhotu ‘ist all bdejlmu pu

      For more fun, the size of each anagram equivalence class, in my random word list:

      1: 89371 (the word is only an anagram of itself)
      2: 3739 (it’s an anagram of one other word)
      3: 553
      4: 118
      5: 27
      6: 7 (caret, abets, drapes, least, palest, opts, lapse are each anagrams of 5 other words)
      7: 2 (carets, caster, caters, crates, reacts, recast, traces and pares, parse, pears, rapes, reaps, spare, spear)

      Eyeballing here, sorting the letters of a word almost never destroys uniqueness, i.e. the function from word to sorted bundle of letters is almost injective. Yet the sorted version is much harder to read (at least for me) than the slightly disordered one. Does anyone know why that would be? Larger Levenshtein distance, perhaps—i.e. there’s more corrective work for the brain to do?

      Python snippets:

      ' '.join(''.join(sorted(s)) for s in "you can probably read this pretty well even though it's all jumbled up".split(' '))

      from collections import Counter
      Counter(sorted(Counter(''.join(sorted(w.strip())) for w in open('/usr/share/dict/words')).values()))
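
      A rough way to poke at the Levenshtein guess: the sketch below compares per-word edit distance from the original sentence for the lightly jumbled version and the fully sorted version. It assumes plain edit distance (insert, delete, substitute) is a fair proxy for "corrective work", which is only a guess. The helper names `levenshtein` and `total_distance` are made up for illustration, and the sentences are lowercased with a straight apostrophe to keep the comparison simple.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


original = "you can probably read this pretty well even though it's all jumbled up".split()
jumbled = "you can plbabory raed tihs pttery wlel eevn tguohh it's all jelbmud up".split()
letter_sorted = [''.join(sorted(w)) for w in original]


def total_distance(variant):
    """Sum of per-word edit distances from the original sentence."""
    return sum(levenshtein(v, o) for v, o in zip(variant, original))


print("jumbled:", total_distance(jumbled))
print("sorted: ", total_distance(letter_sorted))
```

      On this one sentence the sorted version comes out further from the original than the jumbled one, which is at least consistent with the guess. It says nothing about whether the brain does anything resembling edit-distance correction.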

  54. ksvanhorn says:

    “Clark says that some hyperpriors might be innate – but says they don’t have to be, since PP is strong enough to learn them on its own if it has to.”

    Clark is wrong: at some level your ultimate hyperpriors have to be innate. This has been proven mathematically. It’s a well-known result in statistics / machine-learning that tabula rasa learning is impossible — you have to have something to bootstrap off of, although that “something” may be nothing more than an expectation of some sort of regularity (or smoothness, for predicting continuous outcomes from continuous predictors).

    The simplest way to see this is to consider supervised learning of an arbitrary function f that takes N boolean inputs and produces one boolean output. If you have examples (x1,f(x1)), …, (xn, f(xn)), and now you want to predict f(x0) for some previously-unseen x0, you have a problem: exactly half of the functions consistent with your examples output a 0 for x0, and the other half output a 1 for x0. There’s no way to resolve this unless you have some sort of prior information that makes some functions more plausible a priori than others.
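
    The counting argument can be checked by brute force for small N. This is a minimal sketch, not anything from the book: it enumerates every boolean function on 3 inputs as a truth table, keeps those that agree with a made-up training set, and tallies what they say about an unseen input. The function name `consistent_predictions` and the example data are hypothetical.

```python
from itertools import product


def consistent_predictions(n_inputs, examples, x0):
    """Count how many boolean functions that fit `examples` output 0 vs 1 at x0.

    A function on n_inputs booleans is represented by its truth table:
    one output bit for each of the 2**n_inputs possible inputs.
    """
    inputs = list(product([0, 1], repeat=n_inputs))
    counts = {0: 0, 1: 0}
    for table in product([0, 1], repeat=len(inputs)):
        f = dict(zip(inputs, table))
        if all(f[x] == y for x, y in examples.items()):
            counts[f[x0]] += 1
    return counts


# Hypothetical training data for N = 3: five of the eight inputs are labelled.
examples = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (1, 0, 0): 1,
    (1, 1, 1): 1,
}
x0 = (0, 1, 1)  # a previously unseen input

print(consistent_predictions(3, examples, x0))  # → {0: 4, 1: 4}
```

    Three inputs are left unlabelled, so eight functions fit the data, and they split evenly on x0. Whatever labels you put in `examples`, the split stays even as long as x0 is not among them.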

    • romeostevens says:

      Some of the initial hyperpriors needing to be arrived at via random walk would help explain the massive amounts of time life spent being very simple in earth’s history.

      • Sam Reuben says:

        That’s evolutionary time, though, not individual learning time. The claim remains that although random walk can lead to the generation of entities with given “hyperpriors” (in my day we called them “innate ideas,” sort of), individual minds seem incapable of generating their own hyperpriors.

    • Eli says:

      Go read about “Bayesian nonparametrics”. They allow for simplicity-biased models with infinite possible complexity, trading off based on the data.

      • ksvanhorn says:

        Sure, I know something about that subject. Bayesian nonparametrics are not an example of tabula rasa learning. In the case of Gaussian processes, the covariance function constrains what functions are and are not plausible — some of the popular ones amount to a constraint that smoother functions are more plausible.
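
        To make the smoothness point concrete, here is a small sketch using the squared-exponential (RBF) covariance, one of the popular choices. The function name `rbf_kernel` and the length scale of 1.0 are illustrative assumptions. With unit prior variance, the covariance between two function values is also their prior correlation.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between function values at x1 and x2."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


# With unit prior variance the covariance is also the prior correlation.
near = rbf_kernel(0.0, 0.1)   # nearby inputs: correlation close to 1
far = rbf_kernel(0.0, 3.0)    # distant inputs: correlation close to 0

print(f"corr(f(0), f(0.1)) = {near:.3f}")
print(f"corr(f(0), f(3.0)) = {far:.3f}")
```

        Before seeing any data, this prior already says that function values at nearby inputs are almost identical, which is exactly the kind of built-in assumption that rules out tabula rasa learning.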

        • Eli says:

          Oh, I wasn’t saying they’re tabula rasa learning. Look, I got taught about the No Free Lunch Theorem in my first two weeks of ML class, and had to follow through the proof myself. No such thing as “tabula rasa” learning in the sense of starting from assumptions too weak to deny No Free Lunch.

          On the other hand, the assumptions made by No Free Lunch just aren’t physically realistic: the physical world doesn’t embody “arbitrary” (randomized continuous-nowhere) functions, instead it usually behaves according to fairly simple continuous laws.

  55. lifetilt says:

    This all makes loads of sense.

    I have been able to lucid dream a few times and I had a few interesting experiences that, in retrospect, fit the model.

    The first one, I willed myself to look at the wallpaper. I found that I could see it in perfect detail, all the colors and textures and imperfections. I could even zoom in, I was able to see the wall as if through a microscope from yards away. I remember marveling that my subconscious could simulate the world in such “high-rez”. I remember thinking that my brain must have some serious hardware to be able to simulate with that kind of fidelity.

    The second one, there was a big book on the table and I got the idea to flip through it as fast as I could. I remember seeing pages fly by, fully illuminated and illustrated and in intricate detail, and with not even a sense of motion blur. This one seriously weirded me out. I remember thinking, “No way, dude. There is no way my brain contains a random book generator capable of working that fast. Something is screwy.”

    After the second one, I started to theorize that probably my brain was tricking me into thinking I saw these things in great detail rather than actually firing up a random number generator. Now after reading about predictive processing, everything makes total sense.

    It’s a bit of a bummer though. I was excited for the brief time I thought my brain might have the untapped potential to generate entire books in seconds.

  56. albertborrow says:

    How did something arose out of nothing?

    Assuming this isn’t a meme, I’m pretty sure this is a tense error in the first paragraph. The proper phrase would be: “How did something arise out of nothing?”

  57. MartMart says:

    Firstly: Thanks for dumbing down the book to a readable level, I find this fascinating and I’m really looking forward to the section on motor control.
    Secondly: In a previous post, I seem to remember that autism may be caused by an overactive alarm on the comparison between the top down and bottom up streams (alarming where the results are actually close enough, and do not require an alarm), whereas schizophrenia was the result of an overactive “cooking the books” mechanism (where the top down adjusts the bottom up stream to suit itself where it should actually be alarming and saying that things don’t fit). This post describes the problem with autism not in the alarming mechanism, but rather the predictions themselves being too particular (maybe the prediction confidence being set too high?), and schizophrenia as something I don’t entirely understand. Am I correct to read that the description changed?
    Thirdly: can the same info comparison mechanism be used to explain higher level delusions? For example, if my prior (top down stream) has it that my favorite youtube celebrity cat doesn’t poop, and I’m presented with a video of said cat pooping (bottom up stream) then my mind, wishing to minimize alarm, and able to convince itself that the problem isn’t that important, will choose to massage the bottom up stream by deciding the video is fake (or something similar), rather than challenging my assumption that the cat is as awesome as I thought, and then write “videos of cat pooping (and similar) are often faked by bad people” into my priors? If so, then it suggests that the way to push people away from conspiracies is to make mountains out of molehills.

  58. wpidentity says:

    As an occasional sufferer of visual migraine headaches, I think these would fit into this model quite well as a temporarily unresolvable contradiction with consequent pain, resolvable only by shutting down the bottom-up data for a reboot: closing eyes, quiet room, etc.

  59. Squirrel of Doom says:

    I think of myself as part monkey in some ways.

    Like for throwing and catching things, I just think “let the monkey do it”, and watch in slight awe as I catch a ball or hit a target with no conscious effort.

    Reading this, it does sound a lot like the motor system adjusting itself to me predicting a ball being in my hand or at the target.

    • Squirrel of Doom says:

      Thinking a little more, I wonder if this has anything to do with the whole “visualize success” self help world?

      Maybe, if done right by the right kind of person, you can plug into these systems and have your brain somehow work backwards so the change you wanted appears in the world, kinda like how the tennis ball effortlessly appears in my hand?

  60. CyberByte says:

    /u/nonnynonnynonnynonny on Reddit asked if someone could post this here to preserve their anonymity.

    Could someone post this as a comment to the SSC post (I don’t want this associated with my real name for obvious reasons and it’s really annoying to make a fake account on SSC proper. Responses here welcome too.)

    I struggle with anxiety, especially social anxiety (though plenty of non-social situations produce it as well!). A few of the symptoms that I have habitually defined as part of that anxiety now worry me–hah–more, because they seem related to this perceptual cascade of prediction.

    Example: an acquaintance texted me “Are you at home?” a week ago. I replied “No, but I’m on my way there…what’s up?”. She didn’t reply. My immediate conclusion was that she wanted to have me arrested for $DEANONYMIZING_BUT_MOSTLY_TRIVIAL_REASON and was trying to figure out if the cops should be sent to my house. (Turns out her mom called with a family crisis, hence the no reply.) In system 2 I knew this was obviously ridiculous and unlikely but part of me was a bit terrified for the next 4-6 hours that my acquaintance had it in for me. Maybe this is just catastrophizing, which plenty of CBT talks about, but it also seems related to taking some very small datum from the bottom-up system and filling in a very unlikely top-down prediction, which part of me assigns unreasonably high probability to.

    Example two, more directly connected: I worry a lot that people (strangers or very distant acquaintances) are talking about me (in a negative way.) Unrelated to the anxiety, I have somewhat poor hearing, especially for speech in vocally crowded situations; it’s difficult for me to follow conversations with people who don’t speak clearly, doubly so at parties or loud bars. Put those two things together: I start to worry that Alice and Bob’s conversation is about how much they hate me having done X…and I _swear_ I start to hear them use my name or say things about me. This makes a lot of sense with bidirectional predictions: I’m filtering very noisy sound data; I have a top-down theory (they’re throwing shade at me); I find some set of phonemes about me that seem to match the heard sounds. Maybe this makes it less bad and I should worry less, but having read this post I now worry more (oh good!)–because this seems worryingly close to the model of schizophrenic perception.

    (No, I can’t tickle myself. Nor have I knowingly had any classically schizophrenic experience, or anything I know to be a major delusion…though would I say that if I had?) Are these experiences that are typical for highly anxious people? Or should I be worrying I’m having actual delusions?

    Putting aside the medical-advice-please part of this, I wonder what everyone thinks of my way of matching anxiety to this model.

    • Aapje says:

      I think that there is substantial bias towards considering dangerous hypotheses more likely, because it really pays off to avoid dying from relatively rare events which occur once every 20 years. Kids often have this to an extreme extent, which is why they often fear monsters under their bed and such. It is likely that a decent number of adults also have this to an extreme extent, which would explain that kind of social anxiety.

  61. aciddc says:

    Fascinating stuff, thanks! I guess it barely needs to be said, but it’s amazing how analogous this low-level sensory processing stuff is with high-level conceptual thinking. You have a paradigm, whether it’s neoclassical economics or Christianity or whatever, and you try to fit everything into it while ignoring discrepancies, until and unless some discrepancy is too blatant to explain away and forces you to come up with a different paradigm to explain what you’re seeing.

  62. shakeddown says:

    This made me realize something I never noticed was really weird: if we can’t tickle ourselves, why can we masturbate?

    • sandoratthezoo says:

      I think that Scott overstates the “ignore it” part of the mechanism and understates the surprisal part.

      You can’t tickle yourself because tickling is directly related to surprise.

      You can hurt yourself by pinching yourself, or pleasure yourself by masturbating, because those aren’t much to do with surprise. I mean, you can also feel things that you expect to feel: clearly your body does not suppress all sensory input that doesn’t surprise you.

      • baconbacon says:

        In my experience tickling is not related at all to surprise. My 4 year old wants to be tickled and asks for it, switching between “don’t tickle me for real” and “don’t tickle me for pretend” (the former meaning really don’t tickle me, and the latter meaning tickle me). We have been playing this game for months, probably over a thousand individual tickles, and yet he is ticklish, he squirms and laughs often before the actual contact. You can tickle him without touching him, just reaching your arms out and wiggling your fingers. If you surprise tickle him he will usually laugh, but sometimes he just squirms away and says he doesn’t want to be tickled. I don’t see how any of his actions can be taken with “surprise” as a real factor.

        • sandoratthezoo says:

          Tickling is, like… tactical surprise, not strategic. You don’t know exactly what the tickling fingers are going to do, even if you know the overview.

          (My two year old is the same: “Stop!” Okay I stopped. “Again!”)

          • baconbacon says:

            My kids will say “tickle my feet”, “tickle my belly”, “tickle THIS foot”. It’s not like I have hundreds of tickling techniques. It seems unlikely to me that surprise is a factor for any definition of surprise that holds meaning for this sort of discussion on brain function.

    • anaisnein says:

      FWIW, I can think of several examples of pleasurable sensations arising from specific ways of being touched in specific places that, if I’m the one doing the touching, either feel like nothing at all or are merely and boringly unpleasant, much like trying to tickle oneself. That phenomenon came to mind strongly as I was reading this post. Masturbation per se works fine, fwiw. The idea that surprisal-as-intrinsic could be the differentiating factor in the things that don’t work solo doesn’t feel like The Intuitively Obvious Answer, but ehh, maybe. (Tbh I’m reluctant to introspect too hard about it in case it throws off the well-working stuff.)

    • armorsmith42 says:

      There are some people who don’t feel pleasure from the motions of masturbation. It would possibly be worth testing if they had particularly strong top-down sensory perception or particularly top-down

      What would one use to test that? Maybe see if they are particularly prone to psychosomatic pain?

  63. romeostevens says:

    If you think you’ve found a theory that explains everything, at least check for critiques by domain experts:

    in particular, having read both Surfing Uncertainty and The Predictive Mind, I did notice what the reviewer here points out: equivocation between evidence for this particular theory and a broader set of hypotheses consistent with this data. Not that this is a damning indictment, this happens constantly, and a book that went too far out of its way to be rigorous in this way would be tiresome to read. But one should not assign particularly high priors to this theory, especially as one notices that it doesn’t allow one to make very many concrete…well…predictions.

  64. Cerastes says:

    So this article once again proves Cerastes’ First Law – Everything gets much more interesting when you apply it to reptiles. ;D

    All of the above seems like a perfectly reasonable way to wire up a sensory and motor system. Now watch this video of baby tegu lizards hatching:

    That’s on the extreme end, but it’s replicated across Reptilia (Amphibia is more complex because of metamorphosis, but it’s replicated in direct-developing species like Plethodontid salamanders and Eleutherodactylid frogs). Zero real sensory input/feedback while in the egg aside from a poorly-informative sample – muted sounds, no light input in nature (most are buried), minimal movement beyond twitches – yet they emerge from the egg and *immediately* can perform any and all locomotor tasks (newborn sidewinders can sidewind), perceive and respond to their environments, even strike with accuracy (as some people have learned to painful effect when dealing with newly born/hatched venomous species).

    So how do you put together a fully functioning sensorimotor system without any decent feedback or input? Is it *all* priors? How can you encode that so reliably in a genome that you can get feats like the tegu in the video? Do such heavy priors prevent motor learning later in life?

    And, presumably, our Bayesian mind method evolved *from* this, not the other way around, since our early ancestors looked like this: How did that happen? And why?

    • hlynkacg says:

      I would guess that increased plasticity/adaptability came at the expense of being able to “pre-load” data.

    • Protagoras says:

      I was under the impression that reptiles aren’t much good at learning (basically impossible to train to do much of anything). If so, that does seem to fit the hypothesis that there’s a trade off between heavy priors and learning.

      • publiusvarinius says:

        So if the top-down/bottom-up processing explanation of schizophrenia were true, we’d expect reptiles to hallucinate a lot. Is this the case?

        • Cerastes says:

          Well, I can’t really be inside their heads, but nothing behaviorally suggests that. Even when drugged up, they just act slower and show coordination difficulties (e.g. trouble standing).

          That said, they also have somewhat different sensory systems. A common trick they use is to sit absolutely still with their eyes still, allowing the photoreceptors to accommodate and basically stop responding to the background (our eyes do this too if forcibly prevented from moving, but otherwise they’re always darting around in millisecond-long “saccades”). Then, when a bug moves in their field of view, it’s literally the only thing their photoreceptors see, making targeting easier. That’s where the “T. rex can’t see you if you don’t move” thing came from, though butchered terribly (since the animal’s own head movements would prevent accommodation).

          And your question just highlights how weird they are – they pop out of mom or the egg with a fully-functional sensorimotor system, can move and see and even recognize the smells of their preferred prey, but they also don’t fall into autistic or schizophrenic excesses of bottom-up or top-down weighting.

          Rational scientific conclusion – suck it, furballs! 😉

      • Cerastes says:

        I guess it depends if we’re lumping all learning in with motor learning (e.g. a human learning to do cartwheels). For general learning, the few studies around show that they’re actually pretty comparable to many mammals – I taught my tegu lizard to defecate on the tile floor of the bathroom rather than the carpet by rewarding it with chicken bits, that sort of thing. For motor learning, it’s less clear – I’m not sure anyone’s really tried to teach them something totally “unnatural” (like humans doing cartwheels) to test it. Then again, I’m not sure that’s been tested in other taxa either.

  65. Doug S. says:

    In this context, hypnosis seems to be a way of strengthening the “top-down” system so that your perceptive system conforms to a message like “you are on a beach” by making you actually “see” a beach instead of the chair you’re sitting in…

  66. ThrustVectoring says:

    >So (and I’m asserting this, but see Chapters 4 and 5 of the book to hear the scientific case for this position) if you want to lift your arm, your brain just predicts really really strongly that your arm has been lifted, and then lets the lower levels’ drive to minimize prediction error do the rest.

    If I understand Hal Galper’s Jazz education material correctly, this is also how people play jazz. Your brain predicts really really strongly that the sort of jazz you want is going to happen, and your body just sort of does what it needs to for your high-level drive to be correct. Look up “The Illusion of An Instrument” on youtube – at the end, there’s a striking demonstration of the difference between playing normally vs “imagining the sound you want as loud as you can” (ie, strengthening the top-down “this is the jazz I want” signal).

    There have also been experiments showing that “imagine yourself practicing sports” improves real sports performance. IIRC it’s not quite as good as actual practice, but this theory seems like it predicts imaginary practice being helpful.

  67. Kaj Sotala says:

    This model also neatly explains the weirdness that is “the illusion of independent agency”: fiction authors perceiving their characters as being real and talking to them. If you’ve got a sufficiently developed model of what your characters would do in different situations, then that has to be top-down driven, because obviously you’re not going to see your characters in your ordinary surroundings…

    At least, not before that model gets so strong that it starts predicting what your characters would do in various ordinary situations, your bottom-up stream starts cooking the books, and then those characters do start popping up in your ordinary sensory stream. Fun quotes from some authors (taken from the linked paper):

    “I see my characters like actors in a movie. I just write down what they say.”

    I live with all of them every day. Dealing with different events during the day, different ones kind of speak. They say, “Hmm, this is my opinion. Are you going to listen to me?”

    I was out for a walk and on my way to the grocery store. I wasn’t really thinking all that deliberately about the novel, but suddenly, I felt the presence of two of the novel’s more unusual characters behind me. I had the sense that if I turned around they would actually be there on the sidewalk behind me.

    Interestingly, the authors of that paper tested their writers on the Dissociative Experience Scale, on which people such as schizophrenics tend to score more highly than the general population. The writers had a mean score of 19; the general population has a mean of 8 (significant at p < .001), and schizophrenics a mean score of 18. However, the profile of the writers’ scores was different than it is with schizophrenics. To quote from the paper:

    … the writers’ scores are closer to the average DES score for a sample of 61 schizophrenics (schizophrenic M = 17.7) [27]. Seven of the writers scored at or above 30, a commonly used cutoff for “normal scores” [29]. There was no difference between men’s and women’s overall DES scores in our sample, a finding consistent with results found in other studies of normal populations [26].

    With these comparisons, our goal is to highlight the unusually high scores for our writers, not to suggest that they were psychologically unhealthy. Although scores of 30 or above are more common among people with dissociative disorders (such as Dissociative Identity Disorder), scoring in this range does not guarantee that the person has a dissociative disorder, nor does it constitute a diagnosis of a dissociative disorder [27,29]. Looking at the different subscales of the DES, it is clear that our writers deviated from the norm mainly on items related to the absorption and changeability factor of the DES. Average scores on this subscale (M = 26.22, SD = 14.45) were significantly different from scores on the two subscales that are particularly diagnostic for dissociative disorders: derealization and depersonalization subscale (M = 7.84, SD = 7.39) and the amnestic experiences subscale (M = 6.80, SD = 8.30), F(1, 48) = 112.49, p < .001. These latter two subscales did not differ from each other, F(1, 48) = .656, p = .42. Seventeen writers scored above 30 on the absorption and changeability scale, whereas only one writer scored above 30 on the derealization and depersonalization scale and only one writer (a different participant) scored above 30 on the amnestic experiences scale.

    (also, tulpas)

    • Nornagest says:

      I’ve thought for a while that the process by which we model others (including fictional others) must be basically the same as the process by which our own consciousness works. It just doesn’t make sense for us to evolve two completely different mechanisms to do basically the same stuff.

  68. Furslid says:

    I thought of an interesting experiment for confirming the prediction theory of motions. I think we have the tech to pull this off.

    1. Give the subject VR goggles. Have the goggles set to show their own hands and arms. It might be necessary to pare down the graphics to wireframe or similar.
    2. Have them do a task that requires moving their hands in their field of view so they get used to accepting what they see as predicting their movements.
    3. Without telling the subject, have the display show their hands moving.

    Do people move their hands to match the display? Or do they realize that the display has become unreliable?
    Has anyone done a similar experiment?

  69. Eponymous says:

    I’d be curious how various common biases fit into this. I mean, one reading here is that our brains are doing something very close to Bayesian processing, and so should be getting things about right on average, unless this process is somehow fundamentally miscalibrated. But that seems unlikely, since it’s presumably been fine-tuned by evolution, and there’s no reason it wouldn’t be calibrated accurately, at least for the training set of the EEA. In other words, it sounds like we shouldn’t find lots of huge cognitive biases such as we in fact do find.

    To be concrete, let’s take one of the biggest cognitive biases: confirmation bias.

    Here’s an interpretation within the PP paradigm: our brain uses our current beliefs about the world to make predictions. But then it “tunes out” a lot of the conflicting observations, and likewise doesn’t notice various logical contradictions. So it doesn’t rethink the hypothesis. It views the world as being generally consistent with its existing hypothesis, and so doesn’t question it, even in the face of countervailing evidence.

    But if we’re calling this a bias, it seems we’re saying our brains are wrong to do this. So why didn’t evolution tweak our weights to fix it? Why does it assign too much weight to our priors, and put such wide confidence intervals on our observations?

    Here’s my guess: the PP system was designed for navigating the concrete world, where our beliefs are pretty good relative to the noise in the sense data. Thus it’s set to tune out a lot of small niggling doubts. But our brain (improperly) uses the same cognitive machinery for analyzing complex abstract questions, where we have much less evidence to work with, and thus shouldn’t hold our beliefs with as much certainty. Thus it inappropriately assigns too much weight to our existing beliefs.
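
    To make “too much weight on our priors” concrete, here is a toy sketch. It assumes the simplest possible setup, a Gaussian prior combined with a single Gaussian observation, which is an illustration and not a claim about what the brain actually computes. The function name `posterior` and all the numbers are made up.

```python
def posterior(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian observation.

    Each source is weighted by its precision (1 / variance).
    """
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var


# Made-up numbers: the belief says 0, the evidence says 10.
calibrated = posterior(prior_mean=0.0, prior_var=1.0, obs=10.0, obs_var=1.0)
overconfident = posterior(prior_mean=0.0, prior_var=0.01, obs=10.0, obs_var=1.0)

print(calibrated)      # prior and evidence trusted equally: lands halfway
print(overconfident)   # prior held 100x more tightly: barely moves
```

    The same update rule gives sensible behavior when the prior really is that reliable (walking across a familiar room) and looks like confirmation bias when the tight prior is carried over to a question where it has not been earned.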

    But there’s another factor at play: there’s a difference between how you want to deal with contradictory data if you’re trying to navigate the concrete physical world, vs. assess the evidence on a particular question. When you’re trying to navigate the world, you want to tune out the noise that would just distract and confuse you. But when you’re assessing the evidence about a proposition, you have to keep track of all the little contradictory facts, to judge whether they cumulatively add up to enough counter-evidence to reduce your confidence in that belief.

    But here’s the thing that’s got me really curious: given that we have this cool built-in Bayesian subprocessor, can we repurpose it for other uses?

  70. JohnBuridan says:

    One thing I do is teach and read philosophy of language and a bit of language acquisition research. Teaching is largely about taking methods and tools and background knowledge that are outside of your students’ model of the world, and trying to make them part of their top-down processing.

    Concerning language acquisition, PP seems to be commensurate with the scientific literature.
    -When students understand 80% of linguistic input, they can predict the remaining 20% fairly well.
    -Grammar is acquired fastest when there is lots of context-significant exposure to a limited vocabulary
    -Humans AND many animals use so-called “baby-talk”, simplified versions of their own language, to talk to the little versions of themselves.
    -Some people are always confounded that babies acquire language ridiculously quickly, and make surprisingly few mistakes.
    -In learning to read, the rich get richer. The more a student comprehends today, the more that same student will comprehend tomorrow.
    -Reading Comprehension requires really good guesswork.
    -On 80,000 Hours 8 of the top 10 most desirable job skills require good prediction skills of people, systems, oneself, and products.

  71. carvenvisage says:

    The gorilla test is a total bait and switch. “Narrow your focus on one thing”… Ha, got you! you narrowed your focus too much..

    That’s also my simpler hypothesis for why some people don’t see the gorilla.

    • armorsmith42 says:

      Isn’t that just a less-general statement of the very same hypothesis?

      • carvenvisage says:

        It would also be much simpler and more intuitive, but no, the theory in the post is that your brain sees the gorilla, but filters it out, I’m saying if you focus hard enough you shouldn’t see the gorilla at all. The whole point of the exercise is to make your brain go nothing but ‘ball…balll…balll..ballballball…’ and regard everything else as noise.

        If there’s a silhouette in the background, the process should be

        ‘Vague background something, not the ball, discard’,


        ‘vague background something, look closer.. is that fur? Makes no sense. Discard’

    • Eponymous says:

      But most people would naively predict that they would notice a person in a gorilla suit even if they were concentrating on people passing basketballs, so the result is still surprising.

      Besides, most of the time people are concentrating at least somewhat on something. This experiment suggests that this leaves fairly significant blindspots, even if less extreme than literally a man in a gorilla suit beating his chest in the middle of the scene.

  72. Tibor says:

    This looks like an incredibly interesting book. Also probably quite hard to read. Luckily, I know a few neuroscientists, so I can bother them if I get stuck. I am very curious what their opinion of the book is/will be.

  73. armorsmith42 says:

    > Rick Astley’s “Never Going To Give You Up” repeated again and again for ten hours (you can find some weird stuff on youtube

    I work in an open-plan office and although I have noise-canceling headphones, I don’t know of a place to buy signal-canceling headphones. Consequently, when there is somebody behind me asking a question about rspec tests, I would ordinarily notice this and be distracted by it. To counteract this, I usually put on some sort of music. Sometimes instrumental music like Taiko drumming or House works well, but sometimes I need to have a human voice in order to *expect* a human voice and not pay attention to the actual humans near me. However, I don’t want to listen to an ever-changing array of different things being said. So, I will sometimes listen to Rick Astley’s “Never Going To Give You Up” repeated again and again for ten hours.

    My suspicion has long been that this leaves me more attentional space to notice things like syntax errors.


    • Dedicating Ruckus says:

      The really surprising thing is why, at doubtless stupendous cost of disk space, there are a whole lot of “repeated for 8-20 hours” videos on Youtube at all.

      The answer, of course, is their failure to provide such a trivial interface affordance as a “repeat video” mode.

    • TheEternallyPerplexed says:

      Nobody else noted you closed the open ‘(‘ ?
      Or is this sooo ordinary a joke here that one doesn’t mention it?

      (Read the comment in near-total silence.)

  74. smopecakes says:

    So, after about a decade of being quite sensitive to light, this is the second day that I have not experienced that, and I think it was influenced by reading “It’s Bayes All The Way Up” a week or two ago.

    With apologies, here’s my cognitive Bayesian theory/timeline of my experience from about one year before it started:

    After years of slowly, sort of in the background, thinking about it I came to a point where I definitely decided that my highest level top down predictive theory was not valid – that biblical Christianity was not exclusive and definitely not perfect and literal truth. As I remember that acceptance was perfectly correlated with a mental breakdown. I had been on the way to shower, did not fall, did not lie down, but ended up lying down will-less on the floor looking towards the stove. “Will-less” itself seems to convey too much agency. I was calm and aware and after maybe 20 minutes to maybe 2 hours I continued to the shower. It was very memorable as I felt strongly as if I needed to be specifically aware in every moment that I should keep standing if I wanted to keep standing, and that included unusual specific awareness of how to keep standing that I might have gone through in learning it. I wondered if I would be able to be a person anymore and modeled/predicted that I would feel no emotion hugging family members which is how it did feel at first.

    I had already felt like retreating from the world because of my loss of a sense of meaning and got a job at a family farm. It was not at all a fix for me. A favourite uncle as a kid was not my favourite uncle but a person who was sort of a conventional crabby and even arguably racist conservative – which was also harder for me to compartmentalize because I am conservative to libertarian. At that point I had really never had personal experience with someone being “racist-y”, now it was my favourite uncle. My grandma had been in our church branch pretty much since it began which highly emphasized unconditional love for family, friends, neighbors and enemies. She had also been a North American farm woman all her life which had one commandment: Do Not Be Weak. In my system you could and even should show weakness to others, which enabled you to signal to each other that you had unconditional love or respect for the inherent value of the other. Being a child of farm kid parents meant that I had a strong expectation that others would naturally share and be aware of my background belief that you should be strong. I expected others to know I valued that and was proud to work on days that I barely could without mentioning it, as you would have had to in her generation even. One day I was sick and my previous lack of signaling strength in things like not eating junk food pretty much ever, and sleeping upstairs in the living room where it was warmer since I was that sick had my grandma gloriously express her One Commandment by kicking me in the morning and saying “Been eatin’ too much junk?”.

    So now my favourite uncle wasn’t and I had to actively signal my basic value to my own Christian grandmother. Not only was the external truth validity of my highest top down predictor gone but my expectation for people who believed in it as the truth to actually closely and consistently believe in it was gone. My predictive system for people including the closest ones to me was done and my newly highest predictive system of conservative/libertarianism was devalued as well.

    With less certainty, after and not before that I became really sensitive to light. I started wearing hats all the time so that I could never see above the horizon. I interpreted the pain of looking at the sky as the pain of seeing beauty when I no longer believed in it. Now it seems to me that I had internally discarded my top down predictive systems to the extent that I was now overwhelmed with details if I even looked in the distance, which I interpreted as newer more curved prescription glasses making it hard to look at the landscape. Predictively the sky is really simple while in detail it is a vast array of strong signals without boundaries.

    As far as I remember I was never comfortable with daylight again until yesterday. The morning light through cracked shades would leave me unable to just get up and going without specific effort or closing the curtain until it was too dark to realistically get dressed, so I would put sunglasses on before clothes. Today and yesterday I only wore them around half the day in broad sunlight, almost wearing them more to not push my luck. I have been able to just look around easily as well which is amazing and is bringing back old memories of that feeling.

    It seems to me that I made a sub to semi-conscious shift to re-engaging top down modeling even though I can’t trust it to be as true as I would like. I had a four day weekend from work which might have created space for the revaluing to happen. On the first day back suddenly the morning wasn’t uncomfortably bright. Of course I don’t know! The theory could be untrue. It could be that I was coming back around to a top down engagement just as I turned it down in the first place but then it would make even more sense that it was influenced by reading about the shape and effect of it. If so, thank you for writing about it.

    tl;dr Top down predictive systems were devalued. Became really sensitive to light. Read post about theory of top down and bottom up cognition and maybe sub-consciously revalued predictive cognition enough to handle brightness.

  75. ygbm says:

    re: “hyperpriors”

    It’s plausible that large swaths of this system could be driven by very simple priors. But there’s a lot of complex behavior that does seem to be innate, and I’ve always wondered what the representation of this looks like. How do things like status seeking, or very specific sexual attractions, get encoded in genes which produce specific brain structures?

  76. waltonmath says:

    If I’m understanding right, you say that the motor system is caused by predicting actions on a high level and then making them happen with low level actions. If this is true in more generality than just the motor system, it might explain why frustrating-sounding advice like “believe in yourself” works. I’m squinting a bit here, but it feels related to things like Löb’s Theorem and maybe Newcomblike problems.

  77. Kuiperdolin says:

    Not sure how other glass-wearers can relate:

    I wear glasses. Most of the time I tune out the frame (except when they’re new glasses) but if I kind of “focus out” I can see their outline.

    The thing is, when I do that I can see their outline even when I’m not wearing them!

    In that case it feels like we have three layers that work in opposite directions:

    My lying eyes : there are no glass frames here.
    My low-level brain : of course there are glass frames. There are always glass frames.
    My expanding, fully-lit brain : there are definitely glass frames, but for convenience let’s pretend there aren’t.

    Eventually the two corrections compensate each other, like two lies in a convoluted Our-Man-in-Havana plot. It’s funny.

  78. Eponymous says:

    I know we have a number of ASD-ish people here, but I wonder if there are many people who are sort of the opposite of ASD. I think this describes me.

    Basically, if you take this perspective as saying that autistic people are excessively bottom up, and tend to be overwhelmed by the volume of detailed sensory data, then I’m just the opposite. I easily zone out from my surroundings, enter flow states quickly, and notice and complete patterns very quickly (sometimes subconsciously). But the downside is that I have a tendency to zone out when I’m talking to people, I’m pretty absent-minded and forgetful, and I’m prone to careless mistakes. I never notice the double “the”s, which makes me a lousy proofreader.

    Anyone else think this describes them? And what’s the psych disorder corresponding to the extreme form of this? Dyslexia?

  79. Doctor Mist says:


    Given that imagination seems to be sort of free-riding on complicated machinery that is needed just to explain perception, I wonder what inferences one can draw about animal consciousness. (Is anything known about whether a dog can imagine a house? Is anything even knowable?)

    Also, it would be very interesting to understand how this mechanism, driven by “the brain really hates prediction error”, interacts with Dennett’s theory of humor, which at the risk of oversimplification is “the brain strongly rewards figuring out something that is unexpected”. I suppose the latter might be confined to the neocortex or something. Or maybe they are describing the same thing but at two adjacent levels in the hierarchy. (Or maybe Dennett is wrong. Or maybe I am oversimplifying.)

    • Paul Brinkley says:

      > I wonder what inferences one can draw about animal consciousness.

      This is exactly what I was asking in my comment about sophonts.

      We can assume that a dog has plenty of bottom-up sensory data. We can assume that at least a little top-down is going on, since that would explain how training can happen. How much? is an interesting question. And how does it compare to other animals? How does it compare between animals that are easily domesticated and those that are not? Between social animals and solitary? Between mammals and non-mammals? Between near-sophonts and nowhere-near?

  80. moridinamael says:

    Since you mentioned meditation but didn’t dive into it, I’ll take a stab at it.

    In many (most?) traditions of meditation, the objective is to keep the attention focused on some very slightly interesting sense input. This might be the flame of a candle, or a complex colorful pattern, or the sensation of your breath at your abdomen or at the tip of your nose. I think it’s interesting that the meditation objects are usually not completely uninteresting. After all, the sensation of your breath changes minutely but delectably from breath to breath, as does the exact shape of the candle’s flame.

    Still, though, it’s really difficult to hold your attention on something that is giving such infinitesimal levels of surprisal. By the 1000th hour of meditation, you have such a strong predictive model of what your next breath is going to feel like that you cancel it out, the same way you cancel out the sensation of your shirt on your back when you’re not paying attention to it. What’s weird is that you’ll cancel out the sensation of your breath even if you’re trying to pay attention to it, which is not something most people get to experience.

    I once reached the point where I canceled out my perception of every aspect of having a body. I couldn’t perceive my breathing, or the light behind my eyelids, or anything.

    At very very high levels of meditative practice, you learn to predict and then attenuate out deeper and deeper mental constructs, until you lose all sense of identity, of being a separate entity, and of having an existence at all, while nonetheless remaining conscious. Thoughts and feelings become classified as perceptions no different from the sensation of the shirt on your back and eventually lose all emotional salience, and eventually stop intruding into consciousness in the first place.

    You can immediately see ways in which this would be beneficial – improved self-awareness and self-control, and a reduction in impulsiveness – and also potentially pathological – a sense of complete meaninglessness, which is indeed a failure mode of advanced meditators.

  81. manga3dmann says:

    This seems to make sense out of my first and only experience of getting drunk. My motor functions were not acting as expected, so I felt like I was having an out of body experience.

  82. padster says:

    For possible biological underpinnings, I’d recommend looking into the work of Fabienne Picard:

    tl;dr: there’s a type of epilepsy where afterwards, you report feeling extremely happy. Many people also report feeling complete certainty in how the world works (i.e. zero prediction error). Some findings suggest that this is through screwing with prediction calculations in the insula, suppressing error detection, which results in huge doses of happiness.

  83. venkyclement says:

    It’s kind of amazing to think that brains can do both fine grained differential geometry as well as coarse grained algebraic topology (in addition to probability and analysis).

  84. Krzysztof Wolyniec says:

    This is all very interesting, but I find the final comment bizarre: it’s irrational in small sample inference to minimize bias. Optimal cognition involves the trade-off between variance and bias.
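
A minimal sketch of that trade-off, under assumptions that are not in the comment or the post: the quantity being estimated is a mean `mu`, the data have standard deviation `sigma`, and the estimator is the sample mean multiplied by a shrinkage factor `c`. The function names are hypothetical. With few samples, a deliberately biased estimator (`c < 1`) can have lower mean squared error than the unbiased one (`c = 1`).

```python
def shrunk_mean_mse(c, mu, sigma, n):
    """Analytic mean squared error of the estimator c * (sample mean of n draws)."""
    bias = (c - 1.0) * mu                 # zero only when c == 1
    variance = (c ** 2) * sigma ** 2 / n  # shrinks as c shrinks or n grows
    return bias ** 2 + variance           # MSE = bias^2 + variance


def best_shrinkage(mu, sigma, n):
    """Shrinkage factor minimising the MSE above (an oracle: it uses the true mu)."""
    return mu ** 2 / (mu ** 2 + sigma ** 2 / n)


# Illustrative numbers only: a small sample (n = 4) of noisy data.
mu, sigma, n = 1.0, 2.0, 4
unbiased = shrunk_mean_mse(1.0, mu, sigma, n)   # no bias, all variance
c_star = best_shrinkage(mu, sigma, n)
shrunk = shrunk_mean_mse(c_star, mu, sigma, n)  # some bias, much less variance
print(unbiased, c_star, shrunk)
```

The optimal factor here depends on the unknown `mu`, so this is an illustration of why zero bias is not the goal, not a recipe for choosing `c` in practice. As `n` grows, `best_shrinkage` approaches 1 and the unbiased estimator becomes nearly optimal.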

  85. Roakh says:

    How does this fit in with the phasic/tonic dopamine distinction? Seems pretty relevant:

  86. fion says:

    Sorry if it’s already been pointed out: but do you know you have two section IVs?

  87. PB says:

    Great article, thanks Scott. Many similarities between this predictive processing model and the information bottleneck theory of deep learning, see e.g. recent Quanta article