[Related to: It’s Bayes All The Way Up, Why Are Transgender People Immune To Optical Illusions?, Can We Link Perception And Cognition?]
I.
Sometimes I have the fantasy of being able to glut myself on Knowledge. I imagine meeting a time traveler from 2500, who takes pity on me and gives me a book from the future where all my questions have been answered, one after another. What’s consciousness? That’s in Chapter 5. How did something arise out of nothing? Chapter 7. It all makes perfect intuitive sense and is fully vouched for by unimpeachable authorities. I assume something like this is how everyone spends their first couple of days in Heaven, whatever it is they do for the rest of Eternity.
And every so often, my fantasy comes true. Not by time travel or divine intervention, but by failing so badly at paying attention to the literature that by the time I realize people are working on a problem it’s already been investigated, experimented upon, organized into a paradigm, tested, and then placed in a nice package and wrapped up with a pretty pink bow so I can enjoy it all at once.
The predictive processing model is one of these well-wrapped packages. Unbeknownst to me, over the past decade or so neuroscientists have come up with a real theory of how the brain works – a real unifying framework theory like Darwin’s or Einstein’s – and it’s beautiful and it makes complete sense.
Surfing Uncertainty isn’t pop science and isn’t easy reading. Sometimes it’s on the border of possible-at-all reading. Author Andy Clark (a professor of logic and metaphysics, of all things!) is clearly brilliant, but prone to going on long digressions about various esoteric philosophy-of-cognitive-science debates. In particular, he’s obsessed with showing how “embodied” everything is all the time. This gets kind of awkward, since the predictive processing model isn’t really a natural match for embodiment theory, and describes a brain which is pretty embodied in some ways but not-so-embodied in others. If you want a hundred pages of apologia along the lines of “this may not look embodied, but if you squint you’ll see how super-duper embodied it really is!”, this is your book.
It’s also your book if you want to learn about predictive processing at all, since as far as I know this is the only existing book-length treatment of the subject. And it’s comprehensive, scholarly, and very good at giving a clear introduction to the theory and why it’s so important. So let’s be grateful for what we’ve got and take a look.
II.
Stanislas Dehaene writes of our senses:
We never see the world as our retina sees it. In fact, it would be a pretty horrible sight: a highly distorted set of light and dark pixels, blown up toward the center of the retina, masked by blood vessels, with a massive hole at the location of the “blind spot” where cables leave for the brain; the image would constantly blur and change as our gaze moved around. What we see, instead, is a three-dimensional scene, corrected for retinal defects, mended at the blind spot, stabilized for our eye and head movements, and massively reinterpreted based on our previous experience of similar visual scenes. All these operations unfold unconsciously—although many of them are so complicated that they resist computer modeling. For instance, our visual system detects the presence of shadows in the image and removes them. At a glance, our brain unconsciously infers the sources of lights and deduces the shape, opacity, reflectance, and luminance of the objects.
Predictive processing begins by asking: how does this happen? By what process do our incomprehensible sense-data get turned into a meaningful picture of the world?
The key insight: the brain is a multi-layer prediction machine. All neural processing consists of two streams: a bottom-up stream of sense data, and a top-down stream of predictions. These streams interface at each level of processing, comparing themselves to each other and adjusting themselves as necessary.
The bottom-up stream starts out as all that incomprehensible light and darkness and noise that we need to process. It gradually moves up all the cognitive layers that we already knew existed – the edge-detectors that resolve it into edges, the object-detectors that shape the edges into solid objects, et cetera.
The top-down stream starts with everything you know about the world, all your best heuristics, all your priors, everything that’s ever happened to you before – everything from “solid objects can’t pass through one another” to “e=mc^2” to “that guy in the blue uniform is probably a policeman”. It uses its knowledge of concepts to make predictions – not in the form of verbal statements, but in the form of expected sense data. It makes some guesses about what you’re going to see, hear, and feel next, and asks “Like this?” These predictions gradually move down all the cognitive layers to generate lower-level predictions. If that uniformed guy was a policeman, how would that affect the various objects in the scene? Given the answer to that question, how would it affect the distribution of edges in the scene? Given the answer to that question, how would it affect the raw-sense data received?
Both streams are probabilistic in nature. The bottom-up sensory stream has to deal with fog, static, darkness, and neural noise; it knows that whatever forms it tries to extract from this signal might or might not be real. For its part, the top-down predictive stream knows that predicting the future is inherently difficult and its models are often flawed. So both streams contain not only data but estimates of the precision of that data. A bottom-up percept of an elephant right in front of you on a clear day might be labelled “very high precision”; one of a vague form in a swirling mist far away might be labelled “very low precision”. A top-down prediction that water will be wet might be labelled “very high precision”; one that the stock market will go up might be labelled “very low precision”.
As these two streams move through the brain side-by-side, they continually interface with each other. Each level receives the predictions from the level above it and the sense data from the level below it. Then each level uses Bayes’ Theorem to integrate these two sources of probabilistic evidence as best it can. This can end up a couple of different ways.
First, the sense data and predictions may more-or-less match. In this case, the layer stays quiet, indicating “all is well”, and the higher layers never even hear about it. The higher levels just keep predicting whatever they were predicting before.
Second, low-precision sense data might contradict high-precision predictions. The Bayesian math will conclude that the predictions are still probably right, but the sense data are wrong. The lower levels will “cook the books” – rewrite the sense data to make it look as predicted – and then continue to be quiet and signal that all is well. The higher levels continue to stick to their predictions.
Third, there might be some unresolvable conflict between high-precision sense-data and predictions. The Bayesian math will indicate that the predictions are probably wrong. The neurons involved will fire, indicating “surprisal” – a gratuitously-technical neuroscience term for surprise. The higher the degree of mismatch, and the higher the supposed precision of the data that led to the mismatch, the more surprisal – and the louder the alarm sent to the higher levels.
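Here’s a toy version of that integration step in Python, assuming everything is Gaussian so that “precision” is just inverse variance. The numbers and the surprisal threshold are mine, not the book’s:

```python
# One layer's integration step, assuming Gaussian signals.
# precision = 1 / variance; all numbers are invented for illustration.

def integrate(pred_mean, pred_prec, sense_mean, sense_prec, threshold=10.0):
    """Precision-weighted Bayesian fusion of top-down prediction and bottom-up data."""
    posterior = (pred_prec * pred_mean + sense_prec * sense_mean) / (pred_prec + sense_prec)
    surprisal = sense_prec * (sense_mean - pred_mean) ** 2  # crude surprisal proxy
    alarm = surprisal > threshold                           # does it wake the level above?
    return posterior, alarm

# Second case: high-precision prediction, low-precision data. The posterior stays
# near the prediction -- the books get cooked -- and no alarm goes up:
print(integrate(pred_mean=0.0, pred_prec=100.0, sense_mean=5.0, sense_prec=0.2))
# (0.00998..., False)

# Third case: the data are too precise to explain away -- surprisal, alarm upstairs:
print(integrate(pred_mean=0.0, pred_prec=100.0, sense_mean=5.0, sense_prec=50.0))
# (1.666..., True)
```

The same posterior formula covers all three cases: whichever stream is more precise wins, and the loser gets rewritten.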
When the higher levels receive the alarms from the lower levels, this is their equivalent of bottom-up sense-data. They ask themselves: “Did the even-higher-levels predict this would happen?” If so, they themselves stay quiet. If not, they might try to change their own models that map higher-level predictions to lower-level sense data. Or they might try to cook the books themselves to smooth over the discrepancy. If none of this works, they send alarms to the even-higher-levels.
All the levels really hate hearing alarms. Their goal is to minimize surprisal – to become so good at predicting the world (conditional on the predictions sent by higher levels) that nothing ever surprises them. Surprise prompts a frenzy of activity adjusting the parameters of models – or deploying new models – until the surprise stops.
All of this happens several thousand times a second. The lower levels constantly shoot sense data at the upper levels, which constantly adjust their hypotheses and shoot them down at the lower levels. When surprise is registered, the relevant levels change their hypotheses or pass the buck upwards. After umpteen zillion cycles, everyone has the right hypotheses, nobody is surprised by anything, and the brain rests and moves on to the next task. As per the book:
To deal rapidly and fluently with an uncertain and noisy world, brains like ours have become masters of prediction – surfing the waves of noisy and ambiguous sensory stimulation by, in effect, trying to stay just ahead of them. A skilled surfer stays ‘in the pocket’: close to, yet just ahead of the place where the wave is breaking. This provides power and, when the wave breaks, it does not catch her. The brain’s task is not dissimilar. By constantly attempting to predict the incoming sensory signal we become able – in ways we shall soon explore in detail – to learn about the world around us and to engage that world in thought and action.
The result is perception, which the PP theory describes as “controlled hallucination”. You’re not seeing the world as it is, exactly. You’re seeing your predictions about the world, cashed out as expected sensations, then shaped/constrained by the actual sense data.
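To make the whole cycle concrete, here’s a deliberately crude multi-layer sketch, with a made-up surprisal threshold standing in for the “alarm” and a made-up learning rate standing in for model adjustment:

```python
# A crude version of the cycle: each layer predicts the activity of the layer
# below, stays quiet when precision-weighted error is small, and otherwise
# adjusts its model and alarms the layer above. Parameters are invented.

class Layer:
    def __init__(self, prediction=0.0, precision=1.0, threshold=1.0, lr=0.5):
        self.prediction = prediction  # top-down expectation for the level below
        self.precision = precision    # confidence in that expectation
        self.threshold = threshold    # surprisal needed to trigger an alarm
        self.lr = lr                  # how fast surprise adjusts the model

    def step(self, signal_from_below):
        error = signal_from_below - self.prediction
        if self.precision * error ** 2 < self.threshold:
            return False              # within the confidence interval: all is well
        self.prediction += self.lr * error
        return True                   # surprisal: pass the buck upward

def perceive(layers, raw_sense):
    signal = raw_sense
    for layer in layers:
        if not layer.step(signal):
            break                     # higher levels never even hear about it
        signal = layer.prediction     # the level above must now explain this state

layers = [Layer(), Layer(), Layer()]
for _ in range(20):                   # umpteen zillion cycles, abbreviated
    perceive(layers, raw_sense=5.0)
print([round(l.prediction, 2) for l in layers])
# The bottom layer's prediction has crept to within its confidence interval
# of the input, and the hierarchy has gone quiet.
```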
III.
Enough talk. Let’s give some examples. Most of you have probably seen these before, but it never hurts to be reminded:
[Image: two ambiguous black-and-white pictures]
This demonstrates the degree to which the brain depends on top-down hypotheses to make sense of the bottom-up data. To most people, these two pictures start off looking like incoherent blotches of light and darkness. Once they figure out what they are (spoiler) the scene becomes obvious and coherent. According to the predictive processing model, this is how we perceive everything all the time – except usually the concepts necessary to make the scene fit together come from our higher-level predictions instead of from clicking on a spoiler link.
[Image: a triangular sign reading “PARIS IN THE THE SPRINGTIME”]
This demonstrates how the top-down stream’s efforts to shape the bottom-up stream and make it more coherent can sometimes “cook the books” and alter sensation entirely. The real picture says “PARIS IN THE THE SPRINGTIME” (note the duplicated word “the”!). The top-down stream predicts this should be a meaningful sentence that obeys English grammar, and so replaces the the bottom-up stream with what it thinks that it should have said. This is a very powerful process – how many times have I repeated the the word “the” in this paragraph alone without you noticing?
[Image: a sentence written with jumbled-up letters]
A more ambiguous example of “perception as controlled hallucination”. Here your experience doesn’t quite deny the jumbled-up nature of the letters, but it superimposes a “better” and more coherent experience which appears naturally alongside.
Next up – this low-quality video of an airplane flying at night. Notice how after an instant, you start to predict the movement and characteristics of the airplane, so that you’re no longer surprised by the blinking light, the movement, the other blinking light, the camera shakiness, or anything like that – in fact, if the light stopped blinking, you would be surprised, even though naively nothing could be less surprising than a dark portion of the night sky staying dark. After a few seconds of this, the airplane continuing on its (pretty complicated) way just reads as “same old, same old”. Then when something else happens – like the camera panning out, or the airplane making a slight change in trajectory – you focus entirely on that, the blinking lights and movement entirely forgotten or at least packed up into “airplane continues on its blinky way”. Meanwhile, other things – like the feeling of your shirt against your skin – have been completely predicted away and blocked from consciousness, freeing you to concentrate entirely on any subtle changes in the airplane’s motion.
In the same vein: this is Rick Astley’s “Never Gonna Give You Up” repeated again and again for ten hours (you can find some weird stuff on YouTube). The first hour, maybe you find yourself humming along occasionally. By the second hour, maybe it’s gotten kind of annoying. By the third hour, you’ve completely forgotten it’s even on at all.
But suppose that one time, somewhere around the sixth hour, it skipped two notes – just the two syllables “never”, so that Rick said “Gonna give you up.” Wouldn’t the silence where those two syllables should be sound as jarring as if somebody set off a bomb right beside you? Your brain, having predicted sounds consistent with “Never Gonna Give You Up” going on forever, suddenly finds its expectations violated and sends all sorts of alarms to the higher levels, where they eventually reach your consciousness and make you go “What the heck?”
IV.
Okay. You’ve read a lot of words. You’ve looked at a lot of pictures. You’ve listened to “Never Gonna Give You Up” for ten hours. Time for the payoff. Let’s use this theory to explain everything.
1. Attention. In PP, attention measures “the confidence interval of your predictions”. Sense-data within the confidence intervals counts as a match and doesn’t register surprisal. Sense-data outside the confidence intervals fails and alerts higher levels and eventually consciousness.
This modulates the balance between the top-down and bottom-up streams. High attention means that perception is mostly based on the bottom-up stream, since every little deviation is registering an error and so the overall perceptual picture is highly constrained by sensation. Low attention means that perception is mostly based on the top-down stream, and you’re perceiving only a vague outline of the sensory image with your predictions filling in the rest.
There’s a famous experiment which you can try below – if you’re trying it, make sure to play the whole video before moving on:
…
…
About half of subjects, told to watch the players passing the ball, don’t notice the gorilla. Their view of the ball-passing is closely constrained by the bottom-up stream; they see mostly what is there. But their view of the gorilla is mostly dependent on the top-down stream. Their confidence intervals are wide. Somewhere in your brain is a neuron saying “is that a guy in a gorilla suit?” Then it consults the top-down stream, which says “This is a basketball game, you moron”, and it smooths out the anomalous perception into something that makes sense like another basketball player.
But if you watch the video with the prompt “Look for something strange happening in the midst of all this basketball-playing”, you see the gorilla immediately. Your confidence intervals for unusual things are razor-thin; as soon as that neuron sees the gorilla it sends alarms to higher levels, and the higher levels quickly come up with a suitable hypothesis (“there’s a guy in a gorilla suit here”) which makes sense of the new data.
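In toy form, attention can be modeled as a gain on the precision of the bottom-up stream. This parameterization is my own simplification, with invented numbers:

```python
# Attention as a gain on sensory precision (toy parameterization).
def perceive(prior_mean, prior_prec, sense_mean, sense_prec, attention):
    sense_prec *= attention  # attended channels get their precision boosted
    return (prior_prec * prior_mean + sense_prec * sense_mean) / (prior_prec + sense_prec)

# A flicker of gorilla-shaped data (1.0) against a strong "basketball game" prior (0.0):
print(perceive(0.0, 10.0, 1.0, 1.0, attention=0.1))   # ~0.01 -- smoothed into another player
print(perceive(0.0, 10.0, 1.0, 1.0, attention=50.0))  # ~0.83 -- the gorilla breaks through
```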
There’s an interesting analogy to vision here, where the center of your vision is very clear, and the outsides are filled in in a top-down way – I have a vague sense that my water bottle is in the periphery right now, but only because I kind of already know that, and it’s more of a mental note of “water bottle here as long as you ask no further questions” than a clear image of it. The extreme version of this is the blind spot, which gets filled in entirely with predicted imagery despite receiving no sensation at all.
2. Imagination, Simulation, Dreaming, Etc. Imagine a house. Now imagine a meteor crashing into the house. Your internal mental simulation was probably pretty good. Without even thinking about it, you got it to obey accurate physical laws like “the meteor continues on a constant trajectory”, “the impact happens in a realistic way”, “the impact shatters the meteor”, and “the meteor doesn’t bounce back up to space like a basketball”. Think how surprising this is.
In fact, think how surprising it is that you can imagine the house at all. This really high level concept – “house” – has been transformed in your visual imaginarium into a pretty good picture of a house, complete with various features, edges, colors, et cetera (if it hasn’t, read here). This is near-miraculous. Why do our brains have this apparently useless talent?
PP says that the highest levels of our brain make predictions in the form of sense data. They’re not just saying “I predict that guy over there is a policeman”, they’re generating the image of a policeman, cashing it out in terms of sense data, and colliding it against the sensory stream to see how it fits. The sensory stream gradually modulates it to fit the bottom-up evidence – a white or black policeman, a mustached or clean-shaven policeman. But the top-down stream is doing a lot of the work here. We are able to imagine the meteor, using the same machinery that would guide our perception of the meteor if we saw it up in the sky.
All of this goes double for dreaming. If “perception is controlled hallucination” caused by the top-down drivers of perception constrained by bottom-up evidence, then dreams are those top-down drivers playing around with themselves unconstrained by anything at all (or else very weakly constrained by bottom-up evidence, like when it’s really cold in your bedroom and you dream you’re exploring the North Pole).
A lot of people claim higher levels of this – lucid dreaming, astral projection, you name it, worlds exactly as convincing as our own but entirely imaginary. Predictive processing is very sympathetic to these accounts. The generative models that create predictions are really good; they can simulate the world well enough that it rarely surprises us. They also connect through various layers to our bottom-level perceptual apparatus, cashing out their predictions in terms of the lowest-level sensory signals. Given that we’ve got a top-notch world-simulator plus perception-generator in our heads, it shouldn’t be surprising when we occasionally perceive ourselves in simulated worlds.
3. Priming. I don’t mean the weird made-up kinds of priming that don’t replicate. I mean the very firmly established ones, like the one where, if you flash the word “DOCTOR” at a subject, they’ll be much faster and more skillful in decoding a series of jumbled and blurred letters into the word “NURSE”.
This is classic predictive processing. The top-down stream’s whole job is to assist the bottom-up stream in making sense of complicated fuzzy sensory data. After it hears the word “DOCTOR”, the top-down stream is already thinking “Okay, so we’re talking about health care professionals”. This creeps through all the lower levels as a prior for health-care related things; when the sense organs receive data that can be associated in a health-care related manner, the high prior helps increase the precision of this possibility until it immediately becomes the overwhelming leading hypothesis.
4. Learning. There’s a philosophical debate – which I’m not too familiar with, so sorry if I get it wrong – about how “unsupervised learning” is possible. Supervised reinforcement learning is when an agent tries various stuff, and then someone tells the agent if it’s right or wrong. Unsupervised learning is when nobody’s around to tell you, and it’s what humans do all the time.
PP offers a compelling explanation: we create models that generate sense data, and keep those models if the generated sense data match observation. Models that predict sense data well stick around; models that fail to predict the sense data accurately get thrown out. Because of all those lower layers adjusting out contingent features of the sensory stream, any given model is left with exactly the sense data necessary to tell it whether it’s right or wrong.
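A minimal sketch of that model-selection story, with an invented sensory stream and invented candidate models:

```python
# Toy "unsupervised learning" as model selection: keep whichever generative
# model best predicts the sensory stream. Everything here is invented.
import random

random.seed(0)
stream = [2 * t + random.gauss(0, 0.5) for t in range(50)]  # world is roughly y = 2t

models = {
    "constant":  lambda t: 10.0,     # "nothing ever changes"
    "linear_2t": lambda t: 2.0 * t,  # matches the true process
    "linear_3t": lambda t: 3.0 * t,  # right shape, wrong rate
}

# Accumulated prediction error per model; low total error = low surprisal.
errors = {name: sum((m(t) - y) ** 2 for t, y in enumerate(stream))
          for name, m in models.items()}
print(min(errors, key=errors.get))
# The model that predicts the stream survives; the rest get thrown out.
```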
PP isn’t exactly blank slatist, but it’s compatible with a slate that’s pretty fricking blank. Clark discusses “hyperpriors” – extremely basic assumptions about the world that we probably need to make sense of anything at all. For example, one hyperprior is sensory synchronicity – the idea that our five different senses are describing the same world, and that the stereo we see might be the source of the music we hear. Another hyperprior is object permanence – the idea that the world is divided into specific objects that stick around whether or not they’re in the sensory field. Clark says that some hyperpriors might be innate – but says they don’t have to be, since PP is strong enough to learn them on its own if it has to. For example, after enough examples of, say, seeing a stereo being smashed with a hammer at the same time that music suddenly stops, the brain can infer that connecting the visual and auditory evidence together is a useful hack that helps it to predict the sensory stream.
I can’t help thinking here of Molyneux’s Problem, a thought experiment about a blind-from-birth person who navigates the world through touch alone. If suddenly given sight, could the blind person naturally connect the visual appearance of a cube to her own concept “cube”, which she derived from the way cubes feel? In 2003, some researchers took advantage of a new cutting-edge blindness treatment to test this out; they found that no, newly sighted patients can’t intuitively make the link. Score one for learned hyperpriors.
But learning goes all the way from these kinds of really basic hyperpriors all the way up to normal learning like what the capital of France is – which, if nothing else, helps predict what’s going to be on the other side of your geography flashcard, and which high-level systems might keep as a useful concept to help them make sense of the world and predict events.
5. Motor Behavior. About a third of Surfing Uncertainty is on the motor system, it mostly didn’t seem that interesting to me, and I don’t have time to do it justice here (I might make another post on one especially interesting point). But this has been kind of ignored so far. If the brain is mostly just in the business of making predictions, what exactly is the motor system doing?
Based on a bunch of really excellent experiments that I don’t have time to describe here, Clark concludes: it’s predicting action, which causes the action to happen.
This part is almost funny. Remember, the brain really hates prediction error and does its best to minimize it. With failed predictions about eg vision, there’s not much you can do except change your models and try to predict better next time. But with predictions about proprioceptive sense data (ie your sense of where your joints are), there’s an easy way to resolve prediction error: just move your joints so they match the prediction. So (and I’m asserting this, but see Chapters 4 and 5 of the book to hear the scientific case for this position) if you want to lift your arm, your brain just predicts really really strongly that your arm has been lifted, and then lets the lower levels’ drive to minimize prediction error do the rest.
Under this model, the “prediction” of a movement isn’t just the idle thought that a movement might occur, it’s the actual motor program. This gets unpacked at all the various layers – joint sense, proprioception, the exact tension level of various muscles – and finally ends up in a particular fluid movement:
Friston and colleagues…suggest that precise proprioceptive predictions directly elicit motor actions. This means that motor commands have been replaced by (or as I would rather say, implemented by) proprioceptive predictions. According to active inference, the agent moves body and sensors in ways that amount to actively seeking out the sensory consequences that their brains expect. Perception, cognition, and action – if this unifying perspective proves correct – work together to minimize sensory prediction errors by selectively sampling and actively sculpting the stimulus array. This erases any fundamental computational line between perception and the control of action. There remains [only] an obvious difference in direction of fit. Perception here matches neural hypotheses to sensory inputs…while action brings unfolding proprioceptive inputs into line with neural predictions. The difference, as Anscombe famously remarked, is akin to that between consulting a shopping list (thus letting the list determine the contents of the shopping basket) and listing some actually purchased items (thus letting the contents of the shopping basket determine the list). But despite the difference in direction of fit, the underlying form of the neural computations is now revealed as the same.
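A toy version of that active-inference loop, with made-up gains. The point is that the only way this loop can shrink proprioceptive prediction error is by moving the arm:

```python
# Toy active inference: the "prediction" is the motor program. Parameters invented.

arm_angle = 0.0          # current proprioceptive state (degrees)
predicted_angle = 90.0   # the brain predicts, really really strongly, a lifted arm
gain = 0.3               # how hard prediction error drives the muscles

for _ in range(20):
    error = predicted_angle - arm_angle  # proprioceptive prediction error
    arm_angle += gain * error            # resolve it by moving, not by re-modeling

print(round(arm_angle, 1))  # ~89.9 -- the arm ends up where it was predicted to be
```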
6. Tickling Yourself. One consequence of the PP model is that organisms are continually adjusting out their own actions. For example, if you’re trying to predict the movement of an antelope you’re chasing across the visual field, you need to adjust out the up-down motion of your own running. So one “hyperprior” that the body probably learns pretty early is that if it itself makes a motion, it should expect to feel the consequences of that motion.
There’s a really interesting illusion called the force-matching task. A researcher exerts some force against a subject, then asks the subject to exert exactly that much force against something else. Subjects’ forces are usually biased upwards – they exert more force than they were supposed to – probably because their brain’s prediction engines are “cancelling out” their own force. Clark describes one interesting implication:
The same pair of mechanisms (forward-model-based prediction and the dampening of resulting well-predicted sensation) have been invoked to explain the unsettling phenomenon of ‘force escalation’. In force escalation, physical exchanges (playground fights being the most common exemplar) mutually ramp up via a kind of step-ladder effect in which each person believes the other one hit them harder. Shergill et al describe experiments that suggest that in such cases each person is truthfully reporting their own sensations, but that those sensations are skewed by the attenuating effects of self-prediction. Thus, ‘self-generated forces are perceived as weaker than externally generated forces of equal magnitude.’
This also explains why you can’t tickle yourself – your body predicts and adjusts away your own actions, leaving only an attenuated version.
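Here’s the force-escalation step-ladder as a tiny simulation, assuming a fixed (and invented) attenuation factor for self-generated force:

```python
# Force escalation under self-prediction attenuation. The factor is invented.
ATTENUATION = 0.7  # self-generated force feels ~70% as strong as it really is

felt = 1.0         # the force you just felt from the other person
for _ in range(5):
    # To make your own blow *feel* as hard as the one you received, you must
    # exert more, because self-prediction attenuates the sensation:
    exerted = felt / ATTENUATION
    felt = exerted  # the other person feels the full, unattenuated force
    print(round(exerted, 2))  # 1.43, 2.04, 2.92, 4.16, 5.95 -- each hit "harder"
```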
7. The Placebo Effect. We hear a lot about “pain gating” in the spine, but the PP model does a good job of explaining what this is: adjusting pain based on top-down priors. If you believe you should be in pain, the brain will use that as a filter to interpret ambiguous low-precision pain signals. If you believe you shouldn’t, the brain will be more likely to assume ambiguous low-precision pain signals are a mistake. So if you take a pill that doctors assure you will cure your pain, then your lower layers are more likely to interpret pain signals as noise, “cook the books” and prevent them from reaching your consciousness.
Psychosomatic pain is the opposite of this; see Section 7.10 of the book for a fuller explanation.
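Both directions fall out of the same precision-weighted fusion from Part II, plus an invented consciousness threshold:

```python
# Pain gating as precision-weighted fusion. All numbers invented for illustration.

def felt_pain(prior_pain, prior_prec, signal, signal_prec, threshold=1.0):
    estimate = (prior_prec * prior_pain + signal_prec * signal) / (prior_prec + signal_prec)
    return estimate if estimate > threshold else 0.0  # below threshold: written off as noise

ambiguous = 2.0  # a low-precision pain signal
print(felt_pain(prior_pain=3.0, prior_prec=4.0, signal=ambiguous, signal_prec=1.0))  # 2.8: hurts
print(felt_pain(prior_pain=0.0, prior_prec=4.0, signal=ambiguous, signal_prec=1.0))  # 0.0: gated
```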
8. Asch Conformity Experiment. More speculative, and not from the book. But remember this one? A psychologist asked subjects which lines were the same length as other lines. The lines were all kind of similar lengths, but most subjects were still able to get the right answer. Then he put the subjects in a group with confederates; all of the confederates gave the same wrong answer. When the subject’s turn came, usually they would disbelieve their eyes and give the same wrong answer as the confederates.
The bottom-up stream provided some ambiguous low-precision bottom-up evidence pointing toward one line. But in the final Bayesian computation, those were swamped by the strong top-down prediction that it would be another. So the middle layers “cooked the books” and replaced the perceived sensation with the predicted one. From Wikipedia:
Participants who conformed to the majority on at least 50% of trials reported reacting with what Asch called a “distortion of perception”. These participants, who made up a distinct minority (only 12 subjects), expressed the belief that the confederates’ answers were correct, and were apparently unaware that the majority were giving incorrect answers.
9. Neurochemistry. PP offers a way to a psychopharmacological holy grail – an explanation of what different neurotransmitters really mean, on a human-comprehensible level. Previous attempts to do this, like “dopamine represents reward, serotonin represents calmness”, have been so wildly inadequate that the whole question seems kind of disreputable these days.
But as per PP, the NMDA glutamatergic system mostly carries the top-down stream, the AMPA glutamatergic system mostly carries the bottom-up stream, and dopamine mostly carries something related to precision, confidence intervals, and surprisal levels. This matches a lot of observational data in a weirdly consistent way – for example, it doesn’t take a lot of imagination to think of the slow, hesitant movements of Parkinson’s disease as having “low motor confidence”.
10. Autism. Various research in the PP tradition has coalesced around the idea of autism as an unusually high reliance on bottom-up rather than top-down information, leading to “weak central coherence” and constant surprisal as the sensory data fails to fall within pathologically narrow confidence intervals.
Autistic people classically can’t stand tags on clothing – they find them too scratchy and annoying. Remember the example from Part III about how you successfully predicted away the feeling of the shirt on your back, and so manage never to think about it when you’re trying to concentrate on more important things? Autistic people can’t do that as well. Even though they have a layer in their brain predicting “will continue to feel shirt”, the prediction is too precise; it predicts that next second, the shirt will produce exactly the same pattern of sensations it does now. But realistically as you move around or catch passing breezes the shirt will change ever so slightly – at which point autistic people’s brains will send alarms all the way up to consciousness, and they’ll perceive it as “my shirt is annoying”.
Or consider the classic autistic demand for routine, and misery as soon as the routine is disrupted. Because their brains can only make very precise predictions, the slightest disruption to routine registers as strong surprisal, strong prediction failure, and “oh no, all of my models have failed, nothing is true, anything is possible!” Compare to a neurotypical person in the same situation, who would just relax their confidence intervals a little bit and say “Okay, this is basically 99% like a normal day, whatever”. It would take something genuinely unpredictable – like being thrown on an unexplored continent or something – to give these people the same feeling of surprise and unpredictability.
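In terms of the surprisal formula from the sketch in Part II, the same tiny shirt-deviation either stays unconscious or sets off the alarm, depending only on the precision assigned to the prediction (numbers invented):

```python
def surprisal(error, precision):
    return precision * error ** 2  # same formula as the layer sketch in Part II

ALARM_THRESHOLD = 1.0              # invented: above this, it reaches consciousness

shirt_deviation = 0.1              # a passing breeze changes the sensation slightly
print(surprisal(shirt_deviation, precision=5.0))    # 0.05 -- neurotypical: never noticed
print(surprisal(shirt_deviation, precision=500.0))  # 5.0  -- hyper-precise: "my shirt is annoying"
```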
This model also predicts autistic people’s strengths. We know that polygenic risk for autism is positively associated with IQ. This would make sense if the central feature of autism was a sort of increased mental precision. It would also help explain why autistic people seem to excel in high-need-for-precision areas like mathematics and computer programming.
11. Schizophrenia. Converging lines of research suggest this also involves weak priors, apparently at a different level to autism and with different results after various compensatory mechanisms have had their chance to kick in. One especially interesting study asked neurotypicals and schizophrenics to follow a moving light, much like the airplane video in Part III above. When the light moved in a predictable pattern, the neurotypicals were much better at tracking it; when it was a deliberately perverse video specifically designed to frustrate expectations, the schizophrenics actually did better. This suggests that neurotypicals were guided by correct top-down priors about where the light would be going; schizophrenics had very weak priors and so weren’t really guided very well, but also didn’t screw up when the light did something unpredictable. Schizophrenics are also famous for not being fooled by the “hollow mask” (below) and other illusions where top-down predictions falsely constrain bottom-up evidence. My guess is they’d be more likely to see both ‘the’s in the “PARIS IN THE THE SPRINGTIME” image above.
[Image: the hollow mask illusion]
The exact route from this sort of thing to schizophrenia is really complicated, and anyone interested should check out Section 2.12 and the whole of Chapter 7 from the book. But the basic story is that it creates waves of anomalous prediction error and surprisal, leading to the so-called “delusions of significance” where schizophrenics believe that eg the fact that someone is wearing a hat is some sort of incredibly important cosmic message. Schizophrenics’ brains try to produce hypotheses that explain all of these prediction errors and reduce surprise – which is impossible, because the prediction errors are random. This results in incredibly weird hypotheses, and eventually in schizophrenic brains being willing to ignore the bottom-up stream entirely – hence hallucinations.
All this is treated with antipsychotics, which antagonize dopamine, which – remember – represents confidence level. So basically the medication is telling the brain “YOU CAN IGNORE ALL THIS PREDICTION ERROR, EVERYTHING YOU’RE PERCEIVING IS TOTALLY GARBAGE SPURIOUS DATA” – which turns out to be exactly the message it needs to hear.
An interesting corollary of all this – because all of schizophrenics’ predictive models are so screwy, they lose the ability to use the “adjust away the consequences of your own actions” hack discussed in Part 6 of this section. That means their own actions don’t get predicted out, and seem like the actions of a foreign agent. This is why they get so-called “delusions of agency”, like “the government beamed that thought into my brain” or “aliens caused my arm to move just now”. And in case you were wondering – yes, schizophrenics can tickle themselves.
12. Everything else. I can’t possibly do justice to the whole of Surfing Uncertainty, which includes sections in which it provides lucid and compelling PP-based explanations of hallucinations, binocular rivalry, conflict escalation, and various optical illusions. More speculatively, I can think of really interesting connections to things like phantom limbs, creativity (and its association with certain mental disorders), depression, meditation, etc, etc, etc.
The general rule in psychiatry is: if you think you’ve found a theory that explains everything, diagnose yourself with mania and check yourself into the hospital. Maybe I’m not at that point yet – for example, I don’t think PP does anything to explain what mania itself is. But I’m pretty close.
V.
This is a really poor book review of Surfing Uncertainty, because I only partly understood it. I’m leaving out a lot of stuff about the motor system, debate over philosophical concepts with names like “enactivism”, descriptions of how neurons form and unform coalitions, and of course a hundred pages of apologia along the lines of “this may not look embodied, but if you squint you’ll see how super-duper embodied it really is!”. As I reread and hopefully come to understand some of this better, it might show up in future posts.
But speaking of philosophical debates, there’s one thing that really struck me about the PP model.
Voodoo psychology suggests that culture and expectation tyrannically shape our perceptions. Taken to an extreme, objective knowledge is impossible, since all our sense-data is filtered through our own bias. Taken to a very far extreme, we get things like What The !@#$ Do We Know?‘s claim that the Native Americans literally couldn’t see Columbus’ ships, because they had no concept of “caravel” and so the percept just failed to register. This sort of thing tends to end by arguing that science was invented by straight white men, and so probably just reflects straight white maleness, and so we should ignore it completely and go frolic in the forest or something.
Predictive processing is sympathetic to all this. It takes all of this stuff like priming and the placebo effect, and it predicts it handily. But it doesn’t give up. It (theoretically) puts it all on a sound mathematical footing, explaining exactly how much our expectations should shape our reality, and in which ways our expectations should shape our reality. I feel like someone armed with predictive processing and a bit of luck should have been able to predict that the placebo effect and basic priming would work, but stereotype threat and social priming wouldn’t. Maybe this is total retrodictive cheating. But I feel like it should be possible.
If this is true, it gives us more confidence that our perceptions should correspond – at least a little – to the external world. We can accept that we may be misreading “PARIS IN THE THE SPRINGTIME” while remaining confident that we wouldn’t misread “PARIS IN THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE SPRINGTIME” as containing only one “the”. Top-down processing very occasionally meddles in bottom-up sensation, but (as long as you’re not schizophrenic), it sticks to an advisory role rather than being able to steamroll over arbitrary amounts of reality.
The rationalist project is overcoming bias, and that requires both an admission that bias is possible, and a hope that there’s something other than bias which we can latch onto as a guide. Predictive processing gives us more confidence in both, and helps provide a convincing framework we can use to figure out what’s going on at all levels of cognition.

Brain is a Kalman filter. Got it.
You know, I’ve been thinking along these lines for a while, and I never put that together.
A couple comments:
1. This suggests that people are natural coherentists, in the philosophical sense. (http://www.iep.utm.edu/coherent/) We don’t build up all our beliefs from some kind of totally reliable source, like “all our knowledge is derived from sense experience” or “all our knowledge is derived from a priori reason”; instead we test our tentative beliefs to see if they’re consistent with the other stuff we know, believe, and perceive.
2. There’s something suggestive about generative adversarial networks that seems to be going on here, but they’re clearly not quite the same thing. https://en.wikipedia.org/wiki/Generative_adversarial_networks
3. The model would also predict, more specifically, autistic people’s sensory strengths. There’s a lot of perfect pitch, hyperacute hearing or smell, and so on. Also more ability to detect small details.
4. This really isn’t a model of much more than sensory and motor processing, as you’ve presented it. Symbolic logic, how we define categories, and emotions aren’t in here.
Surfing Uncertainty focused on sensory and motor processing, but I think we probably use the same machinery for cognition and reasoning. Obviously the schizophrenia example is suggestive of this, since that includes some very bizarre reasoning errors. And it seems to me (will need to justify this further later) that autistic people (eg especially some extreme libertarians) relying on their explicit reasoning rather than on popular social consensus is somehow aligned with more bottom-up and less top-down.
I didn’t include it here because it’s my own speculation and not in Surfing Uncertainty, and I’ll need to write a post justifying it further, but it seems to me like emotions (at least happiness and sadness) are probably global filters that increase/decrease confidence levels of predictions.
Wouldn’t emotions be a reaction to certain predictions? If you are surprised by something positive, you feel happy. Surprised by something negative, you feel sad. We know that one’s happiness depends on how happy others around them are, so the brain is constantly changing its standard on what makes it happy.
I’ve always thought depression is several cognitive disorders with the same general symptoms, and after reading this I bet it all ties back to PP. Perhaps some people with depression over-predict that good things will happen and are constantly surprised that they don’t. Or maybe their surprise function doesn’t work at all so positive changes don’t cause happiness, which itself might be a surprise at the meta-conscious-level and exacerbate the depression (“Things in my life are going so well, I can’t understand why I’m depressed”).
An idea I’ve seen elsewhere is that it’s a reaction to too much stress without emotional reward. So your brain is adjusting downwards the predicted value of applying effort until eventually it doesn’t seem worth getting out of bed anymore.
Have an anecdote about a woman with bad depression (eventually properly treated when bipolar 2* was identified): I was with her in a car. We saw that there was a fireworks display and stopped to watch it.
It seemed to me that she enjoyed it, but when it was over, her pleasure just evaporated as though it never happened. Maybe problems with remembering pleasure are related to problems with imagining pleasure?
*Bipolar 2 is a form of bipolar which has really tiny amounts of mania. It is easily mistaken for ordinary depression, but meds for ordinary depression make bipolar 2 depression worse. I’ve never heard a theory for why that happens.
So how does this new finding correspond with your previous attitude on schizophrenia as too much top-down and not enough bottom-up? It seems like now it’s more like “that, and the top-down is screwy”. Or maybe the bottom-up is so screwy that it messes with everything else?
Some sort of horrible cybernetic chain of cause-and-effects relating to the body’s responses to responses to responses to the original problem. It’s spelled out more in the chapters I cite above, although I don’t understand it on an intuitive enough level to be comfortable explaining it.
In statements like “the AMPA glutamatergic system mostly carries the bottom-up stream”, I think I might be confused about how fast neurotransmitters work. Stuff like sensory data is probably generating quite a lot of bits per second, and the signal needs to change pretty quickly so we can react to it. If that signal is getting transmitted through neurons firing electrical impulses, well, that can happen pretty quickly. But if the brain is actually communicating that data to itself by having some cells emit actual molecules of substance, and other cells absorb those molecules — how fast can that happen, really?
Maybe I could kind of round that statement off to: “the AMPA glutamatergic system controls how much faith the brain puts in the bottom-up stream” or something of that nature?
Good point, not sure. But we know acetylcholine and dopamine are involved in motor responses, and those seem to be pretty fast.
Hi. I took a neuroscience class recently. Almost all the synapses in the central nervous system are chemical synapses like this. Direct electrical connections do exist, but they’re less useful for data processing because they allow backflow. Good for synchronizing the network, bad for building logic circuits. Anyway, chemical synapses are very fast. At intercellular distances, diffusion of particles is far faster than our intuition suggests. Concentrations equalize across small cavities practically instantly, and the release mechanism for neurotransmitters is very fast. The cleanup is the slow part.
But you might also be making a mistake about the bitrate required here. The processing speed of the brain is achieved through the large number of pathways, large numbers of synapses, branch structures that share the results of decisions with different circuits, and efficient pruning methods that yield effective heuristics with little data. It’s parallel, not serial, so the rate at which neurons can respond isn’t so important. In fact, keeping the response speed of a neuron low can be important. There are some toxins that shorten the “recharge” rate of neurons, causing death by paralysis.
I argued in a few recent blog posts that decisions are just self fulfilling prophecies: you believe that you are going to do something, and this belief causes you to do it. It sounds like this is true on many levels, not only on the general level that I discussed.
As Nate Soares (Rob Bensinger? not sure who to credit for this quote) put it, decisions are for making bad outcomes inconsistent.
What do you mean by “embodied”, contextually?
Embodied cognition is the idea that “processing” of information happens in parts of your body other than your brain, as well as “decisionmaking”. Eg, if you step on something sharp, your foot flinches away from it faster than your brain could actually process and decide to do so, because your nervous system in the foot/leg/spine has patterns for this situation.
There’s also a lot of physical processing, in the sense that the structure of bones and tendons encodes certain ratios into your physical motions.
I strongly recommend Principles of Neural Design, which approaches this topic in a far more bottom-up way, ie information theory, chemistry, and basic electrical signaling, to build up a general picture of how brains do the vitally important thing that they do.
Uh, maybe you can’t tickle yourself…
Edit: am not schizophrenic btw
Any “pronounced schizotypal traits”?
Not really (some “excessive social anxiety” and some “constricted affect” but, well, those are pretty standard for ASDs; otherwise nope).
Is there a list of those traits that doesn’t require a PubMed subscription?
It doesn’t help in this particular case, but for next time: There’s no such thing as a pubmed subscription. Pubmed is a collating service for scientific articles, both public and subscription journals. See the right side where it says “full text links”? Under there you can find links to the source. In this case you would need an elsevier subscription to read it, but in many cases you can find a truly free source. Or, if you have access to a university network, they often have credentials you can use.
I’ve noticed that depression has something to do with lacking the ability to make positive predictions.
This might manifest in different ways– good outcomes can’t be imagined even for small actions, so you get lethargy, or good outcomes can’t be imagined on the large scale so life seems like nothing but pain.
However, this doesn’t explain the pattern of being able to pursue hobbies (low threshold of effort hobbies?) while having trouble taking care of oneself.
A few years ago, on the tail end of an acid trip, I independently hypothesized an idea that was very, very similar to the Motor Behaviour section. I was thinking on the nature of self, and how psychedelics can affect your sense of self. What I ended up concluding, I tried to summarize as “your ‘Self’ is all and only that which you can predict with near-100% certainty”. The idea being that, I can predict with near-certainty at all times what my arm will do, therefore it is part of my self. I cannot, on the other hand, predict that of other people, and so they are separate selves.
I followed up this idea to some cyberpunk futurism. Essentially, if this is the case, then if technology can get reliable enough and give us a fast enough feedback loop on, well, anything, then in a very real sense we might be able to say that it becomes an extension of ourselves.
I find it really interesting that there appears to be a rigourous foundation behind my idle thoughts, and I’m excited to see where this goes. Thanks for writing this up for us, Scott!
Isn’t this basically what they mean when skilled people say “X feels like an extension of myself.” where X is some tool they’re really good with? When they understand their machine enough to expect its behavior as well as their own body, the machine becomes as much a part of the agent as their body is.
One significant thing that this framework leaves out, or at least seems at odds with, is the notion that perception and many other parts of the mind are modular. Not sure how familiar you are with this idea, but Jerry Fodor proposed that significant parts of cognition are informationally encapsulated – they don’t take into account everything the mind knows. Classic examples are visual illusions; even when you become convinced that the two lines in the Müller-Lyer illusion are identical length (i.e. you measure them), you can’t help but see them as different. Other examples are geometric realignment, maaaaybe some language stuff. And there’s a bunch of reason to think that most of perception is modular (see e.g. this review). This is in contrast to a “central” process, which is not encapsulated, and allows humans to make inferences that seem to incorporate everything they know.
It seems like this theory, at least as you describe it Scott, proposes that nothing is modular. All parts of the brain can use everything that the brain knows; top-down influence is inherent to the process. But that seems at odds with a ton of evidence (see above). Maybe this theory is an attempt to describe a central process, but not modular processes? What are your thoughts?
The book talked about this briefly. Yes, it would be nice to, once you realize that the hollow face illusion is a misperception, correct it and see it accurately. But brute-forcing this would require changing your prior that faces are usually convex, and you might only see the illusion correctly at the cost of seeing real people’s faces on the street as concave sometimes. Your brain pretty rationally decides it isn’t worth it.
I guess this raises the harder problem of why the brain can’t learn to be context sensitive and only see the illusion that way. My guess is to some degree it can (I think I’ve almost trained myself out of the face illusion) and to other degrees it’s just not “worth” it.
Speaking of selective inattention, I first read this as “science was invented by straight white men, and so probably just reflects straight male pattern baldness, and so we should ignore it completely”.
I’m not sure what that means about me. Except the obvious.
Also, of course, my version is just as convincing as their version.
This is exactly what I’m studying in my interaction design program, so thanks for the relevant book review! I’m only a couple chapters into my textbook at the moment, but it’s extremely similar to this. So far it’s mostly centered on Signal Detection Theory, which refers to how humans differentiate between 2 discrete world states: one in which the signal exists, and one in which it doesn’t.
Here’s a Wikipedia article about it
One interesting thing here is that people are good at adjusting their signal detection if the payoffs change (“If you miss a nuke on the Nuke Detection System, the payoff will be horrible and we’ll all die”) but comparatively bad at adjusting when the probabilities change (“The manufacturing equipment is faulty this week so expect a 10% higher chance of faulty products that must be caught by quality control.”) As usual, narratives > probabilities.
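(For the curious, the payoff-sensitivity is easy to see in the standard ideal-observer criterion from SDT. The formula is the textbook one; the numbers below are invented.)

```python
# Ideal-observer criterion: respond "signal" when the likelihood ratio exceeds beta.

def optimal_beta(p_signal, v_hit, c_miss, v_correct_reject, c_false_alarm):
    p_noise = 1 - p_signal
    return (p_noise / p_signal) * ((v_correct_reject + c_false_alarm) /
                                   (v_hit + c_miss))

# Rare signal, symmetric payoffs: demand strong evidence before crying "signal".
print(optimal_beta(0.1, v_hit=1, c_miss=1, v_correct_reject=1, c_false_alarm=1))    # 9.0
# Make misses catastrophic (the missed nuke) and the optimal criterion collapses:
print(optimal_beta(0.1, v_hit=1, c_miss=100, v_correct_reject=1, c_false_alarm=1))  # ~0.18
```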
Not to get too pop-psychology with you all here, but the “positive thinking” crowd might love this. If it is indeed the case that limbs move in response to your brain’s belief that they will move, perhaps good things happen to people who think good things happen to them because the brain makes all kind of adjustments to cause the expected future to occur.
I’m not saying it’s true, and I mostly really dislike the positive thinking crowd, but it might be worth thinking about a little.
A second thought. If actions are the result of your brain doing things your brain predicts that you will do, we may have an excellent and simple explanation for why habits are hard to break.
Or, in evolutionary terms, an added benefit to any creature that works via obeying its own predictions — it’s a really effective mechanism for that creature to learn habits.
I think the distinction here is between doing things and things happening to you. You are in complete control over your arm, it’s pretty well wired up with control systems. Not so with the part of the world that the positive thinking crowd wants to influence. It makes sense that you can do things by expecting them to happen, because the part of the world doing the thing is under your control. It doesn’t make sense that things will happen to you because you believe they will happen, unless you are already in complete control of whatever part of the world is acting on you.
However, it does make sense that you will *notice* more good things happening to you if you expect good things. They’d be happening anyway if you weren’t expecting them, but the brain would hide them as noise.
I was coming to say something similar. Particularly the kind of positive-thinking visualisations that involve motor skills, like repeatedly visualising yourself doing a perfect dive or a perfect tennis serve. I had thought that was woo-woo, but this prediction model makes it more plausible.
This is something you can see pretty clearly when playing with hypnosis. Basically, you can play games that cause people to anticipate their arm “raising on its own”, and when you succeed their brains will actually raise their arms to match the predictions.
This has some cool implications along the lines of “intentions” and “expectations” being sort of interchangeable, so when you are struggling with something as an anticipation, it’s often helpful to play with the “intention” framing, and when you struggle with something as an “intention” it’s often helpful to look at it as an anticipation. For example, people have a hard time “intending” to do things that they don’t anticipate will work, but if you can make room for that possibility by removing the pressure to succeed (basically, allowing them to dismiss the alarm), people can often do stuff that they were convinced they “didn’t know how” to do. The first example, and therefore the one that sticks out in my mind, is telling my wife to remind me to turn the oven off when I get into bed. She couldn’t intend to do it because her expectations were being dominated by “I will be asleep, and I cannot remind you to do things while I am asleep”, but I told her that it’s okay if she fails and to intend to do it anyway. She reminded me even though she was asleep and didn’t even remember the next morning. I use this kind of thing *all the time*.
Goddamnit.
Every time.
The minute I saw the tip of the triangle as I was scrolling I knew what was coming and tried to start reading one word at a time so I wouldn’t miss them.
And I still missed all of them.
Including the one in the damn question asking me if I’d noticed them.
Epistemic status: so, so speculative. The process doesn’t matter if the prediction is testable. Citations are a form of oppression.
I think this suggests an etiology for the connection between transgender identity and autism. Not all gender-atypical people are transgender. Since gender expectations are socialized, then gender-atypical people who process normally might tend to go with a mental model of “society says I’m my birth gender, so whatever” (cis by default). Gender-atypical people who process bottom-heavy might be more likely to notice a persistent mismatch between the social expectations of their gender, and their actual gender-atypical presentation, and choose to transition or ID as nonbinary based on that.
Wait, this doesn’t match up with another transgender experience. The classic phenomenon of transgender identity is gender dysphoria, the feeling of being trapped in the wrong body, persistently having a mental expectation of a different body. Perhaps gender identity is a hyperprior, which is not modified by sensory input. This could explain gender dysphoria as a cognitive dissonance between top-down insisting on one gender and bottom-up insisting on another.
Transgender is bottom-up, transgender is top-down; how to resolve this? While Blanchard’s autogynephilic/homosexual typology of transsexuality fails to capture the lived experiences of transgender people, he did have (possibly flawed) research data to support the presence of distinct etiologies. Perhaps this is a better typology? I can’t say whether they’re non-overlapping, or that there’s only two, but I hope this indicates some areas for potential research.
The “motor behavior as anticipation of perception” idea seems to me to explain a frequent experience among those of us who are scared of heights. When looking down from a high point, we not only feel a fear of falling, but quite often also an urge to jump. Though this experience is common, it feels disturbing and hard to account for. But of course, if the brain makes little distinction between “anticipating a sensation” and “initiating the motor command to elicit that sensation”, then imagining a fall should be experienced exactly as wanting to jump.
(Same thing, maybe, for the widespread urge to squeeze cute things: the natural protective thought “this cute baby would be defenseless if something happened to it” is experienced as “I want to hurt this cute baby”, as one imagines oneself as the agent of the anticipated situation.)
Regarding the autism connection…
I am not autistic, but I am definitely over in the corner of personality space closest to autism. As a separate datum, I’ve always been able to reproduce unfamiliar vocal sounds (usually just from foreign languages) easily and well after hearing them once or twice. (This has been to the point where people drastically overestimate my fluency in a language based on the quality of my accent.)
During language study, it became obvious that not everyone is like this. In particular, where a foreign language has a sound that doesn’t quite map to English (the Japanese ‘r’ was notable here), many people will mispronounce the sound, hear a corrected pronunciation, and be unable to distinguish between the two or correct their earlier mistake. For some time I had a hypothesis, supported by some other anecdotes from similarly autistic-adjacent acquaintances, that the semi-autism was related to this, and caused me to successfully parse out differences that were discarded pre-consciously by a neurotypical’s brain. This seems consistent with the PP model’s description of autism.
Is this a thing that’s ever been studied? It certainly seems consistent with hypersensory abilities as mentioned above.
Another data point. I am in a similar place on the spectrum (i.e. not autistic but close) and I have had a similar experience with languages and accents. I can definitely hear sounds in foreign languages which many other native English speakers do not. This helps in the quality of my reproduction of the sounds (if I am singing a song in the language, for example) but I think it actually hurts my level of confidence (which seems to be tied into the amount of progress made in learning to speak) when trying to learn or speak a new language, since I am painfully aware of any error in pronunciation that I make.
I would bet that women tend to view the world from a more top-down perspective and men a more bottom-up perspective. Women tend to be more religious and superstitious, which I assume entails prioritizing predictions over sensory data. Men tend to be more conservative, which seems to me like bottom-up processing – this is how things are, don’t try and change them (too much surprise = alarms). Evolutionarily this makes sense as well. Men need to be detail-oriented while hunting, while women taking care of the family need to trust what they know about the world first – “women’s intuition.”
It’s also possible this is a whole lot of retroactive analysis. But we already know men and women process things differently and at the very least PP fits right in.
I know you already mentioned it, but I still want to pick on the single sentence of evo-psych.
Why don’t women taking care of the family need to notice new and surprising details, or men hunting need to be able to pick patterns out of noise? Evolutionary just-so stories are just too easy, especially when we’re just playing around with a vague and biased understanding of what life was like for hunter-gatherers.
Also, it might be interesting to see if there’s a gender imbalance in who’s annoyed by tags on clothing.
My mom raised me to think only classless people left tailor’s marks on clothing – the better element has a tailor with the good taste not to put them on, and when poverty forces us to buy mass-produced we cut them off. Legible clothing is worse. Legible clothing with an alligator or some mass-production tailor’s name on the front is the lowest. I don’t bother, but there you are.
Being more conservative strikes me as top-down processing more than anything – “let’s assume that that jaguar is going to behave like jaguars usually behave, because if it’s learned to think outside the box I’m probably screwed anyway, and if I pause to second-guess myself every two seconds I’m definitely going to get eaten.”
In fact, I think that just generally, my preconceived notions of gender are the exact opposite of your preconceived notions of gender. Which is kind of interesting.
Thanks Scott, I’ve always liked Andy Clark but didn’t know about this big step up!
There are four good blog posts by Andy summarizing his thinking (from 2015, when the book was published). The second one is especially helpful in understanding how much PP focuses on organism-relevant action, and how little it aims at the “accurate representation” which has historically gotten much more love.
First, Second, Third, Fourth.
On the brain producing dreams – I’ve found that after a year of studying Chinese, I still can’t follow anything that native Chinese speakers say, unless they speak very slowly. But I have dreamt multiple times that people are talking to me in Chinese, even if I have no idea what they’re saying.
I’m very curious whether my subconscious is actually generating something approaching correct Chinese, but unfortunately it seems impossible to test.
>This also explains why you can’t tickle yourself – your body predicts and adjusts away your own actions, leaving only an attenuated version.
Why can you slap yourself and pinch yourself? Why doesn’t my body predict and adjust away that?
Why does it hurt a lot when I deliberately punch a wall, even though the result is completely predictable and predicted? Shouldn’t it hurt *a lot* less than if I punch a wall disguised as something spongy?
This is fascinating, largely because it corresponds quite well to a set of beliefs that I have long held. I was an undergraduate philosophy major at one of the few schools in the United States with a program that taught continental philosophy. My interest was always in phenomenology, and particularly the French phenomenologist Maurice Merleau-Ponty, who in 1942 and 1945 published two books, The Structure of Behavior and Phenomenology of Perception.
The Stanford Encyclopedia of Philosophy says this about The Structure of Behavior:
Much of Merleau-Ponty’s work grows out of gestalt psychology, and in Phenomenology of Perception he argues that the phenomenon, the interaction of subject and object, is the basic unit of perception, and that attempting to separate the whole and posit a philosophy from purely the subjective or the objective experience would always be incomplete. That jibes quite well with this:
I find this interesting to compare to an old post by someone relevant to this forum 🙂 http://lesswrong.com/lw/e25/bayes_for_schizophrenics_reasoning_in_delusional/
Both make Bayesian models out of delusion, but it seems the new model is dramatically different here in which part is breaking down–though I suppose we’re not talking about precisely the same delusions. What say you about that piece, Scott? Do you think its theory fits in this framework?
I dunno, even after seeing your spoiler, the left picture being a dalmatian makes about as much sense to me as this. The cow was instantly obvious, however, not even a moment of blur.
EDIT: Hah. Right after writing this, I looked at the spoiler for I think the fourth time, and this time I paid attention to the word “drinking”, and only THAT resolved the picture for me, including the non-dalmatian parts of it. As long as I didn’t understand the rest of the picture, I couldn’t see the dog either.
> how many times have I repeated the the word “the” in this paragraph alone without you noticing?
Joke’s on you. I’m always on the the lookout for these sorts of shenanigans when reading about optical illusions.
The part about the brain achieving motor functions by predicting what it would seem like if the motion were already done reminds me of a party trick. Get a string with a bead or something heavy on one end. A necklace might work. Hold it between thumb and index finger so the heavy part is at the bottom. Look at it, and imagine really hard that the bead is swinging sideways, while trying not to move your hands. You will notice the bead starts moving with the power of your mind! Similarly, you can stop the movement by imagining it swinging back and forth.
You should do a study!
Are there any artificial agents or robots implementing some of these ideas?
It’s easy to explain some things and think you’ve found The Explanation. But if you imagine building a brain in your garage, it becomes easier to see where the explanation is only partial, and where some boundaries are, beyond which we are very ignorant.
Take the structure of the visual cortex. How come everyone has such similar visual cortices? Well, it turns out that a bunch of complicated genes are active in a complicated spatial structure in your head. Your brain changes as it learns, and does a lot of learning of how to see, but it learns how to see in a particular guided way, especially early in life (here is a pretty paper showing gene expression in mouse visual cortices at 0, 14, 28, and 60 days), which ensures that you end up with a fairly standard visual cortex.
So if you wanted to build a visual cortex in your garage, it’s not enough to just know that it should have hierarchy, process the data, and make predictions with a low surprise factor; you might have to impose a bunch of structure on it from the start that has something to do with what the sensory processing task is like (and machine learning research seems to bear this out – see the sketch below). Similar degrees of structure are probably also imposed on other parts of the brain we understand less well – maybe trying to make higher reasoning without understanding this structure would be like trying to use a completely unstructured neural net for vision.
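To make that concrete, here’s a minimal sketch (assuming PyTorch; the layer sizes and the 28×28 input are invented for illustration) of what “structure imposed from the start” looks like in machine learning terms: a convolutional net hard-wires locality and weight-sharing, priors that an unstructured fully-connected net would have to discover on its own.

```python
# A minimal sketch (assuming PyTorch; sizes arbitrary) of "imposed
# structure": a convolutional net bakes in locality and weight-sharing,
# priors that an unstructured fully-connected net must learn from scratch.
import torch
import torch.nn as nn

# Unstructured: every input pixel connects to every hidden unit.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Structured: small local filters shared across the image, loosely
# analogous to the repeated local circuitry of early visual cortex.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),
)

x = torch.randn(1, 1, 28, 28)      # a fake 28x28 grayscale "retina"
print(mlp(x).shape, cnn(x).shape)  # both produce 10 class scores
```

Both nets map an image to the same kind of output, but the structured one starts out already “knowing” something about what visual data is like – very loosely analogous to the gene-guided wiring above.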
But back to the visual cortex – how well does predictive processing pin down how much feed-forward versus feedback signalling you should see, and how does this compare to what we actually see? I’m going to guess that it explains a lot, but that those design principles alone aren’t enough to let you build a visual cortex in your garage. Large chunks of our (well, mammals’) early visual processing are mostly feed-forward, taking in sense data and doing some quasi-fixed computation to it, with feedback signals coming down from higher levels of abstraction on a timescale much longer than a single feedforward step. This is in contrast to the predictive processing picture of brain function, where your brain is constantly predicting individual neuron activation levels, even at the lowest levels of abstraction.
This makes sense – many functions of your visual cortex, just like artificial neural nets for recognizing images, can work pretty well with only feed-forward processing, and predicting everything all the time seems like a huge amount of work, so your brain can and should save resources by not constantly predicting everything. So this implies that to build a visual cortex, we need some understanding that tells us how much prediction to do, and where.
In addition, there are multiple types of feedback we might be interested in (cool paper about feedback in the visual cortex). The simplest type of feedback follows the same connections and weights as feedforward reasoning. For example, the “cat” concept gets activated in my brain, so I am (theoretically, when applying this kind of feedback) primed to perceive cat-associated high-level percepts like meows and fur-texture, which filters down into predicting certain correlations among sensory neurons, as well as certain average properties like color. But you can also have feedback that takes a “loop” rather than just inverting perception. For example, if it’s bright out, a part of my visual cortex might take on a certain “bright” state, which decreases the activity of the low-level neurons so that their activation stays more similar in bright versus dim conditions. What sets this kind of feedback apart is that the additional connections leading back to the low-level neurons both shortcut the full tree of associations with “brightness,” and encode a specific function that the feedback performs.
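Here’s a toy sketch of that second, “loop” kind of feedback (numpy only; all the numbers are made up) – a higher-level brightness state divisively normalizing the low-level neurons so their output stays stable across lighting conditions:

```python
# A toy sketch (numpy; made-up numbers) of "loop" feedback: a higher-level
# brightness estimate divides low-level activity back down, so responses
# stay nearly identical in dim and bright conditions.
import numpy as np

rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 1.0, 8)   # relative reflectances of 8 patches

for ambient in (0.5, 5.0):         # dim vs. bright lighting
    raw = ambient * scene          # low-level activity scales with light
    brightness = raw.mean()        # higher level's "bright" state
    stabilized = raw / brightness  # feedback divides the gain back out
    print(np.round(stabilized, 2)) # same output under both conditions
```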
In the unrealistically-pure predictive coding model, prediction and perception share the same neurons. Predictions only flow down while raw perception only flows up; your brain can sort of try to make them meet in the middle, and where the predictions don’t fit the perceptions you update both the prediction and the perception software simultaneously. But if the brain has more types of feedback, and even the first type of feedback isn’t applied constantly and uniformly, then you need rules for learning where feedback is necessary, rules for learning more complicated feedback loops and integrating them into prediction, and rules for updating prediction and perception that work even when you’ve got all this variation and complication.
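For concreteness, a toy sketch of that unrealistically-pure model (numpy only; the sizes and learning rates are arbitrary): one low level holding sense data, one high level holding an estimate, and the same residual error simultaneously nudging the estimate and the generative weights.

```python
# A toy sketch (numpy; sizes and rates arbitrary) of the pure model: the
# high level predicts low-level activity through generative weights W;
# the residual error simultaneously updates the estimate r ("perception")
# and the weights W ("the perception software").
import numpy as np

rng = np.random.default_rng(0)
n_low, n_high = 16, 4
W = rng.normal(0, 0.1, (n_low, n_high))  # top-down generative weights
x = rng.normal(0, 1.0, n_low)            # incoming sense data
r = np.zeros(n_high)                     # higher-level estimate

lr_r, lr_W = 0.1, 0.01
for step in range(200):
    pred = W @ r                  # prediction flows down
    err = x - pred                # error flows up
    r += lr_r * (W.T @ err)       # revise the estimate to explain the error
    W += lr_W * np.outer(err, r)  # slowly revise the model itself
    if step % 50 == 0:
        print(step, round(float(np.mean(err ** 2)), 4))  # error shrinks
```

Everything the paragraph above asks for – rules for where to apply feedback, for non-inverting loops, for mixing learning rules – is exactly what this little loop doesn’t have.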
And maybe a slightly different set of neural learning rules works best for different functions and different stages of learning, controlled by spatiotemporal patterns of gene expression in the brain. But once you’ve got all that figured out, maybe then you have The Explanation that can get a human brain from a chimp brain just by scaling up the neocortex (but specifically the neocortex).
The dalmatian spoiler picture looks weird to me: it totally misses the left hind leg.
So are schizophrenic people like over-fitting machine learning classifiers?
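If the intended analogy is the standard machine-learning one, here’s a toy demonstration (numpy; the polynomial degree is chosen arbitrarily): give a model enough free parameters and it will fit pure noise almost perfectly, “perceiving” structure where there is none.

```python
# A toy sketch (numpy; degree arbitrary) of the overfitting analogy:
# with enough free parameters, a model "explains" pure noise perfectly,
# finding structure where there is none.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 10)
y = rng.normal(0, 1, 10)       # the "data" is pure noise

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
    print(degree, float(mse))  # degree 9 drives the error to ~0
```

Whether that maps onto actual delusions is, of course, the speculative part.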
This post felt like the best insight porn I’ve read here in a long while, perhaps ever. The motor system part especially tickled my fancy, as it made me think about the mental cues used in barbell movements. The most effective verbal form of the right kind of mental cue seems to be a description of the consequences of a movement instead of a literal description of the movement itself. Like “spread the floor” for squatting without collapsing the knees, and “bend the bar” for bench pressing with the elbows tucked. The subtle “push the floor away instead of pulling the bar up” for deadlifting doesn’t make sense initially, since it seems like a useless tautology, but it does seem to make a difference in making it easier to keep a straight spine when you’re close to not having enough strength to complete the lift.
I can now picture how the prediction “the barbell moves up” approaches an inevitable contradiction with the prediction “maintain this posture of the limbs” when certain changes in posture would improve leverages and make it easier to complete the movement (but come at the cost of injury risk). The sensation of your body is screaming at you that this shit is seriously heavy and you need some serious effort or no movement is going to happen. So there’s a battle between two motor programs where both are trying to have their predictions fulfilled, and it’s almost but not quite impossible to accomplish both. Maintaining the correct balance in the strength of each prediction is required: you need to really, really strongly intend to lift the weight, because a maximal-effort lift is really damn heavy, but at the same time you need to intend to maintain good form even more strongly, or that form is sacrificed.
What about transgenderism? I’ve recently come to accept myself as trans, due to undeniable subjective evidence. But I still find the idea frigging weird. If I put on a woman’s coat, if I act in the way I imagine to be “womanly”, or when I’m occasionally read as a woman by other people, I feel instant and total relief of a misery that haunted me for thirty years without me grasping why (I thought it was just part of my personality, a core feature of my identity; I’m “just a gloomy person”, or “congenitally depressed”; I didn’t know I could just go and feel happiness, or emotions in general, and I certainly didn’t expect this much psychological relief merely from the prospect of not crossplaying “man” anymore). And yet, this makes no frigging sense. What counts as “womanly” clothing, gestures, or social treatment is obviously arbitrary, and part of an oppressive, senseless caste system. I am indeed philosophically opposed to the idea that you can define “woman” by those things, and I stand by the validity and importance of challenging gender norms. In transhumanist luxury queer space communist utopia, everyone will just wear whatever clothing they want, right? And yet, in this present, gendered society, I know, from direct experience, that I get immense psychological relief from embodying the arbitrary social construct of “feminine”, and immense misery from its counterpart; but I don’t know why.
What if, for whatever reason, my brain just latched, early on and irrevocably, on “myself” as (the large cluster of arbitrary signs that construct the social idea of) “female”, and is constantly annoyed when sense data doesn’t match this prediction? So whatever stupid tokens my society declares to be “feminine” are added to the top-down model and contrasted with sensory data, and end up influencing dysphoria or its relief.
(For the record: I’m blind to the spinning-mask illusion in the GIF above, and apparently this is correlated with transness. I do see the illusion in other versions of it, but not this particular one. When the picture was posted earlier in this blog, I said that I was questioning, but probably not trans (because how can a hairy gorilla like me not be a man?); I now believe that the benefits of transition far outweigh the costs in my case, and even if I don’t have trans midichlorians or whatever, that’s as good a definition of “trans” as any (thanks Ozy).)
For a less technical, more popular review of this subject, take a look at: “Making up the mind” by Chris Frith.
This is really interesting. As a practicing tulpamancer, this matches my experience with trying to force the top-down assumptions to do a hard override on the bottom-up data. I’m unusually bad at it, as in “after working my ass off on it for years, I can reach about the same results as all my friends can in a month, and I’m pretty sure it’s not just a reporting issue”. I am also having trouble reading the jumbled-up word mess (it’s possible, but not easy), can’t recognize the plane in the video (all I see is blinking lights moving in random directions with no rhyme or reason), don’t have any problems at all with the Stroop effect task, and am unusually bad at recognizing spoken words despite having (measurably) very acute hearing.
Could it be that my top-down processes are just not strong or confident enough, and even pretty weak bottom-up data makes them reconsider all their assumptions? Does my brain have a particularly strong affinity for the Virtue of Lightness, to the point where the lightest push of evidence can overcome my priors?
Heh! When I was reading through the article, I was actually thinking, “say, wouldn’t this go a long way towards explaining autism?” And then I got to the section where it turns out to explain autism. How’s that for prediction?
Seriously, speaking as an autistic person, I’d say that “being in a near-constant state of surprisal” sounds a lot like how I experience my condition. Very interesting.
Data point on the dalmatian: Last night, I couldn’t make it out at all, even though I’ve seen it before, possibly in Goedel, Escher, Bach. I was wondering whether it was a low-quality copy. This morning, the dog popped out immediately.
In the second paragraph I read “surprisal” as “surprise!” the first time. Then I did two more rereads trying to understand what kind of weird meta-joke was saying that “surprise is a gratuitously-technical neuroscience term for surprise” before finally reading it correctly. So my brain helpfully exemplified exactly what the paragraphs are talking about – first taking the option, described in the first paragraph, of subconsciously rewriting “al” as the expected “e!”, then stopping in its tracks when that led to an unavoidable higher-level conflict.