Translating Predictive Coding Into Perceptual Control

Posted on March 20, 2019 by Scott Alexander

Wired wrote a good article about Karl Friston, the neuroscientist whose works I’ve puzzled over here before. Raviv writes:

Friston’s free energy principle says that all life…is driven by the same universal imperative…to act in ways that reduce the gulf between your expectations and your sensory inputs. Or, in Fristonian terms, it is to minimize free energy.

Put this way, it’s clearly just perceptual control theory. Powers describes the same insight like this:

[Action] is the difference between some condition of the situation as the subject sees it, and what we might call a reference condition, as he understands it.

I’d previously noticed that these theories had some weird similarities. But I want to go further and say they’re fundamentally the same paradigm. I don’t want to deny that the two theories have developed differently, and I especially don’t want to deny that free energy/predictive coding has done great work building in a lot of Bayesian math that perceptual control theory can’t match. But the foundations are the same.

Why is this of more than historical interest? Because some people (often including me) find free energy/predictive coding very difficult to understand, but find perceptual control theory intuitive. If these are basically the same, then someone who wants to understand free energy can learn perceptual control theory and then a glossary of which concepts match to each other, and save themselves the grief of trying to learn free energy/predictive coding just by reading Friston directly.

So here is my glossary:

FE/PC: prediction, expectation
PCT: set point, reference level

And…

FE/PC: prediction error, free energy
PCT: deviation from set point

So for example, suppose it’s freezing cold out, and this makes you unhappy, and so you try to go inside to get warm. FE/PC would describe this as “You naturally predict that you will be a comfortable temperature, so the cold registers as strong prediction error, so in order to minimize prediction error you go inside and get warm.” PCT would say “Your temperature set point is fixed at ‘comfortable’, the cold marks a wide deviation from your temperature set point, so in order to get closer to your set point, you go inside”.

The PCT version makes more sense to me here because the phrase “you naturally predict that you will be a comfortable temperature” doesn’t match any reasonable meaning of “predict”. If I go outside in Antarctica, I am definitely predicting I will be uncomfortably cold. FE/PC obviously means to distinguish between a sort of unconscious neural-level “prediction” and a conscious rational one, but these kinds of vocabulary choices are why it’s so hard to understand. PCT uses the much more intuitive term “set point” and makes the whole situation clearer.

FE/PC: surprise
PCT: deviation from set point

FE/PC says that “the fundamental drive behind all behavior is to minimize surprise”. This leads to questions like “What if I feel like one of my drives is hunger?” and answers like “Well, you must be predicting you would eat 2000 calories per day, so when you don’t eat that much, you’re surprised, and in order to avoid that surprise, you feel like you should eat.”

PCT frames the same issue as “You have a set point saying how many calories you should eat each day. Right now it’s set at 2000. If you don’t eat all day, you’re below your calorie set point, that registers as bad, and so you try to eat in order to minimize that deviation.”

And suppose we give you olanzapine, a drug known for making people ravenously hungry. The FE/PCist would say “Olanzapine has made you predict you will eat more, which makes you even more surprised that you haven’t eaten”. The PCTist would say “Olanzapine has raised your calorie set point, which means not eating is an even bigger deviation.”

Again, they’re the same system, but the PCT vocabulary sounds sensible whereas the FE/PC vocabulary is confusing.

FE/PC: Active inference
PCT: Behavior as control of perception

FE/PC talks about active inference, where “the stimulus does not determine the response, the response determines the stimulus” and “We sample the world to ensure our predictions become a self-fulfilling prophecy.”. If this doesn’t make a lot of sense to you, you should read this tutorial, in order to recalibrate your ideas of how little sense things can make.

PCT talks about behavior being the control of perception. For example, suppose you are standing on the sidewalk, facing the road parallel to the sidewalk, watching a car zoom down that road. At first, the car is directly in front of you. As the car keeps zooming, you turn your head slightly right in order to keep your eyes on the car, then further to the right as the car gets even further away. Your actions are an attempt to “control perception”, ie keep your picture fixed at “there is a car right in the middle of my visual field”.

Or to give another example, when you’re driving down the highway, you want to maintain some distance between yourself and the car in front of you (the set point/reference interval, let’s say 50 feet). You don’t have objective yardstick-style access to this distance, but you have your perception of what it is. Whenever the distance becomes less than 50 feet, you slow down; whenever it becomes more than 50 feet, you speed up. So behavior (how hard you’re pressing the gas pedal) is an attempt to control perception (how far away from the other car you are).

FE/PC: The dark room problem
PCT: [isn’t confused enough to ever even have to think about this situation]

The “dark room problem” is a paradox on free energy/predictive coding formulations: if you’re trying to minimize surprise / maximize the accuracy of your predictions, why not just lie motionless in a dark room forever? After all, you’ll never notice anything surprising there, and as long as you predict “it will be dark and quiet”, your predictions will always come true. The main proposed solution is to claim you have some built-in predictions (of eg light, social interaction, activity levels), and the dark room will violate those.

PCT never runs into this situation. You have set points for things like social interaction, activity levels, food, sex, etc, that are greater than zero. In the process of pursuing them, you have to get out of bed and leave your room. There is no advantage to lying motionless in a dark room forever.

If the PCT formulation has all these advantages, how come everyone uses the FE/PC formulation instead?

I think this is because FE/PC grew out of an account of world-modeling: how do we interpret and cluster sensations? How do we form or discard beliefs about the world? How do we decide what to pay attention to? Here, words like “prediction”, “expectation”, and “surprise” make perfect sense. Once this whole paradigm and vocabulary was discovered, scientists realized that it also explained movement, motivation, and desire. They carried the same terminology and approach over to that field, even though now the vocabulary was actively misleading.

Powers was trying to explain movement, motivation, and desire, and came up with vocabulary that worked great for that. He does get into world-modeling, learning, and belief a little bit, but I was less able to understand what he was doing there, and so can’t confirm whether it’s the same as FE/PC or not. Whether or not he did it himself, it should be possible to construct a PCT look at world-modeling. But it would probably be as ugly and cumbersome as the FE/PC account of motivation.

I think the right move is probably to keep all the FE/PC terminology that we already have, but teach the PCT terminology along with it as a learning aid so people don’t get confused.

This entry was posted in Uncategorized and tagged psychiatry, psychology. Bookmark the permalink.

103 Responses to Translating Predictive Coding Into Perceptual Control

Reverse order

toastengineer says:

March 20, 2019 at 10:11 pm

I don’t pretend to actually understand PCT or neural networks or anything, but from what I do understand I think it’s important to realize that the idea of “predicting” that you will eat 2,000 calories a day is only a “prediction” because we’re hacking a giant prediction engine to work as an animal brain. The neuron that represents “I predict that the entity that is me” has its inputs disconnected from the normal prediction machinery and is instead wired up to the mechanism that actually detects if your body doesn’t have enough food in it, similarly to how the neuron that represents “I will flex this muscle in my leg” has its output actually wired up to that muscle in your leg.

I think. This is just me trying to grok the theory.
- Scott Alexander says:
  
  March 20, 2019 at 10:40 pm
  
  “is only a “prediction” because we’re hacking a giant prediction engine to work as an animal brain”
  
  I’m not sure I understand this – evolutionarily, we started with the animal brain, and only started doing predictions once we got much smarter.
  - deciusbrutus says:
    
    March 20, 2019 at 11:41 pm
    
    That’s an interesting place to start.
    
    I don’t know how to find what the early nervous systems looked like 600MYA, because everything currently alive has evolved since then.
    
    The ‘prediction engine’ capability is, in its niche, worth the added cost- or else it never would have evolved. But once it evolved, it likely made some other capability irrelevant or redundant, and that capability would likely atrophy.
    
    The general-purpose prediction engine it either developed independently one or more times, or exists in a shared ancestor of every animal that exhibits general-purpose predictive behavior. That arguably includes humans, dolphins, ravens, and octopi.
    
    What I’m saying is that there’s been enough time for the part of the brain that the predictive engine made irrelevant to stop doing the thing it used to do.
  - Enkidum says:
    
    March 21, 2019 at 5:14 am
    
    If you’re talking about predictions in the sense I thought you were using them in this article, then no, they’re pretty much shared among most animals (or at least that’s the assumption). I.e. efferent copies of motor movements, etc.
  - Ezra says:
    
    March 21, 2019 at 8:20 am
    
    evolutionarily, we started with the animal brain, and only started doing predictions once we got much smarter.
    
    Consider this alternate account, which I’ve gleaned from what little I understand of predictive coding:
    
    Imagine beginning with a small invertebrate with a very simple nervous system. It doesn’t do any especially complex calculations; it’s just wired with a response for every stimulus, and to the extent that these nerves cross, it’s just to prevent trying too many things at once.
    
    (My mental model here is a little fellow who goes “if I smell food right in front of me, then eat it; otherwise if I smell food right nearby, then turn right; otherwise move and try again.”)
    
    This little guy can make better moves (and so be more energy-efficient and outcompete similar creatures) if he can predict a short chain of actions (or equivalently pre hoc, of stimuli) leading to food. From a general sensory input he might even find it convenient to hallucinate a more detailed series of sensory impressions, to prompt the exact series of movements required.
    
    Then on that predictively (or equivalently perceptually) coded framework, his descendants can build concepts and models.
  - caethan says:
    
    March 21, 2019 at 8:29 am
    
    As I said last time this got brought up, Friston explicitly claims his model is applicable to things without brains, like cells. Hell, I could in broad strokes describe a thermostat in Fristonian terms. Now, I still haven’t gotten a straight answer from his advocates whether I could describe an arbitrarily designed thermostat precisely in Fristonian terms, but let’s suppose you can, that it’s fully generalizable. We build the full free energy function over the internal states of the thermostat and the external states of the temperature. Now, this certainly does let you predict – sort of – how the thermostat would act. You can look at the fully defined function and say “well, at F(furnace off, temperature 20 degrees) the local gradient is down in such and such direction, so “surprisal” is minimized by switching the furnace on”.
    
    What this doesn’t give you, though, is any information about how the thermostat works. You can’t tell whether it’s using a thermocouple or a bimetallic strip. Similarly, let’s suppose you can describe some human action using Fristonian terms. Does that give you some deep insight into the structure of the human brain?
    
    Fuck no.
    - VivaLaPanda says:
      
      March 21, 2019 at 11:00 am
      
      Knowing the goal of a system is really useful in analyzing it. If you show me a thermostat/heater system and I don’t know what it’s trying to do (keep the room at temperature X), it’ll be hard to figure out the function of the individual pieces. If the brain does work via these theories, it helps us understand how the parts of the brain fit together and analyze them.
      - caethan says:
        
        March 21, 2019 at 2:18 pm
        
        Explain to me what Fristonian free energy provides when explaining the behavior of a system over and above the actual inputs and outputs.
        
        Commenters here repeatedly claim that this model means that neurons somehow encode specific predictions or that the brain is a prediction engine or that the brain specifically uses “free energy” in determining action. Even on the model’s own terms, this is bullshit.
      - Enkidum says:
        
        March 21, 2019 at 7:04 pm
        
        I’m not a Fristonian, but I work in a Friston-sympathetic neuroscience lab (although, caveat, my formal training is in cognitive psychology, not neuroscience as such).
        
        I can’t tell you what his model specifically says, because I haven’t read more than a few of his abstracts. But… are you suggesting that neurons don’t encode specific predictions? I mean, in the realm of movement, don’t we have several fairly well-understood systems where precisely that occurs – i.e. efferent copies? I’m not sure if we know the wiring down to the single-neuron levels, would have to dig out old textbooks to check, but I know that we have a pretty good idea of the way efferent copies are used in movement guidance. Is this disputed by many people?
      - warrenmansell says:
        
        March 21, 2019 at 11:51 pm
        
        Not many people challenge the reafference principle, but Powers does:
        http://www.livingcontrolsystems.com/intro_papers/Reafference_principle.pdf
        That’s not to say Powers says it never occurs, just that a negative feedback control of perception model is a more parsimonious account of most of the experimental findings. It may be specific to sudden eye movements and very quickly stabilised by perceptual control, if it does occur. It seems to exist to attempt to explain why our world appears to be stable within conscious awareness when our eyes move, rather then being necessary for optimal performance on any specific task.
      - Enkidum says:
        
        March 22, 2019 at 5:07 am
        
        @warrenmansell – thanks, will check this out.
    - heXdot says:
      
      March 21, 2019 at 3:02 pm
      
      The theory centers around the nervous system (particular the brain) as a biological neural network. That is the ‘black box’ between the inputs and outputs. So at this stage what people can do is compare machine learning models and neuroscience findings. The brain is in many regards still a black box but we do know a lot about what neurons are and how they work.
      It’s ironic that by trying to remodel neural networks we again arrived at a similar black box problem, since it is very hard to comprehend the “knowledge” models coded in the structure of the artificial neural networks (also interesting to compare that to neuroscience theories about how human memory works; or what it means to be “good” at something practical like playing an instrument).
      Someone in the comments on an earlier Predictive Coding blog entry here referred to the ‘Information Bottleneck Method’ which is a interesting inside in this regard.
    - A. Korba says:
      
      March 21, 2019 at 3:25 pm
      
      Friston explicitly claims his model is applicable to things without brains, like cells. […] Now, I still haven’t gotten a straight answer from his advocates whether I could describe an arbitrarily designed thermostat precisely in Fristonian terms, but let’s suppose you can, that it’s fully generalizable.
      
      The difference between a cell and a thermostat is that the thermostat does not need to maintain itself in the face of external perturbations. As such you’re correct, using the Free Energy formalism doesn’t get you anything in this case.
      - caethan says:
        
        March 21, 2019 at 4:26 pm
        
        All right, I’ll give you the same example I gave the last Fristonian: Describe bacterial chemotaxic behavior in terms of the “free energy” formalism and explain what the formalism gives me that the biochemical understanding of the behavior doesn’t.
        
        As a refresher, since none of the Fristonians I’ve talked to before have known a lick of actual biology:
        
        1) There’s a bacterium with flagella that can be in one of two states. In state 1, they all rotate counter-clockwise, the flagella form a bundle, and the bundle propels the bacterium forward. In state 2, one or more rotate clockwise, which disrupts the bundle, randomly orienting the bacterium.
        2) Chemical attractants (e.g., food) can suppress clockwise rotation of the flagella in a dose-dependent manner.
        
        The combined result is that the bacterium random walks, with the length of the walk steps being proportional to the local concentration of food, so that it tends to swim up concentration gradients towards higher concentrations of sugar.
        
        So: what is the formalism under the Fristonian model for this behavior, and why should I care about that formalism instead of the biochemical underpinnings of the behavior?
      - A. Korba says:
        
        March 21, 2019 at 5:20 pm
        
        @caethan
        
        To make this bacteria into an FEP agent, we need sensors, actuators, internal state, and external state. This is easily done:
        
        Sensors: the various transmembrane receptors, characterized by a rate of activation
        Actuators: each flagellum, characterized by its rotation CW or CCW
        Internal state: concentrations of relevant metabolites and proteins, e.g. active CheY
        External state: concentrations of relevant molecules in local environment
        
        We see that the value of the sensors depends on the external environment. These sensors affect the internal state of the organism by changing the concentration of active proteins. The actuators state depends on this concentration, switching between CW and CCW based on the proteins binding to the flagellum.
        
        The bacteria has an expectation as to the percent of active chemotaxis proteins, the critical point at which the flagella switch between states. If this value drops below the expectation, (i.e. there is free energy to minimize), the bacteria acts (i.e. stops tumbling and swims) changing the environment (i.e. the local concentration of relevant molecules) until the sensory expectation is met.
        
        explain what the formalism gives me that the biochemical understanding of the behavior doesn’t.
        
        No doubt you, knowing so much more biology than us humble Fristonians, realize that there is no single “biochemical understanding” of an organism and that relevant biochemical processes happen at a variety of spatial and temporal scales. Whether a formalism that has descriptive power across scales is useful or exciting depends entirely on you.
      - warrenmansell says:
        
        March 21, 2019 at 11:26 pm
        
        Here is the PCT model of bacterial chemotaxis, with an empirical study in humans to illustrate the universal principle. Note also that Powers was fully aware of the relevance of his theory for entropy in physics and he does write on it. But he saw that in living organisms it relates fundamentally to control – not prediction.
        https://www.researchgate.net/profile/Richard_Marken/publication/20462846_Random-walk_chemotaxis_Trial_and_error_as_a_control_process/links/5964ee38aca2720a5ccdf2a3/Random-walk-chemotaxis-Trial-and-error-as-a-control-process.pdf
      - Eli says:
        
        March 22, 2019 at 9:11 am
        
        http://arxiv.org/abs/1707.01806 — An active inference implementation of phototaxis
      - anonymousskimmer says:
        
        March 23, 2019 at 8:32 am
        
        @caethan
        
        What it provides is a method of understanding certain things about nature to those people who think along those lines.
        
        http://www.ptypes.com/idiosyncratic.html
        “Gradually it has become clear to me what every great philosophy so far has been: namely, the personal confession of its author and a kind of involuntary and unconscious memoir” – Friedrich Nietzsche.
        
        I hypothesize that the personality theories of personality theorists best describe themselves and those of their own type.
        
        And another Nietzsche quote: “There are horrible people who, instead of solving a problem, tangle it up and make it harder to solve for anyone who wants to deal with it. Whoever does not know how to hit the nail on the head should be asked not to hit it at all.”
    - Winja says:
      
      March 25, 2019 at 12:54 pm
      
      That sounds like high-level, solution-neutral systems engineering work. Sketch out the issue to be solved and ways to solve it without explicitly spelling out how you’re going to actually build the thing, and then work from there to get more specific.
  - Strawman says:
    
    March 21, 2019 at 12:43 pm
    
    I think it depends on the level of abstraction (or ontology?) on which we are talking about prediction.
    As an analogy GPT-2 almost being able to count to ten can be paraphrased as “hacking a giant arithmetic engine hacked to work as a [complicated machine learning thingy] to do arithmetic”. (and which also makes me want to write something like “Eight Signs Clickbait Listicle Writers are the Next Profession Facing Technological Unemployment: If You’re Anything Like Me, Number Four Will Make You Rethink Your Career Choices and also Your Life Choices in General”).
    “Predicting that you will eat 2000 calories a day” in the sense above does indeed sound like “hacking a giant prediction engine to work as an animal brain”, whereas
    consciously applying Bayes’ Rule, or navigating a maze to get to the spot where those careless researchers always forget their cheese, sounds more like “hacking a (giant prediction engine hacked to work as an animal brain) to work as a prediction engine”.
    Or perhaps one should simply say that the ~~mind~~ brain is ~~Buddha~~ a giant prediction engine…
durumu says:

March 20, 2019 at 10:12 pm

I’m not exactly sure what FE/PC brings to the table with regard to the explanations of movement, motivation, and desire. It seems a whole lot more confusing, so why teach it at all? Is it just because it was the first theory to come along so everyone uses the words, or are there some deeper insights that you can really only come to by framing the whole situation in terms of free energy and prediction errors?
- Matthias says:
  
  March 20, 2019 at 10:37 pm
  
  If I understand it right, FE/PC managed to naturally come up with some math out of their theory that worked well enough?
  - vV_Vv says:
    
    March 21, 2019 at 9:37 am
    
    Control theory has a ton of math as well, and it works well enough that people have used it to land spacecrafts on the Moon, what does the math of FE/PC bring to the table?
Ketil says:

March 21, 2019 at 12:04 am

The “dark room problem” is a paradox on free energy/predictive coding formulations: if you’re trying to minimize surprise / maximize the accuracy of your predictions, why not just lie motionless in a dark room forever?

The downside to lying in a dark room, is that when you then feel somebody’s hand on your body, it will be maximally surprising and unpleasant. Continuous sensory input is important to make our surroundings predictable, and the fact that nothing actually happens doesn’t matter – on the contrary, expectation without release was a main element used by Hitchcock to build suspense. Our imaginations are geared to expecting tigers and muggers everywhere, and we need to reassure ourselves there are none – just like you can feel your skin itching more if you cannot confirm there actually aren’t any ants crawling on you.

Come to think of it, there are some people who insist on exactly the dark room situation, patients with ME/CFS. They also seem to suffer from sensory overload, reporting high degrees of pain and unease and fatigue. I’d be interested to hear if the pathology can be/have been explained or framed in a CPT/FE/PC context.
- Hyperfocus says:
  
  March 21, 2019 at 10:28 am
  
  Interestingly, a naive human put into the dark room situation (a child) will *invent* an expectation that is never fulfilled (there’s a monster under my bed!), and this will keep them awake. Add a night light, and they go to sleep.
  
  Perhaps the dark room scenario would play out differently if we were nocturnal?
Akhorahil says:

March 21, 2019 at 2:12 am

Is this really any different from saying “You have preferences which you try to fulfil”?

You can talk about predictions and set points about temperature, but in the end, what happens is surely that your body signals discomfort with the temperature (because of evolutionary reasons) and that, unless you have some stronger reason not to, you respond to this in order to increase your comfort? I mean, what else is there supposed to be here?

The text makes it seem like the proposition “your body temperature is handled through homeostasis” is something new.
- Grek says:
  
  March 21, 2019 at 5:02 am
  
  The exciting insight here is that surprisal, learning, perception, muscle coordination and goal orientation are all homeostatic processes as well.
- sabre51 says:
  
  March 21, 2019 at 5:47 am
  
  Exactly! All this talk about “free energy” and “expectations” just makes things more confusing without adding explanatory power. You made my point much more concisely. 🙂
- toastengineer says:
  
  March 21, 2019 at 9:30 am
  
  I think so, because I think it’s trying to explain exactly how “having preferences” works, which is a neuron that represents a specific prediction (“I will inhale”, “I will move away from the thing that is hurting me”) is being forced to a high/low value by an external controller (the CO2 detector in your brain, the heat-sensitive nerves in your hands) and thus your prediction-making engine becomes a decision-making engine that you can force imperatives in to. I think.
  
  Without an explanation of how the body detects heat, produces heat, and wires the heat detector to the heat producer in order to control it, “homeostasis” is just like “waves” in Guessing the Teachers Password. It sounds like predictive coding is an attempt to explain how the mercury tube inside the thermostat works.
  - caethan says:
    
    March 21, 2019 at 2:21 pm
    
    But “free energy” adds no information about mechanisms either! All it is, at best, is a way to describe the observed behavior of a system — assuming that it can. All this nonsense about how “neurons represent specific predictions” is unsupported speculation!
    - heXdot says:
      
      March 21, 2019 at 3:39 pm
      
      There are some interesting findings like this though:
      
      In Spoken
      Word Recognition, the Future Predicts the Past
      
      “These findings provide evidence that future input determines the perception of earlier speech sounds by maintaining sensory features until they can be integrated with top-down lexical information.”
      
      There are other empirical works too:
      
      Brain potentials during reading reflect word expectancy and semantic association
      
      Predictive neural processing in adult zebrafish depends on shank3b
      
      Dynamic predictive coding by the retina
      
      Selective Activation of the Deep Layers of the Human Primary Visual Cortex by Top-Down Feedback
      
      Less Is More: Expectation Sharpens Representations in the Primary Visual Cortex
    - emiliobumachar says:
      
      March 22, 2019 at 9:58 am
      
      “neurons represent specific predictions”
      “neurons somehow encode specific predictions” (from another of your comments)
      This shouldn’t be surprising at all. Every piece of information available to the brain is somehow encoded by neurons, from the appearance of the people you know to the memory of what you ate last weekend. I’m sure my neurons do encode specific predictions because I have a couple conscious ones, and there’s no other way I’d have them.
      
      I should disclaim that I do not understand FE well enough to accept or reject it, but I take issue that one of your reasons to oppose it seems to be predictions being encoded by neurons.
    - Enkidum says:
      
      March 22, 2019 at 10:36 am
      
      @Emilio
      
      Right. I said much the same above.
      
      I think @caethan’s actual argument is more that prediction is not a fundamental feature of ALL neuronal activity, whereas Friston et al say it is.
      
      Like you, I don’t understand the theory well enough to accept or reject it.
AC Harper says:

March 21, 2019 at 2:27 am

I can’t help feeling that we are (naturally) looking at predictions or reducing ‘free energy’ from a top level viewpoint. However if you look at cognition/sensory processing as a vaguely structured pile of associated neurons passing low level predictions on and receiving error signals back, then the issue of predictions or reducing errors is a low level process, the whole set of which may appear in concious awareness, or more likely subside into background noise.

Using ‘top level’ vocabulary is a distraction, an oversimplification that results in confusion.
entirelyuseless says:

March 21, 2019 at 3:56 am

>The PCT version makes more sense to me here because the phrase “you naturally predict that you will be a comfortable temperature” doesn’t match any reasonable meaning of “predict”. If I go outside in Antarctica, I am definitely predicting I will be uncomfortably cold. FE/PC obviously means to distinguish between a sort of unconscious neural-level “prediction” and a conscious rational one, but these kinds of vocabulary choices are why it’s so hard to understand.

You’re not getting it. *Even consciously and rationally*, you should predict that if you go outside in Antarctica, you will as soon as possible get out of that situation by going back and inside. Because that is what you would do, rather than sitting there and freezing to death. So even consciously and rationally, you are predicting that will be comfortable.

It is just not true that this does not match the normal meaning of predict. It matches it quite well.
- Peter Gerdes says:
  
  March 31, 2019 at 6:42 am
  
  But now your theory is circular. The facts about predictions are supposed to explain (indeed predict) your actions. The reason you go inside and get warm in Antarctica is because of this supposed mismatch with prediction.
  
  But now to derive the conclusion that you will try to stay warm when you visit antartaca I have to assume that you predict that you will be comfortable when you visit and I can only derive that by assuming that you will try and stay warm when you get there. The theory is equally compatible with you both predicting you will be cold and you in fact not trying to get warm.
  
  Moreover, you will still try to get warm even in situations where you predict you will be unsuccessful, eg, if your stranded in the wilderness you’ll try and start a fire with sticks even if you think it’s unlikely to succeed (even though it’s your best shot).
  - warrenmansell says:
    
    April 1, 2019 at 9:01 am
    
    I think you are proving Scott right about the unnecessary complexity of predictive coding by this answer! There is no prediction – a person just varies their actions to counteract disturbances until their perceived temperature matches their desired temperature.
sabre51 says:

March 21, 2019 at 5:44 am

Thanks for posting- this gets to my confusion with the previous articles on Friston. Namely, where is the extra explanatory power of “free energy?” Going inside because of an “expectation” that I should be warm vs. going inside because my body has an instinct for optimal tempurature give the same prediction, plus the latter matches my subjective experience much more closely. So why is there this whole theory around expectations and free energy? Always felt like I was missing something but maybe it is just weird terminology that means the same thing.

Also, the point about “minimizing surprise” doesn’t seem to make sense. Humans are naturally curious and like to explore unknown things, which leads to surprises and new ideas (see: Friston himself).

It seems like Friston is just a smart guy trying to impress people with a new theory, but not actually advancing knowledge. The fact that I don’t fully understand it makes me a little unsure, but new knowledge should make things easier to understand- his theory being so confusing is a very bad sign.
- Freddie deBoer says:
  
  March 21, 2019 at 7:50 am
  
  “new knowledge should make things easier to understand”
  
  If we stuck to that, we’d never have discovered quantum mechanics….
  - eyeballfrog says:
    
    March 21, 2019 at 11:22 am
    
    QM made physics easier to understand because it took previously inexplicable phenomena (blackbody spectrum, atomic orbitals, etc) and made them understandable. The objection to Friston here seems to be that it becomes more confusing without extra explanatory power making up for it.
    - sabre51 says:
      
      March 21, 2019 at 1:02 pm
      
      Precisely. QM is hard to understand (at first), but it allows you to predict things that previous theories could not. That is my question: what problems does the theory solve that were mysteries before? If it is really the same paradigm as Scott said, then it is going against Occam’s razor for no benefit.
      
      I am open to being convinced; if someone knows the answer, I would enjoy hearing it.
  - Eli says:
    
    March 22, 2019 at 9:15 am
    
    Sure. I think the real question is: what does the Free Energy Principle bring to the table that other Bayesian brain or predictive mind accounts (such as rational analysis, pursued in a sister lab to the one I work in, or allostasis, our lab’s account) don’t?
- A. Korba says:
  
  March 21, 2019 at 3:33 pm
  
  The explanatory power of the Free Energy Principle doesn’t exist just in understanding the terms `free energy` and `surprise.` It links the way an agent acts on the environment, the way an environment acts on an agent (sensation), and the internal state of the agent in a way that Bayesian inference over causes of sensation naturally results.
VirgilKurkjian says:

March 21, 2019 at 6:18 am

Friston and others are 100% aware of this. I don’t want to assume that they are being deliberately obtuse, but… god, it really seems that way. They’re familiar with the PCT paradigm, they cite it, but they really refuse to use those terms, even though they’re (as you explain) so much clearer.

In Surfing Uncertainty, Andy Clark cites Powers directly, and quotes Perceptual Control Theory’s tagline:

“Instead of seeing perception as the control of action, it becomes fruitful to think of action as the control of perception [Powers 1973a, Powers et al, 2011]”.

Friston is only slightly more cagey:

“So how can one specify optimal behaviour in terms of prior beliefs? Imagine a (Bayesian) thermostat that infers the ambient temperature through noisy thermoreceptors. … Crucially, both action and perception (estimating the hidden causes of sensory input) are trying to minimise the same thing — roughly speaking, prediction error or the surprise associated with sensations.”
(Friston, Samothrakis, & Montague, 2012).

And this was published in the journal Biological Cybernetics, no less!

A more charitable explanation is that they cite PCT but don’t use those terms because there is some critical addition that FE/PC makes on top of control theory; but if that is the case, I have little idea what that is.
entirelyuseless says:

March 21, 2019 at 6:49 am

First, you are a physical object. You behave in physical ways, like rocks fall.

Second, you are a physical object that behaves in ways that we call living, like eating and reproducing.

Third, you are an animal with sensation. You notice yourself doing the first two kinds of things. You begin to predict that you will keep doing it. There is nothing strange about this idea of prediction: it is totally reasonable to predict that what was happening anyway, will continue to happen.

Fourth, some of your predictions start being off in minor ways. The fact that you are making the predictions, however, is itself part of your behavior, and causes other behaviors: in particular, you not only eat when you were going to eat anyway, but you also eat on a few other occasions, when you happened to expect it, but in fact, you would not have if you had not expected it. This is predictive control: you are now eating *because* you expect yourself to eat.

Fifth, you are an intelligent being. You notice all of the above happening. You predict that it will continue to happen. There is nothing irrational about this, or anything lacking to this sense of prediction: it is totally reasonable to think that everything that has been happening, will continue to happen.

Sixth, step 4 repeats but using your intelligence, which consists in a process where you think you are going to do something, you call this thought “deciding to do it,” and the fact that you think you will do it physically causes you to actually do it.
realitychemist says:

March 21, 2019 at 7:04 am

Does anyone have any good resources to read up on the math of all this? I’m very curious what kind of formalism is at play here. I would expect it to involve a system of differential equations, maybe something a little bit like the Hodgkin-Huxley model of action potentials, but if anyone has any links to good resources on the topic I’d love to look them over.
- nightmartyr says:
  
  March 21, 2019 at 9:06 am
  
  This appears to be a reasonably up-to-date review of the formalisms involved. Definitely more variational calculus & probability theory than classical physics notions of free energy (which after the variational part give differential equations). In Section 6 some discussion of “equations of motion” arise to minimize these functionals, but this seems more like an aside rather than the driving part of the theory. I don’t know much about this particular field, but the mathematics used is mostly standard so happy to provide references or try and explain any parts which seem unclear.
  
  https://www.sciencedirect.com/science/article/pii/S0022249617300962
  
  arXiv version:
  
  https://arxiv.org/abs/1705.09156
- nightmartyr says:
  
  March 21, 2019 at 9:18 am
  
  Also this may not be exactly PCT as Scott has described it, but one can find a few papers discussing different mathematical formalisms for these sorts of verbal/intuitive models. This seems like a reasonably older review of some of them:
  
  https://www.tandfonline.com/doi/abs/10.1207/S15326969ECO1202_02?casa_token=ivWZPVVjscIAAAAA:9R5_4sUTfo770hvPVCb0R5V9CuRio4xX0yatgvNkhpU8GmtxFQiBO58O5e7Rqscs2igXtMCRSb4
- heXdot says:
  
  March 21, 2019 at 4:18 pm
  
  also this: Derivation of the Variational Bayes Equations
warrenmansell says:

March 21, 2019 at 9:04 am

Hi everyone, I approve wholeheartedly of Scott’s analysis and it reflects my own thinking for a long time. The recognition of the simple elegance, scientific validity and practical utility of PCT is long overdue. Importantly, the free energy principle has done very little to alter our understanding of mental health problems whereas PCT has set the foundations of a transformative, transdiagnostic approach. Please see https://youtu.be/m5_AKjDdqaU
- theroomgotheavy says:
  
  March 21, 2019 at 10:06 am
  
  Wrong video!
  - eyeballfrog says:
    
    March 21, 2019 at 11:23 am
    
    Or was it a test of your predictive processing?
dfv says:

March 21, 2019 at 9:48 am

I understand the free energy principle. It means what it says it does:

Friston’s free energy principle says that all life…is driven by the same universal imperative…to act in ways that reduce the gulf between your expectations and your sensory inputs. Or, in Fristonian terms, it is to minimize free energy.

The problem people have with it is just that it’s complete nonsense. It’s saying that not only all human, but all biological life forms, worms and smallpox included, center their lives around generating valid Bayesian predictions. The primary goal of human action is not to “maximize their ability to predict sensory input”. That doesn’t make sense on any level. It’s just wrong. It’s wrong in the way that encourages smart people to think about it over and over, because the initial interpretation is so obtusely different from reality, because they think “we’ll maybe I’m just reading it incorrectly, or the statement is just on a higher level than I can understand”, etc. etc.

Here’s the deal: The emperor is bear ass naked. This joke of a proposition was held up by media and the rationalist community as something “only the smart people can understand”, and then had everyone scrambling like literary critics to decipher the “true meaning of free energy theory”. The true meaning of the free energy theory is the definition it’s creator gives it. People do not gain satisfaction from predicting their spouses’ impending death and being correct, so it’s wrong. You don’t need to have an IQ of 150 and a PHD in Quantum Mechanics to understand that the words in this statement are false. The whole “framework” for understanding human action is just this long, sophistrous extrapolation of the idea that – shocker – don’t like entropy and explosions.
- Robert Jones says:
  
  March 21, 2019 at 10:10 am
  
  Certainly the claim that all life is driven by the universal imperative to minimise free energy requires a strong justification, which I don’t see.
  - Eli says:
    
    March 22, 2019 at 9:18 am
    
    A system responding to a stochastic driving signal can be interpreted as computing, by means of its dynamics, an implicit model of the environmental variables. The system’s state retains information about past environmental fluctuations, and a fraction of this information is predictive of future ones. The remaining nonpredictive information reflects model complexity that does not improve predictive power, and thus represents the ineffectiveness of the model. We expose the fundamental equivalence between this model inefficiency and thermodynamic inefficiency, measured by dissipation. Our results hold arbitrarily far from thermodynamic equilibrium and are applicable to a wide range of systems, including biomolecular machines. They highlight a profound connection between the effective use of information and efficient thermodynamic operation: any system constructed to keep memory about its environment and to operate with maximal energetic efficiency has to be predictive.
    
    https://arxiv.org/abs/1203.3271
- vV_Vv says:
  
  March 21, 2019 at 10:17 am
  
  Here’s the deal: The emperor is bear ass naked. This joke of a proposition was held up by media and the rationalist community as something “only the smart people can understand”, and then had everyone scrambling like literary critics to decipher the “true meaning of free energy theory”. The true meaning of the free energy theory is the definition it’s creator gives it. People do not gain satisfaction from predicting my impending death and being correct, so it’s wrong.. The whole “framework” for understanding human action is just this long, sophistrous extrapolation of the idea that – shocker – don’t like entropy and explosions.
  
  I think the appeal is that it uses terms like “free energy” which are borrowed from physics, which has an aura of purity and high status, while PCT derives from control theory, an engineering field that makes you think of thermostats and rockets.
  
  In general there seems to be an industry of physicists venturing into other fields to “formalize” and “explain” them, the latest fad seems to be trying to reduce biological evolution to thermodynamics. The problem is that these “formalizations” don’t seem to add any explanatory power, rather they add confusion.
  - dfv says:
    
    March 21, 2019 at 10:25 am
    
    @vV_Vv
    
    That’s definitely a part of it. The “free energy principle” isn’t psychology, or psychiatry, or EVEN neuropsychology…It’s PHYSICS!, the most sciency discipline of all!!! It has the word “entropy”! Just whisper it to yourself. “Free Energy”. It’s basically thermodynamics. Friston? He’s like Einstein, but for psychiatry.
- caethan says:
  
  March 21, 2019 at 2:31 pm
  
  As far as I can see, Friston’s continually equivocating between two positions:
  
  1) We can define a function that we’ll call “free energy” such that actions observed by creatures always move from states of higher “free energy” to states of lower “free energy”. This is trivially true: let F(x, t) be -t, and tada! It might be true for definitions with more constraints, but I don’t see that it’s worth my time to go through the formalisms given my strong impression that Friston’s a crackpot or a bullshit artist.
  
  2) “free energy” is the telos of an organism. Or organisms “know” free energy and consciously or unconsciously act to minimize it. Or that “free energy” is somehow embedded in the brain. Or all the other bullshit.
  - Lambert says:
    
    March 21, 2019 at 10:23 pm
    
    I think he’s trying to compare to the principle of least action, where classical mechanics gets reformulated in a way that is kind of teleological.
    
    Not sure whether it’s an appropriate comparison or not.
    - warrenmansell says:
      
      March 22, 2019 at 12:04 am
      
      Effective control engineering devices are the embodiment of teleology
spinystellate says:

March 21, 2019 at 10:41 am

I think I have this all figured out: PCT author William Treval Powers may or may not be related to Francis Gary Powers, shot down over The Soviet Union (well known for its control theories) in a U2 spy plane. But one of the greatest hits of the band U2 was “I still haven’t found what I’m looking for”, presumably a sad tale of one man’s inability to bring action in line with perception.
heXdot says:

March 21, 2019 at 11:07 am

1. You might also add Piaget’s theory of cognitive development to the mix! It could be seen as a psychological extension of Friston’s neural concept to the cognitive development of humans and how we ‘learn’ by constructing a ‘mental model’ of the world and by bringing this mental world into an equilibrium with our ‘experience’ aka minimizing free energy.

2. I recently finished my master thesis (in public choice) based on both of these and my argument, that humans are ‘rational’ only in the limitations of this subjective mental model of the world and that they only extend it when forced by unfitting experiences (read free energy) within their utility function. It would be irrational to try to be rational on the bases of our shared reality that lies outside of the own mental world model cause it requires effort to improve your world knowledge (imagine having to find the objective perfect mix of goods based on your subjective utility in a casual groceries shopping trip; you would need days to consider every available option within observable reality). In this way it can be subjectively rational (cost effective) to be objectively irrational. The problem many economic models of human actors have, is that they believe rationality mus be based on our shared reality, instead of the subjective world model of the actor (which following Friston are actually separated by a Markov blanket). This becomes meaningful when most political arguments are about complex cause-effect relationships that can’t be ‘experienced’ (which makes them dependent on trust), are discussed on the internet (which erodes many established trust institutions like politics, science, and everything ‘elite’) and end in voting decision that have no practical costs for single voters (see public choice theory).

3. Neural Networks can help to distinguish between tactile knowledge and codified knowledge too (think the AI that counts like a child). It’s interesting how Kahnemann and Tversky’s ‘Heuristics and Biases Program’ arrived at distinctly that differentiation of ‘fast’ and ‘slow’ thinking.

4. For fun here is the way this all could lead to general intelligence in AI: Build a human like robot. Give her digital sensory organs (think camera for eyes, microphones for ears and so on). Have a deep neural network that tries to predict the inputs of these real time data streams according to Friston’s theory. Give her some basic needs (this might be the hard part; dark room problem) and the ability to “express” herself (think speaker for language, motor functions for actions). Try to socialize the robot similar to how you would do it to a “new born” human. … Profit?
- realitychemist says:
  
  March 21, 2019 at 2:39 pm
  
  I’ve been thinking about (4) for a while… It seems like these kinds of theories about human consciousness could have direct applications on AGI systems and I haven’t seen anyone working with them (although I’m not in the field so I could easily have missed it). One thing I’m not sure about is how safe such a system might be. If its “needs” are satisficing and it follows some kind of least-action principal with regards to disturbing its environment, it might be safe? I’m pretty sure Bostrom discussed the idea of an AI which “wants” to disturb its environment as little as possible while pursuing its goals in Superintelligence, but I don’t remember what his conclusion was about the idea. Probably that it’s not safe for some reason I’ve forgotten, that seemed to be the theme of the book.
  - heXdot says:
    
    March 21, 2019 at 4:00 pm
    
    Better don’t give it the goal to construct paperclips.
    
    If AGI needs a form of ’embodiment’ (also see this) it would also be limited by this body. Since its “brain” would be closer to our digital hard and software it would be easier for it to create a interface and transfer it’s “ghost in the shell”. 🙂
    - warrenmansell says:
      
      March 22, 2019 at 12:06 am
      
      So true. This is why http://www.perceptualrobots.com robots do embodiment first…
    - realitychemist says:
      
      March 22, 2019 at 6:32 am
      
      I am a bit confused by your reply w.r.t. safety. I was talking about an AGI with a satisficing goal (not a maximizing goal like a naive paperclip maximizer) and strong domesticity motivations (i.e. it wants to disturb its environment as little as possible in pursuit of satisfying its goal). It may be unsafe, but if so it’s unsafe in some unintuitive way rather than in the fairly obvious way a paperclip maximizer is unsafe.
      
      Shifting gears away from safety…
      
      Anyway, I think people have worked on this approach under the name of subsumption. Last I heard, such AIs have a lot of trouble learning any amount of language since their “minds” don’t run on a symbolic logic like more traditional AIs. Maybe there have been advances in this area?
dfv says:

March 21, 2019 at 11:08 am

I especially love how we’re now calling it “The Dark Room Problem”. It’s not “the discrepancy between reality and Friston’s posturing”. It’s The Dark Room Problem^TM. Now that we’ve given it “Problem” status, every layman who wants to earn big brain points can explain the theory to other people, and when they inevitably object with “but that’s not what people do”, they nod their heads and say: “Yes, that’s the Dark Room Problem. It’s an open question. one that many, many magic supergenius science men are studying. The Theory is correct, and I, as a 150IQ man of science, understand it completely. It’s just that, there’s this Problem, like the Three Body Problem, this corner case where our theory leads to unexpected outcomes.” I wish I was a tenured psychiatrist so I could just use this to spur my own career.

Me: Guys, I’ve developed an underlying theory for all animal behavior: I call it the “Photon Density Principle”. It says that all life forms seek to maximize the brightness of their surroundings as their primary and total goal.

Bystander: But Dean, doesnt that just say people have an imperative to stand next to light bulbs all day?

Me: Actually, you bring up a good point called the “Bright Light Problem”. Its an ongoing research area. Very cutting edge stuff. You should definitely give me more research money and press coverage so I, galaxy brain supergenius can go “investigate” this problem.
- sabre51 says:
  
  March 21, 2019 at 1:14 pm
  
  +1000 lofl. “many, many magic supergenius science men.”
  
  Wish I could give you a high five for this one.
Strawman says:

March 21, 2019 at 11:42 am

Hopefully I’m just missing some trivial-but-crucial, obvious-in-hindsight piece of background knowledge, but I keep feeling, well, suprised by Friston’s use of free energy as a term for (pretty much, sort of, I guess) prediction error. I’m not sure how to understand either “free” or “energy” in this context, let alone in a way where they conjointly make sense as a good choice of terminology. (I imagine it’s “free” as in “variable”, with a slight chance of “electron”, but probably not “speech” or “beer”, but as for “energy” I’m stumped).
- caethan says:
  
  March 21, 2019 at 2:35 pm
  
  He’s stolen a term from physics he doesn’t understand to make himself sound smart. In the original context, “free” energy is the part of the energy of a system that is available to do work.
  - vV_Vv says:
    
    March 21, 2019 at 5:10 pm
    
    To expand:
    
    The concept of free energy is used in thermodynamics and related fields, in particular chemistry.
    
    For instance, if you burn a piece of coal, the oxygen in the air and the carbon in the coal combine to form carbon dioxide. But the reverse reaction is also physically possible: the carbon dioxide in the air can split into oxygen and carbon. So why do we observe the former but not the latter? Microscopically in fact both reactions occur, but the former is more thermodynamically favorable than the latter: it lowers the Gibbs free energy of the system, therefore it occurs at a higher rate and the piece of coal ends up being consumed by the flames rather than being accreted. The general principle is that systems tend to minimize their free energy over time (subject to certain conditions).
    
    Friston’s took this concept from thermodynamics and tried to adapt it to the behavior of animals and humans: he tries to explain behavior as a tendency of an agent to minimize some sort of cognitive “free energy”, which he tries to connect to information-theoretic (Shannon) entropy in a way that is analogous to how thermodynamic free energy is connected to thermodynamic (Boltzmann) entropy.
    
    The problem is that he created a confusing theory which, AFAIK, makes no useful predictions. Perhaps you could salvage some equations out of it if you want to use them to design some kind of control system, but I haven’t seen it been done in practice. Control systems are mostly designed according to the well-established math of control theory, which deals with set points and so on.
    - warrenmansell says:
      
      March 22, 2019 at 12:10 am
      
      I agree. Powers uses the math of control theory but switches the set points to inside the organism and internally set as the outputs of the next system up in the hierarchy. Hierarchies are present in neuroscience, robotics and FE of course, but not specified in the same way as Powers.
mustacheion says:

March 21, 2019 at 11:52 am

As somebody whose only exposure to Friston / PCT has come from SSC, but who does have a good mind for math and control systems, it has always seemed obvious to me that both paradigms were essentially the same thing, and I was confused how Scott and others could be so confused about Friston but not by PCT. But seeing you lay out the two theories side-by-side like this makes it clear why it was so confusing to so many: the language Friston uses is just extremely poor and confusing. I suspect he chose to use words like ‘surprise’ because he wanted to use words laypeople could understand to help make the theory more easy to understand. But this backfired, because to use the word ‘surprise’ as Friston does requires loading it with tons of extra context-specific meaning that it does not carry in lay-usage and this redefinition actually makes it harder for a person new to the theory to understand what is going on. Better to coin a whole new term and give the reader a chance to learn that new term in its own context than to borrow a common word and redefine it. And I think Friston’s use of the term ‘free energy’ is an attempt to make an analogy to the various thermodynamic potentials that share a similar sounding name (like Gibbs free energy). Pushing that analogy could make sense to a computer scientist trying write neurological simulation software; it is possible that we can make averaging approximations that greatly simplify neurological simulation in the same way we do in thermodynamics. But thermodynamics is such a terribly arcane and unintuitive field (easily the physics domain where I my understanding is weakest) that this analogy really does not help build understanding in most contexts. In fact, I think that most people would interpret ‘free energy’ in a way closer to new-age pseudoscientific magic, which can only confuse matters.

My recommendation on the matter is to recognize Friston as having had a really great insight about how the brain works, but to admit that he did a really poor job of translating his thinking into language, and to abandon his terms altogether and stick with PCT terminology.
- A. Korba says:
  
  March 21, 2019 at 3:08 pm
  
  I suspect he chose to use words like ‘surprise’ because he wanted to use words laypeople could understand to help make the theory more easy to understand.
  
  `Surprisal` is actually an existing term in information theory, it’s the information content of a single sample. An unlikely event corresponds to a higher surprisal.
  
  And I think Friston’s use of the term ‘free energy’ is an attempt to make an analogy to the various thermodynamic potentials that share a similar sounding name (like Gibbs free energy).
  
  As I understand it there’s a direct correlation to free energy in statistical mechanics, though I’m not familiar enough with the latter to make it explicit. I do agree that, like Shannon’s use of entropy, it’s a poor choice of words.
- warrenmansell says:
  
  March 22, 2019 at 12:12 am
  
  I fully agree! Now we need to get these sentiments into Wired and other science outlets rather than Scott’s enlightened backstream of followers!
A. Korba says:

March 21, 2019 at 2:57 pm

You have set points for things like social interaction, activity levels, food, sex, etc, that are greater than zero.

Then: the FE/PC formulation would also have an equivalent expectation for these levels. The dark room problem is only a problem if you strip out necessary expectations for organic life.

If the PCT formulation has all these advantages, how come everyone uses the FE/PC formulation instead?

I’m not familiar with PCT, did it derive a relationship between action, sensation, and Bayesian inference? This is the fundamental insight of Friston’s work.

I’m working on putting Friston’s work in more understandable terms, I will answer questions on it best I can!
- warrenmansell says:
  
  March 21, 2019 at 4:47 pm
  
  Hi, I’m not sure why your using the past tense to ask about PCT. It’s a current theory. See http://www.pctweb.org.
- warrenmansell says:
  
  March 21, 2019 at 4:50 pm
  
  I think it’s worth remembering that Scott’s point is that PCT has the advantage of explanatory elegance and ease of understanding. So why the focus in these comments about the obfuscatory complexity of free energy principle? Just use PCT!
  - A. Korba says:
    
    March 21, 2019 at 5:33 pm
    
    My bad on past tense, just an assumption based on the linked paper being from the 70’s.
    
    I agree that PCT presents similar concepts in a more intuitive way. It’s not clear to me from a cursory look if PCT is as powerful as FEP in terms of its link to learning and Bayesian inference (that is: how to update the parameters of the model based on observations).
    
    (Also I don’t think FEP is necessarily obfuscatory, just that Friston communicates core ideas using highly precise math.)
Gerry Quinn says:

March 21, 2019 at 3:23 pm

I think this is just munging together too different issues. Set-points work great as a model for eating etc., in an already well-defined reality. They are high-level stuff.

Prediction theory works well for me as a conceptual model for how sensory perceptions filter up (and down again) to generate a model of reality with which we interact. But we have abstract monitoring processes feeding off this model, including the set point mechanisms, which also interact with what our livers, stomachs etc. are telling us hormonally.

As for ‘free energy’, that’s just pointless jargon from another sphere that debases what is a very reasonable theory.
- warrenmansell says:
  
  March 21, 2019 at 11:38 pm
  
  I agree. The learning algorithm in PCT is reorganisation. This has been modelled for motor coordination in Powers’ 2008 book which has accompanying visual computer demos. An earlier study demonstrates that reorganisation may be universal to life by showing the parallels between bacterial chemotaxis and human learning.
  https://www.researchgate.net/profile/Richard_Marken/publication/20462846_Random-walk_chemotaxis_Trial_and_error_as_a_control_process/links/5964ee38aca2720a5ccdf2a3/Random-walk-chemotaxis-Trial-and-error-as-a-control-process.pdf
  They said, the development of the ‘input functions’ that extract perceptual variables from the environment have not been modelled to date as far as I am aware.
imoimo says:

March 21, 2019 at 4:28 pm

Because I haven’t seen it anywhere yet, I want to clarify what free energy ACTUALLY means in physics. In statistical thermodynamics (the set of tools describing how large collections of particles behave), there are several “free energies”, with fancy names like Gibbs free energy and Helmholtz free energy. (The wiki page gives more history than I ever could on these: https://en.m.wikipedia.org/wiki/Thermodynamic_free_energy .) In stat therm, free energy is *the quantity a system is trying to minimize*. You use different free energies depending on what parameters you’re holding fixed (like volume or pressure), and what you’re letting change freely. In the simplest example, I believe plain ol energy counts as a free energy, in the right context. So a free energy is by definition whatever a system minimizes.

This jives with Friston’s use of the word, so I think he’s using the term appropriately. However (opinion incoming), I worry he’s just saying something general that’s maybe true or useful but not very insightful. His free energy principle may just be equivalent to: “systems that have many moving parts can usually be modeled as minimizing something”. This is true a surprising amount of the time, and in physics this gets used constantly. For example in quantum computing there’s a concept called quantum annealing where you just map some computational task onto the task “minimize the energy of this carefully constructed quantum system”, then go and physically do the minimization. This literally only works because so many tasks can be phrased as the minimization of something.

In Friston’s defense though, he actually defines the free energy he’s using. According to Observing Ideas (https://observingideas.wordpress.com/2015/02/15/free-energy-principle-a-quick-look-and-some-concerns/) Friston’s free energy is “the upper bound of entropy”. So minimizing it means “keeping entropy low”. But uh… don’t we already know this? Life is the thing that maintains locally low entropy? Or rather, life and engines, but engines produce “work” (physical motion) while life produces internal structure plus work plus whatever else. So Friston seems to me (a casual observer) to be dressing up a well-established idea (life = entropy minimizer) in a fancy new coat. I suspect his theory does this a few times (see Scott’s post above about redressing PCT).

To defend Friston again, the Wired article Scott links to says: “Friston himself tended to take a more measured tone as our talks went on, suggesting only that active inference and its corollaries were quite promising. Several times he conceded that he might just be “talking rubbish.” ” I get the impression that active inference is a concrete, useful tool that is only roughly conceptually inspired by Friston’s free energy principle. So I think Friston is quite wise to consider the possibility that active inference is his only real contribution in all this.
- Gerry Quinn says:
  
  March 22, 2019 at 5:22 am
  
  Exactly: what we colloquially call ‘energy’ is precisely what physical chemists call ‘free energy’. Simply put, it is a combination of raw energy and entropy changes. Due to the latter, it is defined slightly differently under different environmental constraints.
misini says:

March 21, 2019 at 8:01 pm

Life forms, as a direct consequence of their ability to reproduce and pass mutations in their reproduction to their descendents behave, grow, and reproduce in whatever ways allow them to be as numerous as possible and to outcompete other, similar life forms for resources. If this FE/PC/PCT stuff does not strictly follow from the previous explanation of why life does what it does, an explanation we are very confident in, it is not worthy of our attention.

I find explanations of how our behavior is rooted in instincts and sensations that result in maximum reproductive success (like eating when we are hungry, avoiding injury, and having sex) far clearer than this theory. It is true that human behavior sometimes contradicts what would be predicted by natural selection (contraception being an obvious example), but I think this is a case of there not having been enough time for natural selection to operate on the full range of human intelligence, as we are the first species that can understand the selective forces that create us in the abstract, rather than relying on simple instincts to drive us.
I don’t think many generations will pass before our descendants have behavior directly driven by high-level and abstract concerns of natural selection rather than more concrete instincts (e.g., with every major decision they make the consider how it will impact how many children they have and/or how many ideas they can spread, as memes are also subject to natural selection, and simpler instincts like hunger, sex-drive and survival are easy to overpower if other considerations are strong enough, like self-sacrifice for the sake of your children or your sibling’s children). Once the first such beings are born into the world it would not take long for them to dominate the population and outcompete all who are not instinctually concerned with such things.
- warrenmansell says:
  
  March 22, 2019 at 12:01 am
  
  PCT does, very nicely. See Gary Cziko’s book:
  Book details
  The Things We Do: Using the Lessons of Bernard and Darwin to Understand the What, How, and Why of Our Behavior
  
  https://books.google.co.uk/books/about/The_Things_We_Do.html?id=jiqLW2m54MwC&printsec=frontcover&source=kp_read_button&redir_esc=y
  https://www.goodreads.com/book/show/1753896.The_Things_We_Do
- Enkidum says:
  
  March 22, 2019 at 6:39 am
  
  Life forms, as a direct consequence of their ability to reproduce and pass mutations in their reproduction to their descendents behave, grow, and reproduce in whatever ways allow them to be as numerous as possible and to outcompete other, similar life forms for resources. If this FE/PC/PCT stuff does not strictly follow from the previous explanation of why life does what it does, an explanation we are very confident in, it is not worthy of our attention.
  
  This is, with all due respect, a very silly way of approaching science. Perhaps I’m a little grumpy, so apologies for the stridency here, but I think you’re completely and utterly wrong about this.
  
  Quantum mechanics is even more fundamental than evolutionary theory, and evolutionary theory can only be true insofar as it is possible in a world that is described by QM. Should we only accept psychological theories that are directly derived from quantum mechanics? This is just incoherent. They are explanations aimed at different levels of phenomena. We don’t even derive evolution from QM. (Please don’t try and explain to me how this is possible, I’m sure someone has done it, but as a matter of historical fact it’s not how evolution was discovered, nor how it is taught or understood today by the vast majority of people.)
  
  We don’t understand how the mind works very well. Some evolutionarily-minded explanations of some cognitive phenomena have been very successful. Some of these explanations have been utter crap.
  
  The visual and oculomotor system is probably the best-understood part of the human brain / mind. We have a huge number of very precise findings in this domain, obtained through the standard messy scientific processes including observation, hypothesis generation, experimentation, model generation, peer review, etc. No one ever sat down, read The Origin of Species and derived any of these findings through logical inference. That’s not how science works.
  
  I have no idea about the validity of Friston’s theories. However the appropriate way to generate psychological theories involves looking at the phenomena they are attempting to describe. If these theories are somehow incompatible with evolution (whatever that would mean) or with QM, then clearly there is something wrong with them. But that’s not using evolution as a generative principle.
  - misini says:
    
    March 22, 2019 at 8:48 am
    
    Never mind, I had not read well enough. He considers, at least briefly, how the first principles of natural selection and thermodynamics gives rise to his theory in “A free energy principle for the brain”.
    
    Friston is not making statements restricted to human psychology and not restricting his explanations to explaining human brain phenomena. He claims his statements pertain to all life, so he considers what all life has in common. Natural selection is the highest level abstraction that all life has in common, above the underlying physics and chemistry.
    
    As far as I can tell from this paper his theory is that all life strives to make its external environment more similar to its internal environment, or at least move to a more similar external environment, as without such behavior the organism’s internal environment would begin to equilibrate with a very different external environment and it would fall apart and die. It calculates the difference between its external and internal environments based on sensory inputs and its effectiveness at doing so depends on how well its model of how to deduce environmental state from sensory input aligns with its environment, with selection pushing organisms to have models that align with the reality of their environments.
    I am starting to understand how this would describe the behavior of single-celled organisms but I do not see how it can apply to the behavior of humans and animals very well. We experience boredom in warm, humid environments with abundant food and water despite the fact that all that we need to maintain equilibrium is being met. We crave arbitrary stimulation and and activity from games and exploration. I don’t understand how such behavior is “”minimizing free energy”.
anonymousskimmer says:

March 23, 2019 at 8:16 am

Isn’t it more explanatory to posit evolutionary drives and natural selection as the basis of the descriptive framework? But that’s my cognitive bias talking.

How do we decide what to pay attention to? Here, words like “prediction”, “expectation”, and “surprise” make perfect sense. Once this whole paradigm and vocabulary was discovered, scientists realized that it also explained movement, motivation, and desire.

Based on what you wrote, it doesn’t explain it, it describes it. The basis of life’s desire to live is still unknown. The basis of consciousness’ drive to fulfill desires is still unknown.

Models are great. One-off explanations are nice. Neither are ground-up unified theories.

——–
I didn’t take olanzapine long (it was prescribed to moderate a side effect of the SSRI I was on – a recipe for disaster), but I wonder if the hunger side-effect can stick around.
formid0 says:

March 23, 2019 at 12:15 pm

Scott’s blog introduced me to Friston and I found it immediately appealing because it seems to explain how I make decisions.

It could explain why I like to eat very similar meals all the time but why Jeff Bezos, operating under different conditions, likes to aggressively pursue new meal sensations like scrambled eggs and octopus.

I like the same meals because I always know what I’m going to get. This saves me time and money.

But you could understand food better by sampling a lot more dishes. If you’re smarter or richer, perhaps that becomes a more worthwhile domain for uncertainty reduction.

Presumably different genes also make different uncertainty domains more interesting to you. You probably more easily understand some domains than others and get more aggregate uncertainty reduction by spending more time on those domains.

In these comments I see a tide turning against free energy and Friston and it feels like a lot of the objections are pretty superficial and would be answered easily by studying what Friston has actually said.

Let’s take the Dark Room Problem. I thought about what my answer to the problem would be: This is only a problem if you think organisms can only try to minimize free energy over very short periods. But why would a person who was born, and learned a bunch about the world, ever think lying in a dark room would minimize lifetime free energy?

So google turns up:

“This means that a dark room will afford low levels of surprise if, and only if, the agent has been optimized by evolution (or neurodevelopment) to predict and inhabit it. Agents that predict rich stimulating environments will find the “dark room” surprising and will leave at the earliest opportunity.”

Kind of duh, right?

I certainly don’t know anything, I’m not a neuroscientist, but free energy also seems more general than PCT. Friston’s work seems like it potentially can (or does) explain why we have set points. So, it may be that PCT is handier most of the time. Maybe it’s like classical mechanics being handier than general relativity, even though Einstein’s work ultimately explains more. Just because SSC’s readers don’t happen to know what additional explanatory power free energy has over PCT doesn’t necessarily mean it’s not there.

The other thing I found very intriguing after visiting the link above is this sentence:

“Free energy, as here defined, bounds surprise, conceived as the difference between an organism’s predictions about its sensory inputs (embodied in its models of the world) and the sensations it actually encounters.”

For years I’ve been using something that rhymes with the foregoing as the definition of free will. People have a lot of opinions about free will so it’s not like anyone will want to be particularly charitable about entertaining some random stranger’s new definition, but this is mine:

Free will is the moment-by-moment reconciliation of how you make choices based on your predictions of events and how events actually unfold.

If you’ve followed the rationalist community for a while you may be familiar with Eliezer’s Probability is in the Mind:

https://www.lesswrong.com/posts/f6ZLxEWaankRZ2Crv/probability-is-in-the-mind

…which he derives from “the late great Bayesian Master E. T. Jaynes”.

If probability is in the mind, then so is determinism.

There can never be a computer in this universe that can forecast the moment-by-moment events of the universe, so the universe has free will, to the extent it has a will. Certainly parts of it have a will, like you and me.

As far as we know, there is no mind or computer that can forecast the moment-by-moment decisions of all the minds in the universe, so our minds mostly have free will.

There is no mind or computer that can forecast your moment-by-moment decisions. Only the grossest forecasts of your decisions can be made from your history. So you mostly have free will.

Your predictions and decisions are your own, until such time as any mind or computer can mimic them on a moment-by-moment basis.

You can object that gross predictions of key life events take away some of your free will and maybe they do. But those predictions at best are only a melody. Every different arrangement of a melody is beautiful and worth hearing.

I find it pretty likely that there will never be a computer that can forecast your moment-by-moment decisions, so you will likely always have free will.

Movies like Minority Report tap into our intuition of this sense of free will. The idea that a mind or computer COULD predict your precise and distant momentary behavior would mean that we no longer have free will, which would be viscerally terrifying.
- warrenmansell says:
  
  March 25, 2019 at 5:50 am
  
  With respect, it does t found like you’ve read Powers (1973) to make this comparison?
Nootropic cormorant says:

March 24, 2019 at 8:49 am

For me, an advantage of FE/PC is exactly that it doesn’t put any hard limits between cognition and sensations. So if I cognitively know that I would be cold in Antartica, I will not go there for this exact same reason, as I predict that I will not be cold.
In the same way, a letter concerning a medical diagnosis or death of a loved one can “trickle down” from a higher cognitive level of parsing meaning of written word to physical sensations of sickness and pain.

In PCT, you seem to have this control system which receives information from these set-pointed sensors that acts through an unexplained mechanism so as to bring them back to the set-point. It is not clear how does it know which actions produce which effects and also why does it care about some set-points and not others.

IIRC, FE/PC is not supposed to be a competitor to lower-level paradigms like PCT, but a formalism that encompasses their findings on a higher level of abstraction.
- warrenmansell says:
  
  March 25, 2019 at 5:52 am
  
  Hi, I’m not sure what source you used but PCT is not one control system. It is a cascading, branching hierarchy of multiple control systems and Powers (1973) hypothesis in some detail how neurones might fulfil the functions you are describing.
  - Nootropic cormorant says:
    
    March 25, 2019 at 10:29 am
    
    Thank you for your correction, I am just comparing the emphasis given by these two vocabularies as presented by Scott. In his description it is not clear that it is uncertainity reduction all the way down, which is a crucial point.
    - warrenmansell says:
      
      March 25, 2019 at 1:15 pm
      
      Hi, in PCT it is error reduction, and a chronic cause of error is conflict between control systems.
John David Ward says:

March 24, 2019 at 1:00 pm

I don’t know anything about perceptual control theory, but this reminds me a lot of how “culture” is defined in environmental anthropology: namely, as a set of common strategies for altering one’s environment to bring it within acceptable parameters. For example, if it’s cold, build a fire.
- warrenmansell says:
  
  March 25, 2019 at 5:54 am
  
  Another advantage of PCT, yes is that it has been used to model at all of these levels. See work by Ted Cloak for the social anthropology and Kent McClelland for the sociology.
TheRadicalModerate says:

March 24, 2019 at 2:06 pm

Neither of these makes much sense in an evolutionary context. There’s no way a mechanism that works to maintain minimal deviation from set points could evolve from whole cloth. And, while it’s easy to see how minimization of free energy, or at least minimization of energy consumption, has global survival value, you have to have a plausible process where that principle can move from thermodynamic good sense to an animating mechanism for cognition.

I like to frame the problem as a matter of regulating attention in an energy-constrained environment. Think of the following evolutionary innovations:

1) Organisms that mediate their environment through receptors on their surface. This is a global, indiscriminate attention mechanism: If some molecule binds to a receptor, the response will occur. If too many molecules bind to too many receptors simultaneously, the organism will become exhausted and die.

2) Organisms that mediate their environment through distributed nervous systems. This is still global attention, but the neural net can learn to allow the various sensory inputs to compete for an appropriate response. This is considerably better than just receptors, but the same energy problem applies: Too many stimuli at once and the learned competitive patterns become useless, the organism becomes exhausted, and dies.

3) Organisms that have a centralized, or hierarchical, nervous system. Now neural nets learn to filter stimuli along the journey to the “brain”, which minimizes the amount of energy expended in dealing with spurious stimuli. The brain has the ability to allocate attention to the highest priority stimuli. But the brain’s morphology has to support hard-coded procedures for dealing with various stimulus patterns that enhance its survival, or it has a very difficult learning problem, which may ultimately lead to the aforementioned exhaustion and death.

4) Organisms that have centralized attention systems that learn efficiently. Now we’re finally in full-up FE/PC or PCT territory, because the attention mechanism will be most efficient when the act of attending to a stimulus and responding to it results in the least surprising next set of stimuli. Surprise results in learning, which is energetically painful, so learning as efficiently as possible is important (to avoid the aforementioned exhaustion and death).

5) Finally, we have organisms that can attend to the internal learned states. This is an improvement over simple learning, because the organism can allocate attention to these states to allow them to learn when the conditions for learning are energetically favorable. It can walk chains of learned perceptual patterns and response behaviors and become mildly surprised when things don’t seem to add up, allowing planning. Planning results in many fewer situation where the organism can become exhausted and die.

This seems to be fully consistent with the basic “minimization of surprise” narrative, but also provides insight into how such a principle could become so central to perception and behavior.
- warrenmansell says:
  
  March 25, 2019 at 5:56 am
  
  A bit like this PCT account then?
  http://www.livingcontrolsystems.com/intro_papers/evolution_purpose.pdf
- vV_Vv says:
  
  March 25, 2019 at 10:07 am
  
  There’s no way a mechanism that works to maintain minimal deviation from set points could evolve from whole cloth.
  
  citation needed.
  - TheRadicalModerate says:
    
    March 25, 2019 at 2:41 pm
    
    Don’t have one, but I realized I didn’t put this very well. You can’t go from nothing to a set-point maintenance dynamic in one step, or even any series of steps that are simple refinements. It’s an emergent property, to the extent that it’s even a decent paradigm for describing behavior. I believe that it emerged from the need to manage attention.
    
    I don’t have a citation for that, either.
GammaSQ says:

March 25, 2019 at 6:08 am

I have only glanced at perceptual control theory, and hardly understood free energy minimisation though I made and effort to do so.
Intuitively, I’d have said the difference between the two is FE presenting a framework of how the PCT set-point evolves. As far as I understood, FE assigns a cost to:
– updating your model
– doing something to align reality with your model
– doing nothing (and accepting the reality-model difference)
A control system decides to do as much of these three options as will minimise total cost (i.e. “free energy”).

What I don’t get is e.g. hunger. Say hunger is an external sensory input to my brain. It would predict to either eat or feel more hunger. I assume I rarely get _really_ hungry in my life, so the more hungry I get, the less data points exist, the less certain my brains model get. Soon, aligning reality (i.e. eating) is cheaper than venturing into unknown territory. This would pretty much explain the metabolic set point drift. (A brain having experienced massive hunger often has much data there.)
However, I should be able to describe my body as FE minimising system as well. Why would it suddenly experience hunger? Why should low stomach content predict a full stomach? Is hunger just trained behavior? Did I misinterpret something?

Finally, I recently found a solution to the dark room paradox in nuclear physics. Ionising radiation is everywhere and a background always exists. So every measurement requires a background measurement. Usually, the time to do a measurement is fixed and needs to be divided between background and actual measurement. There is a simple theorem to calculate the optimal time-fraction to measure the background based on the expected signal-to-noise-ratio. Essentially, if the signal-to-noise-ratio is high, background measurement can be short, if it’s low, background measurement needs to be longer.

Apply that to the dark room: There is essentially no background, any measurement is expected to produce a perfect signal. Now is the perfect time to do some measurements and correct your model. Run around and see if you can break the everything-is-dark expectation, because any slight deviation would create a good signal.
- warrenmansell says:
  
  March 25, 2019 at 6:21 am
  
  An array of internal states experienced in consciousness, including emotional states, arise when the multiple reference points for different sensory experiences are in conflict with one another, according to PCT. So you feel hungry when there are other things you are controlling for that don’t allow you to eat what you need to. Clearly, thus is exaggerated when people are starting to diet or restrict their food intake for what they regard as important reasons (superordinate control systems often to do with keeping to self ideals or socially prescribed ideals).
Daniel Friedman says:

April 3, 2019 at 1:22 pm

In 2012, Karl Friston et al. wrote a paper:
“Free-Energy Minimization and the Dark-Room Problem”
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3347222/
So it is not true that FE is confused on this point. Nor it is true that “The main proposed solution is to claim you have some built-in predictions (of eg light, social interaction, activity levels), and the dark room will violate those.”. This has been Ex Friston Cathedra for 7 years now.

Here is one way to think about it. While you are in a dark room, you are indeed getting high precision on your sensory data (e.g. you expect it to be dark, and it is dark), this is rewarding. However, your deep priors about the world are becoming progressively less precise (e.g. is your car still safe? where will you next get food? is your family safe?). Thus this “paradox” is resolved in the FE perspective by seeing that our behavior always consists of hierarchically nested priors (for example, Reading consists of oculomotor priors about where the letters are, syntactic priors about grammar, semantic priors about word definitions, etc). The result is that dark rooms (or a sensory deprivation chamber) can be relaxing and rewarding for short periods of time in certain contexts, it is not a stable behavioral attractor state.

Blogroll

Economics

Effective Altruism

Rationality

Science

SSC Elsewhere

Archives

Translating Predictive Coding Into Perceptual Control

103 Responses to Translating Predictive Coding Into Perceptual Control

Meta