God Help Us, Let’s Try To Understand Friston On Free Energy

I’ve been trying to delve deeper into predictive processing theories of the brain, and I keep coming across Karl Friston’s work on “free energy”.

At first I felt bad for not understanding this. Then I realized I wasn’t alone. There’s an entire not-understanding-Karl-Friston internet fandom, complete with its own parody Twitter account and Markov blanket memes.

From the journal Neuropsychoanalysis (which based on its name I predict is a center of expertise in not understanding things):

At Columbia’s psychiatry department, I recently led a journal club for 15 PET and fMRI researchers, PhDs and MDs all, with well over $10 million in NIH grants between us, and we tried to understand Friston’s 2010 Nature Reviews Neuroscience paper – for an hour and a half. There was a lot of mathematical knowledge in the room: three statisticians, two physicists, a physical chemist, a nuclear physicist, and a large group of neuroimagers – but apparently we didn’t have what it took. I met with a Princeton physicist, a Stanford neurophysiologist, a Cold Spring Harbor neurobiologist to discuss the paper. Again blanks, one and all.

Normally this is the point at which I say “screw it” and give up. But almost all the most interesting neuroscience of the past decade involves this guy in one way or another. He’s the most-cited living neuroscientist, invented large parts of modern brain imaging, and received the prestigious Golden Brain Award (which is somehow a real thing). His Am I Autistic – An Intellectual Autobiography short essay, written in a weirdly lucid style and describing hijinks like deriving the Schrodinger equation for fun in school, is as consistent with genius as anything I’ve ever read.

As for free energy, it’s been dubbed “a unified brain theory” (Friston 2010), a key through which “nearly every aspect of [brain] anatomy and physiology starts to make sense” (Friston 2009), “[the source of] the ability of biological systems to resist a natural tendency to disorder” (Friston 2012), an explanation of how life “inevitably and emergently” arose from the primordial soup (Friston 2013), and “a real life version of Isaac Asimov’s psychohistory” (description here of Allen 2018).

I continue to hope some science journalist takes up the mantle of explaining this comprehensively. Until that happens, I’ve been working to gather as many perspectives as I can, to talk to the few neuroscientists who claim to even partially understand what’s going on, and to piece together a partial understanding. I am not at all the right person to do this, and this is not an attempt to get a gears-level understanding – just the kind of pop-science-journalism understanding that gives us a slight summary-level idea of what’s going on. My ulterior motive is to get to the point where I can understand Friston’s recent explanation of depression, relevant to my interests as a psychiatrist.

Sources include Dr. Alianna Maren’s How To Read Karl Friston (In The Original Greek), Wilson and Golonka’s Free Energy: How the F*ck Does That Work, Ecologically?, Alius Magazine’s interview with Friston, Observing Ideas, and (especially) the ominously named Wo’s Weblog.

From these I get the impression that part of the problem is that “free energy” is a complicated concept being used in a lot of different ways.

First, free energy is a specific mathematical term in certain Bayesian equations.

I’m getting this from here, which goes into much more detail about the math than I can manage. What I’ve managed to extract: Bayes’ theorem, as always, is the mathematical rule for determining how much to weigh evidence. The brain is sometimes called a Bayesian machine, because it has to create a coherent picture of the world by weighing all the different data it gets – everything from millions of photoreceptors’ worth of vision, to millions of cochlear receptors’ worth of hearing, to all the other senses, to logical reasoning, to past experience, and so on. But actually using Bayes on all this data quickly gets computationally intractable.

Free energy is a quantity used in “variational Bayesian methods”, a specific computationally tractable way of approximating Bayes’ Theorem. Under this interpretation, Friston is claiming that the brain uses this Bayes-approximation algorithm. Minimizing the free energy quantity in this algorithm is equivalent-ish to trying to minimize prediction error, trying to minimize the amount you’re surprised by the world around you, and trying to maximize accuracy of mental models. This sounds in line with standard predictive processing theories. Under this interpretation, the brain implements predictive processing through free energy minimization.
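If it helps to see the actual bookkeeping, here is a deliberately tiny sketch in Python (my own toy model and numbers, nothing taken from Friston’s papers). For a two-state world, the variational free energy of a candidate belief q is F(q) = E_q[log q(s) − log p(s, o)], which works out to KL(q ‖ posterior) minus the log-evidence; so the q that minimizes F is exactly the Bayesian posterior, and the minimum value is the “surprise” −log p(o).

```python
# Toy sketch (mine, not Friston's): variational free energy in a two-state world.
# F(q) = sum_s q(s) * (log q(s) - log p(s, o))
#      = KL(q || p(s|o)) - log p(o),
# so minimizing F over q approximates the posterior and bounds the surprise.
import numpy as np

prior = np.array([0.7, 0.3])             # p(s): prior over two hidden states
likelihood = np.array([[0.9, 0.2],       # p(o|s): rows = observations, cols = states
                       [0.1, 0.8]])

def free_energy(q, o):
    """Variational free energy of belief q given observation index o."""
    joint = prior * likelihood[o]        # p(s, o) for each hidden state s
    return np.sum(q * (np.log(q) - np.log(joint)))

o = 1                                    # suppose we observe outcome 1
posterior = prior * likelihood[o]
posterior /= posterior.sum()             # exact Bayes, feasible only in tiny models

# Scan candidate beliefs q = [x, 1 - x]; free energy bottoms out at the posterior.
xs = np.linspace(0.01, 0.99, 99)
best = min(xs, key=lambda x: free_energy(np.array([x, 1 - x]), o))
print("exact posterior:              ", posterior.round(2))
print("belief minimizing free energy:", [round(best, 2), round(1 - best, 2)])
print("surprise -log p(o):           ", round(-np.log((prior * likelihood[o]).sum()), 2))
```

In a model this small you can just do exact Bayes, which is the point: the variational machinery only earns its keep when the exact computation is intractable, and the claim is that the brain is in that situation essentially all the time.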

Second, free energy minimization is an algorithm-agnostic way of saying you’re trying to approximate Bayes as accurately as possible.

This comes from the same source as above. It also ends up equivalent-ish to all those other things like trying to be correct in your understanding of the world, and to standard predictive processing.

Third, free energy minimization is a claim that the fundamental psychological drive is the reduction of uncertainty.

I get this claim from the Alius interview, where Friston says:

If you subscribe to the premise that that creatures like you and me act to minimize their expected free energy, then we act to reduce expected surprise or, more simply, resolve uncertainty. So what’s the first thing that we would do on entering a dark room — we would turn on the lights. Why? Because this action has epistemic affordance; in other words, it resolves uncertainty (expected free energy). This simple argument generalizes to our inferences about (hidden or latent) states of the world — and the contingencies that underwrite those states of affairs.

The discovery that the only human motive is uncertainty-reduction might come as a surprise to humans who feel motivated by things like money, power, sex, friendship, or altruism. But the neuroscientist I talked to about this says I am not misinterpreting the interview. The claim really is that uncertainty-reduction is the only game in town.

In a sense, it must be true that there is only one human motivation. After all, if you’re Paris of Troy, getting offered the choice between power, fame, and sex – then some mental module must convert these to a common currency so it can decide which is most attractive. If that currency is, I dunno, dopamine in the striatum, then in some reductive sense, the only human motivation is increasing striatal dopamine (don’t philosophize at me, I know this is a stupid way of framing things, but you know what I mean). Then the only weird thing about the free energy formulation is identifying the common currency with uncertainty-minimization, which is some specific thing that already has another meaning.

I think the claim (briefly mentioned eg here) is that your brain hacks eg the hunger drive by “predicting” that your mouth is full of delicious food. Then, when your mouth is not full of delicious food, it’s a “prediction error”, it sets off all sorts of alarm bells, and your brain’s predictive machinery is confused and uncertain. The only way to “resolve” this “uncertainty” is to bring reality into line with the prediction and actually fill your mouth with delicious food. On the one hand, there is a lot of basic neuroscience research that suggests something like this is going on. On the other, the author of Wo’s Weblog writes about this further:

The basic idea seems to go roughly as follows. Suppose my internal probability function Q assigns high probability to states in which I’m having a slice of pizza, while my sensory input suggests that I’m currently not having a slice of pizza. There are two ways of bringing Q in alignment with my sensory input: (a) I could change Q so that it no longer assigns high probability to pizza states, (b) I could grab a piece of pizza, thereby changing my sensory input so that it conforms to the pizza predictions of Q. Both (a) and (b) would lead to a state in which my (new) probability function Q’ assigns high probability to my (new) sensory input d’. Compared to the present state, the sensory input will then have lower surprise. So any transition to these states can be seen as a reduction of free energy, in the unambitious sense of the term.

Action is thus explained as an attempt to bring one’s sensory input in alignment with one’s representation of the world.

This is clearly nuts. When I decide to reach out for the pizza, I don’t assign high probability to states in which I’m already eating the slice. It is precisely my knowledge that I’m not eating the slice, together with my desire to eat the slice, that explains my reaching out.

There are at least two fundamental problems with the simple picture just outlined. One is that it makes little sense without postulating an independent source of goals or desires. Suppose it’s true that I reach out for the pizza because I hallucinate (as it were) that that’s what I’m doing, and I try to turn this hallucination into reality. Where does the hallucination come from? Surely it’s not just a technical glitch in my perceptual system. Otherwise it would be a miraculous coincidence that I mostly hallucinate pleasant and fitness-increasing states. Some further part of my cognitive architecture must trigger the hallucinations that cause me to act. (If there’s no such source, the much discussed “dark room problem” arises: why don’t we efficiently minimize sensory surprise (and thereby free energy) by sitting still in a dark room until we die?)

The second problem is that efficient action requires keeping track of both the actual state and the goal state. If I want to reach out for the pizza, I’d better know where my arms are, where the pizza is, what’s in between the two, and so on. If my internal representation of the world falsely says that the pizza is already in my mouth, it’s hard to explain how I manage to grab it from the plate.

A closer look at Friston’s papers suggests that the above rough proposal isn’t quite what he has in mind. Recall that minimizing free energy can be seen as an approximate method for bringing one probability function Q close to another function P. If we think of Q as representing the system’s beliefs about the present state, and P as a representation of its goals, then we have the required two components for explaining action. What’s unusual is only that the goals are represented by a probability function, rather than (say) a utility function. How would that work?

Here’s an idea. Given the present probability function Q, we can map any goal state A to the target function Q^A, which is Q conditionalized on A — or perhaps on certain sensory states that would go along with A. For example, if I successfully reach out for the pizza, my belief function Q will change to a function Q^A that assigns high probability to my arm being outstretched, to seeing and feeling the pizza in my fingers, etc. Choosing an act that minimizes the difference between my belief function and Q^A is then tantamount to choosing an act that realizes my goal.

This might lead to an interesting empirical model of how actions are generated. Of course we’d need to know more about how the target function Q^A is determined. I said it comes about by (approximately?) conditionalizing Q on the goal state A, but how do we identify the relevant A? Why do I want to reach out for the pizza? Arguably the explanation is that reaching out is likely (according to Q) to lead to a more distal state in which I eat the pizza, which I desire. So to compute the proximal target probability Q^A we presumably need to encode the system’s more distal goals and then use techniques from (stochastic) control theory, perhaps, to derive more immediate goals.

That version of the story looks much more plausible, and much less revolutionary, than the story outlined above. In the present version, perception and action are not two means to the same end — minimizing free energy. The free energy that’s minimized in perception is a completely different quantity than the free energy that’s minimized in action. What’s true is that both tasks involve mathematically similar optimization problems. But that isn’t too surprising given the well-known mathematical and computational parallels between conditionalizing and maximizing expected utility.
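To make the quoted picture concrete, here is a toy sketch (mine, with made-up numbers – not Wo’s and not Friston’s) of the two routes the passage describes. The same “surprise” score falls whether I revise the belief or change the world, which is exactly the ambiguity being complained about:

```python
# Toy sketch of the two ways to reduce surprise described in the quote above.
# Belief Q is a distribution over {"eating pizza", "not eating pizza"};
# "surprise" is -log Q(current sensory state).
import math

Q = {"eating pizza": 0.9, "not eating pizza": 0.1}   # the optimistic "prediction"
sensed = "not eating pizza"

def surprise(belief, observation):
    return -math.log(belief[observation])

print("initial surprise:", round(surprise(Q, sensed), 2))

# Route (a): perceptual inference - update the belief to match the input.
Q_updated = {"eating pizza": 0.1, "not eating pizza": 0.9}
print("after updating Q:", round(surprise(Q_updated, sensed), 2))

# Route (b): action - change the world so the input matches the belief.
sensed_after_acting = "eating pizza"                 # grabbed the slice
print("after acting:    ", round(surprise(Q, sensed_after_acting), 2))

# Both routes end with low surprise; only (b) looks like desire-driven action,
# and only if something else set the optimistic Q in the first place.
```

Nothing in the arithmetic itself says which route to take, which is why Wo thinks an independent source of goals (the thing that sets Q, or the target Q^A) has to be smuggled in somewhere.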

It’s tempting to throw this out entirely. But part of me does feel like there’s a weird connection between curiosity and every other drive. For example, sex seems like it should be pretty basic and curiosity-resistant. But how often do people say that they’re attracted to someone “because he’s mysterious”? And what about the Coolidge Effect (known in the polyamory community as “new relationship energy”)? After a while with the same partner, sex and romance lose their magic – only to reappear if the animal/person hooks up with a new partner. Doesn’t this point to some kind of connection between sexuality and curiosity?

What about the typical complaint of porn addicts – that they start off watching softcorn porn, find after a while that it’s no longer titillating, move on to harder porn, and eventually have to get into really perverted stuff just to feel anything at all? Is this a sort of uncertainty reduction?

The only problem is that this is a really specific kind of uncertainty reduction. Why should “uncertainty about what it would be like to be in a relationship with that particular attractive person” be so much more compelling than “uncertainty about what the middle letter of the Bible is”, a question which almost no one feels the slightest inclination to resolve? The interviewers ask Friston something sort of similar, referring to some experiments where people are happiest not when given easy things with no uncertainty, nor confusing things with unresolvable uncertainty, but puzzles – things that seem confusing at first, but actually have a lot of hidden order within them. They ask Friston whether he might want to switch teams to support a U-shaped theory where people like being in the middle, between too little uncertainty and too much uncertainty. Friston…does not want to switch teams.

I do not think that “different laws may apply at different levels”. I see a singular and simple explanation for all the apparent dialectics above: they are all explained by minimization of expected free energy, expected surprise or uncertainty. I feel slightly puritanical when deflating some of the (magical) thinking about inverted U curves and “sweet spots”. However, things are just simpler than that: there is only one sweet spot; namely, the free energy minimum at the bottom of a U-shaped free energy function […]

This means that any opportunity to resolve uncertainty itself now becomes attractive (literally, in the mathematical sense of a random dynamical attractor) (Friston, 2013). In short, as nicely articulated by (Schmidhuber, 2010), the opportunity to answer “what would happen if I did that” is one of the most important resolvers of uncertainty. Formally, the resolution of uncertainty (aka intrinsic motivation, intrinsic value, epistemic value, the value of information, Bayesian surprise, etc. (Friston et al., 2017)) corresponds to salience. Note that in active inference, salience becomes an attribute of an action or policy in relation to the lived world. The mathematical homologue for contingencies (technically, the parameters of a generative model) corresponds to novelty. In other words, if there is an action that can reduce uncertainty about the consequences of a particular behavior, it is more likely to be expressed.
Given these imperatives, then the two ends of the inverted U become two extrema on different dimensions. In a world full of novelty and opportunity, we know immediately there is an opportunity to resolve reducible uncertainty and will immediately embark on joyful exploration — joyful because it reduces uncertainty or expected free energy (Joffily & Coricelli, 2013). Conversely, in a completely unpredictable world (i.e., a world with no precise sensory evidence, such as a dark room) there is no opportunity and all uncertainty is irreducible — a joyless world. Boredom is simply the product of explorative behavior; emptying a world of its epistemic value — a barren world in which all epistemic affordance has been exhausted through information seeking, free energy minimizing action.

Note that I slipped in the word “joyful” above. This brings something interesting to the table; namely, the affective valence of shifts in uncertainty — and how they are evaluated by our brains.

The only thing at all I am able to gather from this paragraph – besides the fact that apparently Karl Friston cites himself in conversation – is the Schmidhuber reference, which is actually really helpful. Schmidhuber is the guy behind eg the Formal Theory Of Fun & Creativity Explains Science, Art, Music, Humor, in which all of these are some form of taking a seemingly complex domain (in the mathematical sense of complexity) and reducing it to something simple (discovering a hidden order that makes it more compressible). I think Friston might be trying to hint that free energy minimization works in a Schmidhuberian sense where it applies to learning things that suddenly make large parts of our experience more comprehensible at once, rather than just “Here are some numbers: 1, 5, 7, 21 – now you have less uncertainty over what numbers I was about to tell you, isn’t that great?”
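To see what “compression progress” means in the most literal possible way, here is a toy sketch (my own illustration, not Schmidhuber’s or Friston’s code), with zlib standing in very crudely for the learner’s model of the data:

```python
# Toy sketch of compression progress: data with discoverable order shrinks a lot
# once the regularity is found; pure noise barely shrinks at all.
import zlib, random

random.seed(0)
patterned = bytes(range(10)) * 100                 # hidden order: 0..9 repeating
noise = bytes(random.randrange(256) for _ in range(1000))

def compressed_size(data):
    return len(zlib.compress(data, level=9))

for name, data in [("patterned", patterned), ("noise", noise)]:
    print(name, len(data), "bytes ->", compressed_size(data), "compressed")
# The patterned stream offers lots of potential "compression progress" (fun, in
# Schmidhuber's sense); the noise offers none - its uncertainty is irreducible.
```

On this reading, the Schmidhuberian move is that what feels rewarding is not low uncertainty per se but the act of making a big chunk of experience newly compressible, which is at least closer to how curiosity actually feels.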

I agree this is one of life’s great joys, though maybe me and Karl Friston are not a 100% typical subset of humanity here. Also, I have trouble figuring out how to conceptualize other human drives like sex as this same kind of complexity-reduction joy.

One more concern here – a lot of the things I read about this equivocate between “model accuracy maximization” and “surprise minimization”. These end up in really different places. Model accuracy maximization sounds like curiosity – you go out and explore as much of the world as possible to get a model that precisely matches reality. Surprise minimization sounds like locking yourself in a dark room with no stimuli, then predicting that you will be in a dark room with no stimuli, and never being surprised when your prediction turns out to be right. I understand Friston has written about the so-called “dark room problem”, but I haven’t had a chance to look into it as much as I should, and I can’t find anything that takes one or the other horn of the equivocation and says “definitely this one”.

Fourth, okay, all of this is pretty neat, but how does it explain all biological systems? How does it explain the origin of life from the primordial soup? And when do we get to the real-world version of psychohistory? In his Alius interview, Friston writes:

I first came up with a prototypical free energy principle when I was eight years old, in what I have previously called a “Gerald Durrell” moment (Friston, 2012). I was in the garden, during a gloriously hot 1960s British summer, preoccupied with the antics of some woodlice who were frantically scurrying around trying to find some shade. After half an hour of observation and innocent (childlike) contemplation, I realized their “scurrying” had no purpose or intent: they were simply moving faster in the sun — and slower in the shade. The simplicity of this explanation — for what one could artfully call biotic self-organization — appealed to me then and appeals to me now. It is exactly the same principle that underwrites the ensemble density dynamics of the free energy principle — and all its corollaries.

What do the woodlice have to do with any of the rest of this?

As best I can understand (and I’m drawing from here and here again), this is an ultimate meaning of “free energy” which is sort of like a formalization of homeostasis. It goes like this: consider a probability distribution of all the states an organism can be in. For example, your body can be at (90 degrees F, heart rate 10), (90 degrees F, heart rate 70), (98 degrees F, heart rate 10), (98 degrees F, heart rate 70), or any of a trillion other different combinations of possible parameters. But in fact, living systems successfully restrict themselves to tiny fractions of this space – if you go too far away from (98 degrees F, heart rate 70), you die. So you have two probability distributions – the maximum-entropy one where you could have any combination of heart rate and body temperature, and the one your body is aiming for with a life-compatible combination of heart rate and body temperature. Whenever you have a system trying to convert one probability distribution into another probability distribution, you can think of it as doing Bayesian work and following free energy principles. So free energy seems to be something like just a formal explanation of how certain systems display goal-directed behavior, without having to bring in an anthropomorphic or teleological concept of “goal-directedness”.
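Here is what that looks like with actual numbers – a toy sketch with made-up figures (mine, not Friston’s), using body temperature alone as the state space:

```python
# Toy sketch: homeostasis as occupying a low-entropy region of state space.
# Compare a maximum-entropy spread over body temperatures with the tight
# distribution a living body actually maintains.
import numpy as np

temps = np.arange(80, 111)                       # candidate body temps, degrees F

uniform = np.full(temps.shape, 1 / len(temps))   # "anything goes" distribution
alive = np.exp(-0.5 * ((temps - 98.6) / 1.0) ** 2)
alive /= alive.sum()                             # sharply peaked around 98.6 F

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print("entropy if any temperature were equally likely:", round(entropy(uniform), 2))
print("entropy of the states a living body occupies:  ", round(entropy(alive), 2))

# "Surprise" (-log probability) of finding yourself at 105 F under the second
# distribution - the kind of state a free-energy-minimizing system acts to avoid:
i = int(np.where(temps == 105)[0][0])
print("surprise of running a 105 F fever:", round(-np.log(alive[i]), 1))
```

The free energy story is then that anything which keeps itself inside the second, low-entropy distribution rather than drifting toward the first can be described as if it were minimizing surprise about its own states, whether or not it has anything we would want to call a model.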

Friston mentions many times that free energy is “almost tautological”, and one of the neuroscientists I talked to who claimed to half-understand it said it should be viewed more as an elegant way of looking at things than as a scientific theory per se. From the Alius interview:

The free energy principle stands in stark distinction to things like predictive coding and the Bayesian brain hypothesis. This is because the free energy principle is what it is — a principle. Like Hamilton’s Principle of Stationary Action, it cannot be falsified. It cannot be disproven. In fact, there’s not much you can do with it, unless you ask whether measurable systems conform to the principle.

So we haven’t got a real-life version of Asimov’s psychohistory, is what you’re saying?

But also:

The Bayesian brain hypothesis is a corollary of the free energy principle and is realized through processes like predictive coding or abductive inference under prior beliefs. However, the Bayesian brain is not the free energy principle, because both the Bayesian brain hypothesis and predictive coding are incomplete theories of how we infer states of affairs.

This missing bit is the enactive compass of the free energy principle. In other words, the free energy principle is not just about making the best (Bayesian) sense of sensory impressions of what’s “out there”. It tries to understand how we sample the world and author our own sensations. Again, we come back to the woodlice and their scurrying — and an attempt to understand the imperatives behind this apparently purposeful sampling of the world. It is this enactive, embodied, extended, embedded, and encultured aspect that is lacking from the Bayesian brain and predictive coding theories; precisely because they do not consider entropy reduction […]

In short, the free energy principle fully endorses the Bayesian brain hypothesis — but that’s not the story. The only way you can change “the shape of things” — i.e., bound entropy production — is to act on the world. This is what distinguishes the free energy principle from predictive processing. In fact, we have now taken to referring to the free energy principle as “active inference”, which seems closer to the mark and slightly less pretentious for non-mathematicians.

So maybe the free energy principle is the unification of predictive coding of internal models, with the “action in the world is just another form of prediction” thesis mentioned above? I guess I thought that was part of the standard predictive coding story, but maybe I’m wrong?

Overall, the best I can do here is this: the free energy principle seems like an attempt to unify perception, cognition, homeostasis, and action.

“Free energy” is a mathematical concept that represents the failure of some things to match other things they’re supposed to be predicting.

The brain tries to minimize its free energy with respect to the world, ie minimize the difference between its models and reality. Sometimes it does that by updating its models of the world. Other times it does that by changing the world to better match its models.

Perception and cognition are both attempts to create accurate models that match the world, thus minimizing free energy.

Homeostasis and action are both attempts to make reality match mental models. Action tries to get the organism’s external state to match a mental model. Homeostasis tries to get the organism’s internal state to match a mental model. Since even bacteria are doing something homeostasis-like, all life shares the principle of being free energy minimizers.

So life isn’t doing four things – perceiving, thinking, acting, and maintaining homeostasis. It’s really just doing one thing – minimizing free energy – in four different ways – with the particular way it implements this in any given situation depending on which free energy minimization opportunities are most convenient. Or something.

This might be useful in some way? Or it might just be a cool philosophical way of looking at the world? Or maybe something in between? Or maybe a meaningless way of looking at the world? Or something? Somebody please help?


Discussion question for machine ethics researchers – if the free energy principle were right, would it disprove the orthogonality thesis? Might it be impossible to design a working brain with any goal besides free energy reduction? Would anything – even a paperclip maximizer – have to start by minimizing uncertainty, and then add paperclip maximization in later as a hack? Would it change anything if it did?


246 Responses to God Help Us, Let’s Try To Understand Friston On Free Energy

  1. “uncertainty about what the middle letter of the Bible is”, a question which almost no one feels the slightest inclination to resolve?

    Well, until now, anyway. Thanks Scott. 😉

    In a world full of novelty and opportunity, we know immediately there is an opportunity to resolve reducible uncertainty and will immediately embark on joyful exploration — joyful because it reduces uncertainty or expected free energy (Joffily & Coricelli, 2013). Conversely, in a completely unpredictable world (i.e., a world with no precise sensory evidence, such as a dark room) there is no opportunity and all uncertainty is irreducible — a joyless world. Boredom is simply the product of explorative behavior; emptying a world of its epistemic value — a barren world in which all epistemic affordance has been exhausted through information seeking, free energy minimizing action.

    So… the answer to the dark room problem is that the brain doesn’t desire the state of minimized uncertainty, but desires the process of minimizing uncertainty, so it seeks out uncertainty to minimize, and that is what curiosity is? Am I understanding this correctly?

    • finnydo says:

      The dark room doesn’t really decrease uncertainty, though. It increases it. A world still exists outside of the dark room, in which now-unknown things are going on. It’s just a matter of time before some scary events break down your door, and you have no way of predicting or resolving that occurrence.

      By acting on the world, you can both observe and shape events to minimize the possibility of unforeseen events breaking down your door. By sitting quietly in the dark, you can do neither.

      • Toby Bartels says:

        Right, this is like the elementary thermodynamics error of thinking that a uniform gas should have low entropy because there's almost nothing to say about it. In fact, there's an incredible amount of unspecified detail about where precisely each molecule is. Here, the unspecified detail is what's going on where you're not looking.

      • papermite says:

        That raises the question, then: if you were raised from birth inside a dark room, thinking that it’s the entirety of your existence, would you still try to seek out uncertainty?

        • Doesntliketocomment says:

          This sounds like a metaphor for the human condition.

        • Toby Bartels says:

          No, and what's more, if anybody did find out about the rest of the world and came to tell you about it, then you'd kill them.

          At least that's what Plato said.

        • Inty says:

          Another commenter mentioned Plato, but I think this quote from Douglas Adams is apt:

          ‘Upon first witnessing the glory and splendor of the Universe, they casually, whimsically, decided to destroy it, remarking, “It’ll have to go”.’

      • herculesorion says:

        Dark Room is a hypothetical, and in a hypothetical you’re allowed to assume that the world outside the room doesn’t exist (it’s like physics world where you can use massless ropes and frictionless pulleys to move spherical cows).

        • Yosarian2 says:

          I mean, I know a fair number of people who enjoy (and spend a lot of money) regularly going to a “float”, which is a sensory deprivation capsule or room where you float in a dark silent room in a room-temperature liquid for an hour with absolutely no sensory stimulus at all. They find it extremely relaxing.

    • polymachairoplacida says:

      The Gutenberg Project’s Version of the King James Bible has an even number of letters (3224232, excluding all whitespace, numbers and punctuation), the middle of which are “er”.

      I somehow find that amusing.

      If you now feel spoilered, there’s plenty more versions to count through 😉

      • mudita says:

        I used the Gutenberg Project’s Version of the King James Bible as well – before I saw your comment – and arrived at the same answer 🙂

      • What would be cool is if, when you put together the middle letter(s) from all the different versions of the Bible, arranged chronologically, you got a secret message from God.

      • Jaskologist says:

        If you now feel spoilered, there’s plenty more versions to count through

        Doesn’t Free Energy predict that we would never object to being spoilered?

    • Doesntliketocomment says:

      I think so. In nature, it would be impossible to achieve a state of complete environmental control (thus predictability) so the brain isn’t wired to consider it an option. Thus we will always want to escape the locked room (why does it need to be dark?) to make sure that nothing is threatening our stability from outside. One could look at windows as an extension of this problem, we need to see outside even when we’re inside “just in case”.

      • One could look at windows as an extension of this problem, we need to see outside even when we’re inside “just in case”.

        Incidentally, I think this is also why most people wouldn’t be comfortable living in a simulation. We’d feel less sure about our safety and even less in control of it than usual.

        • anonymousskimmer says:

          Yep. We’d also feel even more constrained than we are by the reality that we see. (“Constrained” brings to me a feeling of being in a straitjacket, “controlled” brings to me a feeling of being handcuffed and ordered about.)

        • Protagoras says:

          In the short term, anyway. I’m sure in the long term people would start to ignore the ways in which they’re not in control in the simulation and delude themselves that they’re safer and more in control than they are just as they do in the real world. New threats always get disproportionate attention compared to old, familiar ones.

        • LibertyRisk says:

          I probably would have agreed with you before I got a Vive (a VR headset) about a year ago. Now my intuition has changed. I have excellent noise cancelling headphones and in well made virtual scenes the immersion is very impressive. I experienced a bit of anxiety at first about what was going on “outside” but it only takes a few minutes for me to almost completely forget about it. Sometimes I will step on my cat and that jolts me out of the immersion a bit and gives me an uneasy feeling, but again it takes only a minute or two to completely forget about it and feel immersed again.

          I’m not sure if my experience is typical.

    • markpneyer says:

      So… the answer to the dark room problem is that the brain doesn’t desire the state of minimized uncertainty, but desires the process of minimizing uncertainty, so it seeks out uncertainty to minimize, and that is what curiosity is? Am I understanding this correctly?

      This paper had a great explanation. It seems plausible.

      Agents that predict rich stimulating environments will find the “dark room” surprising and will leave at the earliest opportunity. This would be a bit like arriving at the football match and finding the ground empty. Although the ambient sensory signals will have low entropy in the absence of any expectations (model), you will be surprised until you find a rational explanation or a new model (like turning up a day early)

      The argument is that we are not just modeling our environments themselves, but our own reactions to our environments. Each species evolves as a model of a niche. You don’t walk into a dark room with no preconceived notions about reality; you walk into the dark room expecting to eat at some point in the future, and to use the restroom, and to spend time with other humans. That’s our niche. In that sense, we actually find the dark room somewhat surprising.

      • Surprise is relative to experience and expectation. Makes sense. It’s a mistake to have a thought experiment model start with blank slate assumptions for minds, and then try to give them traits/behaviors/goals, without any content, be that knowledge or instinct.

      • nameless1 says:

        What confuses me about the whole model is how volition and expectation are kind of equated? Just because I want something, why would I predict it, and vice versa? Just because I like rich stimulating environments, I can predict dark rooms if I or my ancestors saw a lot of dark rooms? I may like to eat, but why would I expect to eat? How are they related? We can expect pain because both we and our ancestors had experience of pain, but we don’t like it?

  2. jimmy says:

    This is clearly nuts. When I decide to reach out for the pizza, I don’t assign high probability to states in which I’m already eating the slice. It is precisely my knowledge that I’m not eating the slice, together with my desire to eat the slice, that explains my reaching out.

    This is a pretty straightforward misunderstanding. It’s not “I am already eating the slice”, it’s “I am *about* to eat the slice”. The predictions are always at least a time step in the future.

    That’s why the cliche hypnotist thing is “you are getting sleepy”, not “you are already sleepy”. The latter is either already true (and therefore useless for leading) or already false (and credibility blowing). When you put the predictions in the immediate future it gives them a chance to predict whether you’re going to be wrong or not, and if not, they now expect that they are about to eat the pizza or whatever, and have a chance to minimize their surprise by actually doing it.

    • Edge of Gravity says:

      the cliche hypnotist thing is “you are getting sleepy”, not “you are already sleepy”

      As an amateur hypnotist myself I actually do something close to the latter. Once you are through to the subject’s subconscious and your words become their truths, at least for a time, I say “Notice how heavy your arm is, you are unable to lift it now, go on, try it, you cannot!” This minimizes the subconscious surprise (“Of course I can’t, because I am told that I can’t!”) while maximizing the conscious surprise (“What do you mean, I can’t, it’s my arm, I control it!”), and, when done at the right time, this discrepancy deepens the trance significantly and lets you rein in the remains of the subject’s conscious control.

      My guess is that you are doing something similar yourself, but I cba to look through your blog right now.

      • jimmy says:

        Heh, this one gets subtle.

        Yes, “Notice how heavy your arm is” will work fine on someone who is already good and hypnotized. However, it is not a good approach to take with someone who isn’t there yet.

        If you’re super absorbed in what the hypnotist is saying, you aren’t tracking things like “how heavy is my arm”, and so when they tell you to notice that it is heavy, you have to create an expectation of how it is going to feel, and they paint that picture for you. Still one time step in the future.

        If you’re not there yet, then you generally have a good handle on how heavy your arm feels from one time step in the past. In those cases, it’s a completely different matter of testing the hypnotist’s statement against your already experienced state of your arm, and finding it to be false.

    • vV_Vv says:

      This is a pretty straightforward misunderstanding. It’s not “I am already eating the slice”, it’s “I am *about* to eat the slice”. The predictions are always at least a time step in the future.

      Still doesn’t work. What if the pizza is still in the fridge? Or at the supermarket?

    • entobat says:

      How does this ever induce action? Is your belief some kind of statement like P(t) = “in t seconds, I will be eating pizza” and your brain says “shit, better grab the pizza before the clock runs out” once t = 1?

      • jimmy says:

        Say I ask if you want to bet on which box the green ball is going to be under in 10 seconds, and then hand you the ball. Which box do you honestly think it will be under? Whichever one you choose, right?

        It goes like this: you get hungry, and think “I should eat. Eating would be good”. Since you also believe yourself to be someone who does things when you’ve decided you should/you can/it would obviously be good for you, this leads to the expectation “I’m gonna do that”.

        Since the pizza is still at the store, you can’t really expect to eat it in one moment, so obviously the expectation of eating pizza involves getting off the couch, driving to the store, etc, so you get off the couch, drive to the store, and all that. You think “When should I do that?” and go through the same process leading to “I should do that right now. Okay, I’m getting up”.

        “Okay, I’m getting up” predicts actions beginning in the immediate future, and at that point, if your muscles don’t move, there’s a real problem and you’re learning that you must be paralyzed or something. There’s often work in translating “It would be nice to eat pizza before I starve” into actual next steps, but the thing “one time step” into the future is the interesting place to look.

        Does that help answer your question, or am I explaining the wrong part?

        • nameless1 says:

          But expectations are entirely redundant and could be eliminated from this explanation. Just deciding to get up is enough to get up, don’t need to expect I will be getting up. Every explanation of Free Energy I heard so far looks like Occam’s perfectly unnecessary object as the thing in the example can be explained without it. We are evolved to predict rich environments, not dark rooms? Yes, but we also evolved to like them more. All this stuff can be reduced to “we do what we like” without involving expectations and predictions. Do you have an example where doing stuff we like is not a sufficient explanation of behavior and expectations and predictions need to be added?

    • Toby Bartels says:

      It’s not “I am already eating the slice”, it’s “I am *about* to eat the slice”. The predictions are always at least a time step in the future.

      Yes. And this fits with the phenomenon of (sometimes) imagining yourself eating the pizza and otherwise anticipating doing what you want to do; the predictions are leaking into your conscious mind. I also think of Temple Grandin's description of her decision-making process (going from memory here, so high epistemic uncertainty): she is presented with a collection of images of possible immediate futures, from which she makes a selection.

      • Speaker To Animals says:

        This reminds me of the autistic character in Iain Banks’s The Quarry who imagines himself selecting responses from a drop down menu, like The Terminator choosing ‘Fuck you, asshole.’

  3. Bugmaster says:

    This all sounds super interesting, but does the concept of “free energy” allow us to make better predictions about the world ? Does it have any explanatory power ? Even if mere mortals are not smart enough to understand the concept, can Friston himself apply it to anything quantifiable ? If not, then what’s the point ?

    • Scott Alexander says:

      That was my question too.

      You might be interested in checking the Alius interview and CTRL+F-ing “I would assert that the notion that a ‘framework’ can have the attribute ‘falsifiable’ is a category error.”

      • Bugmaster says:

        Ok, in this case, I hereby propose a framework that human brains are actually operated by an intricate society of invisible gremlins (see Bugmaster et al, 2018); naturally, the gremlins themselves are only quasi-physical, existing as a mixture of mathematical constructs and quantum energy states. Is my framework better, or worse, than Friston’s ? Remember, you can’t use evidence and facts and such to justify your answer, since the principle of falsification does not apply to frameworks.

        To be fair, in the preceding paragraph Friston does imply that he has lots of experimental data in support of his framework; sadly, the margins of this interview paper are too narrow to contain it.

        • Alkatyn says:

          We compare frameworks by their usefulness in allowing us to make good theories and good predictions. E.g. the different interpretations of quantum mechanics don’t result in different predictions directly, but allow you to generate different theories that may themselves be falsifiable.

      • poignardazur says:

        I don’t want to approach this in bad faith, but this seems like bad epistemology / bad communication to me. Woodlice’s reasoning seems really close to “You can’t get out of the car by driving, you just have to get out of the car”, eg, not super helpful.

        Even if free energy can’t be “proven” by people in labcoats giving placebo free energy to half the studied group, it must still have *some* sort of practical application / manifestation?

        I’m not saying that the guy is making things up, mind you. Just that he seems kind of bad at explaining.

        • nameless1 says:

          This is not an uncommon thing. Chomsky said you cannot falsify generative grammar because it is a paradigm. Ultimately in both cases, if the frameworks, paradigms, lead to making good, falsifiable theories down the road, this can be OK.

          I think it is a classic mistake to blow the falsifiability criterion of science out of proportion. Falsifiability is like quality control in industry: when everything is done and you have a consumer product, basically, you check it against criteria. But just like industry has a long process involving raw materials, semi-finished goods, subassemblies, which are checked against different criteria, you cannot expect science either to always immediately deliver the final result, the falsifiable theory. There is work in progress. Frameworks, paradigms can be work in progress.

          You need a different quality control criterion, like “could it conceivably lead to falsifiable theories at all?” Well, I remember reading somewhere that Friston predicted that if an SSRI does not really work on a depressed patient, you should try dopamine-oriented stuff.

          Or maybe another criterion. Information is defined, at least in IT circles, as data that reduces uncertainty. A framework or a paradigm should at least be information. Can we see Friston and Chomsky reducing uncertainties?

      • syrrim says:

        The only thing that can be falsified is a null “hypothesis”. In other words, the only way you can falsify something is to reject the null hypothesis in favor of an alternative hypothesis. … The better way to frame evidence based selection of hypotheses is in terms of how much empirical evidence is accrued by competing hypotheses. In this light, you have to ask yourself what are the alternative hypotheses on offer?

        See also: Asimov’s The Relativity of Wrong.

        What actually happens is that once scientists get hold of a good concept they gradually refine and extend it with greater and greater subtlety as their instruments of measurement improve. Theories are not so much wrong as incomplete.

    • BlindKungFuMaster says:

      My take on ideas about how the brain works is “if you can’t code it up, you don’t know anything”. Predictive coding is much closer to implementation details than what little I understand of free energy. Though, I did actually come across a guy who implemented a Deep active inference agent, but it still looks like a toy to me.
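      For what it’s worth, even a toy version fits in a dozen lines. This is my own stripped-down sketch, keeping only the “risk” term of expected free energy (predicted outcomes vs. preferred outcomes) and ignoring the epistemic term entirely, so it’s nowhere near a real active inference agent:

      ```python
      # Stripped-down sketch: pick the action whose predicted outcomes diverge
      # least from the preferred ("expected") outcomes. This is only the pragmatic
      # part of expected free energy; the information-seeking part is left out.
      import math

      outcomes = ["fed", "hungry"]
      preferred = {"fed": 0.95, "hungry": 0.05}            # prior preferences as probabilities

      predicted = {                                        # p(outcome | action)
          "sit still":    {"fed": 0.1, "hungry": 0.9},
          "go to fridge": {"fed": 0.8, "hungry": 0.2},
      }

      def risk(action):
          """KL(predicted outcomes || preferred outcomes) for one action."""
          p = predicted[action]
          return sum(p[o] * math.log(p[o] / preferred[o]) for o in outcomes)

      for action in predicted:
          print(action, round(risk(action), 2))
      print("chosen action:", min(predicted, key=risk))
      ```

      Even this toy makes the structure visible: preferences enter as a probability distribution rather than a utility function, which is the part that people seem to find either elegant or suspicious.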

      • nameless1 says:

        Coding is for computers.

        https://www.sott.net/article/348454-Your-brain-is-not-a-computer

        Note that this is not saying we could not create a neural net inside a computer that can simulate a brain. In that sense we can code a brain. But we can also simulate airplanes fairly exactly and it does not mean airplanes are computers or flight is computation. A brain is just doing its brain thing, which can be simulated with computation. Air molecules around the wing of an airplane are also just doing their thing, which can be simulated with computation.

    • markpneyer says:

      Can it be measured? That seems like a decent question that I didn’t see addressed.

      Are we using “energy” because it kind of, sort of acts like energy in physics (where energy minimization is a thing)?

      Or is it energy because it actually is energy, measurable (possibly using advanced equipment) in terms of joules?

      For example, raindrops follow a gravitational gradient. Does a person who acts in accordance with a future goal, making slow progress towards it, follow a “mental gradient” measurable in joules?

      • Peter says:

        It seems that Friston’s free energy bears roughly the same relation to Helmholtz free energy (a well-established concept from thermo) as information theoretic (“Shannon”) entropy does to thermodynamic entropy… possibly a bit less, as there are some pretty well-established relations between thermodynamic and Shannon entropy.

    • Yosarian2 says:

      If I’m understanding the theory correctly (and I’m probably not), wouldn’t it imply that the mathematical model the brain uses for curiosity should be very similar to the one the brain uses for acquiring food or whatever? If they turn out to be radically different types of information processing on the neural and systems level that would basically disprove it, right?

  4. cactus head says:

    >What about the typical complaint of porn addicts – that they start off watching softcorn porn, find after a while that it’s no longer titillating, move on to harder porn, and eventually have to get into really perverted stuff just to feel anything at all?
    You tell me where a man gets his softcorn porn, I’ll tell you what his perversions are.

  5. caethan says:

    This reminds me of nothing more than Wolfram’s A New Kind of Science: evidence that very bright guys with good work behind them can be utter crackpots too.

    • Scott Alexander says:

      I think I’m tempted to lean that direction too, but given Friston’s obvious super-brilliance everywhere else, and the number of people who take this very seriously, and my own inability to understand this at more than a superficial level, it’s helpful to hear it confirmed.

    • BlindKungFuMaster says:

      That’s a bit harsh. But it’s definitely common for clever guys to fall in love with their “big idea”.

    • Alkatyn says:

      Even if the theory is in itself correct and insightful, it says something about the social or organisational structures around him that he can write in such an incomprehensible way and not get called out on it. Seems like there’s no-one who can get him to simplify his language or answer questions clearly. The sort of bad habits we don’t let junior researchers have can reoccur as people get more senior. Part of being a good researcher is good communication, and he seems to be very obviously failing at that – but he’s in a position where his prestige and ego allow him to shift the burden of comprehension onto the audience.

      • limestone says:

        Well, you can say that he’s already paying the price by having his ideas less widespread and renowned than they could be. Other than that, I don’t think there is a problem. If his audience is interested in his ideas enough to decipher them, the more power to him.

        • caethan says:

          If he’s actually come up with a brilliant theory of the mind that would have enormous explanatory power, then of course there’s a problem that his writing comes off as the deranged ranting of a crackpot. A brilliant theory of the mind would be a very useful thing to have.

        • MB says:

          There’s a long history of (in)famous thinkers taking intriguing ideas from the hard sciences — magnetism, energies, relativity, quantum mechanics, to name but a few — and twisting them beyond comprehension, using them as metaphors, and building very intricate, but in the end almost content-free, philosophical and social science theories around them.
          This seems one of those cases.
          In its heyday, relativity was also a very scientific- and serious-sounding framework used to interpret all sorts of completely unrelated stuff. However, if you can’t write a formula, it’s probably bogus (and certainly not hard science).
          Or it could be that understanding this new paradigm is the true test of smarts, appropriately for a theory of everything (just like relativity before it). Probably its adepts, a self-selected group, would claim so. Sounds like a cargo cult to me.

    • sty_silver says:

      I’ll just second that all of this seems really unconvincing to me.

    • Iain says:

      One difference between Wolfram and Friston is that experts in the field seem to think Friston has something valuable, whereas experts almost invariably rolled their eyes at Wolfram.

    • Michael Handy says:

      There is a certain “Penrose-ness” about it, isn’t there?

  6. Alkatyn says:

    Feels like maybe part of the issue in understanding the “prediction” part is people mixing together different parts of the brain that are doing the “prediction.” In a simple model, it’s not that the top level processing or whole part of the brain/mind/person is predicting the presence of pizza and being wrong about that. But that there is some module in the brain making a model of the world with pizza in mouth, and is contrasting that with a separate model of the world obtained by the senses, where it isn’t. Then trying to resolve those two models. Both those models would be far below the top level conscious reasoning we are aware of, so our lived experience of interacting with uncertainty isn’t very analogous.

    Re the second part about why people don’t sit in a dark room. Maybe the attractive bit is “the act of taking something from uncertain to certain”, which requires seeking out uncertain things in order to be fulfilled. Similarly we value the experience of eating, not the experience of being in an environment where we have eaten everything.

    • Joyously says:

      But that there is some module in the brain making a model of the world with pizza in mouth, and is contrasting that with a separate model of the world obtained by the senses, where it isn’t. Then trying to resolve those two models.

      Out of all of this, this is the part that actually jibes with my own experience/intuition the most. A couple of years ago, when several articles were published about aphantasia (the inability to form mental images), I asked people I knew “When you’re craving a food, do you taste it in your mind?”

      The general consensus was that it can’t be that you taste it, because if it was why would you need to actually eat the thing? But my subjective experience is that I’m in line at the store, I see a chocolate bar, I taste chocolate in my mouth–or the ghost of a taste, I guess–and I deeply want to buy the chocolate and eat it for real.

      It’s the same when I crave specific music. I hear a song in my head, sometimes quite vividly, and I deeply desire to hear the song for real. In both cases, if I can’t figure out what food I’m tasting or what song I’m hearing in my head, I become disturbed because I can’t seek out the real-life version.

      I’m pretty sure this is how other people work too. It’s pretty common to say that you want a food so much that you “can almost taste it.” To me, this seems compatible with a system where some part of the brain is modeling “actually tasting the food this moment”, which creates friction with the true perception that no, you actually aren’t.

      • Toby Bartels says:

        The general consensus was that it can’t be that you taste it, because if it was why would you need to actually eat the thing?

        By this logic, there's no point to looking at the Grand Canyon either, because you can always look at a photograph (and thereafter no point to even looking at the photograph, because you can always imagine it). I'm with you; if I crave pizza, then I can taste it (although with pizza, it's more mouth feel than flavour as such), but this imagined taste is only a pale imitation of the real thing.

        • gekkey says:

          As someone with a (probably) very poor imagination, there is no point in seeing the Grand Canyon because a photograph is just as good, and no point in eating a pizza unless not doing so is causing me pain.

          • Toby Bartels says:

            That's very interesting! I would have expected a poor imagination to only increase the need to experience the real thing.

            If you had a good imagination (and memory), then you wouldn't need to see the Grand Canyon more than once, and you wouldn't need to eat pizza more than once (unless you get hunger pangs), because you could vividly remember what it was like. And if you had a really good imagination, then with only a photograph to give you the general idea, you might not need to see the Grand Canyon at all, using your imagination to come up with something just as good (or better).

            But with a poor imagination, how can you get by without seeing the Grand Canyon or tasting pizza in reality unless you don't care about extraordinary sensations at all? (And if you don't, then maybe that's a cause of your poor imagination rather than an effect.) I'd be interested to know what you think about that.

  7. ajb says:

    This reminds me a little of Jaynes’s paper “Entropy and Search Theory”. In it, he argues that search theory ought to be an entropy minimisation problem, even though you have to take effort into account; so it’s not just purely about information. Instead it involves “different entropies” than information theory.
    Not sure if that is relevant. I guess I’m asking: is Friston’s free energy a mathematical abstraction, maybe a mathematical algorithm which he thinks organisms can apply to multiple tasks?

  8. Jan Kulveit says:

    This might be useful in some way? Or it might just be a cool philosophical way of looking at the world?

    I don’t know enough details, but my intuitive feeling is it may be a cool way of looking at the world, which usually is useful in some sense (models generation, intuitions,…), and not really much “falsifiable” or “predictive” itself.

    A good example may be the mentioned “variational principle”, which is used e.g. in general relativity in deriving the Einstein–Hilbert action, and in QM predicts where you will find the paths maximally contributing to the probability distribution (basically due to constructive interference).

    Another example of such a framework may be “network science”, looking at complex systems as networks. Then you can use a related bunch of mathematics to predict distances in social networks, general properties of connectivity in the human brain, or critical lines in the electrical grid.

  9. clocksandflowers says:

    I’m going to say something *terrible*, the intellectual equivalent of English –> Chinese –> English via Babelfish — at least if we believe Scott’s hermeneutical modesty — and suggest that even though I’d never heard of Friston, I think I sorta know what he’s saying. Uh oh.

    I get the feeling that Friston found the “U-shaped” question kind of annoying. But I think it’s his fault.

    He’s articulated a principle of “free energy minimization”, which may be simple and elegant, but is profoundly misleading as a characterization of what’s really going on computationally (even if he’s right). Whenever we try to minimize a very complicated function, all we ever use is what’s called local information — the slope of the function at our current location.

    In English what this means is that it’s really easy to walk downhill, but it’s very hard to figure out the lowest point on the surface of the Earth. When we try to solve minimization problems, we always do some equivalent of walking, or maybe rolling (it’s quicker if you can pick up some speed!), downhill. This is called “gradient descent”. Since we only ever estimate that downhill direction from a finite, noisy sample of data, it’s often called “stochastic gradient descent”.
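
    To make “walking downhill” concrete, here’s a minimal sketch of gradient descent on a made-up one-dimensional loss (the function and step size are invented purely for illustration):

        # Toy loss surface, invented for illustration: f(x) = (x - 3)^2 + 1
        def f(x):
            return (x - 3.0) ** 2 + 1.0

        def slope(x):              # the only thing we ever use: the local slope at x
            return 2.0 * (x - 3.0)

        x = 10.0                   # start somewhere on the hillside
        for step in range(100):
            x -= 0.1 * slope(x)    # take a small step downhill
            # (a stochastic version would estimate the slope from a noisy sample of data)

        print(x, f(x))             # ends up near the local minimum at x = 3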

    So even if his theory is correct, due to computational constraints, you would really only be motivated by a desire to *change* your uncertainty. In this context it actually makes sense why you’d like puzzles — you can make progress.

    For the uninitiated… the process of “minimizing something complicated” is literally what’s going on in virtually all machine learning algorithms — they have a “loss function” and they’re trying to minimize it. However, that loss function may or may not be a “free energy”. In the context of reinforcement learning, it’s almost always the opposite of the score (e.g. in Atari).

    The answer to your discussion question is no. In fact, it’s so “no” that there are many papers in the ML literature that explicitly try to *add* a desire for “uncertainty minimization”, “curiosity”, and “I’m bored of that restaurant” to the score of games in order to make agents explore more and improve faster.
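
    For concreteness, here is a minimal sketch of the kind of intrinsic “curiosity” bonus those papers add on top of the game score; the count-based bonus and the weighting are invented for illustration, not taken from any particular paper:

        import math
        from collections import defaultdict

        visit_counts = defaultdict(int)

        def shaped_reward(state, extrinsic_reward, beta=0.1):
            # game score plus a count-based novelty bonus (illustrative only)
            visit_counts[state] += 1
            curiosity_bonus = beta / math.sqrt(visit_counts[state])  # rarer states pay more
            return extrinsic_reward + curiosity_bonus

        # an agent that keeps revisiting the same state watches the bonus decay:
        for t in range(5):
            print(shaped_reward("same_restaurant", extrinsic_reward=1.0))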

    FWIW… probably zero… your critiques seem reasonable.

    • mklaren says:

      IMHO getting to a local minimum is what is mainly being talked about. It is the ‘desire to do something else’ that may take us away from a local minimum and let us explore for a global minimum. This drive to explore gives us an evolutionary advantage. Looking at things this way, Friston’s principle works on both a local and a global level.

  10. Watchman says:

    Two points that I initially thought were totally unrelated, but which on review seem to be related and maybe even paradoxical.

    Firstly, is the model of pizza eating used here valid for assessing free energy? My reading of the interaction of action and reality that is seemingly under discussion here is that it instinctively seems viable for higher orders of modelling, such as a political or religious worldview. For something as mechanistic as eating pizza, then (for an adult familiar with pizza at least), there is no need for much adjustment of reality or how you regard it unless the pizza has an unusual effect on your senses (it tastes bad, is spicier than expected, the sauce falls down your nice new shirt…), in which case we might apply free energy to explain how you deal with this situation. It might be an explanation of why so many people put up with substandard pizza without complaint: the mental picture those people have of their situation does not allow for reacting badly in the particular social or public setting in which the bad pizza is encountered. I think what I’m trying to say is that free energy seems to work better as a way of conceptualising reactions to stimuli rather than the stimuli themselves. It would therefore be a much better way of understanding adherence to one side of an argument such as gun control than of understanding homeostasis, I guess, but that could arguably reflect the complexity of the human versus the bacterium or the wood louse.

    Secondly, were I a neuroscience researcher, this looks like exactly the sort of target I’d select to attack – the most fun papers to write are constructive take-downs of widely held but unsubstantiated views. The lack of a clear explanation of why this works or how it might be applied, and the widespread but hardly universal support in the relevant field, are big red flags that here we have a plausible idea becoming a factoid, probably because most experts in the field have no time to really understand and check it and are working off a well-informed smell test of “sounds plausible”. Free energy may be correct, but this looks to me like a theory that has adherents but needs testing.

    And herein lies my unintentional slight paradox. For in suggesting that the best use of free energy might be to understand why people’s reactions are constrained by their worldview, we end up in a situation where we can explain adherence to an unsubstantiated idea like free energy by many relevant professionals through suggesting that once someone has determined this is a good idea, then the ‘energy’ required to change their mind becomes much greater, which is nicely explained by free energy…

    Tl/dr: This seems to work better for higher-order processing than eating pizza; it also looks like the classical unsubstantiated but widely accepted academic theory; ironically if this does work for higher-order processing then a proof of free energy’s applicability is the adherence to the idea by experts without any proof being demonstrated.

  11. niohiki says:

    So on entropy and free energy
    (that by the way come from thermodynamics – who could have told Boltzmann et al. how they would be used).

    Thing is, minimizing free energy can get in the way of minimizing entropy; it is not quite the same thing. Let’s do physics! (sorry) Say you want to minimize the free energy of a gas in a box. You are asking the isolated system to not be able to do actual “work”, putting all the energy into heat, and basically maximizing entropy.
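
    (For reference, and as my own gloss rather than anything Friston writes: the thermodynamic free energy in play here is the Helmholtz one,

        A = U - TS,

    and minimizing it at fixed temperature trades off lowering the energy U against raising the entropy S; with U pinned, as in the isolated box, the trade resolves by maximizing S, which is the “all the energy into heat” outcome above.)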

    It would seem that the dark room situation is definitely low entropy – single state, no fluctuations. I’m not fully sure how much of the physics maps onto the discussion of the brain, and I’m inclined to say yes to

    Or it might just be a cool philosophical way of looking at the world?

    but the distinction sure looks meaningful here. Clearly constraints matter – in the gas example, entropy goes against free energy because we’re keeping actual energy constant in the isolated system. A thermal bath has different constraints, and different results. In Bayes these would show up at least as your priors. But are there more (maybe encoding gene-hardcoded needs like having pizza in your mouth, or not sitting and withering in a room)?

    Not saying this explains how to avoid everyone-chilling-in-dark-rooms in the framework, but there might be at least room for an argument.

  12. Egregious says:

    Would the simplest and most effective rule for determining which method is used to reduce uncertainty/free energy be to act on the world when possible, and to update the probability function otherwise?

    I’d like to see an analysis from evolutionary arguments. Is the idea that error minimization is the best (or only) way to construct a mind, and as soon as it appears it’s a competition for who has the best (most fitness maximizing) priors? So I expect to eat the reddest fruit, and take action to cause this to happen, then survive and reproduce better. Expectations being more primitive than behavior and passed down genetically?

  13. a tim from the office says:

    Re. the “weird connection between curiosity and every other drive”, I think your examples make just as much sense under the assumption that curiosity is just one pleasure among others with no special role of its own. e.g. when you’re first becoming sexually involved with someone you have the pleasure of exploring a new person’s body and mind, plus whatever the “base” value of having sex with that person is. A few months in, the novelty is gone, far fewer curiosity-satisfying pleasures are being created, and you just have the base value, which is obviously going to be lower. So we get very similar predictions whether we assume curiosity is a fundamental driver of sexual desire or just a sort of spice that livens it up. “This person is mysterious and unknown” could be no more primal a pleasure than “this person has nice hair” except that unknownness wears off as you get to know someone and nice hair doesn’t.

    (Obviously the above is taking your example at face value, but I feel like I should point out the obvious facts that it’s not outlandishly uncommon for relationships to get more enjoyable over time rather than less, and that as far as I’m aware most people who consume porn are not addicts, at least in the sense you described. I know the examples weren’t meant to be decisive and I’m probably being pedantic but ahhh yeah.)

  14. Brendan.Furneaux says:

    As someone trained in physics and chemistry, I was expecting “Free Energy” to be one of the Free Energies from thermodynamics; I was guessing Gibbs, because it is the one which is relevant for chemical reactions, including metabolism. I have heard the framing that ecological systems (and thus, “all life”) are optimized to minimize Gibbs free energy. In less technical language, “if there is food, something will learn to eat it”.

    I can see a connection in the case of the wood lice; the behavior of “moves faster in the sun” leading to “mostly isn’t in the sun” feels very much like the way chemical equilibria work. I’m not familiar with the use of the term in the context of Bayesian optimization except for what I just read in your post, but I can see a way in which it is at least metaphorically related. Thermodynamic entropy (times temperature) is one of the terms in the formula for Gibbs free energy. Informational entropy (“uncertainty”) seems to be one of the aspects of Friston’s usage of “free energy”. Thermodynamic and informational entropy are, at least to a first approximation, the same thing. I wonder how much of the rest of this information-processing version of free energy is metaphoric, and how much is strictly analogous.
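
    To spell out the “same thing to a first approximation” part (my gloss, not a quotation from anywhere): the Gibbs/Boltzmann entropy and the Shannon entropy share one functional form,

        S_{\text{thermo}} = -k_B \sum_i p_i \ln p_i, \qquad H_{\text{Shannon}} = -\sum_i p_i \log_2 p_i,

    differing only by Boltzmann’s constant and the base of the logarithm.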

    • fion says:

      I am also curious about this. I’m also trained in physics and when I see “free energy” I immediately think of Gibbs or similar. I don’t know very much information theory, but I am aware of the connection between informational entropy and thermodynamic entropy. Is it appropriate to call informational entropy “uncertainty” or is that misleading?

      Does Friston give a formula for his “free energy”? Given how often different physical systems obey the same mathematics, I wouldn’t be surprised if there is something a bit deeper than mere metaphor here. One question might be if there is some analogy for the law of conservation of energy. If there is “free” energy, is there also “total” energy?

      • jamii says:

        > Does Friston give a formula for his “free energy”?

        https://en.wikipedia.org/wiki/Free_energy_principle#Action_and_perception

        • fion says:

          Thanks. For some reason it didn’t occur to me that it would have a wikipedia page. Now I feel silly for not googling it. 😛

          Interestingly, there is indeed a concept of “total energy” in that equation, although I don’t think I can interpret it…

        • Toby Bartels says:

          So free energy is energy minus entropy. That looks somewhat like Helmholtz free energy, which is energy minus temperature times entropy. Informational entropy and thermodynamic entropy are more closely related than a rough analogy; the thermodynamic entropy really is the informational entropy under certain conditions, at least in principle. But the relationship between this energy and physical energy seems less precise; for one thing, they don't have the same physical dimensions.
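
          To make that comparison explicit (my gloss of the linked definition, not a quote from it): the variational free energy has the schematic form

              F = \mathbb{E}_{Q}\left[-\log P(D, H)\right] - \big(-\mathbb{E}_{Q}\left[\log Q(H)\right]\big),

          i.e. an expected “energy” minus the entropy of Q, whereas Helmholtz is A = U - TS. The two line up if you set T = 1 and read “energy” as negative log probability, which is why the match with physical energy (and its dimensions) is only loose.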

    • vV_Vv says:

      The usage is metaphoric: it does not refer to actual physical energy, just as information entropy does not refer to thermodynamic entropy. They are mathematically similar and can be formally connected in certain settings, but in common usage the information entropy of the bits being transmitted through a copper cable has little to do with the thermal shaking of the copper atoms.

      The mathematical definition of the negative variational free energy, also known as evidence lower bound, is there. There is nothing controversial about it, it’s a standard way of doing approximate Bayesian inference. Friston is trying to stretch that framework to include action, which seems less rigorous and not particularly informative.
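
      To spell the definition out, since the identity is short: the log evidence decomposes as

          \log P(D) = \underbrace{\mathbb{E}_{Q}\left[\log P(D, H) - \log Q(H)\right]}_{\text{ELBO}\,=\,-F} + \mathrm{KL}\big(Q(H) \,\|\, P(H \mid D)\big),

      and since the KL term is non-negative, maximizing the ELBO (equivalently, minimizing the free energy F) simultaneously pushes Q toward the true posterior and tightens a lower bound on the evidence.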

    • herculesorion says:

      I think the problem is that he’s using “free energy” to mean what anyone else in the world would call “error”.

    • sclmlw says:

      As a biologist, I completely agree. This association is either a huge distraction or a key to understanding the whole thing. All I could think of while reading the post was about activation energy and chemical reactions.

      This could lead to all sorts of analogous situations, such as mood being like a catalyst to allow an otherwise high-activation-energy emotion to catalyze a reaction and require less free energy to occur. Likewise, you can imagine that the reason people aren’t committing to certain actions is that there is not sufficient local free energy to allow the reaction to proceed. If any of this is true (again, I could totally be wrong and this is a huge distraction) then it should make all the difference what microenvironment the expected response/decision is occurring in, which would explain a lot of the otherwise confusing possible implications of this theory (such as the dark room problem and other questions of motivation and translation).

      If you could find a way to make it predictive, this hypothesis could lead to insights as to why certain behaviors are biased toward, and lead to causal explanations about things like depression, anxiety, etc. But it sounds more like the early days of Hari Seldon’s theory than like fully-formed psychohistory.

    • fr8train_ssc says:

      I too was confused at first. I would hope that Scott and others refer to this concept henceforth as “Friston’s Free Energy”.

      On the other hand, as Toby pointed out, the term was probably chosen building off Shannon’s related theories into information theory and related thermodynamics.

      I’m surprised Friston didn’t choose another term, like “Cache.” I see a more useful analogy using that Computer Engineering term: computer systems attempt to minimize the use of slower-access media like DRAM or disk by retaining the most pertinent data in registers or low-level SRAM (which also benefits the system as a whole by reducing clock cycles and energy usage). When the cache is exploitable the event is called a “Cache-Hit” and results in improved performance of the system. Conversely, when data in the cache cannot satisfy the requirements of the current instruction, the result is called a “Cache-Miss” and additional instructions must be issued to access the data to be manipulated.

      In the case of humans, Cache would refer to the set of readily accessible heuristics and environmental conditions that provide low-energy or instant computation. Just as a computer system would be suspected of malfunctioning if it had repeated cache misses, human cognitive pathologies would be suspected if the brain were consistently ‘cache missing’.

      The Cache term also fits elegantly with observations on how short-term memory is linked with the concept of willpower. Cache is often referred to as a computer’s “short-term memory”, and increasing cache size is one way of significantly improving the performance of the computer. By analogy, a human improving their willpower would be expected to see improved life outcomes.

  15. JPNunez says:

    The great part about following a parody account of a scientist is that you now don’t know whether the accounts Twitter suggests you follow are parodies too.

  16. meltedcheesefondue says:

    >if the free energy principle were right, would it disprove the orthogonality thesis?

    As far as I can tell, it would not – unless you think that the determinism of physics also disproves the orthogonality thesis (because if the world is deterministic, then you can’t get *every* possible motivation, right? Just the ones that actually happen in the world).

    Free energy explains behaviours and their opposites – it explains why someone punches someone or refrains from doing it, eats sushi or hamburger or soylent for lunch or skips it entirely, does/doesn’t, wants/doesn’t want, stays/leaves… builds paperclips/doesn’t build paperclips…

    This doesn’t mean that free energy is vacuous, any more than sometimes predicting sunshine and sometimes predicting snow makes weather prediction vacuous. It means that weather prediction/free energy need some other set of inputs to predict an action. In the case of weather prediction, this is things like pressure, wind speed, satellite imagery, etc… In the case of free energy, it’s less clear what the other inputs are, but motivation and preferences seem perfectly valid inputs.

    (for the Bayesian version of Free Energy, the evidence and the priors can serve as the – variable – inputs)

    Crossposted at LesserWrong because it’s relevant to there: https://www.lesserwrong.com/posts/wpZJvgQ4HvJE2bysy/god-help-us-let-s-try-to-understand-friston-on-free-energy#hD9pPyzvKTcxwQmTW

    • Eli says:

      Strong agree, Stuart. If anything, the Free-Energy Principle strongly evidences the orthogonality thesis for minds-in-general (rather than just the embodied-allostatic minds real vertebrates have), because it says that as long as you can encode your preferences as a probability model (and that’s at least hypothetically a class of models encompassing everything computable), you can reshape your environment to suit that preference by optimizing free-energy, with the only limits being the accuracy, precision, and computational tractability of your probability models.

      We can even use the free-energy principle to state a somewhat stronger, less hypothetical Orthogonality Thesis: “for every possible sampling frequency (distribution) of physically possible distal causes and statistically observable outcomes, there can exist a free-energy minimizing computation with respect to that distribution.”

      This is pretty yikes-y.

  17. VirgilKurkjian says:

    If Friston were correct in any sort of meaningful way, he would be able to demonstrate that this approach is able to do something other approaches can’t. But his attempts to do so (e.g. “Active Inference, Curiosity and Insight”) have been unnecessarily complicated, unclear, and don’t demonstrate that they can solve problems other approaches fail at.

    Anyways, I found Gershman and Daw’s perspective pretty clear.

    Friston and colleagues (Friston et al., 2009, 2010) formulate the decision optimization problem in these terms. There are at least two separable claims here. The technical thrust of the work is similar to the ideas discussed above: if one specifies a desired equilibrium state distribution (here playing the role of the utility function), then this can be optimized by free energy minimization with respect to actions. However, the authors build on this foundation to assert a much more provocative concept: that for biologically evolved organisms, the desired equilibrium is by definition just the species’ evolved equilibrium state distribution, and so minimizing free-energy with respect to actions is, in effect, equivalent to maximizing expected utility. What makes this claim provocative is that it rejects decision theory’s core distinction between a state’s likelihood and its utility: nowhere in the definition of free-energy is utility mentioned. The mathematical equivalence rests on the evolutionary argument that hidden states with high prior probability also tend to have high utility. This situation arises through a combination of evolution and ontogenetic development, whereby the brain is immersed in a “statistical bath” that prescribes the landscape of its prior distribution. Because agents who find themselves more often in congenial states are more likely to survive, they inherit (or develop) priors with modes located at the states of highest congeniality. Conversely, states that are surprising given your evolutionary niche — like being out of water, for a fish — are maladaptive and should be avoided.

    Although the free energy principle appears at the least to be a very useful formulation for exposing the computational parallelism between perceptual and decision problems, the more radical maneuver of treating them both as literally optimizing a single objective function is harder to swallow. A state’s equilibrium likelihood and its utility are, on the classical view, not the same thing; rare events might be either unusually bad (being out of water, for a fish), good (being elected president, for an African American), or indeed neither. The idea that the two are united within a biological niche seems in one sense to appeal to evolutionary considerations to the end of substituting equilibrium for evolution, and risks precluding adaption or learning. Should the first amphibian out of water dive back in? If a wolf eats deer not because he is hungry, but because he is attracted to the equilibrium state of his ancestors, would a sudden bonanza of deer inspire him to eat only the amount to which he is accustomed? Should a person immersed in the “statistical bath” of poverty her entire life refuse a winning lottery ticket, since this would necessitate transitioning from a state of high equilibrium probability to a rare one? In all these cases, the possibility of upward mobility, within the individual, seems to rest on at least some role for traditional notions of utility or fitness in guiding their decisions, though the idea remains intriguing that in the ethological setting these have more in common with probability than a decision theorist might expect.

  18. Deiseach says:

    But almost all the most interesting neuroscience of the past decade involves this guy in one way or another. He’s the most-cited living neuroscientist, invented large parts of modern brain imaging, and received the prestigious Golden Brain Award (which is somehow a real thing). His Am I Autistic – An Intellectual Autobiography short essay, written in a weirdly lucid style and describing hijinks like deriving the Schrodinger equation for fun in school, is as consistent with genius as anything I’ve ever read.

    Is it at all feasible that this one particular theory is an instance of the Emperor Has No Clothes, but since nobody can understand it enough to criticise it, and he has an otherwise stellar reputation, the consensus is “yes, wonderful, glorious!”?

    It wouldn’t be the first time a genius struck out with a pet theory that wasn’t anchored to anything in reality, but they had so much intellectual firepower to expend on defending it that any critics got hopelessly lost in the underbrush.

    • fion says:

      Perhaps I’m just naive and optimistic* but I would argue that genius-puts-forward-useless-theory-that-everybody-else-goes-along-with-cos-genius-is-genius is not very common these days, and that if the theory didn’t have anything of value to offer it would be either ignored or (more likely if the genius has clout) criticised.

      *I’m definitely naive and optimistic

      • caethan says:

        Depends on the field. And psychology and neurobiology… don’t have good track records.

    • VirgilKurkjian says:

      This is absolutely what I think is going on. At the very least, if Friston’s theory has anything to it, he’s doing such a bad job of it that most of the best minds in all the related fields can’t make sense of it.

    • ohwhatisthis? says:

      Oh that’s definitely what’s going on here.

    • Grek says:

      I am reminded of Scott’s What The Hell, Hegel? post.

  19. azhdahak says:

    Whenever I see a claim about the human brain, I try to relate it to linguistics, because that’s what I know something about.

    Friston’s general tendency seems to be maximal reduction. First he tried to find a “singular explanation for the shape of things, just starting from the premise that something existed”; then he tried to reduce all of physics to one page; and now he’s trying to reduce the brain to one equation.

    These attempts at reduction haven’t worked all that well in linguistics. Chomsky tried to reduce syntax to universal grammar; and while I’m not a syntactician, I gather that’s not so popular anymore. The neogrammarians tried to reduce historical phonology to the principle of regularity, and while that was the major advance that made historical linguistics in the modern sense possible, it’s only 95% right. If Everett is right about Piraha, Chomsky’s claims about recursion are only 99% right. And most claimed universals stall at the 99%-right stage. The one I remember best is a claim by a prestigious linguist that (to paraphrase) no language has timing contrasts in glottal features — that is, you won’t see contrasts between preglottalization and postglottalization, or between preaspiration and postaspiration. But there’s a dialect of Tibetan that’s claimed to contrast preaspiration and postaspiration…

    (The generally best-known example of this is that it was thought that OVS word order was impossible, until someone who studied Hixkaryana, a language with OVS word order, happened to attend a lecture that mentioned that universal.)

    Now, there are some universals that do hold. You won’t find a language that uses the operation of reversing the order of the words in the sentence, or the operation of reversing the order of the phonemes in a word. (Not that phonemes are real, of course!) But I don’t know of a good way to distinguish these universals from “you won’t find a language with OVS word order” or “you won’t find a language that has timing contrasts in glottal features”. And even if you do find a universal, it may not be due to anything about the brain; it may just be that there aren’t many diachronic paths leading to violation of the universal. It’s empirically a universal, for example, that no language has a linguolabial ejective consonant, but this is by coincidence — [p̪ʼ] is an entirely pronounceable consonant, but ejectives and linguolabials are rare and highly areal linguistic features, and it just so happens that no language has developed both.

    Reduction in linguistics is a very hard process — in general, the best you can hope for is something like the advance of the neogrammarians: a rule that holds well enough to allow advancement in a field, and will be assumed as true until someone finds the nigh-inevitable exceptions. And this is just reduction in one subdomain! I’ve never heard of an attempt to develop a grand unified theory of all subfields of linguistics. So I’m not very optimistic about the prospects for grand unified theories of the brain.

    • BlindKungFuMaster says:

      “You won’t find a language that uses the operation of reversing the order of the words in the sentence, or the operation of reversing the order of the phonemes in a word. (Not that phonemes are real, of course!) But I don’t know of a good way to distinguish these universals from “you won’t find a language with OVS word order” or “you won’t find a language that has timing contrasts in glottal features”.”

      Well, the universals you mention, that do hold, obviously hold because reversing the order of sequences is something the human brain is very bad at. And that is a direct result of the brain being a neural network in which the connections between two neurons only go in one direction.

      • caethan says:

        And that is a direct result of the brain being a neural network in which the connections between two neurons only go in one direction.

        Is it now. It’s amazing how many predictions of things that we already knew neurobiologists have managed to get right.

        • poignardazur says:

          Yeah, I had the same thought. (except for the “neurobiologists” part)

          If nothing else, I could imagine a language where reversing the order of words in a sentence is used to communicate sophistication or seriousness. Now, maybe an actual linguist could explain to me that what I just described is ridiculous and makes no sense because of principle X and Y.

          But I don’t find the idea that “brains have a hard time reversing sequences” obviously implies “there is no language ever with grammatical structures based on reversing sentences”.

      • azhdahak says:

        Well, the universals you mention, that do hold, obviously hold because reversing the order of sequences is something the human brain is very bad at. And that is a direct result of the brain being a neural network in which the connections between two neurons only go in one direction.

        I don’t know math, but does this predict the impossibility of morphological metathesis?

        That is, if you have verbs verb, word, and so on, would it be impossible for a language to mark the past tense of those verbs by reversing the positions of the vowel and the second consonant, to give vreb and wrod?

  20. A crude attempt at a hypothesis:

    I have a model that says that there should be as much Good and as little Bad as possible. This is a very high-level model with a lot of priority, but not the only model. When I do things that predict Good, like eating pizza, my brain minimizes free energy by making me feel Good.

    Why do we habituate to some types of pleasure but not others? I think the answer is that some types of pleasure predict higher-level Good and others do not. That is, endlessly consuming pornography doesn’t contribute to any higher level model that predicts Good. Pleasure generated through surprise about the amount of Good relative to your model of the world corresponds to the sorts of temporary pleasures that one quickly habituates to as one’s model of the world is slowly updated to include the source of that pleasure. The only way to continue to feel enjoyment from something then is to either a) do the addict thing where you keep seeking out more extreme and novel levels of the thing forever or b) for that thing to predict higher-level Good/pleasure. A successful and intimate long-term relationship is a model that continues to predict Good even in the absence of surprise, so your brain minimizes prediction error by making you experience Good. Mastery of things like sports and video games is associated with a model that predicts Good, and so continues to be pleasurable when you habituate to parts of them.

    This theory is nice because it explains masochism. Certain kinds of pain are part of higher-level models which predict Good, so when we experience them our brains minimize prediction error to make us feel Good. If pain (in certain contexts) is part of your model of sex, then your brain minimizes prediction error by making you feel sexual arousal. If (like me) you have a model that says that going on 10-mile runs in blizzards that prevent you from seeing more than a few feet in front of you and make you wonder if you’re ever going to feel your fingers again is something that is aesthetically Good and part of a model of you being tough and therefore Good, then your brain minimizes prediction error by making you experience a sort of pleasure alongside the pain.

    Somewhat relatedly, I’ve noticed that as I’ve been running throughout my life, I’ve come to learn which sorts of pain predict bad things and which ones predict Good things, and the former feel more painful than the latter. Shin splints and tendinitis lead to Bad, and so my brain will not minimize prediction error by reducing the pain, and may even increase it. Muscle soreness predicts Good and therefore begins to feel Good. The subjective difference has become more pronounced as I’ve gained a more complete model of the world and how injuries and training work.

    So pleasures/pains that stem from unaccounted-for surprise will feel straightforwardly Good or Bad, pleasures that predict Good will feel even better, pains that predict Bad will feel even worse, and pain that predicts Good or pleasure that predicts Bad will feel masochistic. These all feel like natural categories in my subjective experience.

    We can go even further, I think, and try to explain your wanting/liking/approving model. Wanting seems like a Bad unaccounted-for surprise that stems from not having a thing or a Bad surprise that is accounted for by a model that predicts Bad. Approving happens when you contemplate parts of your life and notice that they predict Good, which causes your brain to minimize prediction error by making you feel Good. And models can exist within models, so you could experience a desire for a state that you would approve of, etc.

    This all makes sense in my head and is hopefully not just confused rambling.

    (This is also neatly explained but I wasn’t sure where to put it)

  21. Toby Bartels says:

    Like Hamilton’s Principle of Stationary Action, it cannot be falsified. It cannot be disproven.

    An ironic example, considering that Hamilton didn't say anything about stationary action! Hamilton's principal was about least action, and it was falsified in (I think) the 1920s (compared to the 1830s for Hamilton's work on it), which is why we now have the principle of stationary action instead.

    I get the point: a framework doesn't make specific falsifiable predictions, but is rather a way to organize theories that will themselves make predictions. Nevertheless, theories that match experiment and observation may or may not fit into the framework, and in that way frameworks can be falsified. Which is a good thing, for all of the reasons that Karl Popper wrote about!

    • Toby Bartels says:

      Hamilton's principal

      That's Hamilton's principle, of course.

      (In my defence, I was probably using a speech-to-text interpreter for the first draft of the comment. Presumably, Google checked my YouTube history and decided that I was probably referring to Alexander Hamilton as an agent of George Washington, so that Washington was Hamilton's principal.)

  22. Jiro says:

    Suppose my internal probability function Q assigns high probability to states in which I’m having a slice of pizza, while my sensory input suggests that I’m currently not having a slice of pizza.

    The whole argument only seems to work because it ignores details.

    Assigning high probability to “I am in a state where I am having pizza now” isn’t the same as assigning high probability to “I will be having pizza in the near future”. It isn’t a prediction either, and it can’t be changed by your actions–it only seems like you can change it because “now” has different referents.

    If you say “okay, well, I meant the latter and didn’t phrase it properly”, then there’s still a difference between “I predict that if I do nothing, I will be eating a slice of pizza” and “I predict that if I do X, I will be eating a slice of pizza”. The first of these is false and remains false whether you do X or not. The second is true but leads to you performing actions to change which prediction is relevant, not change whether a prediction is true or false.

  23. Protagoras says:

    After all, if you’re Paris of Troy, getting offered the choice between power, fame, and sex – then some mental module must convert these to a common currency so it can decide which is most attractive.

    Wasn’t Paris offered the choice of power, wisdom, or sex?

    • Shannon Alther says:

      Hera offered him rule over Europe and Asia, Athena offered him wisdom and skill in war, and Aphrodite offered him the love of the most beautiful woman on Earth.

  24. Doesntliketocomment says:

    I don’t have a lot of time to make the comment I’d like, but the gist would be this: If this is true, and this is the fundamental guiding principle of intelligence, then this should dictate that there be a finite number of first-order behaviors/reactions. It should be possible to enumerate them, and then break down complex actions in terms of these reactions. If the list of first order reactions grows without limit, I wouldn’t put much stock in the idea.

    As for predictive power, if true this would mean that behaviors that we can’t assess are operating under a hidden input (this person tries to reduce free energy in X way because of Y successful/failed experience) and that behaviors could be redirected by understanding more about the input space (they would be equally inclined to pursue strategy Z given the opportunity)

  25. Icedcoffee says:

    I’ll have to read this again later, as I found the whole concept to be confusing. I’m having trouble squaring “reducing uncertainty as the motivation,” not the least because I can think of several situations where I very notably do NOT seek to reduce uncertainty. (E.g. asking out a romantic interest.) I can totally see how reducing uncertainty is a motivation, but not the only one.

    Or maybe I’m thinking about this at too high a level. At the sense/cognition level I can see that the part of the brain concerned with understanding the world would be focused purely on obtaining the most accurate understanding of that world, and maybe this free energy concept makes sense at that level, but I’m not sure how this can be extrapolated to motivating action. E.g. why am I even thinking about pizza if I’m not hungry?

    • poignardazur says:

      I’m throwing things at the wall here, but the whole “How do we aggregate motivations?” problem reminds me of programming.

      Specifically, a kind of programming problem where you know roughly which submodules your program has / should have, which tasks these submodules should perform (e.g. “fetch webpages from the internet”, “display the webpages on the screen”), but you’re not sure which architecture can actually make these submodules work together efficiently.

      The free energy thing kind of feels like Friston is assuming that, if you have the right submodules, they just work together “automatically”. Sort of; this isn’t the best metaphor.

  26. Eli says:

    Hi,

    I now work in a lab allied to both the Friston branch of neuroscience and the probabilistic modeling branch of computational cognitive science, so I now feel arrogant enough to comment fluently.

    I’m gonna leave a bunch of comments over the day as I get the spare time to actually respond coherently to stuff.

    The first thing is that we have to situate Friston’s work in its appropriate context of Marr’s Three Levels of cognitive analysis: computational (what’s the target?), algorithmic (how do we want to hit it?), and implementational (how do we make neural hardware do it?).

    Friston’s work largely takes place at the algorithmic and implementational levels. He’s answering How questions, and then claiming that they answer the What questions. This is rather like unto, as often mentioned, formulating Hamiltonian Mechanics and saying, “I’ve solved physics by pointing out that you can write any physical system in terms of differential equations for its conserved quantities.” Well, now you have to actually write out a real physical system in those terms, don’t you? What you’ve invented is a rigorous language for talking about the things you aim to explain.

    The free-energy principle should be thought of like the “supervised loss principle”: it just specifies what computational proxy you’re using for your real goal. It’s as rigorous as using probabilistic programming to model the mind (caveat: one of my advisers is a probabilistic programming expert).

    Now, my seminar is about to start soon, so I’ll try to type up a really short step-by-step of how we get to active inference. Let’s assume the example where I want to eat my nice slice of pizza, and I’ll try to type something up about goals/motivations later on. Suffice to say, since “free-energy minimization” is like “supervised loss minimization” or “reward maximization”, it’s meaningless to say that motivation is specified in free-energy terms. Of course it can be: that’s a mathematical tautology. Any bounded utility/reward/cost function can be expressed as a probability, and therefore a free-energy — this is the Complete Class Theorem Friston always cites, and you can make it constructive using the Boltzmann Distribution (the simplest exponential family) for energy functions.
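
    (A minimal sketch of that Boltzmann-distribution trick, with invented utilities; this is my toy illustration, not code from any of Friston’s papers:)

        import numpy as np

        utilities = np.array([2.0, 0.5, -1.0])   # invented bounded utilities over three outcomes
        temperature = 1.0

        # Boltzmann / softmax: p(x) proportional to exp(U(x) / T)
        unnormalized = np.exp(utilities / temperature)
        prior_preferences = unnormalized / unnormalized.sum()

        print(prior_preferences)   # higher-utility outcomes get higher "prior preference"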

    1) Firstly, free-energy is just the negative of the Evidence Lower Bound (ELBO) usually maximized in variational inference. You take a P (a model of the world whose posterior you want to approximate), and a Q (a model that approximates it), and you optimize the variational parameters (the parameters with no priors or conditional densities) of Q by maximizing the ELBO, to get a good approximation to P(H | D) (probability of hypotheses, given data). This is normal and understandable and those of us who aren’t Friston do it all the time.

    2) Now you add some variables to P: the body’s proprioceptive states, its sense of where your bones are and what your muscles are doing. You add a P(D’ = bones and muscles), with some conditional P(D | D’) to show how other senses depend on body position. This is already really helpful for pure prediction, because it helps you factor out random noise or physical forces acting on your body from your sensory predictions to arrive at a coherent picture of the world outside your body. You now have P(D | D’) * P(D’ | H).

    3) Having new variables in the posterior, P(H | D’, D), you now need some new variables in Q. Here’s where we get the interesting insight of active inference: if the old P(H | sensory D) was approximated as Q(stuff H ; sensory D), we can now expand to Q(stuff H ; sensory D, motor M). Instead of inferring a parameter that approximates the proprioceptive state, we infer a parameter that can “compromise” with it: the actual body moves to accommodate M as much as possible, while M also adjusts itself to kinda suit what the body actually did.

    Here’s the part where I’m really simplifying what stuff does, to use more of a planning as inference explanation than “pure” active inference. I could talk about “pure” active inference, but it’s too fucking complicated and badly-written to get a useful intuition. Friston’s “pure” active inference papers often give models that would have very different empirical content from each-other, but which all get optimized using variational inference, so he kinda pretends they’re all the same. Unfortunately, this is something most people in neuroscience or cognitive science do to simplify models enough to fit one experiment well, instead of having to invent a cognitive architecture that might fit all experiments badly.

    4) So now, if I set a goal by clamping some variables in P(goal stuff H=pizza) (or by imposing “goal” priors on them, clamping them to within some range of values with noise), I can’t really just optimize Q(stuff H) to fit the new clamped model. Q(stuff H) is really Q(stuff H ; sensory D, motor M), and Q(sensory D) has to approximate P(sensory D). Instead, I can only optimize Q(motor M | goal stuff H=pizza) to fit P(body position D’ | goal stuff H=pizza). Actually doing so reaches a “Bayes-optimal” compromise between my current bodily state and really moving. Once Q already carries a good dynamical model (through time) of how my body and senses move (trajectories through time), changing M as a function of time lets me move as I please, even assuming my actual movements may be noisy with respect to my motor commands.

    That’s really all “active inference” is: variational inference with body position as a generative parameter, and motor commands as the variational parameter approximating it. You set motor commands to get the body position you want, then body position changes noisily based on motor commands. This keeps getting done until the ELBO is maximized/free-energy minimized, and now I’m eating the pizza (as a process over time).
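
    To make the steps above a bit more concrete, here is a toy numerical sketch of the same idea (my own construction, not one of Friston’s models): a one-dimensional “hand position”, a clamped goal prior at the pizza, and gradient descent on a quadratic free-energy-like objective with respect to both the belief and the motor command.

        import numpy as np

        rng = np.random.default_rng(0)

        pizza = 5.0            # where the clamped goal prior puts the hand
        sigma_s, sigma_p = 0.5, 1.0

        x_body = 0.0           # actual hand position (hidden from the agent)
        mu = 0.0               # belief about hand position (variational parameter)
        m = 0.0                # motor command (the "action" variational parameter)

        for t in range(200):
            x_body += 0.1 * (m - x_body)          # the body sluggishly follows the motor command
            s = x_body + rng.normal(0.0, 0.05)    # noisy proprioceptive observation

            # quadratic (Gaussian) free energy:
            # F = (s - mu)^2 / (2 sigma_s^2) + (mu - pizza)^2 / (2 sigma_p^2)

            # perception: update the belief to reduce free energy
            dF_dmu = -(s - mu) / sigma_s**2 + (mu - pizza) / sigma_p**2
            mu -= 0.1 * dF_dmu

            # action: change the motor command so that sensation moves toward the prediction
            dF_ds = (s - mu) / sigma_s**2
            m -= 0.1 * dF_ds                      # ds/dm > 0, so descend along dF/ds

        print(round(x_body, 2), round(mu, 2))     # both end up near the pizza at 5.0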

    • Shannon Alther says:

      Could you explain what predictions you can make with this model of active inference that you otherwise couldn’t without it?

      • Eli says:

        Sure: it lets us unify our nice models of planning as inference for decision making, with what used to be separate models of motor control.

        There are very nice results regarding predictive coding in the motor system, saying that once we treat motor control as a “predictive” parameter, we’ve now reduced motor control and decision making to model predictive control, which is really computationally efficient.

        Eventually, this boils down (I’m still short on coffee today) to saying stuff like “schizophrenic people should be able to more easily tickle themselves”, and that happens to be what we find.

        Also, it ends up turning out that active inference as an algorithm for predictive control is really fucking important for control over internal organ systems, which gives rise to motivation and emotion and (according to one of my other advisers) most of the rest of behavior.

        • Shannon Alther says:

          Eventually, this boils down (I’m still short on coffee today) to saying stuff like “schizophrenic people should be able to more easily tickle themselves”, and that happens to be what we find.

          A far simpler explanation is that schizophrenia is characterized by an inability to determine the source of actions, and so when schizophrenics tickle themselves they confuse it with being tickled by someone else. Forgive me, but I don’t see where the inference/planning model comes in here.

          Also, it ends up turning out that active inference as an algorithm for predictive control is really fucking important for control over internal organ systems, which gives rise to motivation and emotion and (according to one of my other advisers) most of the rest of behavior.

          Most internal organ systems are self-regulating. Where does active inference come in?

          • herculesorion says:

            I think what he’s asking is *why* can’t schizophrenics determine the source of actions?

          • Eli says:

            @HerculesOrion: yeah, exactly. In fact, the schizophrenics tickling thing was described in Surfing Uncertainty, so just going back and reading it will help more than my explaining it.

            Most internal organ systems are self-regulating. Where does active inference come in?

            Hell no, they’re not! At least, not without the brain. As my neuro adviser always points out, if your brain didn’t order the body to prepare blood flow, you’d collapse from oxygen shortage when you tried to stand up from a chair.

          • Shannon Alther says:

            @Eli

            I phrased my objection poorly, but on reflection if you limit brain involvement to hormones secreted by the hypothalamus you actually get pretty close: many aspects of the circulatory, digestive, excretory, immune, reproductive, and respiratory systems get by just fine without sensory input from the CNS. Your kidneys respond to blood solutes via sensory input from chemoreceptors, your blood pressure is most actively regulated by baroreceptors, & the enteric nervous system has most of the fine control over digestive effector cells.

            But yes, apart from my own pedantry a lot of internal regulation is caused by sensation and the CNS. So what? How does active inference come in?

            And @herculesorion as well.

            Schizophrenics can’t determine the source of actions because something is damaged. What evidence is there that this damage is related to the active inference capability specifically? As I understand it, schizophrenia may have multiple causes (excess D2 receptor function and reduced NMDA receptor function are both popular hypotheses). Are both of these related to the active inference algorithm?

            Forgive me, but I still don’t see what evidence supports this model, or what it tells you that we didn’t know before.

    • PDV says:

      >Any bounded utility/reward/cost function can be expressed as a probability, and therefore a free-energy — this is the Complete Class Theorem Friston always cites, and you can make it constructive using the Boltzmann Distribution (the simplest exponential family) for energy functions.

      Please elaborate. I read through the wikipedia page on admissible decision rules and skimmed Wald 1945, “An Essentially Complete Class of Admissible Decision Functions”, which is the only clear result on the subject. None of that seems to justify the utility function to probability function mapping; I don’t see any reading of the theorems or equations where you get a probability function without taking the loss function and decision rule as fixed, which is taking everything essential about the utility function as fixed.

  27. Shannon Alther says:

    Discussion question for machine ethics researchers – if the free energy principle were right, would it disprove the orthogonality thesis?

    No, and for two reasons.

    1) The free energy principle is descriptive only, as Friston says in the Alius interview. It (apparently) makes no predictions about behaviour, much less about terminal goals.

    2) It applies specifically to biological organisms. Most of your sources note that this behaviour arose through natural selection, to handle certain specific types of uncertainty related to staying alive. It has no bearing whatsoever on, say, alien intelligences, much less computers, which can be programmed with any mind we can design.

    This assumes that the free energy principle is true & correct, which I’m not sure that it is. Being unfalsifiable is a bad start, as is the fact that Karl Friston’s work is impenetrable. Most simplified explanations of the free energy principle are either equally impenetrable or seem somehow confused (this one is difficult to quantify, but reading this hasn’t really given me any insight into behaviour; if this is actually revolutionary, there should be some combination of words that makes the true meaning shine through like the sun on a cloudless day) and as far as I know, nobody has used free energy or its related concepts to achieve anything remarkable. Strong evidence that this is probably pointless.

    • Toby Bartels says:

      Most of your sources note that this behaviour arose through natural selection, to handle certain specific types of uncertainty related to staying alive. It has no bearing whatsoever on, say, alien intelligences, much less computers, which can be programmed with any mind we can design.

      This is a minor point, but why shouldn't this apply to alien intelligences? I get that it won't apply to near-future AIs, which we program directly and which undergo no selection pressure. But an alien intelligence that manages to cross the galaxy and reach us has proved itself in the real world. Or to put it another way, we might make a paperclip maximizer that destroys the world, and it might be nothing like us in the ways relevant to Friston's theories (even assuming that Friston is correct). And some aliens might make a smeeprleclip maximizer that destroys their world, and it need not be anything like them or us. But if an AI from beyond the stars arrives to encounter us, then it will be subject to Friston's theories (assuming that they are correct), because it's not just some random AI, but the AI that managed to get to us first, surviving all of the trials along the way and outcompeting all of the other alien intelligences. (And the same if the intelligence that gets to us first has more natural origins.)

      • Shannon Alther says:

        I don’t see what the distinction is between the alien paperclip maximizing AI and the alien AI that encounters us first. It could have a goal designed in a similar way to “Maximize paperclips”, perhaps “Map the universe.” It needn’t undergo selection pressure to be viable, nor does it need to outcompete all other alien intelligences (space is pretty big).

        The notion I get here is that surviving selection pressures requires an ability to sense your environment and use that to make predictions. Friston then claims that this is basically everything there is underlying animal behaviour.

        Let’s say we agree with him. Could this fail to happen on other, imagined worlds? Absolutely. Imagine a world full of life spawned from diamond, immortal and unchanging. What do these crystalline monstrosities care if their predictions are inaccurate? What harm might come to them? Or a world where a chance infusion of energy (perhaps from a natural nuclear reactor) spawns a basic computer, an infinitesimal chance in a vast cosmos, along with a mechanism for self-replication. Eventually, a mind blind to the existence of the universe spreads its substrate across its planet. What use has it for predictions?

        Aliens organized in insect-like hives where the drones take the priors of their masters as absolute law. Planets populated by engineered creatures that have no subjective experience or desires, a thought experiment by aliens specifically to prove a point about minds. Boltzmann brains.

        The universe is a big place. Making steps from what we can observe from intelligence on earth to what intelligence might be like in theory is a huge leap. Even in terms of basic biochemistry, we assume that water is probably necessary for life but there’s no reason ammonia or a liquid hydrocarbon like propane couldn’t suffice in theory.

        • Toby Bartels says:

          If you want to talk about infinitesimal chances in a large cosmos, sure, there will be exceptions. But what will be most common, what we are most likely to meet, is whatever is most successful. That's the point of evolution.

          I don't normally advocate imagining that alien worlds will be much like ours. But in this particular subthread, I'm considering what would follow if Friston is correct about explaining the nature of our minds from basic information-theoretic principles. If he's right about that, then most alien minds, most of the time on most places, will have the same features.

          In contrast, I agree that an unfriendly AI about to destroy the world won't; that's a highly unusual situation that will exist for only a brief period on any one planet.

          • Shannon Alther says:

            The free energy principle isn’t derived from information-theoretic principles, it’s Friston’s observation about how animals on Earth make decisions. This could be because it’s inherent to the mind operation process (I don’t think so, but I’m not informed enough to say), or it could be because animals on Earth have neurons. Maybe our nearest galactic neighbours are single-celled blobs fifteen feet wide that ‘think’ using a strictly chemical process, or using photons or something. I’m not even confident in the free energy principle at all, much less that it applies to all evolved intelligent species everywhere.

            Unrelated, but the math notes on your website are excellent. Thank you for sharing.

  28. oom says:

    I’ve spent two+ years on understanding Friston’s theory and implementing it in a way that’s practical e.g. for robotics. It works beautifully, and it is a game changer. I will admit that it took me almost a year from understanding the theory to devise the right bits and pieces (algorithms, data structures) to implement it. For example, what is “expected precision”? It’s kinda clear on a high level what it is after banging your head against 25 of Friston’s papers. But how the heck do you implement it? Well, cracked that nut eventually and a few more.
    In my experience, most people get wrapped around the axle somewhere and then give up. A popular one is the motivation question (e.g. money, power, sex). It’s hard to reduce all of these very concrete, salient motivations to the overarching one – survival.
    There are a few other things to understand before it really all clicks. For example, the fundamental unit of an inference machine is the “belief”. A belief is the closest thing you can have to knowledge. It is what you learn when you learn. Examples for beliefs are “the sky is blue”, “when I drop something, it will fall”, “I like money” or “I shall maintain my body temperature at 36.5°C”. It’s probabilistic, so it will have a confidence (or “expected precision”) attached to it. The crucial bit is that action and perception is generated based on beliefs. Also, one can have beliefs about (future) observations as well as their precision, separately.
    This is gonna be a bit handwavy, but bear with me: A person who seems motivated by money has a strong belief that money will increase the precision of future observations and thus decrease the uncertainty in their model. That person will seek evidence for money as a way to “feel good” (by decreasing uncertainty). The stronger that belief is relative to other beliefs about what will increase the precision of future observations, the more often it will be selected to generate action, and the more dominant it will be in that person’s observable behavior. Here’s the interesting bit though – the inference machine also keeps track of statistics, and uses them to build a prior. Things you perceive or do often are, by definition, good for you (because you keep seeing or doing them). So if that person has had many experiences in which getting money *actually* reduced their uncertainty about their strong belief that they shall be safe, they will build a prior over high-level behaviors that will, at some point, be strong enough to drive behavior even without the uncertainty reduction. A habit (or an obsession, perhaps).
    Anyway, got over most or all of the humps, can confirm that it still all makes sense and would be happy to help others over them, too.
    The one thing I would add is that the theory is actually quite simple, and, after one really understands it, provides explanations for everything in biology and the human experience you throw at it without any extra backpacks. Sounds like Occam’s razor to me.

    • Shannon Alther says:

      A person who seems motivated by money has a strong belief that money will increase the precision of future observations and thus decrease the uncertainty in their model. That person will seek evidence for money as a way to “feel good” (by decreasing uncertainty).

      What about a person who seems motivated by sex, drugs, or thrill-seeking?

      • oom says:

        A thrill-seeker has their basic needs satisfied and is confident that they can satisfy their basic needs in the future. They also believe with high confidence that they will be safe. In that case, the exploration term dominates in the Expected Free Energy calculation, and explorative policies are selected.
        Dopamine is a hell of a drug if you can get it.
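        For readers wondering what “the exploration term” refers to: in the active-inference literature, the expected free energy G(π) of a policy π is commonly decomposed into a pragmatic (exploitation) term and an epistemic (exploration) term, both of which lower G. A sketch of that standard decomposition (C stands for prior preferences over outcomes), not a quotation from any particular Friston paper:

        ```latex
        G(\pi) \;\approx\;
        \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\!\left[\ln p(o \mid C)\right]}_{\text{pragmatic (exploitation) term}}
        \;
        \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\!\left[D_{\mathrm{KL}}\!\left(q(s \mid o, \pi)\,\|\,q(s \mid \pi)\right)\right]}_{\text{epistemic (exploration) term}}
        ```

        When preferred outcomes are already about as assured as they can be, the pragmatic term is roughly the same across policies, so differences come down to the epistemic term and information-seeking policies end up with the lowest expected free energy.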

        Some drugs, like cocaine and derivatives, cheat by messing with the dopamine levels directly. You get dopamine to flow when you reduce uncertainty about present or future outcomes (and you now know how to perceive them (present) or how to act on them (future)). And guess what, that’s also the very moment when you should learn how you got here. Because you want to get there again. Cocaine (at least initially) gives you the dopamine hit and the corresponding learnings for free, without your needing to go through the tedious work of reducing some uncertainty. Voila, rapid habituation.

        • oom says:

          One more thing about thrill-seekers: there’s “thrill” in reducing Free Energy of your model. You can do that by confirming existing beliefs like “I shall be satiated” (exploitation) or by improving your model (exploration). For thrill-seekers, the exploration leads to model improvement (perhaps rapid due to intensity of experience. Also probably involves a confirmation of fundamental beliefs like “I shall stay alive”).
          One could argue that after some amount of exploring into good, clean thrill-seeking, you’d form some kind of abstract representation for engaging in thrill-seeking activities in the context of all the things you could possibly do with yourself. That representation will, just based on the conditional statistics here, have a prediction for high precision of confirmatory observations of fundamental beliefs like the one mentioned above. It will now collect statistics itself – how often do you engage in thrill-seeking vs. other things. You can thusly, in fact, develop a habit of thrill-seeking.

        • Icedcoffee says:

          I vaguely recall that one of the traits that makes gambling addictive is coming close to winning, but still losing. How does this square with the “reduce uncertainty” explanation? Isn’t gambling increasing uncertainty?

          • oom says:

            My (rather uneducated) hypothesis: When you gamble, you try to learn the “trick”, how to beat the bank. It’s highly explorative in one way, but the potential winnings also make it attractive on the exploitation side of the tradeoff.
            You could argue that games designed for addictiveness (like Candy Crush Saga) are carefully designed to keep the player on the hairy edge of randomness, where they always think they have it figured out or are about to. The game might actually adjust the level of randomness according to the player’s performance.

        • nameless1 says:

          I still don’t get how it reduces to just uncertainty, rather than uncertainty about something in particular: certainty or uncertainty about pleasant or unpleasant, liked or disliked outcomes. I mean, when I am in a hospital with a broken ankle and learn quite certainly that it sucks, and that doing hold-my-beer tricks involving a roof and a pool also sucks because it led to this, I am learning that I should not do stuff like that, and I have reduced a lot of uncertainty about what happens when I am being stupid; yet I don’t get a dopamine high at all, I feel pretty bad about it. Reducing uncertainty about outcomes that suck does not feel good at all.

          And just what exactly do people learn when they snort coke in the music club and then just dance around?

    • Eli says:

      Do you have a write-up or some code somewhere I can look at?

    • baconbits9 says:

      It’s hard to reduce all of these very concrete, salient motivations to the overarching one – survival.

      The issue is that we don’t have one overarching motivation. Survival itself isn’t enough, maximizing individual survival would lead you never to procreate. Maximizing procreation on its own would likewise probably kill you or your offspring off in short order.

      There is no coherent definition of fitness.

      • oom says:

        “Survival itself isn’t enough”
        I don’t believe anybody would have said that 200 or 300 years ago. With all the comforts and safety of modern society, it’s easy to forget about the fact that the world actually conspires against you, that you’re beating the odds every second of your existence.

        • baconbits9 says:

          That is a bet you would lose handily, as many people argued philosophically that merely surviving wasn’t enough. You are also wrong in the characterization that you’re beating the odds every second of your existence. Outside of a tiny number of examples (Mao’s China during the famine years) humans did not live perpetually on the edge of destruction, but lived a life that cycled between states of being. Some times of the year are more prosperous than others, and some years are more prosperous than others, even for the poor in those times.

          • oom says:

            “many people have argued philosophically” doesn’t seem to be worth much in terms of building confidence towards the right answer.
            You could argue that every single cell in your body acts in one or the other way every single microsecond of its (and your) existence. Take one of the prerequisites for this process away (e.g. make your heart stop, or damage your lungs sufficiently, or injure you such that your blood pressure drops far enough, or … you name it), then the action stops, and what used to be “you” becomes a clump of matter like any other lifeless clump of matter that is going to disperse. Dust in the wind.
            So yes, survival is what it all comes down to. It’s the reason for every single one of our observed behaviors. It may not be apparent to us because our perception is clouded by all these things that we like to believe about what or who we are.
            If there’s more than survival, what is it? I count explorative behavior as conducive to survival.

          • baconbits9 says:

            “many people have argued philosophically” doesn’t seem to be worth much in terms of building confidence towards the right answer.

            is a direct refutation of

            “Survival itself isn’t enough”
            I don’t believe anybody would have said that 200 or 300 years ago.

            You could argue that every single cell in your body acts in one or the other way every single microsecond of its (and your) existence

            No, you cannot argue that, because many of your cells die without a fight. Your skin cells repeatedly die, your liver cells put themselves at risk of death dealing with toxins, and your blood cells die in countless numbers.

            It’s the reason for every single one of our observed behaviors

            So when two guys fight over a mate, risking injury and death, survival is the reason? When people eat terrible food, drink heavily and smoke, that is for survival’s sake?

          • oom says:

            So when two guys fight over a mate, risking injury and death, survival is the reason? When people eat terrible food, drink heavily and smoke, that is for survival’s sake?

            Fight over a mate: your innate, strong belief of “I shall have sex and reproduce” is violated by virtue of the other guy blocking you. That belief may be stronger than the belief “I shall be unharmed”. So you invoke the “fight” policy, with the “flight” policy being blocked by your strong belief of needing to reproduce.
            Also, death is not always the outcome. Many such fights do end with one party fleeing.
            Terrible food: often tasty.
            Drinking heavily: often fun.
            Smoking: often relaxing.
            Long-term consequences are abstract.

          • baconbits9 says:

            So where does

            the overarching one – survival.

            or

            You could argue that every single cell in your body acts in one or the other way every single microsecond of its (and your) existence.

            Come into play?

            It doesn’t sound like survival is an overarching motivation by your recent posts.

          • oom says:

            It doesn’t sound like survival is an overarching motivation by your recent posts.

            I’m having trouble seeing how you come to that conclusion. An organism must act to beat the odds and maintain its form. Inaction = certain death.

    • caethan says:

      The one thing I would add is that the theory is actually quite simple, and, after one really understands it, provides explanations for everything in biology and the human experience you throw at it without any extra backpacks.

      See, this is the kind of comment that just screams “crackpot” to the heavens. So you spent two years on understanding the theory, but in fact it’s actually quite simple. You claim to have “implemented” it but have not actually described anything about the implementation. And it “explains all of biology”.

      OK, I’m just a dumb biologist, but perhaps you can explain how to model, say, a single GPCR pathway using this framework? I mean, he explicitly applies this to individual cells.

      • oom says:

        It didn’t take two years to understand the theory. That took maybe a few weeks, even though there are some pretty radical shifts in thinking in there that one needs to soften up one’s cognitive dissonance for before one can really understand it. What took two years is a) understanding all the consequences of it – making all the right mental connections to existing science, building all the right mental models for it from a number of angles, wrapping your head around how it differs from state-of-the-art formulations and why etc. etc., b) devising code and data structures to build it in a way that’s complete and practical and widely applicable (not toy examples for some narrow scenario in MATLAB), and then c) extending it in ways that only become apparent when you do b), e.g. how to think about model structure learning and optimization. The latter took perhaps most of the time.
        Oh and then there’s building demos and fundraising 🙂

        I just looked up what GPCRs are – I had no idea. There’s no doubt that there’s a lot of complexity in biology that is, at this point, not well-understood. When I made the perhaps overly bold statement above, I was thinking on a macro level, e.g. act to survive, explore to improve models, trade off risk and ambiguity, consequences of lesions like schizophrenia, etc. etc. I suspect that looking at the complexity with an overarching principle as a prior could, over the long run, help understand the emergent expressions on a micro level IF the prior is correct. Sorry for stepping on your toes.

        • caethan says:

          G-protein coupled receptors are one of the best characterized and understood sensory response pathways. They mediate neurotransmitter responses, nutrient sensing, vision, etc. Name the sense, and I can find a GPCR that mediates a cell sensing it. Friston explicitly says his framework is applicable to individual cells responding to their environment. I’m not (just) being snarky here — how would you build an explicit model to explain, say, a yeast cell responding to changes in the ambient glucose concentration? I can tell you how I’d do it as a biochemist and molecular biologist, and it involves a lot of detailed modeling of enzyme kinetics among multiple cellular compartments, with longer scale kinetics of DNA suppression and RNA transcription. It’s hard. This guy says it’s reducible to a simple mathematical relationship. I’m from Missouri. Show me.

          • oom says:

            If I understand correctly, that’s like saying that because the signals on a CPU address bus are somewhat predictable and exhibit certain patterns that are well-understood on some level, but don’t seem to have any obvious connection to payment processing, the program running on the CPU can’t possibly be implementing payment processing.
            What am I missing?

          • caethan says:

            No, you’re not understanding what I’m asking. If you want to elide over the enzyme kinetics and low-level details of how the proposed free energy scheme works, then fine – as long as you get the high level right. I’m asking you to provide the high level for a simple system.

            As I understand it, this whole framework is meant to model the sensing-action loop of an organism — organisms act to minimize the “free energy” of their sensory observations. So, here’s one of the simplest observed reaction loops:

            Take a single E. coli bacterium. It’s got a flagellum for locomotion and it eats glucose. It’s got sensing proteins on its outside that let it sense the ambient concentration of glucose. The flagellum can be in one of three states: off, tumbling, or propelling. When tumbling, the bacterium randomly reorients itself in solution. When propelling, it moves forward. The glucose concentration isn’t uniform; some regions of the solution have more than others. I can tell you from observational studies how the bacterium behaves and we have a fair bit of biochemical understanding of why it behaves that way.

            What I want you to tell me is what a free energy model of this sensing-response system looks like. It’s as simple as possible: one continuous input, one discrete output, and the bacterium wants sugar.

          • oom says:

            Ok, I’ll bite. Goddammit.
            One actuator/sensor (flagellum), three policies: no action, tumble, propel. “no action” is strongly preferred by way of prior preferences.
            Another set of sensors: Sensing proteins, presumably on all sides
            Prior / Preferred observations: Glucose. Specifically preferred: higher concentration of glucose ahead.
            And more sensors: Interoceptive sensors that observe “satiation” of sorts, with an optimal observation that’s preferred via a prior.
            A model that encodes the relevant dynamics of the environment in one way or another, learned in evolutionary timeframes. It needs to have a way to encode a joint distribution between multiple sensory modalities, and potentially valuable sequences/paths through these distributions.
            Here’s how I think it works.
            If the bacterium is satiated (preferred observations of satiation sensor = current/predicted observations), there’s no Free Energy in the model, and the “no action” policy is triggered by way of the policy prior.
            A divergence of the satiation sensor’s observations with its preferred observations is surprising to the model. The only way to decrease Free Energy in that simple of a model is through action, and so it triggers one of two strategies that have, in past experience, remedied the situation (as indicated by Free Energy in the model decreasing).
            If the model of the dynamics of the environment encoded in the bacterium’s design indicates that by propelling, there will be a higher concentration of glucose, that policy is selected (exploitation). You could imagine it being encoded by observations of higher glucose concentrations in front vs. sides/back. The policy is selected because it has low Expected Free Energy, and that is because predicted observations line up with preferred observations (glucose ahead).
            If it doesn’t (which is another source of Free Energy in the model), it triggers the “tumble” policy (exploration) because the model believes it will often decrease uncertainty in the model. Policies that decrease uncertainty have low Expected Free Energy, too (due to the ambiguity term). Not as low as exploitation policies, of course.
            Once satiation sensors indicate preferred observations, Free Energy in the model decreases and the preferred “no action” policy is selected again by way of the prior/preferences over policies.
            I’m well aware that this is designed to be a trap. I don’t know much about biology (I was focused on simpler machines like robots), and I’m sure you will point out the many things I’ve missed (“what about the XSAV-XYZ protein???”). I thought maybe my answer could shed some light on the basic principles that other readers might find useful.

          • caethan says:

            Thanks. It’s not a trick question or a trap, I genuinely want to know what the theory or framework is supposed to be useful for. I’m a bit confused, though, because what you’ve described isn’t how to build a model, it’s a specific model for a bacterium’s behavior that’s empirically wrong, and I can’t tell if it’s wrong just because you built a model that doesn’t match the facts, or if it’s wrong because the free energy framework is incapable of matching the facts. (Or if the framework is useless because it’s capable of matching any set of facts.)

            Here’s how we see bacteria behaving: there’s no internal satiation sensor. Bacteria move the same way with respect to glucose gradients regardless of how well fed they are. Just because of the dynamics of water at small scales, it’s impossible to detect the local concentration gradient if the bacterium is stationary. The “front” and “back” of the bacterium don’t give different signals. The bacterium has to move to detect changes — it appears to be able to tell if the concentration is higher or lower than it was a little while ago. The one “trick” part is that the flagella are always on in one way or another; they never turn off. Though to be fair, I had forgotten that until I went to go look up the details.

            What you see in the absence of a concentration gradient is that the bacterium does a random walk. It tumbles for a bit, then switches to propelled mode and moves, tumbles, repeat ad infinitum. When there is a concentration gradient, it does the same thing, except that when the concentration is rising during propelled motion, it’s less likely to start tumbling, and when it’s falling, it’s more likely to start tumbling. So it random walks up the gradient, very slowly.

            So the strategy for the bacterium is effectively: how long do I stay in propelled mode before I tumble again to randomize my direction? I think what you would say is that I’ve got this one-dimensional output (how likely am I to start tumbling) and I have to pick what the best option is, and I’ve got this one-dimensional input (how much has the concentration changed in the last bit) that I’m using to determine that. And you combine those somehow to get a “free energy” that is minimized for the “best” option. I can see what you would call a “preferred observation” is: glucose rising. But what is the “predicted” observation here?

            Note as well: the specifics of that strategy (how long do I wait before tumbling given observed concentration change) is hard-coded into the bacterium, although you can do selection experiments to breed bacteria with different strategies.

          • oom says:

            Thanks, that’s helpful context. There’s a fantastic paper by Simon McGregor at the University of Sussex that would answer a lot of your questions. He has implemented a very simple robot (called Ringbot) based on the free-energy principle that operates very much like you describe the E. coli bacterium operating (climbing a gradient of nutrient concentration using a very simple 1-bit (low/high) sensor and a very simple 1-bit (on/off) actuator).
            He presented about it at the 2015 Free Energy Workshop. Link: https://arxiv.org/abs/1503.04187

          • oom says:

            So the strategy for the bacterium is effectively: how long do I stay in propelled mode before I tumble again to randomize my direction? I think what you would say is that I’ve got this one-dimensional output (how likely am I to start tumbling) and I have to pick what the best option is, and I’ve got this one-dimensional input (how much has the concentration changed in the last bit) that I’m using to determine that. And you combine those somehow to get a “free energy” that is minimized for the “best” option. I can see what you would call a “preferred observation” is: glucose rising. But what is the “predicted” observation here?

            The predicted observation is in the same value space as the preferred observation. So if you have a sensor for “glucose rising”, then the prediction could be “glucose rising” as well.
            I’ll take another crack at it. This could actually be simpler than I thought.
            The model could have a hardcoded representation of a sequence/policy “glucose not rising” -> “propel” & “glucose rising” (#1), and another one for “any” -> “tumble” & “glucose rising” (#2), with the latter being preferred by way of prior preferences over policies.
            Policies / sequences are nothing but beliefs about how the world may unfold. They are learned from observations of how the world has unfolded in the past. The difference between policies and sequences is solely that policies involve observations of the agent’s own action (e.g. via proprioception), and sequences do not. Or in other words, policies are just a special case of sequences.
            Friston’s theory says that the model calculates both Free Energy (based on current observations vs. observations expected/required for that policy) and Expected Free Energy (based on predicted vs. preferred observations under that policy) for every policy. The policy whose sum of these two numbers is lowest (after addition of the prior preferences) is selected.
            FE for policy #1 would be low if glucose is not rising, and Expected Free Energy for that policy is always low because “glucose rising” is always preferred.
            FE for policy #2 is high, but not high enough to override the prior over policies. EFE for #2 is also low (see above).
            The sensor for “glucose not rising” / “glucose rising” probably has some latency and some inertia, which would explain the alternating behaviors.
            This is pretty bare-bones. But the behavior is also quite simple. In this example, both policies predict the preferred observation, and the decision is made solely on policy FE via the current observation. More sophisticated models could have policies that do not always apply (may have high FE in most situations), and that would predict different things. These models could respond properly in different situations as well as consider different preferences over time, e.g. a preference for observations of a nutrient becoming higher when it’s running low internally.
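            To make the selection rule described above concrete, here is a toy numerical sketch of “sum FE, EFE, and the policy prior; pick the lowest”. The two policies and all the numbers follow the sketch in this subthread rather than real chemotaxis, and are invented purely to illustrate the arithmetic:

            ```python
            # Toy version of the selection rule above: score each policy by current free
            # energy (FE), expected free energy (EFE), and a prior cost, then pick the
            # lowest total. Policies and numbers follow the sketch above, not real biology.

            def select_policy(glucose_rising: bool) -> str:
                policies = {
                    # name: (FE given current observation, EFE, prior cost)
                    "propel": (0.0 if not glucose_rising else 2.0,  # policy #1 fits "glucose not rising"
                               0.5,                                  # predicts the preferred observation
                               1.0),                                 # not favoured by the policy prior
                    "tumble": (1.5,                                  # policy #2 fits "any" observation loosely
                               0.5,
                               0.0),                                 # favoured by the policy prior
                }
                totals = {name: fe + efe + prior for name, (fe, efe, prior) in policies.items()}
                return min(totals, key=totals.get)

            print(select_policy(glucose_rising=False))  # "propel": its FE drops to zero
            print(select_policy(glucose_rising=True))   # "tumble": propel's FE is now high
            ```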

    • Doesntliketocomment says:

      It’s hard to reduce all of these very concrete, salient motivations to the overarching one – survival.

      I think you are making a mistake with this line. From what I can see, survival is not the primary motivator of this system, it is just the likely outcome from pursuing allostasis with regards to body demands and the environment. By this theory, if a behavior matches my predictions (beliefs) strongly enough, I will choose it even if it is not the one most likely to guarantee my survival. Likewise, a course of action that runs extremely contrary to my predictions will be ignored, even if it would promote my survival.

    • Tracy W says:

      It’s hard to reduce all of these very concrete, salient motivations to the overarching one – survival.

      I’d say it’s impossible. How do you explain suicide bombers? Or, that, in the sinking of the Titanic, men travelling first class had a lower survival rate than women travelling third class?

      • oom says:

        Suicide bombers have been maneuvered into a situation where they *believe* their only choices are to press that button and be delivered into some kind of paradise, or to face a lifetime of shame and/or agony. If the former seems preferable, they will press the button. Action is based on beliefs, not on actual outcomes.
        Same mechanism with Titanic casualties.

        • Tracy W says:

          Yep.

          To quote Francis Bacon from the 17th century:

          It is worth the observing, that there is no passion in the mind of man so weak, but it mates and masters the fear of death; and therefore death is no such terrible enemy when a man hath so many attendants about him that can win the combat of him. Revenge triumphs over death; love slights it; honor aspireth to it; grief flieth to it; fear pre-occupateth it; nay, we read, after Otho the emperor had slain himself, pity (which is the tenderest of affections) provoked many to die, out of mere compassion to their sovereign, and as the truest sort of followers.

    • nameless1 says:

      > A person who seems motivated by money has a strong belief that money will increase the precision of future observations and thus decrease the uncertainty in their model.

      I really don’t understand this. It seems the whole model is equating desires with predictions in a way I don’t get – I can expect things I don’t like, and I can not expect things I like. A person motivated by money has a strong belief that money will put him in a state that he will like. Perhaps he is unsure how exactly that will turn out, but it will be likeable. We can predict very precisely our future observations in a dark room, but we would not like them.

      Every explanation I saw so far said something like “and there are preferred predictions”. Yeah. Those are called desires. So why do we even need to add expectations and predictions to the model at all when it can just be explained by desire? A person motivated by money believes his future rich state will be likeable. When we see a dark room we are not motivated to sit in it because we believe it would not be likeable. Why do we even have to bother about predictions at all?

  29. greghb says:

    > > It is this enactive, embodied, extended, embedded, and encultured aspect that is lacking from the Bayesian brain and predictive coding theories.

    It’s a heuristic of mine that if someone employs an avalanche of adjectives to convey an idea of theirs, it means (a) they have high verbal intelligence, and (b) they don’t quite have their idea nailed down well enough to explain it. I think what’s going on is the speaker knows they haven’t quite nailed it, they don’t want the listener to inaccurately reduce their idea to a simpler idea, and so they gesture at a bunch of concepts that cover such a broad range that it at least makes the point that they mean to convey something complex, even if they can’t actually do it.

    This seems to fit pretty well with “enactive, embodied, extended, embedded, and encultured” … two of which seem maybe to be coined on the spot.

    • Freddie deBoer says:

      the opening lines of the Hobbit beg to differ

      • greghb says:

        In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.

        I guess it’s a bit different because he’s using it in the negative: it’s not an adj, adj, adj hole, nor an adj, adj, adj hole. Also this heuristic is more fairly applied only to very abstract adjectives.

        But it’s still a pretty good fit: Tolkien has a particular kind of hole in mind, knows that we’re going to misunderstand him if he just says “a comfortable hole”, and so basically tries to knock us off our preconceptions about default holes, rather than state affirmatively what he means by a hobbit hole. And then, once he’s knocked us off our preconceptions, he goes on to show us the kind of hole he really does mean.

        • poignardazur says:

          Yeah. It’s funny, I can’t find the words to describe it (and it would be pretty ironic if I used a lot of vague words to expand on your idea), but the way Tolkien communicates in that paragraph is the exact opposite of the way Friston communicates.

  30. But part of me does feel like there’s a weird connection between curiosity and every other drive. For example, sex seems like it should be pretty basic and curiosity-resistant. But how often do people say that they’re attracted to someone “because he’s mysterious”? And what about the Coolidge Effect (known in the polyamory community as “new relationship energy”)? After a while with the same partner, sex and romance lose their magic – only to reappear if the animal/person hooks up with a new partner. Doesn’t this point to some kind of connection between sexuality and curiosity? […]

    The only problem is that this is a really specific kind of uncertainty reduction. Why should “uncertainty about what it would be like to be in a relationship with that particular attractive person” be so much more compelling than “uncertainty about what the middle letter of the Bible is”, a question which almost no one feels the slightest inclination to resolve?

    I think this close relationship between uncertainty and every other desire might just naturally fall out of a framework in which we always are uncertain, in an uncertain world, but nonetheless have desires other than reducing uncertainty. (Which is what most people assume most of the time, if they aren’t subscribers to some uncertainty-is-everything theory.) Uncertainty is relevant to everything because we’re uncertain over outcomes; our decisions are (or could be) based on (approximations to) expected utility theory, and you need a probability distribution and a utility function to calculate an expected utility.
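    As a concrete illustration of that last point, here is the textbook expected-utility calculation with made-up numbers: the probabilities carry the uncertainty, the utilities carry the desires, and neither has to be reduced to the other.

    ```python
    # Expected utility: EU(action) = sum over outcomes of P(outcome | action) * U(outcome).
    # Probabilities carry the uncertainty; utilities carry the desires.

    def expected_utility(outcomes):
        """outcomes: list of (probability, utility) pairs for one action."""
        return sum(p * u for p, u in outcomes)

    date_the_mysterious_guy = [(0.1, 100.0), (0.9, -5.0)]   # small chance of a far better future
    look_up_the_middle_letter = [(1.0, 0.1)]                # certain, but nearly worthless

    print(expected_utility(date_the_mysterious_guy))    # ≈ 5.5
    print(expected_utility(look_up_the_middle_letter))  # 0.1
    ```

    On these (invented) numbers the mysterious relationship wins not because it reduces uncertainty, but because a small chance of a very good outcome outweighs the certain but trivial payoff of settling the Bible question.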

    So things like curiosity seem like they could be related to tuning the balance between exploration and exploitation, which will always be necessary in a world that is not fully known. (Concretely, we are initially capable of imagining that the relationship with the mysterious guy might lead us into a future that’s far better than the present, since his mysteriousness makes it so hard to clearly argue that this won’t or can’t happen.)

  31. drethelin says:

    The simple answer to “how can this possibly explain everything?” is that it doesn’t, and the simple addition that lets it do so is weightings. The brain doesn’t try to reduce all uncertainty equally hard; it is weighted by reproductive success. Every lifeform is trying to occupy a certain state that, for its ancestors, was associated with reproduction. Different lifeforms live in differently complex environments, and various inputs have varying levels of importance, so they are programmed to weight the various inputs in increasingly complex ways as the animal gets bigger and more capable.

    Even very simple animals like nematode worms make this process meta by learning, eg they have a memory and can associate the presence of food with non-food chemical signals and preferentially burrow in a way that will lead to food in the environment they find themselves in.

    We have a ton of built-in weightings on the genetic level, and we have a system for generating further weightings of the importance of certain kinds of information based on our lived experience.

    This is why almost everyone is more curious about sex than the middle letter of the bible but also why some people are in fact curious about the middle letter of the bible: the world is so complicated that our evolved learning system can actually get sidetracked from the path of reproduction.

  32. rahien.din says:

    Wouldn’t this just plain prove too much?

  33. vV_Vv says:

    It’s tempting to throw this out entirely. But part of me does feel like there’s a weird connection between curiosity and every other drive. For example, sex seems like it should be pretty basic and curiosity-resistant. But how often do people say that they’re attracted to someone “because he’s mysterious”?

    If anything this looks like uncertainty seeking, not uncertainty reduction.

    A mysterious man (it’s only women who say they like a mysterious partner) has a small but non-negligible chance of having a very high socio-sexual value, while with the average plain guy what you see is what you get. Of course, women don’t chase all mysterious men, no woman cares about what mysteries the ugly weirdo may hold, mysteriousness is only positive in men that are already reasonably attractive and display high-status signals.

    And what about the Coolidge Effect (known in the polyamory community as “new relationship energy”)? After a while with the same partner, sex and romance lose their magic – only to reappear if the animal/person hooks up with a new partner. Doesn’t this point to some kind of connection between sexuality and curiosity?

    As Wikipedia says: “The evolutionary benefit to this phenomenon is that a male can fertilize multiple females. The male may be reinvigorated repeatedly for successful insemination of multiple females. This type of mating system can be referred to as polygyny, where one male has multiple female mates, but each female only mates with one or a few male mates.”

    What about the typical complaint of porn addicts – that they start off watching softcore porn, find after a while that it’s no longer titillating, move on to harder porn, and eventually have to get into really perverted stuff just to feel anything at all? Is this a sort of uncertainty reduction?

    I think this novelty seeking behavior of porn addicts is due to a combination of the Coolidge effect and the desensitization effect that also occurs in other addictions.

    So life isn’t doing four things – perceiving, thinking, acting, and maintaining homeostasis. It’s really just doing one thing – minimizing free energy – in four different ways – with the particular way it implements this in any given situation depending on which free energy minimization opportunities are most convenient. Or something.

    The problem is that these four things are all qualitatively different; they are probably accomplished by different mechanisms and only look related if you squint very hard. So is this “free energy minimization” paradigm a useful description of the world? It seems to me that even Friston more or less acknowledges that it does not make falsifiable predictions, but does it at least catalyze intuitions? If it does, I can’t see how.

    In general, I’m skeptical that there exists any fundamental principle of animal behavior that applies at the individual level. The correct way to look at it is at the evolutionary, population level: the woodlice scurry in the sun because they evolved in an environment where scurrying in the sun tended to increase evolutionary fitness, you crave pizza because the human species evolved in an environment where eating carbs and fats tended to increase evolutionary fitness, and so on. The nervous systems of woodlice and humans implement these behaviors with a complicated series of hacks; some of these hacks, especially in more complex animals, may resemble variational Bayesian inference, expected utility maximization, reinforcement learning, or whatever, but ultimately the whole thing is a Rube Goldberg machine that stochastically happens to work right when placed in an environment similar enough to the evolutionary one.

  34. Eli says:

    Ok, now a post on motivation, affect, and emotion: attempting to explain sex, money, and pizza. Then I’ll try a post on some of my own theories/ideas regarding some stuff. Together, I’m hoping these two posts address the Dark Room Problem in a sufficient way. HEY SCOTT, you’ll want to read this, because I’m going to link a paper giving a better explanation of depression than I think Friston posits.

    The following ideas come from one of my advisers who studies emotion. I may bungle it, because our class on the embodied neuroscience of this stuff hasn’t gotten too far.

    The core of “emotion” is really this thing we call core affect, and it’s actually the core job of the brain, any biological brain, at all. This is: regulate the states of the internal organs (particularly the sympathetic and parasympathetic nervous systems) to keep the viscera functioning well and the organism “doing its job” (survival and reproduction).

    What is “its job”? Well, that’s where we actually get programmed-in, innate “priors” that express goals. Her idea is, evolution endows organisms with some nice idea of what internal organ states are good, in terms of valence (goodness/badness) and arousal (preparedness for action or inaction, potentially: emphasis on the sympathetic or parasympathetic nervous system’s regulatory functions). You can think of arousal and sympathetic/parasympathetic as composing a spectrum between the counterposed poles of “fight or flight” and “rest, digest, reproduce”. Spending time in an arousal state affects your internal physiology, so it then affects valence. We now get one of the really useful, interesting empirical predictions to fall right out: young and healthy people like spending time in high-arousal states, while older or less healthy people prefer low-arousal states. That is, even provided you’re in a pleasurable state, young people will prefer more active pleasures (sports, video gaming, sex) while old people will prefer passive pleasures (sitting on the porch with a drink yelling at children). Since this is all physiology, basically everything impacts it: what you eat, how you socialize, how often you mate.

    The brain is thus a specialized organ with a specific job: to proactively, predictively regulate those internal states (allostasis), because reactively regulating them (homeostasis) doesn’t work as well. Note that the brain now has its own metabolic demands and arousal/relaxation spectrum, giving rise to bounded rationality in the brain’s Bayesian modeling and feelings like boredom or mental tiredness. The brain’s regulation of the internal organs proceeds via closed-loop predictive control, which can be made really accurate and computationally efficient. We observe anatomically that the interoceptive (internal perception) and visceromotor (exactly what it says on the tin) networks in the brain are at the “core”, seemingly at the “highest level” of the predictive model, and basically control almost everything else in the name of keeping your physiology in the states prescribed as positive by evolution as useful proxies for survival and reproduction.

    Get this wrong, however, and the brain-body system can wind up in an accidental positive feedback loop that moves it over to a new equilibrium of consistently negative valence with either consistent high arousal (anxiety) or consistent low arousal (depression). Depression and anxiety thus result from the brain continually getting the impression that the body is in shitty, low-energy, low-activity states, and then sending internal motor commands designed to correct the problem, which actually, due to brain miscalibration, make it worse. You sleep too much, you eat too much or too little, you don’t go outside, you misattribute negative valence to your friends when it’s actually your job, etc. Things like a healthy diet, exercise, and sunlight can try to bring the body closer to genuinely optimal physiological states, which helps it yell at the brain that actually you’re healthy now and it should stop fucking shit up by misallocating physiological resources.

    “Emotions” wind up being something vaguely like your “mood” (your core affect system’s assessment of your internal physiology’s valence and arousal) combined with a causal “appraisal” done by the brain using sensory data, combined with a physiological and external plan of action issued by the brain.

    You’re not motivated to sit in a Dark Room because the “predictions” that your motor systems care about are internal, physiological hyperparameters which can only be revised to a very limited extent, or which can be interpreted as some form of reinforcement signalling. You go into a Dark Room and your external (exteroceptive, in neuro-speak) senses have really low surprise, but your internal senses and internal motor systems are yelling that your organs say shit’s fucked up. Since your organs say shit’s fucked up, “surprise” is now very high, and you need to go change your external sensory and motor variables to deal with that shit.

    Note that you can sometimes seek out calming, boring external sensory states, because your brain has demanded a lot from your metabolism and physiology lately, so it’s “out of energy” and you need to “relax your mind”.

    Pizza becomes positively valenced when you are hungry, especially if you’re low on fats and glucose. Sex becomes most salient when your parasympathetic nervous system is dominant: your body believes that it’s safe, and the resources available for action can now be devoted to reproduction over survival.

    Note that the actual physiological details here could, once again, be very crude approximations of the truth or straight-up wrong, because our class just hasn’t gotten far enough to really hammer everything in.

    • Scott Alexander says:

      I’ll look at this more later – that paper is super dense – but it doesn’t really ring true. There’s a correlation between depression and organ/immune/endocrine dysfunction, but it’s not as spectacular as you’d expect if they were the same thing.

      And some star athletes who are depressed seem to be able to continue excelling at their chosen sport, which makes it hard for me to conceptualize it as a failure of energy mobilization.

      And their claim that growing up in an abusive environment places high *metabolic* demand on the body seems a little ad hoc. Are we sure that kids from bad families have higher metabolic demands than, eg, kids who are athletes? Also, “metabolic efficiency may also be compromised by loss of a loved one”? Really?

      And does happiness/sadness really have much to do with organ quality? It seems more often affected by events in the world.

      Also, I think there’s a lot of evidence that depressed people are *overly* sensitive to interoceptive states, which is why eg depression increases risk for somatosensory feedback loop based panic disorder.

      The paper’s discussion of “unreliable prediction errors”, “insensitivity to prediction error”, etc, all sound a lot like Friston’s view of low neural confidence/precision.

      Maybe the most interesting implication I can think of here is for ketamine as a depression treatment. Kind of interesting that completely dissociating you from your body resets depression somehow.

    • nameless1 says:

      I think I 80% understand the linked paper. Does this help explain why depression seems more common than in the past? I have no stats for that claim, but Hamlet clearly seems to have been meant as a huge exception back then, and now every fourth person is a Hamlet.

      The behavior of other people became far more unpredictable between 1500 and 2000. We usually call this freedom. Could this play a role?

      It could relate to my hobby horse that we need to bring back formal etiquette. In 1900, going to an etiquette and dancing teacher and a tailor gave a man every skill and piece of equipment needed to navigate a ballroom. Hardly any intuitive social skills had to be used. Everything was scripted. Heaven for someone slightly autistic like me. Today we don’t even have clearly defined rules about exactly how much insistence counts as harassment. Today you need to use a lot of empathy to get by socially – you need to read other people’s body language to find out whether their rejecting your courtship attempt is 1) just playful, 2) meant seriously for now, but not necessarily forever, or 3) meant seriously forever.

      We used to have different ways to greet and address people; now it is “Hi, Joe!” to your best friend, your boss, and a coworker you dislike alike, and we have to find out from the intonation whether it is meant in a friendly, respectful, or disliking way.

  35. Eli says:

    Ok, now the post where I go into my own theory on how to avoid the Dark Room Problem, even without physiological goals.

    The brain isn’t just configured to learn any old predictive or causal model of the world. It has to learn the distal causes of its sensory stimuli: the ones that reliably cause the same thing, over and over again, which can be modeled in a tractable way.

    If I see a sandwich (which I do right now, it’s lunchtime), one of the important causes is that photons are bouncing off the sandwich, hitting my eyes, and stimulating my retina. However, most photons don’t make me see a sandwich, they make me see other things, and trying to make a model complex enough that exact photon behavior becomes parameters instead of noise is way too complicated.

    So instead, I model the cause of my seeing a sandwich as being the sandwich. I see a sandwich because there really is a sandwich.

    The useful part about this is that since I’m modeling the consistent, reliable, repeatable causes, these same inferences also support and explain my active interventions. I see a sandwich because there really is a sandwich, and that explains why I can move my hands and mouth to eat the sandwich, and why when I eat the sandwich, I taste a sandwich. Photons don’t really explain any of that without recourse to the sandwich.

    However, if I were to reach for the sandwich and find that my hands pass through it, I would have to expand my hypothesis space to include ghost sandwiches or living in a simulation. Some people think the brain can do this with nonparametric models: probabilistic models of infinite stuff, of which I use finite pieces to make predictions. When new data comes in that supports a more complex model, I just expand the finite piece of the infinite object that I’m actually using. The downside is, a nonparametric model will always, irreducibly have a bit of extra uncertainty “left over” when compared to a parametric model that started from the right degree of complexity. The nonparametric has more things to be uncertain about, so it’s always a little more uncertain.
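    A concrete illustration of that “leftover” uncertainty, using a Chinese restaurant process as a stand-in for a nonparametric model; this is a sketch under that assumption, not a claim about what the brain actually computes:

    ```python
    # In a Chinese-restaurant-process prior (a standard nonparametric model), some
    # probability mass is always reserved for a never-before-seen cluster, no
    # matter how much data has already been observed.

    def crp_predictive(cluster_counts, alpha=1.0):
        """Predictive probabilities of joining each existing cluster or a new one."""
        n = sum(cluster_counts)
        existing = [count / (n + alpha) for count in cluster_counts]
        new_cluster = alpha / (n + alpha)
        return existing, new_cluster

    _, p_new = crp_predictive([50, 30, 20])  # after 100 observations in 3 clusters
    print(p_new)  # ≈ 0.0099: small, but never exactly zero
    ```

    However many clusters have been seen, a fraction alpha/(n + alpha) of the predictive mass sits on “something I have never seen before”, which is the irreducible extra uncertainty being described here.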

    How can these ideas apply to the Dark Room? Well, if I go into a Dark Room, I’m actually sealing myself off from the distal causes of sensations. The walls of the room block out what’s going on outside the room, so I have no idea when, for instance, someone might knock on the door. Really knowing what’s going on requires confidence about the distal causal structure of my environment, not just confidence about the proximal structure of a small local environment. Otherwise, I could always just say, “I’m certain that photons are hitting my eyeballs in some reasonable configuration”, and I’d never need to move or do any inferences at all.

    It gets worse! If my model of those distal causes is nonparametric, it always has extra leftover uncertainty. No matter how confident I am about the stuff I’ve seen, I never have complete evidence that I’ve seen everything, that there isn’t an even bigger universe out there I haven’t observed yet.

    So really “minimizing prediction error” with respect to a nonparametric model of distal causes ends up requiring that I not only leave my room, but that I explore and control as much of the world as possible, at all scales which ever significantly impact my observations, without limit.

    • Scott Alexander says:

      Not sure I follow. It seems like a rich person acting to minimize their prediction error could just go into a dark room, pay somebody to handle their affairs for them and make sure they weren’t bothered, and they would be 100% correct that this would all work out.

      (I guess this assumes the rich person starts with some minimum understanding of the world such that they know this to be true. But I know this to be true, I will eventually have enough money to retire on, and I don’t feel any urge to do this. Why not?)

      Also, most people don’t try to explore and control as much of the world as possible. A lot of people are totally happy going to work, going home, and repeating for 50 years.

      • Eli says:

        Yeah, that post was more of a completely theoretical consideration from the “purely Bayesian” point of view. Your imaginary rich person seems to do that because you’re not really imagining how nonparametrics work very well ;-).

        But still, I don’t really put this forward as a real explanation: I’d go with the affect system directing you away from the Dark Room over anything else.

      • Doesntliketocomment says:

        I would like to add for consideration that some rich people do almost exactly this – they become recluses, minimizing all contact with the outside world and novel stimuli. As an explanation as to why more do not, one could argue that the behaviors that led to them becoming rich were the manifestation of their drive for control, and that they wouldn’t give that up as long as it seemed a workable strategy.

  36. Michael Thomas says:

    Perhaps the concept makes more sense if it’s not about reducing uncertainty in general, but reducing uncertainty about a specific thing – such as dying or being able to reproduce.

    Either of these (reducing uncertainty of death or reducing uncertainty about being able to reproduce) would make the pizza example make a little more sense (although I agree the example kind of sucks), since it then becomes reducing uncertainty about dying from hunger instead of having something to do with some BS mental model of eating pizza. And this example could then also take into account how full or hungry you are, which would decrease or increase the uncertainty respectively.

    Both of these would also make more sense from an evolutionary perspective, since it would make very good evolutionary sense that a brain wired to reduce the uncertainty of dying or failing to reproduce would increase evolutionary survival.

    However, looking at it from an evolutionary perspective, the uncertainty about dying might actually be a secondary effect, and the uncertainty about being able to reproduce might actually be the first-order effect. This, then, would also help explain the drive to sex (including, sometimes, a drive to sex that is stronger than a drive for self-preservation).

    Once you clarify it like that, a lot of other things fall in line. The dark room example goes away as well, since sitting in a dark room all by yourself drastically increases the uncertainty about being able to reproduce (unless there’s a similarly stressed-out member of the opposite sex in there with you…). And all the other things that affect survival and reproduction automatically come into play (which, well, they had better, for the principle to make any sense).

    With this little tweak (the brain is optimized to minimize the uncertainty of being able to reproduce), the principle makes a lot more sense to me on a whole lot of levels.

  37. caethan says:

    Took the time to read through the linked scihub paper discussing Friston. Here’s one of the papers they’re discussing, by Friston:

    The default-mode, ego-functions and free-energy: a neurobiological account of Freudian ideas

    What.

    Alright, my crackpot needle – which was already registering high – is now pretty much pinned at “WHY IS ANYONE TAKING THIS FRAUD SERIOUSLY”. For those unwilling to waste their time, this is a discussion of how Friston’s theories prove that Freud was right, dammit, and that all of his stupid and unsupported constructs fall right out of Friston’s framework.

    • Eli says:

      Honestly that paper really does look like Friston maximizing his publication count, and I’ve marinated in this stuff so long I find the equations to actually be readable with some effort, and the free-energy equation itself intuitive. Bunk! BUNK I SAY!

    • oom says:

      Cognitive dissonance strikes again 🙂

  38. ThrustVectoring says:

    For what it’s worth as a personal anecdote: I like listening to music that I don’t know why I like. Pretty much when I understand why I like a song, I stop listening to it. Until then, I can listen to a single song on repeat for hours.

    I might be weird, but anyways, it feels like the same sort of thing as described.

    • Said Achmiz says:

      Can you give an example of “understanding why you like a song”? I am not sure what this could mean.

  39. jaredtobin says:

    The discovery that the only human motive is uncertainty-reduction might come as a surprise to humans who feel motivated by things like money, power, sex, friendship, or altruism. But the neuroscientist I talked to about this says I am not misinterpreting the interview. The claim really is that uncertainty-reduction is the only game in town.

    You can actually derive the same idea from a sort of first-principles philosophical argument. Note that it requires an intuitionist (i.e. correct ;)) ethical calculus, as otherwise you run into the is/ought problem:

    * One ought do something (this is uncontroversial to me, so I take it as axiomatic).
    * If one ought do something, then there exists some set of actions that are better than others. In other words, there is a partial order on actions.
    * If there is a partial order on actions, then there exists at least some class of ‘best’ actions that one could take in any context (I’ll call any element of this class the ‘best’ action).
    * If one knew the result of every possible action, then one would necessarily know the best action to take in any context. Equivalently: when one doesn’t take the ‘best’ action, the difference is necessarily due to uncertainty.
    * Uncertainty is characterized by entropy, which is ordered. An action taken in the context of more uncertainty is necessarily no better than one taken in the context of less uncertainty.
    * If one ought do Y, and doing X partially reveals Y, then one ought do X.
    * Since reducing uncertainty partially reveals what one ought do, one ought reduce uncertainty.

    • Charlie__ says:

      Read up on the multi-armed bandit problem. Just because reducing uncertainty helps us achieve goals, doesn’t mean that reducing uncertainty is the goal.

    • nameless1 says:

      There is nothing we ought to do, because there is no objective system of values or preferences by which one outcome is objectively better than another.

  40. Kevin says:

    This seems like an information theoretic/thermodynamic tautology to me.

    I apologize if this idea has already been put forth. I scanned the comments, and also searched specifically for “entropy” and “information theory”, but didn’t see anything directly duplicative.

    As many readers here undoubtedly know, information theory and thermodynamics seem to be either very closely related or different views of the exact same underlying concept. Entropy measures information content but the universe grinds towards maximum entropy. Anything that measures information also measures entropy.

    So, in order for a thing to be something rather than nothing, it must contain information = be non-random. Any system that pursues a goal or replicates must create information = reduce randomness. The brain is such a system, so one expects that we can reduce all brain operations to producing bits of information = increasing order = reducing entropy. This also seems exactly the same as minimizing prediction error, which simply means you can’t reduce the random residual any further, so you’ve extracted the maximum amount of order from the system.

    My strong prior would therefore be that if you tried to apply full reductionism to the brain, bit creation / negentropy -ish math would eventually emerge.

    Does anyone here understand how this Free Energy Hypothesis differs from such math?

    It’s been a couple of decades since I did such math, but my memory is that it’s easy to lead yourself in circles because you’re essentially proving 1=1, 0=0, and 1!=0. So that would seem a possibility here.

  41. Ilya Shpitser says:

    This is bullshit, I think.

    The only part that’s “not bullshit” is the obvious idea that some of the stuff the brain does can be framed as a maximization/minimization problem (so for example, if you catch a baseball in real time with your hand, you can think of that as minimizing the distance between your hand and the ball over time — duh).

    All that “variational Bayes” and “free energy” business is keywords designed to bullshit people.

  42. Charlie__ says:

    I read the 2009 letter in Cell. It was very clear that this was a proposal for a model of human perception and action that was not at all tautological. But it didn’t explain why we’d expect this model to be true… instead, it had a lot of handwaving, and for “more details,” referred me to the 2010 Nature paper. Which I then skimmed, looking for the derivation or motivation of these equations (e.g. from figure 1 in Friston 2009). Of which I found exactly nothing.

    Basically, when presented with an idea, it’s often hard to tell whether it’s true in a vacuum. But it’s not so hard to evaluate why it’s true – there are so many false things that if you believe something without good reason, it’s probably false. So rather than delving into issues with the idea itself, which might lead to engaging with some very vague writing, it’s a lot easier to just note that the mathematical parts of this model are pulled directly from the posterior.

  43. pjs says:

    > We now get one of the really useful, interesting empirical predictions to fall right out: young and healthy people like spending time in high-arousal states, while older or less healthy people prefer low-arousal states. That is, even provided you’re in a pleasurable state, young people will prefer more active pleasures (sports, video gaming, sex) while old people will prefer passive pleasures (sitting on the porch with a drink yelling at children)

    Personally, I think the single biggest thing that might help me understand all this is to see one example/prediction of the theory (or a theory in the framework) being made – so I can see a concrete example of it in action.

    I’m not enamoured of this particular “really useful, interesting” prediction – I don’t feel it’s very precise or even true (and even if true, what does it mean – is it that there’s a tiny statistical difference between the entire populations of young vs old people, or that there is a near universal difference, or is it a ceteris paribus comparison (which sounds most plausible, but then you explicitly invoke differing health status, so I guess not), … ).

    But forget this quibble, and grant that it’s a concrete prediction (true or interesting, doesn’t matter). Can you say anything more about how it falls right out? (Is that supposed to be obvious from what you had said prior to this claim?) I feel it could be very interesting to help some of us understand the ideas here if we could see a bit more about the chain of reasoning leading to this prediction.

    • Eli says:

      I’d have replied a lot quicker if you’d been replying in a thread to me!

      But the part about age versus arousal? That’s from this study, and since my neuro adviser brought it up at one point (might have been in her book), I’d hope the result has remained stable since 2012. Some kind of similar study has found a similar result at least once.

      My chain of theory for this prediction is: “arousal” is the variable for action-readiness versus relaxation. High arousal is HYPED UP BRO, versus low arousal being more “chilled”. Being in an aroused state requires physiological resources: glucose, oxygen, etc. It releases cortisol, and a bunch of other things that can cause inflammation if they stay in your system too long (which is the major source of stress-related illness, btw).

      If you’re older, your body gets inflamed and suffers from the high-arousal states more quickly, so it learns to prefer lower-arousal states — or so my theoretical reasoning goes.

      • vV_Vv says:

        The observation that young and healthy people are more active than old or sick people is not exactly surprising.

      • pjs says:

        Thanks for replying (sorry for posting my question in the wrong place.)

        IMO it undermines your point to predict that, e.g., “young people will prefer more active pleasures (sports, video gaming, sex)”, because these activities themselves (well, maybe not so much gaming) are physiologically costly, and so may be less pleasant to actually perform for older/iller people. Adding ‘arousal’ to this rather obvious story is a needless complication.

        So I’d say it’s best to do what the study you cite did, and just talk about differences in emotion, not what people actually prefer to do. The study as I read it (and not inconsistent with what you say) says that physical condition explains a large part of the effect found (and maybe essentially all, I don’t think the statistics presented are definitive on that). This reinforces why it’s so bad and confusing to claim your theory predicts activity preferences: of course declining physical condition influences whether people like doing active stuff, so I don’t think you should claim anything like this as a prediction (it sounds silly IMO.)

        But so far as I can understand, I think the not-so-obvious claim you (and the study) are making is something different: that there are emotional states (in particular, arousal) that are in themselves physiologically challenging. Is this right?

  44. HeelBearCub says:

    I have no idea what Friston is on about.

    But, damn, calling it “Free Energy” sounds like trolling to the Nth degree. So much so that I’m surprised you did not taboo the term when trying to review the material. Substitute “perpetual motion” as the term and see how well it scans.

    As to what “Free Energy” is, well, I think Darwin et al. covered that. If he isn’t just talking about long term (genetic) survival, well, then he has a bone to pick with them. And if he isn’t addressing that, then I think my troll meter went up even more.

  45. Forge the Sky says:

    Perhaps I’m misunderstanding, but I’m not quite sure how a rigorous version of this theory even operates with (for example) our knowledge that different brain structures reliably form to perform distinct functions, or that cognitive processes are molded by different neurotransmitters. Unless this theory is just saying something kind of uninteresting (at least, uninteresting if one already understands predictive processing) about the fundamental way the brain does its calculations, in the same way one might say that everything a computer does is down to moving 0’s and 1’s around.

    The one way I can think to rescue it gets pretty weird pretty quickly, and I’m not sure how to describe it well. Basically it involves seeing all biological life and its environment as a sort of ‘superorganism’ within which this ‘free energy’ process takes place – with each individual expression of a behavior in (say) a bacterial colony working in an analogous fashion to neurons competing for low free-energy states, as is eventually expressed in a successful colony with a strong biofilm or something; all the way up to each iteration of a human brain being an attempt to find a low free-energy state within the current environment, complete with its hardwired architecture as part of that expression.

    Then, at the level of thoughts and actions, all neurological firing regardless of purpose becomes part of one big process that can be described under a unitive ‘free energy’ theory.

    It’s unclear to me how much utility a concept with such a broad scope could hold, at least where it comes to trying to understand human psychological functioning.

  46. JASSCC says:

    Does Friston’s theory permit second-order minimization of predictive error?

    The pizza case gets less wacky if the mind is monitoring its predictive capability and notices somehow when it has predicted that hunger is going to distract from accurate prediction unless a solution to hunger is supplied soon. In a certain locale, the quickest and most acceptably predictable solution might be eating a piece of pizza, so the system latches onto that specific “prediction” of action to minimize the predicted hunger-induced free energy.

    In this way, while the process remains steady, the details of what is being predicted vary in response to conditions, including proprioception and other core biological inputs, because predictable undesirable outcomes would in the future lead to a breakdown in predictive power.

    I don’t find this very compelling, but it makes more sense to me than jumping right to predictions about what you’re going to eat — there has to be some mechanism to get basic drives in. The mechanism I am suggesting might make sense with Friston’s model is that those drives act as something like interrupts that are addressed because they would otherwise subvert predictive capability.

  47. herculesorion says:

    “So life isn’t doing four things…[i]t’s really just doing one thing – minimizing free energy – in four different ways – with the particular way it implements this in any given situation depending on which free energy minimization opportunities are most convenient. Or something. This might be useful in some way?”

    I think the idea is that it’s useful because you can draw a philosophical equivalence between “fix the problem”, “gather more data about the situation to recognize that there is no actual problem”, “think about the situation until you understand why it’s not actually a problem”, and “achieve a mental state in which you don’t care about the problem”. Meaning, when someone complains about (X), replying “(X) is not a problem and here’s why” is as useful a response as “here’s how we could fix (X)”.

  48. herculesorion says:

    I must say, the whole thing sounds like the Efficient Markets Hypothesis, where any data that seems to counter the hypothesis can be explained as actually confirming it. People intentionally put their money in a hole and set it on fire? Well, I guess they just wanted to be warm and see pretty fire more than they wanted money, EMH works!

  49. Freddie deBoer says:

    When my graduate stats prof told me not to take a class in Bayesian statistics because I don’t know calculus, it was one of the most disappointing moments of my life.

  50. Sammerman says:

    I’ve always thought the prime drive is minimizing cognitive dissonance (slightly different from minimizing uncertainty). I think most disorders hinge on a failure of expectations to match or model reality. E.g. if I think I am X, or that the world works in Y way, I am not going to have a good time when I find out it does not – this may result in amusement or pain, depending on the consequences and extent of the dissonance and my predisposition to find it funny, amusing or terrifying. If I’m lucky enough, I may get pleasure out of the harmony of correct predictions and modeling.

    • Sammerman says:

      Not sure if true, but you could probably argue that most internal issues people deal with come down to some disposition or thought bias. E.g. the belief “I can be Superman”, when constantly rejected by reality feeding back “I am average”, results in a dissonance that is known as depression.

      IMO, content people are people who lack abnormal biases. Smart people are essentially good modelers, or at least good modelers of their own abilities and roles.

      While I have a podium, some more wonky ideas too:

      Consequence: learning is the matching of expectations to some (external) reality; the matching and resolving of the dual stimuli of what you already know and the new information.

      Learning is the pairing of existing expectations + new information: reality turns out slightly different from the known, and you minimize the dissonance to solidify knowledge. I didn’t know she had a sister… now I do. I knew there were stars, but I didn’t know you could call clusters of stars constellations… thanks for the word, now I do, etc.

      We are learning machines, we are dissonance minimizing machines.

      Creative production is the proposition of new ideas then the minimization of dissonance within those ideas.

      Known: I am funny, I want to make her laugh. I have those ideas; let me minimize the dissonance and achieve the goal by telling a joke.

      The minimization of dissonance goes all the way from Pavlov’s dog (I hear the bell, I expect food) up to higher cognitive processes.

      Those higher processes are tainted by our biases, and those biases in expectations are our personalities? Maybe.


      Totally unrelated, but endemic to thought:
      The ability to do something = (motivation – internal resistance – external resistance) × innate ability

      • Sammerman says:

        *Those higher processes are tainted by our biases, and those biases in expectations form the aesthetics that are our personalities?

  51. amaranth says:

    you’re seeing Separate Human Conscious Agents as the thing, silly. the agent which is reducing free energy by humans having sex is more like homo sapiens or gaia than “you”

    • Forge the Sky says:

      I said basically as much above. I’m not sure it’s correct, or that it’s what Friston meant, but it’s the only coherent way to parse it.

  52. ohwhatisthis? says:

    So what do I sum up *this* one as?

    Word play? A weird rephrasing of “This phenomenon follows MATH”?
    Not even wrongness?
    A giant spaghetti bowl of terms not used how they usually are?

    No. Life didn’t start by minimizing uncertainty (though I’m sure you can somehow add a heap of quantum words to it, something something reduce-energy chemical reaction). The brain is this massive giant mess, with some memory systems here, some neural networks with a function that ends up updating itself when wrong (I’m not sure I would call it Bayes though), and a bunch of chemical levers that are really, really weird and can’t often easily be put into the above, yet somehow end up making the system run.

  53. IdleKing says:

    In Friston’s “inverted U” quote, he seems to be saying that what we enjoy is not *being in a state* of low uncertainty or low surprise, but rather the *experience of reducing* our uncertainty. E.g. being told about a new puzzle isn’t a source of pain but rather joy, since we can then enjoy the process of investigating the puzzle. And being in a dark room is boring, not because we are in a state of high or low uncertainty, but because whatever uncertainty we do have affords no opportunities for investigation. (Equivalently, we enjoy being in a state of reducible uncertainty, and dislike irreducible uncertainty?)

  54. Tracy W says:

    In a sense, it must be true that there is only one human motivation. After all, if you’re Paris of Troy, getting offered the choice between power, fame, and sex – then some mental module must convert these to a common currency

    I don’t see how this follows. Let’s take standard economic theory. You have a utility, U, which is a function of various things. So we can say:
    U=f(power, fame, sex, cat pics, brown paper packages tied up with string…)
    Where U is increasing w.r.t each of these (we can turn ‘bads’ like ‘stinks’ into a good by talking about the absence of said bad.)

    If you are trying to choose between power, fame and sex, you choose whichever will give you the greatest marginal increase in utility.

    Note that utility isn’t the One Human Motivation, because it’s a function of all sorts of motivations.
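
    To make the point concrete, here’s a toy sketch (the utility function and all the numbers are made up purely for illustration): utility is a function of several distinct motivations, and the chooser just picks whichever offer gives the biggest marginal gain.

    import math

    # Made-up utility function over several distinct goods, with diminishing returns.
    def utility(power, fame, sex, cat_pics):
        return (math.log(1 + power) + 0.5 * math.log(1 + fame)
                + 2.0 * math.log(1 + sex) + 0.1 * math.log(1 + cat_pics))

    current = dict(power=3, fame=1, sex=2, cat_pics=50)
    offers = {
        "power": dict(current, power=current["power"] + 5),
        "fame": dict(current, fame=current["fame"] + 5),
        "sex": dict(current, sex=current["sex"] + 5),
    }

    base = utility(**current)
    gains = {name: utility(**bundle) - base for name, bundle in offers.items()}
    print("Paris picks:", max(gains, key=gains.get))
    print("marginal gains:", {k: round(v, 3) for k, v in gains.items()})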

    Also can I add that the term “free energy” is totally raising my crackpot hackles? Probably very unfairly.

    • Peter says:

      The reputable use of “free energy” I’m familiar with is from thermodynamics; I’m a chemist originally, we have an interesting take on thermo.

      Free energy: A = U – TS, where A is the (Helmholtz[1]) free energy, U is internal energy (i.e. everything but heat – due to conservation of energy, changes in U are equivalent to heat gain/loss with the opposite sign), S is entropy and T is temperature. Generally we’re interested in changes – ΔA = ΔU – TΔS – and the free energy lets us lump together entropy changes due to heat production/absorption and entropy changes due to everything else, and use that to predict e.g. where chemical equilibria will be. In a way it’s a bit like utility – a way of lumping together dissimilar things which might not even have the same units and coming up with something you can calculate with.
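
      As a toy numeric sketch of how the lumping-together works (the numbers are entirely made up, not a real reaction): compute ΔA = ΔU – TΔS at a few temperatures and read off whether the change is favourable.

      # Made-up numbers, just to show the energy/entropy trade-off inside a free energy.
      delta_U = -40000.0   # J/mol, internal energy change (releases energy)
      delta_S = -100.0     # J/(mol*K), entropy change of the system (it becomes more ordered)

      for T in (200.0, 300.0, 500.0):          # temperature in kelvin
          delta_A = delta_U - T * delta_S      # Helmholtz free energy change
          verdict = "favourable" if delta_A < 0 else "unfavourable"
          print("T = %5.0f K: dA = %+8.0f J/mol -> %s" % (T, delta_A, verdict))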

      Entropy also shows up in information theory, confusingly they use the letter H [2] rather than S. The maths of information theoretic (“Shannon”) entropy is pretty similar to thermodynamic entropy [3], there are some deep connections between the two that can e.g. define hard physical limits on computing power[4]. So I suppose it was only a matter of time before “free energy” started showing up, probably with an awkward relationship to the thermo version. My guess is that this implies that information theoretic versions of “internal energy” and “temperature” should show up somewhere. There’s a “temperature” in simulated annealing which may or may not be relevant, dunno about “internal energy”.

      The terminological nightmare and twisted maze of things that are sort-of-like-each-other-but-different seems conducive to both a) actually useful actual genius and b) Aaron Smith-Teller style “genius”, and telling the two apart may be difficult.

      [1] There’s also Gibbs free energy, which is similar, but is applicable to constant-pressure rather than constant-volume systems. Since chemists like to work at constant pressure (it’s easier to set up and less likely to implode/explode), we tend to use the Gibbs. G = H – TS.
      [2] In thermo, H is “enthalpy”, which is basically the constant-pressure equivalent of internal energy.
      [3] Strictly speaking it looks like thermodynamic entropy is more like information theoretic “self-information” or “surprisal”.
      [4] There’s some freaky thing called “reversible computing” that supposedly circumvents this but it involves never throwing any information away, which could get tricky. Thermodynamic reversibility is a slippery concept and is confusingly different from the chemist’s concept of reversible reactions, which tend to get analysed using thermo. Oh gods, the terminology.

  55. deciusbrutus says:

    “Minimize uncertainty” sounds like a version of paperclip maximizer that doesn’t sound so stupid that nobody would ever make it.

    • Toby Bartels says:

      —We built this AI to minimize uncertainty, because Friston's theories proved that this was the coherent extrapolated volition of all humanity and indeed of all life.

      —So that's why we're all forced to remain still in this featureless dark room?

  56. wcraig3927 says:

    All of this reminds me a great deal of some of the work leading up to Vernon Smith (2002 economics Nobel prize winner) like Herbert Simon’s notion of bounded rationality and F. A. Hayek’s book The Sensory Order. I’m curious as to whether you have any thoughts on that work, Scott.

  57. rharang says:

    Sorry if this was brought up above and I missed it, but there’s also a significant difference between the goals of “minimize free energy/minimize prediction error” and “minimize uncertainty”. If the actual distribution of future events from some point forward is completely uniform (maximum uncertainty), you minimize free energy and future prediction error by making your own approximation of the future distribution as close to uniform as possible. You’d minimize prediction error and free energy by making yourself as uncertain as possible about the future.
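
    A quick numeric sketch of that point (toy distributions, and assuming the usual KL-divergence reading of prediction error): when the true distribution P is uniform, the Q that minimizes prediction error is itself uniform, i.e. the “best” predictor is maximally uncertain.

    import math

    def kl(q, p):
        # Kullback-Leibler divergence KL(Q || P), in nats
        return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

    def entropy(q):
        return -sum(qi * math.log(qi) for qi in q if qi > 0)

    p_true = [0.25, 0.25, 0.25, 0.25]   # a maximally uncertain world
    candidates = {"uniform Q": [0.25, 0.25, 0.25, 0.25],
                  "confident Q": [0.85, 0.05, 0.05, 0.05]}

    for name, q in candidates.items():
        print("%-12s prediction error KL(Q||P) = %.3f, own uncertainty H(Q) = %.3f"
              % (name, kl(q, p_true), entropy(q)))
    # The uniform Q has zero prediction error but maximal uncertainty:
    # minimizing prediction error is not the same thing as minimizing uncertainty.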

    But ‘free energy’ on its own is hard to calibrate as “high” or “low”, because you’ve always got that log(P(d)) term, which you can’t evaluate without access to P, but which you’d need in order to really know how good an approximation your distribution Q is to the real distribution P. So it might make more sense that the reward signal is the _reduction in free energy over time_.

    So in very predictable situations, the _process_ of minimizing free energy would be a short one, with little reward to reap. Being in incredibly complex situations where no approximation you can make is a good one to the actual distribution also means you have very little ability to minimize the average free energy accumulated over time, so you hit a local minimum very quickly and also get no ‘reward’ for minimizing. It’s in complex but tractable distributions that don’t match your original distribution function but can be well approximated by it that you get a chance to significantly reduce your free energy and so get a large reward. And if you can interact with a system and make it do surprising but ultimately explicable things, then you tee yourself up to harvest that “I’ve reduced free energy again” reward over and over.

    All of which might be to say that “minimizing surprise” might simply be a side effect of “reducing free energy”. You get reward for moving towards a minimum, not being at one. I’ll resist the urge to speculate about depression since I’m in no way qualified.

  58. John Schilling says:

    I think that if you are going to proclaim the great discovery that “All of Neurobiology, nay Consciousness Itself, is naught but the Brain’s attempt to Minimize BadFoo!”, then you really need to be simultaneously very clear and very comprehensive about what actually constitutes BadFoo. Otherwise you can call it “Free Energy” or “Uncertainty” or some other vague thing that sounds like it ought to be minimized, surround it in bafflegab, and leave the rest as an exercise for the student. For anything a brain or mind does, we just find something that there’s less of at the end and find a way to shoehorn it into the vague concept of BadFoo, and look, further validation of the theory.

    Or at least further validation of the Great Author’s stature, that he can get everyone else to do his work for him. And do it the hard way, constrained by the Procrustean use of BadFoo terminology.

  59. Eponymous says:

    I haven’t read the blog post or comments yet because I chased the links at the beginning to see if I could understand the theory on my own. I think I do (the linked blog post on wo’s weblog was particularly helpful). Here’s my interpretation:

    Our brain is basically trying to do two things: learn about the world, and choose actions that lead to good outcomes (i.e. epistemic and instrumental rationality). Usually we think of the first as a Bayesian process (basically we update our model of the world based on sensory input), while the second involves expected utility maximization.

    But what if they’re both done using a single mental algorithm?

    The idea is that our preferences are represented by a probability distribution P over states x, with “good” states corresponding to states that are deemed likely under this probability distribution. Then we receive some sensory input s, which leads us to update our beliefs.

    Then we can define a mathematical object called “free energy”, denoted F, which is:

    F = sum_{x} Q(x) * log(Q(x) / P(x&s))

    For some probability distribution Q, which is an *approximation* to the posterior P(x|s).

    Then it turns out that F *factors* into two terms:

    F = (posterior approximation error) + (surprise)

    where surprise is -log(P(s)), i.e. basically how much we don’t like the sensory input we got, and the posterior approximation error is the “Kullback-Leibler divergence”, which is defined by:

    KL(Q || P(x|s)) = sum_{x} Q(x) * log(Q(x) / P(x|s))

    Let me repeat that: we can calculate *one quantity* that contains *both* the error in our beliefs *and* how much we dislike the sensory input we’re getting. In other words, we’re lumping together beliefs and preferences. Then we adjust our beliefs Q(x) and our actions to minimize that one quantity.
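
    If my reading is right, the decomposition is easy to check numerically. Here’s a toy sketch (two hidden states and one observed input; the joint distribution is made up purely to check the algebra, not taken from Friston):

    import math

    # Made-up joint distribution P(x, s) over hidden states x and sensory inputs s.
    P_joint = {("x1", "s"): 0.06, ("x2", "s"): 0.14,
               ("x1", "other"): 0.50, ("x2", "other"): 0.30}
    s = "s"                                                          # the input we actually received
    P_s = sum(p for (x, obs), p in P_joint.items() if obs == s)      # P(s) = 0.2
    P_post = {x: P_joint[(x, s)] / P_s for x in ("x1", "x2")}        # posterior P(x|s)

    Q = {"x1": 0.5, "x2": 0.5}                                       # our approximate posterior

    F = sum(Q[x] * math.log(Q[x] / P_joint[(x, s)]) for x in Q)      # F = sum_x Q(x) log(Q(x) / P(x&s))
    KL = sum(Q[x] * math.log(Q[x] / P_post[x]) for x in Q)           # KL(Q || P(x|s))
    surprise = -math.log(P_s)                                        # -log P(s)

    print("F             = %.4f" % F)
    print("KL + surprise = %.4f" % (KL + surprise))   # equal: free energy = approximation error + surprise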

    • Toby Bartels says:

      Thanks, this is very clear. This definitely leans towards the framework-not-theory side. But it also doesn't seem to have the grand implications. The free energy being minimized isn't surprise, but a sum of two terms, one of which is a surprisal. And the colloquial meaning of ‘surprise’ would better fit the Q-surprisal than the P-surprisal; P encodes our preferences rather than our predictions, and its surprisal is more a measure of dissatisfaction than a measure of surprise. But it's still interesting.

      • Eponymous says:

        Thinking about it more (and assuming that my interpretation is correct, which I’m not certain of), I think that the “innovation” is claiming that the P that encodes my preferences (the ideal probability distribution over states/experiences) is the same as the P that encodes my prior about the world before receiving any sensory input.

        I guess the idea is that this is a pretty good approximation if we think of P as capturing a sort of “homeostasis” state in which all of an organism’s basic survival needs are taken care of, and it is trying to minimize deviations from this condition.

        In other words, it might be a useful algorithmic hack for evolved biological organisms like us. But it might be a pretty bad approximation for an ideal reasoner that has aspirations to optimize the world in ways drastically different than its current state.

        If true this might have some interesting psychological implications. But I’m wary of extrapolating too far because (1) I’m not sure that my interpretation of Friston is correct, and (2) I’m not at all confident that the theory is true.

        But it’s at least plausible to me that our brain runs on some sort of algorithmic hack that approximates ideal reasoning in the ancestral environment. So to the extent this theory is an attempt to be precise about exactly what sort of algorithmic hack we use, I think it’s very cool.

  60. Erl137 says:

    After all, if you’re Paris of Troy, getting offered the choice between power, fame, and sex – then some mental module must convert these to a common currency so it can decide which is most attractive.

    Pretty sure this argument (humans make choices between disparate goods, ergo they have a unified scale of goods) proves too much.

    Imagine a game show, where there are two people sitting in front of a console with big buttons, playing some sort of game with a range of prizes. And stipulate arguendo that each individual human has a single, unitary currency of value according to which everything is scored.

    Whatever button is first pressed when the options flash on the screen, that’s the prize the two players get.

    Now, the players might have identical preferences. Or they might have different ones. If the latter, maybe they negotiate, or they vote, or they allocate some sort of internal economy. Maybe they just have a mad slapfight every time images are flashed onto the screen, and whoever hits a button first wins.

    From the outside perspective, a unitary choice is always made. But the model of “these decisions must be made according to an implicit ordering of values” is obviously false (except inasmuch as you treat the entire room, console, two guys setup at the physical level as a unified system).

    Obviously this isn’t a very precise model of the brain, but it strikes me as plenty plausible that the brain might have multiple adaptive drives which trade off against each other through mechanisms which can’t be rounded down to “a common currency”.

  61. Alia D. says:

    On the question of a surprise-minimizer wanting to sit in a dark room until you starve to death: What is being directly predicted is not the external signals coming into the sensory systems, but the pattern of neural firings inside the sensory system. And it seems not improbable that neurons that are in the process of starving to death as blood sugar drops fire or fail to fire in random and unpredictable ways.
    The surprise minimizer is trying to minimize surprise over as long a term as it can manage. Mammals may learn, perhaps in babyhood when systems for smoothing are still coming on line, that things like hunger or lack of breathing lead to states with very high prediction error due to erratic neuron behavior. In that case a surprise minimizer in a mammal brain would be motivated to work to avoid these states.
    Thinking of babyhood makes me think of the fact that mammal brains start out their development in a perfectly dark container, kept at a constant temperature and provided with a steady flow of nutrition, with sound muffled and a fluid bath to cushion movement. Then they get smushed out a narrow canal into a place of light, loud noises, temperature variation, and all sorts of unfamiliar touches. This could be a very dramatic surprise for a prediction system embodied in a mammal’s brain.
    If this prediction system were to condition strongly on early training experiences, then it might form a deep distrust of total predictability as a possible warning flag that the universe might just be about to dump a whole truck load of surprise on you. So there would be a motive to explore the edges of the known to see if there were any big surprise generators waiting. In that case a surprise minimizer with a mammal’s early history might act a lot more like a model accuracy maximizer than one would expect absent that early experience.

  62. I attended a lecture by Karl Friston at the University of Edinburgh on the 5th of April, 2017. You can listen to it here. It was not very comprehensible, in my experience.

    I did take some notes at the start, before I became totally overwhelmed with confusion. Perhaps someone will find them amusing:

    Active inference and artificial curiosity

    1. blue > right
    2. red > left
    3. red > left or right (we already know it’s left)
    4. green > left or middle
    5. green > middle (update)

    optimal action is the one that maximizes the value of the state induced by the action
    problem: depends directly on states which we don’t know (we have beliefs about them)
    u_i^* = arg max V(s_(i + 1) | u_i)
    OR optimal action is the one that minimizes ?????? physics stuff
    u_i^* = arg min F(Q(s_(i + 1)) | u_i)
    Q is something to do with beliefs
    minimize uncertainty about outcome?

    it “solves the problem in just 12 moves”…
    (but you only need 5!)
    it is supposedly not improvable
    and it knew exactly what we knew (except it didn’t know there was a rule? so not really)
    so what DID it know? what is its hypothesis space? i missed that part

  63. oom says:

    There are a number of cognitive dissonance-inducing shifts in Friston’s theory vs. commonly accepted beliefs.
    For example:
    – the fundamental value is survival (continuity of the organism’s form and processes) against the odds. There is no other value on the same level. There are other values, but they are subordinate to and follow from survival.
    – action is a result of probabilistic inference about future observations, resulting in predictions at some level of the hierarchy. Predictions trickle down through the hierarchy. There is machinery on the lowest level of the hierarchy that fulfills predictions (muscle stretch states).
    – all inference and therefore all perception as well as action is based on probabilistic beliefs of the model (which can vary widely!) and hypotheses derived from these beliefs, NOT based on the actual states of the world
    – statistics of observations become prior preferences.

    It seems that the set of folks who don’t understand or accept the above (yet) overlaps a lot with the set of folks who have voiced the “crackpot” hypothesis. Which is a good data point by itself.

    • caethan says:

      the fundamental value is survival (continuity of the organism’s form and processes)

      This isn’t true. There are a number of species where individual organisms sacrifice their own survival in exchange for a higher success rate of reproduction.

      all inference and therefore all perception as well as action is based on probabilistic beliefs of the model and hypotheses derived from these beliefs, NOT based on the actual states of the world

      For the third goddamn time, explain to me how this theory models a single-celled organism sensing nutrients and altering its locomotion in response.

      It seems that the set of folks who don’t understand or accept the above (yet) overlaps a lot with the set of folks who have voiced the “crackpot” hypothesis. Which is a good data point by itself.

      You mean that people who know jack shit about biology think he’s a crackpot? Yes, that is a good data point.

      • oom says:

        This isn’t true. There are a number of species where individual organisms sacrifice their own survival in exchange for a higher success rate of reproduction.

        Friston’s theory does not preclude that. Species may have evolved to have hardcoded beliefs that cause individuals to act in a self-sacrificing way in certain conditions. But that’s the exception. Self-preservation is the priority most of the time, even for individuals of these species. How could a species survive if it weren’t? What other notions of value am I missing?

        For the third goddamn time, explain to me how this theory models a single-celled organism sensing nutrients and altering its locomotion in response.

        I’m not sure I caught the first two goddamn times you asked that question, not sure whether your tone is conducive to a good conversation, and also not sure whether I should take a crack at it. Seems well-covered by Friston’s work. What aspects of your example are not clearly explained?

    • Tracy W says:

      the fundamental value is survival (continuity of the organism’s form and processes) against the odds. There is no other value on the same level.

      Then why did men travelling first-class on the Titanic have a lower survival rate than women travelling third class?

      Why, back in 1601, did Francis Bacon note that virtually every other motive has at times proved stronger than the fear of death?

      • oom says:

        Because a) action is based on beliefs about future outcomes, not based on actual future outcomes, and b) humans are awesome and, by virtue of language, can have beliefs that reach beyond their own death.
        This seems to be a bit of a hot button issue. I wonder why.

        • Tracy W says:

          So, given that you agree with me, why do you keep asserting something (that “the fundamental value is survival”) that you know to be false?

          Why is this so important to you?

          • oom says:

            It’s not important to me per se, it’s just surprising that this is so controversial. Seemed quite obvious to me. Stop acting, and you die.
            I don’t know it to be false, I still maintain that it’s true. It may not apply to the process that is the individual organism in some special cases (where foregoing that value is useful to the species), but it seems to apply to the process that is the species.

            Also, so far no one has pointed out to me what else should be a value (beyond survival and at the same level of survival).

            If the concerns are more along the lines of “well it can’t just be survival because that’d be contrary to beliefs about how things should be – not sure what exactly it would be though”, then I’d be inclined to assume it’s survival only until somebody points out convincingly what else there should be and why.

          • caethan says:

            Also, so far no one has pointed out to me what else should be a value…

            Darwinian inclusive fitness. Survival is very often correlated to inclusive fitness, but when organisms have the opportunity to sacrifice survival for inclusive fitness, they take it. This was the whole impetus for Hamilton’s development of inclusive fitness back in the 60’s in the first place! If you are going to make grandiose claims about being able to explain all of biology, then you ought to know about 60-year-old developments in the science!

            If the concerns are more along the lines of “well it can’t just be survival because that’d be contrary to beliefs about how things should be – not sure what exactly it would be though”, then I’d be inclined to assume it’s survival only until somebody points out convincingly what else there should be and why.

            The “concerns” are that you’re wrong because organisms – obviously to anyone looking – don’t act to maximize survival. Salmon kill themselves to spawn, honeybees kill themselves to protect the hive, and desert plants kill themselves to seed before the wet season ends. Survival explains none of these behaviors, but turns out that inclusive fitness does.

          • oom says:

            Is that all? The fact that individuals, in some cases, act to ensure the survival of the species? How does that invalidate the claim that the sole purpose of action is to increase survival? It just changes the frame of reference.
            So in the vast, vast majority of situations, individuals act to survive, because that helps the species survive. And in some unusual, but obviously very salient, special cases, they don’t, when that helps the species survive.
            Most of the learning of most species happens beyond individuals (i.e. on evolutionary timeframes), too.

  64. nhmllr says:

    A tad unrelated, but does anyone know how he could have “derived” Schrödinger’s equation as a kid? I can’t see how one could “derive” it, instead of just taking it a priori.

    • Toby Bartels says:

      My guess was that he'd heard something about quantum mechanics, not detailed enough to include Schrödinger's equation, but clear enough that he could figure out what the equation had to be. At least, I did such things as a kid a few times (not Schrödinger's equation though).

  65. Lab Rat says:

    I don’t know if I’d call free energy a formalization of homeostasis. Homeostasis is already a useful, formalized concept. From a biological and evolutionary perspective I cannot see what free energy contributes that existing terms and concepts do not already cover.

    Take the wood louse example. First, it’s oversimplified (Devigne et al. 2011, Morris 1999, Beale & Webster 1971), but I’ll critique it as written. He writes that the scurrying has no purpose or intent. I’ll agree that it lacks “intent.” However, in an ultimate, evolutionary sense, its purpose is to increase the probability of returning to a dark environment. In the context of a forest floor, dark environments are more likely to be humid, cool, safe from predators, and contain food.

    From the proximate view of a wood louse, finding yourself exposed in the hot sun may be “surprising,” if we’re assuming some equivalence between the free energy concept and homeostasis and I use the operational definition that surprising = pejus range. In an ultimate sense, the situation isn’t surprising at all. It happens frequently enough that somewhat complex behavioral responses are fixed not just in some populations but within multiple species. We don’t need free energy to explain this; evolutionary language works just fine. We even have a single “motivation” that covers the evolution of every organismal trait: fitness.

    The probability distributions for homeostasis also don’t track for me. My body’s regulatory mechanisms are trying to restrict the distribution of any given trait to values compatible with chemical reactions, preferably within the range of optimum performance, but the possibilities are narrower than…all of them. My body will never have to deal with a heart rate of 350. A maximum entropy distribution isn’t possible. Either I’m within the bounds of life or my body has reached chemical equilibrium, and I’m dead.

    I am not sure we can understand life without homeostasis, but if anyone can actually understand free energy enough to make predictions and build models, there may be room for it to try to explain how homeostasis cropped up in the first place. However, I’ll mention that foundational concepts and theories in biology tend to be easy to explain in broad strokes, but have complicated details (life-history trade-offs, evolution, allometric scaling, niche theory, marginal value theorem). I’m suspicious of The Big Important Unifying Concept that no one can explain to a 400-level college class. I think E. O. Wilson has my proxy on this one:

    Sometimes a concept is baffling not because it is profound, but because it is wrong.

  66. Sniffnoy says:

    Is anyone else bothered that that Chabad link doesn’t *actually* tell us what the middle letter of the Torah is? It discusses a bunch of wrong answers but doesn’t tell us the actual answer. OK, it tells us it’s the 152,403rd letter, but which is that?

  67. Federico Vaggi says:

    Hi Scott,

    Variational methods are not actually as complicated as you think. Here is a very approachable summary by Eric Jang: https://blog.evjang.com/2016/08/variational-bayes.html.

    The very high level intuition is this: say you have a prior P(z) and a likelihood P(x|z); typically, you might be interested in either the marginal likelihood P(x) or the posterior P(z|x) – and those can be very difficult to calculate. Variational methods turn very difficult probabilistic inference problems into optimization problems with the following trick. You posit, a priori, a family of *parametric* distributions called Q_{\theta} (e.g. a Gaussian, with parameters mu and sigma all contained within theta). Then, you look for the probability distribution in that family that is closest to the real posterior distribution, i.e. the best Q_{\theta}, by optimizing over the parameters \theta. If the family of approximating distributions is sufficiently flexible, you can usually find a pretty good match.
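
    For what it’s worth, here’s a minimal sketch of that trick (a toy example: a bimodal “posterior” discretized on a grid, and a crude grid search over theta = (mu, sigma) instead of the gradient-based optimization you’d use in practice):

    import math

    def gauss(z, mu, sigma):
        return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Toy "true posterior" P(z): a bimodal mixture, discretized and normalized on a grid.
    zs = [i * 0.1 for i in range(-80, 81)]
    p_raw = [0.6 * gauss(z, -2.0, 0.7) + 0.4 * gauss(z, 2.5, 1.0) for z in zs]
    Zp = sum(p_raw)
    P = [p / Zp for p in p_raw]

    def kl(q_dist, p_dist):
        return sum(q * math.log(q / p) for q, p in zip(q_dist, p_dist) if q > 1e-12)

    # Variational family Q_theta: single Gaussians. Find the member closest to P in KL divergence.
    best = None
    for mu in [0.25 * m for m in range(-20, 21)]:
        for sigma in [0.5 + 0.25 * k for k in range(14)]:
            q_raw = [gauss(z, mu, sigma) for z in zs]
            Zq = sum(q_raw)
            Q = [q / Zq for q in q_raw]
            d = kl(Q, P)
            if best is None or d < best[0]:
                best = (d, mu, sigma)

    print("closest Gaussian: mu = %.2f, sigma = %.2f, KL(Q||P) = %.3f" % (best[1], best[2], best[0]))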

  68. Gerry Quinn says:

    If you hallucinate (or imagine, as we usually call this type of hallucination) eating the pizza, you presumably hallucinate a story leading up to this point, i.e. reaching out for it.

    So you just have to make the story true and the conclusion will be true by default.

  69. Peter says:

    There’s a normative sense of the word “expect”, as in if a boss says to a habitually un-punctual person “I expect you to turn up to work on time”. There’s also a predictive sense of “should”, e.g. the boss could be saying to his cow orker of a morning, “well, it’s half an hour past Bob’s nominal coming-in time, so if he’s as unpunctual as he normally is, he should be coming in any time now…” What does this have to do with Friston?

    I started to dig into Friston’s 2010 Nature Reviews Neuroscience paper. There’s an interesting little bit on the first page:

    Entropy is also the average self information or ‘surprise’ (more formally, it is the negative log-probability of an outcome). Here, ‘a fish out of water’ would be in a surprising state (both emotionally and mathematically). A fish that frequently forsook water would have high entropy. Note that both surprise and entropy depend on the agent: what is surprising for one agent (for example, being out of water) may not be surprising for another. Biological agents must therefore minimize the long-term average of surprise to ensure that their sensory entropy remains low.

    This is interesting, but not quite how I understand just-plain-entropy as such. Your fish out of water is in a surprising state for a generic fish (or for a generic ancestral fish) but not necessarily surprising for the fish frequently out of water. Your often-out-of-water fish (call him Nemo) could be one doomed to a short and unsuccessful life, or maybe it’s a mudskipper or something like that. So with plain-old-entropy, the long-term you use for averaging surprise is the same long term you use for getting the probability distribution that says whether something is surprising or not. There’s also cross-entropy, where you can use one long term (well, probability distribution) to get your average and another long term (i.e. probability distribution) to calibrate your surprise. So the plain-old-entropy of Nemo’s in-or-out-of-waterness could be fairly low, but the cross-entropy relative to an average fish of the same species can be substantially higher – by that standard, the fish is constantly in surprising states. Put another way: surprise here is a one-shot version of cross-entropy, you’re comparing a 1/0 probability distribution (“there’s a 1 in 1 chance that the thing that just happened happened, and a 0 in 1 chance that something else happened”) with a reference probability distribution and seeing how good the fit is.

    Likewise the plain-old-entropy of Bob’s arrival times could be quite low, we’ve seen he’s a pretty predictable fellow. The cross entropy relative to an ideal employee or even an average one is likely to be a lot higher, the boss is constantly having his expectations violated. Cross entropy could be considered a form of “living-up-to-expectationness”. Somewhere in Friston’s actual maths there’s Kullback-Leibler divergence, which is related to cross entropy. KL divergence is cross entropy minus plain-old-entropy. In Friston’s review paper, he has a glossary sidebar: “Kullback-Leibler divergence (Or information divergence, information gain or cross entropy.)” Argh. The most charitable reading of this is that Friston uses terminology in a highly eccentric way and this accounts for a lot of the confusion.
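
    To pin the terminology down with toy numbers (the distributions are made up): plain entropy, cross-entropy, and KL divergence for Nemo versus a generic fish, with the KL divergence coming out as cross-entropy minus plain-old-entropy.

    import math

    def entropy(p):
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    def cross_entropy(p, q):
        # average surprise when outcomes are drawn from p but scored against the reference q
        return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

    # Made-up distributions over the two states [in water, out of water]:
    generic_fish = [0.99, 0.01]   # the reference distribution a generic fish is calibrated to
    nemo = [0.15, 0.85]           # what Nemo the mudskipper actually does, quite predictably

    H = entropy(nemo)                         # plain-old-entropy of Nemo's own behaviour (lowish)
    CE = cross_entropy(nemo, generic_fish)    # Nemo scored against generic-fish expectations (high)
    print("entropy           = %.3f bits" % H)
    print("cross-entropy     = %.3f bits" % CE)
    print("KL = CE - entropy = %.3f bits" % (CE - H))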

    If I understand it right, your pizza eater sits down and doesn’t so much hallucinate eating the pizza as expect to be eating the pizza already and… no? Better get on with it then.

    A hypothetical paperclip-producing Fristonian AI – you don’t so much tell it to produce as many paperclips as possible, as give it a probability distribution over how many paperclips you expect it to be producing. I think that with an exponential-type distribution – e.g. like the number of times you need to roll a die before you get your first six – each new paperclip produced is evidence that the Expected Lifetime Paperclips Produced needs to be updated upwards by 1, and so the AI keeps producing paperclips indefinitely. On the other hand, it’s not really so motivated to make desperate lunges for world domination in order to produce the countless zillions of paperclips that it maybe could make, because it isn’t expected to make them.

    …I think. People who understand this better: am I on the right tracks here?

    • Toby Bartels says:

      “Kullback-Leibler divergence (Or information divergence, information gain or cross entropy.)” Argh.

      Friston may not be alone in this. Wikipedia writes:

      In the engineering literature, the principle of minimising KL Divergence (Kullback's "Principle of Minimum Discrimination Information") is often called the Principle of Minimum Cross-Entropy (MCE), or Minxent.

      (This is because, depending on what is held fixed when doing the minimisation, the two problems may be equivalent. Unfortunately, I don't think that they're equivalent for Friston.)

    • Eponymous says:

      This is roughly my interpretation. Minimizing free energy means minimizing the sum of two terms: the inaccuracy of your beliefs (the KL divergence), and how surprising you find your experiences. But importantly, this surprise is relative to an “ideal” probability distribution, not your actual beliefs.

      Thus a fish out of water is in a “surprising” state, even if being out of water is completely expected under his current beliefs about the world.

      The “innovation” is claiming that the “ideal” probability distribution that encodes your preferences is basically the same thing as your “prior” about the world. Which makes the whole thing look like a sort of algorithmic hack.

      See my comment upthread for the math. (Note: the above is my interpretation, which may be wrong.)

      • Doesntliketocomment says:

        I appreciate your earlier post concerning the math; I found it to be the most enlightening in the thread. I guess my question would be: if the “ideal” is different from the “prior”, what generates this “ideal” probability distribution?

  70. JKPaw says:

    After spending a few days trying to spot an uncertainty-reduction force at work in my own mind/body, I was having a lot of trouble locating it as a universal motivator. But when I turned my attention outside, trying to make sense of others‘ behavior, it suddenly seemed a much better fit. Most people seem to latch onto whatever narrative is most likely (usually because that narrative is popular, socially sanctioned, or statistically probable).

    Which makes me wonder whether this particular theory-of-everything has value (if indeed it does), not in a biological way (as in dopamine transmission), but simply as a describer of our cultural agreements about the nature of intelligence and psychological well-being. In other words, uncertainty reduction might strictly be a shared value.

    I’m reminded of Michael Kinsley’s excellent adventure into the world of cognitive testing, and his outrage that test-makers consider certain objectively correct answers as signs of a broken brain (or at least pedantry). Abstract “truths” that are narrative-based are considered more correct than literal truths that violate agreed-upon narratives. It seems to me that if we latch onto certainty that, say, the emperor is wearing a new set of fancy clothes, we are acquiescing to the comfort of shared society — not some sort of biological imperative.

  71. amirlb says:

    This was not mentioned yet, so let me link to a piece on Jeremy England’s work. I’m not the first to mention him together with Friston but the connection doesn’t seem common yet. I’m not familiar with the biology or neurology involved in either line of research, but as a mathematician I find the models fascinating.

    England’s theory says, in a short and somewhat inaccurate summary, that the reproductive success of organisms is almost proportional to the rate at which they increase their environment’s entropy. This supposedly also held in the pre-biology era, and simulations show that this principle can explain the origin of life. This is similar to the lice observation from the Alius interview, indeed life must change the probability distribution of its conditions from “anything goes” to ones suitable for life.

    Slipping into wild hypotheses to connect England to Friston:
    Assume that a primitive organism is successful. This means it embodies an optimization process that raises the environment’s entropy gradient. When the creature sees fit to evolve a brain, it might be able to reuse some of its genes for free-energy minimization, as the formulas are very similar. So this could explain why our brains would optimize for free energy.

  72. P. George Stewart says:

    That sound like a lot of hard work just to avoid Causa Finalis. 🙂

    • Peter says:

      I have a theory about Aristotle’s metaphysics; it – well, the wikipedia summary of it – makes a lot more sense if you don’t imagine his four causes as being like four forces, and instead treat it as a schema for constructing satisfactory explanations. This may be me preferring a misunderstanding of Aristotle over actual Aristotle, in which case, so much the better. Anyway, in that view, a final cause isn’t something that acts in addition to the efficient cause, more the final cause is something that explains why things achieved by the efficient cause aren’t just some freakish coincidence.

      def f():
          # Search for positive integers m, n with m*m == 2*n*n, i.e. a rational square root of two.
          i = 2
          while True:
              for j in range(1, i):
                  m = i - j
                  n = j
                  print(m, n)  # comment this out if you like
                  if m * m == n * n * 2:
                      print("m/n = %s/%s" % (m, n))
                      return
              i += 1

      f()

      If you try running this program, it will never print out the m/n … thing, despite seeming to have ample opportunity to do so. Why not? If you follow what the computer does, step by step, you’ll see that the observed behaviour is compatible with your deterministic model of how computers work, no occult forces need to be invoked here, in a rather weak sense this “explains” why you never see that output. But that’s all a bit unsatisfying – in particular, if you’re strict about not saying “it turns out that”, then your “explanation” is incredibly long, and those don’t seem to satisfy.

      A satisfying explanation says “it’s trying to find a rational root of two, and it fails because there isn’t one, because [well-known proof]”. This “trying to” looks an awful lot like a final cause… well, sort of like one to my eyes with a bit of squinting, other people may need to squint harder. Anyway, that explanation is mercifully short, and doesn’t need to get longer the longer you leave the computer running for, which helps it feel much more like a properly satisfying explanation.

      So one key thing in science is to find things which, depending on your tastes, either count as final causes, or sort-of look like final causes. If you are a chemist, then one of these is the second law of thermodynamics[1] – why does such and such a reaction happen? “Because energy minimisation” is a common answer, on closer inspection “because free energy minimisation” is a better one, and free energy tends to get minimised because of entropy maximisation. So once you’ve established entropy maximisation as a satisfying endpoint for your investigations, then you can go forth and investigate all manner of chemical systems and come up with satisfactory explanations of why they behave like they do. Likewise in biology a lot of things can boil down to “because natural selection”.

      Things like the second law and natural selection – key endpoints in explanation-generation – you can argue endlessly about what to call them. Does Darwinian natural selection explain teleology or merely explain it away? I believe Darwin thought the former, whereas contemporary biologists tend to think the latter. You could debate endlessly whether natural selection should be called teleology or quasiteleology or pseudoteleology or whatever, but I think I would find that debate tedious and beside the point. If you find one of these, and do a good job of theorizing it, then that’s a big achievement, worthy of a Darwin or a Boltzmann.

      [1] Incidentally, the second law of thermodynamics could be interpreted as saying that there are no “occult final cause forces”

      • P. George Stewart says:

        That’s roughly about right, I think. The Four Causes are basically the four kinds of reasons for things being the way they are. Final Cause was never conceived of as any kind of “force.” What science now calls “causality,” if taken as being roughly similar to what Aristotle called Efficient Cause, is just one of the four different kinds of reasons for why things are the way they are. One reason why things are the way they are is, in part, because other things act on them in such a way as to realize whatever potential for transformation they have.

        Another way of wriggling out of Final Cause that’s quite amusing in its inadvertent reinventing-the-wheel nature is Dennett’s “Free-Floating Rationales.” They are pretty much Final Cause, just by another name – but because it’s Dennett, somehow it’s ok 🙂

        The key to understanding Aristotle (so far as I understand it, anyway) is to understand the distinction between Actuality and Potentiality (or Act and Potency in the classical lingo), and the key to understanding that is to understand why Aristotle came up with the distinction, as a harmonization of Parmenidean and Heraclitean metaphysics:-

        Parmenides’ claim was that something can’t come from nothing, but that nothing was the only thing something new could come from, since the only thing there is other than what already exists (i.e. being) is non-being or nothing. Hence nothing new can come into existence, and change is impossible. Aristotle’s reply is that while it is true that something can’t come from nothing, it is false to suppose that nothing or non-being is the only possible candidate for a source of change. Take any object of our experience: a blue rubber ball, for instance. What can we say about it? Well, there are the ways it actually is: solid, round, blue, and bouncy. (These are different aspects of its “being,” you might say.) And there are the ways it is not: square and red, for example; it is also not a dog, or a Buick Skylark, or innumerable other things. (The ball’s squareness, redness, dogginess, etc., since they don’t exist, are thus different kinds of “non-being.”) But in addition to all this, we can distinguish the various ways the ball potentially is: red (if you paint it), soft and gooey (if you melt it), a miniature globe (if you draw little continents on it), and so forth. So being and non-being aren’t the only relevant factors here; there are also a thing’s various potentialities.

        Feser, Edward. The Last Superstition: A Refutation of the New Atheism. St. Augustine’s Press.

        Once this clicks, the classical metaphysics – including the Four Causes, Essentialism, etc. – makes a lot more sense. You start to understand why things have baked-in limited ranges of things they can be caused to change into, not just any old “logical possibility.”

  73. ekaj says:

    I’m a medical student and I’ve been trying to apply one of Friston’s fMRI methods (DCM) for the past year, and still 80% of it is over my head. I talked with a leading fMRI statistician at Hopkins about the method and he shared a similar sentiment. ¯\_(ツ)_/¯

  74. Steve Witham says:

    This post & its comments contain some references to Boltzmann, metaphorical temperature, and simulated annealing, but so far no references to: Boltzmann machines; Hopfield nets; Metropolis algorithm; or relaxation in the physics, math, or computer science sense. There are other nifty methods in this vein whose names I can’t remember at the moment.

    This kind of method has a system that solves problems by
    1) Having a state space that has evolved and/or learned and/or been set up to be mapped/linked to the dynamics of the problem/solution space (including sometimes action dimensions), and
    2) Having “downhill” in the space defined statistically – an area is more downhill if you’re simply more likely to end up there after a random semi-local jump, uphill if less likely.
    3) Having an equivalent of temperature so that higher temperature means bouncing around more and lower temperature settling down to more local jumps in the space (or at least into a more local rolling path). Temperature is like a standard deviation or variance, or the opposite of confidence or precision (or some combination).

    Sometimes the jump-result distribution is derived from something (e.g. in Metropolis); sometimes it’s just a result of an opaque current state & state-transition rules.

    Sometimes there’s no temperature, sometimes it’s an externally tuned or algorithmically driven number, sometimes it varies according to some (quasi-) physics of the system. But, when talking about the methods, people call it temperature, or sometimes energy, not “excitement” or “happiness,” “focus”, “confidence”, etc.
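
    (To make that concrete, here is a toy sketch – mine, not anything from Friston or the post – of this family of methods in Python: simulated annealing on a bumpy one-dimensional landscape, where the temperature sets how far the random jumps reach and a Metropolis-style acceptance rule defines “downhill” statistically.)

      import math, random

      def energy(x):
          # Any bumpy landscape will do for illustration.
          return x * x + 3.0 * math.sin(5.0 * x)

      def anneal(steps=20000, temp=5.0, cooling=0.9995):
          x = random.uniform(-10.0, 10.0)
          for _ in range(steps):
              # Higher temperature -> bigger random jumps.
              candidate = x + random.gauss(0.0, temp)
              delta = energy(candidate) - energy(x)
              # Metropolis rule: always accept downhill moves; accept uphill
              # moves with probability exp(-delta / temp).
              if delta < 0 or random.random() < math.exp(-delta / temp):
                  x = candidate
              temp *= cooling   # cool down: settle into more local jumps
          return x

      print(anneal())   # ends up near a low-energy point, "naturally"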

    The example of the bugs searching for shade is both confusing and simplifying because their behavior is like thermal jiggling of molecules but linked (either through temperature sensing nerves or–??– some more direct biological effect of heat on their nerves and muscles?) to their behavior. It just happens to work out for the bugs in this case that running around more when you’re too hot and less when you’re cool means you stay in cool areas longer.

    What’s nice about these methods is that they just “naturally” end up in good places or solutions. What’s maybe hard about them is that their states don’t resemble statements about confidence of belief or happiness or probability distribution over a space of assertions. And also, it’s possibly hard to design or understand the mappings between outside world and inside state space (or the learning or evolution process on the mapping) such that the natural problem solving happens reliably and efficiently.

    So, maybe Friston means free energy as an intermediate point for linking physical-ish models to more semantic-ish ideas like PP or Bayes.

  75. anonymousskimmer says:

    I agree this is one of life’s great joys, though maybe Karl Friston and I are not a 100% typical subset of humanity here. Also, I have trouble figuring out how to conceptualize other human drives, like sex, as this same kind of complexity-reduction joy.

    Stopping here to post this before finishing your entry.

    It is undoubtedly possible to come up with extremely basic functions and parameters, perhaps even a fundamental one.

    To point out three things:

    1) In mathematics it is possible to represent distinct systems using a variety of kinds of math. This does not change the underlying reality of the system, it merely(?!) shows how amazing math and a Turing machine are at approximating realities. If a variable X fairly accurately calculates what’s going on in a system and is called “uncertainty minimization”, that doesn’t mean that “uncertainty minimization” is the entirety of that variable.

    2) I forgot point two, I think it probably had something to do with the kludgey effects of evolution on cellular and bodily functioning, and the question as to why this kludgey effect wouldn’t also pertain to the brain.

    3) The terms “uncertainty” and “joy” are Enneagram of personality key terms. They can be generalized with handwaving, but most often “uncertainty” is linked to types 5,6,7 and joy to types 1,7,5. This makes me think that the terms “uncertainty minimization” and “joy” are, at the very least, merely subsets of the underlying motivation which Friston is claiming is the universal minimization function. And “effort minimization” is very much a type 5,7,8 linked concept.

    Back to reading the entry.

    • anonymousskimmer says:

      Your last few paragraphs are a very informative summation.

      Overall, the best I can do here is this: the free energy principle seems like an attempt to unify perception, cognition, homeostasis, and action.

      This is what most personality theories have attempted to do, only they come at it from the opposite direction (and are thus much less mathematically robust, surviving on mere pattern matching and abstracting).

      Homeostasis tries to get the organism’s internal state to match a mental model. Since even bacteria are doing something homeostasis-like, all life shares the principle of being free energy minimizers.

      Is “Free energy minimization” a boundary condition, or a “goal”? Does it limit what we can do, or does it drive what we can do? Is this the fundamental motivation, or the fundamental rock and hard place?

      And about those woodlice: Was the sun giving them more energy to run with, or encouraging them to run faster due to the biological fear of burning up/desiccation?

  76. David Bahry says:

    I’m trying to figure out if Friston is being equivocal or unorthodox when he’s talking about surprisal and entropy (long-run average surprisal). My thought is – which entropy is being minimized? E.g. say I’m looking at a die; I assign a probability distribution to its possible outcomes, and this probability distribution has an entropy. But also, there’s a set of possible states that I could be in, to which, say, God, or Karl Friston, assigns a probability distribution or probability density; and this probability distribution also has an entropy.

    So am I supposed to be minimizing Entropy(my PDF for the d20’s states), or Entropy(Friston’s PDF for my states)? Which of these do predictive processing models talk about? Which one does Friston think is unavailable to the organism, necessitating minimizing (which?) free energy instead?

  77. David Bahry says:

    This comment will intuitively explain the parts of the relevant math that I understand: what surprisal is, what Shannon entropy is and how long-run average surprisal is similar to but potentially different from it, and what Kullback-Leibler Divergence is. It assumes knowledge of what a probability distribution is.

    [edit: ugh I thought latex was going to work but it didn’t]

    (I don’t yet understand what free energy is. It sounds, though, like Friston is talking about Variational Free Energy from information theory, which I guess is mathematically similar to, but should be interpreted differently from, Thermodynamic Free Energy. It sounds like Variational Free Energy is a quantity that emerges from rearranging a certain instance of the Kullback-Leibler Divergence formula, in the way that https://www.umsu.de/wo/2013/600 describes.)

    Surprisal is just another way of quantifying the chance/uncertainty about an outcome. Intuitively, you could describe an outcome’s chance either in terms of “how likely you think it is to happen”, or in terms of “how surprising it would be if it did happen”. If you wanted to define such a quantity, you would probably want it to obey a couple of intuitive constraints (I got this explanation from https://math.stackexchange.com/a/374981/470287). First: the less probable something is, the more surprising it should be if it happens. Second: if two things are independent, then the surprisingness if both happen should equal the sum of the surprisingnesses that either one gives you on its own.

    A quantity which has these properties, where the probability of $A$ is $p_A$, is

    $Surprisal_A = -\log p_A$

    That’s surprisal. Obviously, the lower $p_A$ is, the higher $-\log p_A$ is, because of the minus sign. Independent outcomes’ surprisals add ($-\log p_{A \& B} = (-\log p_A) + (-\log p_B)$), because their probabilities multiply ($p_{A \& B} = p_A \times p_B$), and multiplying two things is equivalent to adding their logarithms. (You can choose whatever base for the logarithm you like, as long as you’re consistent and you tell the reader.)
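
    (A quick numerical check of those two properties, just as an illustration – nothing from Friston here, only the definition above in Python:)

      import math

      def surprisal(p, base=2):
          # Surprisal, in bits, of an outcome with probability p.
          return -math.log(p, base)

      # Less probable -> more surprising:
      print(surprisal(0.5), surprisal(0.01))    # 1.0 bit vs ~6.6 bits

      # Independent outcomes: surprisals add because probabilities multiply.
      p_a, p_b = 0.5, 0.25
      print(surprisal(p_a * p_b))               # 3.0 bits
      print(surprisal(p_a) + surprisal(p_b))    # 3.0 bits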

    Entropy is a measure of uncertainty, not for specific outcomes, but for entire probability distributions of possible outcomes. Suppose that there are 100 outcomes, but 99 of them each have a tiny probability, and one of them has a huge probability. Now suppose that tomorrow there will still be 100 outcomes, but they’ll all have equal probability. Intuitively, it feels like you’re more uncertain about what will happen tomorrow, than what will happen today. “Entropy” lets you quantify this.

    What the entropy formula does, is add up the surprisals for each outcome, weighted by the probability of that outcome:

    $Entropy = \sum p_i \, Surprisal_i$
    $Entropy = -\sum p_i \log p_i$

    In other words, this gives the frequency-weighted average surprisal of the distribution. In other other words, it gives the expected surprisal of samples taken from the distribution.
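
    (Illustration: the peaked distribution from a couple of paragraphs back really does come out less entropic than the uniform one.)

      import math

      def entropy(dist, base=2):
          # Probability-weighted average surprisal, in bits.
          return -sum(p * math.log(p, base) for p in dist if p > 0)

      peaked = [0.901] + [0.001] * 99    # one huge outcome, 99 tiny ones
      uniform = [0.01] * 100             # 100 equally likely outcomes

      print(entropy(peaked))    # ~1.1 bits
      print(entropy(uniform))   # ~6.6 bits - much more uncertain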

    Long-Run Average Surprisal (which the organism is said to want to minimize) can be, but may not be, the same as entropy. We now have to consider that there can be two or more “probability distributions” for the same set of events. For example, suppose that I’m looking at a 20-sided die, and that the die is biased, but I don’t know it. There’s an “objective” probability distribution for the die’s outcomes – say, those probabilities are given by $p_i$ – and there’s also my subjective probability distribution (which might be inaccurate, if e.g. I think the die is fair but it isn’t), given by $q_i$. Then “my” surprisal from each outcome is assigned by my subjective probability distribution, but the frequency weightings in determining my long-run average surprisal are determined by the actual objective frequencies. My subjective distribution’s entropy would be $-\sum q_i \log q_i$, but my actual long-run average surprisal would be $-\sum p_i \log q_i$.
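
    (Illustration, using a biased coin instead of the d20 to keep the numbers small: when my beliefs are wrong, my long-run average surprisal exceeds both my subjective entropy and the objective entropy.)

      import math

      def avg_surprisal(freqs, beliefs, base=2):
          # Outcomes arrive with the objective frequencies, but each one is
          # scored by the surprisal my subjective beliefs assign to it.
          return -sum(p * math.log(q, base) for p, q in zip(freqs, beliefs))

      p = [0.9, 0.1]   # objective distribution of a biased coin
      q = [0.2, 0.8]   # my mistaken subjective distribution

      print(avg_surprisal(q, q))   # my subjective entropy, ~0.72 bits
      print(avg_surprisal(p, p))   # the objective entropy, ~0.47 bits
      print(avg_surprisal(p, q))   # my actual long-run average surprisal, ~2.12 bits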

    Kullback-Leibler Divergence is a non-commutative measure of the differentness, or divergence, between two probability distributions for the same set of possible outcomes. (“Non-commutative” means that the order can make a difference: the divergence between distributions A and B need not equal the divergence between B and A.) It’s defined in terms of the disagreement between the two distributions about the surprisals of the outcomes: for each outcome, take the second distribution’s surprisal minus the first’s, then average these differences, weighted by the outcome’s probability according to the first distribution:

    $Divergence_{PQ} = \sum p_i [(-\log q_i) - (-\log p_i)]$

    The measure is non-commutative because it matters whether the surprisal-differences are weighted by $p_i$ or by $q_i$. (The surprisal-differences themselves will also change sign, since $A - B \not= B - A$.)
    The formula is sometimes rearranged to be more concise, though less intuitive:
    $Divergence_{PQ} = \sum p_i [\log p_i - \log q_i]$
    $Divergence_{PQ} = \sum p_i \log(p_i / q_i)$

    If I’m looking at a die, and I want my model to be accurate, then I want my subjective probability distribution to match its objective one as closely as possible. If I succeed, then the KL divergence will go to zero. Also, my long-run average surprisal from sampling the die, will be as low as possible (and will equal the objective distribution’s entropy).
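
    (Last illustration: KL divergence computed both ways round, showing the asymmetry, and that it vanishes once the subjective distribution matches the objective one.)

      import math

      def kl(p_dist, q_dist, base=2):
          # Average, under p, of (surprisal under q) minus (surprisal under p).
          return sum(p * math.log(p / q, base) for p, q in zip(p_dist, q_dist) if p > 0)

      p = [0.9, 0.1]
      q = [0.2, 0.8]

      print(kl(p, q))   # ~1.65 bits
      print(kl(q, p))   # ~1.97 bits - different: the measure is non-commutative
      print(kl(p, p))   # 0.0 - no divergence once the model matches reality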

  78. tristanls says:

    I’m surprised that Hierarchical Temporal Memory (HTM) from Numenta hasn’t come up (at least not via my browser’s “find” feature). HTM is precisely a model which demonstrates how a prediction machine (human neocortex in HTM’s case) can self-motivate to generate actions without goals by predicting next actions in long temporal sequences modulated by external stimuli.