Date: 2017-09-06
Yesterday’s review of Surfing Uncertainty mentioned how predictive processing attributes movement to strong predictions about proprioceptive sensations. Because the brain tries to minimize predictive error, it moves the limbs into the positions needed to produce those sensations, fulfilling its own prophecy.
This was a really difficult concept for me to understand at first. But there were a couple of passages that helped me make an important connection. See if you start thinking the same thing I’m thinking:
To make [bodily] action come about, the motor plant behaves (Friston, Daunizeau, et al, 2010) in ways that cancel out proprioceptive prediction errors. This works because the proprioceptive prediction errors signal the difference between how the bodily plant is currently disposed and how it would be disposed were the desired actions being performed. Proprioceptive prediction error will yield (moment-by-moment) the projected proprioceptive inputs. In this way, predictions of the unfolding proprioceptive patterns that would be associated with the performance of some action actually bring that action about. This kind of scenario is neatly captured by Hawkins and Blakeslee (2004), who write that: “As strange as it sounds, when your own behavior is involved, your predictions not only precede sensation, they determine sensation.”
And:
PP thus implements the distinctive circular dynamics described by Cisek and Kalaska using a famous quote from the American pragmatist John Dewey. Dewey rejects the ‘passive’ model of stimuli evoking responses in favour of an active and circular model in which ‘the motor response determines the stimulus, just as truly as sensory stimulus determines movement’
Still not getting it? What about:
According to active inference, the agent moves body and sensors in ways that amount to actively seeking out the sensory consequences that their brains expect.
This is the model from Will Powers’ Behavior: The Control Of Perception.
Clark knows this. A few pages after all these quotes, he writes:
One signature of this kind of grip-based non-reconstructive dance is that it suggests a potent reversal of our ordinary way of thinking about the relations between perception and action. Instead of seeing perception as the control of action, it becomes fruitful to think of action as the control of perception [Powers 1973, Powers et al, 2011].
But I feel like this connection should be given more weight. Powers’ perceptual control theory presages predictive processing theory in a lot of ways. In particular, both share the idea of cognitive “layers”, which act at various levels (light-intensity-detection vs. edge-detection vs. object-detection, or movements vs. positions-in-space vs. specific-muscle-actions vs. specific-muscle-fiber-tensions). Upper layers decide what stimuli they want lower levels to be perceiving, and lower layers arrange themselves in the way that produces those stimuli. PCT talks about “set points” for cybernetic systems, and PP talks about “predictions”, but they both seem to be groping at the same thing.
I was least convinced by the part of PCT which represented the uppermost layers of the brain as control systems controlling various quantities like “love” or “communism”, and which sometimes seemed to veer into self-parody. PP offers an alternative by describing those layers as making predictions (sometimes “active predictions” of the sort that guide behavior) and trying to minimize predictive error. This allows lower level systems to “control for” deviation from a specific plan, rather than just monitoring the amount of some scalar quantity.
My review of Behavior: The Control Of Perception ended by saying:
It does seem like there’s something going on where my decision to drive activates a lot of carefully-trained subsystems that handle the rest of it automatically, and that there’s probably some neural correlate to it. But I don’t know whether control systems are the right way to think about this… I think maybe there are some obvious parallels, maybe even parallels that bear fruit in empirical results, in lower level systems like motor control. Once you get to high-level systems like communism or social desirability, I’m not sure we’re doing much better than [strained control-related metaphors].
I think my instincts were right. PCT is a good model, but what’s good about it is that it approximates PP. It approximates PP best at the lower levels, and so is most useful there; its thoughts on the higher levels remain useful but start to diverge and so become less profound.
The Greek atomists like Epicurus have been totally superseded by modern atomic theory, but they still get a sort of “how did they do that?” award for using vague intuition and good instincts to cook up a scientific theory that couldn’t be proven or universally accepted until centuries later. If PP proves right, then Will Powers and PCT deserve a place in the pantheon beside them. There’s something kind of wasteful about this – we can’t properly acknowledge the cutting-edgeness of their contribution until it’s obsolete – but at the very least we can look through their other work and see if they’ve got even more smart ideas that might be ahead of their time.
(Along with his atomic theory, Epicurus gathered a bunch of philosophers and mathematicians into a small cult around him, who lived together in co-ed group houses preaching atheism and materialism and – as per the rumors – having orgies. If we’d just agreed he was right about everything from the start, we wouldn’t have had to laboriously reinvent his whole system.)
Something about this framing is bothering me. One thing is that error-minimization and surprisal-minimization should look basically the same, and so it’s not obvious that framing one as approximating the other is better than framing them as isomorphic to each other. (Whether or not they’re actually isomorphic depends on the math going on underneath, but I think both frameworks could be consistent with any expected experimental results and major aspects of each seem natural and consistent with each other.)
Another thing is that control systems seem better motivated than prediction systems. In the PCT framing, there’s the neural hierarchical control framework, and then another, poorly understood sub-neural control system that modifies the neural hierarchy. (This seems to be related to stuff like calorie acquisition and sexual satisfaction and so on, such that those are inherently rewarding in some way.) I don’t think the surprisal-minimization framework has an explanation for eating food being rewarding, but the controls framework can.
And, in general, when you think of people as agents, agents acting to control their perceptions seems like a better framing than agents predicting what they’ll do and then doing that (especially if they predict that they’ll act to control their perceptions!). In particular, think about Bayes scores, which are canonically measured as the log of the probability assigned to the events that actually happen, and so are never positive; an agent that just seeks to minimize surprisal is better off receiving no sensory data by being dead! An agent that seeks to maintain homeostatic balance observes that being dead is not being homeostatically balanced.
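To make the arithmetic behind this worry concrete, here is a minimal Python sketch. The function names and the example probabilities are invented for illustration and come from neither Clark nor Powers. It only shows that a log score is never positive, so total surprisal is smallest (zero) when there are no observations at all.

```python
import math


def log_score(probability_assigned):
    """Log of the probability the agent gave to the event that occurred.

    For any probability in (0, 1] this is zero or negative.
    """
    return math.log(probability_assigned)


def total_surprisal(probabilities):
    """Sum of negative log scores over everything the agent observed."""
    return -sum(log_score(p) for p in probabilities)


# Hypothetical agent that is alive and predicts fairly well.
lively_agent = total_surprisal([0.9, 0.6, 0.99])

# Hypothetical agent that receives no sensory data at all.
no_data_agent = total_surprisal([])

if __name__ == "__main__":
    print("surprisal with observations:", lively_agent)
    print("surprisal with no observations:", no_data_agent)
```

Under these assumptions, even a very good predictor accumulates some surprisal, and the agent with no data accumulates none, which is the "dark room" problem in miniature.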
—
While I started out writing this comment because that framing rubbed me the wrong way, I think I should just state my overall position: I view PP and PCT as basically isomorphic, with PP how you would look at things from an unsupervised learning frame and PCT how you would look at things from a supervised learning frame. But one of the obvious things to do in a supervised learning agent in a big world is for it to also have unsupervised learning capabilities, and encourage using them at the right levels through something like curiosity, boredom, and overstimulation. Given the deep mathematical similarities, that means a synthesis of the two frames is very simple and seems desirable.
(But it also seems to me like PCT is the foundational frame; at their heart, it looks like people want homeostatic balance, not perfect prediction, and predictions are just extremely useful tools at reaching homeostatic balance in a big world.)
As a question: couldn’t things such as curiosity and boredom serve as supervisions, if you take as default the desires for knowledge and activity respectively? Many learning frames could thus be described as unsupervised within their own limited context (e.g. the process of learning to bring a spoon to your mouth is internally governed by avoiding the surprise of boiling soup ending up where it shouldn’t be) but ultimately organized by a fairly standard and straightforward set of principles, with the key one being an acquisition of new and more powerful predictive models. Otherwise, there would seem to be a slight disconnect between the unsupervised frame and the supervised one, where (as you so accurately point out) the proper goal for the unsupervised frame is to simply avoid all new data whatsoever.
(In fact, this might be explanatory for people who seem to avoid any kind of “unsupervised learning” and have very little curiosity: for whatever reason, their knowledge drives are low in the same way as some people have low sex drives, and thus they seek to shut out all potential disruptive stimulus on those fronts by avoiding anything that might make them think.)
I work as a controls engineer, and while all I know about PCT and PP at the moment is what I’ve read in these 3 posts, I was thinking that I would implement them exactly the same way. So yes, based on what’s here I’d agree they’re isomorphic. Maybe Scott didn’t agree with the things that Powers thought were being controlled for at a higher level? That’s incidental to the theory IMO – the framework looks the same.
Every large scale dynamic electronic machine (airplane, petroleum plant, etc) is controlled by a hierarchical system of observer-predictor-controller loops. These pass information between layers not with the low level information they operate on, but an “error signal” which is the difference between their measured output and predicted output. You can stack as many of these up as you want, and get a coherent self-correcting output at every level from an abstract concept like “fly me to New York”. Or you oscillate wildly when something in your model is broken. Add in stuff like adaptive control and you can start to see a pretty nice formalized model of things we see in biology (I’d love to read more on people who have applied this to biological processes – the different neurochemicals being used as different pathways here is super interesting. Flood of dopamine in the brain = all error signals go to 0 = everything is right in the world?). PP and PCT both seem to be doing this. It’s a way of approaching our thought process that makes a lot of sense to me.
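As a rough illustration of stacked loops, here is a minimal Python sketch of a two-layer cascade. Everything in it is an assumption made for the example: the names, the gains, the time step, and the toy point-mass "plant" are not taken from any real autopilot or from PCT/PP. The upper layer turns its position error into a velocity setpoint for the layer below, and the lower layer turns its velocity error into an actuator command.

```python
def cascade_step(position, velocity, target_position,
                 dt=0.1, outer_gain=1.0, inner_gain=4.0):
    """Advance a toy two-layer control cascade by one time step."""
    # Upper layer: its error is handed down as the lower layer's setpoint.
    position_error = target_position - position
    velocity_setpoint = outer_gain * position_error

    # Lower layer: its error becomes the actuator command (an acceleration).
    velocity_error = velocity_setpoint - velocity
    acceleration = inner_gain * velocity_error

    # Toy plant: a point mass integrated with a simple Euler step.
    velocity = velocity + acceleration * dt
    position = position + velocity * dt
    return position, velocity


def simulate(target_position=5.0, steps=400):
    """Run the cascade from rest and return the final (position, velocity)."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        position, velocity = cascade_step(position, velocity, target_position)
    return position, velocity


if __name__ == "__main__":
    print(simulate())
```

With these particular gains the loop settles on the target. Badly chosen gains (for instance a much larger inner gain at the same time step) would give the wild oscillation mentioned above.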
If it’s control systems all the way up, this way of looking at cognition implies a single measure of goodness that is being maximized at the highest level. Perhaps an evolutionary survive & thrive imperative that is going on above the levels we are conscious at? It fits pretty well with ideas about executive functions. But now I’m getting out of my area of expertise.
“An agent that just seeks to minimize surprisal is better off receiving no sensory data by being dead! An agent that seeks to maintain homeostatic balance observes that being dead is not being homeostatically balanced.”
If you have Surfing Uncertainty or want to borrow it from me, this is addressed in Section 8.10, but it’s hard to understand and not very convincing. It seems to be one part repeating the word “embodiment” like a mantra, one part saying “Yeah, okay, that’s because of something other than predictive processing, we didn’t say this model explained everything”, and one part basically Powers’ solution – saying that we have hard-coded “predictions” of getting enough to eat or being the right temperature or whatever.
Interestingly, Clark’s description of what a hypothetical predictive agent that really tried to minimize surprisal without these constraints would do is sit in a dark room without eating or socializing or doing anything. I’m struck by the relevance to depression here, though maybe it’s a coincidence.
Scott: have you read Immanuel Kant’s Critique of Pure Reason? I remember reading somewhere that you were in philosophy as your undergrad, so there’s a substantial chance of it, but on the off chance you haven’t yet had the opportunity, I highly recommend it. I (along with some of the commenters in the previous thread) think there’s a lot of overlap between it and PP – not total, but close in several key regards. Like Greek atomism, Kant’s work makes a bunch of claims purely based on sitting down and thinking about the problem which in turn appear to be entirely correct. In particular, his expression of the careful interplay between empirical appearances and concepts as constituting the cognition of objects (as featured in the Transcendental Deduction and elsewhere) is highly predictive of PP, although I’m still a little on the fence as to whether his Categories were jumping the gun.
If you’ve read it and think there’s no relation, then I’d be interested in hearing.
Yeah, see for example here, which I still don’t feel got a really good answer.
I can see the resemblance between this and Kant’s ideas, but I think in the end PP treats perception as a glorified Photoshop Sharpen Image filter, and Kant treats it as the source of time and space and math and order and everything else.
In particular, Kant’s whole point was to resolve certain philosophical questions that AFAICT PP doesn’t do anything to resolve. That makes me think maybe the resemblance is more of a coincidence.
Did not see that post! I’d like to try and answer as best I can.
Part one: Kant was talking in part about mathematical laws, but more critically about the whole project of empirical science. Empirical science is, at its core, this kind of up-down structuring that PP is working with: expectations and sense data interact to create meaningful results. The problem at the time was that the general philosophy was torn between Locke’s style of hyper-realism and idealism from folks like Berkeley and (although not wholeheartedly) Descartes. That is, from the one direction you had people who were claiming that sense-data gave the full and true story every time, and from the other, people who said only the expectations meant anything. Idealism was pretty well scuttled from the get-go, because nobody really has ever believed it, but Lockean realism had to wait for a refutation by Humean skepticism (e.g. through the problem of induction). The problem there was that Hume didn’t leave anything really great to go on after the fact, except for a fairly undeveloped notion of “natural law.” This is where Kant steps in.
Kant begins by making a pretty bold declaration: in order for empirical science to be possible, we need to have some kind of synthetic a priori cognition. I think the best translation into contemporary lingo is “hyperpriors,” although it goes without saying that there are some differences. The idea, at any rate, is that we need to have something plugged into the system from the start in order to get meaningful results, which (surprise surprise) is what machine-learning researchers have discovered, as ksvanhorn points out here. As another, yet even stronger version of the claim: Kant says that there is a valid category “thinking being,” and that since the category is valid, there are necessary characteristics universal to all thinking beings without which they would not be thinking beings. This is a very, very important idea, which Kant doesn’t make explicit nearly as well as he ought to, and which has a ton of implications for basically every mind-based field as a foundational belief.
But if we have the necessary information plugged in from the start, what good does the sense data do? Kant’s answer: there’s a basic level of sense-molding, which sets everything we experience within the context of space and time (I’d personally recommend a cautious and conditional expansion of this, but space and time in the experiential sense, not the sense denoted by contemporary science, are as universal as you can get), and then everything else is treated by a kind of calculus of sense-data and expectations. The terms he uses, for reference, are “empirical appearance” and “concept,” respectively, although his use is more subtle and deserves a proper and serious reading rather than blind substitution. The result of this is the cognition of empirical objects, which are the building blocks of science: we declare that a tree is a tree and can be studied as a tree, for example.
So, as a preliminary, why is this basically the same style of work as PP? Simple: because Kant and PP both are trying to explain what goes on in human cognition and perception such that something like empirical science is the best possible method, over (say) Aristotelian examination of scientific syllogisms. The apparent difference is basically just due to the fact that after the immense success of empirical science, with sound philosophy providing theoretical justification for it, there’s been basically no argument anywhere in the kind of Western academia that PP grew out of as to whether empirical science is the way to go. For the creators of PP, and probably you and me besides, there’s just never been a question about it. However, I’d argue that right now we’re actually in a fairly similar place to where Kant was when he was writing: on the one hand, the Analytic tradition has been trying to push some fairly naive positions about sense-data and mathematics, and on the other hand, the Continental traditions have been trying to undermine those positions with the various sorts of conditioned knowledge and lived experience and what have you (I won’t try too hard to get terms right here, because no two Continentals use the same ones anyway). Getting lost somewhere in all this kerfuffle is the human mind, which doesn’t seem to have the kind of perfect grasp of reality required for all the bizarre Analytic thought experiments but also adheres to the world more strongly than a Continental would have it. In to the rescue comes PP, which (unbeknownst to it) is justifying empirical science all over again, as a means of being kind-of-right about the world.
But the bigger question, I think, is about things-in-themselves and where mathematics come from if not from the world, right? I understand why it seems so totally absurd that Kant’s saying that we’re just imposing mathematical laws on the world, when those mathematical laws we’re imposing are good enough to land little metal darts on Mars and then have them sing Happy Birthday. The problem is, basically, that the word you’re using for “world” is what Kant would call the “empirical world,” while what Kant says escapes mathematics is the “things-in-themselves.” Things-in-themselves form a critical category which is totally absent from most contemporary discourse, which is basically what’s behind the veil. Consider the old skeptical argument, whether from Descartes’ evil demons or the more pop-culture Matrix, that denies that we’re experiencing the real world altogether. Are we simply wrong about everything we know, if we’re living in such a world? No, Kant would say; we’re right about that entire empirical world. However, the transcendental reality beyond that escapes all our current knowledge, and so we’re right to say that we don’t know the ultimate conditions under which we experience everything. What we can do is examine our own nature as minds, though, and from that declare some necessary conditions of experience: for example, that we need to sense time in order to have ordered thoughts, and space to have any experience of an “external” world (i.e. a space outside ourselves for that world to be in). This is why mathematics and logic can carry over so well between humans, and even between humans and aliens (yes, Kant wrote about aliens from space, and I’m not being silly or facetious here). The math may not be what’s behind the curtain in the end, but because of what we can tell about the structure of mind itself (through the Categories, for Kant), we can know that math will apply to our empirical worlds as well as the empirical worlds for anything with a mind. 
(This is an incredibly strong claim, and I think he’s entirely correct, with some hesitation about his particular Categories.)
It is worth noting that Kant is very passionate about insisting that space and time are not features of things-in-themselves, or whatever lies beyond the veil. The justification for this, as I understand it, is that space and time as we know and experience them are features of mind, not of whatever it is that lies beyond the veil. In his book, it’s miscategorization to try and apply them to what there really is out there. I think he’d be more amenable to the suggestion that there might be something vaguely analogous out there, but that we really can’t know anything about it in its ultimate state. Oh, and the reason why he’s so insistent about that point is because a lot of the deductive arguments towards idealism in his time started out by declaring space or time to be inherent features of the world as it really is. At least in part, it’s a direct stab at Berkeley. Hope that makes it feel a bit more reasonable.
So, as an overall summary: I’d say that a properly fleshed-out PP would have to end up declaring perception as the source of math at the very least, and probably time and space along with it, or else fall straight into the good old-fashioned map-territory problem (seriously, math is the best example of an iterative predictive model that’s non-identical with what’s under the hood). PP doesn’t have much to say about Berkeley and Kant doesn’t have much to say about the hegemony of European science, but that’s probably more the anachronism than anything, because the same battle seems to be going on behind all the faff with different words. For your black box, ignoring the obvious detail that the aliens could have put a different equation in, the reason why you can understand the answer they would give and they can understand the answer you would give is that there’s the same fundamental structure of mind shared between you (and you’re experiencing the same empirical object). All that we need in order to both come to the conclusion that 2 + 2 = 4 is the same basic software vis a vis counting, not some property of what’s going on behind the scenes or a magical spark going from mind to mind.
That ended up being quite long! I would be happy to go over any part of this in more detail, and if you give me some time to dig up my Kant volume, I can even give you citations. It’s good stuff, I believe, although Kant made a serious mistake when he tried shifting over to ethics, which is why he ends up saying such weird things there. Hope it all made sense and wasn’t too much of a bore.
I don’t know about your open thread question, sorry, but if this is the ‘asking confused questions about Kant’ section can I ask mine:
One argument I often see is that the discovery of non-Euclidean geometries messed up Kant’s program for a priori geometry. I’ve never really understood that argument, though.
It is really surprising and interesting that the parallel postulate is independent and we can have different geometries. But I don’t see how this has changed my own inbuilt geometric intuition. Nothing about my perceptual intuition for parallel lines has changed in a way that makes them look more like they ever want to cross! And if I’m trying to reason about a non-Euclidean geometry and want to visualise something, I’m still going to do some sort of projection into Euclidean space, e.g. use some model of the hyperbolic plane.
I’m sure there’s some horrible German post-Kantian philosopher I should read, but I don’t know which one.
Yeah, the reason you don’t get that argument is that it doesn’t work. Kant definitely did the Kant thing and failed to anticipate that there could be some fairly radical expansions to prior fields of knowledge, but although a priori geometry by itself isn’t so much a thing, a priori Euclidean geometry is a thing, as well as a priori non-Euclidean geometry. Heck, you can even introduce non-Euclidean geometry as a weirder extension of normal geometry and just keep on tickin’, same as any other field of math. All of this allows for the same basic program for Kant, which is that at no point in the study of any kind of geometry do you absolutely have to stop, go outside, and measure things in order to prove your next point. You might not know what proofs are useful without going outside and noticing, say, that the earth is approximately ball-shaped, but completing them takes zero measuring and experimentation (as opposed to figuring out the gravitational constant). That’s all he needed in order for it to be properly a priori.
Thanks, that’s helpful as I could never make much sense of the objection. Agree that that geometry being non-Euclidean doesn’t stop it being a priori.
I don’t know, I think I wanted to claim something a bit stronger though. In your long comment above you’re careful to distinguish ‘space and time in the experiential sense, not the sense denoted by contemporary science’. I kind of want to say that that experiential sense just is Euclidean, and Kant was right!
(I tend to think about flat tangent spaces to curved manifolds in this weird semi-Kantian sense – that it’s my local experiential approximation, or something, but that reality might not agree with me on large scales.)
What does PP say at high levels that you can tell it diverges from PCT?
My impression, which might have been wrong, was something like PCT wanting higher levels to control a scalar quantity (like “amount of love”).
PP doesn’t really present higher level predictions as being motivational (though maybe it could), but if they were I feel like it would be more like “similarity to my ideal relationship”.
I’m unclear as to what the content of claiming a PP style perceptual control of action really amounts to.
I mean in what sense isn’t it true that any system which learns which control signals to provide in order to create a certain desired perceptual state can be redescribed in terms of minimizing predicted error? In other words given a control system whose goals are understood in terms of certain perceptual states, e.g., an automated vehicle trying to ensure it is between the lines and learning the correct control to apply to the wheels to achieve that, can’t we always just call the desired behavior (staying between the lines or correcting course by such and such amount) a prediction and thereby vindicate the idea that control is just another instance of minimizing perceptual predictive error?
Maybe there is something I’m missing, but I’m worried that just about any system which learns which controls to apply by comparing the actual sensory inputs to some desired state could be so described. The fact that there seems to be considerable freedom in choosing what to call a prediction (it apparently encompasses fairly abstract notions that can have pretty attenuated links with the actual sensory input…as it must if the theory is to handle even rudimentary actions) and little specification of a particular manner in which perceptual prediction errors are minimized adds to this worry. If so, that would make this model trivially true but also pretty vacuous.
Maybe it would help if someone could describe a plausible way this kind of learned control behavior could function that is ruled out by this theory.
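The redescription worry above can be shown in a few lines. This is a minimal Python sketch with invented names, using a hypothetical lane-keeping vehicle whose state is a single lateral offset. It assumes the simplest possible case: a proportional controller on one side, and one gradient step on squared "prediction error" on the other. In that case the two updates are the same arithmetic under different labels.

```python
def control_update(offset, desired_offset, gain):
    """Control framing: move the offset toward a desired set point."""
    return offset + gain * (desired_offset - offset)


def prediction_error_update(offset, predicted_offset, step_size):
    """Prediction framing: one gradient step on 0.5 * (predicted - offset) ** 2.

    The gradient of that squared error with respect to `offset` is
    -(predicted - offset), so stepping against it moves the offset
    toward the "prediction".
    """
    gradient = -(predicted_offset - offset)
    return offset - step_size * gradient


if __name__ == "__main__":
    # Hypothetical vehicle 0.8 m left of lane centre; centre is offset 0.0.
    print(control_update(-0.8, 0.0, 0.3))
    print(prediction_error_update(-0.8, 0.0, 0.3))
```

This only covers the trivial case. Whether the equivalence survives once the "prediction" is learned, hierarchical, or probabilistic is exactly the open question in the comment.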
Hey Scott, one of the issues that bothers me about the whole topic is how bad people are at probability in general. How does having a really bad flood now mean that we shouldn’t expect a flood for the next 100 years? How is the system generating these (strange) abstract predictions?
This all feels very convenient. In this post, and the book review, you made a lot of claims about the predictive processing model, and how it neatly explains a bunch of different already-existing data.
Isn’t that a Science Deadly Sin? Like, anyone can look at existing data and say “Ahah, I know why this happens”, but only real scientists can say “This will happen, this is impossible and will never happen”?
I feel like these posts need an “epistemic status” warning. Unless you’re seeing something I’m not seeing, and I’m just projecting my confusion on you.
But until you put forth some prediction, or some form of model that says “this is what can happen” without reusing existing observations… well, I won’t be convinced, and I don’t think anyone should be.
The prediction is just that all nearly-thinking things have to work this way, and that anything that works in a different way (e.g. classic computer structure) just doesn’t qualify as a thinking thing. Anything with an upstream-downstream flow of sense data and expectations and a set of motivating forces will get most of the way there by itself. Anything without those will never get close.
In addition, you can see Scott trying to apply this to his own work with the schizophrenia suggestions. The idea is, of course, that if upstream-downstream malfunction is what’s going on with most mental illnesses, you can start to predict with greater accuracy which drugs will work there, which currently isn’t going well at all. Scott, not being an active biomedical chemist, doesn’t have any particular predictions, but a dedicated research team could come up with some and then test them. (Related: the reason that current drug-effect predictions aren’t working well could be that existing models don’t actually explain or predict anything.)
So yeah, this does have the potential to be a just-so story, but it’s already branching out in the direction of useful forecasting. The theory does come before certain practices sometimes.
That last bit makes this sound like it is not an actual prediction about the world but rather just a delineation of terminology. How do you cash it out into an actual prediction?
Well, the category “thinking thing” has some other requirements that we expect to see. It’s not like it was just made up for the sake of PP. It’s a category that was basically made for humans alone, and is tenuously extended with some severe caveats to other animals (some people try it with plants; I don’t buy it). If something is able to act very much like an animal in how it learns and interacts with the world, then we can put it into the same category as them. If it doesn’t, we won’t put it in that same category. Intuition and standard use are what can keep this category grounded, instead of just being so loose as to say whatever we want.
For the record, I think that some of our robots might be at pillbug or earthworm levels of competence. That’s not saying a lot, but it is quite cool!
This model is very similar to HTM from Jeff Hawkins – Hierarchical Temporal Memory. The Hawkins model has a *lot* more detail though.
https://en.wikipedia.org/wiki/Hierarchical_temporal_memory
https://numenta.org/resources/HTM_CorticalLearningAlgorithms.pdf
This is essentially how a lot of control systems in industrial processing, robotics, aviation, etc work. You have a set of sensors that measure the state of your system, and you have a set of actuators that affect the state of your system, and you use a feedback controller to drive the actuators until your system approaches some target state as measured by the sensors. Autopilot in an aircraft? Feedback controller that detects the aircraft’s attitude and moves the control surfaces to move the aircraft toward your desired attitude. Nuclear power plant? Feedback controller that moves the control rods in and out to match actual output power with desired output power. Feedback controllers are so exceptionally useful and powerful that quite often you don’t even need any sort of open-loop or predictive controller at all.
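For readers who haven't seen one, the sense-compare-actuate loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions: a one-dimensional state, a purely proportional controller, and a made-up "plant" that moves by exactly the commanded amount. Real autopilots and reactor controllers are far more involved, and the names here are invented for the example.

```python
def run_feedback_loop(setpoint, initial_state, gain=0.5, steps=50):
    """Drive a toy one-dimensional system toward `setpoint`.

    Returns the list of states visited, starting with `initial_state`.
    """
    state = initial_state
    history = [state]
    for _ in range(steps):
        error = setpoint - state      # sensor reading compared with the target
        actuation = gain * error      # actuator command proportional to the error
        state = state + actuation     # toy plant: state shifts by the command
        history.append(state)
    return history


if __name__ == "__main__":
    trace = run_feedback_loop(setpoint=10.0, initial_state=0.0)
    print("first few states:", trace[:4])
    print("final state:", trace[-1])
```

Note that the controller never predicts anything about the plant. It only reacts to the measured error, which is why a loop like this can work with no open-loop or predictive component.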
I’ve read a theory that a human being is basically just a shitload of feedback controllers layered on top of each other. Not just proprioception, but all the way up to complex behaviors like “I am hungry -> seek food to reduce hunger”. No word on whether this extends all the way up to things like “I want to be stronger -> Spend the next six months in a gym.”