2 Radical predictive processing

Such models involve a number of quite radical claims. In the present treatment, I propose focusing upon just four:

1. The core flow of information is top-down, not bottom-up, and the forward flow of sensory information is replaced by the forward flow of prediction error.
2. Motor control is just more top-down sensory prediction.
3. Efference copies, and distinct “controllers” (inverse models) are replaced by top-down predictions.
4. Cost functions are absorbed into predictions.

One thing I shan’t try to do here is rehearse the empirical evidence for the framework. That evidence (which is substantial but importantly incomplete) is rehearsed in Clark (2013a) and Hohwy (2013, this collection). For a recent attempt to specify a neural implementation, see Bastos et al. (2012). I now look at each of these points in turn:

2.1 The core flow of information is top-down, not bottom-up, and the forward flow of sensory information is replaced by the forward flow of prediction error

This is the heart and soul of the radical vision. Incoming sensory information, if PP is correct, is constantly met with a cascade of top-down prediction, whose job is to predict the incoming signal across multiple temporal and spatial scales.

To see how this works in practice, consider a seminal proof-of-concept by Rao & Ballard (1999). In this work, prediction-based learning targets image patches drawn from natural scenes using a multi-layer artificial neural network. The network had no pre-set task apart from that of using the downwards connections to match input samples with successful predictions. Instead, visual signals were processed via a hierarchical system in which each level tried (in the way just sketched) to predict activity at the level below it using recurrent (feedback) connections. If the feedback successfully predicted the lower-level activity, no further action was required. Failures to predict enabled tuning and revision of the model (initially, just a random set of connection weights) generating the predictions, thus slowly delivering knowledge of the regularities governing the domain. In this architecture, forward connections between levels carried only the “residual errors” (Rao & Ballard 1999, p. 79) between top-down predictions and actual lower level activity, while backward or recurrent connections carried the predictions themselves.

After training, the network developed a nested structure of units with simple-cell-like receptive fields and captured a variety of important, empirically-observed effects. One such effect was “end-stopping”. This is a “non-classical receptive field” effect in which a neuron responds strongly to a short line falling within its classical receptive field but (surprisingly) shows diminishing response as the line gets longer. Such effects (and with them, a whole panoply of “context effects”) emerge naturally from the use of hierarchical predictive processing. The response tails off as the line gets longer, because longer lines and edges were the statistical norm in the natural scenes to which the network was exposed in training. After training, longer lines are thus what is first predicted (and fed back, as a hypothesis) by the level-two network. The strong firing of some level-one “edge cells”, when they are driven by shorter lines, thus reflects not successful feature detection by those cells but rather error or mismatch, since the short segment was not initially predicted by the higher-level network. This example neatly illustrates the dangers of thinking in terms of a simple cumulative flow of feature-detection, and also shows the advantages of re-thinking the flow of processing as a mixture of top-down prediction and bottom-up error correction.[2] In addition it highlights the way these learning routines latch on to the world in a manner specified by the training data. End-stopped cells are simply a response to the structure of the natural scenes used in training, and reflect the typical length of the lines and edges in these natural scenes. In a very different world (such as the underwater world of some sea-creatures) such cells would learn very different responses.

These were early and relatively low-level results, but the predictive processing model itself has proven rich and (as we shall see) widely applicable. It assumes only that the environment generates sensory signals by means of nested interacting causes and that the task of the perceptual system is to invert this structure by learning and applying a structured internal model—so as to predict the unfolding sensory stream. Routines of this kind have recently been successfully applied in many domains, including speech perception, reading, and recognizing the actions of oneself and of other agents (see Poeppel & Monahan 2011; Price & Devlin 2011; Friston et al. 2011). This is not surprising, since the underlying rationale is quite general. If you want to predict the way some set of sensory signals will change and evolve over time, a good thing to do is to learn how those sensory signals are determined by interacting external causes. And a good way to learn about those interacting causes is to try to predict how the sensory signal will change and evolve over time.
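
To make the mechanics concrete, here is a minimal two-level sketch of this kind of prediction-driven learning, in the spirit of (though far simpler than) the Rao & Ballard scheme described above. It is an illustrative toy in Python; the layer sizes, learning rates, and the column-normalization step are my own choices rather than anything from the original work. Top-down weights generate a prediction of the input level, only the residual error is passed forward, and that error is used both to revise the higher-level estimate and, more slowly, to adjust the generative model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden = 16, 4

# The "world": sensory patches are generated by a handful of hidden interacting causes.
true_mixing = rng.normal(size=(n_input, n_hidden))

def make_patch():
    causes = rng.normal(size=n_hidden)
    return true_mixing @ causes + 0.05 * rng.normal(size=n_input)

# The model: top-down (backward) weights W carry predictions of the input level.
W = rng.normal(scale=0.1, size=(n_input, n_hidden))

def settle(x, W, n_steps=300, lr_r=0.1):
    """Infer higher-level activity r for input x; only residual error flows forward."""
    r = np.zeros(n_hidden)
    for _ in range(n_steps):
        error = x - W @ r          # residual prediction error at the input level
        r += lr_r * (W.T @ error)  # the higher level revises its hypothesis
    return r

lr_W = 0.01
for step in range(2001):
    x = make_patch()
    r = settle(x, W)
    error = x - W @ r
    W += lr_W * np.outer(error, r)                 # slow learning of the generative model
    W /= np.linalg.norm(W, axis=0, keepdims=True)  # keep columns unit-length (a common
                                                   # stabilising trick, not in the original)
    if step % 500 == 0:
        print(f"step {step:4d}  mean squared prediction error: {np.mean(error ** 2):.4f}")
```

In this toy setting the sensory patches really are generated by a few hidden causes, and the forward-flowing residual error shrinks as the top-down model comes to mirror the structure of its little training world.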

Now try to imagine this on a very grand scale. To predict the visually presented scene, the system must learn about edges, blobs, line segments, shapes, forms, and (ultimately) objects. To predict text, it must learn about interacting “hidden” causes in the linguistic domain: causes such as sentences, words, and letters. To predict all of our rich multi-modal plays of sensory data, across many scales of space and time, it must learn about interacting hidden causes such as tables, chairs, cats, faces, people, hurricanes, football games, goals, meanings, and intentions. The structured world of human experience, if this is correct, comes into view only when all manner of top-down predictions meet (and “explain away”) the incoming waves of sensory information. What propagates forwards (through the brain, away from the sensory peripheries) is then only the mismatches, at every level, with predicted activity.

This makes functional sense. Given that the brain is ever-active, busily predicting its own states at many levels, all that matters (that is, all that is newsworthy, and thus ought to drive further processing) are the incoming surprises: unexpected deviations from what is predicted. Such deviations result in prediction errors reflecting residual differences, at every level and stage of processing, between the actual current signal and the predicted one. These error signals are used to refine the prediction until the sensory signal is best accommodated.

Prediction error thus “carries the news”, and is pretty much the hero (or anti-hero) of this whole family of models. So much so, that it is sometimes said that:

In predictive coding schemes, sensory data are replaced by prediction error, because that is the only sensory information that has yet to be explained. (Feldman & Friston 2010, p. 2)

We can now savor the radicalism. Where traditional, feed-forward-based views see a progressive (though top-down modulated) flow of complex feature-detection, the new view depicts a progressive, complex flow of feature prediction. The top-down flow is not mere attentional modulation. It is the core flow of structured content itself. The forward-flowing signal, by contrast, has now morphed into a stream of residual error. I want to suggest, however, that we treat this apparently radical inversion with some caution. There are two reasons for this—one conceptual, and one empirical.

The first (conceptual) reason for caution is that the “error signal” in a trained-up predictive coding scheme is highly informative. Prediction error signals carry detailed information about the mismatched content itself. Prediction errors are thus as structured and nuanced in their implications as the model-based predictions relative to which they are computed. This means that, in a very real sense, the prediction error signal is not a mere proxy for incoming sensory information – it is sensory information. Thus, suppose you and I play a game in which I (the “higher, predicting level”) try to describe to you (the “lower level”) the scene in front of your eyes. I can’t see the scene directly, but you can. I do, however, believe that you are in some specific room (the living room in my house, say) that I have seen in the past. Recalling that room as best I can, I say to you “there’s a vase of yellow flowers on a table in front of you”. The game then continues like this. If you are silent, I take that as your agreeing to my description. But if I get anything that matters wrong, you must tell me what I got wrong. You might say “the flowers are not yellow”. You thus provide an error signal that invites me to try again in a rather specific fashion—that is, to try again with respect to the colour of the flowers in the vase. The next most probable colour, I conjecture, is red. I now describe the scene in the same way but with red flowers. Silence. We have settled into a mutually agreeable description.[3]

The point to note is that your “error signal” carried some quite specific information. In the pragmatic context of your silence regarding all other matters, the content might be glossed as “there is indeed a vase of flowers on the table in front of me but they are not yellow”. This is a pretty rich message. Indeed, it does not (content-wise) seem different in kind to the downward-flowing predictions themselves. Prediction error signals are thus richly informative, and as such (I would argue) not radically different to sensory information itself. This is unsurprising, since mathematically (as Karl Friston has pointed out[4]) sensory information and prediction error are informationally identical, except that the latter are centred on the predictions. To see this, reflect on the fact that prediction error is just the original information minus the prediction. It follows that the original information is given by the prediction error plus the prediction. Prediction error is simply error relative to some specific prediction and as such it flags the sensory information that is as yet unexplained. The forward flow of prediction error thus constitutes a forward flow of sensory information relative to specific predictions.
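
In the simplest gloss (the notation here is mine, not the text’s): writing s for the incoming sensory signal at some processing stage, ŝ for the top-down prediction of it, and e for the prediction error,

```latex
e = s - \hat{s} \quad\Longrightarrow\quad s = \hat{s} + e
```

A system that holds the prediction can thus recover the full sensory signal from the error signal alone; nothing is discarded, it is merely re-centred on what was predicted.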

There is more to the story at this point, since the (complex, non-linear) ways in which downward-flowing predictions interact are importantly different to the (simple, linear) effects of upward-flowing error signals. Non-linearities characterize the multi-level construction of the predictions, which do the “heavy lifting”, while the prediction error signals are free to behave additively (since all the complex webs of linkage are already in place). But the bottom line is that prediction error does not replace sensory information in any mysterious or conceptually challenging fashion, since prediction error is nothing other than that sensory information that has yet to be explained.

The second (empirical) reason for caution is that it is, in any case, only one specific implementation of the predictive brain story that depicts the forward flow as consisting solely of prediction error. An alternative implementation (due to Spratling 2008 and 2010—and see discussion in Spratling 2013) implements the same key principles using a different flow of prediction and error, described by a variant mathematical framework. This illustrates the urgent need to explore multiple variant architectures for prediction error minimization. In fact, the PP schema occupies just one point in a large and complex space of probabilistic generative-model-based approaches, and there are many possible architectures and possible ways of combining top-down predictions and bottom-up sensory information in this general vicinity. These include foundational work by Hinton and colleagues on deep belief networks (Hinton & Salakhutdinov 2006; Hinton et al. 2006), work that shares a core emphasis on the use of prediction and probabilistic multi-level generative models, as well as recent work combining connectionist principles with Bayesian angles (see McClelland 2013 and Zorzi et al. 2013). Meanwhile, roboticists such as Tani (2007), Saegusa et al. (2008), Park et al. (2012), Pezzulo (2008), and Mohan et al. (2010) explore the use of a variety of prediction-based learning routines as a means of grounding higher cognitive functions in the solid bedrock of sensorimotor engagements with the world. Only by considering the full space of possible prediction-and-generative-model-based architectures and strategies can we start to ask truly pointed experimental questions about the brain and about biological organisms; questions that might one day favor one of these models (or, more likely, one coherent sub-set of models[5]) over the rest, or else may reveal deep faults and failings among their substantial common foundations.

2.2 Motor control is just more top-down sensory prediction

I shall, however, continue to concentrate upon the specific explanatory schema implied by PP, as this represents (it seems to me) the most comprehensive and neuroscientifically well-grounded vision of the predictive mind currently available. What makes PP especially interesting—and conceptually challenging—is the seamless integration of perception and action achieved under the rubric of “active inference”.

To understand this, consider the motor system. The motor system (like the visual cortex) displays a complex hierarchical structure.[6] Such a structure allows complex behaviors to be specified, at higher levels, in compact ways, the implications of which can be progressively unpacked at the lower levels. The traditional way of conceptualizing the difference, however, is that in the case of motor control we imagine a downwards flow of information, whereas in the case of the visual cortex we imagine an upwards flow. Descending pathways in the motor cortex, this traditional picture suggests, should correspond functionally to ascending pathways in the visual cortex. This is not, however, the case. Within the motor cortex the downwards connections (descending projections) are “anatomically and physiologically more like backwards connections in the visual cortex than the corresponding forward connections” (Adams et al. 2013, p. 1).

This is suggestive. Where we might have imagined the functional anatomy of a hierarchical motor system to be some kind of inverted image of that of the perceptual system, instead the two seem fundamentally alike.[7] The explanation, PP suggests, is that the downwards connections, in both cases, take care of essentially the same kind of business—namely the business of predicting sensory stimulation. Predictive processing models subvert, we saw, the traditional picture with respect to perception. In PP, compact higher-level encodings are part of an apparatus that tries to predict the play of energy across the sensory surfaces. The same story applies, recent extensions (see below) of PP suggest, to the motor case. The difference is that motor control is, in a certain sense, subjunctive. It involves predicting the non-actual sensory trajectories that would ensue were we performing some desired action. Reducing prediction errors calculated against these non-actual states then serves (in ways we are about to explore) to make them actual. We predict the sensory consequences of our own action and this brings the actions about.

The upshot is that the downwards connections, in both the motor and the sensory cortex, carry complex predictions, and the upwards connections carry prediction errors. This explains the otherwise “paradoxical” (Shipp et al. 2013, p. 1) fact that the functional circuitry of the motor cortex does not seem to be inverted with respect to that of the sensory cortex. Instead, the very distinction between the motor and the sensory cortex is now eroded—both are in the business of top-down prediction, though the kind of thing they predict is (of course) different. The motor cortex here emerges, ultimately, as a multimodal sensorimotor area issuing predictions in both proprioceptive and other modalities.

In this way, PP models have been extended (under the umbrella of “active inference”—see Friston 2009; Friston et al. 2011) to include the control of action. This is accomplished by predicting the flow of sensation (especially that of proprioception) that would occur were some target action to be performed. The resulting cascade of prediction error is then quashed by moving the bodily plant so as to bring the action about. Action thus results from our own predictions concerning the flow of sensation—a version of the “ideomotor” theory of James (1890) and Lotze (1852), according to which the very idea of moving, when unimpeded by other factors, is what brings the moving about. The resulting schema is one in which:

The perceptual and motor systems should not be regarded as separate but instead as a single active inference machine that tries to predict its sensory input in all domains: visual, auditory, somatosensory, interoceptive and, in the case of the motor system, proprioceptive. (Adams et al. 2013, p. 4)

In the case of motor behaviors, the key driving predictions, Friston and colleagues suggest, are predictions of the proprioceptive patterns[8] that would ensue were the action to be performed (see Friston et al. 2010). To make an action come about, the motor plant responds so as to cancel out proprioceptive prediction errors. In this way, predictions of the unfolding proprioceptive patterns that would be associated with the performance of some action serve to bring that action about. Proprioceptive predictions directly elicit motor actions (so traditional motor commands are simply replaced by those proprioceptive predictions).
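
A toy illustration may help fix ideas (this is my own sketch in Python, not Friston and colleagues’ model): a single “joint” whose reflex arc simply moves so as to cancel proprioceptive prediction error, so that the top-down proprioceptive prediction itself does the work of a motor command.

```python
import numpy as np

rng = np.random.default_rng(0)

joint_angle = 0.0          # actual state of the (one-joint) motor plant
predicted_angle = 1.2      # top-down proprioceptive prediction: "the arm is raised"
reflex_gain = 0.2          # gain of the peripheral reflex arc

for t in range(60):
    sensed = joint_angle + rng.normal(scale=0.01)   # noisy proprioceptive signal
    error = predicted_angle - sensed                # proprioceptive prediction error
    joint_angle += reflex_gain * error              # the plant moves to quash the error

print(f"final joint angle: {joint_angle:.3f}  (predicted: {predicted_angle})")
```

The only “command” ever issued here is the predicted joint angle; the movement simply falls out of quashing the error between that prediction and the sensed angle.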

This erases any fundamental computational line between perception and the control of action. There remains, to be sure, an obvious (and important) difference in direction of fit. Perception here matches neural hypotheses to sensory inputs, and involves “predicting the present”, while action brings unfolding proprioceptive inputs into line with neural predictions. The difference, as Elizabeth Anscombe (1957) famously remarked,[9] is akin to that between consulting a shopping list to select which items to purchase (thus letting the list determine the contents of the shopping basket) and listing some actually purchased items (thus letting the contents of the shopping basket determine the list). But despite this difference in direction of fit, the underlying form of the neural computations is now revealed to be the same. Indeed, the main difference between the motor and the visual cortex, on this account, lies more in what kind of thing (for example, the proprioceptive consequences of a trajectory of motion) is predicted than in how it is predicted. The upshot is that:

The primary motor cortex is no more or less a motor cortical area than striate (visual) cortex. The only difference between the motor cortex and visual cortex is that one predicts retinotopic input while the other predicts proprioceptive input from the motor plant. (Friston et al. 2011, p. 138)

Perception and action here follow the same basic logic and are implemented using the same computational strategy. In each case, the systemic imperative remains the same: the reduction of ongoing prediction error. This view has two rather radical consequences, to which we shall now turn.

2.3 Efference copies and distinct “controllers” are replaced by top-down predictions

A long tradition in the study of motor control invokes a “forward model” of the likely sensory consequences of our own motor commands. In this work, a copy of the motor command (known as the “efference copy”; Von Holst 1954) is processed using the forward model. This model captures (or “emulates”—see Grush 2004) the relevant biodynamics of the motor plant, enabling (for example) a rapid prediction of the likely feedback from the sensory peripheries. It does this by encoding the relationship between motor commands and predicted sensory outcomes. The motor command is thus captured using the efference copy which, fed to the forward model, yields a prediction of the sensory outcome (this is sometimes called the “corollary discharge”). Comparisons between the actual and the predicted sensory input are thus enabled.

But motor control, in the leading versions of this kind of account, requires in addition the development and use of a so-called “inverse model” (see e.g., Kawato 1999; Franklin & Wolpert 2011). Where the forward model maps current motor commands to predicted sensory effects, the inverse model (also known as a controller) “performs the opposite transformation […] determining the motor command required to achieve some desired outcome” (Wolpert et al. 2003, p. 595). Learning and deploying an inverse model appropriate to some task is, however, generally much more demanding than learning the forward model, and requires solving a complex mapping problem (linking the desired end-state to a nested cascade of non-linearly interacting motor commands) while effecting transformations between varying co-ordinate schemes (e.g., visual to muscular or proprioceptive—see e.g., Wolpert et al. 2003, pp. 594–596).
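
The asymmetry between the two mappings can be made concrete with a deliberately simplified linear toy (my own illustration in Python, not any published model): the forward model is learned by straightforwardly regressing observed sensory outcomes onto copies of the issued commands, whereas the inverse model must somehow run that mapping backwards, from a desired outcome to a command.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy "motor plant": commands u produce sensory outcomes y (unknown to the agent).
P = rng.normal(size=(3, 3))

def plant(u):
    return P @ u + 0.01 * rng.normal(size=3)

# Forward model: predict sensory outcomes from copies of issued commands
# (the "efference copies"). This direction is easy: regress outcomes on commands.
U = rng.normal(size=(500, 3))                 # commands issued during motor babbling
Y = np.array([plant(u) for u in U])           # their observed sensory consequences
F, *_ = np.linalg.lstsq(U, Y, rcond=None)     # forward model: predicted y = F.T @ u

def forward_model(u):
    return F.T @ u                            # "corollary discharge": predicted outcome

# Inverse model ("controller"): go from a desired outcome back to a command.
def inverse_model(desired_y):
    return np.linalg.lstsq(F.T, desired_y, rcond=None)[0]

goal = np.array([1.0, -0.5, 0.3])
u = inverse_model(goal)
print("desired:", goal, " achieved:", np.round(plant(u), 3))
```

Even in this linear case the inverse direction requires an (approximate) inversion of the learned mapping; with redundant, non-linear bodily plants and shifting co-ordinate frames, that inverse problem becomes dramatically harder, which is just the asymmetry noted above.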

PP (the full “action-inclusive” version just described) shares many key insights with this work. They have in common a core emphasis on the prediction-based learning of a forward (generative) model, which is able to anticipate the sensory consequences of action. But active inference, as defended by Friston and others—see e.g., Friston (2011); Friston et al. (2012)—dispenses with the inverse model or controller, and along with it the need for efference copy of the motor command. To see how this works, consider that action is here reconceived as a direct consequence of predictions (spanning multiple temporal and spatial scales) about trajectories of motion. Of special importance here are predictions about proprioceptive consequences that implicitly minimize various energetic costs. Subject to the full cascade of hierarchical top-down processing, a simple motor command now unfolds into a complex set of predictions concerning both proprioceptive and exteroceptive effects. The proprioceptive predictions then drive behavior, causing us to sample the world in the ways that the current winning hypothesis dictates.[10]

Such predictions can be couched, at the higher levels, in terms of desired states or trajectories specified using extrinsic (world-centered, limb-centered) co-ordinates. This is possible because the required translation into intrinsic (muscle-based) co-ordinates is then devolved to what are essentially classical reflex arcs set up to quash proprioceptive prediction errors. Thus:

if motor neurons are wired to suppress proprioceptive prediction errors in the dorsal horn of the spinal cord, they effectively implement an inverse model, mapping from desired sensory consequences to causes in intrinsic (muscle-based) coordinates. In this simplification of conventional schemes, descending motor commands become top-down predictions of proprioceptive sensations conveyed by primary and secondary sensory afferents. (Friston 2011, p. 491)

The need (prominent in approaches such as Kawato 1999; Wolpert et al. 2003; and Franklin & Wolpert 2011) for a distinct inverse model/optimal control calculation has now disappeared. In its place we find a more complex forward model mapping prior beliefs about desired trajectories to sensory consequences, some of which (the “bottom level” proprioceptive ones) are automatically fulfilled.

The need for efference copy has also disappeared. This is because descending signals are already (just as in the perceptual case) in the business of predicting sensory (both proprioceptive and exteroceptive) consequences. By contrast, so-called “corollary discharge” (encoding predicted sensory outcomes) is now endemic and pervades the downwards cascade, since:

[…] every backward connection in the brain (that conveys top-down predictions) can be regarded as corollary discharge, reporting the predictions of some sensorimotor construct. (Friston 2011, p. 492)

This proposal may, on first encounter, strike the reader as quite implausible and indeed too radical. Isn’t an account of the functional significance and neurophysiological reality of efference copy one of the major success stories of contemporary cognitive and computational neuroscience? In fact, most (perhaps all) of the evidence often assumed to favour that account is, on closer examination, simply evidence of the pervasive and crucial role of forward models and corollary discharge—it is evidence, that is to say, for just those parts of the traditional story that are preserved (and made even more central) by PP. For example, Sommer & Wurtz’s influential (2008) review paper makes very little mention of efference copy as such, but makes widespread use of the more general concept of corollary discharge—though as those authors note, the two terms are often used interchangeably in the literature. A more recent paper, Wurtz et al. (2011), mentions efference copy only once, and does so only to merge it with discussions of corollary discharge (which then occur 114 times in the text). Similarly, there is ample reason to believe that the cerebellum plays a special role here, and that that role involves making or optimizing perceptual predictions about upcoming sensory events (Bastian 2006; Roth et al. 2013). But such a role is, of course, entirely consistent with the PP picture. This shows, I suggest, that it is the general concept of forward models (as used by e.g., Miall & Wolpert 1996) and corollary discharge, rather than the more specific concept of efference copy as we defined it above, that enjoys the clearest support from both experimental and cognitive neuroscience.

Efference copy figures prominently, of course, in one particular set of computational proposals. These proposals concern (in essence) the positioning of forward models and corollary discharges within a putative larger cognitive architecture involving multiple paired forward and inverse models. In these “paired forward inverse model” architectures (see e.g., Wolpert & Kawato 1998; Haruno et al. 2003) motor commands are copied to a stack of separate forward models that are used to predict the sensory consequences of actions. But acquiring and deploying such an architecture, as even its strongest advocates concede, poses a variety of extremely hard computational challenges (see Franklin & Wolpert 2011). The PP alternative neatly sidesteps many of these problems—as we shall see in section 2.4. The heavy lifting that is usually done by traditional efference copy, inverse models, and optimal controllers is now shifted to the acquisition and use of the predictive (generative) model—i.e., the right set of prior probabilistic “beliefs”. This is potentially advantageous if (but only if) we can reasonably assume that these beliefs “emerge naturally as top-down or empirical priors during hierarchical perceptual inference” (Friston 2011, p. 492).

The deeper reason that efference copy may be said to have disappeared in PP is thus that the whole (problematic) structure of paired forward and inverse models is absent. It is not needed, because some of the predicted sensory consequences (the predicted proprioceptive trajectories) act as motor commands already. As a result, there are no distinct motor commands to copy, and (obviously) no efference copies as such. But one could equally well describe the forward-model-based predictions of proprioceptive trajectories as “minimal motor commands”: motor commands that operate (in essence) by specifying results rather than by exerting fine-grained limb and joint control. These minimal motor commands (proprioceptive predictions) clearly influence the even wider range of predictions concerning the exteroceptive sensory consequences of upcoming actions. The core functionality that is normally attributed to the action of efference copy is thus preserved in PP, as is the forward-model-based explanation of core phenomena, such as the finessing of time-delays (Bastian 2006) and the stability of the visual world despite eye-movements (Sommer & Wurtz 2006; 2008).

2.4 Cost functions are absorbed by predictions

Active inference also sidesteps the need for explicit cost or value functions as a means of selecting and sculpting motor response. It does this (Friston 2011; Friston et al. 2012) by, in essence, building these into the generative model whose probabilistic predictions combine with sensory inputs in order to yield behaviors. Simple examples of cost or value functions (that might be applied to sculpt and select motor behaviors) include minimizing “jerk” (the rate of change of acceleration of a limb during some behavior) and minimizing rate of change of torque (for these examples see Flash & Hogan 1985 and Uno et al. 1989 respectively). Recent work on “optimal feedback control” minimizes more complex “mixed cost functions” that address not just bodily dynamics but also systemic noise and the required accuracy of outcomes (see Todorov 2004; Todorov & Jordan 2002). Such cost functions (as Friston 2011, p. 496 observes) resolve the many-one mapping problem that afflicts classical approaches to motor control. There are many ways of using one’s body to achieve a certain goal, but the action system has to choose one way from the many available. Such devices are not, however, needed within the framework on offer, since:

In active inference, these problems are resolved by prior beliefs about the trajectory (that may include minimal jerk) that uniquely determine the (intrinsic) consequences of (extrinsic) movements. (Friston 2011, p. 496)

Simple cost functions are thus folded into the expectations that determine trajectories of motion. But the story does not stop there. For the very same strategy applies to the notion of desired consequences and rewards at all levels. Thus we read that:

Crucially, active inference does not invoke any “desired consequences”. It rests only on experience-dependent learning and inference: experience induces prior expectations, which guide perceptual inference and action. (Friston et al. 2011, p. 157)

Notice that there is no overall computational advantage to be gained by this reallocation of duties. Indeed, Friston himself is clear that:

[…] there is no free lunch when replacing cost functions with prior beliefs [since] it is well-known [Littman et al. (2001)] that the computational complexity of a problem is not reduced when formulating it as an inference problem. (2011, p. 492)

Nonetheless, it may well be that this reallocation (in which cost functions are treated as priors) has conceptually and strategically important consequences. It is easy, for example, to specify whole paths or trajectories using prior beliefs about (you guessed it) paths and trajectories! Scalar reward functions, by contrast, specify points or peaks. The upshot is that everything that can be specified by a cost function can be specified by some prior over trajectories, but not vice versa.
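
One standard way to make this concrete (the notation and gloss are mine rather than the text’s) is to note that a trajectory cost, such as the integrated squared jerk of Flash & Hogan (1985) mentioned earlier, can be folded into a prior over whole trajectories:

```latex
C(x) \;=\; \int \left\lVert \frac{d^{3}x(t)}{dt^{3}} \right\rVert^{2} dt ,
\qquad
p(x) \;\propto\; \exp\!\left(-\lambda\, C(x)\right)
```

Minimizing the cost and maximizing the prior then select the same movements, whereas a prior over trajectories need not take any such cost-derived form.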

Related concerns have led many working roboticists to argue that explicit cost-function-based solutions are inflexible and biologically unrealistic, and should be replaced by approaches that entrain actions in ways that implicitly exploit the complex attractor dynamics of embodied agents (see e.g., Thelen & Smith 1994; Mohan & Morasso 2011; Feldman 2009). One way to imagine this broad class of solutions (for a longer discussion, see Clark 2008, Ch. 1) is by thinking of the way you might control a wooden marionette simply by moving the strings attached to specific body parts. In such cases:

The distribution of motion among the joints is the “passive” consequence of the […] forces applied to the end-effectors and the “compliance” of different joints. (Mohan & Morasso 2011, p. 5)

Solutions such as these, which make maximal use of learnt or inbuilt “synergies” and the complex bio-mechanics of the bodily plant, can be very fluently implemented (see Friston 2011; Yamashita & Tani 2008) using the resources of active inference and (attractor-based) generative models. For example, Namikawa et al. (2011) show how a generative model with multi-timescale dynamics enables a fluent and decomposable (see also Namikawa & Tani 2010) set of motor behaviors. In these simulations:

Action per se, was a result of movements that conformed to the proprioceptive predictions of […] joint angles [and] perception and action were both trying to minimize prediction errors throughout the hierarchy, where movement minimized the prediction errors at the level of proprioceptive sensations. (Namikawa et al. 2011, p. 4)

Another example (which we briefly encountered in the previous section) is the use of downward-flowing prediction to side-step the need to transform desired movement trajectories from extrinsic (task-centered) to intrinsic (e.g., muscle-centered) co-ordinates: an “inverse problem” that is said to be both complex and ill-posed (Feldman 2009; Adams et al. 2013, p. 8). In active inference the prior beliefs that guide motor action already map predictions couched (at high levels) in extrinsic frames of reference onto proprioceptive effects defined over muscles and effectors, simply as part and parcel of ordinary online control.

By re-conceiving cost functions as implicit in bodies of expectations concerning trajectories of motion, PP-style solutions sidestep the need to solve difficult (often intractable) optimality equations during online processing (see Friston 2011; Mohan & Morasso 2011) and—courtesy of the complex generative model—fluidly accommodate signaling delays, sensory noise, and the many-one mapping between goals and motor programs. Alternatives requiring the distinct and explicit computation of costs and values thus arguably make unrealistic demands on online processing, fail to exploit the helpful characteristics of the physical system, and lack biologically plausible means of implementation.

These various advantages come, however, at a price. For the full PP story now shifts much of the burden onto the acquisition of those prior “beliefs”—the multi-level, multi-modal webs of probabilistic expectation that together drive perception and action. This may turn out to be a better trade than it at first appears, since (see Clark in press) PP describes a biologically plausible architecture that is just about maximally well-suited to installing the requisite suites of prediction, through embodied interactions with the training environments that we encounter, perturb, and—at several slower timescales—actively construct.