[1]
For this observation, see Friston (2005), p. 825, and the discussion in Hohwy (2013).
[2]

This does not mean that there are no cells in V1 or elsewhere whose responses match the classical profile. PP claims that each neural area contains two kinds of cell, or at least supports two functionally distinct response profiles, such that one functionality encodes error and the other current best-guess content. This means that there can indeed be (as single cell recordings amply demonstrate) recognition cells in each area, along with the classical response profiles. For more on this important topic, see Clark (2013a).

[3]

To complete the image using this parlour game, we’d need to add a little more structure to reflect the hierarchical nature of the message-passing scheme. We might thus imagine many even-higher-level “prediction agents” working together to predict which room (house, world, etc.) the layers below are currently responding to. Should sufficient prediction error signals accrue, this ensemble might abandon the hypothesis that signals are coming in from the living room, suggesting instead that they are from the boudoir, or the attic. In this grander version (which recalls the “mixtures of experts” model in machine learning; see Jordan & Jacobs 1994), there are teams (and teams of teams) of specialist prediction agents, all trying (guided top-down by the other prediction agents, and bottom-up by prediction errors from the level below) to decide which specialists should handle the current sensory barrage. Each higher-level “prediction agent”, in this multi-level version, treats activity at the level below as sensory information, to be explained by the discovery of apt top-down predictions.
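The switching of room hypotheses described in this note can be given a toy illustration. The sketch below is only one way to picture it, not part of the PP formalism: the function names, the two-feature "rooms", the squared-error measure, and the fixed threshold are all assumptions introduced here.

```python
def squared_error(predicted, observed):
    """Summed squared prediction error between two equal-length feature lists."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def update_hypothesis(current, predictions, observed, accrued, threshold):
    """Accrue error for the current room hypothesis and switch if it grows too large.

    `predictions` maps each (hypothetical) room name to the features it predicts.
    Returns the hypothesis now held and the error accrued against it.
    """
    accrued += squared_error(predictions[current], observed)
    if accrued <= threshold:
        # Not enough error has accrued: keep the current hypothesis.
        return current, accrued
    # Too much error: adopt whichever room best predicts the current input
    # and start accruing error afresh.
    best = min(predictions, key=lambda room: squared_error(predictions[room], observed))
    return best, 0.0


# Hypothetical feature predictions for two rooms.
rooms = {"living room": [1.0, 0.0], "attic": [0.0, 1.0]}
hypothesis, error = update_hypothesis("living room", rooms, [0.0, 1.0], 0.0, 1.5)
```

Here a single strongly mismatching input is enough to exceed the assumed threshold, so the toy agent drops the living room hypothesis in favour of the attic. With a higher threshold it would keep its hypothesis and simply carry the accrued error forward.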

[4]
Personal communication.
[5]

One such subset is, of course, the set of hierarchical dynamic models (see Friston 2008).

[6]
The appeal to hierarchical structure in PP, it should be noted, is substantially different to that familiar from treatments such as Felleman & Van Essen (1991). Although I cannot argue for this here (for more on this see Clark 2013b; in press), the PP hierarchy is fluid in that the information flows it supports are reconfigurable moment by moment (by, for example, changing beta and theta band oscillations; see Bastos et al. 2015). In addition, PP dispenses entirely with the traditional idea (nicely reviewed, and roundly rejected, in Churchland et al. 1994) that earlier levels must complete their tasks before passing information “up” the hierarchy. The upshot is that the PP models are much closer to dynamical systems accounts than to traditional, feedforward, hierarchical ones.
[7]

For the full story, see Adams et al. (2013). In short: “[t]he descending projections from motor cortex share many features with top-down or backward connections in visual cortex; for example, corticospinal projections originate in infragranular layers, are highly divergent and (along with descending cortico-cortical projections) target cells expressing NMDA receptors” (Adams et al. 2013, p. 1).

[8]

Proprioception is the “inner” sense that informs us about the relative locations of our bodily parts and the forces and efforts that are being applied. It is to be distinguished from exteroceptive (i.e., standard perceptual) channels such as vision and audition, and from interoceptive channels informing us of hunger, thirst, and states of the viscera. Predictions concerning the latter may (see Seth 2013 and Pezzulo 2014) play a large role in the construction of feelings and emotions.

[9]

Anscombe’s target was the distinction between desire and belief, but her observations about direction of fit generalize (as Shea 2013 notes) to the case of actions, here conceived as the motoric outcomes of certain forms of desire.

[10]

For a simulation-based demonstration of the overall shape of the PP account, see Friston et al. (2012). These simulations, as the authors note, turn out to implement the kind of “active vision” account put forward in Wurtz et al. (2011).

[11]

Malfunctions of this precision-weighting apparatus have recently been implicated in a number of fascinating proposals concerning the origins and persistence of various forms of mental disturbance, including the emergence of delusions and hallucinations in schizophrenia, “functional motor and sensory symptoms”, Parkinson’s disease, and autism—see Fletcher & Frith (2009), Frith & Friston (2012), Adams et al. (2012), Brown et al. (2013), Edwards et al. (2012), and Pellicano & Burr (2012).

[12]

There are related accounts of how dogs catch Frisbees—a rather more demanding task due to occasional dramatic fluctuations in the flight path (see Shaffer et al. 2004).

[13]

Current thinking about switching between model-free and model-based strategies places them squarely in the context of hierarchical inference, through the use of “Bayesian parameter averaging”. This essentially associates model-free schemes with simpler (less complex) lower levels of the hierarchy that may, at times, need to be contextualized by (more complex) higher levels.
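To make the averaging idea concrete, here is a minimal sketch of evidence-weighted averaging over two competing models. It is a simplification that assumes a uniform prior over models and scalar estimates; the function names and the numbers are hypothetical and are not drawn from the literature referred to above.

```python
import math


def posterior_model_probs(log_evidences):
    """Turn log model evidences into posterior model probabilities.

    Assumes a uniform prior over models. Subtracting the maximum before
    exponentiating keeps the computation numerically stable.
    """
    top = max(log_evidences)
    weights = [math.exp(le - top) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


def averaged_estimate(estimates, log_evidences):
    """Average per-model estimates, weighting each by its model's posterior probability."""
    probs = posterior_model_probs(log_evidences)
    return sum(p * e for p, e in zip(probs, estimates))


# Hypothetical example: a simple (model-free) and a complex (model-based)
# scheme propose different values for the same quantity.
blended = averaged_estimate([1.0, 3.0], [0.0, 0.0])
```

When the two models are equally well supported, the blended estimate sits midway between their proposals. As the evidence increasingly favours one model, the average is drawn towards that model's estimate, which is one way to picture a simpler scheme being contextualized, or overridden, by a more complex one.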

[14]

For a thorough rehearsal of the positive arguments, see Clark (2008). For critiques, see Rupert (2004, 2009), Adams & Aizawa (2001), and Adams & Aizawa (2008). For a rich sampling of the ongoing debate, see the essays in Menary (2010) and Estany & Sturm (2014).