[1]
Unless stated otherwise, all page numbers refer to the target paper by Anil Seth.
[2]
It is more general, because predictive processing only plays a role in it if combined with the Laplace approximation (which entails, roughly, that probability distributions are approximated by Gaussian distributions). This approximation, however, also makes FEP more specific, since it assumes that the brain encodes probability distributions as Gaussian distributions (which is not entailed by the general predictive processing framework discussed in Clark 2013, for instance).
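To make this slightly more concrete: in its standard form (stated here in generic notation, not notation taken from the target paper), the Laplace approximation replaces a distribution over hidden causes \(\vartheta\) given sensory data \(s\) by a Gaussian centred at its mode, with the covariance determined by the curvature of the log-density at that mode:

\[
q(\vartheta) \;=\; \mathcal{N}(\vartheta;\, \mu, \Sigma) \;\approx\; p(\vartheta \mid s), \qquad
\mu = \arg\max_{\vartheta}\, \ln p(\vartheta \mid s), \qquad
\Sigma = \Bigl(-\nabla^{2}_{\vartheta}\, \ln p(\vartheta \mid s)\,\big|_{\vartheta=\mu}\Bigr)^{-1}.
\]

It is this Gaussian encoding assumption, rather than anything in the generic PP framework, that does the specifying work mentioned above.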
[3]
In fact, the free-energy principle seems to be partly inspired by cybernetic ideas. Friston (2010, p. 127), for instance, cites Ashby (1947) when explaining the motivation for FEP.
[4]
“[...] we observe under constant own activity, and thereby achieve knowledge of the existence of a lawful relation between our innervations and the presence of different impressions of temporary presentations [Präsentabilien]. All of our willful movements through which we change the appearance of things should be considered an experiment, through which we test whether we have grasped correctly the lawful behavior of the appearance at hand, i.e. its supposed existence in determinate spatial structures.” (My translation)
[5]
It should be noted that Popper rejected interpretations of confirmation (or corroboration) in terms of probabilities (cf. Popper 2005[1934], ch. X), as well as Bayesian interpretations of probability theory (cf. Popper 2005[1934], ch. *XVII). Here, I only suggest that a useful analogy between Popper’s theory of science and the Bayesian brain can be drawn.
[6]
Seth identifies PP with the Bayesian brain (cf. p. 1). I follow suit in this commentary.
[7]
“These considerations suggest proposing not verifiability, but falsifiability as a demarcation criterion; […] An empirical-scientific system must be able to break down in the light of empirical evidence.” (My translation)
[8]
I am grateful to Thomas Metzinger for pointing me to Lakatos’ work on falsificationism.
[9]
“Regarding such auxiliary hypotheses we stipulate that we allow only those hypotheses for which the ‘degree of falsifiability’ of the system is not decreased, but increased; in this case the introduction of auxiliary hypotheses means an improvement: The system prohibits more than before.” (My translation)
[10]
Lakatos (1970) points out that Popper himself never made a sharp distinction between naïve and sophisticated falsificationism, but that he accepted the assumptions underlying sophisticated falsificationism, at least in parts of his work—whereas the person Karl Popper may have been more of a naïve than a sophisticated falsificationist.
[11]
The authors offer two possible reasons why the probability of the currently assumed model decreases: either there is a hyper-prior to the effect that the world changes (which is why a static hypothesis becomes less likely over time), or there are random effects that lead to multistability, such that neural dynamics switch from one basin of attraction to another (cf. Hohwy et al. 2008, p. 692).
[12]
In fact, it seems that the notion of incommensurability was inspired by Gestalt switches (as in the perception of a Necker cube), which are very similar to phenomena like binocular rivalry. However, Kuhn explicitly pointed out that there is a crucial difference between a Gestalt switch and a paradigm change: “[…] the scientist does not preserve the gestalt subject’s freedom to switch back and forth between ways of seeing. Nevertheless, the switch of gestalt, particularly because it is today so familiar, is a useful elementary prototype for what occurs in full-scale paradigm shift” (1962, p. 85). I am grateful to Sascha Fink for drawing my attention to this statement.
[13]
It should be noted that Gregory ascribes “far less explanatory power” (1980, p. 196) to perceptions than to scientific hypotheses.
[14]
As I am using the term here, the depth of a model can be measured by its location in the predictive processing hierarchy (that is, whether it is high or low in the hierarchy). Estimates at higher levels track features that change more slowly (i.e., features that remain invariant when things change, for instance, when the subject changes her perspective on a perceptual object like a tomato by walking around the tomato or by turning it—hence the term “perspective-(in)dependence”). A model of a perceived object is deep when it represents features that change relatively slowly. Alternatively, one could stipulate that a model is deep when it represents features that change slowly and features that change more quickly. In fact, this may come closer to what Hohwy has in mind, but it blurs the distinction between perspective-dependence and causal integration. Hohwy writes: “[c]oncurrents are causes that do not interact on their own with other causes (presumably a fence won’t occlude a concurrent)” (2014, p. 128). But encapsulated causes can be represented both at lower parts of the hierarchy (possible example: afterimages) and at higher parts of the hierarchy (possible example: certain conscious thoughts). This suggests that at least causal encapsulation can be dissociated from perspective-dependence and -independence.
[15]
The inverted model is the posterior distribution, the computation of which is based on the likelihood and the prior (see above).
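Schematically (again in generic notation, not notation from the target paper), for hidden causes \(\vartheta\) and sensory data \(s\), Bayes' theorem yields the posterior from the likelihood and the prior:

\[
p(\vartheta \mid s) \;=\; \frac{p(s \mid \vartheta)\, p(\vartheta)}{p(s)} \;\propto\; p(s \mid \vartheta)\, p(\vartheta).
\]

Inverting the generative model thus means going from the likelihood \(p(s \mid \vartheta)\), which maps hidden causes to sensory effects, back to the posterior \(p(\vartheta \mid s)\) over the causes.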
[16]
Another possible term for this would be causally open, in the sense that it is represented as being in potential causal exchange with other objects in its surroundings. By integration, I thus do not mean integration within (or internal integration), but integration with other objects.
[17]
Thanks to Jennifer Windt for suggesting immersive video games as a further example.
[18]
Perspective-invariant representations are maximally perspective-independent.
[19]
In fact, it may be that the corners only constitute hypothetical endpoints. Thanks to Jennifer Windt for pointing this out.
[20]
This may point to an ambiguity in Hohwy's characterization of causal depth.
[21]
In fact, asomatic OBEs may be a better example than asomatic dream experiences, since such dreams typically lack concrete objects (cf. LaBerge & DeGracia 2000). I am grateful to Jennifer Windt for pointing this out.
[22]
This could be a case in which there is a particularly strong demand for the general ability of PP to combine “fast and frugal solutions” with “more structured, knowledge-intensive strategies” (Clark this collection).
[23]
For more information on the project, see: http://feelspace.cogsci.uni-osnabrueck.de/