2 Active inference and hypothesis testing

A central claim in my target paper is that active inference, typically considered as the resolution of sensory prediction errors through action, should also (perhaps primarily) be considered as furnishing disruptive and/or disambiguatory evidence for perceptual hypotheses. This claim transparently calls on analogies with hypothesis testing in science (as well as on counterfactually-equipped generative models), and so invites comparisons with theoretical frameworks for scientific discovery, as Wiese nicely develops. In particular, Wiese notes that I do not “say much about what it takes to disconfirm or falsify a given hypothesis or model”, inviting me to “provide a refined treatment of the relation between falsification and active inference” (this collection, p. 2). This is what I shall attempt in this first section.

2.1 The abductive brain

Wiese rightly says that a strict Popperian analogy for active inference is inappropriate, since Popperian falsification relies on hypotheses that are derived deductively. Deductive inferences are necessary inferences, meaning that falsification of a deduced prediction in turn falsifies the premises (theories) from which it derives. Active inference in the Bayesian brain is not deductive for two important reasons. First, as Wiese notes, Bayesian inference is inherently probabilistic, so that competing hypotheses become more or less likely, rather than corroborated or falsified. Probabilistic weighting of hypotheses suggests a process of induction rather than deduction. Inductive inferences are non-necessary (i.e., they are not inevitable consequences of their premises) and are assessed by observation of outcome statistics, by analogy with classical statistical inference. Second, Bayesian reasoning pays attention not just to outcome frequencies but to properties of the explanation (hypothesis) itself, as captured by the slogan that (Bayesian) perception is the brain’s “best guess” of the causes of its sensory inputs. This indicates that the Bayesian brain is neither deductive nor inductive but abductive (Hohwy 2014), where abduction is typically understood as “inference to the best explanation”. In Bayesian inference, what makes an explanation “best” rests not only on outcome frequencies, but also on quantification of model complexity (models with fewer parameters are preferred), and on priors and likelihoods, as well as hyper-priors which may make some prior-likelihood combinations preferable to others. Importantly, abductive (and inductive) processes are ampliative, meaning that they are capable of going beyond that which is logically entailed by their premises. This is important for the Bayesian brain, because the fecundity and complexity of the world (and body) require a flexible and open-ended means of adaptive response.
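To make these “best-making” criteria a little more concrete (this gloss is mine, and is not spelled out in the target article), one standard formulation scores a model $m$ against data $y$ by its log evidence, which the variational free energy familiar from predictive processing bounds as a trade-off between accuracy and complexity:

$$
\ln p(y \mid m) \;\geq\; \underbrace{\mathbb{E}_{q(\theta)}\!\left[\ln p(y \mid \theta, m)\right]}_{\text{accuracy}} \;-\; \underbrace{D_{\mathrm{KL}}\!\left[\,q(\theta)\,\|\,p(\theta \mid m)\,\right]}_{\text{complexity}}
$$

where $q(\theta)$ is the (approximate) posterior over model parameters. On this reading, a “best” explanation accounts for the data accurately while departing as little as possible from its priors; the complexity term is what penalises profligate models.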

So, the Bayesian brain is an abductive brain. But I would like to go further, recalling that active inference enables predictive control in addition to perception. This emphasis is particularly clear in the parallels with cybernetics and applications to interoception developed in the target article, where allostatic[1] control of ‘essential variables’ is paramount, and where predictive models are recruited towards this goal (Conant & Ashby 1970; Seth 2013). In this light, active inference in the cybernetic Bayesian brain becomes a process of “inference to the best prediction”, where the “best” predictions are those which enable control and homeostasis under a broad repertoire of perturbations.[2] It will be interesting to fully develop criteria for “best-making” in this control-oriented form of abductive inference.

2.2 Sophisticated falsificationism, active inference, and model disambiguation

Where does this leave us with respect to theories of scientific discovery? Strict Popperian falsification has already been discounted as an analogy for active inference. At the other extreme, parallels with Kuhnian paradigm shifts also seem inappropriate, since these are not based on inference, whether deductive, inductive, or abductive. Also, such shifts are typically unidirectional: having dispensed with the Copernican world-view once, we are unlikely to return to it in the future. These two points challenge Wiese’s analogy between paradigm shifts and perceptual transitions in bistable perception (see Wiese’s footnote 12, this collection, p. 9). What best survives in this analogy is an appeal to hierarchical inference, where changes in “paradigm” correspond to alternations between hierarchically deep predictions, each of which recruits more fine-grained predictions which themselves each explain only part of the ongoing sensorimotor flux, under the hyper-prior that perceptual scenes must be self-consistent (Hohwy et al. 2008).

Wiese himself seems to favour Lakatos’ interpretation of Popper, a “sophisticated falsificationism” in which theories (perceptual hypotheses) can be modified, rather than rejected outright, when predictions are not confirmed, and in which hypotheses are not tested in isolation (more on this later). As Wiese shows, sophisticated falsificationism fits well with some aspects of Bayesian inference, like model updating. According to Lakatos (1970), core theoretical commitments can be protected from immediate falsification by introducing “auxiliary hypotheses” which account for otherwise incompatible data. The key criterion, in the philosophy-of-science sense, is that these auxiliary hypotheses are progressive in virtue of making additional testable predictions, as opposed to degenerating, in which case the core commitments become less testable.[3] This maps neatly onto counterfactually-equipped active inference, where hierarchically deep predictive models spawn testable counterfactual sensorimotor predictions which are selected on the basis of precision expectations, and which lead to effective updating (rather than “falsification”) of perceptual hypotheses. As Wiese notes, a good example of this is given by Friston and colleagues’ model of saccadic eye movements (Friston et al. 2012). When it comes to model comparison, sophisticated falsification may even approximate some aspects of abductive inference: “Explaining away is another example of sophisticated falsification. Even when two or more models are compatible with the evidence … there can be reason to prefer one of them and reject the other” (Wiese this collection, p. 7). This strongly recalls Bayesian model comparison and “inference to the best explanation”, if not its control-oriented “inference to the best prediction” form.
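The point about explaining away can be put in familiar Bayesian terms (again, this gloss is mine rather than Wiese’s). For two models $m_1$ and $m_2$, the posterior odds are

$$
\frac{p(m_1 \mid y)}{p(m_2 \mid y)} \;=\; \frac{p(y \mid m_1)}{p(y \mid m_2)} \times \frac{p(m_1)}{p(m_2)}
$$

so that even when both models assign the observed data non-negligible likelihood (both are “compatible with the evidence”), the model with the greater evidence and/or prior plausibility is preferred, and the alternative is effectively explained away rather than falsified.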

One important clarification is needed about Wiese’s interpretation of model comparison, one which highlights the critical roles of action and counterfactual processing. Wiese rightly emphasizes the important insight of Popper and Lakatos that hypotheses are never tested in isolation, mandating a process of comparison among competing models or hypotheses. However, he implies a sequential testing of each hypothesis: “balloons being launched and then shot down, one by one” (see Wiese this collection, p. 6). This is quite different from the interpretation of model comparison pursued in my target article, where multiple models are considered in parallel, and where counterfactual predictions are leveraged to select the action (or experiment) most likely to disambiguate competing models. In Bayesian terms this is reflected in a shift towards model comparison and averaging (FitzGerald et al. 2014; Rosa et al. 2012), as compared to inference and learning on a single model. Bongard and colleagues’ evolutionary robotics example was selected precisely because it illustrates this point so well (Bongard et al. 2006). Here, repeated cycles of model selection and refinement lead to the prescription of novel actions that best disambiguate the current best models (note the plural). Indeed, it is the repeated refinement of disambiguatory actions that gives Bongard’s starfish robot its compelling “motor babbling” appearance. To reiterate: different actions may be specified when the objective is to disambiguate multiple models in parallel, as compared to testing models one-at-a-time. In the setting of the cybernetic Bayesian brain this example is important for two reasons: it underlines the importance of counterfactual processing (to drive the selection of disambiguatory actions), and it emphasizes that predictive modelling can be seen as a means of control in addition to discovery, explanation, or representation. In this sense it doesn’t matter how accurate the starfish’s self-model is; what matters is whether it works.
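To illustrate the difference this makes, here is a minimal sketch of disambiguatory action selection over models held in parallel, in which each candidate action (or experiment) is scored by the expected reduction in uncertainty over the competing models. The function names and toy numbers are my own illustration, not Bongard and colleagues’ implementation, and the sketch abstracts away precision-weighting and model refinement.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (in nats)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(posterior, likelihoods):
    """Expected reduction in uncertainty over models for one candidate action.

    posterior   : (M,) current probabilities over the M competing models
    likelihoods : (M, O) p(outcome | model, action) for this action
    """
    prior_H = entropy(posterior)
    # Predictive distribution over outcomes, averaged across models
    predictive = posterior @ likelihoods          # shape (O,)
    eig = 0.0
    for o, p_o in enumerate(predictive):
        if p_o == 0:
            continue
        # Posterior over models if outcome o were observed (Bayes' rule)
        post_o = posterior * likelihoods[:, o] / p_o
        eig += p_o * (prior_H - entropy(post_o))
    return eig

# Toy example: two models, two candidate actions, two possible outcomes.
posterior = np.array([0.5, 0.5])
actions = {
    "action_A": np.array([[0.9, 0.1],    # model 1 predicts outcome 0
                          [0.1, 0.9]]),  # model 2 predicts outcome 1
    "action_B": np.array([[0.5, 0.5],    # both models predict the same
                          [0.5, 0.5]]),  # -> this action cannot disambiguate
}
best = max(actions, key=lambda a: expected_info_gain(posterior, actions[a]))
print(best)  # action_A: the action expected to disambiguate the models
```

The point of the toy example is that action_B would look like a perfectly reasonable “test” of either model considered in isolation, yet it is useless for telling them apart; only by scoring actions against the set of models in parallel does the disambiguatory action_A emerge as the one worth performing.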

2.3 Science as control or science as discovery?

The distinction between explanation and control returns us to the philosophy of science. Put simply, the views of Popper, Lakatos, and (less so) Kuhn are concerned with how science reveals truths about the world, and with how falsification of testable predictions participates in this process. Picking up the threads of abduction, control-oriented active inference, and “inference to the best prediction”, we encounter the possibility that theories of scientific discovery might themselves appear differently when considered from the perspective of control. Historically, it is easy to see the narrative of science as a struggle to gain increasing control over the environment (and over people), rather than as a process guided by the lights of increasing knowledge and understanding.[4] A proper exploration of this territory moves well beyond the present scope (see e.g., Glazebrook 2013). In any case, whether or not this perspective helps elucidate scientific practice, it certainly suggests important limits on how far analogies can be taken between philosophies of scientific discovery and the cybernetic Bayesian brain.