2 Minimizing free energy (or average prediction error minimization)

Consider the following very broad, very simple, but ultimately also very far-reaching claim: the brain’s main job is to maintain the organism within a limited set of possible states. This is a fairly trivial claim, since it just reflects that there is a high probability of finding a given organism in some and not other states, combined with the obvious point that the organism’s brain, when in good working order, helps explain this fact. It is the brain’s job to prevent the organism from straying into states where the organism is not expected to be found in the long run. This can be turned around such that, for any given organism, there is a set of states where it is expected to be found, and many states in which it would be surprising to find it. This is surely an entirely uncontroversial observation: we don’t find all creatures with equal probability in all possible states (e.g., in and out of water). Indeed, since an organism’s phenotype results from the expression of its genes together with the influence of the environment, we might define the phenotype in terms of the states we expect it to be found in, on average and over time: different phenotypes will be defined by different sets of states. This way of putting it then defines the brain’s job: it must keep the organism within those expected states. That is, the brain must keep the organism out of states that are surprising given the organism it is—or, in general, the brain must minimize surprise (Friston & Stephan 2007).

Here surprise should not be understood in commonsense terms, in the way that a surprise party, say, is surprising. “Surprise” is technically surprisal or self-information, which is a concept from information theory. It is defined as the negative log probability of a given state, such that the surprise of a state increases the more improbable it is to find the creature in that state (in this sense a fish out of water is exposed to a lot of surprise). Surprise is then always relative to a model, or a set of expectations (being out of water is not surprising given a human being’s expectations). States in which an organism is found are described in terms of the causal impact from the environment on the organism (for example, the difference it makes to the fish whether it is in or out of water). This, in turn, can be conceptualized as the organism’s sensory input, in a very broad sense, including not just visual and auditory input but also important aspects of sensation like thermoreception, proprioception, and interoception. Surprising states are then to be understood as surprising sensory input, and the brain’s job is to minimize the surprise in its sensory input—to keep the organism within states in which it will receive the kind of sensory input it expects.
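
To make the technical notion explicit (the notation is standard in information theory, though not used in the passage above): writing p(s | m) for the probability that a model m assigns to a sensory state s, the surprisal of that state is

\[ -\ln p(s \mid m). \]

A state the model takes to be likely carries little surprisal (a probability of 0.5 gives about 0.69 nats), whereas an improbable state carries much more (a probability of 0.01 gives about 4.6 nats); this is the quantitative sense in which the fish out of water is exposed to a lot of surprise.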

To be able to use this basic idea about the brain’s overall function in an investigation of all the things minds do, we need to ask how the brain accomplishes the minimization of surprise. It cannot assess surprise directly from the sensory input, because that would require knowing the relevant probability distribution as such. To do this it would need to, impossibly, average over an infinite number of copies of itself in all sorts of possible states in order to figure out how much of a surprise a given sensory input might be. This means that to do its job the brain needs to do something else; in particular, it must harbor and finesse a model of itself in the environment, against which it can assess the surprise of its current sensory input. (The model concerns expected sensory states; it is thus a model of the states of the brain, defined by the sensory boundary in both interoceptive and exteroceptive terms; see Hohwy 2014.)
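
One way of seeing why direct assessment is impossible (the symbols here are not the text’s own, but the form is standard in this literature): if \(\vartheta\) stands for the hidden environmental causes of sensory input, then the surprise of an input s under a model m is a marginal quantity,

\[ -\ln p(s \mid m) \;=\; -\ln \int p(s \mid \vartheta, m)\, p(\vartheta \mid m)\, d\vartheta, \]

and evaluating the integral would require averaging over all the possible states of affairs that could have produced the input. It is this intractable averaging that an internal model is meant to sidestep.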

Assume then that the brain has a model—an informed guess—about what its expected states are, and that it uses that model to generate hypotheses that predict what the next sensory input should be (this is what makes it a generative model). Now the brain has access to two quantities, which it can compare: on the one hand the predicted sensory input, and on the other the actual sensory input. If these match, then the model is a good one (modulo statistical optimization). Any difference between them can be conceived as prediction error, because it means that the predictions were erroneous in some way. For example, if a certain frequency in the auditory input is predicted, then any difference between that prediction and the frequency actually received is that prediction’s error.
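
In the simplest possible notation (again not the text’s own): if \(\mu\) is the currently selected hypothesis and \(g(\mu)\) the sensory input that hypothesis predicts, then the prediction error for an actual input s is just the discrepancy

\[ \varepsilon \;=\; s - g(\mu), \]

and minimizing prediction error means adjusting the hypothesis so that this discrepancy shrinks.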

The occurrence of prediction error means the model is not a good fit to the sensory samples after all, and so, to improve the fit, the overall prediction error should be minimized. In the course of minimizing prediction error, the brain averages out uncertainty about its model, and hence implicitly approximates the surprise. It is guaranteed to do this by minimizing the divergence between the selected hypothesis and the true posterior probability of the hypothesis given the evidence and model. The guarantee stems from the fact that this is a Kullback-Leibler divergence (KL-divergence), which is zero when there is no divergence and positive otherwise; the sum of the surprise and this non-negative divergence therefore forms an upper bound on the surprise, and minimizing that bound drives it down toward the surprise itself.
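
The bounding step can be written out, using notation standard in the Friston papers cited below, with \(q(\vartheta)\) for the probability distribution embodied in the selected hypothesis. The quantity being minimized, the free energy F, decomposes as

\[ F \;=\; -\ln p(s \mid m) \;+\; D_{\mathrm{KL}}\!\big[\, q(\vartheta) \,\|\, p(\vartheta \mid s, m) \,\big]. \]

Because the KL-divergence is never negative, \(F \geq -\ln p(s \mid m)\): free energy can never fall below the surprise, so driving F down by improving the selected hypothesis squeezes the bound toward the surprise itself.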

The key notion here is that the brain acts to maintain itself within its expected states, which are estimated through prediction error minimization. This is known as the free energy principle, where free energy can be understood as the sum of prediction errors (this and the following are based on key papers such as Friston & Stephan 2007 and Friston 2010, as well as the introductions in Clark 2013 and Hohwy 2013). Prediction error minimization itself instantiates probabilistic, Bayesian inference because it entails that the selected hypothesis approximates, and in the limit of zero divergence becomes, the true posterior, given evidence and model. On this view, the brain is a model of the world (including itself), and this model can be considered the agent, since it acts to maintain itself in certain states in the world.
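
As a toy illustration of the last two claims, and only as a sketch under assumed simplifications (a one-dimensional Gaussian model with made-up precision values; none of the names or numbers come from the cited papers), gradient descent on precision-weighted prediction error recovers exactly the Bayesian posterior mean, which is the sense in which minimizing prediction error instantiates Bayesian inference:

```python
# Toy sketch: minimizing precision-weighted prediction error recovers the
# Bayesian posterior mean in a one-dimensional Gaussian model.
# All names and numbers are illustrative assumptions, not taken from the text.

prior_mean, prior_precision = 0.0, 1.0    # what the model expects, and how strongly
sample, sensory_precision = 2.0, 4.0      # actual sensory input, and how reliable it is

# Route 1: analytic Bayesian posterior mean (precision-weighted average).
posterior_mean = (prior_precision * prior_mean + sensory_precision * sample) / (
    prior_precision + sensory_precision
)

# Route 2: gradient descent on the summed, precision-weighted squared prediction errors
#   F(mu) = 0.5 * sensory_precision * (sample - mu)**2
#         + 0.5 * prior_precision * (mu - prior_mean)**2
mu, step = prior_mean, 0.05
for _ in range(1000):
    grad = -sensory_precision * (sample - mu) + prior_precision * (mu - prior_mean)
    mu -= step * grad

print(round(posterior_mean, 3), round(mu, 3))  # both 1.6: the two routes agree
```

Both routes give the same estimate (here 1.6): the prior expectation is pulled toward the sensory sample in proportion to their relative precisions, which is just what Bayesian updating prescribes.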