2 Towards a more complete approach to enculturation: Cognitive integration and predictive processing

In order to appreciate the descriptive power of the enculturated approach, it is necessary to specify the mechanistic underpinnings of the acquisition of cognitive practices. In his summary of the CI framework, Menary (this collection, p. 2) argues that “[a]lthough the framework is unified by a dynamical systems description of the evolution of processing in the hybrid and multi-layered system, it recognises the novel contributions of the distinct processing profiles of the brain, body and environment.” However, the dynamical-systems-style approach to the acquisition and enactment of cognitive practices, in the version first introduced in Menary (2007a, pp. 42-48), does not exhaustively specify the distinct, yet highly interactive neuronal and bodily components of cognitive processing. Furthermore, it does not account for LDP, simply because it remains neutral about the concrete realization of its neuronal component system. Finally, the dynamical systems approach, on Menary’s construal, helps illustrate what the interactive contribution of neuronal and extracranial bodily components to human cognition might amount to. Yet it does not spell out the mutual influence that neuronal and extracranial bodily components have over each other.

This is where predictive processing (PP) enters the picture. In the remainder of this commentary I will argue that the PP approach provides the resources for a more detailed account of how human cognitive systems become enculturated and how they are subject to integrated cognition.

2.1 Cognitive integration: Five theses about human cognition

In its original version (cf. Menary 2007a), CI is constituted by five theses. They emphasize the different aspects that are crucial for an integrationist approach to cognitive processing:

1. Human cognition is continuous with animal cognition on both diachronic and synchronic scales. However, it has a special status in that it is situated in a particular cognitive niche and heavily rests upon neural plasticity, which is itself an adaptation (continuity thesis).
2. Certain cognitive processes are hybrid because they are constituted by neuronal and extracranial bodily components (hybrid mind thesis).
3. In the course of ontogenetic hybrid cognitive processing, both the constitutive neuronal and extracranial bodily functions are transformed (transformation thesis).
4. The bodily manipulation of specific environmental resources plays a crucial functional role in integrated cognitive processes (manipulation thesis).
5. These manipulations are constrained by cognitive norms, which are acquired through learning, and which realize socio-culturally developed habits for the interaction with cognitive resources (cognitive norms thesis).

In addition to the continuity thesis and the cognitive transformation thesis, which are given centre stage in Menary’s target paper, the hybrid mind thesis is important in that it acknowledges the close interaction of neuronal and extra-neuronal bodily sub-processes in the completion of cognitive tasks. In other words, certain cognitive processes “involve the integration of neural manipulations of vehicles and bodily manipulations of environmental vehicles” (Menary 2010, p. 236; see also Menary 2007b, p. 627). The notion of bodily manipulation as it is used here goes back to Mark Rowlands’ (1999, pp. 23f) account of environmentalism, which claims that “cognitive processes are, in part, made up of manipulation of relevant structures in the cognizer’s environment”. In this context, manipulation is defined as “any form of bodily interaction with the environment – manual or not, intrusive or otherwise – which makes use of the environment in order to accomplish a given task” (ibid., p. 23). Thus, subscribing to the manipulation thesis amounts to the assumption that “[c]ognitive processing often involves these online bodily manipulations of the cognitive niche, sometimes as individuals and sometimes in collaboration with others” (Menary this collection, p. 3). Importantly, it is assumed that extracranial bodily manipulations causally interact with neural sub-processes, thereby stressing the hybridity of cognitive processes (cf. Menary 2007a, p. 138). In addition to highlighting the constitutive role of embodied engagements with “external” cognitive resources as proposed by Rowlands (1999), cognitive integrationists claim that the manipulation of these resources is constrained by cognitive norms. In this vein, Menary (2007a, p. 5; 2010, p. 233) argues that “[o]ur abilities to manipulate the extrabodily environment are normative and are largely dependent on our learning and training histories.” The idea that certain cognitive abilities are normatively structured thus concerns the individual’s interaction with specific resources provided by the cognitive niche. Importantly, the normatively constrained ways in which environmental resources are integrated into cognitive processes are shared by many individuals. Put differently, the normativity of cognitive practices helps “[…] stabilise and govern interactive thought across a population of similar phenotypes” (Menary this collection, p. 4). Furthermore, the acquisition of a certain cognitive practice is tightly connected with the acquisition of the relevant cognitive norms in the course of scaffolded learning. This is because “we learn cognitive practices by learning the cognitive norms that govern the manipulation of vehicles” (Menary 2007b, p. 628).

From these five theses defended by CI it follows that there should be two distinct, yet interdependent levels of description for cognitive practices. First, there is the social level of description. On this level, cognitive practices need to be approached by highlighting the interactive, cooperative cognitive achievements of a large group of individuals sharing the same cognitive niche. Second, cognitive practices can be investigated on an individual level of description. In this case, the acquisition and enactment of a certain cognitive practice is described with regard to a particular individual. However, any individual-level description needs to acknowledge that certain cognitive capacities of an enculturated individual are rendered possible only by the individual’s ongoing interaction with its socio-culturally shaped environment in normatively constrained ways. This means doing justice to the broader socio-cultural context of enculturated cognition while also seeking a precise description of its neuronal and extracranial bodily sub-components. In this commentary I will operate on the individual level of description, without denying that it is important to develop a fine-grained description on the social level by specifying the properties of a certain cognitive niche and the conditions under which it could have emerged.

To this end, I will now proceed by summarizing the most important features of the predictive processing (PP) approach that will help specify the mechanistic underpinnings of enculturated cognition.

2.2 An outline of predictive processing

Recently, the idea that human perception, action, and cognition can be described and explained in terms of hierarchically organized predictive processing mechanisms implemented in the human brain has enjoyed widespread attention within cognitive neuroscience (e.g., Friston 2005, 2010; Friston et al. 2012), philosophy of mind, and philosophy of cognitive science (e.g., Clark 2012, 2013, this collection; Hohwy 2011, 2012, 2013, 2014, this collection; Seth this collection). The overall epistemic goal of this emerging approach is to describe perceptual, sensorimotor, and cognitive target phenomena within a single framework by relying on unifying mechanistic principles. Accounts of PP generally assume that human perception, action, and cognition are realized by Bayesian probabilistic generative models implemented in the human brain. Since the human brain does not have immediate access to the environmental causes of sensory effects, it has to infer the most probable state of affairs in the environment giving rise to sensory data (cf. Seth this collection, pp. 4f). PP approaches solve this inverse problem by assuming that generative models operating in accordance with Bayes’ rule are implemented in the human brain. On this construal, a generative model “[…] aims to capture the statistical structure of some set of observed inputs by tracking […] the causal matrix responsible for that very structure” (Clark 2013, p. 182). In order to be able to infer the causes of sensory effects, generative models encode probability distributions. Each generative model provides several hypotheses about the causes of a certain sensory input. The system somehow has to ‘decide’ which hypothesis to select in order to account for the cause of the sensory effect. The descriptive power of Bayes’ rule lies in its capacity to capture the probabilistic estimations underlying these choices. Applied to the case of human perception, action, and cognition, Bayesian generative models are assumed to be realized in hierarchically organized structures comprising multiple, highly interactive low- and high-level cortical areas. This is referred to as the Bayesian brain hypothesis (cf. Friston 2010, p. 129). The hierarchical organization of probabilistic generative models is combined with a specific version of predictive coding, where predictive coding “depicts the top-down flow as attempting to predict and fully ‘explain away’ the driving sensory signal, leaving only any residual ‘prediction errors’ to propagate forward within the system” (Clark 2013, p. 182). That is to say, selected hypotheses inform prior predictions about the sensory input to be expected at each level of the hierarchy. These predictions fulfil the function of encoding knowledge about statistical regularities of patterns in the observable (or any imaginable) world. This hypothesis selection proceeds in accordance with Bayes’ rule. The processing of sensory input gives rise to prediction errors. Prediction errors carry neuronally realized information about “[…] residual differences, at every level and stage of processing, between the actual current signal and the predicted one” (Clark this collection, p. 4). Importantly, it is only prediction errors, and not sensory input per se, that are fed forward within the hierarchy (cf. Clark 2013, pp. 182f; Hohwy 2012, p. 3, 2013, p. 47, 2014, p. 4).
The overall aim of this multi-level processing mechanism is to minimize prediction error, that is, to reduce or to ‘explain away’ the discrepancy between predictions and the actually given sensory input that is an effect of environmental (or bodily) causes (cf. Clark 2013, p. 187; Hohwy 2011, p. 269, 2013, p. 88). This is known as prediction error minimization.[2]
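To make the role of Bayes’ rule concrete, consider a minimal sketch of hypothesis selection. The following Python snippet is purely illustrative and not drawn from the cited literature: the hypotheses, priors, and likelihood values are invented, and the computation simply weighs how well each candidate hidden cause predicts the sensory input against its prior plausibility.

```python
# Toy illustration of Bayesian hypothesis selection over hidden causes
# (all numerical values are hypothetical).

priors = {"cat": 0.6, "fox": 0.3, "raccoon": 0.1}        # p(h): prior plausibility of each hidden cause
likelihoods = {"cat": 0.2, "fox": 0.7, "raccoon": 0.5}    # p(input | h): how well each hypothesis predicts the input

# Bayes' rule: posterior(h | input) is proportional to likelihood(input | h) * prior(h)
unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
evidence = sum(unnormalized.values())                      # p(input): marginal likelihood
posteriors = {h: u / evidence for h, u in unnormalized.items()}

best_hypothesis = max(posteriors, key=posteriors.get)
print(posteriors)        # approx. {'cat': 0.316, 'fox': 0.553, 'raccoon': 0.132}
print(best_hypothesis)   # 'fox': the hypothesis that best balances prior and likelihood
```

On the PP picture, an analogous trade-off between prior expectation and incoming evidence is assumed to be computed, approximately and sub-personally, at every level of the cortical hierarchy.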

Prediction error minimization is a special way of minimizing free energy in accordance with the principle “that any self-organizing system that is at equilibrium with its environment must minimize its free energy” (Friston 2010, p. 127). Applied to human perception, cognition, and action, minimizing free energy means minimizing the amount of unbound energy available to the perceiving, cognizing, and acting organism. This is where prediction error enters the picture. As Andy Clark (2013, p. 186) puts it, “[p]rediction error reports this information-theoretic free energy, which is mathematically constructed so as always to be greater than ‘surprisal’ (where this names the sub-personally computed implausibility of some sensory state given a model of the world […]).” The relationship between free energy and surprisal then is that “[…] free energy is an upper bound on surprise, which means that if agents minimize free energy, they implicitly minimize surprise” (Friston 2010, p. 128). Surprisal, however, cannot be estimated directly by the system, because “there is an infinite number of ways in which the organism could seek to minimize surprise and it would be impossibly expensive to try them out” (Hohwy 2012, p. 3). The solution to this problem lies in implicitly minimizing surprisal (and its upper bound, i.e., free energy) by minimizing prediction error (cf. Hohwy 2013, p. 85, this collection, p. 3; see also Seth this collection, p. 6). It is exactly here that prediction error minimization emerges as a tractable expression of more general life-sustaining mechanisms.
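The formal relationship between free energy, surprisal, and prediction error can be stated in a standard variational form (a sketch of the usual formulation, where q(ϑ) is a recognition density over hidden causes ϑ and s is the sensory input; the notation is chosen here for illustration rather than quoted from the cited texts):

F(s, q) = \mathbb{E}_{q(\vartheta)}\big[\ln q(\vartheta) - \ln p(s, \vartheta)\big] = -\ln p(s) + D_{\mathrm{KL}}\big[q(\vartheta)\,\|\,p(\vartheta \mid s)\big] \;\geq\; -\ln p(s)

Since the Kullback-Leibler divergence is non-negative, free energy F is an upper bound on surprisal −ln p(s), so minimizing F implicitly minimizes surprisal. Under Gaussian assumptions F reduces, up to additive terms, to a sum of precision-weighted squared prediction errors, which is why minimizing prediction error counts as a tractable proxy for minimizing free energy.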

Prediction error minimization can be achieved in two distinct, yet complementary ways. The first of these is perceptual inference, which can be described as

[…] an iterative step-wise procedure where a hypothesis is chosen, and predictions are made, and then the hypothesis is revised in light of the prediction error, before new and hopefully better predictions are made on the basis of the revised hypothesis. (Hohwy 2013, p. 45)

That is, prediction errors are propagated up the hierarchy leading to an adjustment of the initial hypothesis, thereby bringing the hypothesis generating the predictions into closer approximation with the actually given input. The adjustment of predictions and hypotheses in the face of fed-forward prediction error occurs at every level of the hierarchy until any prediction error is accommodated. This complex process comprising multiple levels is known as perception: “Perception thus involves ‘explaining away’ the driving (incoming) sensory signal by matching it with a cascade of predictions pitched at a variety of spatial and temporal scales” (Clark 2013, p. 187; see also Clark 2012, p. 762).
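A minimal sketch may help make the iterative character of perceptual inference concrete. The following Python snippet is an illustrative toy model, not drawn from the cited literature: it assumes a single hidden cause estimated under a simple linear-Gaussian generative model, and revises the hypothesis by small gradient steps on the precision-weighted prediction error until the error is (approximately) explained away.

```python
# Toy perceptual inference: iteratively revise a hypothesis mu about a hidden cause
# so that the predicted input g(mu) matches the observed input (hypothetical values).

sensory_input = 4.2        # actually received signal
prior_mean = 1.0           # prior expectation about the hidden cause
pi_sensory = 2.0           # expected precision of the sensory prediction error
pi_prior = 0.5             # precision of the prior
g = lambda mu: 2.0 * mu    # generative mapping: how the hidden cause produces input

mu = prior_mean            # initial hypothesis
for step in range(100):
    sensory_error = sensory_input - g(mu)     # prediction error at the sensory level
    prior_error = mu - prior_mean             # deviation from the prior expectation
    # Gradient step on the precision-weighted squared prediction error:
    mu += 0.05 * (pi_sensory * 2.0 * sensory_error - pi_prior * prior_error)

print(round(mu, 3))   # approx. 2.035: the hypothesis settles between prior and data, weighted by precision
```

Nothing hangs on the particular numbers or on the gradient scheme; the point is only to make vivid the step-wise revision of hypotheses that the quotation above describes.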

On Andy Clark’s account of PP, one important consequence of this is that the traditional distinction between perception and cognition becomes blurred. It is replaced by a reconceptualization of perceptual and cognitive processes as a continuous employment of the same prediction error minimizing mechanism on multiple scales:

All this makes the lines between perception and cognition fuzzy, perhaps even vanishing. In place of any real distinction between perception and belief we now get variable differences in the mixture of top-down and bottom-up influence, and differences of temporal and spatial scale in the internal models that are making predictions. Top-level (more ‘cognitive’) models intuitively correspond to increasingly abstract conceptions of the world, and these tend to capture or depend upon regularities at larger temporal and spatial scales. Lower-level (more ‘perceptual’) ones capture or depend upon the kinds of scale and detail most strongly associated with specific kinds of perceptual contact. (Clark 2013, p. 190)

Consequently, processes typically associated with perception or cognition can only be distinguished by considering the temporal and spatial resolution of the instantiation of PP mechanisms and the levels at which model revision ensues, respectively. This relationship between perception and cognition becomes important once we consider how enculturated cognition has been rendered possible on both phylogenetic and ontogenetic time scales. For it helps specify how evolutionary continuity could have been rendered possible in the first place. The evolutionary development of perception and cognition (and, as we shall see, of action too) may have proceeded from more perceptual generative models present in many other animals to more cognitive generative models exclusively realized in humans. This is in line with Roepstorff’s (2013, p. 45) observation that “[t]he underlying neural models are basically species-unspecific, and the empirical cases move back and forth between many different model systems.” Referring to this observation, Clark (this collection, p. 14) emphasizes that “[t]he basic elements of the predictive processing story, as Roepstorff (2013, p. 45) correctly notes, may be found in many types of organism and model-system.” Thus, while certain (lower-level) model parameters and processing stages of prediction error minimization are shared by many organisms, there certainly are specific (higher-level) processing routines that are shared only by enculturated human organisms in a certain cognitive niche.

Furthermore, the idea that perception and cognition are continuous is relevant for considerations of the ontogenetic development of enculturated cognitive functions. This is because it anchors higher-order cognitive operations in more basic perceptual processes and thus allows for a fine-grained description of a certain developmental trajectory leading to cognitive transformation. Bearing in mind the hierarchical structure of generative models, another interesting consequence of the PP-style approach to perception and cognition is that lower (i.e., more perceptual) levels of the generative model influence higher (i.e., more cognitive) levels by means of fed-forward prediction error. Conversely, higher levels of the hierarchical generative model influence lower levels by means of fed-backward predictions (cf. Hohwy 2013, p. 73). This will become more important when we explore how reading acquisition can be described as an ongoing enculturating process of prediction error minimization.

Perceptual inference is only one way of minimizing prediction error. The second is active inference, where “[…] the agent will selectively sample the sensory input it expects” (Friston 2010, p. 129). The idea is that the system can minimize prediction error by bringing about the states of affairs (i.e., the environmental hidden causes) that are predicted by a certain hypothesis. This is achieved by performing any type of bodily movements, including eye movements, that make the selected prediction come true. The predictions at play in active inference are counterfactual, because

[…] they say how sensory input would change if the system were to act in a certain way. Given that things are not actually that way, prediction error is induced, which can be minimized by acting in the prescribed way. (Hohwy 2013, p. 82; italics in original; see also Clark this collection, p. 6; Friston et al. 2012, p. 2)

Accordingly, in active inference the selected prediction is held constant and leads to bodily activities that minimize prediction error by altering the sensory input such that it confirms the prediction. Therefore, active inference is of crucial importance for prediction error minimization, “[…] since it provides the only way (once a good world model is in place and aptly activated) to actually alter the sensory signal so as to reduce sensory prediction error” (Clark 2013, p. 202).
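To illustrate the reverse direction, here is a complementary toy sketch (again with invented values, not taken from the cited authors): the hypothesis, and hence the prediction, is held fixed, and the system instead ‘acts’ on a controllable environmental variable until the sampled input matches what was predicted.

```python
# Toy active inference: instead of revising the hypothesis, the agent changes the
# world (via a control variable) so that the sampled input fulfils the prediction.

predicted_input = 5.0                 # fixed prediction derived from the selected hypothesis
environment_state = 1.5               # environmental variable the agent can influence
sample = lambda state: 2.0 * state    # how the environment generates sensory input

for step in range(100):
    prediction_error = predicted_input - sample(environment_state)
    # Bodily activity nudges the environmental state in the direction that
    # reduces the discrepancy between prediction and sampled input.
    environment_state += 0.1 * prediction_error

print(round(sample(environment_state), 3))   # approx. 5.0: the prediction has 'come true'
```

Note that the loop mirrors the one for perceptual inference above; what differs is which term is held fixed and which is adjusted, i.e., the direction of fit.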

This suggests that perceptual and active inference, or perception and bodily action for that matter, mutually influence each other, thereby minimizing prediction errors and optimizing hypotheses generating ever new predictions. However, perceptual and active inference have a “different direction of fit” (Hohwy 2013, p. 178; see also Hohwy this collection, p. 13; Clark this collection, p. 7).[3] This is because in perceptual inference, predictions are aligned to the sensory input, while active inference is a matter of aligning the sensory input to the predictions. It follows “[…] that to optimally engage in prediction error minimization, we need to engage in perceptual inference and active inference in a complementary manner” (Hohwy 2013, p. 91). Since both perceptual and active inference are aimed at minimizing prediction error and optimizing generative models, “[p]erception and action […] emerge as two sides of a single computational coin” (Clark 2012, p. 760).

As emphasized earlier, perception and cognition are deeply related to the extent that both phenomena are the result of the same underlying functional and neuronal mechanisms. By extension, action is also deeply intertwined with cognition. This follows from the assumptions that 1. perception and cognition are continuous and 2. perception and action are subject to the same principles of prediction error minimization. As Seth (this collection, p. 5) puts it, both ways of prediction error minimization “[…] unfold continuously and simultaneously, underlining a deep continuity between perception and action […].” Yet, perceptual and active inference fulfil distinct functional roles in their ongoing attempt to minimize prediction error. This becomes even more obvious once we take the free energy principle into account: “The free energy principle […] does not posit any fundamental difference between perception and action. Both fall out of different reorganizations of the principle and come about mainly as different directions of fit for prediction error minimization […]” (Hohwy this collection, p. 13). Active inference plays a crucial role in cognition (understood as prediction error minimization comprising many higher-level predictions), for it helps minimize prediction error throughout the cortical hierarchy by bringing about the states of affairs in the environment that are predicted on higher levels. Therefore, on Clark’s (2013, p. 187) account, which he dubs action-oriented predictive processing, prediction error minimization “[…] depicts perception, cognition and action as profoundly unified and, in important respects, continuous.”

PP accounts of human perception, action, and cognition distinguish between first-order and second-order statistics. In contrast to first-order statistics, which amount to minimizing prediction error by means of perceptual and active inference, second-order statistics are concerned with estimating the precision of prediction error. In second-order statistics, the influence of fed-forward prediction error on higher levels of the hierarchical generative model is dependent upon its estimated precision. Neuronally, the estimation of precision is captured in terms of increasing or decreasing the synaptic gain of specific error units (cf. Feldman & Friston 2010, p. 2). That is, “[t]he more precision that is expected the more the gain on the prediction error in question, and the more it gets to influence hypothesis revision” (Hohwy 2013, p. 66; see also Friston 2010, p. 132). Conversely, if the precision is expected to be poor on the basis of second-order statistics, the synaptic gain on the error unit is inhibited such that the prediction on the superordinate level is strengthened (cf. ibid., p. 123). It has been proposed that precision estimation is equivalent to attention. This means that “attention is nothing but optimization of precision expectations in hierarchical predictive coding” (Hohwy 2013, p. 70; see also Feldman & Friston 2010, p. 2). For current purposes, it is sufficient to focus mainly on first-order statistics. However, it is important to bear in mind the crucial modulatory role precision estimation plays in prediction error minimization.
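For a single Gaussian level, the gain-like role of expected precision can be written out explicitly (a textbook-style sketch, not a formula quoted from the cited sources): with a prior expectation μ_prior held with precision π_prior and incoming evidence x carrying expected precision π_sens, the revised hypothesis is

\mu_{\mathrm{post}} = \mu_{\mathrm{prior}} + \frac{\pi_{\mathrm{sens}}}{\pi_{\mathrm{sens}} + \pi_{\mathrm{prior}}}\,\big(x - \mu_{\mathrm{prior}}\big)

The fraction multiplying the prediction error (x − μ_prior) plays the role of the synaptic gain: the higher the expected precision of the prediction error relative to that of the prior, the more the error gets to drive hypothesis revision; if its expected precision is low, the prior prediction dominates.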

2.3 Combining cognitive integration and predictive processing

To what extent is it feasible to describe the mechanisms underlying cognitively integrated processes and enculturated cognition in terms of prediction error minimization? Having summarized CI and the core ideas of the PP framework, I will argue in this section that many aspects of the CI approach can be enriched by making a crucial assumption, namely that PP can account for many of the components constituting cognitive practices on at least the functional and neuronal levels of description.

First, if this approach proves correct, a major conceptual consequence of PP is that perception, action, and cognition are both continuous and unified. This is because they follow the same principles of prediction error minimization, yet are characterized by important functional differences. This kind of complementarity fits neatly with the hybrid mind thesis defended by CI. Recall that the hybrid mind thesis claims that cognitive processes are constituted by both neuronal and extracranial bodily components. By taking prediction error minimization into account, this claim can be cashed out by assuming that the neuronal components amount to perceptual inferences at multiple levels of the cortical hierarchy, while the bodily components are mechanistically realized by active inferences. The hybrid mind thesis emphasizes the indispensable, close and flexible coordination of neuronal and bodily components responsible for the completion of a cognitive task. The PP framework, or so I shall argue, provides the resources for a careful description of the underlying mechanisms at play. It does so by depicting human organisms as being constantly engaged in prediction error minimization by optimizing hypotheses in the course of perceptual inference and by changing the stimulus array in the course of active inference.

A second advantage of the prediction error minimization framework is that it helps cash out the manipulation thesis. This thesis, recall, states that “the manipulation of external vehicles [is] a prerequisite for higher cognition and embodied engagement [is] a precondition for these manipulative abilities” (Menary 2010, p. 232). In terms of the PP framework, bodily manipulation can be understood as an instance of active inference occurring in specific contexts. That is, in order to complete a certain cognitive task, the system changes its sensory input by altering certain components of its cognitive niche. This becomes even more obvious once we take into account that embodied activity is also a means of increasing confidence in sensory input by optimizing its precision. As suggested by Hohwy (this collection, p. 6), “expected precision drives action such that sensory sampling is guided by hypotheses that the system expects will generate precise prediction error.” Applied to an organism’s interaction with its socio-culturally shaped environment, Hohwy (2013, p. 238) argues “[…] that many of the ways we interact with the world in technical and cultural aspects can be characterized by attempts to make the link between the sensory input and the causes more precise (or less uncertain).” However, bodily manipulation is more than just a contributing factor to prediction error minimization (and precision optimization). In order to acknowledge this, we need to take into account that bodily manipulations are a crucial component of the performance of cognitive practices. In the performance of a cognitive practice, the minimization of prediction error and the optimization of precision are not ends in themselves. Rather, they serve to facilitate the completion of a certain cognitive task. Furthermore, the concrete bodily manipulations given in terms of active inference are subject to cognitive norms that constrain the ways in which human organisms interact with cultural resources, such as tokens of a representational writing system. That is to say that the performance of a cognitive practice is not an individualistic enterprise. Rather, in completing a cognitive task, the individual is deeply immersed in a socio-cultural context which is shared by many human organisms.

Third, it is the normative constraints on cognitive practices that render their performance efficient and, in many cases at least, successful. This is because compliance with these norms induces what Andy Clark (2013, p. 195) calls “path-based idiosyncrasies”. That is, one reason why the coordination of neuronal and bodily components in the manipulation of cultural resources is beneficial is surely that it takes place in a normatively constrained “multi-generational development of stacked, complex ‘designer environments’ for thinking such as mathematics, reading, writing, structured discussion, and schooling” (ibid.). That is to say that the performance of cognitive practices in compliance with certain norms has the overall advantage of reducing cognitive effort, which can be captured as the minimization of overall prediction error and the optimization of precision on a sub-personal level of description. At the same time, however, cognitive practices themselves can be described, or so I shall argue, as having prediction error minimization as their underlying mechanism. This double role of cognitive practices, described in terms of prediction error minimization, can be fully appreciated once we consider the cognitive transformations brought about by the ongoing interaction with cultural resources.

Fourth, our cognitive capacities and the various ways we complete cognitive tasks are profoundly augmented by our neuronal and bodily engagements with the socio-culturally structured environment through ontogenesis (cf. Menary 2006, p. 341). Put differently, “cognitive transformations occur when the development of the cognitive capacities of an individual are sculpted by the cultural and social niche of that individual” (Menary this collection, p. 8). This niche includes mathematical symbol systems, representational writing systems, artifacts, and so forth. It is this immersion and, importantly, the scaffolding provided by other inhabitants of the cognitive niche that ideally lead to the transformation of the neuronal and extracranial bodily components constituting cognitive processes, that is, to enculturation. The PP framework, or so I shall argue, offers a highly promising account of learning that is most suitable for a sub-personal level description of cognitive transformation. On the PP construal, learning flows naturally from the mechanism of prediction error minimization. For learning can generally be construed as a sub-personally realized strategy of optimizing models and hypotheses in the face of ever new prediction error: “Learning is then viewed as the continual updating of internal model parameters on the basis of degree of predictive success: models are updated until they can predict enough of the signal” (Hohwy 2011, p. 268). Broadly understood, ‘learning’ thus figures as an umbrella term referring to the ongoing activity of prediction error minimization and model optimization throughout the lifetime of a human organism. This is because potentially ever new and “surprisaling” sensory signals need to be “explained away” by perceptual and active inference. For current purposes, however, “learning” can also be understood in a rather narrow sense as the acquisition of a certain skill, which is also subject to prediction error minimization through perception, action, cognition, and the modulation of attention. It is the individual’s socio-culturally structured environment that delivers new sensory signals helping optimize parameters of the generative model:

But those training signals are now delivered as part of a complex developmental web that gradually comes to include all the complex regularities embodied in the web of statistical relations among the symbols and other forms of socio-cultural scaffolding in which we are immersed. We thus self-construct a kind of rolling ‘cognitive niche’ able to induce the acquisition of generative models whose reach and depth far exceeds their apparent base in simple forms of sensory contact with the world. (Clark 2013, p. 195)

However, complex skills that are targeted at the completion of cognitive tasks cannot be learned simply by being exposed to the right kind of “training signal” in the cognitive niche. What is additionally needed is engagement in activities that are scaffolded by inhabitants of that cognitive niche who have already achieved a sufficient degree of expertise. This is what Menary (this collection) calls “scaffolded learning”. From the perspective of PP, this amounts to the strategy of exposing predictive systems to highly structured, systematically ordered patterns of sensory input in the cognitive niche. This, however, needs to be complemented by a fine-grained personal-level description of the kind of interactions between experts and novices that is needed in order to pass on the right set of cognitive norms. Furthermore, the kind of cognitive transformation at play here requires a description of the neuronal changes that are correlated with the acquisition of a certain cognitive practice. That is, we need a more fine-grained account of LDP and how it might be realized in the human cortex. From the perspective of the PP framework, one plausible conjecture at this point is that LDP can be captured in terms of effective connectivity. Effective connectivity captures the causal interaction of neuronal assemblies across multiple levels of the cortical hierarchy (and across different brain areas), as modulated by attention in terms of precision estimation. This line of reasoning is implied by Clark (2013, p. 190), who argues that “[a]ttention […] is simply one means by which certain error-unit responses are given increased weight, hence becoming more apt to drive learning and plasticity, and to engage in compensatory action.” This last point is important, since it stresses that it is not only perceptual inference that drives learning and contributes to the improvement of generative models, but also active inference. However, this approach to the acquisition of action patterns in concert with an optimization of precision might raise the worry that learning is depicted here as a rather internalistic, brain-bound affair. But once we acknowledge that it is the performance and ongoing improvement of embodied active inferences that play an indispensable functional role in the completion of cognitive tasks, it becomes obvious that this worry is not warranted. For what results from learning is precisely the efficient interaction of neuronal and extracranial bodily components (i.e., of perceptual and active inferences in terms of PP), and with it the efficient engagement of human organisms with their environment. Furthermore, LDP can now be considered in terms of the precision-weighted optimization of hypotheses throughout the cortical hierarchy and the ever new patterns of effective connectivity, as new cognitive practices are acquired and successfully performed. The sub-personal description of cognitive transformation in terms of prediction error minimization also does justice to neuronal reuse as a guiding principle of the allocation of neuronal resources for phylogenetically recent cognitive functions such as arithmetic or reading.
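Hohwy’s remark that learning amounts to “the continual updating of internal model parameters on the basis of degree of predictive success” can be given a minimal illustration. The sketch below is a toy example with invented values, not an implementation of LDP or of any model from the cited literature: across repeated encounters with a regularity in the niche, a single parameter of the generative mapping is slowly adjusted on the basis of residual prediction error, so that future predictions improve.

```python
# Toy sketch of learning as slow parameter optimization: across trials, the gain 'w'
# of the generative mapping is adjusted on the basis of prediction error
# (all values hypothetical).

import random
random.seed(0)

w = 0.5               # model parameter to be learned
true_w = 2.0          # regularity actually present in the (socio-culturally structured) niche
learning_rate = 0.05

for trial in range(500):
    hidden_cause = random.uniform(0.0, 1.0)       # varying state of the niche
    sensory_input = true_w * hidden_cause         # what the niche actually delivers
    prediction = w * hidden_cause                 # what the current model predicts
    prediction_error = sensory_input - prediction
    w += learning_rate * prediction_error * hidden_cause   # delta-rule-style parameter update

print(round(w, 3))   # w has converged towards 2.0: the model now predicts the regularity
```

On this reading, the fast loop of hypothesis revision (perceptual inference) and the slow loop of parameter revision (learning) can be seen as two time scales of one and the same prediction error minimizing mechanism, with precision weighting determining which errors get to drive plasticity.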

From this, the following question arises: What is the actual relationship between CI and PP supposed to be, and what is the scope of this theory synthesis? First of all, the position developed in this commentary is neutral with regard to metaphysical consequences that may or may not result from the idea that CI and PP can be integrated into a unified theoretical framework. Rather, this position has an instrumentalist flavour to the extent that it tries to answer the question of how socio-culturally shaped target phenomena can best be investigated, both conceptually and empirically. Thus, the combination of CI and PP is valid only to the extent that it displays great descriptive as well as predictive power and is supported by many results stemming from empirical research. As such, the new approach on offer here is contingent upon the current state of research in cognitive science. It is falsifiable by new empirical evidence or convincing conceptual considerations that directly speak against it. Furthermore, it sidesteps the concern that PP and the underlying free energy principle might be trivial because they can be applied to any target phenomenon by telling a “just-so story”. This is because the combination of CI and PP is applied to specific domains, namely to classes of cognitive processes that count as cognitive practices, with reading being the paradigm example.[4] Thus, the approach advocated here can be seen as a modest contribution to the project aiming at a “[…] translation into more precise, constricted applications to various domains, where predictions can be quantified and just-so stories avoided” (Hohwy this collection, p. 14).

The idea that CI and PP can be combined can lead to different degrees of commitment.[5] First, I do not assume that CI necessarily requires PP. Hypothetically, it is conceivable that another theory of neuronal and bodily functioning might cash out cognitive practices and enculturation more convincingly and more extensively. To date, however, PP appears to be the best unifying framework for specifying exhaustively the functional and neuronal contributions of the bodily and neuronal sub-processes giving rise to cognitive practices and enculturation. This is because PP offers a fine-grained functional and neuronal description of perception, action, cognition, attention, and learning that does justice to the complex interactions stipulated by CI and the associated approach to enculturation.

Second, it could be assumed that CI and PP are merely compatible. This would mean that CI and PP were self-sufficient and co-existent theoretical frameworks whose claims and key assumptions do not necessarily contradict each other. This compatibility assumption is too weak for various reasons that have been presented in this commentary so far. For it is the purpose of the theory synthesis sketched here to enrich and refine the notion of enculturation and the associated theses defended by CI. Furthermore, to the extent that PP directly speaks to complex cognitive phenomena and learning, it benefits from the effort of CI to do justice to the socio-culturally shaped context in which these phenomena develop. This is to say that CI and PP can be directly related to each other in ways that I have started to illustrate in this section.

Finally, from this it follows that both frameworks are more than just compatible – they are complementary. Taken together, they provide us with complex and far-reaching conceptual tools for investigating complex cognitive phenomena that are shaped by the individual’s immersion in its cognitive niche. Thus, the complementarity of CI and PP leads to a new integrative framework that I dub enculturated predictive processing (EPP).

2.4 Defending enculturated predictive processing

At first glance, the EPP framework might appear to be unwarranted. For prediction error minimization could be construed as being a purely internalistic, brain-bound affair that does not leave any room for the idea that cognitive processes are constituted both by neuronal and extracranial bodily components that are normatively constrained, socially scaffolded, and deeply anchored in a socio-culturally structured environment.

First, consider a position that takes for granted that cognitive processes can be coherently described in terms of prediction error minimization, but which denies that cognitive processes are co-constituted by neuronal and bodily sub-processes operating on socio-cultural resources. Such a position is defended by Jakob Hohwy (2013, p. 240) who argues that “[…] many cases of situated and extended cognition begin to make sense as merely cases of the brain attempting to optimize its sensory input so it, as positioned over against the world, can better minimize error.” In particular, according to his interpretation of the prediction error minimization framework, “[…] the mind remains secluded from the hidden causes of the world, even though we are ingenious in using culture and technology to allow us to bring these causes into sharper focus and thus facilitate how we infer to them.” (ibid., p. 239)

For Hohwy, this directly follows from the causal relations holding between the predictive system and the environmental causes it constantly tries to infer. According to him (ibid., p. 228), this relation needs to be characterized as “direct” and “indirect” at the same time:

[…] the intuition that perception is indirect is captured by its reliance on priors and generative models to infer the hidden states of the world, and the intuition that perception is direct is captured by the way perceptual inference queries and is subsequently guided by the sensory input causally impinging on it.

Since the causal relation that holds between a predictive system comprising inverted generative models and the world is partly indirect, so the argument goes, the system is in constant embodied interaction and direct contact with its environment only insofar as it tries to make the effects of hidden causes fit the predictions. This precludes the theoretical possibility of depicting prediction error minimizing systems as being situated, scaffolded, integrated, or extended.

However, this line of reasoning fails to acknowledge the conceptual necessity of emphasizing the functional role of embodied active inference in terms of its contribution to the minimization of prediction error and the optimization of predictions. For even if the causal relations holding between a predictive, generatively organized system and environmental causes are mediated by hypotheses, predictions, prediction errors and precision estimation as encoded in the cortical hierarchy, it does not follow that this system is just a passive receiver of sensory input that informs it about remote states in the environment. Similarly, it does not necessarily follow from the prediction error minimization framework that it “[…] creates a sensory blanket – the evidentiary boundary – that is permeable only in the sense that inferences can be made about the causes of sensory input hidden beyond the boundary”, as Hohwy (2014, p. 7) claims. Rather, the predictive system is part of its socio-culturally structured environment and has many possibilities for bodily acting in that environment in order to facilitate its own cognitive processing routines. Considering embodied active inference, it turns out that the causal relation holding between embodied action (in terms of bodily manipulation) and changes of the set of available stimuli in the environment is as direct as any causal relation could be. This is because these changes are an immediate effect of these very prediction error-minimizing and precision-optimizing actions, which in turn contribute to the performance of cognitive tasks. Furthermore, we need to take into account that genuinely human cognitive processes occur in a culturally sculpted cognitive niche, which is characterized by mathematical symbol systems, representational writing systems, artifacts, and the like, and other human organisms with whom we interact. These cognitive resources have unique properties that render them particularly useful for the completion of cognitive tasks.[6] For example, consider the regularity of line arrangements and the orderliness of succeeding letters in an alphabetic writing system. Once learned and automatized, following these normative principles facilitates several types of cognitive processing routines. That is to say that it is the socio-culturally shaped sensory input itself that has an important impact on the concrete realization of prediction error minimization. This cannot be accounted for if we assume that the predictive processing of cognitive resources is an internalistic, secluded endeavour.

Second, consider a line of reasoning, which might be put forward by an integrationist, that goes against the compatibility of CI with the prediction error minimization framework. She might agree that we need a mechanistic description of the neuronal and bodily components which jointly constitute cognitive processes in the close interaction with socio-cultural resources. But she might continue to argue that the performance of cognitive practices is more than just the minimization of prediction error and the optimization of precision.[7] From the perspective of PP, there is no need to deny that human cognitive systems as a whole aim to fulfil cognitive purposes by completing cognitive tasks, and that they do so by engaging in cognitive practices. Nor should it be rejected that cognitive practices are normatively constrained and that cognitive systems are deeply immersed in a socio-culturally structured environment, which in turn provides these very norms through scaffolded teaching. However, the important theoretical contribution made by the prediction error minimization framework is that it provides a sub-personal, mechanistic description of the underlying neuronal and bodily sub-processes, one that turns out to be parsimonious, conceptually coherent, and empirically plausible. In addition, PP captures the close interaction of the neuronal and bodily components constituting cognitive practices by offering a concise description of the ongoing, mutually constraining interplay of perceptual and active inferences. More generally, this section should have established that all important claims and assumptions made by CI about cognitive practices, such as the hybridity, the transformative efficacy, and the enculturated nature of cognitive processes, can be supplemented and refined by taking the prediction error minimization framework into account.

The arguments in favour of the EPP framework directly speak to the current debate within philosophy of mind and philosophy of cognitive science about the relationship between the prediction error minimization framework and approaches to situated, distributed, integrated, or extended cognition. On the one hand, Jakob Hohwy (2013, 2014) denies on both methodological and metaphysical grounds that there is anything like these types of cognition from the perspective of prediction error minimization. According to him, this is because predictive systems have only indirect access to the world. Furthermore, there is “the sensory boundary between the brain and the world” which prohibits predictive systems from engaging in any variant of situated, distributed, integrated, or extended cognition including CI (Hohwy 2013, p. 240). On the other hand, Andy Clark (2013, p. 195) argues that the PP framework at least “[…] offers a standing invitation to evolutionary, situated, embodied, and distributed approaches to help ‘fill in the explanatory gaps’ while delivering a schematic but fundamental account of the complex and complementary roles of perception, action, attention, and environmental structuring.” Once we take the arguments and considerations in favour of EPP into account we have reasons to think that EPP lends support to Clark’s construal of the PP framework. This will become even more persuasive once we take empirical data and a paradigm case of EPP into account.