3 Reading acquisition: A case of enculturation

So far, I have argued that the notion of enculturation and key claims made by CI can be enriched by taking the PP framework into account. In particular, the hybridity, embodiedness, and transformative character of enculturated cognition can be mechanistically described in terms of prediction error minimization. However, cognitive practices cannot be fully reduced to prediction error minimization, since they have a normative dimension that needs to be investigated on a personal level of description.

This section serves to illustrate the validity of the line of reasoning put forward in this commentary. This will be done by showing that reading acquisition, understood as another case of enculturation next to mathematical cognition, can be fruitfully described from the perspective of EPP.

3.1 Scaffolded learning and the acquisition of cognitive norms

One crucial aspect of learning to perform a cognitive practice is the acquisition of the relevant cognitive norms, where this class of norms “govern[s] manipulations of external representations, which aim at completing cognitive tasks” (Menary 2010, p. 238). In the case of reading, these norms concern the recognition and identification of tokens of a representational writing system. In alphabetic writing systems, important cognitive norms are derived from the so-called alphabetic principle, where this principle amounts to the “mapping [of] written units onto a small set of elements – the phonemes of a language” (Rayner et al. 2001, p. 33; see also Snowling 2000, p. 87). Specifically, the correspondence of graphemes to phonemes puts culturally established, normative constraints on the ways in which individual letters (and combinations thereof) are related to phonological units. The normative scope of these correspondences is best illustrated by differences across languages and orthographies. As pointed out by Ziegler & Goswami (2006, p. 430), “[i]n some orthographies, one letter or letter cluster can have multiple pronunciations (e.g. English, Danish), whereas in others it is always pronounced in the same way (e.g. Greek, Italian, Spanish).”[8] This demonstrates that the degree of consistency or transparency of grapheme-phoneme correspondences is subject to arbitrary stipulations by a linguistic, literate community employing a specific orthographic system. These stipulations are normative insofar as they constrain the ways in which combinations of letters are pronounced and written words are correctly related to spoken words. The acquisition of this normative knowledge needs “explicit instruction in the alphabetic principle” (Rayner et al. 2001, p. 57).[9] It follows that learning these norms is socially structured and dependent upon the cooperation of experts with novices. This fits neatly with Menary’s (2013, p. 361) following assumption:

Manipulative norms and interpretative norms apply to inscriptions of a public representational system and are never simply dependent on an individual. Indeed, it is the individual who must come to be transformed by being part of the community of representational system users.

Acquiring knowledge about grapheme-phoneme correspondences, especially in an inconsistent orthography such as English, puts demands not only on the novice, but also on the teachers who assist her in learning these correspondences. For the teachers, being experts in reading, need to break down their automatic identification and recognition skills in order to be able to teach the norms underlying the relationship between graphemes and phonemes. As Sterelny (2012, p. 145) points out more generally, “[e]xpert performance is often rapid and fluent, without obvious components. Learning from such performance is difficult. It becomes much easier if the task is overtly decomposed into segments, each of which can be represented and practiced individually.” In the present context, the most successful strategy of teaching grapheme-phoneme correspondence has turned out to be so-called phonics instruction (cf. Rayner et al. 2001, pp. 31f): “[…] teaching methods that make the alphabetic principle explicit result in greater success among children trying to master the reading skills than methods that do not make it explicit” (ibid., p. 34). This goes along with teaching novices that spoken language consists of phonemes. That is, children’s reading acquisition is dependent upon, or at least co-develops with phonological awareness, where this is understood as “[…] the ability to perceive and manipulate the sounds of spoken words” (Castles & Coltheart 2004, p. 78). The metalinguistic awareness that spoken language consists of phonemes must be explicitly acquired and allows the novice to learn that these units correspond to letters, or combinations thereof. It is still debated whether phonological awareness is a prerequisite for learning to read or whether it is co-emergent with basic letter decoding skills. However, as suggested by Castles & Coltheart (2004, p. 104), “[…] it may not be possible for phonemic awareness to be acquired at all in the absence of instruction on the links between phonemes and graphemes.” Thus, it seems safe to assume that phonological awareness clearly facilitates the ability to relate graphemes to phonemes.

There are other components of meta-linguistic awareness that influence the successful application of norms governing alphabetic representational writing systems. Beginning readers are already proficient speakers of their native language and are able to fluently apply syntactic, semantic, and pragmatic norms in their everyday conversations. However, they are usually unable to explicitly represent that utterances are made up of sentences and that sentences are made up of combinations of words (cf. Frith 1985, p. 308; Rayner et al. 2001, p. 35). To novices, these basic properties must be made explicitly available in order to put those novices in the position to apply knowledge about them automatically and fluently at later stages of reading acquisition. Furthermore, novices need to be acquainted with the convention, which is fairly obvious to expert readers, that alphabetic writing systems are decoded from left to right and from the top to the bottom of a page.

These basic personal-level components of the acquisition of reading skills provide the cognitive norms necessary for the development of reading understood as a cognitive practice. It is these norms that govern the successful manipulation of representational vehicles belonging to an alphabetic writing system that need to be established by social interaction between learners and teachers. Thus, becoming proficient in applying the alphabetic principle, getting to grips with phoneme-grapheme correspondences, and developing phonological and metalinguistic awareness are cases of scaffolded learning.

3.2 Reading acquisition and neuronal transformation

Next to scaffolded learning, another crucial aspect of cognitive transformation is LDP (cf. Menary 2013, p. 356, this collection, p. 8). Indeed, in the case of reading acquisition, there is unequivocal evidence pointing to “[…] plastic changes in brain function that result from the acquisition of skills” (Ansari 2012, p. 93). By the same token, Ben-Shachar et al. (2011, p. 2397) emphasize that “[…] culturally guided education couples with experience-dependent plasticity to shape both cortical processing and reading development.” As Schlaggar & McCandliss (2007, p. 477) point out, the application of knowledge about grapheme-phoneme correspondences in novice readers “[…] implicates the formation of functional connections between visual object processing systems and systems involved in processing spoken language.” The left ventral occipitotemporal (vOT) area appears to play a crucial role in establishing these connections.

As mentioned by Menary (this collection), there has been consensus on the contribution of the vOT area to a neuronal reading circuit. In a series of experiments, Stanislas Dehaene, Laurent Cohen and their colleagues have made the remarkable discovery that neuronal activation in one particular region of the left vOT area is reliably and significantly associated with visual word recognition in adult, non-pathological readers (Cohen & Dehaene 2004; Dehaene 2005, 2010; Dehaene & Cohen 2011; Dehaene et al. 2005; McCandliss et al. 2003; Vinckier et al. 2007). This region, especially the left ventral occipito-temporal sulcus next to the fusiform gyrus, frequently responds to visually presented words regardless of the size, case, and font in which they are made available (cf. Dehaene 2005, p. 143; McCandliss et al. 2003, p. 293). This consistent finding has led these researchers to call it the visual word form area (VWFA), since it crucially contributes to “[…] a critical process that groups the letters of a word together into an integrated perceptual unit (i.e. a ‘visual word form’)” (McCandliss et al. 2003, p. 293). However, it is debatable whether the left vOT area is almost exclusively dedicated to visual word recognition in expert readers, or whether this area serves several functions having to do with the (visual) identification of shapes more broadly construed (see Price & Devlin 2003, 2004, for a discussion). Nevertheless, the finding by Dehaene and his colleagues that the left vOT area plays a crucial role in the overall visual word recognition process is important and widely acknowledged, although the interpretations of its functional contribution differ.

An important motivation for research on the overall function of the left vOT area stems from considerations on the phylogenetic development of visual word recognition. Considering that writing systems were invented only approximately 5400 years ago, it is unlikely that the ability to read is the result of an evolutionary process (cf. Dehaene 2005, p. 134, 2010, p. 5; McCandliss et al. 2003, p. 293). In a nutshell, the crucial question is how visual word recognition is possible given “[…] that the human brain cannot have evolved a dedicated mechanism for reading” (Dehaene & Cohen 2011, p. 254). This is also referred to as the “reading paradox” (Dehaene 2010, p. 4). The solution to this paradox proposed by Dehaene and his colleagues is to assume “[…] that plastic neuronal changes occur in the context of strong constraints imposed by the prior evolution of the cortex” as a result of the human organism being exposed to tokens of a certain writing system (Dehaene & Cohen 2011, p. 254). Specifically, the idea is “[…] that writing evolved as a recycling of the ventral visual cortex’s competence for extracting configurations of object contours” (ibid.). This view, which has been dubbed the neuronal recycling hypothesis (cf. Dehaene 2005, p. 150), suggests that existing neuronal functions associated with visual cognition are “recycled” for the phylogenetically recent, ontogenetically acquired capacity to recognize visually presented words (cf. Cohen & Dehaene 2004, p. 468; see also Menary 2014, p. 286). This “recycling” is in turn constrained by the overall evolved neuronal architecture and already existing processing mechanisms (cf. Dehaene 2010, pp. 146f). Thus, neuronal recycling is just a special type of neuronal reuse (see Anderson 2010, for a discussion). There are certain conditions that need to be met if a specific cortical area is to be ‘recycled’ for a phylogenetically recent cognitive function (see Menary 2014, p. 288). 
In the case of visual word recognition, the left vOT area is assumed to exert certain “functional biases” that make it most suitable for the recognition and identification of visually presented words: “(1) a preference for high-resolution foveal shapes; (2) sensitivity to line configurations; and (3) a tight proximity, and, presumably, strong reciprocal interconnection to spoken language representations in the lateral temporal lobe” (Dehaene & Cohen 2011, p. 256). These “functional biases”, however, do not preclude the left vOT area from still being engaged in other cognitive processes such as object recognition in skilled adult readers (cf. Carreiras et al. 2014, p. 93; Dehaene & Cohen 2011, p. 257; Price & Devlin 2004, p. 478). Rather, they help explain why this area is found to be well-equipped for contributing to the overall process of visual word recognition. However, the question arises what contribution the left vOT area is supposed to make to the overall visual word recognition process. According to Cathy Price’s & Joseph Devlin’s (2011) Interactive Account (IA), the contribution of the left vOT area can be best described and explained in terms of PP. In line with the general principles of the PP framework presented above, they generally hold the following assumption: “Within the hierarchy, the function of a region depends on its synthesis of bottom-up sensory inputs conveyed by forward connections and top-down predictions mediated by backward connections” (Price & Devlin 2011, p. 247). In other words, the suggested synthesis equals the prediction error that results from the discrepancy between top-down predictions and bottom-up sensory information. Applied to the patterns of neuronal activation associated with visual word recognition, this assumption is specified as follows:

For reading, the sensory inputs are written words (or Braille in the tactile modality) and the predictions are based on prior association of visual or tactile inputs with phonology and semantics. In cognitive terms, vOT is therefore an interface between bottom-up sensory inputs and top-down predictions that call on non-visual stimulus attributes. (Price & Devlin 2011, p. 247)

Accordingly, the vOT area is supposed to be associated with a distinct level of the hierarchical generative model responsible for visual word recognition mediating between higher-level, language-related predictions and bottom-up visual information. It follows that “[…] the neural implementation of classical cognitive functions (e.g. orthography, semantics, phonology) is in distributed patterns of activity across hierarchical levels that are not fully dissociable from one another” (ibid., p. 249). Specifically, IA proposes a neuronal mechanism that is able to demonstrate how linguistic knowledge about phonology and semantics, encoded in top-down predictions, causally interacts with bottom-up information. This is because it is held that a prediction error is generated each time bottom-up information diverges from the associated top-down prediction. In turn, the resulting prediction error is associated with significant activation in the left vOT area. Empirical evidence supporting this approach to the functional contribution of the left vOT area to visual word recognition in expert readers is widely available (see, e.g., Bedo et al. 2014; Kherif et al. 2011; Kronbichler et al. 2004; Schurz et al. 2014; Twomey et al. 2011).

In reading acquisition, the left vOT area appears to be an equally important contributor to visual word recognition. According to Price & Devlin (2011, p. 248), the activation level of the vOT area develops in a non-linear fashion, as the proficiency in visual word recognition increases:

In pre-literates, vOT activation is low because orthographic inputs do not trigger appropriate representations in phonological or semantic areas and therefore there are no top-down influences […]. In early stages of learning to read, vOT activation is high because top-down predictions are engaged imprecisely and it takes longer for the system to suppress prediction errors and identify the word […]. In skilled readers, vOT activation declines because learning improves the predictions, which explain prediction error efficiently […].

That is, IA assumes that the level of activation within the left vOT area is dependent upon the general establishment and refinement of a generative model comprising both lower-level areas associated with visual processing and higher-level cortical areas associated with phonological and semantic knowledge. If this account turns out to be correct, the blurredness of the distinction between perception and cognition as suggested by Clark (2013) becomes vitally important. For it is the mutual interplay of lower-level processing stages (traditionally associated with visual processing) and higher-level processing stages (traditionally associated with phonological and semantic processing) that renders the successful acquisition of visual word recognition possible in the first place. Evidence in favour of IA comes from studies demonstrating that there is a significant increase of activation in this area as a result of exposure to visually presented words in beginning readers across different research paradigms and methodologies employing fMRI (e.g., Ben-Shachar et al. 2011; Gaillard et al. 2003; Olulade et al. 2013). Furthermore, two longitudinal ERP studies (Brem et al. 2010; Maurer et al. 2006) demonstrate that the left-lateralized occipito-temporal N1 effect, an effect associated with print sensitivity, does not develop in a linear fashion in the course of reading acquisition. Rather, Maurer et al.’s (2006, p. 756) comparison of their results obtained from their child participants with an adult control group indicates that “[i]nstead of a linear increase with more proficient reading, the development is strongly nonlinear: the N1 specialization peaks after learning to read in beginning readers and then decreases with further reading practice in adults following an inverted U-shaped developmental time-course.” In this vein, Brem et al. (2010, p. 7942) interpret their results by suggesting that “[t]he emergence of print sensitivity in cortical areas during the acquisition of grapheme-phoneme correspondences is in line with the inverse U-shaped developmental trajectory of print sensitivity of the ERP N1, which peaks in beginning readers […]”.
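The non-linear trajectory that Price & Devlin describe can be illustrated with a toy calculation. The sketch below is not their model: it assumes, purely for illustration, that vOT activation behaves like the product of two quantities, namely how strongly top-down predictions are engaged and how much prediction error remains unexplained. The function name, the exponential curves, and both time constants are hypothetical choices made only to show how a fast rise in engagement combined with a slower gain in accuracy yields an inverted U.

```python
import math


def toy_vot_activation(exposure, tau_engagement=5.0, tau_accuracy=30.0):
    """Toy activation index for a given amount of reading exposure.

    Assumption (not from the source): activation is the product of
    top-down engagement and residual prediction error, and both change
    exponentially with exposure at hypothetical rates.
    """
    # Top-down predictions start at zero and are engaged quickly.
    engagement = 1.0 - math.exp(-exposure / tau_engagement)
    # Residual prediction error starts high and shrinks slowly.
    residual_error = math.exp(-exposure / tau_accuracy)
    return engagement * residual_error


# Arbitrary exposure units from 0 (pre-literate) to 100 (skilled).
trajectory = [toy_vot_activation(t) for t in range(101)]
peak = max(range(len(trajectory)), key=trajectory.__getitem__)
```

On these assumptions the index is zero before any exposure, peaks at an intermediate point, and declines afterwards, which mirrors the qualitative pattern of low, high, and again lower activation. Nothing about the location or height of the peak should be read as an empirical estimate.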

Another consequence of Price’s & Devlin’s (2011) PP account of reading acquisition is that the activation level within the vOT should be associated with the degree of accuracy of top-down predictions in the face of bottom-up signals. This is supported by various studies demonstrating that higher-level activations of cortical areas associated with language processing are also present in beginning readers. For example, Turkeltaub et al. (2003, p. 772) report that “[a]ctivity in the left ventral inferior frontal gyrus increased with reading ability and was related to both phonological awareness and phonological naming ability. […] Brain activity in the anterior middle temporal gyrus also increased with reading ability”, where this area is associated with semantic processing. Similarly, Gaillard et al. (2003) report activation in the middle temporal gyrus, which is frequently associated with semantic processing in expert readers (e.g., Bedo et al. 2014, p. 2; Price & Mechelli 2005, p. 236; Vogel et al. 2013, p. 231; Vogel et al. 2014, p. 4). Furthermore, they report significant activation patterns in left IFG, which is associated with both phonological and semantic processing.

In the light of much empirical evidence in favour of Price’s & Devlin’s (2011) approach to the neuronal changes corresponding to reading acquisition, it seems safe to assume that it is empirically plausible and can account for many data derived from experiments in cognitive neuroscience. However, to what extent can this approach be conceptually enriched? Recall that learning a new skill such as reading is just a special case of overall prediction error minimization according to the PP framework. On this construal, learning to read means becoming increasingly efficient in predicting linguistic, visually presented input as a result of long-term exposure to types of this input and the optimization of hypotheses through perceptual inference. The careful instruction in relating graphemes to phonemes, phonological and metalinguistic awareness, and the normatively constrained alphabetic principle provides the environmental conditions for efficient and progressively more accurate prediction error minimization. The signals delivered by this highly structured learning environment are estimated as being precise, such that the synaptic gain on error units reporting the discrepancy between (still inaccurate) predictions and bottom-up signals is high. As learning to read proceeds, the predictions become more accurate and the overall influence of prediction error shows a relative decrease. This line of reasoning is supported by Price’s & Devlin’s (2011, p. 248) following suggestion: “At the neural level, learning involves experience-dependent synaptic plasticity, which changes connection strengths and the efficiency of perceptual inference.” On this view, LDP and the associated neuronal transformations can be understood as being realized by prediction error minimization in the context of scaffolded learning, which allows a beginning reader to become ever more efficient and successful in this particular cognitive practice.
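The interplay of precision and learning sketched in this paragraph can be made concrete with a minimal simulation. It assumes a single scalar prediction that is nudged towards a fixed target by a precision-weighted delta rule. This is a deliberately crude stand-in for hierarchical prediction error minimization, and the function name, parameters, and update rule are illustrative assumptions rather than anything proposed by Price & Devlin or the PP literature cited here.

```python
def simulate_exposure(n_trials=200, precision=1.0, learning_rate=0.1,
                      target=1.0):
    """Return the absolute prediction error on each trial.

    Assumptions (all hypothetical): the learner holds one scalar
    prediction, the structured environment always delivers the same
    target signal, and each update is the prediction error scaled by
    a learning rate and by the estimated precision of the signal.
    """
    prediction = 0.0  # initial, still inaccurate top-down prediction
    errors = []
    for _ in range(n_trials):
        # Discrepancy between bottom-up signal and top-down prediction.
        error = target - prediction
        errors.append(abs(error))
        # Higher precision means higher gain on the error signal.
        prediction += learning_rate * precision * error
    return errors


errors = simulate_exposure()
fast = simulate_exposure(precision=2.0)
```

Under these assumptions the error shrinks on every trial, and a signal that is treated as more precise (`precision=2.0`) drives the prediction towards the target faster than a less precise one. The sketch thereby illustrates, in miniature, why a carefully structured instructional environment would be expected to speed up prediction error minimization.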

3.3 Reading acquisition and bodily transformation

Starting from the hybrid mind thesis defended by CI, which states that certain cognitive processes are constituted by both neuronal and extracranial bodily sub-processes, it seems natural to assume that reading acquisition also is associated with the transformation of bodily sub-processes. That is, in the course of enculturation it is the enactment of bodily manipulation that is transformed in addition to the neuronal changes occurring as a result of LDP. In terms of PP, this assumption leads to the suggestion that it is not only perceptual inferences that are causally relevant for learning described in terms of prediction error minimization, but also active inferences that allow for ever more efficient sub-personally employed strategies for “explaining away” incoming sensory input. Recall that eye movements are just a special case of active inference (see, e.g., Friston et al. 2012). Their functional contribution to prediction error minimization becomes vitally important for a complete account of visual word recognition and its acquisition. This is because visual word recognition, in both novices and experts, is rendered possible by the coordination of perceptual and active inference. From the perspective of CI, the idea here is that the ways in which an individual bodily manipulates a certain cognitive resource are importantly improved in the course of cognitive transformation. Applied to reading acquisition, this leads to the assumption that specific eye movement patterns become more efficient as a result of reading instruction and iterated exposure to a certain type of cognitive resource (say, sentences printed on a piece of paper).

Recently, it has become possible to investigate eye movements in beginning readers by employing eye-tracking methodologies. Converging evidence suggests that beginning readers make more fixations (i.e., acquisition of visual information in the absence of oculomotor activities), saccades (i.e., oculomotor activities), and regressions (i.e., backward saccades), and exhibit longer fixation durations and smaller saccade amplitudes than proficient and expert readers (cf. Joseph et al. 2013, p. 3; Rayner et al. 2001, p. 46). More specifically, these tendencies are assessed in a longitudinal eye-tracking study reported by Huestegge et al. (2009). They measured eye movements during an oral reading task in second and fourth graders of a German primary school and additionally assessed overall reading skills and oculomotor behaviour beyond reading (cf. Huestegge et al. 2009, p. 2949). Their results indicate that the fourth graders, in comparison to the second graders, show a decrease of fixation duration, gaze duration, total reading time, refixations, and saccadic amplitudes (cf. ibid., p. 2956). Huestegge et al. (2009, p. 2958) attest that the younger, less proficient readers show a “[…] refixation strategy, with initial saccade landing positions located closer to word beginnings.” Similarly to Huestegge et al. (2009), Seassau et al. (2013) report a longitudinal study comparing the performance of 6- to 11-year-old children in a reading task and a visual task. In line with the empirical evidence already mentioned, their results indicate that “[w]ith age, children’s reading capabilities improve and they learn to read by making larger progressive saccades, fewer regressive saccades and shorter fixations […]” (Seassau et al. 2013, p. 6). Furthermore, it is demonstrated that the eye movement patterns employed in reading and in visual search diverge with increasing reading proficiency (cf. ibid., p. 9).

An explanation of these results in terms of PP is straightforward. In beginning readers, the predictions initiating active inference occurring in a highly structured linguistic environment are inaccurate, such that the generation and execution of eye movements in terms of active inference is not as efficient as it is in the case of expert readers. By the same token, the inaccuracy of the currently selected prediction makes it necessary to sample the visually available linguistic environment more thoroughly, explaining the “refixation strategy” and the execution of comparatively more saccades. As reading skills improve, resulting from increasingly efficient prediction error minimization through perceptual inference as already suggested, the accuracy of predictions improves, thereby allowing for more efficient active inference. More efficient active inference, in turn, allows for more efficient perceptual inference, since both types of inference mutually influence each other. This line of reasoning is supported by Huestegge et al.’s (2009, p. 2957) claim informed by the results of their study “[…] that only linguistic, not oculomotor skills were the driving force behind the acquisition of normal oral reading skills.” Thus, the increase in efficiency of eye movements in beginning readers does not result from an increase in oculomotor capabilities per se, but works in tandem with higher-level linguistic knowledge encoded in predictions, which are associated with representations in higher-order cortical areas. As a result, the improvement of active inference in the course of reading acquisition goes hand in hand with the improvement of perceptual inference. This highlights that learning to read results not only in neuronal but also in bodily transformations. As such, the optimization of eye movements in the course of reading acquisition highlights the importance of bodily manipulation in the efficient enactment of reading understood as a cognitive practice. This also suggests that a complete account of enculturation should not only pay attention to scaffolded learning and LDP, but also to the developmental trajectory of bodily manipulation.