3 Representations: What are they good for?

PP, Madary suggests, provides a new kind of lever for naturalizing intentionality and mental content. Might it also offer a new perspective upon the vexed topic of internal representation? Varela et al. are explicit that, on the enactivist conception, “cognition is no longer seen as problem solving on the basis of representations” (1991, p. 205). PP, however, deals extensively in internal models – models that may (see Clark this collection) be rich, frugal, and all points in-between. The role of such models is to control action by predicting and bringing about complex plays of sensory data. This, the enactivist might fear, is where our promising story about neural processing goes conceptually astray. Why not simply ditch the talk of inner models and internal representations and stay on the true path of enactivist virtue?

This issue requires a lot more discussion than I can attempt here.[4] Nonetheless, the remaining distance between PP and the enactivist may not be as great as that bald opposition suggests. We can begin by reminding ourselves that PP, although it openly trades in talk of inner models and representations, invokes representations that are action-oriented through and through. These are representations that are fundamentally in the business of serving up actions within the context of rolling sensorimotor cycles. Such representations aim to engage the world, rather than to depict it in some action-neutral fashion, and they are firmly rooted in the history of organism-environment interactions that served up the sensory stimulations that installed the probabilistic generative model. What is on offer is thus just about maximally distant from a passive (“mirror of nature” – see Rorty 1979) story about the possible fit between model and world. For the test of a good model is how well it enables the organism to engage the world in a rolling cycle of actions that maintain it within a window of viability. The better the engagements, the lower the information-theoretic free energy (this is intuitive, since more of the system’s resources are being put to “effective work” in engaging the world). Prediction error reports this information-theoretic free energy, which is mathematically constructed so as never to be less than “surprisal” (where this names the sub-personally computed implausibility of some sensory state given a model of the world – see Tribus 1961). Notice also that the prediction task uses only information clearly available to the organism, and is ultimately defined over the energies that impinge on the organism’s sensory surfaces. But finding the best ways to predict those energetic impacts can (as substantial bodies of work in machine learning amply demonstrate[5]) yield a structured grip upon a world of interacting causes.
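
Schematically (this is a standard way of making the bound explicit, using notation that does not appear in the main text: s for the current sensory state, h for its hidden causes, p for the generative model, and q for the system’s current estimate of those causes):

\[
F \;=\; \mathbb{E}_{q(h)}\big[\log q(h) - \log p(h, s)\big] \;=\; \underbrace{-\log p(s)}_{\text{surprisal}} \;+\; \underbrace{D_{\mathrm{KL}}\big[\,q(h)\,\|\,p(h \mid s)\,\big]}_{\ge\, 0}.
\]

Since the divergence term cannot be negative, free energy never falls below surprisal; driving down prediction error therefore drives down an upper bound on a quantity (surprisal) that the system could not evaluate directly.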

This notion of a structured grip is important. Early connectionist networks were famously challenged (Fodor & Pylyshyn 1988) by the need to deal with structure – they were unable to capture part-whole hierarchies, or complex nested structures in which larger wholes embed smaller components, each of which may itself be some kind of structured entity. For example, a city scene may consist of a street populated by shops and cars and people, each of which is also a structured whole in its own right. Classical approaches benefitted from an easy way of dealing with such issues. There, digital objects (symbol strings) could be composed of other symbols, and equipped with pointers to further bodies of information. This apparatus was (and remains) extremely biologically suspect, but it enabled nesting, sharing, and recombination on a grand scale – see Hinton (1990) for discussion. Such systems could easily capture structured (nested, often hierarchical) relationships in a manner that allowed for easy sharing and recombination of elements. But they proved brittle and inflexible in other ways, failing to display fluid context-sensitive responsiveness, and floundering when required to guide behavior in time-pressured real-world settings.[6]
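
As a toy illustration (not drawn from any particular classical system), nesting, sharing, and recombination come almost for free in a symbolic medium:

```python
# Toy illustration of classical compositional structure: symbols composed of
# other symbols, with sharing via references ("pointers") to sub-structures.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Part:
    kind: str
    parts: List["Part"] = field(default_factory=list)  # nested components

wheel = Part("wheel")
car = Part("car", parts=[wheel, wheel, wheel, wheel])         # one symbol, shared four times
person = Part("person", parts=[Part("head"), Part("torso")])
street = Part("street", parts=[Part("shop"), car, person])
city_scene = Part("city-scene", parts=[street])               # wholes embedding structured wholes

def depth(p: Part) -> int:
    """Depth of the part-whole hierarchy below a node."""
    return 1 + max((depth(c) for c in p.parts), default=0)

print(depth(city_scene))  # -> 4: scene > street > car/person > wheel/head/torso
```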

Connectionist research has since spawned a variety of methods – some more successful than others – for dealing with structure in various domains. At the same time, work in robotics and in embodied and situated cognitive science has explored the many ways in which structure in the environment (including the highly structured artificial environments of text and external symbol systems) could be exploited so as to reap some of the benefits associated with classical forms of inner encoding, without (it was hoped) the associated costs of biological implausibility – see, for example, Pfeifer & Bongard (2007). Perhaps the combination of a few technical patches and a much richer reliance upon the use of structured external resources would address the worries about dealing with structure? Such was the hope of many, myself included.

On this project, the jury is still out. But PP can embrace these insights and economies while providing a more powerful overall solution. For it offers a biologically plausible means, consistent (we saw) with as much reliance on external scaffolding as possible, of internally encoding and deploying richly structured bodies of information. This is because each PP level (perhaps these correspond to cortical columns – this is an open question) treats activity at the level below as if it were sensory data, and learns compressed methods to predict those unfolding patterns. This results in a very natural extraction of nested structure in the causes of the input signal, as different levels are progressively exposed to different re-codings, and re-re-codings of the original sensory information. These re-re-codings (I think of them as representational re-descriptions in much the sense of Karmiloff-Smith 1992) enable us, as agents, to lock onto worldly causes that are ever more recondite, capturing regularities visible only in patterns spread far in space and time. Patterns such as weather fronts, persons, elections, marriages, promises, and soccer games. Such patterns are the stuff of which human lives, and human mental lives, are made. What locks the agent on to these familiar patterns is, however, the whole multi-level processing device (sometimes, it is the whole machine in action). That machine works (if PP is correct) because each level is driven to try to find a compressed way to predict activity at the level below, all the way out to the sensory peripheries. These nested compressions, discovered and annealed in the furnace of action, are what I (following Hinton 1990) would like to call “internal representations”.
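
By way of illustration only, here is a toy, two-level, linear predictive-coding sketch; the dimensions, update rules, and learning rates are illustrative assumptions and carry no commitment about neural implementation. Each level settles into the hidden states that best predict the activity of the level below, and the residual errors then slowly reshape the weights (the generative model itself):

```python
import numpy as np

# Toy two-level predictive coding network (linear, for illustration only).
# Each level tries to predict the activity at the level below; the residual
# prediction errors drive both inference (fast state updates) and learning
# (slow weight updates).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 8))   # level 1 predicts the sensory data
W2 = rng.normal(scale=0.1, size=(8, 4))    # level 2 predicts level-1 activity
lr_state, lr_weight = 0.1, 0.01

def settle(x, n_steps=50):
    """Infer hidden states r1, r2 for one sensory input x by error minimisation."""
    r1, r2 = np.zeros(8), np.zeros(4)
    for _ in range(n_steps):
        e0 = x - W1 @ r1                     # error at the sensory level
        e1 = r1 - W2 @ r2                    # error at level 1 (level 2's "sensory data")
        r1 += lr_state * (W1.T @ e0 - e1)    # pushed up by e0, constrained from above by e1
        r2 += lr_state * (W2.T @ e1)
    return r1, r2, e0, e1

# Learning: after settling on each input, nudge the weights to predict better next time.
for _ in range(200):
    x = rng.normal(size=16)                  # stand-in for a sensory sample
    r1, r2, e0, e1 = settle(x)
    W1 += lr_weight * np.outer(e0, r1)
    W2 += lr_weight * np.outer(e1, r2)
```

Stacking further levels in exactly the same way, each treating the level below as its “sensory data”, is what yields the progressive re-codings (and re-re-codings) described above.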

What are the contents of the many states governed by the resulting structured, multi-level, action-oriented probabilistic generative models? The generative model issues predictions that estimate various identifiable worldly states (including states of the body, and the mental states of other agents).[7] But it is also necessary, as we saw in Clark (this collection), to estimate the context-variable reliability (precision) of the neural estimations themselves. It is these precision-weighted estimates that drive action, and it is action that then samples the scene, delivering percepts that select more actions. Such looping complexities exacerbate an important consequence that Madary nicely notes. They make it even harder (perhaps impossible) adequately to capture the contents or the cognitive roles of many key inner states and processes using the terms and vocabulary of ordinary daily speech. That vocabulary is “designed” for communication, and (perhaps) for various forms of cognitive self-stimulation (see Clark 2008). The probabilistic generative model, by contrast, is designed to engage the world in rolling, uncertainty-modulated, cycles of perception and action. Nonetheless, high-level states of the generative model will target large-scale, increasingly invariant patterns in space and time, corresponding to (and allowing us to keep track of) specific individuals, properties, and events despite large moment-by-moment variations in the stream of sensory stimulation. Unpacked via cascades of descending prediction, such higher-level states simultaneously inform both perception and action, locking them into continuous circular causal flows. Instead of simply describing “how the world is”, these models – even when considered at those “higher” more abstract levels – are geared to engaging those aspects of the world that matter to us. They are delivering a grip on the patterns that matter for the interactions that matter.
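
Schematically (again using illustrative notation rather than anything in the main text), the precision-weighting of the prediction error at a given level i can be written as:

\[
\tilde{\varepsilon}_i \;=\; \Pi_i \, \varepsilon_i, \qquad \Pi_i \;=\; \Sigma_i^{-1},
\]

where \(\varepsilon_i\) is the prediction error, \(\Sigma_i\) its estimated (context-variable) variance, and \(\Pi_i\) the corresponding precision. Errors estimated to be reliable are thereby amplified, and unreliable ones attenuated, before they are allowed to revise the model and to select action.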

Could we perhaps (especially given the likely difficulties in specifying intermediate-level contents in natural-language terms) have told our story in entirely non-representational terms, without invoking the concept of a hierarchical probabilistic generative model at all? One should always beware of sweeping assertions about what might, one day, be explanatorily possible! But as things stand, I simply don’t see how this is to be achieved. For it is surely that very model-invoking schema that allows us to understand how it is that these looping dynamical regimes arise and enable such spectacular results. The regimes arise and succeed because the system self-organizes around prediction-error so as to capture organism-salient patterns, at various scales of space and time, in the (partially self-created) input stream. These patterns specify complex, inter-animated structures of bodily and worldly causes. Subtract this guiding vision and what remains is just the picture of complex looping dynamics spanning brain, body, and world. Consider those same looping dynamics from the multi-level model-invoking explanatory perspective afforded by PP, however, and many things fall naturally into place. We see how statistically-driven learning can unearth interacting distal and bodily causes in the first place, revealing a structured world of human-sized opportunities for action; we see why, and exactly how, perception and action can be co-constructed and co-determining; and we unravel the precise (and happily un-mysterious) sense in which organisms may be said to bring forth their worlds.