4 1-3E – First-order embodiment, second-order embodiment, third-order embodiment

Before I describe how the 1-3E framework itself can be exploited for a pluralistic picture of social cognition, let me describe the framework in more detail. Metzinger’s goal is to provide a framework that shows how the experience of being a self is generated within an embodied system (cf. Metzinger 2014, p. 272). The basic assumption is that experiential phenomena (such as phenomenal selfhood) can be described at several different levels: they have a specific phenomenal quality (i.e., the phenomenological level of description), which is brought forth by underlying computations and representations (i.e., the computational/representational level of description). These, in turn, are implemented by their physical counterparts (i.e., the implementational level of description). 1-3E is a theory about the grounding relations between these levels, that is, the relations holding between the phenomenal properties of representational states and their physical and computational resources. In a broader context, Metzinger claims that “the self” is not a thing or an entity (Metzinger 2004), but rather the phenomenal product of a complex computational process that happens to take place in embodied systems. If that is the case, however, the following question arises: How exactly is the experience of being a self generated within an embodied system? In other words, what are the grounding relations of phenomenal selfhood?[7]

Metzinger introduces three levels: first-order embodiment, second-order embodiment and third-order embodiment (Metzinger 2006, 2014). Importantly, these concepts not only describe different levels of embodiment and their relations within one system; they also refer to different classes of systems that possess different kinds of embodiment. To see this, think of the following three systems, all of which possess a body and some sort of skillful behavior: a worm, an advanced robot (e.g., the “starfish”, see Metzinger 2007), and a human in a waking state. The worm, it is safe to say, directly exploits its physical (i.e., bodily) resources in order to navigate its environment. It is highly unlikely, however, that one would find any rule-based computation over an explicit, symbol-like representational structure in the worm’s nervous system. In Metzinger’s terms, this kind of system possesses first-order embodiment (a 1E system). In contrast to this rather rudimentary kind of embodiment, 2E systems (i.e., systems that possess second-order embodiment) do unconsciously represent themselves as embodied. This means that they have some kind of body model that the system can exploit in several ways (e.g., as a functional tool for motor control) and that sustains skillful interaction with the environment. Crucially, 2E enables counterfactual representation, i.e., the ability to represent possible states without actually executing them. The body model thus functionally underlies both physical and virtual behavior (see Cruse & Schilling, this collection). What 2E systems lack, however, is a phenomenal representation of themselves as embodied systems. While a robot like the starfish can use its unconscious body representation to steer its movements, it does not experience itself as doing so. Only systems that possess third-order embodiment (3E systems) experience this phenomenal quality of being an agent that owns a body.
Humans in non-pathological waking states, for example, possess this kind of embodiment. Along with the ability to use their body model in the same way as 2E systems do, they have the additional sense of owning and controlling this model (cf. Metzinger 2014, pp. 274–275). Interestingly, it is also here that we once again find the phenomenology of “directness” and “immediacy”. It is important to note that 2E and 3E systems always possess the lower levels of embodiment as well, since the levels build on each other and higher levels presuppose the existence of lower ones. In this way, 1-3E can be seen as a grounding theory. To briefly summarize: systems that phenomenally represent themselves as embodied agents possess 3E. The phenomenal properties of states described at this level are computationally grounded in a unified representation of the body, i.e., in second-order embodiment. This unconscious body model, in turn, is grounded in physical and bodily resources, which are described at the lowest level of the hierarchy.[8]

Metzinger is clear about the relation between 2E and 3E: the representational content of 2E is “elevated to the level of global availability and integrated with a single spatial situation model plus a virtual window of presence” (ibid., p. 274). What remains relatively vague in his theory, however, is the relation between 1E and 2E. The problem I see here is that Metzinger does not explicitly describe what actually grounds 2E and what role bodily structures play besides that of yielding a grounding relation.[9] A 1E system is defined as a “purely physical, reactive system” that adapts to its environment by exploiting its physical resources. This is not, in my view, what is being represented by a 2E system, which represents itself “as an embodied agent” (ibid., p. 273). What is needed is a more detailed and specific description of 1E and its relation to 2E. The discussion of 1E in my own proposal is therefore twofold. First, I analyze the low-level mechanisms that can be described at this level, claiming that they enable basic social skills (e.g., coupling). Second, I describe which neural, bodily and perhaps even extra-bodily structures most likely underlie social processes located at the level of 2sE.

There is one important aspect of 3E that I wish to describe in more detail, as it will be crucial for my theory. Metzinger distinguishes two kinds of phenomenal properties instantiated by conscious representational states: they can be either transparent or opaque. Notice that he uses these terms in a rather counterintuitive way, which I will try to make sense of in what follows.[10] An analogy that may help here is the difference between a freshly cleaned window and a dirty one. In the first case, when the glass is transparent, we can see everything behind it without perceiving the glass as a medium we are looking through. If the glass is dirty and opaque, however, we will not only have trouble seeing the things behind it; we will also perceive the window itself as something we are looking through.[11] Analogously, consider mental states (and their processing stages) as either transparent or opaque. A mental state is opaque when it is experienced as a representational state. A straightforward example is explicit thought, where an individual is consciously aware of the fact that she is thinking. In this case, the process of representation is itself represented as such, and the state is therefore opaque. In contrast, if a state is transparent, its earlier processing stages are not phenomenally represented; they are not part of the individual’s experience. In the case of phenomenal selfhood, for example, all that is experienced is the sense of being a self in a world. The fact that this experience results from a representational process is not part of its phenomenal content. Note that the distinction between the phenomenal properties of epistemic mechanisms (such as computations and representations) and the epistemic mechanisms themselves is central to the concept of transparency.
If we do not experience a specific phenomenal state as being generated subpersonally, i.e., if the underlying processes are not elevated to the level of experience, then all we experience is the subjective, phenomenological profile of that state. Such a claim is only valid, however, if we assume that these two levels are actually distinct, an assumption that some philosophers in the phenomenological tradition seem to deny.

In what follows, I will modify parts of the 1-3E framework in order to make it suitable for a pluralistic view of social understanding. The basic scaffold of the theory is retained, since its hierarchical structure is helpful for describing a multifaceted phenomenon like social cognition. It also offers future research the possibility of pairing 1-3E and 1-3sE with other hierarchical theories of cognition, such as the predictive processing framework (PP; Clark 2013b; Hohwy 2013). PP has not only been described as a very promising theory for unifying perception, action and cognition (Clark 2013b); it has also been fruitfully applied to social cognition (Kilner et al. 2007). 1-3sE has the potential to integrate this explanatorily powerful approach, but the details must be left to future research and cannot be pursued in this commentary. I furthermore adopt the idea that different levels of embodiment represent different levels of sophistication and complexity in a system. In order to strengthen this idea and to give an even more differentiated view of social understanding, I aim to make the difference between transparent and opaque social states more explicit. While the general distinction between transparency and opacity is retained, I will modify this aspect in order to make it fruitful for social understanding. To do so, I introduce the concept of “3sE+”, which describes experiences in social situations that require explicit, conscious thought.

According to Metzinger, transparency furthermore makes it possible to distinguish one’s own body from those of others (cf. Metzinger 2014, p. 274). I wish to raise an objection on this point, however. I claim that a self-other distinction that functionally serves to identify one’s own body in contrast to those of others is already present at the level of 2sE and can thus be achieved without phenomenally representing one’s body. I will argue for this claim in more detail in the next section.

Additionally, my proposal offers novel ways to enrich Metzinger’s original account. He claims that the functional structure of the body model opens a window into social cognition (cf. ibid., p. 273). I suggest, however, that this could be a bidirectional relation. There are hints in the literature that being immersed in a social environment is crucial and formative for more general cognitive skills and their development. For example, anecdotal evidence suggests that emotional neglect by caregivers severely impairs the physical and mental development of children (Zimmer 1989). Empirical research furthermore shows that the presence of others, interaction with them, perceiving them and engaging with them emotionally all shape self-related body representations (e.g., Furlanetto et al. 2013; Schilbach et al. 2013). Longo & Tsakiris (2013) thus conclude that this line of research suggests a strong connection between first-person and so-called second-person (Schilbach et al. 2013) processes, which needs to be considered by researchers in both camps: “Such findings support a model of first-person perspective according to which our sense of self is plastically affected by multisensory information as it becomes available during self-other interactions” (Longo & Tsakiris 2013, p. 430). I thus conclude that we should consider not only how the development of a self-model influences social cognition, but also what role social processes play in forming such a self-model. This opens up new and interesting questions for research on both social cognition and the self. One could ask, for example, whether some social cognitive skills are necessary for the development of a stable self-model, or whether there are “genuinely social” parts of the self-model.