3 Multiplicity needs coherence

While this surely is an attractive way to describe social understanding, and does justice to its oft-proclaimed manifoldness, these mechanisms have been described in several theoretical frameworks that operate under different (and partly contradictory) metaphysical background assumptions.[3] Thus, a simple combination of them does not come easily. Simulation and theory-based inference have been described within a computationalist, cognitivist framework which often assumes that the mind is mainly a representational and internal device (Bruin & Kästner 2012), i.e., a functional structure locally realized in the brains of individual organisms. Bodily and environmental structures play at most an enabling or causal role for a specific internal mechanism. In contrast, DP and primary interaction, both of which are concepts stemming from the phenomenological tradition, have their roots in an enactive account of cognition (cf. Gallagher 2008, p. 537), thus rejecting basic metaphysical assumptions of cognitivism (e.g., representationalism, reductionism, mechanistic explanations; Rowlands 2009).[4] The theoretical background of DP and primary interaction views the mind as a non-representational, relational device which emerges within the skillful interaction between organism and environment:

The enactive interpretation is not simply a reinterpretation of what happens extra-neurally, out in the intersubjective world of action where we anticipate and respond to social affordances. More than this, it suggests a different way of conceiving brain function, specifically in nonrepresentational, integrative and dynamical terms. (Gallagher et al. 2013, p. 422)

More specifically, enactive and phenomenological approaches to social cognition not only see the body as part of cognitive processing but also assign a very important status to interaction. While enactive theories portray interaction as (at least possibly) constituting social cognitive processes (De Jaegher & Di Paolo 2007, p. 493), traditional mindreading theories have not even considered interaction to be an element which could influence social cognition (cf. Fuchs & De Jaegher 2009, p. 466).

There are several reasons why ST and TT have been spelled out within a more cognitivist set of assumptions, while DP and primary interaction have been described in reference to an enactive framework. Although their roots in the history of ideas play an important role, there are deeper systematic reasons why it makes sense to couch them in different sets of metaphysical assumptions. To see this, consider the relation between the external world and internal processing in either framework. A rather cognitivist view assumes that the task of the brain is to figure out the outside world and that this world is internally represented.[5] Since other people belong to this world outside of one’s own mind, it follows that the causes for their behavior need to be inferred by internal representation processing as well. Because it is assumed that the brain is the only mental organ (Hohwy this collection), the location of (social) cognitive processing thus can be said to be inside one individual’s head. Simulation and theorizing fit neatly into this picture of the mind; they are inference processes which function to disambiguate social input and are implemented by specific neural mechanisms. By contrast, an enactive view of social cognition, as described by De Jaegher and colleagues and advocated by Gallagher, rests on two different assumptions. First, in order to assume that interaction dynamics carry as much of the “cognitive load” of understanding other minds as is proposed, a relational view of the mind is required. It is important to understand that an enactive view is not the same as an externalist view, which could be compatible with assumptions of the cognitivist camp (cf. Rowlands 2009, p. 54). The mind is, according to such an enactive perspective, neither internal nor external; it constitutes itself within the relation (hence relational) between an embodied agent and its environment (cf. Di Paolo & Thompson 2014, p. 68; Engel et al. 2013, p. 202). Such a view enables the claim that interactions are examples of this unfolding mental process and thus constitute social cognition. This claim is incompatible with an internalist perspective, which does not ascribe any constitutive power to mind-external properties.

Furthermore, if the external world and the minds of others could be directly perceived without further mental processing or inference, neither simulation nor theoretical inference would be needed. This is exactly the point of the non-cognitivist camp, as becomes obvious in this quote by Newen: “The mental states of others are not hidden, and need not to be inferred on the basis of perceiving the behavior; rather, behavior is an expression of the mental phenomena that, in seeing the behavior, is also directly seen” (this collection, p. 5). What does it mean for something to be directly seen? Gibson (1979) introduced DP in relation to his famous conception of “affordances”: “The affordances of things for an observer are specified in stimulus information. They seem to be perceived directly because they are perceived directly” (Gibson 1977, p. 79). Importantly, the direct perception of affordances is possible because, according to Gibson, affordances are physically real (i.e., they exist independently of the perceiving subject) and as such are perceivable properties of objects in the environment (cf. 1979, p. 129). Note how this is crucially different from a view which assumes that object properties need to be mentally represented, thus requiring an intermediary step.[6] However, Gallagher makes explicit in a footnote (cf. 2008, p. 537) that his conception of DP is not to be entirely equated with a Gibsonian notion of the term. Gallagher emphasizes that he does not deny the underlying complexity of perceptual processing; rather, he counts those processes as belonging to perception. He thus puts forth the conception of “smart perception”:

But this informing process is already built into the perceptual process so that as I consciously perceive, my perception is already informed by the relevant sub-personal processing. I don’t first perceive and then add memory in order to recognize my car. My perception, in this sense, is direct even if the sub-personal sensory processing that underpins it follows a complex and dynamic route. (ibid., p. 537)

Even with that kind of definition, his view still presupposes that there are properties of external objects that can be “directly” picked up and that exist independently of the perceiving subject. As such, it is indeed reminiscent of a Gibsonian conception. The difference between cognitivist and non-cognitivist pictures of social cognition, in the cases that I just described, seems to boil down to the metaphysical assumption of whether or not there are hidden causes in the outside world that require an inferential or representational mechanism in order to access and process them. While ST and TT clearly assume that there are such hidden causes, DP denies it. Therefore, I claim that MV cannot simply combine theoretical elements that draw on such considerable metaphysical differences.

Another important difference between these theoretical approaches is how each treats the issue of phenomenology. While the experiential nature of social encounters plays at most a minor role in mindreading theories, such as ST and TT, the phenomenal level is of paramount importance for the enactive camp, who advocate for DP. This becomes most obvious in the claim that the experienced smoothness and immediacy of social interactions tell us something about the epistemic access to other minds. However, “directness” as a concept in academic research is relative to a specific level of description. Let me explain this in more detail. Consider Gallagher’s argument that smart perception is a subpersonally informed mechanism (cf. ibid., p. 537) that directly enables an individual to perceive the minds of others without “additional mental effort.” It is based largely on the rapid activation of mirror neurons (30–100 ms, ibid., p. 541), so that, he claims, a distinction between a merely perceptual process and an additional mental process does not make sense. In his words:

A distinction at the neural level between activation of the visual cortex and activation of the pre-motor cortex does not mean that this constitutes a distinction between processes that are purely perceptual and processes that involve something more than perception. (ibid., p. 541)

The question that follows is how one should individuate mental mechanisms, and I suggest that functional properties are much more substantial and conceptually relevant individuation criteria than temporal properties. It is, to me, highly questionable whether temporal correlation justifies assuming that there is mechanistic inseparability. The functional role of a mental mechanism seems a much less arbitrary criterion. Furthermore, it enables a more fine-grained view of the subpersonal processes that underlie social cognition. Instead of talking about perception (which could include all processes, provided they are activated within a more or less specific amount of time), it is possible to take a closer look at which brain region correlates with which mechanism. If mechanisms are individuated by their functional role instead of the temporal properties of the physical realizers of this functional role, it makes sense to assume that the visual system and the mirror neuron system are distinct. If they are, however, it is no longer tenable to speak of “smart perception”. This concept presupposes that perceptual and post-perceptual processes can coherently be described as one mechanism, which I reject. Additionally, the concept of “direct perception” no longer applies either, since mirror mechanisms should be seen as a functionally distinct and therefore intermediary step in the process of understanding others. I thus conclude that DP, as described by Gallagher, does not coherently apply to the subpersonal level of description.

This relates to my main point, namely that there are different levels of description at which a phenomenon can be scrutinized. At the phenomenological level, DP can be described as the experience of directly and immediately perceiving the other person’s mental states. I walk into my living room, I see my friend’s face and I experience myself as instantaneously knowing that she is really upset. However, this experiential quality of directness is brought forth by a subpersonal process, which is indirect, as I have argued above. At any other level of description, therefore, directness does not apply. In this view, DP is a phenomenal quality of some mental states and should thus not be confused with the epistemic mechanism itself. The simultaneity in our everyday experience does not justify anything on other levels of description. I therefore argue that DP should be treated as a phenomenal quality of some social encounters instead of assigning it the status of an epistemic strategy to access other minds.

Note that Newen does not explicitly support a phenomenological or enactive view of the mind, nor does he make any claims about the metaphysics of social cognition. What he does do, however, is emphasize Gallagher’s conception of DP and primary interaction as being the main sources of epistemic access to other minds (cf. Newen this collection, p. 8). If Newen were to reject the strong claims of a non-representational view of (social) cognition, however, it is questionable how closely his notions of DP and primary interaction, as core concepts of his theory, actually relate to their original formulations. This leaves us with two options. The first is to assume that Newen fully endorses the views of his oft-cited colleague. In this case, the problem of compatibility becomes obvious. The second, and more likely, possibility is that the author does not support DP and primary interaction with all their metaphysical implications. It indeed seems that he rather re-formulates both concepts so that they can fit into a representational framework. According to Newen (ibid., p. 5), DP is realized by a process of pattern recognition, and primary interaction (although Newen explicitly cites Gallagher & Hutto 2008) is characterized as follows: “[…] I notice a social act being directed towards me and so start to interact, such that a standard interaction is realized, which may be nonlinguistic but may also involve linguistic communication […]” (Newen this collection, p. 7). What is problematic here is that one of the most interesting and valuable features of MV gets lost, namely its potential to fulfill the demands of the interactive turn. A true fulfillment would require widening the theoretical scope of social cognition by going beyond the study of individual brains and considering bodily, interactive and phenomenological processes more carefully.

What needs to be reconciled and made conceptually consistent is thus our choice of a specific, unified methodological framework (our overarching theoretical approach to simulation, theory-based inference, DP and primary interaction), since they all describe important aspects of social understanding. It should be a common aim to work with a coherent set of metaphysical assumptions, since which set of background assumptions one adopts has important implications for both theoretical and empirical research. For one thing, that decision influences our choice of the unit of analysis, i.e., how we frame the explanatory unit for empirical research. For a long time, this unit has been one individual observing another. It has been claimed, however, that this does not properly reflect the real nature of social cognition, and thus a shift is needed:

The explanatory unit of social interaction is not the brain, or even two (or more) brains, but a dynamic relation between organisms, which include brains, but also their own structural features that enable specific perception-action loops involving social and physical environments, which in turn affect statistical regularities that shape the structure of the nervous system […]. (Gallagher et al. 2013, p. 422)

When an enactive or phenomenological perspective is adopted and the status of interaction as constituting social cognition is accepted, this adds a level of analysis (i.e., an “interactionist stance”; De Jaegher et al. 2010) while erasing one that is profound and fundamental for most researchers: representation. Furthermore, the shared goal of paying more attention to the body, interaction and phenomenology comes with many methodological challenges. For all these reasons it should be in the common interest of the research field to find a way to ease the tensions.

As I have shown, Newen tries to combine four elements that might not be entirely compatible. However, the core of his idea is highly valuable, and certainly should not be rejected. What his pluralistic account of social cognition claims is that there are low-level social mechanisms that mainly rely on interaction and do not need complex or explicit thought, while higher-level, sophisticated mechanisms play an equally important role in the phenomenon. While some social situations require processes that allow complex thinking, other contexts can be intuitively disambiguated. In what follows, I will sketch an alternative framework, based on Metzinger’s theory of three-level embodiment, which I claim is able to integrate the four elements while operating on coherent background assumptions. Additionally, it has the potential to fulfill the demands of the interactive turn by paying more attention to interactive contexts, the role of the body and the importance of phenomenology.