3 The epistemic strategy for understanding others

3.1 What about simulation?

According to Goldman’s (2006) elaborate simulation account, we must distinguish between low-level and high-level mindreading. “Mindreading”, in his view, comprises all cases of evaluating the mental state(s) of another person that normally lead to a language-based attribution of a mental state to that person. High-level mindreading, in turn, is

[…] mindreading with one or more of the following features: (a) it targets mental states of a relatively complex nature, such as propositional attitudes; (b) some components of the mindreading process are subject to voluntary control; and (c) the process has some degree of accessibility to consciousness. (Goldman 2006, p. 147)

The paradigmatic case of high-level mindreading is understanding another person’s decision. Third-person attribution of a decision consists of:

  1. imagining propositional attitudes in the form of enactment imagination;

  2. using (the same) decision-making mechanisms (as in the first-person case);

  3. projecting the result of using that mechanism onto a third person by attributing a decision.

We can easily present cases in which these proposed essential steps are not involved. For (1): to understand a person suffering from a delusion of persecution, we are not able to deploy enactment imagination, because their experience is just too different from our own. The same may be true in cases of deep cultural difference. For (2): if my experience with the other person tells me that he has idiosyncratic, non-rational decision-making habits when making weekend plans, I can use this knowledge to model his decision instead of running my own decision-making apparatus, since I know from experience that my own apparatus differs from his (at least concerning weekend plans). For (3): grant for the sake of argument that I have a plausible candidate for the beliefs and desires of the other, and that I use it both for enactment imagination and as input for my own decision-making apparatus, thus reaching a decision to do action A. Then, according to Goldman, I should project this decision onto the other person. Yet there remains an essential gap, which Goldman notes but does not adequately address: he recognizes that I must “quarantine” my idiosyncratic background beliefs if I am to arrive at an adequate projection of the decision to do action A. Suppose I am warranted in presupposing that the other wants an ice-cream and has money, and that there is a nearby cafeteria where he can get one: then the decision-making apparatus may arrive at the decision to buy an ice-cream. If, however, I am extremely parsimonious with money, then my own background desire to save money may prevent me from buying the ice-cream in the same situation; this desire then intervenes, and I do not attribute the decision to buy an ice-cream to the other. But the desire to save money seems, often at least, to be an idiosyncratic desire that I should not use in my projection.
Yet how do I know which of my own beliefs and desires are idiosyncratic and do not apply to the person I aim to understand? To solve this problem, I must already possess some view about how the other’s attitudes compare to mine; yet this is precisely what we were aiming to understand. In general, then, Goldman’s theory of high-level mindreading has difficulty even getting off the ground: it starts by making presuppositions about the beliefs and desires of the other person, which is exactly what we were aiming to understand. The same problem reappears in the projection phase, as just illustrated. Thus, high-level mindreading is a very special case of simulating another’s decision, one that applies only when I already know a lot about the other that I can use as input. This leaves open the question of how we get this information in the first place. Goldman tries to account for problems of this kind by acknowledging the importance of inference-based strategies and of organizing prior information in the form of a theory. He is thus no longer developing a pure simulation theory but rather a hybrid account. Nevertheless, the counterexamples are not rare but in fact quite typical, and thus they cast doubt on how typical and pervasive high-level simulation is in mindreading decisions.

Goldman may, however, appeal to his account of low-level mindreading, which he characterizes as an activity that is “comparatively simple, primitive, automatic, and largely below the level of consciousness” (2006, p. 113). Goldman takes face-based recognition of emotion as a paradigmatic case, and he makes an additional appeal to “mirror neurons”, proposing that mirror neurons are relevant not only for understanding motor activities (in both observing and performing them) but also for recognizing mental phenomena like pain and disgust. The most elaborate case in this area concerns disgust: it has been shown that experiencing disgust and observing disgust depend on a shared set of mirror neurons that are activated in both cases (Wicker et al. 2003). Yet what exactly can we learn from this observation? I develop a critical position on the explanatory potential of mirror neurons in two steps. First, I argue that even if mirror neurons could provide the whole story of how we understand others, this story would not be a case of simulation. Second, I cite evidence that mirror neurons do not provide the core part of the story when it comes to understanding emotions. Let us start with the criticism of the claim that low-level mindreading is a case of simulation. Here I mainly rely on lines of criticism worked out by Gallagher (2007), who claims that “simulation is a personal-level concept that cannot be legitimately applied to subpersonal processes” (p. 363).
Even if we do not accept Gallagher’s claim, the two core features of simulation would be lacking in the case of resonance processes implemented by mirror neurons: there is neither a first-person perspective involved nor a type of pretence that includes a projection from a first-person to a third-person perspective. As Gallagher puts it, “Thus, according to ST, simulation involves the instrumental use of a first-person model to form a third-person ‘as if’ or a ‘pretend’ mental state. For subpersonal processes, however, both of these characterizations fail” (Gallagher 2007, p. 360). Why are mirror neurons not an essential part of understanding others? They represent a type of action or emotion independently of a first- or third-person perspective, but the distinction between self and other is an essential part of understanding others. Thus the essential aspects of a simulation process cannot be fully captured by mirror-neuron processes (see Vogeley & Newen 2002).

This criticism of high-level and low-level mindreading does not imply that simulation processes never take place. Rather, it suggests that only so-called high-level simulation can properly be characterized as simulation, and that it is implausible that simulation is the standard strategy for everyday understanding of others. The latter claim is also based on the observation that we often rely on an automatic, intuitive understanding of others without any conscious deliberation.

3.2 What about theory-based inferences?

The same general line of criticism can be developed with respect to theory-based inferences. Such inferences may sometimes be relevant, but not always; nor are they the standard strategy for understanding others. Theory-based inferences are important when we are confronted with cases that we find strange or surprising, e.g., situations where we meet another person suffering from a mental disease about which we know nothing, or where the person belongs to a culture radically different from ours. In such scenarios, we consciously build hypotheses about the relevant mental phenomena, as well as about the best behavioural strategy to adopt. But most everyday scenarios in which we understand others are not of this type; quite the contrary, we are generally involved in well-known situations with individuals or types of persons with whom we are familiar. We effortlessly apply our know-how in dealing with other humans, without any need to reason through theory-based inferences. The advocate of the theory theory (TT) would reply: even if the relevant know-how does not involve an explicit theory-based inference, it is only applicable because we rely on implicit theory-based inferences. My criticism of this line of thought is twofold. First, the status of implicit inferences is very unclear, because inferences are defined as relations between propositions. Second, there is evidence that implicit information processes are often non-propositional in nature. In the case of experts, for example, the epistemic strategy in their field is very often complex visual pattern-matching without any inferences: thanks to their superior organization of knowledge, a chess expert can rapidly perceive a promising move, and a medical expert can quickly notice an inconsistency in a suggested diagnosis.
The smooth use of this information relies mainly on fine-grained pattern-discrimination and pattern-matching in the relevant situation (Gobet 1997), rather than on drawing inferences (which only come into play when the expert has to consider problematic situations). This is supported by observations of the way people recall chess positions: when shown a chess board containing a real, meaningful arrangement, chess experts far outperform novices in recalling the positions, but their advantage largely disappears for scrambled, random positions (Gobet & Simon 1996). This indicates that they are able to “see” meaningful patterns that a novice cannot see. They may use this ability in addition to making inferences, but inference is a supplementary strategy rather than their basic strategy of access.[2] If neither simulation nor theory-based inference is the standard strategy underlying our smooth, everyday understanding of others, what form does epistemic access to others’ mental states take?

3.3 What about direct perception?

In recent years Gallagher (2008) has argued that our epistemic access to others’ mental phenomena is essentially based on direct perception. The mental states of others are not hidden and need not be inferred from perceived behaviour; rather, behaviour is an expression of mental phenomena, which are seen directly in seeing the behaviour. What does the claim of direct perception involve? Gallagher explains his main idea with an analogy: I can directly see my car. It would be inadequate to claim that I only directly see the colour, the shape, and the material, and then have to infer that it is my car. This is also supported by the fact that, in seeing the car, I simultaneously see its drivability. This view does not deny that object perception involves complex and partially hierarchically organized brain processes, but it introduces the notion of “smart” perception: if I have learned the concept CAR and am used to driving cars, I can see a car directly, and in seeing my car I may also see concomitant affordances such as its drivability. The same is true in the case of understanding others: according to Gallagher, by seeing a person’s face and body posture in a specific situation, I can directly see that the person fears an aggressive dog. This can be realized by visual pattern-matching without inferences (see footnote 3 and Newen et al. forthcoming). This is a convincing comparison, especially given its potential to give a unified account of both basic perception and what Gallagher calls “smart” perception. Smart perception covers cases in which it appears plausible that perception is modulated by conceptual information; these are usually described as cases of cognitive penetration (see Macpherson 2012; Vetter & Newen 2014).

Let us illustrate both the basic and the smart perception of an emotion. Basic perception of an emotion takes place when we see fear, joy, anger, or sadness in a person’s face while relying mainly on a single feature, or a small group of features, of the facial expression (Ekman et al. 1972).[3] This can be done through a bottom-up perceptual process with almost no top-down influence, especially if the facial expression is very characteristic of an emotion pattern. In the case of smart perception, the perception of the emotion is modulated by higher-order cognitive processes. To show this, we need a case in which the same facial input leads to a different perception of an emotion as a result of conceptual input. Such cases have indeed been found. Suppose I am told that a person made a reservation at a restaurant, waited for an hour while many people who had come in later were served first, and after a further hour was informed that she would have to wait at least another hour. A story describing such a clearly unjust situation gives me a strong expectation of seeing anger, and this expectation has been shown to make us see a typical “Ekman” fearful face as an angry face (Carroll & Russell 1996). Smart perception of an emotion is a cognitively penetrated perception of an emotion, and it is also important for seeing more complex emotions that lack typical Ekman facial expressions. Suppose I know that John is jealous of Peter, because John told me so, and I have seen several episodes of Peter behaving intimately towards John’s wife Anne. If the next day I see another episode of Peter flirting with Anne while John observes them, I can directly see the jealousy in John’s face. There is no need for inference-based evaluations.
This parallels Gallagher’s case of seeing one’s car: we may describe both as cases of “seeing as”, namely seeing my car as a car (by knowing which affordances come with it) and seeing John’s face as evincing jealousy. I have illustrated these cases of direct perception because I think Gallagher makes an important point when he claims that the main source of understanding others is direct perception (whether basic or smart). Nevertheless, there are clear limits to direct perception as a form of epistemic access.

Although Gallagher has in the past tended to overgeneralize the role of direct perception (Gallagher 2008), he is well aware that some cases cannot be accounted for without going beyond direct perception. This is especially so for our understanding of propositional attitudes, e.g., someone’s desire to take a summer holiday with his elder brother in western Turkey. Propositional attitudes are normally radically underdetermined by expressive elements such as facial expressions, gestures, and body postures in a given situation. In general, therefore, complex human cognitive phenomena of this underdetermined type are communicated by linguistic exchange, or else have to be inferred or simply guessed on the basis of the available information. The latter often happens when communication is nontransparent, either because of norms governing the social situation or because at least one person wants to hide her beliefs and intentions. Since these situations are also part of our everyday life, inferential processes remain part of our everyday understanding of others. Thus, although direct perception is a very important epistemic strategy, which we may use in face-based perception of emotion, even “smart” direct perception is not the basic strategy for understanding the complex beliefs, desires, and intentions of others; these require inferential processes as well. We are thus left with three strategies (simulation, theory-based inferences, direct perception), none of which is clearly the dominant standard strategy for all mental phenomena.

But there is at least one further candidate we should take into account, namely understanding through primary interaction (Gallagher & Hutto 2008). All the epistemic strategies discussed so far can apply to situations in which I am simply observing the other without being involved in any interaction. As already mentioned, Gallagher views this as a radical defect of such accounts: intuitive understanding of others is part of our everyday life, especially when I am not in a purely observational situation but am directly involved in some kind of interaction. Intuitive understanding may then be characterized simply by the fact that I notice a social act directed towards me and so start to interact, such that a standard social interaction is realized. This interaction may be non-linguistic, but it may also involve linguistic communication, e.g., friendly greetings exchanged while lining up at the office coffee machine. Such a strategy of understanding can only be dominant if the interaction is embedded in many conventions, so that smooth understanding can take place without theoretical considerations about the others’ beliefs and intentions (de Bruin et al. 2012). But is understanding through primary interaction, which already takes place in neonate imitation (Meltzoff & Moore 1977, 1994), really the main or standard strategy for understanding others? Again, even if we grant that it is an important strategy in basic understanding of others, even in adults (e.g., the minimal understanding deployed when smoothly interacting with a stranger taking the same bus), we need more advanced strategies to estimate the wider ramifications of the situation, e.g., whether taking this bus in an unknown city, at night, and with such people on board is a reasonable risk.

3.4 The multiplicity view

To summarize so far: we use at least four epistemic strategies to understand others, and we learn to use these strategies on the basis of their successful application in relevantly similar situations in the past. We prefer simulation strategies where we have evidence that the other is similar to us in many features relevant to the situation being evaluated. We typically use theory-based inferences if we need to account for complex mental phenomena or if an intuitive understanding is, for whatever reason, not available. We use understanding by primary interaction when we are interacting with the other and only need to understand her or him to a limited degree, such that acting according to conventions is sufficient for a smooth interaction. Finally, we normally rely on direct perception of mental phenomena when we take an observational stance towards the other and have a rich, well-organized body of experience that allows us to recognize mental phenomena as patterns. This is rather easy in emotion recognition, more complex in recognizing intentions, and almost impossible in understanding the complex propositional attitudes of others. Only the combination of all four strategies, applied with full sensitivity to the context and on the basis of our experience in using them successfully, makes us experts in understanding others. We have thus reached a first main conclusion concerning strategies of understanding, which I call the multiplicity view:

The multiplicity view =Df There is no standard default strategy of understanding others; rather, in everyday cases of understanding others we rely on a multiplicity of strategies that we vary depending on the context and on our prior experiences (and which may also be triggered by explicit training).[4]

This thesis is also supported by a closer look at mental disorders such as Asperger’s syndrome, a variant of autism (Fiebich & Coltheart under review). People with Asperger’s syndrome lack an intuitive understanding of others. They are unable to directly perceive emotions on the basis of facial expressions, and they tend to avoid social interaction (Vogeley 2012). Thus intuitive understanding by primary interaction or direct perception is not available to them. Since they also tend to experience themselves as being different from others (Vogeley 2012), they do not use simulation as a strategy, so they are left principally with theory-based inferences (Kuzmanovic et al. 2011). And this is what we observe: autistic persons try to understand others by asking for theoretical guidance; for example, they might ask how long one is allowed to look into another person’s eyes (Kai Vogeley, personal communication; his expertise is based on regularly treating more than 300 patients). They also learn what people think in typical situations, but become lost in new situations. Since we have to deal with new situations almost every day, autistic people notice their tendency to get lost, and many of them avoid social encounters. This situation is explained by the fact that, in contrast to the usual multiplicity of strategies of understanding, they are left with theory-based inferences alone. People with Down’s syndrome are in the opposite situation: they have a good intuitive understanding of others’ emotions, but, owing to their typically very constrained cognitive abilities, they lack the capacity for theory-based inferences. In early childhood, when cognitive skills are less important than they later become in kindergarten or school, their social life is very similar to that of children without Down’s syndrome; but in later life, the interdependence of social interaction and cognitive abilities makes it harder for them to build an inclusive social life (Buckley et al. 2002).
Thus, the normal multiplicity of strategies may be strongly constrained in some mental disorders. Furthermore, we can roughly group direct perception and interaction together as the main forms of epistemic access for an intuitive understanding of others, while inference-based understanding relies mainly on either a (high-level) simulation strategy or theory-based inferences (including inferences from narratives, see below). Since most of our everyday understanding of others is intuitive, it is especially important to highlight the relevance of social perception. In what follows, I will argue that the most important unit for clustering information about others is neither a facial unit nor an emotion type (nor some other subpersonal unit), but the whole person, and thus that a primary aspect of epistemic access is our ability to perceive persons. We perceive persons and their mental settings mainly by directly perceiving them and/or interacting with them. In addition, we can come to judgments about persons by simulating them and/or through inference-based understanding.