6 Teleosemantics and the puzzles of early human social cognition

6.1 Millikan’s developmental puzzle

To further undermine the mindreading thesis, Millikan (1984, 2000, 2004, 2005) has also appealed to findings from the developmental psychological investigation of early human social cognition, showing that “children younger than about four, although fairly proficient in the use of language, don’t yet have concepts of such things as beliefs, desires, and intentions” (Millikan 2005, p. 204). If such children do not have such concepts, then, unlike adults, they cannot reflectively engage in tasks of mindreading, i.e., in tracking the contents of others’ intentions, beliefs, and desires. To the extent that they can engage in verbal understanding, this further shows that verbal understanding cannot rest on mindreading (or belief–desire psychology).

As Millikan emphasizes, much developmental evidence shows that until they are at least four years old, the majority of human children systematically fail elicited-response false-belief tasks. (In the terminology of developmental psychologists Baillargeon et al. 2010, elicited-response tasks are tasks in which a participant is requested to generate an explicit answer in response to an explicit question.) For example, in the Sally-Anne test, after Sally places her toy in the basket, she leaves. While Sally is away, Anne moves Sally’s toy from the basket to the box. When Sally returns, participants, who know the toy’s actual location, are explicitly asked to predict where Sally (who falsely believes her toy to be in the basket) will look for her toy. The evidence shows that the majority of three-year-olds, “although quite proficient in the use of language” (in Millikan’s terms, Millikan 2005, p. 204), typically point to the box (i.e., the toy’s actual location), not to the basket where the agent falsely believes her toy to be (cf. Wimmer & Perner 1983, Baron-Cohen et al. 1985 and Wellman et al. 2001 for a meta-analysis).

Millikan assumes that the failure of most three-year-olds in such elicited-response false-belief tasks demonstrates that they lack what she calls a “representational theory of mind”. In a nutshell, she assumes that success at elicited-response false-belief tasks is a necessary condition for crediting an individual with a representational theory of mind (i.e., the ability to track the contents of others’ false beliefs). Acceptance of this assumption gives rise to Millikan’s developmental puzzle, which is “to understand how very young children can be aware of the intentions and of the focus of attention of those from whom they learn language without yet having this sort of sophisticated theory of mind” (Millikan 2005, p. 205). Before explaining why Millikan’s assumption is contentious, I shall briefly examine Millikan’s solution to her own puzzle.

Millikan’s solution involves three related ingredients, the most important of which is her thesis that normal verbal understanding is an extended form of perception (which does not require thinking about a speaker’s intention at all). Second, she argues that young children can understand the goal-directedness of a speaker’s communicative action without tracking the content of her communicative intention. Third, she argues that young children can understand the referential focus of a speaker’s attention without having a sophisticated theory of mind. As I understand it, much of the argument for the possibility of understanding the referential focus of a speaker’s attention without having a sophisticated theory of mind rests on the thesis that verbal understanding is an extended form of perception. As I have already expressed doubts about the thesis that verbal understanding is an extended form of perception, I shall now briefly examine the second thesis: that young children could understand the goal-directedness of a speaker’s communicative action without tracking the content of her communicative intention.

Millikan (2005, pp. 206–207) offers two main reasons for granting young children the ability to recognize the goal-directedness of a speaker’s communicative action without granting them a full representational theory of mind. First, she argues that the evidence shows that mammals (dogs and cats and non-human primates, presumably, as well) lack a representational theory of mind but have the ability to recognize the goal-directedness of each other’s behavior. So by parity, very young children should also be granted the ability to recognize the goal-directedness of others’ actions, including speakers’ communicative actions. Second, she argues that communicative actions are cooperative actions. When young children are engaged in some cooperative action (including a communicative action) with a caretaker, they can easily keep track of the shared goal of the cooperative action, while tracking the focus of the speaker’s visual attention, without having a full representational theory of mind.

On the one hand, there is evidence that non-human primates recognize the goals of conspecifics engaged in the execution of instrumental actions (Call & Tomasello 2008). On the other hand, there is also evidence that non-human primates—and birds as well—can discriminate knowledgeable agents (who know about, e.g., food from visual perception) from ignorant agents (who don’t know about food because their line of vision is obstructed) in competitive situations (Bugnyar 2011; Call & Tomasello 2008; Dally et al. 2006; Hare et al. 2001; Tomasello et al. 2003). But the question raised by Millikan’s puzzle is to understand what enables very young human children to make sense jointly of a speaker’s goal and the focus of her visual attention, when the speaker is performing a communicative action, not an instrumental action, in a cooperative, not a competitive, context. The fact that non-human primates can represent the goal of an agent’s instrumental action and discriminate a knowledgeable from an ignorant agent in a competitive context falls short of providing the required explanation.

Furthermore, two of Millikan’s assumptions are contentious in light of recent findings from developmental psychology. One is her assumption that young children could recognize the goal-directedness of speakers’ communicative actions without a representational theory of mind. The other is her assumption that success at elicited-response false-belief tasks should be taken as a criterion for having the ability to track the contents of others’ false beliefs (and therefore having a representational theory of mind). I shall start with the former, which amounts to denying the asymmetry between instrumental and communicative agency—which I earlier dubbed the thesis of the ostensive nature of human communicative agency.

6.2 The puzzle of imitative learning

The first relevant developmental finding, reported by Gergely et al. (2002), shows that approximately one-year-old human children (fourteen-month-olds) selectively imitate an agent’s odd action. First, infants were provided with ostensive cues whereby an agent made manifest her intention to convey some valuable information by looking into the infants’ eyes and addressing them in motherese. She then told the infants that she felt cold and covered her shoulders with a blanket. She finally performed an odd head-action whereby she turned on a light box in front of her by applying her head, in two slightly different conditions. In the hands-occupied condition, she used her hands to hold the blanket around her shoulders while she executed the head-action. In the hands-free condition, she conspicuously placed her free hands on the table while she executed the head-action. Gergely et al. (2002) found that while 69% of the children replicated the head-action in the hands-free condition, only 21% did in the hands-occupied condition. In the hands-occupied condition, the majority of children used their own hands to turn the light box on. Csibra & Gergely (2005, 2006) further report that the asymmetry between infants’ replication of the model’s odd head-action in the hands-free and hands-occupied conditions vanishes if the model fails to provide infants with ostensive cues.

Gergely & Csibra (2003) have reported evidence that twelve-month-olds expect agents engaged in the execution of instrumental actions to select the most efficient action as a means towards achieving their goal (or goal-state), in the context of relevant situational constraints. So the findings on imitation reported by Gergely et al. (2002) raise the following puzzle. Many more infants replicated the agent’s head-action when the teleological relation between the agent’s means and the agent’s goal was opaque (in the hands-free condition) than when it was transparent (in the hands-occupied condition). Why did infants reproduce the agent’s head-action more when it was a less efficient means of achieving the agent’s goal of switching the light box on?

The Gricean thesis about the ostensive nature of communicative agency and the asymmetry between instrumental and communicative agency is relevant to answering this puzzle. Arguably, reception of ostensive signals prepared the infants to interpret the agent’s action as a communicative, not an instrumental, action. It made manifest to the infants that the agent intended to make something novel and relevant manifest to them by her subsequent non-verbal communicative action. In the hands-occupied condition, the infants learnt that contact was necessary in order to turn on the light bulb, which was part of an unfamiliar device. Since the model’s hands were occupied, the infants whose own hands were free assumed that they were free to select the most efficient means at their disposal to achieve the same goal as the model. In the hands-free condition, the model could have used her hands, but she did not. So the infants learnt from the model’s non-verbal demonstration that they could turn the light on by applying their own heads.

On the one hand, the evidence shows that infants construe imitative learning as a response to an agent’s communicative action and that they selectively imitate a model’s action as a function of what they take to be relevantly highlighted by the model’s communicative act (cf. Southgate et al. 2009). On the other hand, further evidence shows that newborns prefer to look at faces with direct gaze over faces with averted gaze. Right after birth, they display sensitivity to eye-contact, infant-directed speech or motherese, and infant-contingent distal responsivity. If preceded by ostensive signals, an agent’s gaze shift has been shown to generate in preverbal human infants a referential expectation, i.e., the expectation that the agent will refer to some object (Csibra & Volein 2008, cf. Csibra & Gergely 2009, and Gergely & Jacob 2013, for review).

One further intriguing piece of evidence for the early sensitivity of human toddlers to the ostensive nature of human communicative agency is offered by experiments that shed new light on the classical A-not-B perseveration error phenomenon first reported by Piaget (1954). Infants between eight and twelve months are engaged in an episodic hide-and-seek game in which an adult repeatedly hides a toy under one (A) of two opaque containers (A and B) in full view of the infant. After each hiding event, the infant is allowed to retrieve the object. During test trials where the demonstrator places the object repeatedly under container B, infants continue to perseveratively search for it under container A where it had been previously hidden. Experimental findings reported by Topal et al. (2008) show that minimizing the presence of ostensive cues results in significant decreases of the perseverative bias in ten-month-olds. This finding is consistent with the assumption that infants do not interpret the hide-and-seek game as a game, but instead as a teaching session about the proper location of a toy.

All this evidence strongly suggests that human infants are prepared from the start to recognize nonverbal ostensive referential signals and action demonstrations addressed to them as encoding an agent’s communicative intention to make manifest her informative intention to make some relevant state of affairs manifest to the addressee. But of course this raises a puzzle: how could preverbal infants recognize an agent’s communicative intention to make manifest her informative intention? A novel approach to this puzzle has been insightfully suggested by Csibra (2010). According to Csibra, very young infants might well be in a position similar to that of a foreign addressee of a verbal communicative act, who is unable to retrieve a speaker’s informative intention for lack of understanding of the meaning of the speaker’s utterance. Nonetheless, the foreign addressee may well recognize being the target of the speaker’s communicative intention on the basis of the speaker’s ostensive behavior. Furthermore, ostensive signals to which preverbal human infants have been shown to be uniquely sensitive can plausibly be said to code the presence of an agent’s communicative intention. If this is correct, then little (if any) further work would be left for preverbal infants to infer the presence of a speaker’s communicative intention after receiving ostensive signals.

6.3 The puzzle about early false-belief understanding

As Millikan has emphasized, much developmental psychology has shown that the majority of three-year-olds fail elicited-response false-belief tasks. For example, when asked to predict where an agent with a false belief will look for her toy, most three-year-olds who know the toy’s location point to the toy’s actual location, and not to the empty location where the mistaken agent believes her toy to be. However, in the past ten years or so, developmental psychologists have further designed various spontaneous-response false-belief tasks, in which participants are not asked any question and therefore not requested to produce any answer. Typical spontaneous-response tasks involve the use of the violation-of-expectation and anticipatory-looking paradigms, which involve two steps. In habituation or familiarization trials, participants are first experimentally induced to form expectations by being repeatedly exposed to one and the same event. Second, in test trials of violation-of-expectation experiments, participants are presented with either an expected or an unexpected event. By measuring the time during which participants respectively look at the expected vs. the unexpected event, psychologists get evidence about the nature and content of the infants’ expectations formed during the habituation or familiarization trials. Psychologists can also use the anticipatory-looking paradigm and experimentally determine where participants first look in anticipation of the agent’s action, thereby revealing their expectation about the content of the agent’s belief.

Thus, in a seminal study based on the violation-of-expectation paradigm by Onishi & Baillargeon (2005), fifteen-month-olds saw an agent reach for her toy either in a green box or in a yellow box when she had either a true or a false belief about her toy’s location. Onishi and Baillargeon report that fifteen-month-olds looked reliably longer when the agent’s action was incongruent rather than congruent with the content of either her true or false belief. In a study based on the anticipatory-looking paradigm, twenty-five-month-olds were shown to look correctly towards the empty location where a mistaken agent believed her toy to be, in anticipation of her action (Southgate et al. 2007). Many subsequent studies show that toddlers and even preverbal human infants are able to track the contents of others’ false beliefs and expect others to act in accordance with the contents of their true and false beliefs.

In a classical experiment by Woodward (1998), six-month-olds were familiarized with the action of an agent who repeatedly chose one of two toys. In the test trials, the spatial locations of the toys were switched and the infants either saw the agent select the same toy as before at a new location or a new toy at the old location. Six-month-olds looked reliably longer when the agent selected a new toy at the old location than when she selected the same toy at a new location. Luo & Baillargeon (2005) further showed that infants do not look reliably longer at a change of target if, in the familiarization trials, the agent repeatedly reached for the same object, but there was no competing object (for further discussion cf. Jacob 2012). This result has been widely interpreted as showing that six-month-olds are able to ascribe a preference to an agent. Luo (2011) further found that ten-month-olds who know that an agent is in fact confronted with only one object (not two) ascribe a preference to the agent if she falsely believes that she is confronted with a pair of objects, but not if the agent knows (as the infants do) that she is confronted with only one object.

Thus, the psychological investigation of early human social cognition is currently confronted with a puzzle different from that confronted by Millikan. On the one hand, robust findings show that the majority of three-year-olds fail elicited-response false-belief tasks such as the Sally-Anne test. On the other hand, more recent findings based on spontaneous-response tasks show that preverbal infants expect others to act in accordance with the contents of their true and false beliefs. The puzzle is: how do we make sense of the discrepancy between the two sets of experimental findings?

So far, psychologists have pursued two broad strategies for resolving this discrepancy. The first assumes (as Millikan does) that success at elicited-response false-belief tasks is a necessary condition of the ability to ascribe false beliefs to others, which is taken to be the output of “a cultural process tied to language acquisition” (Perner & Ruffman 2005, p. 214). The burden on advocates of this strategy is to explain away the findings about preverbal infants without crediting them with the ability to track the contents of others’ false beliefs. Thus, the majority of “cultural constructivist” psychologists offer low-level associationist accounts of the findings about preverbal infants based on spontaneous-response tasks. Other psychologists (including Baillargeon et al. 2010; Bloom & German 2000; Leslie 2005; Leslie et al. 2004; Leslie et al. 2005; Scott et al. 2010) argue that the findings about preverbal infants show that they can track the contents of others’ false beliefs. Their burden is to explain why elicited-response false-belief tasks are so challenging for three-year-olds. The prevalent non-constructivist explanation is the processing-load account offered by Baillargeon and colleagues.

The core of the associationist strategy is to account for findings about preverbal human infants based on spontaneous-response tasks on the basis of a three-way association between the agent, the object, and its location. It postulates that infants will look longer in the test trials at events that depart more strongly from the three-way association generated by the familiarization trials. For example, in the test trials of Onishi & Baillargeon (2005), infants should look longer when the agent reaches for her toy in the yellow box if in the familiarization trials the agent placed her toy in the green box on three repeated occasions.

The main obstacle for the associationist path is a recent study by Senju et al. (2011) based on the anticipatory-looking paradigm. In the familiarization stage, eighteen-month-olds experience the effect of wearing either an opaque blindfold through which they cannot see or a trick blindfold through which they can see. In the first trials of the test phase, the children are familiarized with seeing an agent retrieve her toy at the location where a puppet has placed it in front of her. The agent’s action is always preceded by a pair of visual and auditory cues. In the last test trial, the agent first sees the puppet place the toy in one of the two boxes; she then ostensibly covers her eyes with a blindfold, and finally the puppet removes the toy. After the puppet disappears, the agent removes her blindfold and the cues are produced. Using an eye-tracker, Senju et al. (2011) found that only infants who had experienced an opaque blindfold, not infants who had experienced a trick see-through blindfold, reliably made their first saccade towards the empty location in anticipation of the agent’s action.

Senju et al.’s (2011) findings are inconsistent with the associationist strategy: since all infants saw exactly the same events, they should have formed exactly the same threefold association between the agent, the toy, and the location, and on this basis they should have gazed at the same location in anticipation of the agent’s action. But they did not. Only infants whose view had been previously obstructed by an opaque blindfold, not those whose view had not been obstructed by a trick blindfold, expected the blindfolded agent to mistakenly believe that the object was still in the opaque container after the puppet removed it.

The evidence against the associationist strategy is also evidence against the assumption (accepted by Millikan) that success at elicited-response false-belief tasks is a necessary condition for having a representational theory of mind and being able to track the contents of others’ false beliefs. Moreover, this assumption is unlikely to be correct if, as several critics of the cultural constructivist strategy have argued, the ability to ascribe false beliefs to others is not a sufficient condition for success at elicited-response false-belief tasks. As advocates of the processing-load account (Baillargeon et al. 2010) have argued, a participant could have the ability to ascribe false beliefs to others and still fail elicited-response false-belief tasks for at least three reasons. First, she could fail to understand the meaning of the linguistically-encoded sentence used by the experimenter to ask the question. Second, she could fail to select the content of the agent’s false belief in the process whereby she answers the experimenter’s question. Third, she could lack the executive-control resources necessary to inhibit the prepotent tendency to answer the question on the basis of the content of her own true belief. I will now argue that solving the puzzle about early false-belief understanding may well depend on acceptance of the Gricean thesis of the ostensive nature of communicative agency and the asymmetry between instrumental and communicative agency.

The solution I want to offer to the puzzle about early false-belief understanding is speculative and rests on two related Gricean assumptions. The first is the asymmetry between the non-ostensive nature of instrumental agency and the ostensive nature of human communicative agency. The second related assumption is that the human ability to track the content of the false belief of an agent of an instrumental action must be a by-product of the ability to deal with deception (e.g., lying) in the context of human communicative agency.

In the typical Sally-Anne elicited-response false-belief task, participants are requested to make sense of two actions performed by two different agents at the same time: they must track the contents of the motivations and epistemic states of a mistaken agent engaged in the execution of an instrumental action (Sally) and they must also make sense of the communicative action performed by the experimenter who asks them “Where will Sally look for her toy?” The findings based on spontaneous-response tasks strongly suggest that long before they become proficient in language use, young human children are able to spontaneously track the contents of the false beliefs of agents of instrumental actions. So the question is: what is it about the experimenter’s question that biases them towards pointing to the toy’s actual location?

In Helming et al. (2014), we have argued that two biases are at work, one of which is a referential bias and the other of which is a cooperative bias. The referential bias itself turns on two components. On the one hand, the experimenter could not ask the question “Where will the agent look for her toy?” unless she referred to the toy. On the other hand, the experimenter shares the participants’ correct epistemic perspective on the toy’s location. In answering the experimenter’s question, participants have the option of mentally representing either the toy’s actual location or the empty location (where the mistaken agent believes her toy to be). The experimenter’s question may bias young children’s answers towards the actual location by virtue of the fact that the experimenter both referred to the toy (whose actual location they know) and shared the participants’ correct epistemic perspective on the toy’s actual location (at the expense of the mistaken agent’s incorrect perspective on the empty location). What we further call the cooperative bias is the propensity of young children to help an agent with a false belief about her toy’s location achieve the goal of her instrumental action by pointing to the actual location (cf. Warneken & Tomasello 2006, 2007; Knudsen & Liszkowski 2012), in accordance with their own true belief about the toy’s actual location. If so, then young children might interpret the prediction question “Where will Sally look for her toy?” as a normative question: “Where should Sally look for her toy?” Of course, the correct answer to the normative question is the toy’s actual location, not the empty location where the mistaken agent believes her toy to be.