4 Is verbal understanding an extended form of perception?

4.1 Perceiving the world through language

One basic problem raised by Millikan’s account of the proliferation of intentional conventional signs is that one and the same linguistic form detached from its context of use may belong to different memetic families (or chains of reproductive events). In the reproductive process, what gets copied from one pair of sender-receivers to the next is not merely a linguistic form (e.g., “clear”), but the use of a linguistic form embedded in a particular context. This is why, on Millikan’s (2005, Ch. 10, section 3) view, the boundary between semantics and pragmatics is blurry and the process whereby a hearer tracks the memetic lineage of a conventional sign is a pragmatic process. On the teleosemantic approach, the hearer’s task is to retrieve the appropriate context necessary for recognizing the correct memetic family (or lineage) to which a particular conventional sign belongs. In short, the hearer’s task is to track the domains of intentional conventional signs.

Thus, it would appear that the hearer’s task is quite similar to what is involved in tracking the restricted domain over which the information carried by a locally recurrent natural sign (e.g., tracks made either by quail or by pheasants) is valid. Since tracking the local domains over which the information carried by locally recurrent natural signs is valid is a perceptual task, it is not surprising that Millikan has persistently urged that “in the most usual cases understanding speech is a form of direct perception of whatever speech is about. Interpreting speech does not require making any inferences or having any beliefs about words, let alone about speaker intentions” (Millikan 1984, p. 62).[7] Millikan (2004, p. 122) nicely illustrates her view that verbal understanding is an extended form of perception:

rain does not sound the same when heard falling on the roof, on earth, on snow, and on the water, even though it may be directly perceived as rain through any of these media. Exactly similarly, rain has a different sound when the medium of transmission is the English language (“It’s raining!”). And it sounds different again when the medium of transmission is French or German.

In a nutshell, “during Normal conversation, it is not language that is most directly perceived by the hearer but rather the world that is most directly perceived through language” (Millikan 2005, p. 207).

Furthermore, both ordinary and extended perception rest on translation, not inference: “the first steps in perception involve reacting to natural signs of features of the outer world by translating them into inner intentional representations of these outer features, for example, of edges, lines, angles of light sources in relation to the eye” (Millikan 2004, p. 118). In normal verbal communication, translation plays a twofold role in mediating transfer from the speaker’s belief to the addressee’s belief. First, the speaker of a descriptive utterance translates her belief into a sentential conventional sign. Secondly, the addressee translates the content of the speaker’s utterance into his own new belief (Millikan 1984, 2004, 2005).

4.2 Ordinary and extended perception

Clearly, Millikan’s thesis that verbal understanding is an extended form of perception is not consistent with the Gricean thesis that verbal understanding is an exercise in mindreading. But on the face of it, the thesis that verbal understanding is an extended form of perception (of whatever speech is about) is itself puzzling for at least three related reasons.[8] First of all, as Millikan (2004, Ch. 9) herself recognizes, there is a major difference between the content of a perceptual representation of some state of affairs and the verbal understanding of the content of another’s testimony about the very same state of affairs. At an appropriate distance and in good lighting conditions, one could not perceive a cup resting on a table without also perceiving its shape, size, color, texture, content, orientation, and spatial location with respect to the table, to any other object resting on the table, and especially to oneself. As Millikan (2004, p. 122) recognizes, unlike the content of testimony, the content of ordinary perception can be put at the service of action precisely because it provides information about the agent’s spatial relation to an object that is potentially relevant for action. But if an addressee located in a room next to the speaker’s room understands the content of the latter’s utterance of the sentence “There is a cup on the table”, he may endorse the belief that there is a cup on the table without having any definite expectation about the shape, size, color, texture, content, orientation, and spatial location of the cup with respect to himself, the table, or anything else.

Second, the thesis that verbal understanding is an extended form of perception ought to be restricted to the hearer’s verbal understanding of the meanings of descriptive utterances of indicative sentences with a mind-to-world direction of fit, which describe facts (or actual states of affairs). Without further modification, it cannot be directly applied to the hearer’s verbal understanding of the meanings of prescriptive utterances of imperative sentences whereby a speaker requests an addressee to act so as to turn a possible (non-actual) state of affairs into a fact (or an actual state of affairs). Prescriptive utterances, which have a world-to-mind direction of fit, fail to describe any fact that could be directly perceived at all. So the question arises whether Millikan would be willing to endorse the revised two-tiered thesis that (i) a verbal understanding of a speaker’s descriptive utterance is the perception of whatever the utterance is about and (ii) a verbal understanding of a speaker’s prescriptive utterance is the intention to perform whatever action is most likely to comply with the speaker’s request.

Finally, testimony enables a speaker to convey beliefs whose contents far outstrip the perceptual capacities of either the speaker or her addressee. For example, an addressee may understand that the speaker intends to verbally convey to him her belief that there is no greatest integer, that democracy is the worst form of government except all those other forms that have been tried from time to time, or that religion is the opium of the people. But it does not make much sense to assume that either the speaker or her addressee could perceive what the speaker’s utterance is about.

4.3 Tracking the domains of intentional conventional signs

Furthermore, the thesis that verbal understanding is an extended form of perception clearly rests on the assumption that the process whereby the hearer of a speaker’s utterance tracks the memetic family of the intentional conventional sign used by the speaker is basically the same as the process whereby human and non-human animals track the meanings of locally recurrent natural signs in their circumscribed domain of validity. As I mentioned above, Millikan (2004) argues that perception is the basic process whereby animals track the meanings of locally recurrent natural signs in their circumscribed domain of validity. Crucially, one can track the meanings of locally recurrent natural signs within their circumscribed domain of validity without representing an agent’s psychological state. So the question arises whether a hearer of a speaker’s utterance could always track the memetic family of the intentional conventional signs used by a speaker without representing any of the speaker’s psychological states.

In particular, as Recanati (2007) has argued, the question arises for descriptive utterances containing at least four kinds of conventional expressions considered by Millikan (2004, Chs. 10–12): so-called unarticulated constituents in Perry’s (1986) sense, incomplete definite descriptions, quantifiers, and possessives. Consider first an utterance of (1):

(1) It is raining.

It is unlikely that by an utterance of (1) a speaker means to assert that it is raining somewhere or other at the time of utterance. Instead, she is likely to mean that it is raining at the time of utterance and at the place of utterance (which remains unarticulated in the sentence). If by an utterance of (1), the speaker could only mean that it is raining at the place of utterance, then Millikan’s claim that a hearer need not represent any of the speaker’s psychological states for the purpose of tracking the local domains of intentional conventional signs might be vindicated. However, by an utterance of (1) on the phone, a speaker located in Paris may mean that it is raining in Chicago, not in Paris. Similarly, a French speaker located in Paris may use the incomplete description “the President” to refer, not to the French President, but instead to the President of the US.

For the purpose of understanding an utterance of a sentence containing a universal quantifier, as shown by example (2), the hearer must be able to properly restrict the domain of the quantifier:

(2) Everyone is asleep.

By an utterance of (2), the speaker presumably means to assert, not that everyone in the universe is asleep, but that everyone in some restricted domain (e.g., a relevant household) is asleep.[9] The relevant restricted domain is the domain the speaker has in mind. Finally, by using the possessive construction “John’s book”, the speaker may have in mind many different relations between John and the book: she may mean the book written by John, the book read by John, the book bought by John, the book sold by John, the book John likes, the book John dislikes, the book John just referred to in the conversation, the book John lost, the book John gave to the speaker, the book the speaker gave to John, the book the hearer gave to John, and so on. Unless the hearer hypothesizes what relation the speaker has in mind, he will fail to understand what the speaker means by her utterance of “John’s book”. In none of these four cases does it seem that the hearer could recognize the memetic family of intentional conventional signs, i.e., track their relevant domains, unless he could represent the contents of some of the speaker’s beliefs or assumptions.