11 Absolute representation

The phenomenal precision principle tells us that if the phenomenology of perception is grounded in its representational content, then peripheral unattended perception must be more imprecise than foveal attended perception. This result applies to contrast, size, spatial frequency and some other properties but not location. However, experimental results to be described in the next section suggest that contrast perception is as precise in foveal attended perception as in peripheral unattended perception. But what this evidence does not tell us is how precise they both are, i.e., whether both are relatively precise or relatively imprecise.

I mentioned a study by Mazviita Chirimuuta & David Tolhurst (2005a) that is relevant to the issue of how precise absolute representations of contrast are in foveal attended perception. Chirimuuta and Tolhurst have a behavioral result that shows that performance in classifying contrasts falls off sharply after 4 contrasts. They have a neural model of contrast identification that suggests that the brain is capable of representing only 4-5 contrasts and that this limit is compatible with very fine-grained discriminations. Chirimuuta’s view is that the response probabilities in the visual system for contrasts are very broad, with the tails of every distribution covering much of the span of possible contrasts. (That is, there is a non-zero probability across almost the whole range of contrasts.) Contrasts can only be identified when the response is near the peak of the probability distribution but two responses can be compared when responses are in the tails so long as the tails do not overlap much.

I’ll start with the behavioral result. Chirimuuta presented subjects with patches at up to 8 grades of contrast, labeled “1” through “8”, in each sequence of trials. Subjects could look at the contrasts and labels for as long as they liked and could have a refresher at any time during the experiment. They had to hold the pairings of digits and contrasts in working memory in order to assign numbers to contrast stimuli. Then patches were presented for half a second each and subjects had to try to give the digit label. Performance was good up to 4 items and fell off drastically for larger sets.

Performance on 4 contrasts was near perfect. Then when new contrasts outside the original range were added, performance fell off, even for the original 4 contrasts. This is a pattern often seen in working memory experiments. For example, wild monkeys participated in an experiment in which an experimenter set up two buckets and ostentatiously placed, one at a time, a number of pieces of apple in each bucket. There might, say, be 4 in one bucket and 3 in the other. The result is that when each bucket holds 4 or fewer slices, monkeys reliably go to the bucket with more, but with more than 4 items performance falls off to chance (Barner et al. 2008; Hauser et al. 2000). Human infants show similar results with a limit closer to 3 (Feigenson et al. 2002).

The number 4 also figures in working memory experiments in which subjects are asked to remember digits but are given another simultaneous distraction task to prevent overt strategies of “chunking” digits into units. Subjects can typically remember about 4 digits. In a completely different paradigm, George Sperling showed subjects a grid of letters briefly (1960). Subjects often said they could continue to see all or almost all the items faintly after the grid disappeared. (This kind of image has been called a visual “icon”.) When the grid had 3 rows of 4 items, and subjects were asked to recite as many letters as they could, they could name 3-4 letters. However, Sperling also gave subjects a cuing system: a high tone for the top row, a medium tone for the middle row and a low tone for the bottom row. When cued, subjects could report 3-4 letters from any given row, which suggests that most of the grid was available in the icon even though only about 4 items could be reported on any one trial.

In yet another paradigm, honeybees were trained on a maze in which they had to choose to go either left or right at a T-junction to get a reward. At the entrance of the maze there were dots on each side and the bees had to choose the side with more dots to get the reward. The bees could learn to choose 4 rather than 3 but not 5 rather than 4 (Gross et al. 2009).

The working memory significance of roughly 4 items is so ubiquitous that it stimulated an article called “The magical number 4 in short-term memory: A reconsideration of mental storage capacity” (Cowan 2001). Up until 5-10 years ago, “slot” models of working memory were popular. I think it would now be agreed that roughly slot-like behavior emerges from an underlying working memory system in which there is a pool of resources that is distributed over items differently depending on number and complexity (Ma 2014). George Alvarez & Patrick Cavanagh (2004) suggested that there might be a limit of around 5 items of ideally simple structure but Alvarez’s recent work suggests a more complex picture in which there are a variety of components of working memory that may independently fit a more slot-like or a more pool-like structure (Suchow et al. 2014). Slot-like working memory depends on simple stimuli that are hard to confuse with one another. Stimuli that have shown slot-like behavior include alphanumeric characters, horizontal/vertical rectangles and colors that differ substantially from one another. (I am indebted to conversations with Weiji Ma on this topic.)

So I would suggest that Chirimuuta’s behavioral result probably depends on the fact that subjects had to hold a number of pairs of digits and contrasts in mind in order to categorize the next contrast. (You could try it yourself for, say, 5 lengths.) They did well up to 4 such pairs and then performance declined radically. The article contains an anecdote that further supports this idea:

DJT [one of the subjects and experimenters] performed an experiment in which 4 contrasts of grating were chosen that were close together whilst still allowing near-perfect identification performance over 50 trials of each: 1, 8, 18 and 27 dB. [Note from NB: this is a different way of quantifying contrast than the percentages used here.] In the 50 trials of each contrast, 1 error of identification was made for each of the 8 and 18 dB gratings. Then, two more contrasts were added to the stimulus set at the lower end (40 and 50 dB); contrast 40 dB should have been easily discriminable from 27 dB. In fact, addition of contrasts 40 and 50 dB resulted in an increase in the errors of identification of the original set of four contrasts over 50 trials of each (8dB – 2 errors; 18dB – 9 errors; 27dB – 6 errors). (Chirimuuta & Tolhurst 2005a, p. 2965)

There are two notable aspects of this anecdote. First, performance over 50 trials of each of 4 contrasts was near perfect despite the fact that the gratings covered only part of the spectrum of contrasts. This suggests that the limit of 4 does not have to do with representations of contrast per se. Second, in this case, as in so much of the work on working memory, adding more possibilities to a set of 4 decreases performance on the original set of 4. I conclude that the behavioral result probably has more to do with working memory than with any limit on perception.

Chirimuuta’s second result, the one that motivates the idea that visual representations of contrast are so indeterminate that only 4-5 levels of identification are possible, is the modeling result based partly on data from monkey V1 neurons. (V1 is the first cortical area that processes vision, the lowest level of the visual system.) The striking fact about this result is that it does not concern working memory at all or indeed any kind of memory. It is only concerned with perceptual representation in V1. The model of V1 neurons comes from another paper that is concerned with the “dipper function”, a notable curve shape in which one contrast stimulus is “masked”, that is, diminished by the processing of another stimulus that follows right after it (Chirimuuta & Tolhurst 2005b). The model predicts that V1 can represent 4 contrasts perfectly, with a sharp fall-off beyond 4 and an overall capacity of slightly more than 5 items.

However, the model based on V1 neurons gets some important facts wrong; for example, it predicts poorer performance at high and low contrasts, whereas people actually do better at high and low contrasts. A version of the model with some postulated features that are not based on anything neural can get that right. However, this “curve fitting” approach deprives the model of the neurophysiological support that motivated the original model. Another problem with the model is that what it predicts is “mutual information” shared between contrast stimuli and V1 responses of 2.35 bits. Mutual information is a measure of shared information, in this case between stimuli and V1 neurons. A mutual information value of 2 bits would allow 2² (=4) contrast identifications; a mutual information value of 3 bits would allow 2³ (=8) identifications. By the same arithmetic, 2.35 bits corresponds to slightly more than 5 identifications. This shared information, as Chirimuuta notes (Chirimuuta & Tolhurst 2005a, p. 2968), is “essentially looking at perfect, 100% performance.” For this reason, mutual information is not very useful as a psychophysical measure. And as Chirimuuta notes, its utility is limited for another reason: it is a compressive measure and so large increases in neural activity can be expected to make small differences in information. The issue of 100% performance is especially troublesome since in perceptual systems no performance can be perfect. In particular, the convention for a “just noticeable difference” is distinguishability 75% of the time. So it is difficult to know how to compare the absolute identification level of 2.35 bits with a more psychophysically sensible measure of visual identification.
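The arithmetic relating bits to numbers of identifiable contrasts can be made concrete with a minimal sketch. The function names and the idealized, error-free 4-stimulus table below are illustrative assumptions, not part of Chirimuuta and Tolhurst's model; the sketch only shows the standard relation that N equiprobable stimuli identified without error carry log2(N) bits, so that b bits correspond to 2^b perfect identifications.

```python
import math


def identifiable_levels(bits):
    """Number of equiprobable categories that `bits` of information can separate without error."""
    return 2.0 ** bits


def mutual_information_bits(joint):
    """Mutual information, in bits, of a joint probability table joint[stimulus][response]."""
    stim_marginal = [sum(row) for row in joint]
    resp_marginal = [sum(col) for col in zip(*joint)]
    total = 0.0
    for s, row in enumerate(joint):
        for r, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing
                total += p * math.log2(p / (stim_marginal[s] * resp_marginal[r]))
    return total


# Hypothetical idealized case: 4 equiprobable contrasts, each always given its own label.
perfect_4 = [[0.25 if s == r else 0.0 for r in range(4)] for s in range(4)]

print(identifiable_levels(2.0))             # 4.0
print(identifiable_levels(3.0))             # 8.0
print(round(identifiable_levels(2.35), 2))  # about 5.1
print(mutual_information_bits(perfect_4))   # 2.0
```

The last line illustrates why the measure is tied to perfect performance: the 2-bit figure is reached only when every one of the 4 stimuli is labeled correctly on every trial, and any confusion between labels lowers it.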

Further, our experience seems to conflict with the idea that we have distinct visual representations of only 4-5 contrasts. A good reproduction of Figure 8 seems to reveal 6 phenomenologically different contrasts even though the figure covers only a third of the range of contrasts. And the Carrasco results apply to many different parameters: gap size, spatial frequency, etc. You might test it out if you happen to be near a brick wall. Look at the height of one brick, two bricks, three bricks and four bricks. If you are close enough so that those sizes look different from one another, ask yourself whether there are other sizes that look different from all four of those sizes. If Chirimuuta’s result applies more widely, the answer is no. It has to be said, though, that this sense of distinctness could be due to discriminatory abilities rather than to absolute identification.

Whatever the facts are about how precise foveal attentive perception is, the next section presents evidence that it is not more precise than inattentive peripheral perception.