6 Inference: Beyond the input

In the phenomenon of colour constancy we have already seen a hint of another visual function. Colour is not about the wavelength coming from objects. It is a property of objects that we infer from wavelengths. At some point, conscious perception starts to diverge from the mere physical properties of the input, in a process we call inference. There are many more examples of inference, and many visual illusions capitalize on the fact that our visual mechanisms are constantly trying to make sense of the world. Figure 4 shows the famous Kanizsa triangle. The minimal, strictly physical interpretation of the image is that of three Pac men pointed at each other and three arrowheads pointing outwards. But our perceptual interpretation goes beyond this, in that we see a white triangle hovering over three black circles, occluding another outlined triangle. The illusory triangle is seen as slightly brighter than its surround, and illusory contours mark its “borders”.

Image - figure004.jpgFigure 4: Left: the Kanizsa triangle. Note the illusory brightness increase inside the region of the illusory triangle. Middle: the 2D projection of a cube can in fact originate from a multitude of 3D objects. We regularly interpret it as a cube, however. Right: we see the woman as small, despite our cognitive ability to realize that “this cannot be true”. Our 3D “priors” force us to see her as small (from https://richardwiseman.wordpress.com/2009/09/09/great-table-illusion/).

This process of inference seems to strongly fit the intuitive difference between a camera and conscious vision. It requires the integration of multiple “pixels”, their interaction, and their interpretation beyond what is strictly given by the image itself. And it is in this last aspect in particular that prior knowledge about the world comes into play, and starts to interfere with the stimulus-driven feature-selective categorization of the input.

The Kanizsa triangle can be seen as a specific example of the more general propensity of the visual system to arrive at a representation of surfaces in 3D space (also called the 2.5D sketch). In that representation we seek the most natural interpretation, consistent with our existing experience of how things are in the world. It is simply much more likely that there is a triangle covering circles than that there are three Pac men that happen to be facing each other at exactly 60o angles. The triangle interpretation is generic, whereas the Pac men one would be accidental (Albert & Hoffman 2000). Nakayama & Shimojo (1992) studied various configurations of 3D stimuli, and found that our visual system always strives towards the interpretation that is most generic, i.e., that would least depend on an accidental viewing position of the observer. Interpretations that would not change when the observer happened to shift position are favoured, given that we are constantly moving relative to objects. For example, the 2D image of a cube can in fact arise from an infinite number of shapes (figure 4, middle), yet we tend to favour the “cube” interpretation because it is the most generic one.

Another way of putting it would be to say that the cube interpretation fits our common experience, in that most of the time, these kinds of 2D projections arise from regular 3D cubes: it is the most ecologically valid interpretation. In a modern guise, this aspect of inference is formalized as a Bayesian approach, where vision uses a set of prior probabilities to arrive at the most likely 3D interpretation of a 2D image. The cube has the highest prior, compared to the more irregular shapes. Illusions like the Ames room (where someone changes size when he walks from one corner to the other), or the size illusion shown in figure 4 (right) capitalize on these assumptions: we assume that rooms have rectangular floors and walls, we assume the woman is sitting on a chair. These assumptions are so strongly embedded in our visual hardware that even in the face of strange consequences, such as people growing in size within a few steps or a man holding his hand over a mini-woman, this inference is maintained.

Many more illusions display non-veridical inferences. In the Ebbinghaus illusion, the perceived size of a disk depends on the size of surrounding disks. In the Ponzo and Müller-Leyer illusions we see line segments as having different lengths, while in fact they are the same. These illusions show that the size of an object is an inference that we draw from its context, rather than from the space it occupies on the retina.

To what extent does inference depend on conscious vision? When we have to pick up the disks in the Ebbinghaus illusion, it appears that our hands open at a pre-grip aperture that is in accordance with the disk’s actual size, not its perceptual size. Apparently, size context effects influence perception, and not automatic action—which has led to the idea that we have two largely separate neural pathways, one transforming visual input into conscious perception, and the other translating visual input into automatically guided action (Goodale & Milner 1992).

There is more evidence linking perceptual inference to conscious vision. Harris (Harris et al. 2011) studied whether Kanizsa triangles were still inferred when the inducers were made invisible using CFS. The same setup was used that showed the presence of brightness illusions under CFS (see above). Subjects had to indicate whether the triangle in the suppressed eye was pointing left or right. They were at chance level, indicating that the Kanizsa-type inference depends on consciousness. Another study, however, found that Kanizsa triangles broke through CFS earlier than control stimuli with inducers pointed outwards (Wang et al. 2012), suggesting that Kanizsa-type inference does occur pre-consciously.

At the single neuron level, the detection of illusory contours has been studied quite extensively. Initially, it was found that V2 cells respond in an orientation-selective manner to Kanizsa-type illusory contours (Von der Heydt et al. 1984). More recently, other areas have been shown to be involved as well (Sáry et al. 2008)—area V4 in particular (Cox et al. 2013). And in human neuroimaging studies it was found that Kanizsa-type illusory contours activate many early visual areas (Seghier & Vuilleumier 2006). All these studies used awake animals or humans, so it is difficult to infer whether these responses depend on the conscious state.

Marcel studied the processing of illusory triangles in two blindsight patients. Two inducers were presented in the sighted hemi-field, while one critical inducer was presented in the blind field, either completing the triangle or not. Completed triangles were detected far above chance (~80%), while the detection of the inducer shape was at chance. Moreover, one of the subjects described the illusory triangles as “brighter”, “out there on the screen” and “on top of something” (Marcel 1998).

All in all, the relation between inference and consciousness is unclear, mostly because fairly little work has been done as yet to study the relation directly (i.e., to study the effect of consciousness manipulations on inference and its neural correlate), but also because much of the work that has been done focuses on a single (though very important) phenomenon: the Kanizsa triangle.