10 The function of conscious vision

Could it be that Gestalt grouping and figure-ground segregation (of textured images) only happen to go along with consciousness because they take more time, that is, because they require more elaborate computations that are not provided by the many dedicated feedforward pathways and modules of the brain? Normally, vision proceeds in a fast and feedforward fashion, where dedicated neurons detect features and categories. Using its hardwired connections, the visual system can swiftly detect the most relevant objects: food, mates, or dangerous animals. Some objects are more difficult to discern, and require prior knowledge or the computation of neighbourhood relations between image elements: food behind a leaf, a sweet versus a sour apple. That takes slightly more time, though not much more, because many of the required interactions are hardwired as well. They are hardwired because the visual system has been exposed to these “visual problems” very often, either during evolution or during visual experience. Then there are visual problems that are even more difficult: a camouflaged animal in a crowded forest (figure 7), only visible via subtle differences in overall texture or motion. In this case, all visual resources and mechanisms have to come to the rescue. Only by combining the input from many neurons in a versatile way can the visual “solution” be found. That may be the function of consciousness in the visual domain: to combine the otherwise unconscious modules and mechanisms in a flexible way so as to solve otherwise unresolvable visual problems. This leads to a second thesis, which we may call:

The SUPER-property of phenomenal representations =Df neural representations require consciousness and invoke phenomenality as soon as what needs to be represented can no longer be represented by a single dedicated module or mechanism, yet requires the interaction of these modules so that a super-positioned representation emerges.

From the point of view of consciousness, a hierarchy of visual functions can then be made. This starts with largely unconscious feature detection and object categorization. These features start to influence each other, and are no longer treated independently, so that categories form that are about the relations between image items (base groupings, short range incremental grouping). With this, there is a transition from the physical properties of the visual input as they are presented to the sensor array to the meaning[38] of these properties (e.g., wavelength to colour). During these operations, features and categories are matched with our knowledge and expectations of the world, embedded in the anatomical organization of the visual cortex, aiding in the transformation from visual input towards meaning (inference). Finally, all this information is combined into an organized percept. The longer these operations take, the more distance has to be travelled in the brain, and the more conscious these operations become.[39]

If nothing interferes, the visual system will always strive towards optimally integrating the available information, so that the richest interpretation of all available information is achieved, and all features have been detected, all inferences have been made, all image elements are combined and all potential ambiguities have been resolved. If this process is cut short, for example by masking or a TMS pulse (Pascual-Leone & Walsh 2001; Silvanto et al. 2005), there is no integrated end-result. And seemingly there is no conscious sensation either. Regardless of this, many features have still been detected, many inferences have been made, and the brain can use this information to achieve its goals. Behaviour may be influenced, or set into motion (Dehaene et al. 1998). Priming will occur, as well as all sorts of unconscious cognition (Van Gaal & Lamme 2012). Without consciousness, and without maximal integration, the visual system is far from helpless. It can do less, but it can still do a lot.

From this perspective, the function of consciousness in vision is just to enable that last push: to resolve the visual issues that cannot be dealt with otherwise.[40] And with that, visual functions grow more complex, and evolve from their basic form into more sophisticated versions. A good example comes, once again, from the processing of faces. The core property of face-selective neurons is to respond in a category-selective manner: they distinguish between faces and other objects. They do so from the very first action potentials that are fired. At that moment, however, category specificity is still very basic, in the sense that all types of faces evoke a similar response (Rolls 1992). Later in time, responses typically become more and more specific. In the monkey visual cortex, face cells distinguish between different viewpoints and different emotional expressions of faces with a delay of about 50 ms relative to the categorical face/non-face response (Sugase et al. 1999). View-invariant identity representations arise even later, with a delay of about 200 ms (Freiwald & Tsao 2010). At these delays, the face-selective neurons will have established recurrent interactions with lower (and higher) level neurons across the brain, allowing for these more sophisticated classifications to be expressed in the response.

We may thus conclude that face recognition “as we know it”—i.e., not just categorizing face versus non-face, but seeing that face, knowing what it looks like, who it is, and what emotion it carries—is a visual function tightly linked to conscious rather than unconscious vision. The main reason for this lies in the fact that in conscious recognition we go beyond simple categorization, and move towards a function where the integration of all possible information about that face (its viewpoint, colour, identity, emotional expression, etc.) is required.

This may raise the question of how we then become conscious of an extremely simple stimulus, such as an oriented black line on a completely white background. With such a simple stimulus, there seems to be no need for any elaborate binding, incremental grouping, or inference. Neurons in the primary visual cortex can detect the line and its orientation within a few action potentials. There seems to be no need to call in the functions that are enabled by conscious processing. So why is it, then, that we still see the black line on the white background?

First, it should be noted that the notion of “simple” stimuli is more complex than one would expect. For example, it was shown that subjects can rapidly detect animals or vehicles in complex natural scenes, even when their attention is simultaneously focused on another task. Discriminating large T’s from L’s, or bisected colour disks from their mirror images, was impossible under the same dual-task paradigm. Apparently, seemingly simple letter or disk stimuli require more attentive processing than seemingly complex natural scenes (Li et al. 2002), suggesting that they require longer and more elaborate processing. In blindsight, subjects can discriminate lines of different orientations, suggesting that conscious processing is not required for these simple stimuli. However, discrimination performance, although above chance, is typically worse than for consciously seen line segments, suggesting that something is “missing” from the neural representations formed in blindsight compared to those in conscious vision.

So what might be the more elaborate processing steps that lift the unconscious representation of a black line towards a conscious representation of that line? First, it is known that neurons in many visual areas beyond V1 respond to oriented line segments. At each level, receptive field sizes, and hence spatial frequency preferences, differ. This means that (the orientation of) the line segment is represented at many different spatial scales across the visual cortex. Only the integration of these differently-scaled representations, via recurrent interactions, yields a precise and conscious representation. The same holds for other properties of the “simple” line segment, such as its colour, its depth, and its relation to the background.[41] Indeed, oriented lines are fairly easy to mask (in fact easier than faces), indicating that their conscious percept depends on more elaborate processing steps than expected for such a simple stimulus.