THE USE OF STEREOSCOPIC CUES IN THE PERCEPTION OF NOISE MASKED IMAGES OF NATURAL OBJECTS


THE USE OF STEREOSCOPIC CUES IN THE PERCEPTION OF NOISE MASKED IMAGES OF NATURAL OBJECTS

by Stephan de la Rosa

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Psychology, University of Toronto

Copyright by Stephan de la Rosa, 2008

THE USE OF STEREOSCOPIC CUES IN THE PERCEPTION OF NOISE MASKED IMAGES OF NATURAL OBJECTS

by Stephan de la Rosa, Doctor of Philosophy, Graduate Department of Psychology, University of Toronto, 2008

Abstract

When seen through a stereoscope, a Gabor pattern (a Gaussian enveloped sinusoid) that is masked by visual noise is more readily detectable when it appears in front of or behind the noise than when it is embedded in the noise itself. The enhanced visibility brought about by stereo cues is referred to as binocular unmasking. In this work, we investigated whether binocular unmasking may also occur with visual objects more complex than simple Gabor patterns, and with tasks more demanding than detection. Specifically, we examined the effects of binocular unmasking in the detection, categorization, and identification of noise masked images of natural objects. We observed the occurrence of binocular unmasking in all three tasks. However, the size of this effect was greater for detection performance than for categorization or identification performance; the latter two benefited to the same extent from the availability of stereoscopic cues. We argue that these results suggest that low-level stereoscopic depth cues may play a helpful role, not only in simple detection tasks with psychophysical stimuli, but also in the perception of complex stimuli depicting natural objects.

Acknowledgements

I would like to thank my supervisors and the members of my committee for their excellent support. I am extremely grateful to Giampaolo Moraglia for his ideas, support, advice, and patience. I am deeply indebted to Bruce Schneider for his support, knowledge, advice, and patience throughout the course of my thesis and research. I also owe many thanks to Eyal Reingold for his helpful comments, ideas, support, and understanding. It was a pleasure to work with all of them. I would like to thank Payam Ezzatian and Sabrena Donovan for their great help in participating in some of the experiments. I also would like to thank all the people who helped me with their friendship and encouragement along the way. Especially, I would like to thank Rabia Choudhery, Antje Heinrich, and Jennifer Coelho for their great support.

Table of Contents

Object perception and binocular unmasking
Stereoscopic depth cues enhance the detection of noise masked psychophysical stimuli
Do stereoscopic depth cues also improve the perception of noise masked visual objects?
Object perception is not a unitary process
Purpose of the present study
Binocular unmasking
THE INTEROCULAR PHASE DIFFERENCE MODEL OF HENNING AND HERTZ
BINOCULAR UNMASKING AND BINOCULAR SUMMATION
The binocular summation model
Psychophysical and physiological evidence for the binocular summation model
Evidence for binocular summation from behavioral studies
Experiment I: Binocular unmasking and object detection
METHODS
RESULTS AND DISCUSSION
Experiment II: Binocular unmasking and object categorization
METHODS
RESULTS AND DISCUSSION
Experiment III: Binocular unmasking and object identification
METHODS
RESULTS AND DISCUSSION
General discussion
Future directions
References

List of Figures

Figure 1. Examples of the two viewing conditions in a binocular unmasking experiment
Figure 2. Example of a typical result (simulated) of a binocular unmasking experiment
Figure 3. Schematic outline of the binocular summation model in the N0 condition
Figure 4. The binocular summation process in the Nd condition
Figure 5. An example of the summed noise power spectrum when the noise is shifted horizontally by dx=13.52 arcmin
Figure 6. The noise power spectrum as a function of noise disparity
Figure 7. Detection accuracy as a function of S/N ratio in the 6.76 arcmin condition of Experiment I
Figure 8. Detection accuracy as a function of S/N ratio in the arcmin condition of Experiment I
Figure 9. Mean BMLDs of Experiment I shown for each filter condition and object separately
Figure 10. Categorization accuracy as a function of S/N ratio in Experiment II
Figure 11. Mean BMLDs of Experiment II shown for each filter condition and disparity condition separately
Figure 12. Identification accuracy as a function of S/N ratio in Experiment III
Figure 13. Mean BMLDs in the object identification task listed for each filter condition separately
Figure 14. Predictions of the binocular summation model in Experiment I
Figure 15. Predicted and obtained BMLDs in Experiment I when using an upper limit for the internal noise

List of Appendices

Appendix: STIMULI USED IN THE EXPERIMENTS I-III
Objects: 6.76 arcmin condition
Objects: arcmin condition
Objects: arcmin condition
Objects: arcmin condition
Faces: arcmin condition

Object perception and binocular unmasking

Under most viewing conditions humans can detect, categorize, and identify objects easily and speedily. Consider, for example, the study by Thorpe, Fize, and Marlot (1996), in which images of natural scenes were presented for just 20 ms, the observers' task being to decide whether or not a scene contained an animal. Thorpe et al. found that approximately 150 ms after the image was presented, observers' event-related potentials (ERPs) differed depending upon whether or not the scene contained an animal. This suggests that within 150 ms the visual system had processed the visual scene sufficiently to categorize objects in the image. There are, however, several viewing conditions that are especially demanding for the human visual system. In these conditions, the perception of objects is more difficult. Certain backgrounds, for instance, can make an object less visible (visual masking). An extreme example of visual masking in a natural scene is animal mimicry. Here the animal's skin mimics the visual patterns found in the animal's environment. A common interpretation of the function of mimicry is that, by helping the animal blend into the background, it increases the chances that a prey animal will be overlooked by a predator, and thereby its chances of survival (e.g. Julesz, 1964). In the human realm, camouflage is employed, for example, to hide ground targets from aerial reconnaissance. The military also uses camouflage in battle dress to decrease the visibility of the soldiers wearing it.

Stereoscopic depth cues enhance the detection of noise masked psychophysical stimuli

Julesz (1964) suggested that the ability of the visual system to perceive the environment's three-dimensionality can improve the perception of masked objects. In particular, the visual system might be able to use the information about object depth in a three-dimensional (3-D) scene to improve object perception. Julesz (1964) showed that objects that are rendered nearly invisible by their background become visible if they are located on a different depth plane than the background. To understand how three-dimensional spatial separation improves the perception of masked objects, it is important to recall that the two eyes are separated by approximately 6.5 cm. Hence, each eye receives an image of the visual scene from a slightly different vantage point. The left and right eye retinal images of objects that are located off the fixation plane are therefore shifted relative to each other. This retinal shift is referred to as retinal disparity. Along with monocular depth cues, the visual system uses retinal disparity cues to construct a three-dimensional percept of the visual scene (stereoscopic vision). Indeed, Julesz (1964) has shown that masked objects can be detected and recognized solely on the basis of retinal disparity cues. A powerful demonstration of this feat of the visual system can be seen using a random-dot stereogram (RDS) (Julesz, 1964). An RDS consists of two patches of identical visual noise (a pattern whose pixel luminance varies randomly from one dot to the next) presented separately to the two eyes. Within one of the two visual noise patterns

a limited area, e.g. a square, is shifted laterally by a small amount. The square is not visible to the observer when the two patches are placed side by side and viewed normally (with two eyes, as in binoptic vision). The square, however, becomes visible when the patches are viewed through a stereoscope. The stereoscope ensures that each patch is presented to only one eye, with the patches being presented to corresponding points (dichoptic vision). When viewed dichoptically, the shifted square area is immediately perceived as a square that appears on a different depth plane than the background. RDS research thus shows that the visual system is able to use retinal disparity cues to improve the perception of an otherwise hard-to-see or impossible-to-see object. Several other lines of evidence suggest that the relative retinal shift between the retinal image of the object and the retinal image of the background is important for improved object perception (Westheimer, 1979; Foulkes & Parker, 2003). RDS-based research, which demonstrates that the perception of masked objects improves in the presence of relative disparity cues (dichoptic condition) compared to when these cues are absent (binoptic condition), can therefore be seen as an example of binocular unmasking. Several studies have investigated the nature of the visual mechanisms underlying binocular unmasking. These studies investigated the detection of simple psychophysical stimuli, such as a two-dimensional sinusoidal pattern (grating) masked by visual noise. Based on the results of these experiments, several models have been developed to explain how retinal disparity cues might contribute to improved target detection (e.g. Henning & Hertz, 1973; Schneider,

Moraglia & Jepson, 1989). These models have been shown to accurately describe the binocular unmasking of simple psychophysical stimuli. The most comprehensively tested model is the binocular summation model developed by Schneider, Moraglia, and colleagues (e.g. Schneider et al. 1994; Moraglia & Schneider, 1990). In short, this model proposes that low-level perceptual cues are responsible for binocular unmasking. In particular, binocular unmasking depends on the spatial frequency composition of both target and background in relation to their relative disparity. This model will be outlined in detail later.

Do stereoscopic depth cues also improve the perception of noise masked visual objects?

Although the binocular summation model has been shown to accurately describe and predict conditions of binocular unmasking for simple stimuli, it is not known whether it is also capable of explaining the perception of masked images portraying natural objects. Generally, low-level visual-processing models such as the binocular summation model would be expected to affect the perception of masked object images to the extent that the perception of such objects depends on these processes. There are, however, several reasons to believe that the perception of natural object images involves additional visual and cognitive levels of processing that are not required for the detection and identification of very simple patterns. First, on a perceptual level, images of objects are geometrically and spectrally more complex than the hitherto used simple patterns. Because object images are visually more complex, their perception might require additional processes that are not employed in the

detection of a grating. For example, influential theories of object recognition suggest that objects are perceived on the basis of generic volumetric (3-D) primitives and the spatial relationships among them (Biederman, 1985; Marr & Nishihara, 1978). Psychophysical stimuli such as sinusoidal gratings contain no volumetric primitives, and therefore are unlikely to engage any higher-order visual processes that might be crucial for the detection and recognition of complex objects. Therefore, it is possible, and perhaps likely, that retinal disparity cues may not unmask complex objects to the same degree that is observed for simple stimuli. Secondly, on a more cognitive level, objects belong to categories and may have individual identities. Object agnosia, and in particular associative agnosia, suggests that associating a meaning with an object is an integral part of object perception. Persons with associative object agnosia find it difficult to recognize objects despite intact sensory functioning and working memory (Farah, 1990). For example, some patients were able to correctly copy a drawing but were unable to report what they had drawn (see Farah 1990 for a review). These findings suggest that higher cognitive processes might be selectively involved in the perception of images showing objects but not in the perception of simple psychophysical stimuli. Hence it is not clear how the binocular unmasking of simple patterns might or might not aid the perception of more complex images of objects.

Object perception is not a unitary process

Clearly, object perception is not a unitary process but consists of a class of different processes. However, the visual processes that are involved in object perception are not fully understood (for a recent review see Peissig & Tarr, 2007). Different theories of object perception define different stages of visual processes involved in object perception (e.g. Marr & Nishihara, 1978; Biederman, 1985). Common to these theories, however, is a primary stage concerning the extraction/identification of primitive features/elements. In a later stage the features/elements are then combined into general intermediate object representations (e.g. geons or cones). Eventually the object is identified. Along these lines, Grill-Spector and Kanwisher (2005) recently subdivided object perception into object detection, object categorization, and object identification. To clarify the meaning of each term, suppose you are driving a car on a busy city street. At any one moment the visual system detects objects in the environment, e.g. objects that are moving in, around, and out of your visual field. Object categorization is required to classify an approaching object as a person who is stepping onto the street or as an empty bag that the wind blew onto the road. Finally, object identification enables the visual system to identify the approaching person as your friend who is waving at you. Note that object categorization in this regard refers to the identification of an object on a basic level, e.g. dog, while object identification is the identification of an object on a subordinate level, e.g. your neighbor's dog.

Several other researchers have implicitly distinguished between early and late visual processing of objects in a similar manner (e.g. Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976; Jolicoeur, Gluck, & Kosslyn, 1984; Nakayama, He, & Shimojo, 1995), with the earlier processes being associated with detection and the later processes with recognition. For example, Nakayama and colleagues suggested that in vision an object needs to be separated from its background before it can be recognized (He & Nakayama, 1992; He & Nakayama, 1995; Nakayama et al., 1995), but not before it can be detected. Some evidence that object categorization and object detection occur at successive stages of visual processing comes from Rosch et al. (1976). They found participants' reaction times to be shorter when they had to report the basic level than the subordinate level of a familiar object. Rosch et al. suggested that this difference in reaction time reflects the fact that detection is carried out at an earlier stage of visual processing than object categorization. A more systematic investigation into the stages of object perception was recently conducted by Grill-Spector and Kanwisher (2005). They asked participants to detect, categorize, and identify objects while recording their reaction times. Grill-Spector and Kanwisher found that participants were able to categorize an object as speedily and as accurately as they could detect it. Participants, however, needed more time to identify an object. Taken together, these results suggest that object identification is different from object categorization and object detection. However, object detection and object categorization might be mediated by the same mechanisms. Their result is partly

in contrast to other suggestions which propose that object detection occurs before object categorization or identification (e.g. Nakayama et al. 1995). Object detection and object categorization, however, might be difficult to separate in the detection and recognition tasks used in the above studies. These tasks were generally conducted at a suprathreshold level. Note, however, that object detection and object categorization mechanisms might require different amounts of information about an object to operate successfully. Hence detection and categorization might differ with regard to their sensitivity. Specifically, if an object were presented at the threshold of visibility, processes involved in object detection might require less information about the object to operate accurately than processes involved in object categorization. A presentation of object images at a suprathreshold level in detection and recognition tasks might be insensitive to these differences because there is a super-abundance of information. A more fruitful approach therefore might be to investigate object detection and object categorization at threshold levels. It seems reasonable to assume that the presence of an object within an array of visual noise can be inferred when only a part of the object becomes visible. However, this part of the object might not be sufficient to categorize, let alone identify, the object. Unfortunately, little is known about the dissociation of object detection and recognition in visual masking experiments. To this writer's knowledge no study has explicitly addressed this question.

Based on these theoretical considerations it seems reasonable to subdivide object perception into detection, categorization, and identification, and to investigate each of these stages to determine how they are related.

Purpose of the present study

The aim of the present thesis was to investigate whether stereoscopic depth cues are beneficial in the perception of noise masked images of natural objects. Specifically, we were interested in whether the binocular summation model makes accurate predictions for the detection, categorization, and identification of such objects. In Experiment I we investigated whether we find evidence for binocular unmasking with images depicting natural objects in an object detection task. Experiment II examined binocular unmasking in an object categorization task, and lastly Experiment III studied binocular unmasking in an object identification task. We will first review the relevant binocular unmasking literature.

Binocular unmasking

The interocular phase difference model of Henning and Hertz

Henning and Hertz (1973) were the first to conduct binocular unmasking experiments in which a grating was masked by visual noise. Participants detected a vertical sinusoidal grating against a background of one-dimensional visual noise. The one-dimensional visual noise had the same center frequency

and orientation as the target grating. Perceptually the noise looked like a grating whose contrast and phase randomly varied over space and time. Participants detected a target grating in a two-interval forced-choice task (2IFC) under two different viewing conditions. In the first condition the grating was identical in both eyes, whereas in the second condition the grating was shifted 180° out of phase between the eyes. The results showed that the grating was more easily detected when it had an interocular phase shift of 180° than when it was identical in both eyes. Henning and Hertz (1973) explained their results with a phase vector model. In particular they argued that target detection in their binocular unmasking experiment was based upon the presence of interocular phase differences. That is, when the target was phase shifted between the two eyes, the target interval contained interocular phase differences whereas the non-target interval did not. Henning and Hertz suggested that the visual system might have used the presence of these interocular phase differences in the out-of-phase condition to improve target detection. Because this cue was absent when the gratings were identical in both eyes, grating detection was more difficult in this condition than in the out-of-phase condition.

Binocular unmasking and binocular summation

Schneider and Moraglia (Schneider et al. 1989; Moraglia & Schneider, 1990; 1991; 1992; Schneider & Moraglia, 1992; 1994; Speranza, Moraglia, & Schneider, 1995; Schneider, Moraglia, & Speranza, 1999; Speranza, Moraglia, & Schneider, 2001) investigated binocular unmasking using a slightly different

experimental design. In contrast to Henning and Hertz, they used static two-dimensional band-limited visual noise. This noise has energy over a wide range of spatial frequencies, with the amplitude, phase, and orientation being random across spatial frequencies. Despite this difference, both experimental designs provide similar results (Moraglia & Schneider, 1990). In their standard paradigm, Schneider, Moraglia, and colleagues require that observers detect a Gabor signal (a 2-D Gaussian enveloped sinusoidal grating) in the presence of broadband two-dimensional Gaussian noise under two different viewing conditions. In the first condition a noise frame appears on the fixation plane and the noise (N) and the Gabor signal (S) are presented off the fixation plane (Figure 1A). Note that the depth effect for the noise and the Gabor is achieved by shifting both patterns by the same amount in one of the eyes. Because the retinal disparity is the same for signal and noise, no relative disparity is present between them. Due to the lack of the relative disparity cue, this condition is referred to as the binoptic condition or NdSd condition (the subscript denotes the disparity). In the second condition, both the signal and the noise frame are presented on the fixation plane, while the noise is presented off the fixation plane (Figure 1B). That is, the noise is presented at disparity d but the Gabor is presented at zero disparity (d=0) (NdS0). Because the differential retinal shift of signal and noise induces a relative disparity d between Gabor and visual noise, this condition is referred to as dichoptic. Using the method of constant stimuli, participants' psychometric functions are obtained under both viewing conditions using a two-interval forced-choice

(2IFC) task. That is, participants see two temporally separated intervals, of which only one contains the Gabor (in a few studies the intervals were defined spatially rather than temporally). The participant's task is to identify the interval that contains the target. The contrast of the Gabor presented on a trial is randomly selected from among four or five predetermined contrast values. The participant's accuracy in detecting the Gabor at each contrast level is measured. The psychometric functions are obtained by plotting accuracy as a function of signal-to-noise ratio (S/N ratio) for both the binoptic and the dichoptic condition. Typically it is found that the detection of the Gabor pattern is better in the dichoptic condition than in the binoptic condition. In particular, the psychometric function in the dichoptic condition is shifted towards lower S/N ratios relative to the psychometric function in the binoptic condition (Figure 2A). Because both conditions are identical except for the presence of the relative disparity between the signal and the noise in the dichoptic condition, the better detection performance in the dichoptic condition can be attributed to the presence of these stereoscopic depth cues. Hence, the usefulness of relative disparity cues can be assessed by measuring the difference between the psychometric functions of the binoptic and dichoptic condition. The difference between the psychometric functions has typically been measured at the 75% accuracy level in a 2IFC task. This detection threshold difference is known as the binocular masking level difference (BMLD) and is measured in dB:

BMLD(dB) = 20*log10[(S/N)NdSd / (S/N)NdS0]. (1)
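To make equation (1) concrete, the following sketch shows one way the BMLD might be computed from measured 2IFC accuracies. It is a minimal illustration, not the analysis code used in the thesis: the function names, the linear interpolation of the 75% threshold, and the accuracy values are all assumptions made for the example.

```python
import numpy as np

def threshold_75(snr, accuracy):
    """Interpolate the S/N ratio at 75% correct from a measured
    psychometric function (2IFC accuracy as a function of S/N ratio)."""
    snr = np.asarray(snr, float)
    acc = np.asarray(accuracy, float)
    order = np.argsort(snr)
    # assumes accuracy increases with S/N over the sampled range
    return np.interp(0.75, acc[order], snr[order])

def bmld_db(snr_levels, acc_binoptic, acc_dichoptic):
    """Equation (1): BMLD(dB) = 20*log10[(S/N)_NdSd / (S/N)_NdS0]."""
    t_binoptic = threshold_75(snr_levels, acc_binoptic)    # NdSd condition
    t_dichoptic = threshold_75(snr_levels, acc_dichoptic)  # NdS0 condition
    return 20.0 * np.log10(t_binoptic / t_dichoptic)

# Hypothetical data for illustration only
snr_levels = [0.05, 0.10, 0.20, 0.40, 0.80]
acc_ndsd = [0.52, 0.58, 0.70, 0.85, 0.97]   # binoptic accuracies
acc_nds0 = [0.60, 0.74, 0.88, 0.96, 0.99]   # dichoptic accuracies
print(f"BMLD = {bmld_db(snr_levels, acc_ndsd, acc_nds0):.1f} dB")
```

With these made-up accuracies the computed BMLD is roughly 8 dB, i.e. within the 6-12 dB range reported below.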

In several experiments Schneider, Moraglia, and colleagues found that the presence of the relative disparity cue improved detection performance by approximately 6-12 dB. Importantly, binocular unmasking experiments show that the presence of relative disparity cues alone is necessary but not sufficient for binocular unmasking. For example, in most binocular unmasking experiments some but by no means all Gabors benefit from the presence of the stereoscopic depth cue in the dichoptic condition (see also Figure 2A and 2B). Moraglia and Schneider proposed that binocular unmasking depends on three factors: the amount of relative disparity between the signal and the noise, the spatial frequency content of the mask, and the spatial frequency content of the target. They described the relationship of these variables and how they affect binocular unmasking in their binocular summation model. This model is able to account for most of the results obtained from binocular unmasking experiments.

Figure 1. Examples of the two viewing conditions in a binocular unmasking experiment. The noise frame marks the fixation plane. A: In the binoptic condition (NdSd) both the signal and the noise are presented with the same disparity. Hence, no relative disparity cue is present between the signal and the noise. B: In the dichoptic condition (NdS0) the noise is presented with a disparity d while the signal appears on the fixation plane. Hence the noise and the signal are separated by a disparity d. The display in both conditions is viewed through a stereoscope.
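As a rough illustration of how the two viewing conditions in Figure 1 might be constructed in software, the sketch below shifts a noise patch and a Gabor independently in the right-eye image. The pixel resolution, amplitudes, circular shifting, and helper names are illustrative assumptions, not the thesis's actual stimulus-generation code.

```python
import numpy as np

def gabor(size, freq_cpd, pix_per_deg, sigma_deg=0.5):
    """2-D Gaussian-enveloped vertical sinusoid (Gabor)."""
    x = (np.arange(size) - size // 2) / pix_per_deg      # position in degrees
    xx, yy = np.meshgrid(x, x)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma_deg**2))
    return envelope * np.cos(2 * np.pi * freq_cpd * xx)

def make_stereo_pair(noise, signal, noise_shift_px, signal_shift_px):
    """Left/right images: each pattern is displaced horizontally in the
    right eye by its own pixel shift (circular shift for simplicity)."""
    left = noise + signal
    right = np.roll(noise, noise_shift_px, axis=1) + \
            np.roll(signal, signal_shift_px, axis=1)
    return left, right

pix_per_deg = 40                              # assumed display calibration
size = 256
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, (size, size))    # broadband Gaussian noise patch
sig = 0.3 * gabor(size, freq_cpd=2.2, pix_per_deg=pix_per_deg)

d_px = round(13.52 / 60 * pix_per_deg)        # 13.52 arcmin expressed in pixels

# Binoptic (NdSd): signal and noise share the same disparity.
left_b, right_b = make_stereo_pair(noise, sig, d_px, d_px)
# Dichoptic (NdS0): only the noise carries the disparity.
left_d, right_d = make_stereo_pair(noise, sig, d_px, 0)
```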


Figure 2. A: Example of a typical result (simulated) of a binocular unmasking experiment (d=13.52 arcmin, Gabor's spatial frequency: 2.2 cpd). The psychometric function of the dichoptic condition (solid line) is shifted towards lower signal-to-noise ratios relative to the psychometric function of the binoptic condition (dashed line). The difference between these two psychometric functions is measured at the 75% accuracy threshold and is referred to as the binocular masking level difference (BMLD). B: Simulated predicted results of a binocular unmasking experiment where the disparity is d=13.52 arcmin and the Gabor's spatial frequency is 4.4 cpd. The binocular summation model predicts that no binocular unmasking occurs under these viewing conditions (see text for details). Hence the psychometric curves of the dichoptic and binoptic conditions should be almost identical. For the sake of clarity, the gratings shown in panels A and B portray suprathreshold gratings and do not depict actual threshold gratings of the respective viewing condition.

The binocular summation model

To help understand how the binocular summation model explains the finding that stereoscopic depth cues are necessary but not sufficient for binocular unmasking to occur, imagine a viewing condition that has not been discussed so far. Assume that only the noise frame and the noise patch are presented, but not the Gabor. Unlike in the binocular unmasking experiments described above, it is assumed that the noise patch is presented on the fixation plane (N0 condition; Figure 3). Note that when the noise is presented on the fixation plane, the left and right eye retinal images of the noise patch are at corresponding retinal coordinates. According to the binocular summation model, at the first stage of visual processing monocular channels sample visual information from a confined area of the retina (receptive field) and decompose the visual information into its spatial frequency components. These monocular channels exist across the entire retina. Importantly, each monocular channel is narrow-band and has a preferred peak spatial frequency and orientation (see also Figure 3 for three examples of these monocular channels, each tuned to a different combination of spatial frequency and orientation). In the following we consider only the middle left- and right-eye monocular channels of Figure 3, which are tuned to a vertical spatial frequency. The binocular summation model suggests that monocular channels tuned to the same spatial frequency and orientation converge on a binocular disparity channel. In our example the information from the vertical monocular channels is passed on to a vertical disparity channel. Because both monocular channels

sample from corresponding retinal points, the disparity channel onto which these two channels converge is tuned to zero disparity. The binocular summation model also assumes that other disparity channels exist which receive their input from monocular channels that have their receptive fields at non-corresponding retinal locations. These disparity channels are consequently tuned to other retinal disparities. The disparity channels linearly sum the outputs of the two monocular channels. Consider the consequences of this summation process in our example. Because the left and the right eye monocular channels sample from the same retinal location, their respective spatial frequency components will be identical. If two identical spatial frequency components are summed, the resultant component has an amplitude that is twice as large as that of either component. Consider next what happens to the monocular channels' output in the zero disparity channel if the noise is presented off the fixation plane (see Figure 4). Note that this condition is similar to a binocular unmasking experiment.

Figure 3. Schematic outline of the binocular summation model in the N0 condition. The left and right eye images of the noise patch are projected onto corresponding retinal locations in this condition. The white disc marks the receptive fields from which the shown monocular channels sample. Monocular channels decompose the visual information from their receptive field into spatial frequency components. Three examples of monocular channels are shown; each of them is tuned to a different combination of spatial frequency and orientation. The figure shows how the spatial frequency components from the middle monocular channels are passed on to a disparity channel (here the zero disparity channel), which linearly sums the two components. For more information see text.

Suppose that the right eye retinal noise image is shifted laterally by the amount dx (Nd condition). Let this shift be identical to half of the wavelength of the vertical monocular channel's preferred spatial frequency (see Figure 4). Due to this shift, the spatial frequency component of the right-eye monocular channel is shifted by half of its wavelength. Again, the left- and right-eye monocular channels' output converges on the zero disparity channel, which sums the two spatial frequency components. Unlike in the N0 condition, the spatial frequency component in the right monocular channel is shifted relative to that in the left monocular channel by half of its wavelength. Hence the left and right eye spatial frequency components are 180° out of phase. When these two spatial frequency components are summed in the zero disparity channel during binocular summation, they will cancel. Consequently the zero disparity channel will have little activation. Remember that visual noise has energy over a wide range of spatial frequencies, with the amplitude, phase, and orientation being random across spatial frequencies. Assume that these spatial frequency components are only summed by a zero disparity channel. In this case, the spatial frequency components of the noise will cancel in those disparity channels whose preferred spatial frequency is an odd integer multiple of 1/(2dx):

frequency cancellation when f = n/(2dx), n an odd integer. (2.A)

Figure 4. The binocular summation process in the Nd condition. In this condition the retinal image in the right eye is shifted by the amount dx. Suppose that the amount of this shift is identical to half of the wavelength of the monocular channel's preferred spatial frequency. In this case the outputs of the monocular channels will cancel during binocular summation in the zero-disparity channel.

Likewise, the spatial frequency components of the noise will completely sum in those disparity channels whose preferred spatial frequency is an even integer multiple of 1/(2dx):

frequency summation when f = n/(2dx), n an even integer. (2.B)

As a result the summed noise power spectrum will have peaks and notches (see e.g. Figure 5). The notches occur at spatial frequencies according to equation 2.A and the peaks occur at spatial frequencies according to equation 2.B. More specifically, the summed noise power spectrum can be described as follows. Let the power spectral density function of the band-limited two-dimensional Gaussian noise be given by

G(ε, η) = Ag for -ε0 ≤ ε ≤ ε0 and -η0 ≤ η ≤ η0, and 0 elsewhere, (3)

where Ag is the spectrum level, ε, η are the horizontal and vertical spatial frequency components, and ε0, η0 are the spatial frequency limits of the band-limited noise. Further let g(x, y) be the left and g(x+dx, y+dy) the right luminance pattern of the noise, where dx, dy refer to the horizontal and vertical shift of the noise respectively. The sum of these luminance patterns is then described by the spectral density function F(ε, η):

F(ε, η) = G(ε, η) [2 + 2 cos(2π ε dx + 2π η dy)], (4)

where ε, η are the horizontal and vertical spatial frequency variables respectively, and dx, dy represent the horizontal and vertical shifts in degrees respectively. An example of a summed noise power spectral density function is

shown in Figure 5. This noise was presented with a horizontal disparity of dx=13.52 arcmin. As can be seen, the power of the noise varies sinusoidally along the horizontal spatial frequency axis. With this in mind it can now easily be shown how the binocular summation model explains binocular unmasking. Assume that in the dichoptic condition of a binocular unmasking experiment the Gabor has a peak spatial frequency of 2.2 cpd and the noise is presented with a disparity of 13.52 arcmin. Because the retinal shift of the noise is dx=13.52 arcmin, which corresponds to half the wavelength of the Gabor, the noise will cancel in the zero disparity channel tuned to 2.2 cpd (see Figure 5). The Gabor, however, is presented on fixation and therefore its left- and right-eye spatial frequency components are not shifted relative to each other. Thus, the spatial frequency components of the Gabor will completely sum in the zero disparity channel tuned to 2.2 cpd. Because the noise's spatial frequency components will cancel in this channel while the Gabor's spatial frequency components will completely add up, the signal-to-noise ratio in the zero disparity channel tuned to 2.2 cpd is better than in the respective monocular channels (see Figure 2A) (for simplicity we round spatial frequencies to the first digit from now on). Consider now the binoptic condition of this binocular unmasking experiment. Because both the noise and the Gabor are presented with a disparity of dx=13.52 arcmin, the spatial frequency components at 2.2 cpd will cancel in both the noise and the Gabor signal in the zero disparity channel tuned to 2.2 cpd.

Figure 5. An example of the summed noise power spectrum when the noise is shifted horizontally by dx=13.52 arcmin. According to equations 2.A and 2.B the noise should have notches at 2.2 and 6.6 cpd and peaks at 4.4 and 8.8 cpd. Higher spatial frequency components are not shown.
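A small numerical check of equations (2.A), (2.B), and (4) reproduces the notch and peak locations described in the Figure 5 caption. The snippet below is only a sketch, assuming a purely horizontal shift of dx = 13.52 arcmin and a unit spectrum level.

```python
import numpy as np

def summed_noise_power(eps, eta, dx_deg, dy_deg, Ag=1.0):
    """Equation (4): power spectrum of the sum of the left- and right-eye
    noise patterns shifted relative to each other by (dx, dy) degrees."""
    return Ag * (2.0 + 2.0 * np.cos(2.0 * np.pi * (eps * dx_deg + eta * dy_deg)))

dx = 13.52 / 60.0   # 13.52 arcmin expressed in degrees

# Equations (2.A) and (2.B): notches at odd, peaks at even multiples of 1/(2*dx).
notches = [n / (2.0 * dx) for n in (1, 3)]   # ~2.2 and ~6.7 cpd
peaks   = [n / (2.0 * dx) for n in (2, 4)]   # ~4.4 and ~8.9 cpd

for f in notches:
    print(f"notch near {f:.2f} cpd, summed power = {summed_noise_power(f, 0.0, dx, 0.0):.3f}")
for f in peaks:
    print(f"peak  near {f:.2f} cpd, summed power = {summed_noise_power(f, 0.0, dx, 0.0):.3f}")
```

The summed power evaluates to 0 at the notch frequencies and to 4 (twice the amplitude, four times the power of a single component) at the peak frequencies.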

Because the spatial frequency components in the noise and the Gabor cancel, the detection of the target will depend upon the signal-to-noise ratio of the monocular channels. Due to the better signal-to-noise ratio in the zero disparity channel in the NdS0 condition, detection performance should be better in the dichoptic viewing condition than in the binoptic viewing condition. To understand why the sole presence of stereoscopic depth cues is not sufficient for the occurrence of binocular unmasking, consider a Gabor of 4.4 cpd. In the dichoptic condition the spatial frequency components of the Gabor will add up in the zero disparity channel tuned to 4.4 cpd. The noise spatial frequency components, however, will also add up completely according to equation 2.B (see also Figure 5). Hence the spatial frequency components at 4.4 cpd of both the noise and the Gabor will completely add during binocular summation. The signal-to-noise ratio is consequently the same as in the respective monocular channels. Similarly, the spatial frequency components at 4.4 cpd of both noise and Gabor will add in the binoptic condition. The resulting signal-to-noise ratio is the same as in the dichoptic condition. Detection performance in the dichoptic condition should therefore be no better than in the binoptic condition. Hence the presence of the stereoscopic depth cues between the Gabor and the noise patch does not facilitate Gabor detection when the Gabor has a spatial frequency of 4.4 cpd.
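The signal-to-noise argument can be illustrated with a single frequency component. In the sketch below, left- and right-eye components at 2.2 cpd are summed linearly in a zero-disparity channel; the amplitudes are arbitrary, and the code only shows that the shifted noise component cancels while the in-phase signal doubles. (The apparent unbounded improvement it produces is addressed by the internal-noise assumption discussed next.)

```python
import numpy as np

f = 2.2                          # cpd, preferred frequency of the zero-disparity channel
dx = 1.0 / (2.0 * f)             # noise shift of half a wavelength (~13.6 arcmin, in degrees)
x = np.linspace(0.0, 2.0, 2000)  # position in degrees of visual angle

def component(amplitude, shift):
    """One spatial frequency component at f cpd, displaced by `shift` degrees."""
    return amplitude * np.cos(2.0 * np.pi * f * (x - shift))

def rms(wave):
    return np.sqrt(np.mean(wave ** 2))

A_signal, A_noise = 0.2, 1.0     # arbitrary illustrative amplitudes

# Monocular channel: signal and noise component seen by one eye.
snr_monocular = rms(component(A_signal, 0.0)) / rms(component(A_noise, 0.0))

# Dichoptic (NdS0): the right-eye noise is shifted by dx, the signal is not.
summed_signal = component(A_signal, 0.0) + component(A_signal, 0.0)   # in phase -> doubles
summed_noise  = component(A_noise, 0.0) + component(A_noise, dx)      # 180 deg apart -> cancels
snr_dichoptic = rms(summed_signal) / (rms(summed_noise) + 1e-12)

print(f"monocular S/N ~ {snr_monocular:.2f}")
print(f"zero-disparity channel S/N (dichoptic) ~ {snr_dichoptic:.2e}")
# In the binoptic condition (NdSd) the signal cancels together with the noise
# in this channel, so detection falls back on the monocular S/N above.
```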

Note that for the NdS0 condition, when the displacement of the noise is half the wavelength of the Gabor, the binocular summation model predicts that the noise will cancel while the signal will completely sum in the zero disparity channel. One would therefore expect signal detection in this condition to be equal to the detection of the signal in the absence of visual noise. Clearly, such an expectation is unrealistic. To explain the higher Gabor detection thresholds observed in dichoptic viewing conditions, the binocular summation model assumes that independent internally generated random noise is present in both the left- and right-eye monocular channels. This internal noise decreases the signal-to-noise ratio in the zero disparity channel and increases the detection thresholds in the dichoptic viewing condition. There is abundant evidence for the presence of internal noise in monocular and binocular channels (e.g. Campbell & Green, 1965; for a review see Blake & Fox, 1973 and Blake & Sloane, 1981).
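A sketch of how independent internal noise caps the predicted benefit: the external noise component cancels during summation, but the two internal noise sources are uncorrelated and add in power, so the dichoptic signal-to-noise ratio stays finite. All amplitudes below are arbitrary illustrative values, not estimates from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

A_sig  = 0.2    # signal component amplitude (arbitrary units)
A_ext  = 1.0    # external noise component amplitude
sd_int = 0.3    # standard deviation of internal noise, per monocular channel

ext_noise = A_ext * rng.standard_normal(n_samples)    # identical in both eyes
int_left  = sd_int * rng.standard_normal(n_samples)   # independent per eye
int_right = sd_int * rng.standard_normal(n_samples)

# Dichoptic (NdS0): the external noise arrives 180 deg out of phase and cancels,
# the signal doubles, but the independent internal noises still add.
summed_signal = 2.0 * A_sig
summed_noise  = (ext_noise + int_left) + (-ext_noise + int_right)
snr_dichoptic = summed_signal / np.std(summed_noise)

# Monocular reference: one eye's signal against external plus internal noise.
snr_monocular = A_sig / np.std(ext_noise + int_left)

print(f"monocular S/N ~ {snr_monocular:.2f}")
print(f"dichoptic S/N ~ {snr_dichoptic:.2f}  (finite because of internal noise)")
print(f"gain ~ {20.0 * np.log10(snr_dichoptic / snr_monocular):.1f} dB")
```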

33 27 example is to monitor the zero disparity channel tuned to 2.2 cpd as it provides the highest signal-to-noise ratio. The above discussion considered only signal shifts along the horizontal axis. In everyday life vertical shifts are also likely to occur especially when slanted edges are shifted horizontally (Farell, 1998). The binocular summation model assumes that the visual system also has disparity channels that are tuned to vertical disparity or a combination of vertical and horizontal disparity. The same mechanisms that were outlined for the horizontally tuned disparity channels also apply to these disparity channels. Note that the shape of the summed noise power spectrum in a disparity channel depends on the disparity value of the noise, relative to the disparity of the channel. As can be seen in Figure 6, the number of notches, their location in the frequency space and their width change as a function of disparity. For instance, as disparity increases the number of the notches in the noise also increases while the width of the notches in the summed power spectrum decreases. Importantly, the width of the notches affects the binocular summation mechanism. To see why this is the case, note that disparity channels are assumed to have a half-power spatial frequency bandwidth of about one octave (Speranza, Moraglia, & Schneider, 1995). That is, the size of the notch decreases with increasing noise disparity while the disparity channel s bandwidth remains constant. For example, the disparity channel with a center frequency of 2.2 cpd is more activated by the noise in the arcmin disparity channel than in the arcmin condition (see Figure 6 middle and bottom graph). The

Figure 6. The noise power spectrum as a function of noise disparity. Shown are the summed noise power spectra (solid lines) for noise disparities of 6.76, 13.52, and 40.56 arcmin, respectively. The middle and the bottom panel also show a disparity channel (dashed line) tuned to 2.2 cpd.
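The interplay between notch width and channel bandwidth can be illustrated numerically. The sketch below integrates the summed noise power of equation (4) under an assumed one-octave Gaussian passband centered on 2.2 cpd for the 13.52 and 40.56 arcmin disparities shown in Figure 6; the passband shape and the numbers are assumptions made purely for illustration.

```python
import numpy as np

def summed_noise_power(freq_cpd, dx_deg):
    """Equation (4) along the horizontal frequency axis (dy = 0, Ag = 1)."""
    return 2.0 + 2.0 * np.cos(2.0 * np.pi * freq_cpd * dx_deg)

def channel_gain(freq_cpd, center_cpd=2.2):
    """Assumed Gaussian passband on a log-frequency axis with a one-octave
    half-power full width, centered on the channel's preferred frequency."""
    octaves = np.log2(np.maximum(freq_cpd, 1e-6) / center_cpd)
    sigma = 0.5 / np.sqrt(2.0 * np.log(2.0))
    return np.exp(-(octaves ** 2) / (2.0 * sigma ** 2))

freqs = np.linspace(0.1, 12.0, 4000)
step = freqs[1] - freqs[0]
for d_arcmin in (13.52, 40.56):
    dx = d_arcmin / 60.0
    passed = np.sum(summed_noise_power(freqs, dx) * channel_gain(freqs)) * step
    print(f"noise disparity {d_arcmin:5.2f} arcmin -> "
          f"noise power passed by the 2.2 cpd channel ~ {passed:.2f}")
```

The larger disparity leaves a narrower notch at 2.2 cpd, so more noise power falls within the channel's passband.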

The signal-to-noise ratio in this disparity channel is therefore better when the noise is presented with a 13.52 arcmin disparity than with a 40.56 arcmin disparity. Hence detection performance should be worse with larger noise disparities than with smaller ones. Moraglia and Schneider (1990) found that this is indeed the case. The binocular summation model is also able to explain Henning and Hertz's (1973) results. As previously noted, these researchers presented a grating in front of a narrowband temporally modulated one-dimensional noise that had the same center frequency as the grating. To recapitulate, the noise was presented at zero disparity in both the binoptic and dichoptic conditions. The grating was in phase between the two eyes in the binoptic condition (N0S0) and phase shifted by 180° in the dichoptic condition (N0Sd). Henning and Hertz found that detection thresholds were smaller in the dichoptic than in the binoptic condition. These results can be understood with the binocular summation model as follows. In the N0S0 condition no binocular unmasking should occur because both the signal and the noise are presented at the same disparity. The summation process affects their respective spatial frequency components in the same way. Hence the signal-to-noise ratio will not be improved by binocular summation. This is not the case in the N0Sd condition. Here the highest signal-to-noise ratio should occur in the disparity channel whose preferred spatial frequency is the same as the grating's spatial frequency and whose preferred disparity d equals half of the grating's wavelength. Because the grating has an interocular shift of 180°, the grating's left- and right-eye spatial frequency components reach this disparity channel in phase. Thus, the grating's spatial

frequency components will completely sum in the disparity channel tuned to half of the grating's wavelength. On the other hand, the left- and right-eye noise spatial frequency components will be 180° out of phase. Because they will cancel within this disparity channel, the detectability of the grating should be better in the N0Sd than in the N0S0 condition. This is what Henning and Hertz (1973) found.

Psychophysical and physiological evidence for the binocular summation model

Several lines of evidence suggest that stereopsis is processed in spatial frequency and orientation selective disparity channels. Mayhew and Frisby (1976; see also Frisby, Mayhew, King-Smith, Marr, & Ruddock, 1978 for a discussion) found that depth is seen in an RDS only when the spatial frequency content of the left and right eye noise images overlaps. For example, no depth was perceived in the RDS when one image of the RDS was low-pass filtered and the other one was high-pass filtered. This observation is in line with the idea that disparity channels sample from a similar spatial frequency range in the left and right eye. Furthermore, Parker, Johnston, Mansfield, & Yang (1991) and Farell (2006) provided evidence that disparity channels are orientation selective. For example, Parker et al. (1991) found that the masking of a line stereogram was reduced when the angle between masking noise elements was increased. The idea of orientation and spatial frequency selective disparity channels in early vision is further supported by results from physiological studies. Single cell recordings in the cat's visual cortex found that disparity sensitive simple and

complex cells are selective for spatial frequency and orientation (Ohzawa & Freeman, 1986a; 1986b). Most simple cells and about 40% of the complex cells were sensitive to interocular phase differences. Both cell types were sensitive to either horizontal or vertical disparity. Furthermore, all cells taken together were found to process the entire range of interocular phase differences up to 360°. The response properties of these simple and complex cells in the cat's visual cortex are clearly compatible with the requirements of the binocular summation model. In Ohzawa and Freeman's model the visual pattern is convolved with the neuron's receptive field profile. This convolution is done for each eye separately; the two results are then added. The half-wave rectified output roughly describes the neural firing of the cell (the assumption of half-wave rectification is important for physiological models since the number of action potentials released by a cell can only be positive, never negative). Similar to the binocular summation model, the disparity selectivity in this linear spatial summation model arises from the interocular differences between the monocular receptive field locations. The binocular summation model thus parallels an important physiological model of stereopsis.
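A schematic version of the linear summation stage described above might look as follows: each eye's luminance profile is convolved with a monocular receptive-field profile, the two responses are added, and the result is half-wave rectified. The Gabor-shaped receptive field, the 1-D simplification, and all parameter values are assumptions for illustration only, not a reconstruction of the published model.

```python
import numpy as np

def gabor_rf(x_deg, freq_cpd=2.2, sigma_deg=0.25, phase=0.0):
    """1-D Gabor receptive-field profile for one monocular input."""
    return np.exp(-x_deg**2 / (2.0 * sigma_deg**2)) * np.cos(2.0 * np.pi * freq_cpd * x_deg + phase)

def binocular_response(img_left, img_right, rf_left, rf_right):
    """Linear spatial summation: filter each eye separately, add the two
    results, then half-wave rectify (firing rates cannot be negative)."""
    resp_left  = np.convolve(img_left,  rf_left,  mode="same")
    resp_right = np.convolve(img_right, rf_right, mode="same")
    return np.maximum(resp_left + resp_right, 0.0)

pix_per_deg = 60
x = np.arange(-90, 91) / pix_per_deg             # +/- 1.5 deg receptive-field support
rf = gabor_rf(x)

rng = np.random.default_rng(2)
scene = rng.standard_normal(600)                 # 1-D luminance profile
shift_px = round(13.52 / 60 * pix_per_deg)       # a 13.52 arcmin disparity in pixels

left = scene
right = np.roll(scene, shift_px)                 # right-eye image displaced by the disparity
rate = binocular_response(left, right, rf, rf)   # cell with corresponding (zero-disparity) RFs
print("mean firing-rate proxy:", rate.mean())
```

In such a model, cells whose left and right receptive fields occupy non-corresponding locations are selective for other disparities, which is the same source of disparity selectivity assumed by the binocular summation model.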

Evidence for binocular summation from behavioral studies

Binocular unmasking with horizontal disparities

Binocular unmasking of horizontally oriented visual patterns seems to be relatively robust with respect to changes in the experimental procedure. Binocular unmasking was observed regardless of whether a fixation dot was presented (Henning & Hertz, 1973), the two intervals were temporally or spatially separated (Moraglia & Schneider, 1990), the visual noise was one- or two-dimensional (Henning & Hertz, 1973; Henning & Hertz, 1977; Moraglia & Schneider, 1990), the target was a Gabor or a grating (Henning & Hertz, 1973, 1977; Schneider et al. 1989), the interocular disparity was accompanied by perceived depth (Henning & Hertz, 1973), the display was surrounded by bars or a frame, the noise was presented on or off fixation (Henning & Hertz, 1973; Schneider et al. 1989), or the noise was temporally modulated or not (Henning & Hertz, 1977). Despite these differences, binocular unmasking led to about a 6-12 dB better detection of the target when target and noise had disparity differences such that the signal-to-noise ratio was improved in one of the disparity channels compared to the signal-to-noise ratio in either monocular channel (Henning & Hertz, 1973; Henning & Hertz, 1977; Schneider et al., 1989; Moraglia & Schneider, 1990). Binocular unmasking was observed with targets of low and intermediate spatial frequencies. Schneider et al. (1989) found BMLDs of 6-12 dB for Gabor signals of 1.1, 2.2, and 4.4 cycles per degree (cpd) at a constant noise disparity of arcmin. Henning and Hertz (1973) found that binocular unmasking decreased with increasing spatial frequency of the target gratings. The BMLDs in their experiment ranged from 5-15 dB for gratings with spatial frequencies of 0.24, 0.6, and 2.4 cpd. When they, however, increased the target spatial frequency to 6 cpd, BMLDs dropped to about 0 dB. It therefore seems that binocular summation might have an upper spatial frequency limit.

BMLDs of 6-12 dB were also obtained by Moraglia and Schneider when they varied the noise disparity while holding the peak spatial frequency of their target constant at 2.2 cpd. BMLDs ranged between 5 and 12 dB for noise disparities of 13.52, 40.56, and arcmin. However, as noted above, one would expect BMLDs to decrease with increasing disparity. Moraglia and Schneider argued that the lack of this decline in their experiment might have been due to vergence eye movements in the arcmin condition that compensated for this effect. For example, if participants adjusted their eye position to arcmin, the noise would have an effective disparity of arcmin. Under these conditions the noise would have cancelled at the Gabor's frequency in the 2.2 cpd disparity channel. The Gabor, on the other hand, would appear after the vergence movements with a disparity of arcmin in this channel. Because this shift is equivalent to a shift of the Gabor by twice its wavelength, it should be completely summed within the 2.2 cpd disparity channel. Hence vergence eye movements could have produced a viewing condition in which the noise cancelled and the Gabor completely summed. In fact, the stimuli in Moraglia and Schneider (1990) were presented long enough (1 s) to allow vergence eye movements. In a replication of their experiment, Moraglia and Schneider presented their stimuli for only 90 ms, a period of time shorter than that needed to complete vergence eye movements (Westheimer & Mitchell, 1956). The reduced exposure time did not decrease the unmasking effect at arcmin but affected binocular unmasking to a larger degree at higher disparities. Overall it seems that there might also be an upper

disparity limit for binocular unmasking. This result is in line with the finding that the stereoscopic system typically fuses retinal disparity with a magnitude of arcmin (Panum's area). Moraglia and Schneider (1990) further found that the direction of the disparity, whether crossed or uncrossed, does not have a material impact upon the size of the binocular unmasking effect.

Binocular unmasking with vertical disparities

Binocular unmasking of a magnitude similar to that observed with horizontal disparities was also found with vertical disparities, across several spatial frequencies of the target (Henning & Hertz, 1973; 1977). Similar results were observed by Moraglia & Schneider (1991) for a vertical disparity of arcmin. Larger vertical noise disparities (40.56 and arcmin), however, did not allow binocular unmasking (Moraglia & Schneider, 1991). The finding that Panum's area is larger along the horizontal meridian (32-40 arcmin) than along the vertical meridian (19-25 arcmin) (e.g. Qin, Takamatsu, & Nakashima, 2006) might explain this result. Because it is reasonable to assume that binocular summation only operates within Panum's limit, one would not expect a binocular advantage to occur for vertical disparities larger than arcmin.

Binocular unmasking as a function of orientation

When the noise pattern is shifted horizontally between the two eyes, the spectral density function for the summed noise patterns varies along only the

horizontal spatial frequency variable. Consequently, binocular unmasking should be dependent on changes of the horizontal spatial frequency of the Gabor but relatively independent of changes of the vertical spatial frequency of the Gabor. Likewise, when an interocular shift of the noise occurs in the vertical direction, the noise power spectrum varies along the vertical but not along the horizontal spatial frequency axis. Here binocular unmasking should be dependent on changes of the vertical but relatively independent of changes of the horizontal spatial frequency. (Note that binocular unmasking is not fully independent of changes in the orthogonal spatial frequency because the spatial frequency channels have relatively broad bandwidths.) Thus, a Gabor whose orientation is perpendicular to the direction of the noise shift should not be unmasked. In accordance with this prediction of the binocular summation model, no binocular unmasking was found for the detection of vertical Gabors when the interocular noise shift was horizontal (Schneider et al., 1989), nor for the detection of horizontal Gabors when the noise shift was vertical (Moraglia & Schneider, 1991). The binocular summation model predicts that binocular unmasking should depend on the Gabor's orientation if the noise patterns have both a horizontal and a vertical interocular shift. The summed noise power spectrum of a noise that has been shifted interocularly both horizontally and vertically is given in equation (4). If the magnitude of the noise disparity is the same horizontally and vertically, the two-dimensional plot of the summed noise power spectrum shows peaks and notches oriented at -45 degrees. More specifically, peaks occur at ε = η and

notches at ε = -η. Given that dx=0.338 deg and dy=-0.338 deg, Schneider and Moraglia (1994) showed that a Gabor oriented at 45 deg and with a spatial frequency of 1.05 cpd falls into one of these notches. Conversely, a Gabor of the same spatial frequency with an orientation of -45 deg falls on one of these peaks in the summed power spectrum. The binocular summation model therefore predicts that the 45 deg but not the -45 deg oriented Gabor will be unmasked. The results confirmed this prediction (Schneider & Moraglia, 1994). Similarly, when the vertical shift is in the opposite direction (dy=0.338 deg), the binocular summation model predicts that the spatial frequency components of the 45 deg oriented Gabor will fall onto a peak in the summed noise power spectrum. The spatial frequency components of the -45 deg oriented Gabor, however, will fall into a notch. Again the results supported this prediction.
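The oblique-disparity prediction can be checked numerically with equation (4). The sketch below assumes the usual convention that a Gabor's frequency component lies perpendicular to its bars; with dx = 0.338 deg and dy = -0.338 deg, the 45 deg component lands in a notch and the -45 deg component on a peak, matching the pattern reported by Schneider and Moraglia (1994).

```python
import numpy as np

def summed_noise_power(eps, eta, dx_deg, dy_deg):
    """Equation (4) with Ag = 1."""
    return 2.0 + 2.0 * np.cos(2.0 * np.pi * (eps * dx_deg + eta * dy_deg))

dx, dy = 0.338, -0.338   # horizontal and vertical noise shifts (deg)
f = 1.05                 # Gabor spatial frequency (cpd)

# A Gabor whose bars are oriented at +45 deg has its frequency component
# perpendicular to the bars, i.e. at (eps, eta) = (f*cos(-45), f*sin(-45));
# the -45 deg Gabor has its component at (f*cos(45), f*sin(45)).
for ori in (45, -45):
    eps = f * np.cos(np.deg2rad(-ori))
    eta = f * np.sin(np.deg2rad(-ori))
    p = summed_noise_power(eps, eta, dx, dy)
    verdict = "notch: unmasking expected" if p < 0.5 else "peak: no unmasking"
    print(f"{ori:+d} deg Gabor -> summed noise power {p:.2f} ({verdict})")
```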

Binocular unmasking of multiple spatial frequencies

Little is known about whether binocular unmasking occurs with target signals that consist of more than one spatial frequency. The only study along these lines was carried out by Schneider, Moraglia, & Speranza (1999). They created several compound gratings by adding two gratings of different spatial frequencies. In the full unmasking condition, both frequencies of the compound gratings fell into a notch of the summed noise power spectrum and could therefore be unmasked. In the partial unmasking condition only the lower spatial frequency component fell into a notch while the higher spatial frequency component fell onto a peak in the summed noise power spectrum. According to the binocular summation model, only the lower spatial frequency could be unmasked but not the higher one. By comparing these two conditions, Schneider et al. examined whether both spatial frequency components had to be unmasked in order for the visual system to identify the pattern more easily. To ensure that participants used both spatial frequencies of the Gabor in the task, binocular unmasking was tested using a discrimination task. Participants discriminated between a target and a foil. For the target, the phase relationship between the two frequency components was +45° and for the foil it was -45°. For both the full and the partial unmasking condition, several targets and foils were generated which differed in terms of their phase onsets. In all these targets and foils the above described phase relationship between the frequency components was preserved. On a pretrial an example of the target and the foil was shown, with the target being presented spatially above the foil. No noise was shown on the pretrial. Then target and foil were presented in two temporally separated intervals along with visual noise. The contrast of only the lower frequency component was varied across trials to determine the 75% accuracy discrimination threshold. Schneider et al. found that a binocular unmasking advantage was seen with discrimination in the full but not in the partial unmasking condition. This result suggests that better phase discrimination performance is only observed if both spatial frequency components are unmasked. If only one spatial frequency component is unmasked, discrimination performance does not benefit from the presence of the stereoscopic depth cue.

Experiment I: Binocular unmasking and object detection

Experiment I examined whether the detection of masked object images is aided by conditions that can promote binocular unmasking. Because findings by Moraglia and Schneider (1991) suggest that binocular unmasking decreases with increasing disparity, we decided to study binocular unmasking of object images using horizontal disparities that are known to produce binocular unmasking with Gabor stimuli. Horizontal disparities of 6.76 and 13.52 arcmin have previously been shown to produce reliable unmasking effects with spectrally simple stimuli (Speranza et al., 2001; Moraglia & Schneider, 1991). We started off investigating binocular unmasking of masked natural object images with the images of three objects (a dog, a house, a face) in a 6.76 arcmin disparity condition; we then added two additional images (of a tree and of a table) in the 13.52 arcmin disparity condition. The choice of these images was not inspired by a particular theory. Binocular unmasking was measured for each object separately. That is, objects were not presented together in an alternating fashion across trials. BMLDs were determined using a standard binocular unmasking task as described above. The effect of the spatial frequency composition of these complex images on binocular unmasking was studied by filtering each image in two different ways. In the first filter condition we removed those spatial frequency components of the image that coincide with notches in the spectrum of the summed noise, and left those that coincide with the peaks in the summed noise power spectrum.

Hence, even if the image is viewed under dichoptic viewing conditions, all its frequency components should be masked by the noise. In this case, we expect no binocular unmasking. The second condition was a reversed version of the first condition. Here the image had maximum power at spatial frequencies that cancel in the noise during binocular summation and minimum energy at spatial frequencies that add in the noise during binocular summation. In this condition all spatial frequency components of the image should be unmasked under dichoptic viewing conditions. We therefore expect binocular unmasking to be maximal in this condition. In addition, in the third condition we used the original, unaltered version of the image power spectrum. In this condition some of the image's spatial frequency components will be unmasked while others will not. By comparing performance across these three conditions we should be able to determine whether all, none, or part of the image's spatial frequency content needs to be unmasked for a dichoptic detection advantage to be observed; further, we should be able to test whether the binocular summation model is capable of predicting the observed effects. To obtain the filtered images in these three conditions, we manipulated the images in the following way. First, we calculated the summed noise power spectrum in Experiment I for the given disparity, e.g., dx = 13.52 arcmin, as predicted by the binocular summation model. The noise power density function is given in (4). As described above, the spectral density function varies sinusoidally along the horizontal spatial frequency axis. This change of power along the

horizontal spatial frequency axis can be described by the equation for a comb filter

G[ξ,η] = (1/2)[1 + cos(2π ξ d + θ)]   (5),

where ξ and η refer to the spatial frequency variables (in cycles per degree) along the horizontal and vertical directions, respectively, and d is the horizontal interocular shift of the noise (in degrees). The amplitude of this function varies between 0 and 1. We used this function as a comb filter to filter the object images in the non-shifted filter condition. Note that, with θ = 0, this filter has the same shape as the summed noise's power spectrum; for the 13.52 arcmin disparity its first notch falls at 2.2 cycles/deg. Hence, when this filter is applied to each object image it removes those spatial frequencies from the image that would normally fall in the notches of the summed noise power spectrum; yet, it leaves unchanged those spatial frequencies in the image that would fall on a peak in the summed noise's power spectrum. As outlined above, an object image that was filtered in this way has energy only at spatial frequencies that sum maximally in the noise power spectrum during summation. Thus, all spatial frequency components of the object image should be masked by the noise. The binocular summation model thus predicts that no dichoptic detection advantage in the detection of objects occurs in the non-shifted filter condition. In the shifted filter condition, we used a comb filter whose spectral profile was shifted by 180 deg relative to the comb filter of the non-shifted condition. To achieve this we simply set θ = π in equation (5). Hence, when this filter is applied to an image it removes those spatial frequencies that would sum completely in the noise power spectrum during binocular summation. Conversely, the filter

leaves unchanged those frequencies in the image that cancel in the noise frequency spectrum during summation. Hence, due to binocular summation the image will only have energy at spatial frequencies that cancel in the noise power spectrum and will have no energy at spatial frequencies that sum maximally. Consequently, the noise spatial frequency components do not mask the spatial frequency components of the object image. The binocular summation model therefore predicts that objects in the shifted filter condition should be unmasked and that a dichoptic detection advantage should arise. Finally, binocular unmasking was tested with the original, unfiltered images. In a two-dimensional plot of the power spectra in the NdS0 condition, some of the object's spatial frequencies will fall into the notches while others will fall onto the peaks of the summed noise power spectrum. Hence only part of the spatial frequencies will be unmasked. According to the binocular summation model, binocular unmasking could only occur under these conditions if the visual system is able to monitor the disparity channel with the highest S/N ratio while ignoring the other disparity channels. Schneider et al.'s (1999) experiment with compound gratings, however, suggests that all spatial frequency components need to be unmasked in order to discriminate between two simple patterns. Since this is not the case with the unfiltered object images, it might be that no binocular unmasking is found in the unfiltered condition.
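A minimal sketch of this filtering step is given below; the thesis used a custom MATLAB program, so the numpy version, the image size, and the helper name are illustrative assumptions. The sketch applies the comb-filter gain of equation (5) to the image's amplitude spectrum; since the text speaks of multiplying the power spectra, the square root of the gain may be the intended operation. Note that for a horizontal noise shift d the notches of the non-shifted filter fall at horizontal frequencies of (2k+1)/(2d), i.e., near 2.2 and 6.6 cpd for d = 13.52 arcmin and near 4.4 cpd for d = 6.76 arcmin.

```python
import numpy as np

def comb_filter_image(img, d_deg, pix_per_deg, theta=0.0):
    """Filter an image with the comb filter of equation (5),
    G = 0.5 * (1 + cos(2*pi*xi*d + theta)), applied along the horizontal
    spatial-frequency axis. theta = 0 gives the non-shifted filter,
    theta = pi the shifted filter. Illustrative sketch only."""
    ny, nx = img.shape
    xi = np.fft.fftfreq(nx, d=1.0 / pix_per_deg)   # horizontal frequency (cpd)
    gain = 0.5 * (1.0 + np.cos(2 * np.pi * xi * d_deg + theta))
    spectrum = np.fft.fft2(img - img.mean())       # filter the contrast image
    filtered = np.fft.ifft2(spectrum * gain[np.newaxis, :]).real
    return filtered + img.mean()

# Example with assumed values: a 128 x 128 image, 3.38 arcmin pixels, and a
# 13.52 arcmin noise shift; theta=np.pi would produce the shifted filter.
img = np.random.rand(128, 128)
filtered = comb_filter_image(img, d_deg=13.52 / 60.0,
                             pix_per_deg=60.0 / 3.38, theta=0.0)
```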

Binocular unmasking in these three filter conditions (shifted, non-shifted, and unfiltered) was examined for each object separately. Binocular unmasking of each object was measured at each disparity level (6.76 and 13.52 arcmin) in four participants.

Methods.

Participants. We recruited in total 19 University of Toronto at Mississauga students (age: 18 to 32 years; 12 females and 7 males). Additionally, the author participated in the experiment. We assigned four participants to each object condition. We defined normal corrected vision as a Snellen acuity better than 20/40, a contrast sensitivity within the chart's provided 95% confidence interval at all six tested spatial frequencies, and a stereoacuity of at least 40 arcsec. All participants had normal vision as measured with Snellen visual acuity, FACT contrast sensitivity using the OPTEC 6500 from Stereo Optical, Chicago, US, and the Frisby stereotest. In this and all following experiments all participants were informed about the purpose of the experiment and gave consent prior to their participation. All experiments in this thesis were conducted in accordance with the ethical guidelines of the University of Toronto.

Apparatus and Stimuli. Stimulus generation. A digital picture of each object was taken. For the face, we took a picture of one female person with a digital camera. The person did not wear any jewelry or make-up and had a neutral facial expression. The following digital image manipulations were done using the Jasc Paint Shop Pro software. All images were converted into 128 x 128 pixel 8-bit grayscale pictures with 256 shades of grey. The background of each object image was removed by setting it to a medium grey level (pixel value 127). For the face, we removed the person's ears, neck, and

hair by centering an ellipse (height: 107 pixels; width: 93 pixels) over the center of the person's face and setting the area outside this ellipse to the same medium grey level. Finally, the images were filtered using a custom-written MATLAB program. Three different filter functions were applied. These were based on the two-dimensional summed noise power spectrum as predicted by the binocular summation model. We modified the filter function so that its amplitude ranged between 0 and 1. The function is given in equation (5). In this equation θ was set to zero to generate the images in the non-shifted filter conditions. For the images in the shifted-filter conditions θ was equal to π. To generate the non-shifted filter and shifted filter images, the power spectra of the original images were multiplied with the former and latter function respectively, with ξ = 2.2 cpd when the disparity shift in the noise was 13.52 arcmin and ξ = 1.1 cpd when the disparity shift in the noise was 6.76 arcmin. Finally, we did not filter the image in the unfiltered condition. The image RMS was defined as

RMS = sqrt( Σ (Li − L̄)² / n ),

where Li is the pixel luminance, L̄ is the mean luminance, and n the number of samples. We adjusted the RMS of each image within the object area to a fixed value. The noise stimuli were generated with a VSG graphics card from Cambridge Research. For each presentation of the noise a new sample of Gaussian random noise was generated. The noise had a mean value of 127. Its standard deviation was adjusted for each participant separately during the pilot runs to provide a performance level that allowed an accurate determination of the

psychometric function. The screen was set to black outside the stimulus presentation area. To ensure that object detection in noise was not based on occlusion cues, the object and the noise image were merged on each trial in the following way. For each object image, the difference between each pixel's luminance and the mean luminance (Li − 127) was calculated. These differences were then added to the corresponding noise pixels. A sum that exceeded 255 or was less than 0 was clipped to 255 or 0, respectively.
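A sketch of this merging rule, assuming 8-bit pixel values (the function and variable names are made up for illustration):

```python
import numpy as np

def merge_object_and_noise(obj, noise, mean_level=127):
    """Add the object's deviations from the mean grey level to the noise
    pixels and clip the sum to the displayable 8-bit range, as described
    in the text. Illustrative sketch only."""
    merged = noise.astype(np.int32) + (obj.astype(np.int32) - mean_level)
    return np.clip(merged, 0, 255).astype(np.uint8)

# Example with assumed inputs: a uniform 128 x 128 "object" and Gaussian
# noise with mean 127 (the standard deviation, here 30, was in fact set
# per participant).
obj = np.full((128, 128), 127, dtype=np.uint8)
noise = np.clip(np.random.normal(127, 30, (128, 128)), 0, 255).astype(np.uint8)
stimulus = merge_object_and_noise(obj, noise)
```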

Stimulus presentation. The stimuli were presented on a Sony GDM F-520 color monitor with a refresh rate of 100 Hz and a spatial resolution of 1024 x 768 pixels. Each pixel subtended 3.38 arcmin. The luminance ranged from 0.2 cd/m² to 9.6 cd/m² and was partitioned into 256 grey values. The monitor was gamma corrected before each session. A chin rest was used to minimize head movements and to help maintain fixation. Participants' responses were collected with a remote-controlled CB6 six-button button-box (two rows of three buttons) from Cambridge Research. Custom-written software was used for the stimulus presentation. The participants viewed the stimuli through a simple lens stereoscope (Ogle, 1961, page 99), which split the screen into two halves using a thin black separator that ran from between the participant's eyes to the middle of the screen. This separator ensured that each eye saw only one half of the monitor screen. The centers of the two screen halves were separated by the average inter-pupillary distance of 6.5 cm.

The sizes of the stimuli were measured horizontally and vertically at their largest expansion. The size was 5 x 6.7 deg for the image of the dog, 6.4 x 6.3 deg for the house, 3.4 x 6.2 deg for the table, and 5.6 x 6.5 deg for the tree. The face image was an ellipse with a diameter of 6.0 deg horizontally and 5.2 deg vertically. The noise square subtended 7.9 deg. Both the noise and the object were surrounded by a frame of white noise of fixed width. When a stimulus was to be presented off the fixation plane, its right-eye version was shifted to the right. In the control conditions of Experiment I this shift was equal to either 6.76 (2 pixels) or 13.52 (4 pixels) min of arc. We had two disparity conditions (6.76 and 13.52 arcmin). The three filter conditions (non-shifted filter, shifted filter, and unfiltered) and two unmasking conditions (NdS0 and NdSd) were completely crossed and nested within an object condition (for 6.76 arcmin: house, dog, face; for 13.52 arcmin: house, tree, face, dog, table). Because most participants participated in only one condition, we treated object and disparity as between-subject factors in our analysis. Filter condition and unmasking condition were within-subject factors. Participants were repeatedly tested on the six unmasking/filter combinations. Thus, binocular unmasking of object detection was assessed in a 5 (object) x 2 (disparity) x 2 (unmasking) x 3 (filter condition) nested factorial design. On each trial an object image was randomly adjusted to one of five RMS values. These five RMS values were determined in the first two sessions of the experiment. The RMS values were chosen so that they captured the psychometric function accurately.

Procedure. The participant was seated in a darkened testing room in front of the apparatus. The participant was instructed to look through the stereoscope while the chinrest and chair height were adjusted to the appropriate level at the beginning of each session. All participants reported that they could effortlessly fuse the fixation dots that appeared in the middle of the left and right halves of the screen. In the first session the experimenter explained the task to the participant. Participants conducted a 2IFC detection experiment. A trial started with a white fixation dot in the middle of the stereoscope. The participant then had to press the upper middle button on the button-box to initiate the trial. A blank screen was then displayed for 120 ms and was followed by two 500 ms intervals containing the noise frame and the square noise patch, separated by a 500 ms inter-stimulus interval. The target image was randomly presented in one of these two intervals, always in the center of the display. At the end of this sequence the participant saw a black screen and had to decide in which interval the target object had appeared. The participant pressed the upper left button for the first interval and the upper right button for the second interval; a 500 Hz tone signaled the occurrence of a correct response. After the feedback the fixation dot appeared again, signaling the onset of the next trial. A run consisted of 100 trials. The interval containing the target on each trial was determined by the software in a pseudo-random fashion at the beginning of the experiment. On 50% of the trials the target appeared in the first interval and on the remaining trials in the second interval. The object image's contrast varied randomly from trial to trial. Specifically, the image contrast was randomly set to one of five possible contrast

levels (see below). The combinations of contrast levels and the intervals in which the target appeared were counterbalanced across trials. After a run was completed, the participant received another run with a different unmasking/filter combination. The six unmasking/filter combinations were chosen in random order with the only restriction that all six combinations had to be presented to the participant before any of them was presented again. Each selected participant confirmed that he or she had no difficulty perceiving the apparent depth of the displays for both disparity conditions. Each participant received several training runs at the beginning of the first session. These runs served to adjust the RMS level of the background noise in such a way that the participant's performance ranged from close to 50% to close to 100% accuracy. During these runs ten contrasts were used. After the appropriate noise level was determined, participants received two runs of each of the six unmasking/filter conditions. After these, the number of contrast levels was reduced from ten to five. To do so, the psychometric function was plotted based on these intermediate results and the author picked five representative contrast levels. These five contrast levels served as the contrast levels for the experiment. Each participant completed nine runs of each condition. Hence, 200 data points were collected for each of the five contrast levels for every unmasking/filter combination. Thus, we obtained 6000 data points from each participant for our analysis. Participants were allowed to take breaks at any time. An experimental session was no longer than one hour. After that participants were asked to take

at least a 30-minute break. On average participants needed 8-12 one-hour sessions to complete the experiment.

Results and Discussion

In Experiment I we were interested in establishing whether binocular unmasking occurs in detection tasks with spectrally complex signals. We expected to observe the largest advantage of dichoptic viewing in the shifted-filter condition, a smaller or no binocular unmasking effect in the unfiltered condition, and no dichoptic advantage in the non-shifted filter condition. We calculated the mean accuracy at each S/N ratio level for each participant. Using these means we fitted a psychometric function based on the Weibull distribution to the data. The response accuracy ψ was described by the function

ψ(x; α, β, γ, λ) = γ + (1 − γ − λ)[1 − e^(−(x/α)^β)],

where x is the S/N ratio in dB, γ is the lower performance limit (chance-level performance), λ the upper performance limit (miss rate), α the threshold parameter, and β the steepness parameter of the psychometric function. In Experiment I chance-level performance was 50% and hence γ = 0.5. We allowed α, β, and λ to vary. Wichmann and Hill (2001) have shown that slight deviations from optimal performance at high contrast levels can in some cases substantially affect the fit of the psychometric function. To avoid this effect, they suggested letting λ vary within a constrained narrow range [0; 0.6]. We followed their recommendation when fitting the psychometric functions.
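A minimal sketch of such a fit is shown below. It uses a simple least-squares fit rather than the maximum-likelihood procedure of Wichmann and Hill, the data are invented, and the function names are illustrative; it is not the thesis's fitting code.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull(x, alpha, beta, lam, gamma=0.5):
    """Psychometric function: accuracy as a function of the S/N ratio x.
    gamma is the chance level (0.5 for the 2IFC detection task) and lam
    the shortfall of the upper asymptote (miss rate)."""
    return gamma + (1 - gamma - lam) * (1 - np.exp(-(x / alpha) ** beta))

def threshold(alpha, beta, lam, p=0.75, gamma=0.5):
    """Invert the Weibull to obtain the S/N ratio that yields accuracy p."""
    return alpha * (-np.log(1 - (p - gamma) / (1 - gamma - lam))) ** (1 / beta)

# Hypothetical mean accuracies at five (positive) S/N levels.
sn = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
acc = np.array([0.52, 0.61, 0.74, 0.88, 0.95])
(alpha, beta, lam), _ = curve_fit(weibull, sn, acc, p0=[3.0, 2.0, 0.02],
                                  bounds=([0.1, 0.1, 0.0], [10.0, 10.0, 0.6]))
t75 = threshold(alpha, beta, lam)   # 75% accuracy threshold of this condition
```

The BMLD is then the difference, in dB, between the thresholds estimated in this way for the NdSd and NdS0 conditions.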

The individual results are shown in Figures 7 and 8. The figures show the mean performance at each S/N level together with the fits of the psychometric functions for each object, participant, and filter condition separately. The psychometric functions provide a good fit to the data in all conditions. Figure 7 shows the results of the 6.76 arcmin disparity condition and Figure 8 those of the 13.52 arcmin disparity condition. The stars in each graph mark the 75% accuracy threshold in the NdSd (dashed line) and the NdS0 (solid line) condition as estimated by the fitted psychometric function. If this estimated 75% accuracy level was above the highest measured accuracy level, we measured the BMLD at the highest available accuracy level. Binocular unmasking is indicated by a leftward shift of the psychometric function in the NdS0 condition. The difference between the two functions at the 75% accuracy threshold (starred) is the BMLD and measures the amount of binocular unmasking.
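For instance, if the fitted functions placed the 75% accuracy threshold at an S/N ratio of −10 dB in the NdSd condition and at −16 dB in the NdS0 condition, the BMLD would be 6 dB; these numbers are purely illustrative and are not taken from the data.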

[Figure 7 panels: dog, face, and house images at 6.76 arcmin disparity; filter conditions (non-shifted, shifted, unfiltered) in columns; participants in rows; axes: detection accuracy versus S/N ratio.]

Figure 7. Detection accuracy as a function of S/N ratio in the 6.76 arcmin condition of Experiment I. The lines show the fitted psychometric functions. The result is shown for each object condition (different panels), filter condition (in columns), and participant (in rows) separately. Triangles and dashed lines refer to the means and the fitted psychometric functions in the NdSd condition, respectively. Circles and solid lines refer to the means and the fitted psychometric functions in the NdS0 condition, respectively. Stars indicate the 75% accuracy threshold as estimated by the psychometric functions.

[Figure 8 panels: dog, face, house, table, and tree images at 13.52 arcmin disparity; filter conditions (non-shifted, shifted, unfiltered) in columns; participants in rows; axes: detection accuracy versus S/N ratio.]

Figure 8. Detection accuracy as a function of S/N ratio in the 13.52 arcmin condition of Experiment I. The lines denote the fitted psychometric functions. The result is shown for each object condition (different panels), filter condition (in columns), and participant (in rows) separately. Triangles and dashed lines refer to the means and the fitted psychometric functions in the NdSd condition, respectively. Circles and solid lines refer to the means and the fitted psychometric functions in the NdS0 condition, respectively. The stars indicate the 75% accuracy threshold as estimated by the psychometric function.

The results for the 6.76 and 13.52 arcmin conditions are similar and will therefore be discussed together. Clearly the BMLDs were largest in the shifted filter condition, irrespective of object and disparity condition. All but three participants showed a dichoptic detection advantage in this condition for each of the targets and for both levels of disparity. Only participant 37E in the dog condition at 6.76 arcmin, participant 16E in the tree condition at 13.52 arcmin, and participant 40E in the dog condition at 13.52 arcmin had BMLDs close to zero in the shifted filter condition. Small BMLDs were found in the other filter conditions. Binocular unmasking seemed to be absent with unfiltered images except for the house. Similarly, there was no evidence for binocular unmasking in the non-shifted filter condition except for the house. It seems that the house is the only stimulus that allows binocular unmasking in all three filter conditions. However, binocular unmasking with the house in the unfiltered and non-shifted filter conditions appears to be smaller than in the shifted filter condition. Moreover, binocular unmasking with the house seems larger in the unfiltered condition than in the non-shifted filter condition. We tested the effect of disparity in a 5 (object) x 2 (disparity) x 3 (filter condition) nested ANOVA. We only used the BMLDs of the house, face, and dog conditions, since these objects were tested at both disparity levels. In four cases participants did not reach the 75% detection threshold (6.76 arcmin: participant 24E in the shifted house condition, participant 21E in the unfiltered condition; 13.52 arcmin: participant 2E in the unfiltered house condition; participant 34E in the non-shifted table

condition). In these cases we calculated BMLDs not at the 75% accuracy level but at the highest accuracy level available. We only found significant main effects of filter condition and object condition, F(2,36) = 64.91, p < 0.001, and F(2,18) = 22.01, p < 0.001, respectively. Neither the main effect of disparity nor any interaction was significant, p > 0.1. Overall, the results suggested that the magnitude of binocular unmasking was the same for disparities of 6.76 and 13.52 arcmin. To assess whether the BMLDs actually reflect binocular unmasking, we constructed confidence intervals for the mean BMLDs. We estimated the standard error necessary for the calculation of the confidence intervals for the 6.76 and 13.52 arcmin conditions separately. Specifically, we calculated separate object-by-filter nested ANOVAs for both disparity conditions and used the error term of the interaction in the estimation of the standard error. The error terms for the 6.76 and 13.52 arcmin conditions were MSError = 2.697 and MSError = 2.017, respectively. The Bonferroni correction was done for each filter condition and disparity condition separately. For the sake of clarification it should be noted here that confidence intervals which enclose the value zero indicate that the amount of binocular unmasking does not differ significantly from zero.
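As a sketch of how such an interval can be computed from an ANOVA error term (the error degrees of freedom, the number of participants per mean, and the number of comparisons used below are assumptions for illustration, not values reported in the thesis):

```python
import numpy as np
from scipy import stats

def bonferroni_ci(mean_bmld, ms_error, n_per_mean, df_error, n_comparisons,
                  conf=0.95):
    """Bonferroni-corrected confidence interval for a mean BMLD, using an
    ANOVA error term as the variance estimate. Illustrative sketch only."""
    alpha = (1 - conf) / n_comparisons
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)
    half_width = t_crit * np.sqrt(ms_error / n_per_mean)
    return mean_bmld - half_width, mean_bmld + half_width

# Illustrative call: the MS error is the reported 13.52 arcmin error term,
# but the remaining arguments are assumed values.
low, high = bonferroni_ci(mean_bmld=6.4, ms_error=2.017, n_per_mean=4,
                          df_error=30, n_comparisons=5)
```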

Figure 9 shows the result of the analysis. The results of the 6.76 arcmin condition are given in panel A and the results of the 13.52 arcmin condition in panel B. Because we obtained similar results in both disparity conditions, we discuss the results together. If not otherwise noted, the results apply to both disparity conditions. BMLDs in the shifted filter condition were always significantly larger than zero. In particular, the mean BMLD of the table in the 13.52 arcmin condition was noticeably larger than those of the other objects. BMLDs in the non-shifted filter condition were not significantly different from zero. Similarly, BMLDs in the unfiltered condition were not significantly different from zero, except for the house condition. We examined whether the BMLDs obtained in the shifted filter condition of the object detection task are comparable to BMLDs in Gabor detection tasks. The binocular unmasking effect seems smaller for object detection than for Gabor detection at 6.76 arcmin. Speranza et al. (2001) reported a mean BMLD of about 12 dB in a Gabor detection task, while we found an almost 50% smaller mean BMLD of 6.53 dB. In contrast, the BMLDs of object detection and Gabor detection tasks seem to be comparable at 13.52 arcmin. We found a mean BMLD of 6.43 dB in the object detection task, which is comparable to the mean BMLD of 7.58 dB reported by Moraglia and Schneider (1990). Overall, the results of Experiment I suggest that binocular unmasking is observed in the detection of spectrally complex object images.

[Figure 9: bar plots of mean BMLD (dB) by object condition for the three filter conditions; panel A: 6.76 arcmin, panel B: 13.52 arcmin.]

Figure 9. Mean BMLDs of Experiment I shown for each filter condition and object separately. A: 6.76 arcmin condition. B: 13.52 arcmin condition. Bars show the 95% Bonferroni-corrected confidence interval.

However, the data suggest that the visual system can only take advantage of binocular summation if all spatial frequencies are unmasked. If only part of the spatial frequency information is unmasked, the binocular summation effect is very much reduced or absent.

Experiment II: Binocular unmasking and object categorization

Experiment II explored binocular unmasking effects in an object categorization task. Using the same five objects as in Experiment I, we tested object categorization in a one-interval forced-choice task. Each of the five objects was assigned to a particular button on the six-button button-box. On a given trial one of the five objects was randomly presented to the participant. The participant specified the object category (house, dog, face, tree, table) by pressing the corresponding button. The contrast was varied across trials to determine the psychometric functions. We further explored whether binocular unmasking for object categorization occurs over a wide range of disparities. Previous experiments with spectrally simple stimuli observed binocular unmasking in detection tasks up to large disparities (Moraglia & Schneider, 1990). However, at larger disparities the magnitude of binocular unmasking decreased. To examine whether object categorization benefits from binocular unmasking over the same range, we tested binocular unmasking in an object categorization task at 6.76, 13.52, and 27.04 arcmin, as well as at one larger disparity. We used the same images as in Experiment I.

Methods

The methods were the same as in Experiment I with the following exceptions.

Participants. In total 11 participants served in this experiment, one of them being the author (age: 18 to 32 years; 6 females and 5 males). All participants had normal Snellen acuity, normal contrast sensitivity as measured by the FACT contrast sensitivity test, and normal stereoacuity as measured by the Frisby stereotest (see Experiment I for the definition of normal vision).

Stimuli. To generate the stimuli for the other disparity levels, the five objects were filtered in the same way as described in Experiment I. The filter function was adjusted to each disparity level.

Procedure. Participants conducted a categorization experiment in a single-interval, five-alternative forced-choice task. The same six-button button-box as in Experiment I (two rows of three buttons) was used here. The first five buttons on the button-box were used to specify the object category. Each participant received a different button-object category assignment. The last button (lower right button) was used to initiate the trial. A trial started with a white fixation dot in the middle of the screen. Following a 120 ms period of a blank screen, the noise frame, the square noise patch, and one of the five objects were presented for 500 ms. The object on the current trial was determined before the experiment in a pseudo-random fashion. Each object was presented an equal number of times. At the end of this sequence the participant saw a black screen and had to decide which of the five objects had appeared. The

participant categorized the object by pressing the corresponding button on the button-box. A 500 Hz tone provided feedback after each correct response. After the feedback the fixation dot appeared again. The participant initiated the next trial by pressing the lower right button. At the beginning of the first session, participants familiarized themselves with the objects, and with the button associated with each object, by looking at pictures of the unfiltered objects. Each participant then received several training runs at the beginning of the first session. In these runs the participants simply had to categorize the unfiltered objects, which were presented with zero disparity and in the absence of visual noise, by pressing the corresponding button on the button-box. Participants were required to reach a categorization accuracy of at least 80% to proceed to the experiment.

Results and Discussion

We fitted the psychometric functions to the data of each participant using the method described in Experiment I (with γ = 0.20, the chance level in the five-alternative task). The psychometric functions fit the data well. The individual results are shown in Figure 10. Each panel of Figure 10 refers to a different disparity condition. In general we observed binocular unmasking only in the shifted-filter condition. Binocular unmasking was never observed in the non-shifted filter or in the unfiltered condition. Binocular unmasking in the shifted filter condition depended on the disparity level. At 6.76 arcmin binocular unmasking seems to be small. When the disparity level was increased to 13.52 arcmin, binocular unmasking became larger. A further increase of the disparity level seems to erase the binocular

unmasking effect. These binocular unmasking results for the categorization of objects differ somewhat from those found for the detection of simple Gabor patterns. Previous detection studies reported that binocular unmasking in detection experiments is largest at small disparities and decreases with increasing disparity (Moraglia & Schneider, 1990; Speranza et al., 2001). Binocular unmasking was observed up to large disparities in Gabor detection tasks (Moraglia & Schneider, 1990). We examined whether disparity had an effect on BMLDs in a one-way ANOVA using only the data of the shifted-filter condition. The non-significant effect of disparity, F(3,12) = 2.05, p = 0.160, suggested that disparity did not affect BMLDs significantly. To determine the conditions in which we find binocular unmasking, we tested whether BMLDs were significantly different from zero using confidence intervals. We calculated the 95% Bonferroni-corrected confidence intervals using the error term of the ANOVA. The Bonferroni correction was done for each filter group separately. The results are shown in Figure 11 for each filter condition and disparity condition. None of the BMLDs were significantly different from zero except for the BMLDs of the shifted filter condition at 13.52 and 6.76 arcmin.

[Figure 10 panels: one panel per disparity condition; filter conditions (non-shifted, shifted, unfiltered) in columns; participants in rows; axes: categorization accuracy versus S/N ratio.]

Figure 10. Categorization accuracy as a function of S/N ratio in Experiment II. The lines denote the fitted psychometric functions. The results are displayed for each participant (across rows), filter condition (across columns), and disparity (across panels) separately. Triangles and dashed lines refer to the means and the fitted psychometric functions in the NdSd condition, respectively. Circles and solid lines refer to the means and the fitted psychometric functions in the NdS0 condition, respectively. The stars indicate the 75% accuracy threshold as estimated by the psychometric function.

The BMLDs of these two conditions did not differ significantly from each other, t(3) = 0.73. To assess whether the binocular summation effect in an object detection task is similar to the one in an object categorization task, we compared the BMLDs of Experiments I and II in a 3 (filter condition) x 2 (experiment) x 2 (disparity) nested ANOVA. The analysis concerned only the data in the 13.52 and 6.76 arcmin conditions. We observed a significant main effect of experiment, F(1,26) = 4.96, p = 0.033, and of filter condition, F(2,82) = 32.46, p < 0.001. The main effect of disparity, however, was not significant, F(1,26) = 0.05. The interaction of filter condition and experiment was also significant, F(2,82) = 3.69, p < 0.05. All other interactions were non-significant, p > 0.05. Overall, these results suggested that BMLDs vary significantly with the type of task (detection or categorization). However, the effect of the filter condition differed across tasks. Disparity had no significant effect whatsoever on the BMLDs. We inspected the significant interaction between task and filter condition using Bonferroni-adjusted independent t-tests. The results show that BMLDs declined significantly between Experiment I and II only in the shifted filter condition. The BMLDs of the other filter conditions were statistically the same in Experiments I and II. The significant decrease of BMLDs in the shifted-filter condition across experiments suggests that unmasking is smaller in categorization than in detection tasks. In summary, we found evidence for binocular unmasking also in object categorization tasks. Binocular unmasking was only observed in the 6.76 and 13.52 arcmin conditions.

[Figure 11: mean BMLD (dB) as a function of disparity (arcmin) for the non-shifted, shifted, and unfiltered filter conditions.]

Figure 11. Mean BMLDs of Experiment II shown for each filter condition and disparity condition separately. Bars show the 95% Bonferroni-corrected confidence interval.

In both the object detection and the object categorization task we find binocular unmasking only in the shifted-filter condition. Thus, binocular unmasking seems to occur only if all spatial frequency components can be unmasked. Moreover, the effect of binocular unmasking seems to be attenuated in the object categorization task compared to the object detection task.

Experiment III: Binocular unmasking and object identification

In Experiment III we investigated the effects of binocular unmasking in an object identification task. We chose to investigate object identification by means of face identification, since this is one of the identification tasks most frequently performed by humans. We took pictures of the faces of four more persons (two female and two male) and manipulated them in exactly the same manner as described in Experiment I, displaying only the oval central part of each face. Participants identified these faces in the same one-interval forced-choice task that was used in Experiment II. We did not expect binocular unmasking in identification tasks to occur at disparity levels at which we did not observe binocular unmasking in categorization tasks. Hence, we tested face identification only at the 13.52 arcmin disparity, for it was at this disparity level that we had observed the largest unmasking effect in the previous two experiments.

Methods.

The methods were identical to those described in the earlier experiments. We will therefore only note deviations from the previous methods.

Participants. We recruited four participants for this experiment (age: 20 to 32 years; 3 females and 1 male). All participants had normal Snellen acuity, normal contrast sensitivity as measured by the FACT contrast sensitivity test, and normal stereoacuity as measured by the Frisby stereotest (see Experiment I for the definition of normal vision).

Stimuli. We created four more face stimuli from the pictures of two female and two male persons. The same restrictions and methods as described in Experiment I applied to the generation of these four face stimuli.

Procedure. The procedure was identical to that of Experiment II, except that participants were trained on the five faces at the beginning of the experiment until they could identify the faces with 80% accuracy. Each face was assigned a specific button on the response box (no sham names were used). The face-button assignment was different across participants.

Results and Discussion.

The individual results along with the fitted psychometric functions are shown in Figure 12. The psychometric functions provide a good fit to the data. Similar to the previous experiments, the figure suggests that binocular unmasking was largest in the shifted-filter condition. No evidence for binocular unmasking was found in the non-shifted filter condition. Also, no binocular unmasking was

found in the unfiltered condition, except for one participant who seemed to show some evidence of binocular unmasking. Using a one-way within-subject ANOVA, we found that filter condition had a significant effect on BMLDs in the face identification task, F(2,6) = 12.66, p = 0.007. The confidence interval plot in Figure 13 shows that the mean BMLD in the shifted filter condition is significantly different from zero. No evidence for binocular unmasking was observed in the other filter conditions. We compared differences in binocular unmasking in the object detection, object categorization, and object identification tasks at the 13.52 arcmin disparity level. Because binocular unmasking occurred only in the shifted filter condition, we compared only the BMLDs of the shifted filter condition across all three tasks using a Tukey post-hoc comparison. We did not find any significant difference in unmasking between the categorization and identification tasks. However, the BMLDs in both the categorization and the identification task were significantly smaller than in the detection task. In conclusion, we also found binocular unmasking in object identification tasks when all spatial frequencies are unmasked. Moreover, binocular unmasking improved object identification to the same degree as object categorization.

[Figure 12 panels: face identification at 13.52 arcmin disparity; filter conditions (non-shifted, shifted, unfiltered) in columns; participants in rows; axes: identification accuracy versus S/N ratio.]

Figure 12. Identification accuracy as a function of S/N ratio in Experiment III. The lines show the fitted psychometric functions. The results are given for each participant (across rows) and filter condition (across columns) separately. Triangles and dashed lines refer to the means and the fitted psychometric functions in the NdSd condition, respectively. Circles and solid lines refer to the means and the fitted psychometric functions in the NdS0 condition, respectively. The stars indicate the 75% accuracy threshold as estimated by the psychometric function.

[Figure 13: mean BMLD (dB) by filter condition (non-shifted, shifted, unfiltered) in the object identification task.]

Figure 13. Mean BMLDs in the object identification task listed for each filter condition separately. The bars indicate the 95% Bonferroni-corrected confidence interval.

General discussion

The present work investigated whether the perception of noise-masked images depicting natural objects can be enhanced by the presence of stereoscopic depth cues. The binocular summation model predicts that object detection, classification, and identification depend upon the presence of stereoscopic depth cues and their relation to the spatial frequency content of the stimuli. We subdivided object perception into object detection, object categorization, and object identification and examined the occurrence of binocular unmasking in these three tasks separately. Furthermore, to investigate the influence of the spatial frequency content of the image on binocular unmasking, we created three versions of the images under study. Specifically, we determined the spatial frequency content that is critical for binocular unmasking using the binocular summation model. In the non-shifted condition we removed the spatial frequency components of an image that are essential for binocular unmasking. In the shifted filter condition we removed those spatial frequency components of an image that do not contribute to binocular unmasking. And in the unfiltered condition the spatial frequency content of the images was left unaltered. Overall, we find that binocular unmasking occurs in the detection, categorization, and identification of noise-masked object images. Specifically, dichoptic viewing enhances task performance under conditions deemed optimal by the summation model. Under these conditions, the presence of relative disparity cues leads to 3-6 dB lower thresholds compared to when they are

absent. However, we find the magnitude of the effect to vary with the type of task. Specifically, depth cues enhanced object detection to a significantly larger extent than object categorization or object identification. Binocular summation, however, aided the categorization and the identification of objects by the same amount. Interestingly, the effect of binocular unmasking in the detection of noise-masked visual objects is as large as in the detection of psychophysical stimuli at certain disparity levels; specifically, at a disparity of 13.52 arcmin. With the smaller disparity (6.76 arcmin) employed here, however, the detection of psychophysical stimuli benefits more from the presence of stereoscopic depth cues than the detection of object images. As we will explain later, this finding might be understood in terms of the binocular summation model. We further found that binocular unmasking requires all relevant spatial frequency components of the object image to be unmasked. When an image consists mainly of frequency components that cannot be unmasked (non-shifted condition), binocular unmasking was not observed. Similarly, participants showed no unmasking effects with images in which some frequency components cannot be unmasked. An exception is the detection of the house image, which led to binocular unmasking regardless of its spatial frequency content. Possible explanations for this unusual observation are addressed in the section on the fitting of the binocular summation model. To investigate the degree to which the binocular summation model is able to explain our findings, we fitted the binocular summation model to the data of the object detection task. Specifically, we calculated the BMLDs as predicted by the

binocular summation model and compared them to the obtained BMLDs for each condition separately. The binocular summation model assumes that there are multiple disparity channels tuned to different degrees of horizontal and vertical disparity. Let LL[x,y] represent the distribution of luminance in the left-eye retinal image. When the right-eye image is the same as the left-eye image but displaced horizontally by dx and vertically by dy, then LR[x,y] = LL[x + dx, y + dy]. In the model, a disparity channel, D[α,β], is characterized by two parameters, α and β, where α and β represent the horizontal and vertical disparities, respectively, for this channel. When the right-eye image is a displaced version of the left-eye image, the summed binocular image in the D[α,β] channel becomes

LD[α,β][x,y] = LL[x,y] + LL[x + dx + α, y + dy + β]   (6)

where dx is the horizontal displacement and dy is the vertical displacement of the right-eye image. Let FL[ε,η] be the spectrum of LL[x,y]. According to equation (4), the spectrum of the binocular image in this channel becomes

FD[α,β][ε,η] = FL[ε,η] (2 + 2 cos[2π ε (dx + α) + 2π η (dy + β)])   (7)

In the model, each disparity channel contains an array of orientation-specific spatial-frequency filters. The spectral profile, C[ε,η], of each of these spatial-frequency filters is given by

C[ε,η] = e^(−b²(ε − ε̄)² − b²(η − η̄)²)   (8)

, where ε and η are the horizontal and vertical spatial frequency variables, respectively; ε̄ and η̄ refer to the horizontal and vertical center frequencies of the filter, and b determines the bandwidth of the filter. Now, when the input to disparity channel D[α,β] is as specified in equation (7), the output of a filter is obtained by multiplying equation (7) by equation (8) and integrating over spatial frequency. In this way the response of each of the spatial-frequency filters in each disparity channel to input LL[x,y] can be determined. The model assumes that threshold is reached when the S/N ratio in any one of these filters in any one of these channels reaches a critical value. Hence, it is assumed that the visual system monitors the activity of all of the spatial-frequency filters in each disparity channel, and responds that an object is present when the S/N ratio in any one of these filters reaches a criterion value. Hence, to find the BMLD predicted by the model, we found the spatial-frequency filters within the array of disparity channels that produce the highest average S/N ratio in the NdSd and NdS0 conditions. Call these filters FilterMax,NdSd and FilterMax,NdS0, respectively. Note that FilterMax,NdSd and FilterMax,NdS0 could have different values of ε̄ and η̄ (but not of b) and could come from different disparity channels. The model assumes that the object is detected in the NdSd condition when the S/N ratio reaches a critical value, cS/N, in FilterMax,NdSd. We also assumed that the object is detected in the NdS0 condition when the S/N ratio reaches cS/N in FilterMax,NdS0. Note that in the NdSd condition, according to our model, detection is based on activity in the disparity channel with zero shift both vertically and horizontally, that

is, in disparity channel D[0,0]. The spectral profile of the summed noise in the D[0,0] disparity channel, for the NdSd condition, is given by substituting the power spectrum of the noise, G[ε,η], for FL[ε,η] in Equation (7) and setting dx = d, dy = 0, α = 0, and β = 0. By convolving the spectral profile of the summed noise with each of the spatial-frequency filters in the D[0,0] channel in the NdSd condition, we were able to determine the response of each of these filters to the noise pattern alone. In this model we also assumed that there was a certain amount of internal noise, NInt, in each spatial-frequency filter, and that this noise was independent of the external noise. Hence, the total response of the filter in the absence of a signal is PTot = PExt + PInt, where PInt is the average power of the internal noise, and PExt the power at the output of the filter due to the external noise. The second step was to determine the average response of each filter in the D[0,0] channel to an object presented at the threshold contrast. To do this we determined the summed power spectrum of the three different versions of each object (filtered, filtered and phase shifted, and unfiltered) in the D[0,0] disparity channel, when that object's contrast was at its threshold value, CNdSd. The summed power spectrum for each of the three versions of an object was obtained by substituting the power spectrum of the object at threshold for FL[ε,η] in Equation (7), again setting dx = d, dy = 0, α = 0, and β = 0. By convolving the object's summed power spectrum at threshold with the spectral profile of each filter, we determined the average response of each filter when the input was one of the three versions of an object whose contrast level was such that it was detected 75% of the time. The ratio of the average object power at the output of

the filter due to the object, to the total noise power in the filter, defined the S/N ratio in that filter. We then looked for the filter, FilterMax,NdSd, producing the largest S/N ratio. Because the contrast of the object was set to the value that produced 75% correct detection, the S/N ratio in FilterMax,NdSd defined cS/N for the version of the object under consideration in the NdSd condition. To find the channel with the maximum S/N ratio in the NdS0 condition, we first note that the power spectrum of the summed left- and right-eye noises depends on the disparity channel in which the summation occurs. Hence, for disparity channel D[α,β] the spectrum of the summed left- and right-eye noises is given by substituting the power spectrum of the noise, G[ε,η], for FL[ε,η] in Equation (7) and setting dx = d and dy = 0. By convolving this power spectrum with each of the filters in D[α,β], we can obtain the average power passed by each filter in D[α,β] when the input is the noise image only. For a fixed level of contrast we then determine the spectrum of the summed left- and right-eye object images in D[α,β] by substituting the power spectrum of the object for FL[ε,η] in Equation (7), again setting dx = d and dy = 0. Convolving this power spectrum with each filter in D[α,β] yields the average power output due to the object for each filter in D[α,β]. In this way we can compute the S/N ratio for each filter in each of the disparity channels. If we do this for the entire array of filters for all of the disparity channels, we can then determine the binocular filter, FilterMax,NdS0, having the maximum S/N ratio for condition NdS0. We can then adjust the contrast of the object so that the S/N ratio in FilterMax,NdS0 is equal to cS/N. The value of

contrast that produces an S/N ratio equal to cS/N in FilterMax,NdS0 is the predicted threshold contrast, pred[CNdS0], for condition NdS0. The predicted BMLD is therefore

Pred[BMLD] = 20 · log10[ CNdSd / pred[CNdS0] ]   (9)

In this simple binocular summation model two parameters are free to vary. We determined the best-fitting BMLDs by varying the disparity channel's bandwidth, b, and the spectral density of the internal noise in the disparity channel. The disparity channel's bandwidth (in octaves) and the internal noise level were held constant across all spatial frequencies in all disparity channels. The model minimized the squared residuals between the predicted and observed BMLDs across all objects of a particular task and disparity level. Finally, the disparity resolution in the current implementation of the model was the same as the pixel size of the monitor (3.38 arcmin), and the frequency resolution of the filter array was 0.01 cpd.
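To make the search over filters and disparity channels concrete, the sketch below gives a strongly simplified, one-dimensional (horizontal frequencies only) illustration of the prediction pipeline. The frequency range, channel spacing, flat noise spectrum, toy object spectrum, and parameter values are all assumptions for illustration; they do not reproduce the thesis's two-dimensional implementation, and the final contrast-ratio step ignores the small departure from quadratic scaling introduced by the internal noise.

```python
import numpy as np

# One-dimensional sketch of the model's BMLD prediction (assumed values).
freqs = np.arange(0.1, 8.0, 0.01)                 # cpd; filter-array resolution
df = 0.01
channel_shifts = np.arange(-0.5, 0.5, 3.38 / 60)  # deg; disparity channels
noise_power = np.ones_like(freqs)                 # flat external noise (assumed)
internal_noise = 0.04                             # internal noise power (assumed)

def filter_profile(center, b=2.0):
    """Gaussian spectral profile of one spatial-frequency filter, cf. eq. (8);
    b is a width parameter, not the octave bandwidth itself."""
    return np.exp(-b ** 2 * (freqs - center) ** 2)

def summed_power(spectrum, image_disp, channel_shift):
    """Power after binocular summation in one disparity channel, cf. eq. (7)."""
    return spectrum * (2 + 2 * np.cos(2 * np.pi * freqs * (image_disp + channel_shift)))

def best_sn(object_power, noise_disp, signal_disp):
    """Highest S/N ratio over all filters and disparity channels."""
    best = 0.0
    for shift in channel_shifts:
        noise_out = summed_power(noise_power, noise_disp, shift)
        signal_out = summed_power(object_power, signal_disp, shift)
        for center in np.arange(0.5, 7.5, 0.25):
            prof = filter_profile(center)
            sn = (np.sum(signal_out * prof) * df) / (np.sum(noise_out * prof) * df + internal_noise)
            best = max(best, sn)
    return best

# Toy object spectrum and the 13.52 arcmin noise disparity; since power scales
# with contrast squared, the predicted threshold-contrast ratio is the square
# root of the ratio of the best S/N values in the NdS0 and NdSd conditions.
obj = np.exp(-freqs)
d = 13.52 / 60.0
bmld = 20 * np.log10(np.sqrt(best_sn(obj, d, 0.0) / best_sn(obj, d, d)))
```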

In the 6.76 arcmin detection condition we obtained the best fit for a disparity channel with a half-power point bandwidth of 0.51 octaves and a spectral density of the internal noise of 1.35 deg². With these parameters we were able to account for 57.71% of the variance in the BMLD scores. Figure 14A shows that the binocular summation model is able to predict the overall trend within the data. Specifically, it predicts that the BMLDs should be close to zero in the non-shifted filter condition, large in the shifted filter condition, and close to zero in the unfiltered condition. The model is, however, unable to predict the larger BMLD associated with the house in the non-shifted filter and unfiltered conditions. The house might be better detected than any of the other objects due to its mainly vertical spatial frequencies. Several pieces of evidence suggest that stereoscopic vision is most sensitive to horizontal disparities. Horizontal disparities are best detected with vertically oriented stimuli (e.g., Ogle, 1955, p. 496; Ebenholz & Walchli, 1965). It is possible that the binocular summation mechanism also has its peak sensitivity for vertical orientations. In this case, stimuli with more energy at vertical spatial frequencies should produce a better S/N ratio in vertically oriented disparity channels than stimuli whose energy is more equally dispersed over several orientations. One possible way of accommodating this within the model would be to set the level of internal noise to be lower in the vertical and horizontal spatial frequency channels. Because the house has more energy at vertical spatial frequencies than any of the other objects regardless of the filter condition, one would expect larger BMLDs with the house. Moreover, the binocular summation model does not predict an elevated BMLD for the face in the shifted filter condition. The reason for this is that the face in the shifted-filter condition has little energy at and around the unmasked spatial frequencies. The S/N ratio in this region is actually so low that other disparity channels with a different center frequency provide a larger S/N ratio in the NdS0 condition. Face detection performance therefore does not benefit from binocular unmasking due to a lack of energy in the unmasked spatial frequency regions. We obtained a better fit of the binocular summation model to our data in the 13.52 arcmin condition.

With the best-fitting channel bandwidth (in octaves) and a spectral density of the internal noise of 1 deg², the fitted model explained a larger proportion of the overall variance in the observed BMLD scores than in the 6.76 arcmin condition. The predicted and obtained BMLDs are shown in Figure 14B for each object separately. The model is able to predict the overall trend in the data: small BMLDs in the non-shifted filter condition, the largest BMLDs in the shifted-filter condition, and small BMLDs in the unfiltered condition. The model tends to underestimate the magnitude of the BMLDs. The largest underestimations occur for the house in the non-shifted filter condition, the face and the table in the shifted filter condition, and the house and the tree in the unfiltered condition. Only for the tree in the shifted filter condition does the model predict a larger BMLD than the observed one. Interestingly, we found that for both the 6.76 arcmin and the 13.52 arcmin condition the model predicts BMLDs close to zero for the face detection task. However, our data clearly show that binocular unmasking occurs for faces, at least in the shifted filter condition. One reason for this discrepancy might be that

[Figure 14: predicted and obtained BMLDs (dB) by object condition for the three filter conditions; panel A: 6.76 arcmin, panel B: 13.52 arcmin.]

Figure 14. Predictions of the binocular summation model in Experiment I. The predicted (x) and obtained BMLDs are shown for each object, for the 6.76 arcmin condition (Panel A) and the 13.52 arcmin condition (Panel B) separately.

the original face image has most of its energy at low spatial frequencies (up to 1.3 cpd). There is little energy at higher spatial frequencies in the face image. Yet, the cancellation of noise spatial frequencies occurs at higher spatial frequencies: at 4.4 cpd in the 6.76 arcmin condition, and at 2.2 and 6.6 cpd in the 13.52 arcmin condition. Hence, the noise frequency components cancel at spatial frequencies at which the face image has virtually no energy. Consequently, a binocular unmasking advantage does not occur, because the face image has only little energy at the unmasked noise frequencies. The binocular summation model predicts BMLDs of similar magnitude for the 6.76 and the 13.52 arcmin conditions. The mean predicted BMLD in the 6.76 arcmin condition is 4.28 dB and in the 13.52 arcmin condition 3.65 dB. Our results also show that the magnitude of binocular unmasking is similar in the 6.76 arcmin and the 13.52 arcmin conditions. Hence, unlike for the Gabor detection task, where the binocular summation model predicts a sharp increase of BMLDs with increasing disparity, the model predicts a smaller increase of BMLDs in the detection of objects. The observed non-significant difference between the BMLDs in the 6.76 and 13.52 arcmin conditions might result from an increase that is too small to reach significance. Overall, the estimated model parameters deviate substantially from previous reports. For example, the best-fitting binocular summation model at 6.76 arcmin has an unusually high internal noise level of 1.35 deg². This value corresponds to 33.75% of the external noise level. Moreover, fits of the binocular summation

model to the data from Gabor detection tasks found internal noise levels that were several orders of magnitude smaller. Speranza et al. (2001), for example, found a maximum spectral density of the internal noise that was far below this value. Moreover, we found an unusually small bandwidth of the disparity channel of about 0.5 octaves. Speranza and colleagues (2001) determined the disparity channel's bandwidth in a binocular unmasking task to be about 1 octave, which is twice the size of what we found. Other investigators found bandwidths of a similar magnitude to Speranza et al. (Oruç, Landy, & Pelli, 2006; Stromeyer & Julesz, 1972). We were interested in the goodness of fit of the binocular summation model with more reasonable internal noise levels and therefore placed further restrictions on our model fitting. We let the internal noise level and the disparity bandwidth vary from 0 to 0.04 deg² and from 0.5 to 1.5 octaves, respectively. Using these restrictions the binocular summation model provides a very poor fit to the data in both the 6.76 arcmin and the 13.52 arcmin condition. The explained variance was only 3.05% in the 6.76 arcmin condition, using an internal noise level of zero and a disparity bandwidth of 1.5 octaves. The obtained fit in the 13.52 arcmin condition also decreased, but it was still better than in the 6.76 arcmin condition. Here the model can account for 50.02% of the variance. The spectral density of the internal noise was estimated at 0.04 deg².

Figure 15. Predicted (x) and obtained BMLDs (dB) in Experiment I when an upper limit is placed on the internal noise, plotted against object condition for each filter condition (non-shifted filter, shifted filter, unfiltered). Panel A: results of the 6.76 arcmin condition (dog, face, house); Panel B: results of the arcmin condition (dog, face, house, table, tree).

The binocular summation model predicts BMLDs that are close to zero for all objects and filter conditions in the 6.76 arcmin condition; these results are shown in Figure 15A. Likewise, the model predicts BMLDs close to zero in the arcmin condition, except for the house, table, and tree in the shifted filter condition (Figure 15B). For these objects the predicted BMLDs provide a reasonable fit to the observed BMLDs. In summary, the binocular summation model fits the detection data only moderately well when very high internal noise levels are assumed, and placing an upper boundary on the internal noise level worsens the fit further, leaving an overall poor fit. Note also that the binocular summation model is unable to explain the differences between Experiments I and II. We have shown above that the mean BMLDs are significantly lower in the object categorization task than in the object detection task. The binocular summation model cannot account for this result: it predicts the same BMLDs for both the detection and categorization tasks, since both tasks employ the same stimuli. Finally, we fitted the binocular summation model to the data of the identification task. The best fitting model was able to explain 43.64% of the variance. As with the best fitting model of the detection task, the disparity channel's bandwidth was octaves and the internal noise level was 1.15 deg². When an upper limit of 0.04 deg² was placed on the internal noise level the

binocular summation model provided a poor fit. With this restriction the model is able to explain only 12.68% of the variance. Overall, we find binocular unmasking effects in the perception of noise-masked objects. In line with the binocular summation model, the largest binocular unmasking effect occurs when the spatial frequencies removed from the image are not important for binocular summation. However, even under these conditions the fits of the binocular summation model to our data suggest that the model can account for only part of the results. Given that natural objects rarely have a spatial frequency spectrum optimized for binocular unmasking, it seems unlikely that binocular unmasking plays a significant role in everyday life situations. An exception might be stimuli that naturally contain a large proportion of vertical spatial frequency components. It is noteworthy that the moderate to poor fit of the model in the categorization and identification experiments partly results from fitting a very constrained binocular summation model to our data. The binocular summation model assumes that task performance is based on the output of only one disparity channel. This assumption might be appropriate for detection tasks, as detecting the presence of a stimulus does not require any information about the stimulus itself; a single channel might therefore provide sufficient information to determine whether an object is present. However, a single disparity channel cannot provide sufficient information to unambiguously distinguish between different objects, as several different objects might activate the same disparity channel to the same degree. Hence it seems likely that the categorization and

identification of an object require the integration of information across perceptual channels. The fitted binocular summation model does not take this integration of information across perceptual channels into account for the categorization and identification experiments, so it is not surprising that we obtain a poor fit with the model in those tasks. Another piece of evidence pointing to the combination of the outputs of several perceptual channels in the categorization and identification tasks is that unmasking is observed with the image spatial frequency content of the shifted-filter condition. The same spatial frequency content, plus additional spatial frequencies, is present in the unfiltered condition; here, however, no evidence of unmasking is found. If unmasking were based on a single perceptual channel, we would expect unmasking to occur in both conditions. The lack of unmasking in the unfiltered condition can be understood if one assumes that the outputs of several perceptual channels interact when they are combined. Overall, the aim of the present study was to investigate binocular unmasking with spectrally complex images of natural objects in detection, categorization, and identification tasks. This investigation also invites speculation about whether binocular unmasking of natural objects occurs under real-life conditions. Although far more research is needed in this regard, the results from the unfiltered condition might speak to possible unmasking effects with natural objects in real-life conditions, since these stimuli (compared with the other stimuli in this thesis) are most similar to natural objects and their frequency spectrum has not been altered. The results from the unfiltered

condition suggest that unmasking might occur only for objects that consist mainly of vertical spatial frequencies, because in the present experiment unmasking occurred only for the unfiltered house stimulus; no unmasking was found for any other stimulus. It is also unclear whether unmasking would occur for tasks other than detection. Based on the present results one would be inclined to conclude that binocular unmasking might be restricted to detection tasks, since no unmasking was found with unfiltered stimuli in the categorization and identification tasks. However, it is important to note that although the current experimental conditions were closer to natural viewing conditions than previous detection tasks with Gabor stimuli, large differences between the current experimental conditions and real-life viewing conditions still exist. These differences concern both the type of stimuli used and the tasks employed in the present experiments. For example, as will be outlined below, the current stimuli differ from real-life objects in several ways: opacity, three-dimensional extent, and mask type. Hence, to make more conclusive statements about the usefulness of binocular unmasking in the perception of cluttered natural objects, more research is needed with experiments that simulate real-life situations and viewing demands more adequately. Visual processes involved in the detection, categorization, and identification of objects might differ in terms of their sensitivity. Our results also extend recent findings about the processing stages involved in object perception. These suggest that the detection and

categorization of objects share the same underlying visual processes, while the identification of an object requires additional visual processes (Grill-Spector & Kanwisher, 2005). In particular, Grill-Spector and Kanwisher (2005) found that object detection and categorization were nearly identical with regard to accuracy and response times, suggesting that detection and categorization mechanisms work equally accurately and equally fast. Moreover, detection and categorization appeared to be closely linked: when participants were given both a detection and a categorization task on the same trial, a trial-by-trial analysis showed a high correlation between detection and categorization performance. In contrast, object identification always required more time and was less accurate than either detection or categorization. Taken together, Grill-Spector and Kanwisher's results suggest that object detection and categorization might be based on the same mechanism, while object identification is mediated by subsequent higher visual processes. If this were the case, we would expect BMLDs not to differ significantly between the detection and the categorization task, whereas BMLDs of both the detection and categorization tasks should differ significantly from those of the identification task. Clearly, our comparison of BMLDs across the three tasks does not support this suggestion. These different results could be reconciled by assuming that the visual processes involved in the detection and categorization of objects are partly independent but operate in close temporal proximity. Recent ERP studies indirectly support the suggestion that visual processes dedicated to the detection and categorization of objects are closely linked in time. Participants categorized

images as animal or non-animal images in a backward masking task with a stimulus presentation time of 6 ms (Bacon-Mace, Mace, Fabre-Thorpe, & Thorpe, 2005). An inter-stimulus interval between image and mask of 31 ms was sufficient for participants to reach an accuracy level of about 75%. The detection and recognition of an object should therefore be completed within a time frame of approximately 37 ms. Hence, reaction time tasks might not be sufficiently sensitive to tease apart these two processes in the temporal domain: to dissociate them, a reaction time task would have to be sensitive enough to pick up differences of a similar magnitude. An inspection of the figures in Grill-Spector and Kanwisher's (2005) study, however, suggests that the standard errors of the reaction times were about ms. One therefore cannot rule out that the timing differences between detection and categorization were simply too small to be picked up by a standard reaction time task. Taken together, future studies aimed at uncovering the processes underlying object perception should try to differentiate detection, categorization, and identification processes along at least three dimensions: time, accuracy, and sensitivity.

Future directions

The results of the present experiments point to several topics that might be worth investigating in future research. As noted, binocular summation appears to be most sensitive to vertical spatial frequencies. This orientation sensitivity of the binocular summation process can be investigated simply with a binocular unmasking task in which the

detection of a Gabor pattern is measured as a function of the Gabor's orientation. If binocular summation is most sensitive to vertical spatial frequencies, one would expect the largest BMLD with vertically oriented Gabors and smaller BMLDs with Gabors whose orientation deviates from the vertical meridian (a sketch of how such oriented stimuli could be generated is given at the end of this section). Future research could also investigate binocular summation with simulated real-life objects (e.g., in virtual environments). Although the present study investigated binocular summation with more ecological stimuli, our stimuli still differ in three major ways from real-life objects. First, our objects always appear transparent. This corresponds to a natural viewing condition in which the observer, say, looks at the reflection of an object in a window; beyond such special cases, of course, objects rarely appear transparent in real life. To further examine the ecological validity of the binocular summation process, it would be interesting to determine whether the effects of binocular summation on task performance are also found with non-transparent (solid) objects. Second, objects were masked by visual noise in the present study. This corresponds, for example, to an everyday viewing condition in which the observer has to perceive an object in fog. In real-life situations, however, objects are often perceived amidst other objects, which may mask them. Future research could investigate whether binocular summation is also beneficial in the perception of an object that is masked by other objects. Third, real objects have a three-dimensional extent and therefore activate disparity channels tuned to different disparities. To date it is not known whether and in which way the current binocular summation model

has to be modified in order to account for future binocular unmasking data obtained with three-dimensional objects. The fit of the binocular summation model to the data has shown that the simple binocular summation model does not account well for the perception of spectrally complex images. In particular, as outlined above, it seems unlikely that the visual system monitors only the disparity channel with the highest S/N ratio in categorization and identification tasks. The current findings therefore raise the question of how the outputs of disparity channels are combined when spectrally complex images are categorized and identified. Research directed at this issue might help to close the gap between our knowledge of lower-order and higher-order visual processes, and it would also advance our understanding of how the outputs of monocular channels are combined when spectrally complex images are viewed. Note that, irrespective of the ecological validity of the binocular summation process, further investigation of binocular summation might thus enhance our understanding of object recognition.
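As a concrete illustration of the orientation experiment proposed above, the following sketch (hypothetical Python using numpy; not code from this thesis, and all parameter values are arbitrary placeholders) generates a Gabor pattern, i.e. a Gaussian enveloped sinusoid, at a specified orientation and embeds it in Gaussian visual noise. Measuring detection thresholds while varying the orientation argument would yield the BMLD-versus-orientation function described above.

    import numpy as np

    def gabor_in_noise(size=256, cycles_per_image=8.0, orientation_deg=0.0,
                       sigma_frac=0.15, contrast=0.2, noise_sd=0.1, seed=None):
        # orientation_deg is the direction of luminance modulation measured from
        # horizontal: 0 deg gives vertical bars (a vertically oriented Gabor),
        # 90 deg gives horizontal bars.
        rng = np.random.default_rng(seed)
        coords = np.linspace(-0.5, 0.5, size)
        xx, yy = np.meshgrid(coords, coords)
        theta = np.deg2rad(orientation_deg)
        u = xx * np.cos(theta) + yy * np.sin(theta)   # axis along which the carrier modulates
        carrier = np.cos(2.0 * np.pi * cycles_per_image * u)
        envelope = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_frac ** 2))  # Gaussian envelope
        gabor = contrast * carrier * envelope
        noise = rng.normal(0.0, noise_sd, (size, size))   # masking visual noise
        return gabor + noise

In a binocular unmasking version of the task, the same noise field could be presented to both eyes while the Gabor is horizontally offset in one eye's image, placing it in front of or behind the noise.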

References

Ahumada, A. J., & Beard, B. L. (1996). Object detection in a noisy scene. In Rogowitz, B., & Allebach, J. (Eds.), Human Vision, Visual Processing, and Digital Display VII, SPIE Proceedings, 2657. Bellingham, WA: SPIE.

Bacon-Mace, N., Mace, M. J.-M., Fabre-Thorpe, M., & Thorpe, S. J. (2005). The time course of visual processing: Backward masking and natural scene categorization. Vision Research, 45,

Biederman, I. (1985). Human image understanding: Recent research and a theory. Computer Vision, Graphics, and Image Processing, 32,

Blake, R., & Fox, R. (1973). The psychophysical inquiry into binocular summation. Perception & Psychophysics, 14,

Blake, R., Sloane, M., & Fox, R. (1981). Further developments in binocular summation. Perception & Psychophysics, 30,

Campbell, F. W., & Green, D. G. (1965). Monocular versus binocular visual acuity. Nature, 208,

Ebenholz, S. M., & Walchli, R. M. (1975). Stereoscopic thresholds as a function of head- and object-orientation. Vision Research, 5,

Farah, M. J. (1990). Visual agnosia: Disorders of object recognition and what they tell us about normal vision. Cambridge, MA: MIT Press.

Farell, B. (1998). Two-dimensional matches from one-dimensional stimulus components in human stereopsis. Nature, 395,

Farell, B. (2006). Orientation-specific computation in stereoscopic vision. Journal of Neuroscience, 26,

Foulkes, A. J. M., & Parker, A. J. (2003). The effect of absolute and relative disparity noise on stereoacuity. Journal of Vision, 3, 64.

Frisby, J. P., Mayhew, J. E. W., King-Smith, P. E., Marr, D., & Ruddock, K. H. (1978). Spatial frequency tuned channels: Implications for structure and function from. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 290,

Grill-Spector, K., & Kanwisher, N. (2005). Visual recognition: As soon as you know it is there, you know what it is. Psychological Science, 16,

He, Z. J., & Nakayama, K. (1992). Surfaces versus features in visual search. Nature, 359,

He, Z. J., & Nakayama, K. (1995). Visual attention to surfaces in three-dimensional space. Proceedings of the National Academy of Sciences USA, 92,

Henning, G. B., & Hertz, B. G. (1973). Binocular masking level differences in sinusoidal grating detection. Vision Research, 13,

Henning, G. B., & Hertz, B. G. (1977). The influence of bandwidth and temporal properties of spatial noise on binocular masking-level differences. Vision Research, 17,

Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Pictures and names: Making the connection. Cognitive Psychology, 16,

Julesz, B. (1964). Binocular depth perception without familiarity cues. Science, 145,

Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, Series B, Biological Sciences, 200,

Mayhew, J. E. W., & Frisby, J. P. (1976). Rivalrous texture stereograms. Nature, 264,

Mayhew, J. E. W., & Frisby, J. P. (1978). Texture discrimination and Fourier analysis in human vision. Nature, 275,

Moraglia, G., & Schneider, B. A. (1990). Effects of direction and magnitude of horizontal disparities on binocular unmasking. Perception, 19,

Moraglia, G., & Schneider, B. A. (1991). Binocular unmasking with vertical disparity. Canadian Journal of Psychology, 45,

Moraglia, G., & Schneider, B. A. (1992). On binocular unmasking of signals in noise: Further tests of the summation hypothesis. Vision Research, 32,

Nakayama, K., He, Z. J., & Shimojo, S. (1995). Visual surface representation: A critical link between lower-level and higher-level vision. In Kosslyn, S. M., & Osherson, D. N. (Eds.), An Invitation to Cognitive Science: Visual Cognition (pp. 1-70). Cambridge, MA: MIT Press.

Ogle, K. N. (1955). Stereopsis and vertical disparity. Archives of Ophthalmology, 53,

Ogle, K. N. (1961). Optics. Springfield, IL: Thomas.

Ohzawa, I., & Freeman, R. D. (1986a). The binocular organization of simple cells in the cat's visual cortex. Journal of Neurophysiology, 56,

Ohzawa, I., & Freeman, R. D. (1986b). The binocular organization of complex cells in the cat's visual cortex. Journal of Neurophysiology, 56,

Oruç, I., Landy, M., & Pelli, D. (2006). Noise masking reveals channels for second-order letters. Vision Research, 46,

Parker, A. J., Johnston, E. B., Mansfield, J. S., & Yang, Y. (1991). Stereo surfaces and the shape. In Landy, M. S., & Movshon, J. A. (Eds.), Computational models of visual processing. Cambridge, MA: MIT Press.

Peissig, J. J., & Tarr, M. J. (2007). Visual object recognition: Do we know more now than we did 20 years ago? Annual Review of Psychology, 58,

Qin, D., Takamatsu, M., & Nakashima, Y. (2006). Disparity limit on binocular fusion in fovea. Optical Review, 13,

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8,

Schneider, B. A., Moraglia, G., & Jepson, A. (1989). Binocular unmasking: An analogue to binaural unmasking. Science, 243,

Schneider, B. A., & Moraglia, G. (1992). Binocular unmasking with unequal interocular contrast: The case for multiple cyclopean eyes. Perception & Psychophysics, 52,

Schneider, B. A., & Moraglia, G. (1994). Binocular vision enhances target detection by filtering the background. Perception, 23,

Schneider, B., Moraglia, G., & Speranza, F. (1999). Binocular vision enhances phase discrimination by filtering the background. Perception & Psychophysics, 61,

Speranza, F., Moraglia, G., & Schneider, B. A. (1995). Age-related changes in binocular vision: Detection of noise-masked targets in young and old observers. Journal of Gerontology: Psychological Sciences, 50B,

Speranza, F., Moraglia, G., & Schneider, B. A. (2001). Binocular detection of masked patterns in young and old observers. Psychology and Aging, 16,

Stromeyer, C. F., & Julesz, B. (1972). Spatial frequency masking in vision: Critical bands and spread of masking. Journal of the Optical Society of America, A, 2,

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381,

Westheimer, G. (1979). Cooperative neural processes involved in stereoscopic acuity. Experimental Brain Research, 36,

Westheimer, G., & Mitchell, D. E. (1956). Eye movement responses to convergence stimuli. Archives of Ophthalmology, 55,

Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63,

Appendix

Stimuli used in Experiments I-III

The stimuli of each filter and disparity condition are shown below along with their normalized power spectra. The objects are displayed along rows and the filter conditions along columns; the disparity conditions are shown on separate panels. In each cell, the object is shown on the left-hand side and its normalized power spectrum on the right-hand side. The normalized power spectrum is displayed in dB, with the DC component at the center of the spectrum; spatial frequency increases with distance from the DC component (a sketch of this computation is given after the stimulus panels below).

Objects: 6.76 arcmin condition. Columns: Unfiltered, Shifted filter, Non-shifted filter.


Objects: arcmin condition. Columns: Unfiltered, Shifted filter, Non-shifted filter.

Objects: arcmin condition. Columns: Unfiltered, Shifted filter, Non-shifted filter.
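The normalized power spectra described above can be computed along the following lines (a minimal sketch in hypothetical Python using numpy; it is not the code actually used to produce the appendix figures): the two-dimensional power spectrum is shifted so that the DC component sits at the center, normalized to its maximum, and converted to dB.

    import numpy as np

    def normalized_power_spectrum_db(image):
        # 2-D power spectrum with the DC component shifted to the center,
        # so that spatial frequency increases with distance from the center
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
        spectrum = spectrum / spectrum.max()          # normalize to the maximum
        return 10.0 * np.log10(spectrum + 1e-12)      # dB scale; small offset avoids log(0)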
