Outline. Immersive Visual Communication Pipeline. Lec 02 Human Vision System

Outline
CS/EE 5590 / (Class Ids: 44873, 44874), Fall 2016, Tue 5-8:15pm @ Edu Room 260
Special Topics: Advanced Multimedia Communication
Lec 02: Human Vision System
Zhu Li, Dept. of CSEE, UMKC. Office: FH560E, lizhu@umkc.edu, Ph: x
Z. Li, Adv. Multimedia Communication, 2016 Fall

Outline: Re-Cap of Lec 01; Human Vision System; Human Vision Perception; Stereo Vision; Summary.

What is New?
Image sensors: LiDAR depth sensor, 360 camera, hyperspectral.
Communication: mmWave 5G, FD-MIMO, device-centric networking, massive connectivity and QoS slicing.
Applications: immersive/3D visual communication, free viewpoint video, precise/personalized medicine (genome data compression).

Immersive Visual Communication Pipeline
End-to-end pipeline: image stitching, projection, and mapping; encode; decode; rendering.
Latency.
Compression efficiency: HMDs require very high resolution.
Human vision system characteristics.

Synthetic Content to VR HMD
Playing desktop games on an HMD with direct body and gesture interaction.
Challenges: the mobile device GPU is weak and cannot render UHD-quality game content; render at the edge and stream to the HMD with very low latency; precise body/head motion/gesture tracking.
Innovation: synthetic content transcoding with full 3D object, texture, motion, and lighting info; very low latency/complexity decoding; mixed transcoding and local rendering (scalable coding from low-res local rendering); HVS-enabled pre-filtering of the signal; low-latency, precise gesture/body motion tracking.

Point Cloud Capture and Compression
Point cloud capture tech: stereo camera array, ToF depth sensor, structured light + stereo camera.
Point cloud data compression. Static: octree decomposition. Dynamic: no good solutions yet. New approach: graph signal processing.

Image/Video Super Resolution
Useful for display adaptation and compression. TVs are now UHD, but most content on DVD is just HD (720p). Scalable coding: stream an HD version to phones, plus extra information so the TV can show a super-resolved version.
Approaches:
Sparse coupled dictionaries: J. Yang, J. Wright, T. S. Huang, Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing 19(11).
Deep learning: Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Trans. Pattern Anal. Mach. Intell. 38(2): (2016).

Light Field
Sensorial data: light field. Light field compression approach: utilizing existing image/video codecs.
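The slide only names octree decomposition for static point clouds. As a minimal sketch, assuming all points lie inside one axis-aligned cube, the code below emits breadth-first octree occupancy bytes (one bit per non-empty child octant). The function name, the child bit ordering, and the fixed depth are illustrative assumptions; a real codec would additionally entropy-code these bytes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def octree_occupancy(points: Sequence[Point], origin: Point, size: float, depth: int) -> List[int]:
    """Breadth-first octree occupancy codes for points inside the cube [origin, origin + size)^3.

    Each visited node emits one byte; bit k is set when child octant k holds at least one point.
    Child index k = 4*(x in upper half) + 2*(y in upper half) + (z in upper half).
    """
    codes: List[int] = []
    level = [(origin, size, list(points))]
    for _ in range(depth):
        next_level = []
        for (ox, oy, oz), node_size, pts in level:
            half = node_size / 2.0
            children: List[List[Point]] = [[] for _ in range(8)]
            for x, y, z in pts:
                k = (int(x >= ox + half) << 2) | (int(y >= oy + half) << 1) | int(z >= oz + half)
                children[k].append((x, y, z))
            byte = 0
            for k, child in enumerate(children):
                if child:  # only non-empty octants are signalled and subdivided further
                    byte |= 1 << k
                    child_origin = (ox + half * ((k >> 2) & 1),
                                    oy + half * ((k >> 1) & 1),
                                    oz + half * (k & 1))
                    next_level.append((child_origin, half, child))
            codes.append(byte)
        level = next_level
    return codes


cloud = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
print(octree_occupancy(cloud, (0.0, 0.0, 0.0), 1.0, 2))  # [129, 1, 128]
```

Empty regions cost nothing below the level where they are found empty, which is why octrees suit sparse static point clouds.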

Genome Data Compression
Precision medicine: genome data for personalized diagnosis and treatment. Genome data is HUGE and needs compression with low delay and high throughput.
The MPEG Genome Data Compression group has started the work: FastQ (non-aligned data) and SAM/BAM (aligned data).
New approach: deep learning to learn a better context model to drive arithmetic coding efficiency.

Outline: Re-Cap of Lec 01; Human Vision System; Human Vision Perception; Stereo Vision; Summary.

The Human Vision System Pipeline
[Figure: anatomy of the human eye, top-down view]
The human eye optics:
Lens: cornea and aqueous humour.
Lens control: the ciliary muscle, acting through the zonula (zonular fibers), changes the shape of the lens.
Aperture control: the iris is a muscle that changes the size of the pupil.
Human eye sensors:
Photon sensors: the back of the eye is called the retina; photosensor cells concentrate around the fovea.
Blind spot: where the optic nerve exits the eye.
The signal path: [figure]

The Retina Circuits
Retina photon sensor cells: approx. 120 million rods and approx. 6 million cones, but fewer than approx. 1 million optic nerve fibers (ganglion cells) connecting to the brain.

Visual Functions at the Retina
Cones are concentrated around the yellow spot, or macula, about 2.5-3 mm in diameter. The center of the macula, approx. 0.3 mm in diameter, has no rods; it is called the fovea centralis and serves high-acuity vision.
Rods are distributed sparsely away from the fovea and are good for low-light vision and motion detection.
Night vision: a 2nd blind spot on the fovea, since it has no rods.
Rods for low-light vision, cones for normal-light, high-resolution vision.

Lateral Geniculate Nucleus
The retina does low-level luminance processing via rods/cones. Approx. 1 million optic nerve fibers carry the signal to the LGN (Lateral Geniculate Nucleus): mid-level vision.
The LGN has 6 layers: more on contrasts and movements; first stage of stereo vision processing; color vision: paired responses for red-green and blue-yellow signals.

Primary and Secondary Visual Cortex
Optic radiations connect to the primary visual cortex, which is then connected to the secondary cortex; together they complete higher-level vision tasks. [Figure annotation: 1000x1000 color]

Lateral Inhibition: Edge Perceiving
Edge information is processed in the retina circuits: there are more rods/cones than optic nerve fibers. Not all photon reception is fed back to the brain; the ganglion cells have a lateral inhibition function that suppresses the amount of information fired back to the visual cortex.
No inhibition vs. inhibition: inhibition enhances edges.
Mach band: the edge perception produced by inhibition.
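To make the Mach band effect concrete, here is a toy 1D model, not a physiological one: each unit's output is its own input minus a fixed fraction k of its two neighbours' inputs. The weight k = 0.2 and the 1D nearest-neighbour layout are assumptions chosen for illustration; real ganglion cells have 2D center-surround receptive fields.

```python
from typing import List, Sequence


def lateral_inhibition(signal: Sequence[float], k: float = 0.2) -> List[float]:
    """Toy 1D lateral inhibition: output = own input minus k times each immediate
    neighbour's input (borders replicate the edge value)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(signal[i] - k * (left + right))
    return out


step = [10.0] * 5 + [20.0] * 5  # a luminance step edge
print([round(v, 3) for v in lateral_inhibition(step)])
# [6.0, 6.0, 6.0, 6.0, 4.0, 14.0, 12.0, 12.0, 12.0, 12.0]
```

The response dips just before the edge (4 < 6) and overshoots just after it (14 > 12): the dark and bright bands that appear at a step in perceived brightness even though the input is flat on both sides.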

Human Vision Perception
FOV: Field of View (approx. 130°/120°). [Figure: overall FoV, split into mono vision, stereo vision, and foveated vision regions]

Spatial Contrast Sensitivity Function (Spatial CSF)
Contrast: $C = (I_{max} - I_{min})/(I_{max} + I_{min})$.
Contrast sensitivity is low at both higher and lower frequencies. [Figure: contrast sensitivity vs. spatial freq]

Temporal Contrast Sensitivity Function
Flickering test: temporal CSF. Above 60 Hz no flickering is perceived; that is why LED panels need a refresh rate > 60 Hz.
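The contrast definition above can be checked on a sinusoidal grating, the stimulus used in CSF tests. For $I(x) = L_0(1 + C\sin(2\pi f x))$ the maximum is $L_0(1 + C)$ and the minimum is $L_0(1 - C)$, so the formula returns exactly the modulation depth $C$. A minimal sketch; the mean luminance 100 and amplitude 20 are arbitrary example values.

```python
import math


def michelson_contrast(i_max: float, i_min: float) -> float:
    """Contrast C = (I_max - I_min) / (I_max + I_min) for non-negative luminances."""
    if i_min < 0 or i_max < i_min:
        raise ValueError("expected 0 <= i_min <= i_max")
    total = i_max + i_min
    return 0.0 if total == 0 else (i_max - i_min) / total


# Sinusoidal grating with mean luminance 100 and amplitude 20, sampled at quarter periods.
grating = [100 + 20 * math.sin(2 * math.pi * t / 4) for t in range(4)]
print(round(michelson_contrast(max(grating), min(grating)), 6))  # 0.2
```

The measure is bounded in [0, 1] and is independent of overall brightness: scaling both luminances by the same factor leaves it unchanged.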

Spatio-temporal CSF and Application
Render error tolerance map: Yangli Hector Yee, Sumanta N. Pattanaik, Donald P. Greenberg, "Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments," ACM Trans. Graph. 20(1): 39-65 (2001).

Seeing in 3D: Depth Perception
Pinhole camera and perspective projection.
Perspective projection cues: bigger is closer, if the object has a known size (Ames room illusion).

Mono View Depth Illusion
Julian Beever's work: perspective cue illusion.

Mono Vision: Lighting & Occlusion
Depth cue from lighting: we have a sense of protruding vs. receding patterns, probably developed during human vision evolution.
Occlusion.
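The "bigger is closer" cue follows directly from pinhole perspective projection, $x = fX/Z$, $y = fY/Z$: an object of known size shrinks in inverse proportion to its depth. A minimal sketch; the focal length and object height are hypothetical example values. The Ames room works by breaking the "known size" assumption.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], f: float) -> Tuple[float, float]:
    """Pinhole perspective projection: (X, Y, Z) -> (f X / Z, f Y / Z)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


def image_height(object_height: float, depth: float, f: float) -> float:
    """Projected size of an object of known height at distance `depth`."""
    return f * object_height / depth


print(project((1.0, 2.0, 4.0), f=2.0))                              # (0.5, 1.0)
print(image_height(1.8, 2.0, 0.05) / image_height(1.8, 4.0, 0.05))  # 2.0
```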

Mono Vision Cue for 3D: Focus
Depth from focus: human eyes constantly re-focus to get a sense of depth. VR HMDs present content at a fixed focal depth, a major cause of fatigue and nausea. Light field! [Figures: Nvidia LF HMD; from The Art of Photography, Canon]

Stereo Vision Depth Perception
For closer views, stereoscopic vision provides more accurate depth perception.

Binocular Stereo
Depth from binocular disparity: given a calibrated binocular stereo pair, fuse it to produce a depth image. Humans can do it.
Stereograms: invented by Sir Charles Wheatstone, 1838.

Human Perception of Depth
Sign and magnitude of disparity, with P the converging point:
C: an object nearer than P projects to the outside of P, disparity = +.
F: an object farther than P projects to the inside of P, disparity = -.

Binocular Optical Range Finder (WWII)
Depth from disparity: given a calibrated binocular stereo pair, fuse it to produce a depth image. Humans can do it. [Figure: range finders on the USS Alabama and the Yamato]

Autostereograms: [figure]

Basic Stereo Matching Algorithm
For each pixel in the first image: find the corresponding epipolar line in the right image; examine all pixels on the epipolar line and pick the best match; triangulate the matches to get depth information.
Simplest case: epipolar lines are scanlines. When does this happen?

Simplest Case: Parallel Images
The image planes of the cameras are parallel to each other and to the baseline, the camera centers are at the same height, and the focal lengths are the same. Then the epipolar lines fall along the horizontal scan lines of the images.

Stereo Correspondence
Determine pixel correspondence: pairs of points that correspond to the same scene point. [Figure: epipolar plane through scene point X and camera centers C1, C2, cutting an epipolar line in each image]
Epipolar constraint: reduces the correspondence problem to a 1D search along conjugate epipolar lines. Java demo:

Correspondence problem: multiple matching hypotheses satisfy the epipolar constraint, but which one is correct?

Depth from Disparity
[Figure: scene point X at depth z, imaged at x and x' by two cameras with centers O and O', focal length f, separated by the baseline B]
By similar triangles,
$$d = x - x' = \frac{B f}{z}, \qquad z = \frac{B f}{d}.$$
Disparity is inversely proportional to depth!
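A minimal sketch of $z = Bf/d$ for a rectified pair. The rig values (f = 700 px, B = 0.1 m) are hypothetical; with d and f in pixels and B in metres, z comes out in metres.

```python
def depth_from_disparity(d: float, f: float, baseline: float) -> float:
    """z = f * B / d for a rectified (parallel) stereo pair.

    d and f in pixels, baseline B in metres -> depth z in metres.
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return f * baseline / d


# Hypothetical rig: f = 700 px, B = 0.1 m.
for d in (70.0, 35.0, 7.0):
    print(d, round(depth_from_disparity(d, 700.0, 0.1), 6))
# 70.0 1.0
# 35.0 2.0
# 7.0 10.0
```

Halving the disparity doubles the depth, so distant points are squeezed into a few small disparity values.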

Correspondence Problem
Let's make some assumptions to simplify the matching problem: the baseline is relatively small (compared to the depth of scene points), so most scene points are visible in both views; also, matching regions are similar in appearance.

Correspondence Search with Similarity Constraint
Image registration (revisited). [Figure: left and right scanlines; matching cost vs. disparity]
How do we determine correspondences? Block matching with SSD (sum of squared differences), where d is the disparity (horizontal motion).
Slide a window along the right scanline and compare the contents of that window with the reference window in the left image. Matching cost: SSD or normalized correlation.
What is the proper window size?

Stereo matching [figure]
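The window search above can be sketched on a single rectified scanline. This is a winner-take-all SSD matcher under stated assumptions: a left pixel x is compared with right pixel x - d (the usual convention for a parallel rig), borders are handled by replicating edge pixels, and ties go to the smallest d. The scanline values are made up for illustration.

```python
from typing import List, Sequence


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equally sized windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def window(row: Sequence[float], center: int, half: int) -> List[float]:
    """1D window of width 2*half + 1 around `center`, replicating border pixels."""
    n = len(row)
    return [row[min(max(center + k, 0), n - 1)] for k in range(-half, half + 1)]


def match_scanline(left: Sequence[float], right: Sequence[float], half: int, max_disp: int) -> List[int]:
    """Winner-take-all SSD block matching along one rectified scanline.

    For each left pixel x, try right pixels x - d for d = 0..max_disp and keep the d
    with the lowest SSD (the first one wins on ties).
    """
    disparities = []
    for x in range(len(left)):
        ref = window(left, x, half)
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - d < 0:
                break
            cost = ssd(ref, window(right, x - d, half))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


left = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
right = left[2:] + [0, 0]  # the same scanline shifted 2 px to the left
print(match_scanline(left, right, half=1, max_disp=4))
# [0, 1, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0]
```

Only pixels whose window overlaps the textured bump recover the true disparity of 2. In the flat regions every candidate costs zero, so the tie-break decides: a small example of why textureless surfaces defeat window matching.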

Effect of Window Size
Smaller window: + more detail, - more noise.
Larger window: + smoother disparity maps, - less detail.
[Figure: left view, right view, and disparity maps for W = 3 and W = 20]

The Similarity Constraint
Corresponding regions in two images should be similar in appearance, and non-corresponding regions should be different.
When will the similarity constraint fail?

Limitations of the Similarity Constraint
Where pixel similarity fails: textureless surfaces; occlusions, repetition; non-Lambertian surfaces.

Results with Window Search
Windowed search: SSD of windowed pixels. [Figure: window-based matching result vs. ground-truth depth]

Improving the Window Method: Non-local Constraints
Uniqueness: for any point in one image, there should be at most one matching point in the other image.
Ordering: corresponding points should be in the same order in both views. With occlusion, the ordering constraint doesn't hold.
Smoothness: we expect disparity values to change slowly (for the most part). This is achieved by various smoothness penalties.

Scanline Stereo
Try to coherently match pixels on the entire scanline. Different scanlines are still optimized independently. [Figure: left image, right image]
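The three non-local constraints can be written as simple checks on the matches found along one scanline. A minimal sketch: matches are assumed to be (x_left, x_right) pixel pairs, and the absolute-difference smoothness penalty is just one example of the "various smoothness penalties" mentioned above. All function names are hypothetical.

```python
from typing import List, Tuple

Match = Tuple[int, int]  # (x in left scanline, x in right scanline)


def satisfies_uniqueness(matches: List[Match]) -> bool:
    """At most one match per pixel in each image."""
    lefts = [xl for xl, _ in matches]
    rights = [xr for _, xr in matches]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


def satisfies_ordering(matches: List[Match]) -> bool:
    """Matches sorted by left position must also be strictly increasing in right position."""
    ordered = sorted(matches)
    return all(a[1] < b[1] for a, b in zip(ordered, ordered[1:]))


def disparity_jumps(disparities: List[int]) -> int:
    """One possible smoothness penalty: total absolute change between neighbouring disparities."""
    return sum(abs(a - b) for a, b in zip(disparities, disparities[1:]))


print(satisfies_uniqueness([(1, 0), (2, 0)]))  # False: right pixel 0 is used twice
print(satisfies_ordering([(1, 2), (3, 0)]))    # False: the order flips between views
print(disparity_jumps([2, 2, 2, 5, 5]))        # 3
```

A thin object in front of a background can legitimately flip the order of points between views, which is the occlusion case where the ordering constraint breaks down.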

Shortest Paths for Scan-line Stereo
[Figure: grid of left scanline S_left vs. right scanline S_right, with a path from s to t; matched pixels cost C_corr, left and right occlusions cost C_occl]
Can be implemented with dynamic programming (Ohta & Kanade 85, Cox et al. 96). Slide credit: Y. Boykov

Coherent Stereo on a 2D Grid
Scanline stereo generates streaking artifacts. We can't use dynamic programming to find spatially coherent disparities/correspondences on a 2D grid.

Stereo Matching as Energy Minimization
[Figure: window W_1(i) in I_1, window W_2(i + D(i)) in I_2, and the disparity map D]
$$E = E_{data}(I_1, I_2, D) + E_{smooth}(D)$$
$$E_{data} = \sum_i \big(W_1(i) - W_2(i + D(i))\big)^2, \qquad E_{smooth} = \sum_{\text{neighbors } i,j} \rho\big(D(i) - D(j)\big)$$
E_data is the energy from pixel matching; E_smooth penalizes large disparity differences between neighbors with a function ρ that is monotonically increasing in |D(i) - D(j)|.
Energy functions of this form can be minimized using graph cuts: Y. Boykov, O. Veksler, and R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts," PAMI 2001.

Probabilistic interpretation: we want to find a Maximum A Posteriori (MAP) estimate of the disparity image D:
$$P(D \mid I_1, I_2) \propto P(I_1, I_2 \mid D)\, P(D)$$
$$-\log P(D \mid I_1, I_2) = -\log P(I_1, I_2 \mid D) - \log P(D) + \text{const}, \quad \text{i.e.} \quad E = E_{data}(I_1, I_2, D) + E_{smooth}(D)$$
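The shortest-path formulation on one scanline can be sketched as a dynamic program over a grid of (left pixel, right pixel) states. Assumptions not taken from the slide: C_corr is the squared intensity difference, C_occl is a single constant for both occlusion types, and there is no maximum-disparity limit. Because every path is monotone in both images, uniqueness and ordering hold automatically.

```python
from typing import List, Optional, Sequence


def dp_scanline_stereo(left: Sequence[float], right: Sequence[float], c_occl: float) -> List[Optional[int]]:
    """Shortest-path scanline stereo via dynamic programming.

    cost[i][j] = cheapest way to explain left[:i] and right[:j]. Moves:
      match      (i-1, j-1) -> (i, j): C_corr = (left[i-1] - right[j-1])**2
      skip left  (i-1, j)   -> (i, j): C_occl (left pixel has no partner)
      skip right (i, j-1)   -> (i, j): C_occl (right pixel has no partner)
    Returns the disparity x_left - x_right per left pixel, or None if occluded.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0 and cost[i - 1][j] + c_occl < cost[i][j]:
                cost[i][j], move[i][j] = cost[i - 1][j] + c_occl, "skip_left"
            if j > 0 and cost[i][j - 1] + c_occl < cost[i][j]:
                cost[i][j], move[i][j] = cost[i][j - 1] + c_occl, "skip_right"
    # Backtrack from (n, m) to (0, 0) along the stored moves.
    disparity: List[Optional[int]] = [None] * n
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            disparity[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    return disparity


left = [0, 2, 8, 4, 6, 3]
right = [2, 8, 4, 6, 3, 7]  # left[1:] shifted by one pixel, plus a new pixel on the right
print(dp_scanline_stereo(left, right, c_occl=2.0))  # [None, 1, 1, 1, 1, 1]
```

Each scanline is solved on its own. Running this row by row is exactly what produces the streaking artifacts noted above, and it is why 2D energies are handed to graph cuts instead.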

The Role of the Baseline
Small baseline: large depth error; that is why the range finder has a 15 m baseline.
Large baseline: difficult search problem due to occlusion.
Problem for wide baselines: foreshortening. Matching with fixed-size windows will fail! Possible solution: adaptively vary the window size. Another solution: model-based stereo.
[Figure: small baseline vs. large baseline. Source: S. Seitz]

Active Stereo with Structured Light
Project structured light patterns onto the object. This simplifies the correspondence problem and allows us to use only one camera. [Figure: camera and projector; Magic Leap]
L. Zhang, B. Curless, and S. M. Seitz, "Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming," 3DPVT 2002.
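The "small baseline, large depth error" claim follows from differentiating $z = fB/d$: $\partial z/\partial d = -fB/d^2 = -z^2/(fB)$. A disparity error $\delta d$ therefore causes a depth error of roughly $z^2\,\delta d/(fB)$, which grows with the square of distance and shrinks with the baseline. A minimal sketch; the camera values (f = 1000 px, z = 10 m, 1 px error) are hypothetical.

```python
def depth_error(z: float, f: float, baseline: float, disparity_error: float = 1.0) -> float:
    """First-order depth uncertainty |dz| ~ z**2 * |dd| / (f * B), from z = f * B / d."""
    return z * z * disparity_error / (f * baseline)


# Hypothetical camera: f = 1000 px, target at z = 10 m, 1 px disparity error.
for b in (0.1, 1.0, 15.0):
    print(b, round(depth_error(10.0, 1000.0, b), 4))
# 0.1 1.0
# 1.0 0.1
# 15.0 0.0067
```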

Laser Scanning
Optical triangulation: project a single stripe of laser light and scan it across the surface of the object. This is a very precise version of structured light scanning.
Digital Michelangelo Project, http://graphics.stanford.edu/projects/mich/ (Source: S. Seitz)

Laser Scanned Models
The Digital Michelangelo Project, Levoy et al. Details: 1.0 mm resolution (total 56 million triangles). (Source: S. Seitz)
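Optical triangulation intersects a camera ray with the known laser sheet. A minimal sketch under an assumed geometry, not the Digital Michelangelo setup: the camera centre is at the origin looking along +z with focal length f in pixels, and the laser sits on the x-axis at X = B with its sheet tilted by θ so that it contains the points X = B - z tan θ. The camera ray through image column x is X = xz/f, and equating the two gives z = B / (x/f + tan θ).

```python
import math


def stripe_depth(x_pixel: float, f: float, baseline: float, theta: float) -> float:
    """Depth where the camera ray through image column x_pixel meets the laser sheet.

    Assumed geometry: camera centre at the origin looking along +z; laser emitter at
    X = baseline on the x-axis, its light sheet containing the points
    X = baseline - z * tan(theta). Camera ray: X = x_pixel * z / f.
    """
    denom = x_pixel / f + math.tan(theta)
    if denom <= 0:
        raise ValueError("ray and light sheet do not meet in front of the camera")
    return baseline / denom


print(stripe_depth(250.0, 1000.0, 0.5, 0.0))  # 2.0
```

As the stripe is swept (θ changes), each image column where the stripe appears yields one depth sample. Because only one bright stripe is visible at a time, the correspondence problem disappears.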

Aligning Range Images
A single range scan is not sufficient to describe a complex surface; we need techniques to register multiple range images. Check out the Point Cloud Library!
B. Curless and M. Levoy, "A Volumetric Method for Building Complex Models from Range Images," SIGGRAPH 1996.

Summary
Human vision system:
Eye and retina: lens optics and photon sensor units, as well as some low-level vision processing.
LGN: 6 layers with mid-level vision functions.
Visual cortex: higher-level vision functions.
Human depth perception: mono vs. stereo vision.
Depth perception, computational approach: stereo matching; structured-light-assisted stereo matching.
Next class: stereo depth estimation; depth map compression.
Potential project: Sunny Optics structured light stereo depth sensor for gesture recognition.
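The slide does not name a registration method. One widely used technique, swapped in here for illustration, is iterative closest point (ICP): pair each point with its nearest neighbour in the other scan, fit the best rigid motion to those pairs, and repeat. The sketch below is a simplified 2D, pure-Python version with brute-force nearest neighbours and the closed-form 2D least-squares rotation. Real 3D pipelines use an SVD-based fit and k-d trees; PCL, for example, ships pcl::IterativeClosestPoint. The scan data are made up.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def best_rigid_2d(src: List[Pt], dst: List[Pt]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping paired points src[i] -> dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(pts: List[Pt], theta: float, tx: float, ty: float) -> List[Pt]:
    """Rotate every point by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(src: List[Pt], dst: List[Pt], iterations: int = 20) -> List[Pt]:
    """Point-to-point ICP: pair each point with its nearest neighbour in dst, then re-fit."""
    current = list(src)
    for _ in range(iterations):
        pairs = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) for p in current]
        current = transform(current, *best_rigid_2d(current, pairs))
    return current


scan_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
scan_b = transform(scan_a, math.radians(5), 0.1, -0.05)  # same shape, slightly moved
aligned = icp_2d(scan_b, scan_a)
print(max(math.dist(p, q) for p, q in zip(aligned, scan_a)) < 1e-9)  # True
```

ICP only converges to the right answer when the scans start roughly aligned, as they do here. With a large initial offset the nearest-neighbour pairs are wrong and it can settle in a local minimum.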
