INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky, Principal Architect, CTO Office, Mobile BU, NVIDIA Corp.
Computer Vision in Mobile: Tegra K1. It's time!
AGENDA: Use case categories; underlying technology examples; performance and power considerations; software considerations and dilemmas.
VISION FUNCTIONALITY TAXONOMY. Tracking, user facing: face, eye and hand gesture tracking (small scale); body tracking (large scale). Tracking, scene facing: environmental feature tracking (small scale); indoor/outdoor positional tracking (large scale). 3D reconstruction, user facing: facial modeling (small scale); body modeling (large scale). 3D reconstruction, scene facing: object reconstruction (small scale); scene reconstruction (large scale). Markets: UI / Smart TV / STB; gaming; automotive; social/media; e-commerce; modeling/architecture/DIY/3D printing.
UNDERLYING TECHNOLOGY: DEPTH EXTRACTION. Obtain a depth map, i.e. depth values for many points in a 2D image (not necessarily for every pixel). From the depth map we can calculate: 3D geometry and models; body position and movement; facial features and expression. Models are easy to aggregate, both from different shots and from different sources.
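Going from a depth map to 3D geometry can be sketched with the standard pinhole camera model. This is a minimal illustration, not code from the talk: it assumes known camera intrinsics (the focal lengths `fx`, `fy` and principal point `cx`, `cy` are hypothetical parameters) and represents missing depth samples as `None`, reflecting that the map need not cover every pixel.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[Optional[float]]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Convert a (possibly sparse) depth map into 3D points in the camera frame.

    Pinhole model (assumed): X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels without a valid depth (None or <= 0) are skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v = image row
        for u, z in enumerate(row):          # u = image column
            if z is None or z <= 0:
                continue                     # no depth sample at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Sparse 2x2 depth map: only two pixels carry depth.
cloud = backproject([[None, 2.0], [1.0, None]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)  # [(2.0, 0.0, 2.0), (0.0, 1.0, 1.0)]
```

Once every shot is expressed as a point set like this, combining shots amounts to transforming each set into a common coordinate frame before merging.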
3D SCANNING: THE TECHNOLOGIES. Different approaches: Structured light: project an IR pattern, find the pattern symbols in the image, and triangulate to find depth. Stereo: capture two or more images, find corresponding points, and triangulate to find depth. Structure from Motion (SfM): similar to stereo, but using the same camera over time instead of multiple cameras. Coded / multiple aperture: project different patterns and solve for depth. Time of Flight (ToF): project a pulse of light and capture the phase of the returned light.
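The two depth relations behind the triangulation and time-of-flight approaches above can be written down in a few lines. This sketch assumes a rectified stereo pair (depth Z = f * B / d, with focal length f in pixels, baseline B and disparity d) and, for ToF, continuous-wave amplitude modulation, where depth follows from the measured phase shift; a purely pulsed system would instead time the echo directly. Function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth by triangulation in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


def tof_depth(phase_rad: float, mod_freq_hz: float) -> float:
    """Depth from the phase shift of modulated light: d = c * phi / (4 * pi * f).

    The factor 4*pi (not 2*pi) accounts for the round trip. Results are only
    unambiguous up to c / (2 * f); farther targets wrap around.
    """
    return SPEED_OF_LIGHT * phase_rad / (4 * math.pi * mod_freq_hz)


print(stereo_depth(500.0, 0.1, 25.0))  # 2.0 (metres)
print(round(tof_depth(math.pi, 20e6), 3))  # 3.747 (metres)
```

The same triangulation relation applies to structured light, with the projector taking the place of the second camera, and to SfM, where the baseline comes from the camera's own motion between frames.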
UNDERLYING TECHNOLOGY: VISUAL ODOMETRY. Using camera data to estimate the device's change in position over time: 1. Capture with a single, stereo, or omnidirectional camera. 2. Correct the images for lens distortion. 3. Detect features. 4. Construct the optical flow field. 5. Estimate the camera motion from the optical flow (Kalman filter or cost-function minimization). 6. Check for potential tracking errors and remove outliers. 7. Periodically repopulate points to maintain coverage across the image. Images from Davide Scaramuzza.
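Steps 5 and 6 can be illustrated with a deliberately simplified motion model. This toy sketch assumes the camera motion shows up as a single image-plane translation (a real pipeline would typically fit a full camera pose, for example with RANSAC or a Kalman filter as listed above). It takes a robust median estimate, rejects flow vectors that disagree with it by more than a hypothetical `max_residual` threshold, and refines the estimate on the inliers.

```python
import math
from statistics import median
from typing import List, Tuple

Flow = Tuple[float, float]  # (dx, dy) displacement of one tracked feature, in pixels


def estimate_translation(flows: List[Flow],
                         max_residual: float = 1.0) -> Tuple[float, float, List[int]]:
    """Estimate a global image translation from feature flow, rejecting outliers.

    Returns (dx, dy, inlier_indices).
    """
    if not flows:
        raise ValueError("need at least one flow vector")
    # Robust initial guess: per-axis median is insensitive to a few bad tracks.
    mx = median(f[0] for f in flows)
    my = median(f[1] for f in flows)
    # Outlier check: drop features whose flow deviates too far from the guess
    # (tracking errors or independently moving objects).
    inliers = [i for i, (dx, dy) in enumerate(flows)
               if math.hypot(dx - mx, dy - my) <= max_residual]
    if not inliers:
        return mx, my, []
    # Refine: plain mean over the inliers only.
    ax = sum(flows[i][0] for i in inliers) / len(inliers)
    ay = sum(flows[i][1] for i in inliers) / len(inliers)
    return ax, ay, inliers


# Three consistent tracks plus one gross outlier.
dx, dy, inl = estimate_translation([(1.0, 0.0), (1.1, 0.0), (0.9, 0.0), (10.0, 10.0)])
print(inl)  # [0, 1, 2]
```

The median-then-refine structure mirrors the "estimate, then check and remove outliers" ordering of the steps above; a production system would iterate this per frame and feed the result into the filter of step 5.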
ARE WE THERE YET? Performance: Do the algorithms fit in the HW? Is the HW fast enough? Does it leave enough headroom for the actual application? Do the algorithms and the application work together efficiently? Power: Does it fit within the thermal, peak-current and battery-life constraints? Cost: new sensors, light sources, etc. SW infrastructure: Do the right APIs exist? Is the imaging pipeline flexible enough? Are there programming languages and environments to support this?
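The headroom question reduces to simple per-frame arithmetic. The sketch below is illustrative only: the stage names, timings and power figure are hypothetical, and it assumes the vision stages run one after another on the same resource (pipelined or parallel execution would change the accounting).

```python
from typing import Dict, Optional


def frame_budget(target_fps: float, stage_ms: Dict[str, float],
                 avg_power_w: Optional[float] = None) -> Dict[str, float]:
    """Compare per-stage processing times against the per-frame time budget.

    budget = 1000 / fps milliseconds; the remainder after the vision stages is
    headroom for the application itself. If an average power draw is given,
    also report the energy spent by the vision stages per frame (W * ms = mJ).
    """
    budget_ms = 1000.0 / target_fps
    used_ms = sum(stage_ms.values())   # assumes stages run sequentially
    report = {
        "budget_ms": budget_ms,
        "used_ms": used_ms,
        "headroom_ms": budget_ms - used_ms,
        "headroom_frac": (budget_ms - used_ms) / budget_ms,
    }
    if avg_power_w is not None:
        report["energy_mj_per_frame"] = avg_power_w * used_ms
    return report


# Hypothetical numbers, for illustration only.
r = frame_budget(30.0, {"depth_extraction": 12.0, "tracking": 8.0}, avg_power_w=2.0)
print(round(r["headroom_ms"], 2), round(r["headroom_frac"], 2))  # 13.33 0.4
```

A negative `headroom_ms` answers "is the HW fast enough?" with a no; the energy figure is a first input to the thermal and battery-life question.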
TEGRA K1: A MAJOR LEAP FORWARD FOR MOBILE & EMBEDDED APPLICATIONS. Process and package: 28 nm HPM; 23x23 mm, 0.7 mm pitch HS-FCBGA. CPU: quad Cortex-A15 (4 cores, 1+ GHz) plus a shadow low-power (LP) Cortex-A15 core; NEON SIMD; 2 MB shared L2; ARM TrustZone. GPU: Kepler GeForce GPU with CUDA, 192 cores (stream processors), next-generation OpenGL ES; 2D graphics/scaling. Image processor: ISP with 25 MP sensor support, 1080p60; enhanced JPEG engine. HD video processor: 1080p24/30 video decode and encode; H.264, MPEG4, VC1, MPEG2, VP8. Video Image Compositor (VIC). Audio processor: ARM7. Memory: DDR3 controller, 64-bit, 800+ MHz; 12 GB/s bandwidth; NOR flash. I/O: CSI x4 + x4; display x2 (HDMI, eDP/LVDS); USB 3.0* x2; USB 2.0 x3; PCIe* G2 x4 + x1; SATA2 x1; UART x4; I2C x5; SPI x4; SDIO/MMC x4; DAP x5 (I2S/TDM); security engine. DESIGNED FOR MOBILE DEVICES.
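The 12 GB/s figure is consistent with the DDR3 controller spec, assuming 800 MHz is the DDR clock with two transfers per cycle on a 64-bit bus. The sketch below works that out and relates it to image traffic; the 1080p RGBA frame and 30 fps target are illustrative assumptions, and the result is a theoretical upper bound, since sustained bandwidth is lower.

```python
def ddr_peak_bandwidth(bus_width_bits: int, clock_hz: int) -> int:
    """Theoretical peak DDR bandwidth in bytes/s: two transfers per clock cycle."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_hz * 2


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


peak = ddr_peak_bandwidth(64, 800_000_000)  # 64-bit bus, 800 MHz (assumed DDR clock)
rgba_1080p = frame_bytes(1920, 1080, 4)
# Upper bound on full-frame passes (one complete read or write of a frame)
# that fit into each frame period at 30 fps.
passes = peak / (rgba_1080p * 30)
print(peak / 1e9, round(passes, 1))  # 12.8 51.4
```

Each filter, pyramid level or copy in a vision pipeline costs at least one such pass, which is why memory traffic, not only arithmetic throughput, bounds what fits on the SoC.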
GPU: KEPLER ARCHITECTURE. 192 CUDA cores, SM 3.2 ISA; compatible with GeForce, Quadro and Tesla; 64 KB L1 cache and shared memory; 128 KB L2 cache; 128 KB register file. (Shown within the same Tegra K1 block diagram as the previous slide: quad Cortex-A15 CPU, ISP, video processors, DDR3 controller and I/O.)
SW CONSIDERATIONS. We need APIs and frameworks to develop SW that are flexible and complete enough for experimentation, fast and stable enough for productization, and portable across the installed base. APIs and libraries: Android Camera HAL v3, OpenCV, OpenVX, StreamInput, VisionWorks, CUDA.
ANDROID CAMERA HAL V3. Camera HAL v3 is a fundamentally new API: flexible primitives for building sophisticated use cases; a clean, easily extensible interface; apps get more control, and more responsibility. It enables sophisticated camera applications, with faster time to market and higher quality. Model: each request produces one capture, which returns one set of result metadata plus N image buffers.
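The one-request, one-result model can be illustrated with a small simulation. This is a conceptual sketch only: the real HAL v3 interface is a C API, and every class, field and setting name here (`CaptureRequest`, `exposure_ns`, the stream names, and so on) is hypothetical. The point it shows is that settings travel with each request, and each result carries metadata plus one buffer per requested output stream.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaptureRequest:
    frame_number: int
    settings: Dict[str, object]      # per-frame controls (hypothetical keys)
    output_streams: List[str]        # streams that should each receive a buffer


@dataclass
class CaptureResult:
    frame_number: int
    metadata: Dict[str, object]
    buffers: Dict[str, bytes] = field(default_factory=dict)


def process_request(req: CaptureRequest) -> CaptureResult:
    """Stand-in for the camera device: one request -> one capture -> one result."""
    # Report the settings that were applied back in the result metadata.
    metadata = dict(req.settings)
    metadata["frame_number"] = req.frame_number
    # One (placeholder) image buffer per requested output stream: the "N buffers".
    buffers = {stream: b"" for stream in req.output_streams}
    return CaptureResult(req.frame_number, metadata, buffers)


# Per-frame settings make it possible to, e.g., bracket exposure frame by frame.
requests = [CaptureRequest(n, {"exposure_ns": e}, ["preview", "cv_input"])
            for n, e in enumerate([5_000_000, 10_000_000, 20_000_000])]
results = [process_request(r) for r in requests]
```

Because control is per frame rather than a global camera mode, an application takes on more responsibility (building and tracking every request) in exchange for finer control, matching the trade-off stated on the slide.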
OPENCV LIBRARY. Version 2.4.5: more than 900 functions (multiplied across the supported data types). OpenCV4Tegra acceleration: CUDA, NEON, GLSL, TBB multithreading. Image processing: general image processing, segmentation, machine learning and detection, image pyramids, transforms, fitting. Video, stereo, and 3D: camera calibration, features, depth maps, optical flow, inpainting, tracking.
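Image pyramids, one of the listed categories, underpin coarse-to-fine methods such as pyramidal Lucas-Kanade optical flow. As a hedged, dependency-free illustration: OpenCV's `pyrDown` smooths with a 5x5 Gaussian kernel before subsampling, whereas this sketch swaps in a simpler 2x2 box average for brevity. The function names are hypothetical, not OpenCV API.

```python
from typing import List

Image = List[List[float]]


def pyr_down_box(img: Image) -> Image:
    """Halve resolution by averaging each 2x2 block (odd trailing row/col dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full-res, 1/2, 1/4, ...], stopping early if the image gets too small."""
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        if len(prev) < 2 or len(prev[0]) < 2:
            break  # cannot downsample a 1-pixel-wide or -tall image further
        pyramid.append(pyr_down_box(prev))
    return pyramid


img = [[float(4 * y + x) for x in range(4)] for y in range(4)]
pyr = build_pyramid(img, 5)
print([len(level) for level in pyr])  # [4, 2, 1]
```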
VISIONWORKS Sobel Convolve Bilateral Filter Integral Image Integral Histogram Corner Harris Corner FAST Image Pyramid Optical Flow PyrLK Optical Flow Farneback Warp Perspective Hough Lines Fast NLM Denoising Stereo Block Matching IME (Iterative Motion Estimation) HOG (Histogram of Oriented Gradients) Soft Cascade Detector Object Tracker TLD Object Tracker SLAM Path Estimator MedianFlow Estimator
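Several primitives in this list rely on constant-time region sums. The Integral Image entry can be illustrated with a plain-Python summed-area table; this is a sketch of the general technique, not the VisionWorks API, and the function names are hypothetical. After one pass over the image, the sum over any axis-aligned rectangle takes four lookups.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Summed-area table with a zero border: ii[y][x] = sum of img[0:y][0:x]."""
    h = len(img)
    w = len(img[0]) if img else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                       # running sum along this row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum  # plus everything above
    return ii


def box_sum(ii: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of img rows y0..y1-1, columns x0..x1-1, in O(1) with four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


ii = integral_image([[1, 2], [3, 4]])
print(box_sum(ii, 1, 0, 2, 2))  # 6 (right column: 2 + 4)
```

The same idea, applied per histogram bin, gives the Integral Histogram primitive that also appears in the list.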
IT IS HAPPENING! Use cases are emerging; Tegra K1 brings the compute power to mobile devices; the software infrastructure is available.
THANKS