A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

Size: px

Start display at page:

Download "A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video"

Barry Butler
5 years ago
Views:

1 A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video Amrita Mazumdar Armin Alaghi Jonathan T. Barron David Gallup Luis Ceze Mark Oskin Steven M. Seitz University of Washington Google University of Washington 1

2 virtual reality video with omnidirectional stereo (ODS) 2

3 the Google Jump camera rig can capture ODS video easily 16 GoPros x 4K camera feed 3.6 GB/s raw video 3

4 the Google Jump camera rig can capture ODS video easily 4 Anderson et al., SIGGRAPH Asia 2016

5 the Google Jump camera rig can capture ODS video easily 5 Anderson et al., SIGGRAPH Asia 2016

6 processing video from Google Jump is slow 10 hours on 1000 cores 1 hour of video 6 Anderson et al., SIGGRAPH Asia 2016

7 Google Jump pipeline breakdown sensor preprocessing alignment optical flow compositing download to viewer 7 Anderson et al., SIGGRAPH Asia 2016

8 Google Jump pipeline breakdown the bilateral solver dominates processing time sensor preprocessing alignment optical flow compositing download to viewer 12% 2% 69% 17% 8 Anderson et al., SIGGRAPH Asia 2016

9 The bilateral solver produces an image that is smooth and accurate. blocky flow field upsample into noisy flow field transform to bilateral grid and solve output result: smooth flow field input pair (from two cameras) 9 Anderson et al., SIGGRAPH Asia 2016

10 this work: a hardware-friendly bilateral solver (HFBS)

11 The bilateral solver is hard to parallelize second-order global optimization global communication prevents aggressive parallelization high-dimensional, sparse matrices sparsity results in significant divergence on GPUs why not a dense grid? too large to store on-chip 11

12 HFBS is easier to parallelize Barron Poole 2016 HFBS (our work) detailed formulation in paper includes color grayscale only dense matrix too big to fit in memory dense matrix fits in memory global communication required local communication only iterative bistochastization before solving partial, non-iterative bistochastization

13 HFBS demonstrates imperceptible accuracy loss input image Barron Poole 2016 task: Ferstl et al., ICCV 2013, data: Middlebury stereo dataset noisy depth map HFBS (this work) 13

14 algorithm optimizations make it easier to implement bilateral solver in parallel hardware 14

15 algorithm optimizations make it easier to implement bilateral solver in parallel hardware plan: exploit this parallelism with a custom hardware accelerator 15

16 Mapping HFBS to hardware sensor preprocessing alignment optical flow compositing download to viewer 16

17 Mapping HFBS to hardware sensor preprocessing alignment optical flow compositing download to viewer CPU FPGA load video pair construct bilateral grid per pair slice out solution into output images perform hardwarefriendly bilateral solver 17

18 microarchitecture HFBS controller z-axis memory bank bilateral filter worker main memory AXI memory interface z-axis memory custom bank memory layout memory access selector bilateral filter worker fixed-point datapath CPU z-axis memory controller z-axis memory bank bilateral filter worker 18

19 Floating-point resource requirements limit hardware parallelism float64 32-bit fixed 64-bit fixed 47-bit fixed DSPs per worker Maximum # workers Error (MSE) x x x

20 Fixed-point datapath conversion Bitwidth E-02 Error (MSE relative to float64) 1E-04 1E-06 1E-08 1E-10 Max Error 1E-12 40% 50% 60% 70% 80% 90% Decimal Precision (Fraction of Bitwidth) 20

21 z-axis slicing for bilateral grid memory layout x:0,y:0,r:255,g:172,b:0 x:0,y:1,r:255,g:172,b: x:100,y:100,r:255,g:172,b:0 z = 0 21

22 Evaluation

23 Evaluation Does HFBS improve runtime? How does parallelization affect power? sensor preprocessing alignment optical flow compositing download to viewer 12% 2% 69% 17% 23

24 Experimental Setup CPU: Intel Xeon E GPU: NVIDIA GTX 1080 Ti FPGA: Xilinx Virtex Ultrascale+ Baseline: Barron Poole et al (CPU only) 256 iterations of optimization Varied bilateral grid vertices count 4 KB GB grid sizes 24

25 HFBS is faster and more scalable than prior work Prior Work (CPU) CPU GPU FPGA log Runtime (ms) , ,000 10,000,000 log Bilateral Grid Vertices 25

26 HFBS is faster and more scalable than prior work Prior Work (CPU) CPU GPU FPGA log Runtime (ms) FPS and better , ,000 10,000,000 log Bilateral Grid Vertices 26

27 HFBS-FPGA is more power-efficient than other platforms Prior Work CPU GPU FPGA Ops / Watt Improvement x 0.45x 2.12x 30.72x Power-efficiency relative to prior work 27

28 building a VR video camera rig with HFBS

29 this work full system 29

30 HFBS-FPGA consumes much less power than a GPU for the same task 16 FPGAs full system = 400 W 16 GPUs = 4,560 W 30

31 HFBS makes real-time VR video more feasible with FPGAs on-node with FPGAs offloaded to cloud sensor preprocessing alignment optical flow compositing download to viewer 31

32 to conclude fast, parallel implementation of bilateral solving with little accuracy loss fixed-point datatypes and a custom bilateral-grid memory layout for improved FPGA performance hardware-software codesign to reduce latency and improve quality for future VR applications 32

33 A Hardware-Friendly Bilateral Solver for Real-Time Virtual Reality Video parallel algorithm for bilateral solving FPGA architecture 50x faster, 30x more power-efficient 33

Fast Guided Global Interpolation for Depth and. Yu Li, Dongbo Min, Minh N. Do, Jiangbo Lu

Fast Guided Global Interpolation for Depth and Yu Li, Dongbo Min, Minh N. Do, Jiangbo Lu Introduction Depth upsampling and motion interpolation are often required to generate a dense, high-quality, and