Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics


1 Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics
Jutta Pirkl, Andreas Becher, Jorge Echavarria, Jürgen Teich, and Stefan Wildermann
Hardware/Software Co-Design, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
SCOPES '17, St. Goar, Germany, June 2017

2 Approximate Computing: A New Design Paradigm
Portable battery-powered devices
Rapid workload increase
Underlying idea: Trading accuracy of computations against disproportionate improvements with respect to power consumption and/or performance and/or circuit area.
Sources: com/embedded-wireless-modules/tiwiconnect, http://www. com/business-intelligence/the-data-tsunami-is-coming.
Jutta Pirkl, Hardware/Software Co-Design (FAU), Self-Adaptive FPGA-Based Image Processing, SCOPES '17

3 Motivation
Problem: Error tolerance depends on both the input data and the application context.
Quality-configurability is the key principle of prospective approximate computing (AC) platforms [1].
Approach: Dynamic, autonomous swapping of filters with different degrees of approximation utilizing reconfigurable hardware. Dynamic partial reconfiguration adapts the approximation level.
[1] S. Venkataramani et al. Approximate computing and the quest for computing efficiency. In: 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC). June 2015.

4 Outline
Concepts of Self-Adaptive Image Processing
  Approximate 2D-Convolution Filters
  Quality Evaluation
  Reconfiguration Management
Experimental Results
  Quality-Configurable Control Mechanism
  Partial Reconfiguration Overhead
Summary

5 Concepts of Self-Adaptive Image Processing

6 Approximate 2D-Convolution Filters
Basic filter building block: 2D-convolution filter wrapper with a kernel size of 3 × 3
Y[m, n] = \sum_{i} \sum_{j} H[i, j] \cdot X[m - i, n - j]
[Figure: the filter kernel H is moved over the input X to produce the output Y]
Parallel multiply-accumulate (MAC) operation in a pipelined adder tree structure
Replacement of all adders by the same approximate version
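The convolution sum and the adder tree can be sketched in software as follows. This is only an illustrative model, not the hardware filter wrapper: the function names, the zero padding at the image border, and the pluggable `add` argument (which stands in for "all adders replaced by the same approximate version") are assumptions made for this example.

```python
from typing import Callable, List

Image = List[List[int]]


def exact_add(a: int, b: int) -> int:
    """Reference adder; an approximate adder model can be passed instead."""
    return a + b


def adder_tree(values: List[int], add: Callable[[int, int], int]) -> int:
    """Reduce the values pairwise, level by level, like a hardware adder tree."""
    level = list(values)
    while len(level) > 1:
        nxt = [add(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0]


def convolve3x3(x: Image, h: Image, add: Callable[[int, int], int] = exact_add) -> Image:
    """Y[m, n] = sum_i sum_j H[i, j] * X[m - i, n - j] for i, j in {-1, 0, 1}.

    h[i + 1][j + 1] stores H[i, j]; pixels outside the image count as 0
    (zero padding is an assumption of this sketch).
    """
    rows, cols = len(x), len(x[0])
    y = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            products = []
            for i in range(-1, 2):
                for j in range(-1, 2):
                    r, c = m - i, n - j
                    inside = 0 <= r < rows and 0 <= c < cols
                    products.append(h[i + 1][j + 1] * x[r][c] if inside else 0)
            # All nine products are summed by the same kind of adder.
            y[m][n] = adder_tree(products, add)
    return y
```

Passing a different `add` function swaps every adder of the tree at once, which mirrors the slide's statement that all adders are replaced by the same approximate version.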

7 Approximate Adder Structures on FPGAs
[Figure: n-bit adder whose carry chain is split at position m into a most significant part (MSP: operand bits a_{n-1} b_{n-1} ... a_m b_m, sum bits s_{n-1} ... s_m, carry-out c_out) and a least significant part (LSP: operand bits a_{m-1} b_{m-1} ... a_0 b_0, sum bits s_{m-1} ... s_0). Both parts are built from LUT6-type primitives with a LUT5 at the splitting point; the signals shown include c_m and all1.]
Case 1: Carry suppression between LSP and MSP, with an error reduction mechanism
Case 2: Carry prediction between LSP and MSP, no approximation error
A. Becher et al. A LUT-based approximate adder. In: Proceedings of the 24th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines. FCCM '16. Washington DC, USA, May 2016.
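A simplified behavioural model may help to see the two cases. It is a sketch under assumptions, not the LUT mapping of Becher et al.: the carry into the MSP is predicted here from the top LSP bit pair only, and the error reduction is modelled as forcing all LSP sum bits to 1 whenever a carry is suppressed. The function name `approx_add` is hypothetical.

```python
def approx_add(a: int, b: int, m: int) -> int:
    """Add two non-negative integers with the carry chain split at bit m (m >= 1).

    Assumed model:
      - Carry prediction: the carry into the MSP is predicted as
        a[m-1] AND b[m-1]. If the prediction matches the true carry,
        the result is exact (Case 2).
      - Carry suppression: if the LSP produces a carry that was not
        predicted, the carry is dropped and the LSP sum bits are forced
        to all ones to reduce the error (Case 1).
    """
    mask = (1 << m) - 1
    lsp_sum = (a & mask) + (b & mask)
    true_carry = lsp_sum >> m
    predicted = (a >> (m - 1)) & (b >> (m - 1)) & 1
    msp = ((a >> m) + (b >> m) + predicted) << m
    if true_carry and not predicted:
        return msp | mask  # all1: LSP sum bits forced to 1
    return msp | (lsp_sum & mask)


if __name__ == "__main__":
    print(approx_add(3, 1, 2))  # carry suppressed, LSP forced to ones
    print(approx_add(2, 2, 2))  # carry predicted, exact result
```

In this model the result never exceeds the exact sum, which is in line with the "underestimating adder" behaviour mentioned in the case study.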

10 Case Study: Approximate Gaussian Lowpass Filter
[Figure: filter output images for several carry-chain splitting positions m]
Artifacts:
  Brightness decrease (underestimating adder)
  Cartoon effect

11 Impact of the Carry Chain Splitting Point on the Output Quality
Dependency of the average Peak Signal-to-Noise Ratio (PSNR) on m among the Kodak Lossless True Color Image Suite [3]
[Plot: average PSNR [dB] over the splitting position of the carry chain (m)]
[3] R. Franzen. True Color Kodak Images.
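For reference, PSNR between an exact and an approximate filter output can be computed as in the following sketch. It uses the standard definition for 8-bit images (peak value 255); the function name and the list-of-rows image representation are choices made for this example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[Sequence[int]],
         test: Sequence[Sequence[int]],
         peak: int = 255) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    squared_errors = [
        (r - t) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Averaging this value over all images of a test set gives a curve like the one on the slide when repeated for each splitting position m.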

12 Quality Evaluation
Problem: Requirement of a no-reference metric to assess the quality at runtime
Approach: Feature extraction from the histograms of input and output images
[Figure: gray-level histograms (frequency over gray level) of the input image and of the filter outputs for several splitting positions, including m = 1, m = 3, m = 5 and m = 9]
More and more pixels are mapped onto exactly the same brightness values ("cartoon" effect)

13 Quality Evaluation
[Figure: gray-level histograms (frequency over gray level) of the filter output for three splitting positions m]
Gauss kernel: [3 × 3 matrix]
Distinctive peaks created by the all1 signal: erroneous sums are mapped onto 2^{n-m} values
Example: m = 9 before normalization, output bit width after normalization n = 8
  After normalization by 16, the smallest collection bin lies at 0001 1111b (31d)
  Further peaks at a distance of 2^5 = 32
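The peak positions in the example can be reproduced with a few lines. The sketch assumes that an erroneous sum has its m least significant bits forced to 1 by the all1 signal and that normalization by 16 is a right shift by 4 bits; the function name and its parameters are hypothetical.

```python
from typing import List


def collection_bins(m: int, shift: int, out_bits: int) -> List[int]:
    """Gray levels that erroneous sums can take after normalization.

    m:        splitting position before normalization
    shift:    right shift applied by the normalization (4 for a division by 16)
    out_bits: output bit width after normalization
    """
    ones = m - shift  # forced-one bits that survive the shift
    if ones <= 0:
        return list(range(1 << out_bits))  # no visible constraint remains
    low_mask = (1 << ones) - 1
    return [v for v in range(1 << out_bits) if v & low_mask == low_mask]


if __name__ == "__main__":
    # Example from the slide: m = 9, normalization by 16, 8-bit output.
    print(collection_bins(9, 4, 8))
```

For m = 9 the list starts at 31 and continues in steps of 32, which matches the smallest collection bin and the peak distance stated above.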

15 Quality Evaluation
The number of counters used to sample the histograms constitutes a trade-off between overhead and fidelity
Counting of the pixels with gray levels 31, 63, 127 and 191 in both the input and the output image
Definition of QM: ratio of the maximum peak height among the four bins in the output image to the corresponding count in the input image
[Plot: progression of QM with increasing splitting position of the carry chain (m)]
A large metric value indicates bad quality
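A software sketch of the metric, following the definition above, could look like this. The guard against an empty input bin (dividing by at least 1) is an assumption of this example, as are the function and constant names; the slides do not say how that case is handled in hardware.

```python
from collections import Counter
from typing import Iterable, Sequence

# Gray levels sampled by the four counters.
BINS = (31, 63, 127, 191)


def quality_metric(input_pixels: Iterable[int],
                   output_pixels: Iterable[int],
                   bins: Sequence[int] = BINS) -> float:
    """QM = highest of the sampled output bins / count of the same bin in the input."""
    in_hist = Counter(p for p in input_pixels if p in bins)
    out_hist = Counter(p for p in output_pixels if p in bins)
    peak_bin = max(bins, key=lambda b: out_hist[b])
    # Assumption: avoid division by zero when the input has no such pixel.
    return out_hist[peak_bin] / max(in_hist[peak_bin], 1)
```

Only four counters per image are needed, which keeps the overhead low at the cost of sampling the histogram very coarsely.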

17 Reconfiguration Management
Objective: Minimization of the critical path while maintaining a given quality boundary
  Successive approximation of m to the fastest configuration m_opt
  Based on a bang-bang controller with integrated hysteresis
Decision logic for setting the degree of approximation:
  Case 1: Quality is still acceptable: increment or decrement the splitting position m in the direction of m_opt
  Case 2: Quality boundary is exceeded: increment or decrement m in the opposite direction of m_opt
  Case 3: Quality metric is within the dead zone: keep the configuration
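The three cases can be sketched as a single decision function. The parameter names (`tau` for the quality boundary, `dead_zone` for the hysteresis width, `m_min` and `m_max` for the available configurations) are assumptions for illustration, as is the fallback of decrementing m when the boundary is exceeded while m already equals m_opt.

```python
def next_split(m: int, qm: float, tau: float, dead_zone: float,
               m_opt: int, m_min: int, m_max: int) -> int:
    """One step of a bang-bang controller with hysteresis.

    qm > tau              -> Case 2: move m away from m_opt
    qm < tau - dead_zone  -> Case 1: move m towards m_opt
    otherwise             -> Case 3: keep the current configuration
    """
    toward = (m_opt > m) - (m_opt < m)  # +1, -1, or 0 if already at m_opt
    if qm > tau:
        # Assumption: at m == m_opt, a smaller m gives better quality.
        step = -toward if toward else -1
    elif qm < tau - dead_zone:
        step = toward
    else:
        step = 0
    return min(max(m + step, m_min), m_max)
```

Called once per adaptation interval with the latest metric value, the function returns the splitting position whose partial bitstream would be loaded next; the dead zone keeps the configuration from toggling when the metric hovers near the boundary.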

18 Experimental Results

19 Results: Input-Based Adaptivity
System behavior at runtime for the approximate Gaussian filter
[Plots: quality metric QM with threshold τ_QM over frames, and splitting position m over frames. The dynamic configuration is compared against static configurations; static m = 5 and static m = 6 are legible in the legends.]

24 Results: Requirement-Based Adaptivity
System evaluation results
[Plots: degree of approximation (DoA, m_avg) and output quality (PSNR_avg [dB]) for the quality requirements low, medium and high; one curve per test video: aspen, redkayak, snowmnt, touchdownpass, pedestrian area, demo video]
Test videos: Derf's collection + self-shot demo video, resolution of 640 × 480, grayscale, 8 bits/pixel
Evaluation parameters: adaptation rate of ... at a frame rate of 30 fps

25 Analysis of the Partial Reconfiguration Overhead
Reconfiguration time for the partial bitstreams [6]

XC7Z                        Partial Bitstream
Bitstream Size [KB]         ...
Reconfiguration Time [ms]   1.6
Download Rate [MB/s]        37.8

Approximately linear correlation between configuration time and bitstream size
Remaining time slot for the filtering process at 30 fps: ... ms - ... ms = ... ms
Partial reconfiguration requires .86 % of the time frame
[6] This table contains only the largest bitstream among the approximate variants for the Gaussian filter, which determines the slowest transfer.
[7] Full binary bitstream size for the xc7z device: ,5,56 Bytes
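The overhead calculation itself is simple arithmetic and can be sketched as follows. The function names are hypothetical, the unit convention (1 MB = 1000 KB) is an assumption, and the numbers in the usage example are placeholders rather than the measured values from the table.

```python
from typing import Tuple


def transfer_time_ms(size_kb: float, rate_mb_s: float) -> float:
    """Linear model: time = size / rate (1 MB/s equals 1 KB/ms for 1 MB = 1000 KB)."""
    return size_kb / rate_mb_s


def frame_budget(fps: float, reconfig_ms: float) -> Tuple[float, float]:
    """Return (time left for filtering in ms, share of the frame period in percent)."""
    period_ms = 1000.0 / fps
    remaining_ms = period_ms - reconfig_ms
    share_percent = 100.0 * reconfig_ms / period_ms
    return remaining_ms, share_percent


if __name__ == "__main__":
    # Placeholder values: a 100 KB bitstream at 50 MB/s, video at 25 fps.
    t = transfer_time_ms(100.0, 50.0)
    print(t, frame_budget(25.0, t))
```

Because the transfer time scales linearly with the bitstream size in this model, the largest partial bitstream determines the worst-case share of the frame period.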

26 Summary

27 Summary
Challenge: Input-dependent approximation error behavior requires self-adaptive methods
Proposition of a no-reference metric for online output quality monitoring based on histogram information
Our concept offers
  better exploitation of a given error tolerance than static approximation
  a user control knob to select the desired output quality at runtime
Thank you for listening! Any questions?

31 Backup Slides

32 System Level Overview
[Figure: SoC block diagram. Software side (Processing System, Main Memory): user application + Reconfiguration Manager, driver modules in the Linux kernel exposing /dev/image_filter and /dev/xdevcfg. Hardware side (Programmable Logic): HW/SW interface, filter wrapper with a reconfigurable partition, quality evaluation, filter controller. Partial reconfiguration (PR) swaps in the filter variants m = 1, m = 2, m = 3, ...]
Reconfiguration Manager: quality-control loop; Linux device drivers as hardware interfaces
Approximate Filter Operators: partial bitstreams for various degrees of approximation
Quality Evaluation: online quality monitoring

33 Correlation Between the Proposed Quality Metric and PSNR
[Plot: QM over PSNR [dB]]
Inverse relation: the metric tends to increase with decreasing PSNR
