Design and Optimization of Geometry Acceleration for Portable 3D Graphics

Size: px

Start display at page:

Download "Design and Optimization of Geometry Acceleration for Portable 3D Graphics"

Mabel Gaines
6 years ago
Views:

M.S. Thesis Design and Optimization of Geometry Acceleration for Portable 3D Graphics Ju-ho Sohn 2002.12.

1 M.S. Thesis Design and Optimization of Geometry Acceleration for Portable 3D Graphics Ju-ho Sohn oratory Department of Electrical Engineering and Computer Science Korea Advanced Institute of Science and Technology MS Thesis 1

2 Outline Introduction Design and Implementation of Bandwidth Equalizer Mobile Graphics Library Proposed 3D Geometry Coprocessor SATINE Conclusion and Further Works MS Thesis 2

3 Outline Introduction Portable 3D Graphics RAMP-IV Multimedia Processor Problem Definitions Performance Requirements of Mobile System Design and Implementation of Bandwidth Equalizer Mobile Graphics Library Proposed 3D Geometry Coprocessor SATINE Conclusion and Further Works MS Thesis 3

. Most challenging part High computing complexity Hugh Memory

4 Introduction Portable 3D Graphics Most attractive application 3D ad., 3D avatar and 3D game.. Most challenging part High computing complexity Hugh Memory Bandwidth Mobile System Memory CPU Accelerator Small screen size Low performance requirements MS Thesis 4

5 Introduction RAMP-IV Portable Multimedia Processor [*] 0.16um pure DRAM process Provide the full 3D graphics pipeline with texture mapping 32b RISC MAC Interface Logic Rendering Accelerator Embedded DRAM Geometry Operations Rendering Operations [*] R.Woo, et al, ISSCC 2003 MS Thesis 5

6 Introduction Problem Definitions Efficient interface logic: Bandwidth Equalizer Efficient graphics library: MobileGL Efficient geometry accelerator: SATINE SW Graphics Library (Geometry Operations) 32b RISC Data Transfer Interface Logic Rendering Accelerator Communication cost MS Thesis 6

7 Introduction Required Performance of Graphics System For mobile device of 320 by 240 screen resolution Frame rate : 15 fps for animation Geometry stage : 144K polygon/sec Rendering stage : 2M pixel/sec Test Scenes 5868 Polygons 6833 Polygons MS Thesis 7

8 Outline Introduction Design and Implementation of Bandwidth Equalizer Requirements Architecture Chip Implementation Performance Mobile Graphics Library Proposed 3D Geometry Coprocessor SATINE Conclusion and Further Works MS Thesis 8

9 Bandwidth Equalizer Requirements Compensate the different datawidth and clock frequency 32bit from RISC 128bit to Rendering Accelerator Low power consumption with reduced communication cost Less than 5 Reduce the CPU cycles for data transfer Increase the memory utilization Dual Port SRAM 32bit RISC 32 bit Interface Controller 128 bit Redering Accelerator Interrupt Hold MS Thesis 9

10 Bandwidth Equalizer Architecture Flow control Dynamic threshold : Reduce the CPU interrupts Adaptive bank utilization : Reduce the power consumption Need more memory Need less memory MS Thesis 10

11 Bandwidth Equalizer Architecture Scratch Pad Memory Increase the memory utilization BEQ Dual Port SRAM Frequently Accessed Data (*) (*) DCT Table in MPEG decoding Cache 32b RISC Rendering Accelerator MS Thesis 11

12 Bandwidth Equalizer Chip Implementation Hynix 0.16 um DRAM process 1KB DP-SRAM 4.5 by 1 (11 by 11 ) (210 ) Flow control 16.7 % Power reduction Avoid CPU interrupts MS Thesis 12

13 Bandwidth Equalizer Performance Geometry Stage Rendering Stage Polygon Rate (K Poly/Sec) Pixel Fill Rate (Pixel/Sec) k 20k 30k 40k 50M 60M 70M Conventional RAMP-IV w/o BEQ O b Zc ^ _ /=/: H RAMP-IV w/ BEQ Improved Communication Cost O b Zc ^ _ /=/;J MS Thesis 13

14 Outline Introduction Design and Implementation of Bandwidth Equalizer Mobile Graphics Library (MobileGL) Definition Simulation Environment Performance Result Implementation Proposed 3D Geometry Coprocessor SATINE Conclusion and Further Works MS Thesis 14

15 Mobile Graphics Library (MobileGL) Definition Graphics library for real time 3D graphics in mobile applications OpenGL Compatible Based on the optimization of graphics system for mobile applications [*] [*] J-H Sohn, et al, ISCAS 2002 MS Thesis 15

16 MobileGL Simulation Environment Target platform: ARM processor platform Graphics Library Performance Result ARM SDT ARM Processor Integer Datapath Cache Floating Point Unit MS Thesis 16

17 MobileGL Target configurations Suitable for mobile system such as PDA Floating vs Fixed point system Floating Point Graphics Library Floating Point Graphics Library Fixed Point Graphics Library Integer Datapath RISC Integer Datapath RISC HW Floating Point Unit Integer Datapath RISC Conventional MS Thesis 17

18 MobileGL Performance result Un-lit polygon O b Zc ^ _ /=/: H Lit Polygon Geometry Stage Polygon Rate (K Poly/Sec) Floating Point w/o FPU Floating Point w/ FPU Fixed Point 10 Times MS Thesis 18

19 MobileGL Implementation Fixed point arithmetic Assembly math library for SQRT/DIV Efficient in ARM architecture Optimized to RAMP-IV Graphics system Geometry: ARM9 + fast MAC = 153K polygon/sec Solve the high computing complexity Rendering: Rendering Accelerator = 66M pixel/sec Solve the huge memory bandwidth requirements MS Thesis 19

20 MobileGL Implementation Available on any ARM platform MS Thesis 20

21 MobileGL Performance Enhancement O b Zc ^ _ /=/: H Geometry Stage Polygon Rate (K Poly/Sec) Conventional RAMP-IV w/o BEQ RAMP-IV w/ BEQ 153 RAMP-IV with MobileGL 10 Times MS Thesis 21

22 Outline Introduction Design and Implementation of Bandwidth Equalizer Mobile Graphics Library Proposed 3D Geometry Coprocessor SATINE Motivation Related Works Architecture Performance Estimation Conclusion and Further Works MS Thesis 22

23 Proposed 3D Geometry Coprocessor Motivation Performance bottleneck in RAMP-IV system Whole system is limited by Geometry Performance Only 4~10% utilization of rendering accelerator can be obtained For high performance with low power consumption SATINE : Tightly coupled geometry coprocessor optimized to 3D operations System Memory ARM RISC Coprocessor interface SATINE Rendering Processor MS Thesis 23

24 SATINE Motivation (cont d) Cycle usages of operations in geometry stage Normalized Clock (3D pipeline) Rendering Stage (0.79) Geometry Stage (0.21) 3D Pipeline Operations pattern 91% Geometry only Normalized Clock (Geometry only) MUL DIV SQRT Clipping Perspective Specular Light Diffuse Light Ambient Light Transformation 5.3% 3.4% MS Thesis 24

25 SATINE Related Works #1 Intel GPP, 2002 #2 Intel Wireless MMX, 2202 RISC SW RISC Co-Proc SIMD SIMD SIMD Unavoidable performance limit Not optimized to 3D operations #3 PowerMBX, 2001 #4 Mitsubishi Z-3D, 2001 BUS Vertex Engine RISC falu Complex hardware cost Not suitable to PDA size #1, #2: #3: #4: MS Thesis 25

26 SATINE Architecture SIMD coprocessor Brings new 53 instructions 4 Way 128 bit integer SIMD datapath Datapath 32x8 32x8 32x8 32x8 Multiplier ALU ALU ALU ALU 32b Shifter MS Thesis 26

27 SATINE Architecture SIMD acceleration for 3D operations (1) Vertex transformation: VMMV instruction x V V.x V.y V.z V.w X x0 x0 x0 x0 X.x X.y X.z X.w Elements of SIMD variable Broadcasting m0 m1 m2 m3 m4 m5 m6 m7 m8 m12 m9 m13 m10 m14 m11 m15 V.x V.y V.z V.w m0 m1 m2 m3 V.x V.x V.x V.x V.x m4 m5 m6 m7 V.y V.y V.y V.y V.y m8 m9 m10 m11 V.z V.z V.z V.z V.z m12 m13 m14 m15 V.w V.w V.w V.w V.w V.x V.y V.z V.w M Multiply cycle M1 M2 M3 M4 A1 A2 A3 A4 M1 M2 M3 M4 A1 A2 A3 A4 A Add cycle M1 M2 M3 M4 A1 A2 A3 A4 M1 M2 M3 M4 A1 A2 A3 A4 MS Thesis 27

28 SATINE Architecture SIMD acceleration for 3D operations (2) Clip operations : TCLIP instruction in out in out in out in out (a) No clip (b) Culled (c) Clipped (d) Clipped Distance MS Thesis 28

29 SATINE Architecture Script Execution and Stream Processor Script Download SATINE Memory Attribute Parameter SATINE Script Memory Input Output Vertex Input Coordination Transformation Normal Transformation Lighting Callculation Vertex Output MS Thesis 29

30 SATINE Architecture Master Request Interface (MRI) C/Assembly function in ARM and SATINE script can be executed with passing the parameters in parallel DIV/SQRT call the script end of script N execute request send result ARM Y MRC return MRC MCR CP1 SSR MQCR MQRR Request Get CP0 SSR: Script Status Register MQCR: Master Request Control Register MQRR: Master Request Result Register VMULL VMAC... VMSTREQ #sqrt VADD... VFINREQ #sqrt... VENDSCR MS Thesis 30

31 SATINE Architecture Multi-Mode Register File bank 0 bank 1 For efficient register use in 3D operations Mat_mode Lit_mode Front cp0_r0 cp0_r1 cp0_r2 cp0_r3 cp0_r4 cp0_r5 cp0_r6 cp0_r7 cp0_r8 cp0_r9 cp0_r10 cp0_r11 cp0_r12 cp0_r13 cp0_r14 cp0_r15 Back cp0_r0 cp0_r1 cp0_r2 cp0_r3 cp0_r4 cp0_r5 cp0_r6 cp0_r7 Front cp0_r0 cp0_r1 cp0_r2 cp0_r3 cp0_r4 cp0_r5 cp0_r6 cp0_r7 cp0_r8 cp0_r9 cp0_r10 cp0_r11 cp0_r12 cp0_r13 cp0_r14 cp0_r15 Back cp0_r0 cp0_r1 cp0_r2 cp0_r3 cp0_r4 cp0_r5 cp0_r6 cp0_r7 MS Thesis 31

32 SATINE Performance Estimation Simulation Environment Graphics Library Performance Result ARM SDT ARM Processor Integer Datapath Cache SATINE Profiler SATINE Instruction Set Simulator MS Thesis 32

33 SATINE Performance Estimation Estimated performance with ARM Times Geometry Stage Polygon Rate (K Poly/Sec) O b Zc ^ _ /=/: H Conventional RAMP-IV w/o BEQ 153 Times RAMP-IV w/ BEQ 153 RAMP-IV with MobileGL 1670 RAMP-IV with SATINE MS Thesis 33

34 SATINE Performance Estimation Estimated Layout Area 1.48 times of ARM9 Estimated Power Consumption MS Thesis 34

35 SATINE Performance Estimation Performance Comparison Geometry Performance (KPolygons/mW) Z-3D MBX SATINE MS Thesis 35

36 Outline Introduction Design and Implementation of Bandwidth Equalizer Mobile Graphics Library Proposed 3D Geometry Coprocessor SATINE Conclusion and Further Works MS Thesis 36

37 Conclusion and Further Works Conclusion Efficient interface block for portable 3D processor was implemented 0.16 um DRAM process 3.5 mw power consumption with reduced communication cost Mobile graphics library was developed by system level simulation for RAMP-IV portable multimedia processor 153K polygon/sec with fixed point arithmetic 3D Geometry coprocessor suitable for mobile applications was proposed 1.67M polygon/sec Further work Hardware implementation of proposed architecture MS Thesis 37

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System