A Cost Effective Spatial Redundancy with Data-Path Partitioning. Shigeharu Matsusaka and Koji Inoue Fukuoka University Kyushu University/PREST

1 A Cost Effective Spatial Redundancy with Data-Path Partitioning. Shigeharu Matsusaka and Koji Inoue, Fukuoka University, Kyushu University/PREST

2 Outline: Introduction; Data-path Partitioning for a dependable processor: Simple Multiplexing (DPSM), Compressed Multiplexing (DPCM); Evaluation; Conclusions

3 Introduction: Dependability of computer systems is one of the most important design constraints!! Fields where high reliability is demanded: online transaction processing, artificial satellites, medical treatment, traffic control, electronic money and private information.

4 Fault
  Fault             Period of time   Caused by
  Permanent fault   long             Internal factors: destruction of semiconductor junction, short circuit, disconnection
  Temporal fault    very short       External factors: unexpected temperature, vibration, alpha particle, cosmic ray

5 Conventional fault detection
  Temporal Redundancy and Spatial Redundancy are compared on: how to detect, execution time, hardware cost, detectable fault.
  Detectable fault: Temporal Redundancy: temporal fault. Spatial Redundancy: temporal fault, permanent fault.

6 Our Goal
  Compared on execution time, hardware cost, and detectable fault. Detectable fault: Temporal Redundancy: temporal fault. Spatial Redundancy: temporal fault, permanent fault.
  Supporting Spatial Redundancy without increasing the hardware cost!!

7 Approach: Data-Path Partitioning
  Data-Path Partitioning for a dependable processor: The data-path is partitioned into several narrow-width data-paths. An instruction is executed in each partitioned data-path in parallel. The execution results generated by the partitioned data-paths are compared. A fault is detected!!
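The compare step on this slide can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the helper names (`make_slice`, `redundant_add`) and the `stuck_at_one_bit` fault-injection knob are hypothetical, and the operands are assumed to fit in one 16-bit partition.

```python
def make_slice(width, stuck_at_one_bit=None):
    """Return an adder modelling one partitioned data-path of `width` bits.

    `stuck_at_one_bit` is a hypothetical fault-injection knob: when set,
    that result bit is forced to 1 to model a permanent fault.
    """
    mask = (1 << width) - 1

    def add(a, b):
        result = (a + b) & mask
        if stuck_at_one_bit is not None:
            result |= 1 << stuck_at_one_bit
        return result

    return add


def redundant_add(slices, a, b):
    """Run the same addition on every slice and compare the results."""
    results = [s(a, b) for s in slices]
    fault_detected = any(r != results[0] for r in results)
    return results[0], fault_detected


# Two 16-bit partitions of a 32-bit data-path (the SR2 configuration).
healthy = [make_slice(16), make_slice(16)]
faulty = [make_slice(16), make_slice(16, stuck_at_one_bit=2)]

print(redundant_add(healthy, 100, 23))  # (123, False)
print(redundant_add(faulty, 100, 23))   # (123, True)
```

A stuck-at fault is only visible when it changes the result, so the comparison catches it on some operand values and not on others.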

8 Approach: Data-Path Partitioning
  Implementation alternatives: Simple Multiplexing (DPSM), Compressed Multiplexing (DPCM).
  Assumptions: the baseline processor has a 32-bit data-path; each 32-bit data execution can be completed in 1 clock cycle; the degree of redundancy to be realized is two or four.
  Configurations: 32-bit data-path × 1 (SR1), 16-bit data-path × 2 (SR2), 8-bit data-path × 4 (SR4).

9 Simple Multiplexing (DPSM)
  (Figure: Instruction1 to Instruction3 executed on each configuration. CC: Clock Cycle.)
  32-bit × 1 [SR1]: 3 CC. 16-bit × 2 [SR2]: 6 CC. 8-bit × 4 [SR4]: 12 CC.
  Increasing execution time!! Detecting permanent faults!!
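The cycle counts on this slide follow from simple arithmetic, sketched below. The function name `dpsm_cycles` is hypothetical, and the sketch assumes that every instruction is a full 32-bit operation and that one pass through a partition takes one clock cycle.

```python
DATA_PATH_BITS = 32  # baseline data-path width assumed on the slides


def dpsm_cycles(num_instructions, redundancy):
    """Clock cycles under DPSM when every instruction is a 32-bit operation.

    With `redundancy` partitions, each partition is 32 / redundancy bits
    wide, so one 32-bit operation needs `redundancy` passes of 1 cycle each.
    """
    if redundancy not in (1, 2, 4):
        raise ValueError("the slides consider SR1, SR2 and SR4 only")
    slice_bits = DATA_PATH_BITS // redundancy
    passes_per_instruction = DATA_PATH_BITS // slice_bits
    return num_instructions * passes_per_instruction


for sr in (1, 2, 4):
    print(f"SR{sr}: {dpsm_cycles(3, sr)} CC")
# SR1: 3 CC
# SR2: 6 CC
# SR4: 12 CC
```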

10 Effective Bit-width
  Effective bit-width: the bit-width required for each instruction execution, that is, the data which the instruction actually uses. The rest of the data-path bit-width is unnecessary for that instruction execution.
  Many applications have small effective bit-width!! In SPECint95, 50% of instructions have both operands less than or equal to 16 bits*. Media applications mainly deal with 8-bit pixel values.
  *D. Brooks and M. Martonosi. Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance. HPCA, pp.13-22.
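As a rough illustration of effective bit-width, the sketch below counts the bits that remain after stripping leading zeros from a 32-bit operand. This leading-zero definition is an assumption made for the example (negative values therefore count as full width), and the names `effective_bit_width` and `fits` are hypothetical.

```python
def effective_bit_width(value, word_bits=32):
    """Bits needed for `value` after stripping leading zeros.

    The value is first truncated to `word_bits` bits, so a negative
    two's-complement value keeps its leading ones and counts as full width.
    """
    value &= (1 << word_bits) - 1
    return value.bit_length()


def fits(operands, slice_bits):
    """True when every operand fits a partitioned data-path of `slice_bits`."""
    return all(effective_bit_width(v) <= slice_bits for v in operands)


print(effective_bit_width(255))   # 8  (an 8-bit pixel value)
print(effective_bit_width(256))   # 9
print(effective_bit_width(-1))    # 32 (all ones in 32 bits)
print(fits((200, 55), 8))         # True
print(fits((200, 300), 8))        # False
print(fits((200, 300), 16))       # True
```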

11 Compressed Multiplexing (DPCM) (figure: the effective bit-width of an instruction compared with the bit-width of the partitioned data-path)

12 Coverage (figure: processor data-path block diagram with PC, instruction memory, register file, sign extend, multiplexers, ALU, shift left, adders, and data memory)

13 Evaluation
  Purpose: preliminary evaluation of the impact of Data-Path Partitioning on processor performance.
  Experimental setup: Simulator: SimpleScalar (ver. 3.0d), instruction-level simulation. Benchmark programs: SPEC2000 benchmark suite (164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 255.vortex, 256.bzip). Input: small input data set.
  Assumption: perfect cache.

14 Execution time
  Execution time = $IC \times CPI \times CCT$
  IC: Instruction Count. SR2: $IC_{org} + IC_{gt16b} \times 1$. SR4: $IC_{org} + IC_{gt8b} \times 3$.
  CPI: Clock Cycles Per Instruction, 1.
  CCT: Clock Cycle Time, fixed value.
  $IC_{org}$: the number of instructions with the 32-bit data-path.
  $IC_{gt16b}$: the number of instructions with effective bit-width > 16.
  $IC_{gt8b}$: the number of instructions with effective bit-width > 8.
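The model on this slide can be evaluated with a few lines of Python. This is a sketch under the slide's assumptions (CPI of 1 and a fixed clock cycle time, so normalized execution time reduces to a ratio of instruction counts). The function names and the instruction counts in the example are made up for illustration and are not measured results.

```python
def dpcm_instruction_count(ic_org, ic_wide, redundancy):
    """Effective instruction count under DPCM.

    ic_org  : instructions executed on the original 32-bit data-path
    ic_wide : instructions whose effective bit-width exceeds the partition
              width (IC_gt16b for SR2, IC_gt8b for SR4)
    Each wide instruction costs `redundancy - 1` extra cycles:
    1 for SR2 and 3 for SR4, as in the slide's formulas.
    """
    if redundancy not in (2, 4):
        raise ValueError("the model is given for SR2 and SR4 only")
    return ic_org + ic_wide * (redundancy - 1)


def normalized_time(ic_org, ic_wide, redundancy):
    """Execution time relative to the baseline (CPI = 1, CCT fixed)."""
    return dpcm_instruction_count(ic_org, ic_wide, redundancy) / ic_org


# Hypothetical counts, not taken from the evaluation.
print(normalized_time(1000, 400, 2))   # 1.4
print(normalized_time(1000, 600, 4))   # 2.8

# When every instruction is wide, the model reaches the DPSM cost.
print(normalized_time(1000, 1000, 2))  # 2.0
print(normalized_time(1000, 1000, 4))  # 4.0
```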

15 Execution time overhead (DPCM) (figure: normalized execution time of SR2 and SR4 for each benchmark: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 255.vortex, 256.bzip)

16 Breakdown for compressed instruction
  Instruction type   Execution frequency   % compressible (SR2)   % compressible (SR4)
  Branch             18.23%                %                      75.61%
  load/store         37.32%                0.00%                  0.00%
  ALU                41.59%                52.62%                 36.32%
  Others             2.86%                 0.00%                  0.00%
  Base address is set to a large value!! Re-placement of data by a compiler is necessary to reduce the increase in execution time.

17 Conclusions
  This work: preliminary evaluation of the impact of data-path partitioning on processor performance, for Simple Multiplexing (DPSM) and Compressed Multiplexing (DPCM). (Table: normalized execution time of DPSM and DPCM for SR2 and SR4.)
  Future work: establishing the complete microarchitecture to support the proposed idea.

18 Thank you!!
