Ultra Low Power (ULP) Challenge in System Architecture Level
New architectures for 45-nm, 32-nm era
ASP-DAC 2007 Designers' Forum 9D: Panel Discussion: Top 10 Design Issues
Toshinori Sato (Kyushu U)
Global View Helps ULP Design
- Reducing power alone is not enough
  - Variation tolerance, soft error tolerance, and still high performance
- High-level consideration of power reduction is required
  - Software optimization increases design flexibility
  - Speculation can create new frontiers for optimization
  - Architecture selection can change the characteristics of circuits
- Variation-aware (VA) ULP design examples
VA ULP Cache Architecture
- Process variations create ultra-leaky transistors
- Fortunately, the leakage current of an SRAM cell depends on the logic value stored
  - Store leakage-safe values on entering standby mode
- Power saving with negligible performance penalty
[Figure: threshold voltage distribution, running from the large-leak side through the mean to the large-delay side. Labels on the figure: 5σ; Vth = 0.3 V; 1 transistor out of 512K-bit SRAM; 100 tr.; delay is 2x the average; 1.8x; leakage is 1,400x higher than average; 330x.]
Coverage of the distribution: ±σ: 68.3%, ±2σ: 95.4%, ±3σ: 99.7%, ±4σ: 99.9936%, ±5σ: 99.99994%
M. Goudarzi: A Software Technique to Improve Yield of Processor Chips in Presence of Ultra-Leaky SRAM Cells Caused by Process Variation, Session 9A @Room 411+412, just now.
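The sigma coverage figures on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming threshold voltage is normally distributed and that each SRAM bit is a six-transistor (6T) cell; the 6T assumption and the function names are illustrative and do not come from the slide.

```python
import math


def coverage(k: float) -> float:
    """Fraction of a normal population lying within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2.0))


def expected_low_vth_outliers(n_transistors: int, k: float) -> float:
    """Expected number of transistors more than k sigma below the mean Vth.

    Only one tail is counted, because only the low-Vth tail is ultra leaky.
    """
    one_tail = (1.0 - coverage(k)) / 2.0
    return n_transistors * one_tail


# Assumption (not from the slide): 512K bits = 512 * 1024 cells, 6 transistors each.
N_TRANSISTORS = 512 * 1024 * 6

for k in range(1, 6):
    pct = coverage(k) * 100.0
    outliers = expected_low_vth_outliers(N_TRANSISTORS, k)
    print(f"+/-{k} sigma: {pct:.5f}% covered, "
          f"about {outliers:.1f} transistors below -{k} sigma")
```

Under these assumptions the low-Vth tail holds on the order of 100 transistors beyond 4σ and about one beyond 5σ, which is roughly in line with the "100 tr." and "1 transistor out of 512K-bit SRAM" labels on the figure.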
VA ULP Cache Architecture (cont.)
[Figure: 4-way set-associative cache memory with sets 0 to 7 and ways tag0/data0, tag1/data1, tag2/data2, tag3/data3. Example stored words 0110100101 and 1110110011 are shown with their 1-leaky cells and 0-leaky cells marked.]
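The idea of storing leakage-safe values can be illustrated with bit masks. This is a hedged sketch, not the technique from the cited paper: it assumes a "1-leaky" cell is one that leaks heavily while holding 1 and a "0-leaky" cell is one that leaks heavily while holding 0, that the positions of such cells are already known, and it ignores how the original cache contents are preserved or refetched. The function names and mask values are hypothetical.

```python
def leakage_safe_word(word: int, one_leaky: int, zero_leaky: int) -> int:
    """Return the value to park in an SRAM word before entering standby.

    one_leaky:  bit mask of cells assumed to leak heavily when holding 1,
                so they are forced to 0.
    zero_leaky: bit mask of cells assumed to leak heavily when holding 0,
                so they are forced to 1.
    All other bits keep their current value.
    """
    if one_leaky & zero_leaky:
        raise ValueError("a cell cannot be both 1-leaky and 0-leaky")
    return (word & ~one_leaky) | zero_leaky


def leaky_cells_active(word: int, one_leaky: int, zero_leaky: int, width: int) -> int:
    """Count the cells that currently hold their high-leakage value."""
    mask = (1 << width) - 1
    ones_in_one_leaky = bin(word & one_leaky).count("1")
    zeros_in_zero_leaky = bin(~word & zero_leaky & mask).count("1")
    return ones_in_one_leaky + zeros_in_zero_leaky


# Example word taken from the slide; the leaky-cell masks are made up.
WIDTH = 10
word = 0b0110100101
one_leaky = 0b0000100000   # hypothetical: bit 5 leaks when it holds 1
zero_leaky = 0b0000000010  # hypothetical: bit 1 leaks when it holds 0

safe = leakage_safe_word(word, one_leaky, zero_leaky)
print(f"stored  {word:010b}: {leaky_cells_active(word, one_leaky, zero_leaky, WIDTH)} leaky cells active")
print(f"standby {safe:010b}: {leaky_cells_active(safe, one_leaky, zero_leaky, WIDTH)} leaky cells active")
```

Overwriting a word changes its contents, so a real scheme has to account for the cost of restoring or refetching the data afterwards, which is where the performance penalty mentioned on the slide would come from.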
VA ULP Cache Architecture (cont.)
[Figure: power saving (nW, 0 to 1500) plotted against performance loss (ns, 0 to 70) for ARM920 and M32R.]
VA ULP Logic Architecture
- Typical-case design
  - Optimizing not for worst cases but for typical cases
- Combination of two circuits
  - Main for power reduction
  - Checker for correctness
- Examples
  - Razor FF
  - Canary FF
- Potential of over 30% energy reduction
- Limited soft error tolerance
[Figure: Razor FF between two logic stages, clocked by clk and a delayed clk, with a comparator producing the error signal.]
D. Ernst: Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation, MICRO, 2003.
T. Sato: A Simple Flip-Flop Circuit for Typical-Case Designs for DFM, ISQED, 2007.
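The Razor FF in the figure pairs a main flip-flop with a second sample taken on a delayed clock and compares the two. The following behavioural model is a minimal sketch of that detection idea only; the timing parameters, the RazorResult type, and the razor_sample function are illustrative assumptions, and recovery after an error is not modelled.

```python
from dataclasses import dataclass


@dataclass
class RazorResult:
    main_value: int    # value captured by the main flip-flop at the clock edge
    shadow_value: int  # value captured later, on the delayed clock
    error: bool        # comparator output: the two samples disagree


def razor_sample(old_value: int, new_value: int, arrival_time: float,
                 clock_period: float, shadow_delay: float) -> RazorResult:
    """Behavioural model of one Razor-style sampling event.

    The logic stage output switches from old_value to new_value at
    arrival_time. The main flip-flop samples at clock_period and the
    shadow element samples at clock_period + shadow_delay.
    """
    main = new_value if arrival_time <= clock_period else old_value
    shadow = new_value if arrival_time <= clock_period + shadow_delay else old_value
    return RazorResult(main, shadow, main != shadow)


# Hypothetical timing numbers, in arbitrary units.
on_time = razor_sample(old_value=0, new_value=1, arrival_time=0.9,
                       clock_period=1.0, shadow_delay=0.3)
late = razor_sample(old_value=0, new_value=1, arrival_time=1.2,
                    clock_period=1.0, shadow_delay=0.3)
print(on_time)
print(late)
```

In this model a late transition is detected only if it arrives before the delayed clock edge; anything slower than clock_period + shadow_delay goes unnoticed, so the delay has to cover the worst timing violation the design is willing to tolerate.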
VA ULP Logic Architecture (cont.)
[Figure: Canary FF between two logic stages, both flip-flops clocked by clk, with a delay element in front of the canary and a comparator producing the trigger signal.]
[Figure: bar chart with a 0% to 40% axis for the benchmarks gzip, vpr, gcc, parser, vortex, and bzip2.]
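In contrast to a Razor FF, the Canary FF in the figure samples on the same clock but sees the data through a delay element, so it fails first and can raise a warning while the main flip-flop is still correct. The sketch below is a behavioural model under that reading of the figure; the canary_sample function and all timing values are hypothetical.

```python
from typing import Tuple


def canary_sample(old_value: int, new_value: int, arrival_time: float,
                  clock_period: float, canary_delay: float) -> Tuple[int, int, bool]:
    """Behavioural model of one canary-style sampling event.

    Both flip-flops sample at clock_period. The main flip-flop sees the
    data at arrival_time; the canary sees it canary_delay later.
    Returns (main_value, canary_value, trigger).
    """
    main = new_value if arrival_time <= clock_period else old_value
    canary = new_value if arrival_time + canary_delay <= clock_period else old_value
    return main, canary, main != canary


# Hypothetical sweep: the path slows down as supply voltage is lowered.
CLOCK_PERIOD = 1.0
CANARY_DELAY = 0.1
for arrival in (0.80, 0.88, 0.95):
    main, canary, trigger = canary_sample(0, 1, arrival, CLOCK_PERIOD, CANARY_DELAY)
    print(f"arrival={arrival:.2f} main={main} canary={canary} trigger={trigger}")
```

The trigger fires when the data arrives within canary_delay of the clock edge, which means the warning comes before the main flip-flop captures a wrong value and no recovery step is needed in this model.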
VA ULP CMP Architecture
- Statistical characteristics of circuit delay
  - As the number of critical paths increases, the mean delay increases and the standard deviation decreases
- A CMP with simple CPU cores
  - reduces critical path delay and increases the number of critical paths
  - is more variation-tolerant
[Figure: delay distribution plot; legible labels: 100, x2.]
M. Hashimoto: Increase in Delay Uncertainty by Performance Optimization, ISCAS, 2001.
T. Sato: Architectures Study beyond Physical Limitations, NGArch Forum, July 2006.
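The statistical claim on this slide can be checked with a small Monte Carlo experiment. This is a sketch under simplifying assumptions that are not stated on the slide: path delays are independent and identically distributed Gaussians, and the chip delay is the maximum over all critical paths. The function name and parameter values are illustrative.

```python
import random
import statistics
from typing import Tuple


def chip_delay_stats(n_paths: int, trials: int = 2000, mean: float = 1.0,
                     sigma: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Estimate mean and standard deviation of the slowest of n_paths paths.

    Each path delay is drawn independently from Normal(mean, sigma);
    the chip delay in each trial is the maximum path delay.
    """
    rng = random.Random(seed)
    chip_delays = [
        max(rng.gauss(mean, sigma) for _ in range(n_paths))
        for _ in range(trials)
    ]
    return statistics.mean(chip_delays), statistics.stdev(chip_delays)


for n in (1, 10, 100):
    m, s = chip_delay_stats(n)
    print(f"{n:4d} critical paths: mean delay {m:.4f}, std dev {s:.4f}")
```

The sketch shows only the statistical effect of adding critical paths. It does not model the shorter nominal critical path of a simple core, which the slide names as the other half of the argument.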