EE241 - Spring 2004
Advanced Digital Integrated Circuits
Borivoje Nikolić

Lecture 2: Impact of Scaling

Class Material
- Last lecture: class scope, organization
- Today's lecture: impact of scaling

Major Roadblocks
1. Power is a limiting factor
   - But performance is what sells
2. Robustness issues
   - Variations, soft errors, coupling
3. Cost of integrated circuits is increasing
   - Mask costs are more than $1M in 90nm technology
4. Flexibility is needed
   - With these cost increases, low-volume ASICs are too expensive
5. Managing complexity
   - How to design a 3 billion transistor chip?
   - And what to use all these transistors for?

Mask Costs
[Figure: mask cost versus year, 1996-2008, with points for the 0.25 µm, 0.18 µm, 0.13 µm, 90nm, 65nm and 45nm nodes]
Mask costs follow Moore's law as well

Cost Increases
- Lithography is more complex
  - Like painting a 1cm line with a 3cm brush
  - 193nm, 157nm laser
  - Cost of exposure system
  - Cost of proximity correction, phase shift masks
  - Cost of mask repair
- But mask costs drop in subsequent years
  - Economic settings for maskless lithography
- Design costs increase with added complexity
  - Chip starts ~$M

The Interconnect Scare
[Figure]

Productivity Trends
[Figure: logic transistors per chip (M) and productivity (K transistors per staff-month) versus year, 1981-2009, log scale. Complexity growth rate: 58%/yr compounded. Productivity growth rate: 21%/yr compounded. Source: Sematech]
Complexity outpaces design productivity

Device Variations
- Control of minimum features does not track feature scaling
  - Relative device/interconnect variations increase
- Sources:
  - Random dopant fluctuations
  - Feature size, oxide thickness variations
- Effects:
  - Speed
  - Power, primarily leakage
  - Yield
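
The gap on the Productivity Trends slide can be made concrete by compounding the two growth rates. The sketch below assumes the 58%/yr and 21%/yr figures read from the Sematech plot; the function names are illustrative and not part of the lecture.

```python
# Compound the two yearly growth rates from the Productivity Trends slide.
# Assumed inputs: 58%/yr complexity growth, 21%/yr productivity growth.
COMPLEXITY_RATE = 0.58     # logic transistors per chip, per year
PRODUCTIVITY_RATE = 0.21   # transistors per staff-month, per year


def growth(rate_per_year: float, years: int) -> float:
    """Compound growth factor after the given number of years."""
    return (1.0 + rate_per_year) ** years


def gap_factor(years: int) -> float:
    """Ratio of complexity growth to productivity growth after `years`."""
    return growth(COMPLEXITY_RATE, years) / growth(PRODUCTIVITY_RATE, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years the design gap has grown x{gap_factor(years):.1f}")
```

Under these rates the required design effort per chip grows by roughly 30% every year, which is why the gap widens by more than an order of magnitude within a decade.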

Impact of Variations
- Wafer-to-wafer
- Die-to-die
- Within-die
[Figure: frequency (normalized) versus leakage (normalized)]

EECS141 vs. EE241
- EECS 141:
  - Basic transistor and circuit models
  - Basic circuit design styles
  - Project: practical datapath design for performance
- EE 241:
  - Design under constraints: power-constrained, flexible, robust, ...
  - Transistor models of varying accuracy
  - Making design decisions, choosing styles of implementation
  - Analysis/design project
  - All in perspective of scaling, power-constrained design

Goals of Technology Scaling
- Design new devices to be:
  - Faster?
  - Smaller?
  - Lower power?
  - Add new features?
- Bottom line: growth of the semiconductor industry has been fueled by the ever cheaper transistor
  - Want to sell more functions (transistors) per chip for the same money
  - Build the same products cheaper, sell the same part for less money

Technology Scaling
- Benefits of scaling the dimensions by 30%:
  - Reduce gate delay by 30% (increase operating frequency by 43%)
  - Double transistor density
  - Reduce energy per transition by 65% (50% power savings @ 43% increase in frequency)
- A technology generation spans 2-3 years, but µP speed doubles every generation (not increased only by 43%)
Source: IEEE Micro, July 1999.
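
The percentages on the Technology Scaling slide follow from a single scaling factor of 0.7. The sketch below reproduces them under the assumption of ideal constant-field scaling, where capacitance and supply voltage both scale with the dimensions; that assumption and the function name are mine, not stated on the slide.

```python
def scale_generation(s: float = 0.7) -> dict:
    """Relative figures of merit after scaling all dimensions by `s`.

    Assumes ideal constant-field scaling: per-device capacitance and
    supply voltage both scale by `s`.
    """
    gate_delay = s                    # 30% lower delay for s = 0.7
    frequency = 1.0 / gate_delay      # about 1.43x
    density = 1.0 / s**2              # about 2x transistors per area
    energy = s * s**2                 # C * V^2 per transition
    power = energy * frequency        # per device, at the higher frequency
    return {
        "gate_delay": gate_delay,
        "frequency": frequency,
        "density": density,
        "energy": energy,
        "power": power,
    }


if __name__ == "__main__":
    for name, value in scale_generation().items():
        print(f"{name:10s} x{value:.3f}")
```

With s = 0.7 the energy per transition comes out at 0.343 (a 65% reduction) and the power per device at 0.49 (about 50% savings), matching the slide.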

Moore's Law in Microprocessors
[Figure: transistors (MT) versus year, 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor; 2X growth in 1.96 years]
Transistors on lead microprocessors double every 2 years

Moore's Law - Logic Density
[Figure: logic transistors/mm² by technology generation, 1.5µ to 0.13µ, for the 386, 486, i860, Pentium, Pentium Pro and Pentium II against a 2x trend line. Source: Intel]
- Shrinks and compactions meet density goals
- New micro-architectures drop density

Die Size Growth
[Figure: die size (mm) versus year, 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor; ~7% growth per year, ~2X growth in 10 years]
Die size grows by 14% to satisfy Moore's Law

Frequency
[Figure: frequency (MHz) versus year, 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor; doubles every 2 years]
Lead microprocessor frequency doubles every 2 years
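
The die-size figures above are two views of the same compound rate. A minimal sketch, assuming the ~7% yearly growth read from the Die Size Growth plot and a two-year technology generation; the helper names are illustrative.

```python
import math


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a fixed compound yearly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


def growth_per_generation(rate_per_year: float, years_per_generation: float = 2.0) -> float:
    """Fractional growth accumulated over one technology generation."""
    return (1.0 + rate_per_year) ** years_per_generation - 1.0


if __name__ == "__main__":
    rate = 0.07  # ~7% die size growth per year, from the slide
    print(f"doubling time: {doubling_time(rate):.1f} years")
    print(f"growth per 2-year generation: {growth_per_generation(rate):.1%}")
```

At 7% per year the die doubles in a little over ten years and grows by about 14% per two-year generation, which is the figure quoted in the slide caption.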

Processor Frequency Trend
[Figure: processor frequency (MHz) and gate delays per clock versus year, 1987-2005, for Intel, IBM PowerPC and DEC processors; processor frequency scales by 2X per generation. Source: V. De, ISLPED'99]
- Frequency doubles each generation
- Number of gates per clock reduces by 25%

Power
[Figure: power (Watts) versus year, 1971-2000, log scale, for lead microprocessors from the 4004 through the Pentium processor]
Lead microprocessor power continues to increase

Obeying Moore's Law
[Figure: transistors (MT) versus year, 1970-2010, log scale, extrapolated through 200M, 425M, 900M and 1.8B]
200M-1.8B transistors on the lead microprocessor

If Die Size Increases
[Figure: die size (mm) versus year, 1970-2010; ~7% growth per year, ~2X growth in 10 years]
Die size will have to grow to 30-40mm

Frequency Will Increase
[Figure: frequency (MHz) versus year, 1970-2010, log scale, extrapolated through 3GHz, 6.5GHz, 14GHz and 30GHz]
3-30GHz frequency

Supply Voltage Will Reduce
[Figure: supply voltage (V) versus year, 1970-2010, log scale]
Only 15% Vcc reduction to meet frequency demand

Processor Power
[Figure: max power (Watts) by technology generation, 1.5µ to 0.13µ, for the 386, 486, Pentium, Pentium MMX, Pentium Pro and Pentium II. Source: Intel]
- Lead processor power increases every generation
- Compactions provide higher performance at lower power

Active Power Scaling
1. If Vcc = 0.7 and Freq = 1/0.7:
   $\text{Power} = C V^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7^2) \times \left(\frac{1}{0.7}\right) = 1.3$
2. If Vcc = 0.7 and Freq = 2:
   $\text{Power} = C V^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7^2) \times (2) = 1.8$
3. If Vcc = 0.85 and Freq = 2:
   $\text{Power} = C V^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.85^2) \times (2) = 2.7$
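
The three Active Power Scaling cases can be checked numerically. The sketch below assumes the capacitance term is (1/0.7) × 1.14², i.e. capacitance per unit area grows by 1/0.7 while the die side grows by 14%; this reading is inferred from the slide's results, and the function and parameter names are hypothetical.

```python
def relative_active_power(vcc_scale: float, freq_scale: float,
                          dim_scale: float = 0.7,
                          die_side_growth: float = 1.14) -> float:
    """Relative active power P = C * V^2 * f for one generation.

    Assumed capacitance model: C scales as (1 / dim_scale) per unit area
    times the die area growth (die_side_growth squared).
    """
    cap_scale = (1.0 / dim_scale) * die_side_growth**2
    return cap_scale * vcc_scale**2 * freq_scale


if __name__ == "__main__":
    scenarios = [
        ("Vcc x0.70, Freq x1/0.7", 0.70, 1.0 / 0.7),
        ("Vcc x0.70, Freq x2", 0.70, 2.0),
        ("Vcc x0.85, Freq x2", 0.85, 2.0),
    ]
    for label, vcc, freq in scenarios:
        print(f"{label:24s} power x{relative_active_power(vcc, freq):.2f}")
```

Even in the ideal case power still rises by about 30% per generation, and doubling the frequency with only a 15% supply reduction pushes it to about 2.7x.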

Leakage Power Increases
[Figure, left: Ioff (nA/µ) versus temperature (°C), log scale, for 0.18µ, 0.13µ, 0.1µ, 0.07µ and 0.05µ devices. Right: drain leakage power as a percentage of total power, 2000-2008]
- Drain leakage will have to increase to meet frequency demand
- Results in excessive leakage power

Power Will Be a Problem
[Figure: power (Watts) versus year, 1971-2008, log scale, for lead microprocessors from the 4004 through the Pentium processor, extrapolated through 500W, 1.5KW, 5KW and 18KW]
Power delivery and dissipation will be prohibitive

A Closer Look at the Power
[Figure: power (Watts) versus year, 2002-2008, log scale, comparing the "Should be..." and "Will be..." trends]

Power Density Will Increase
[Figure: power density (W/cm²) versus year, 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface]
Power density too high to keep junctions at low temperature

Power Delivery Challenges
[Figure, left: Icc (amp) versus year. Right: L(di/dt)/Vdd versus year. Both 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor]
- High supply currents at low voltage
- Challenges: IR drop and L(di/dt) noise

Moore's Law Challenge
- Double transistors every two years (obey Moore's law)
- Stay within the expected power trend
- Still deliver the expected performance
- Power-limited scaling regime

Example: 20nm Power Density
- With Vdd ~1.2V, 20nm devices are quite fast: FO4 delay is <5ps
- If we continue with today's architectures, we could run digital circuits at 30GHz
- But we will end up with 20kW/cm² power density
- Lowering the supply to 0.6V brings us down to 5kW/cm²
- Speeds will be a bit lower, too: FO4 = ps, lowering the frequencies to ~GHz [Tang, ISSCC'01], and lowering power
- Assume that a high-performance DG or bulk FET can be designed with 1kW/cm², with FO4 = ps [Frank, Proc. IEEE, 3/01]

Power is a Limiting Factor
- If we have a 2cm x 2cm die in a high-performance microprocessor, we will end up with 4kW power dissipation.
- If our power has to be limited to 200W, we can afford to have only 5% of these devices with 0.6V supply on the die, given that nothing else dissipates power.
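
The power budget argument above is a two-step calculation. The sketch below assumes a power density of 1kW/cm² (the value consistent with 4kW on a 2cm x 2cm die); the function names are illustrative.

```python
def full_die_power_w(die_side_cm: float, power_density_w_per_cm2: float) -> float:
    """Power dissipated if the whole square die runs at the given density."""
    return die_side_cm**2 * power_density_w_per_cm2


def affordable_fraction(budget_w: float, full_power_w: float) -> float:
    """Fraction of the die that can run at full density within the budget."""
    return min(1.0, budget_w / full_power_w)


if __name__ == "__main__":
    full = full_die_power_w(die_side_cm=2.0, power_density_w_per_cm2=1000.0)
    fraction = affordable_fraction(budget_w=200.0, full_power_w=full)
    print(f"full-die power: {full / 1000:.0f} kW")
    print(f"fraction of devices affordable at 200 W: {fraction:.0%}")
```

The calculation ignores every other source of dissipation, as the slide notes, so 5% is an upper bound on the high-performance fraction.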

Restrict Transistor Leakage
[Figure: frequency (MHz) versus year, 1985-2010, log scale, for the 386, 486 and Pentium processor, projected through 2.5GHz, 4GHz, 5.5GHz and 7GHz]
- Reduce leakage
- Frequency will not double every 2 years

Do Not Increase the Die Size
[Figure: die size (mm) versus year, 2000-2008, comparing the "Will be..." trend with a reduced die size]
- Reduce die size
- Restrict die size to ~20 mm

Possible Scenario
- Example: 0.5% of devices will be of highest performance
- 35% is leakage (assume: 20% drain, 10% gate, 5% drain-to-body)
- 65% is active power
- If just 0.5% of these: CV² = 13W, leakage 7W
- What would the other 99.5% of devices that populate the 2cm x 2cm die look like?

Microprocessors
[Figure: chip floorplans in two panels, "Today" and "20nm"; labels: Cache, µP Core, Dedicated Logic, 2GHz, 7- GHz]
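
The 13W and 7W figures in the Possible Scenario slide follow from the 4kW full-die power of the earlier example. A minimal sketch, assuming that 4kW figure and the 65%/35% active/leakage split; the function name is hypothetical.

```python
def scenario_power(full_die_power_w: float, fast_fraction: float,
                   leakage_share: float = 0.35) -> tuple:
    """Split the power of the high-performance devices into (active, leakage).

    `full_die_power_w` is the power if every device were high-performance;
    `fast_fraction` is the fraction of devices that actually are.
    """
    total = full_die_power_w * fast_fraction
    leakage = total * leakage_share
    active = total - leakage
    return active, leakage


if __name__ == "__main__":
    active, leakage = scenario_power(full_die_power_w=4000.0, fast_fraction=0.005)
    print(f"active (CV^2 f): {active:.0f} W, leakage: {leakage:.0f} W")
```

The remaining budget of a 200W chip is then left for the other 99.5% of the devices, which is the question the slide ends on.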

Add Dedicated Datapath
- Can execute e.g. DivX decoder, graphics

                 One logic block at Vdd    Two logic blocks at Vdd/2
  Freq           1                         0.5
  Vdd            1                         0.5
  Throughput     1                         1
  Power          1                         0.25
  Area           1                         2
  Pwr Den        1                         0.125
  Leakage Curr.                            2

- Will run at x lower frequency, at 0.5-0.7 of the processor V_DD = 0.25-0.35V
- Thresholds for critical paths: V_Th = 50mV
- Need leakage power management: another threshold or control of V_T

Power Density is Reduced
[Figure: power density (W/cm²) versus year, 1970-2010, log scale, for lead microprocessors from the 4004 through the Pentium processor, with reference levels for a hot plate, a nuclear reactor and a rocket nozzle]
- Full chip power density is reduced
- But local power density will be high
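
The comparison in the Add Dedicated Datapath slide can be reproduced with a simple switching-power model. The sketch below assumes each block dissipates power in proportion to V² f with unit capacitance and ignores leakage; these simplifications and the function name are mine.

```python
def parallel_datapath(n_blocks: int, vdd_scale: float, freq_scale: float) -> dict:
    """Relative metrics for `n_blocks` identical blocks, normalized to a
    single block at nominal supply and frequency.

    Assumes switching power per block scales as V^2 * f and that area
    scales with the number of blocks; leakage is not modeled.
    """
    throughput = n_blocks * freq_scale
    power = n_blocks * vdd_scale**2 * freq_scale
    area = float(n_blocks)
    return {
        "throughput": throughput,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


if __name__ == "__main__":
    print("one block at Vdd:   ", parallel_datapath(1, 1.0, 1.0))
    print("two blocks at Vdd/2:", parallel_datapath(2, 0.5, 0.5))
```

Two blocks at half supply and half frequency keep the throughput at 1 while power drops to 0.25 and power density to 0.125, at the cost of twice the area and, per the slide, twice the leakage current.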

How About Low Power Devices?
[Figure: cell phone block diagram: Small Signal RF, Power RF, Power Management, Analog Baseband, Digital Baseband (DSP + MCU)]

Digital Cellular Market (Phones Shipped)
  Year     1996   1997   1998   1999   2000
  Units    48M    86M    162M   260M   435M
(data from Texas Instruments)

Power Trends
[Figure: power (W) versus year, 1980-1995, log scale, for chips published at ISSCC; x4 / 3 years] [Kuroda]

Shannon Beats Moore's Law
[Figure: log-scale trends versus year, 1980-2020: algorithmic complexity (Shannon's law) across the 1G, 2G and 3G generations, processor performance (~Moore's law), and battery capacity. Source: data compiled from multiple sources]

Architecture Choices
[Figure: flexibility versus area or power for Embedded Processor (lpARM), DSP (e.g. TI 320CXX), Embedded FPGA, Reconfigurable Processors (Maia) and Direct Mapped Hardware, annotated with energy efficiency ranges in MIPS/mW and MOPS/mW]

Device Challenges Summary
- Conventional planar CMOS continues as long as possible
  - Transistor gets smaller, faster and (plenty) leakier
  - Off-current and gate current will both increase to meet the design limit
  - Circuit design techniques needed to address standby power dissipation
  - Deep sub-micron effects (VT variation, drain-induced effects, hot-carrier) impact predictability
- Non-planar transistors separate shrinks from performance improvements
  - Dual-gate devices help to suppress DSM effects